Deepu Rajan


Associate Professor

School of Computer Engineering, Nanyang Technological University



   



Visual Attention Detection

Subspace estimation and analysis
We describe a new framework for extracting visual attention regions in images using robust subspace estimation and analysis techniques. We use simple features such as hue and intensity, endowed with scale adaptivity, to represent both smooth and textured areas in an image. A polar transformation maps homogeneity in the features into a linear subspace that also encodes the spatial information of a region. A new subspace estimation algorithm based on Generalized Principal Component Analysis (GPCA) is proposed to estimate multiple linear subspaces. Robustness to outliers is achieved through a weighted least squares estimate of the subspaces, in which weights computed from the distribution of the K nearest neighbors are assigned to the data points. Iterative refinement of the weights is proposed to handle estimation bias when the number of data points differs greatly across subspaces. A new region attention measure calculates the visual attention of each region by considering both feature contrast and the spatial geometric properties of the regions. Compared with existing visual attention detection methods, the proposed method directly measures global visual attention at the region level rather than at the pixel level.
(a) Saliency map from Itti et al. (PAMI98) and (b) corresponding bounding box. (c) Attention region using subspace analysis and (d) corresponding bounding box.
Y. Hu, D. Rajan and L. T. Chia, Detection of visual attention regions in images using robust subspace analysis, Journal of Visual Communication and Image Representation, vol. 19, no. 3, pp. 199-216, 2008.
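The weighted least squares idea can be pictured in miniature: weights derived from each point's K-nearest-neighbour distances down-weight isolated outliers before a total least squares subspace fit. The sketch below fits a single 1-D subspace (a line through the origin) in 2-D; the weight rule, neighbourhood size, and data are illustrative only, not the paper's GPCA-based multi-subspace estimator.

```python
import math

def knn_weights(pts, k=3):
    """Weight each point by the inverse of its mean distance to its k nearest
    neighbours; far-from-everything outliers receive small weights."""
    ws = []
    for i, (xi, yi) in enumerate(pts):
        d = sorted(math.hypot(xi - xj, yi - yj)
                   for j, (xj, yj) in enumerate(pts) if j != i)
        ws.append(1.0 / (1e-9 + sum(d[:k]) / k))
    return ws

def wtls_line(pts, ws):
    """Direction (cos t, sin t) minimizing the weighted sum of squared
    orthogonal distances to a line through the origin (2-D principal axis)."""
    sxx = sum(w * x * x for w, (x, y) in zip(ws, pts))
    syy = sum(w * y * y for w, (x, y) in zip(ws, pts))
    sxy = sum(w * x * y for w, (x, y) in zip(ws, pts))
    t = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(t), math.sin(t)

# Points on the line y = 2x plus one outlier at (4, -3).
pts = [(x, 2 * x) for x in range(-5, 6)] + [(4.0, -3.0)]
ws = knn_weights(pts)
dx, dy = wtls_line(pts, ws)
print(dy / dx)  # estimated slope, close to the true value 2
```

Despite the outlier, the K-NN weights keep the fitted direction close to y = 2x; with uniform weights the outlier would pull the estimate further off.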

Random walks on graphs
The problem of salient region detection in images is formulated as Markov random walks performed on images represented as graphs. While the global properties of the image are extracted from a random walk on a complete graph, the local properties are extracted from a k-regular graph. The most salient node is selected as the one that is globally most isolated yet falls on a compact object. The equilibrium hitting times of the ergodic Markov chain hold the key to identifying the most salient node. The background nodes farthest from the most salient node are also identified from the hitting times calculated from the random walk. Finally, a seeded salient region identification mechanism is developed to identify the salient parts of the image. The robustness of the proposed algorithm is objectively demonstrated with experiments carried out on a large image database annotated with 'ground-truth' salient regions.
Left: Original image; Middle: Most salient node marked by the red dot; Right: Final binary saliency map
V. Gopalakrishnan, Y. Hu and D. Rajan, Random walks on graphs for salient object detection in images, IEEE Trans. on Image Processing (accepted).
V. Gopalakrishnan, Y. Hu and D. Rajan, Random walks on graphs to model saliency in images, Computer Vision and Pattern Recognition (CVPR), Miami, 2009.
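The hitting-time machinery can be made concrete with a small sketch: for an ergodic chain, the expected number of steps to first reach a node from every other node solves a simple linear system. The chain and solver below are illustrative only, not the paper's actual graph construction over image nodes.

```python
def hitting_times(P, target):
    """Expected steps to first reach `target` from every state of an ergodic
    Markov chain with transition matrix P.

    Solves h_i = 1 + sum_k P[i][k] * h_k for i != target, with h_target = 0,
    by Gaussian elimination with partial pivoting on (I - P) restricted to
    the non-target states.
    """
    n = len(P)
    idx = [i for i in range(n) if i != target]
    A = [[(1.0 if i == j else 0.0) - P[i][j] for j in idx] for i in idx]
    b = [1.0] * (n - 1)
    m = n - 1
    for col in range(m):                      # forward elimination
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    h = [0.0] * m
    for r in range(m - 1, -1, -1):            # back substitution
        s = sum(A[r][c] * h[c] for c in range(r + 1, m))
        h[r] = (b[r] - s) / A[r][r]
    out = [0.0] * n
    for k, i in enumerate(idx):
        out[i] = h[k]
    return out

# Symmetric random walk on a 3-cycle: from either non-target node the
# expected time to hit node 0 is 2 steps.
P = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
print(hitting_times(P, 0))  # [0.0, 2.0, 2.0]
```

Ranking nodes by such hitting times is what lets the method separate a globally isolated (salient) node from background nodes that the walk reaches quickly.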
 


Attention-from-Motion
We introduce the notion of attention-from-motion, in which the objective is to identify, from an image sequence, only those objects in motion that capture visual attention (VA). Following an important concept in film production, viz., the tracking shot, we define attention objects in motion (AOMs) as those objects that are tracked by the camera. Three components form the attention-from-motion framework: (i) a new factorization of the measurement matrix to describe the dynamic geometry of a moving object observed by a moving camera; (ii) determination of a single AOM based on the analysis of a certain structure of the motion matrix; (iii) an iterative framework for detecting multiple AOMs. The proposed analysis of structure from factorization enables the detection of AOMs even when only partial data is available due to occlusion and over-segmentation. Without recovering the motion of either the object or the camera, the proposed method detects AOMs robustly for any combination of camera motion and object motion, and even for degenerate motion.
Two video sequences - Top row: All feature points (in blue);  Bottom row: Detected Attention Objects in Motion (AOM)
Y. Hu, D. Rajan and L. T. Chia, Attention from Motion: A factorization approach for detecting attention objects in motion, Computer Vision and Image Understanding, vol. 113, no. 3, pp.319-331, March 2009.
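The factorization viewpoint rests on a classical rank property: under an affine camera, the 2F x P measurement matrix stacking P feature tracks over F frames has rank at most 4 for a single rigid motion (Tomasi-Kanade style factorization), and independently moving objects raise the rank. A minimal sketch of that rank test, with synthetic data and a hypothetical `matrix_rank` helper; this illustrates the background property, not the paper's new factorization form.

```python
import math

def matrix_rank(M, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    rows, cols = len(A), len(A[0])
    rank, r = 0, 0
    for c in range(cols):
        if r >= rows:
            break
        piv = max(range(r, rows), key=lambda i: abs(A[i][c]))
        if abs(A[piv][c]) < tol:
            continue                      # no usable pivot in this column
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, rows):
            f = A[i][c] / A[r][c]
            for j in range(c, cols):
                A[i][j] -= f * A[r][j]
        r += 1
        rank += 1
    return rank

# Synthetic rigid scene: 6 points observed by an affine camera over 3 frames
# (rotation about the z-axis plus per-frame translation).
X = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1)]
W = []
for f in range(3):
    a = 0.3 * f
    rx = (math.cos(a), -math.sin(a), 0.1)   # x-axis projection row
    ry = (math.sin(a),  math.cos(a), 0.2)   # y-axis projection row
    tx, ty = 0.5 * f, -0.2 * f
    W.append([rx[0] * x + rx[1] * y + rx[2] * z + tx for (x, y, z) in X])
    W.append([ry[0] * x + ry[1] * y + ry[2] * z + ty for (x, y, z) in X])

print(matrix_rank(W))  # 4: one rigid motion under an affine camera
```

Each row of W is a linear combination of only four vectors (the points' x, y, z coordinates and an all-ones vector), which is why the rank cannot exceed 4 for a single rigid object.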

Feature combination via context suppression
Visual attention is obtained by determining contrasts of low-level features, or attention cues, such as intensity, color, etc. We propose a new texture attention cue that is shown to be more effective for images in which the salient object regions and the background have similar visual characteristics. Current visual attention models do not consider local contextual information when highlighting attention regions. We also propose a feature combination strategy that suppresses saliency based on context information and is effective in determining the true attention region. We compare our approach with other visual attention models using a novel Average Discrimination Ratio measure.
Left: Context suppression process; Right: Original image, and saliency maps using the proposed method and Itti's method.
Y. Hu, D. Rajan and L. T. Chia, Adaptive local context suppression of multiple cues for salient visual attention detection, International Conference on Multimedia and Expo (ICME), 2005.
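One way to picture context-based suppression during cue combination: a pixel's response in a feature map is damped when its local surround is also strongly responding, so uniformly active cues contribute little while locally distinctive peaks survive. The window size and suppression rule below are illustrative only, not the paper's adaptive scheme.

```python
def suppress(fmap, radius=1):
    """Damp each pixel by the mean response of its surrounding context."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            ctx, n = 0.0, 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    if di == 0 and dj == 0:
                        continue
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        ctx += fmap[y][x]
                        n += 1
            out[i][j] = fmap[i][j] * max(0.0, 1.0 - ctx / n)
    return out

def combine(maps):
    """Average the context-suppressed feature maps into one saliency map."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(w)]
            for i in range(h)]

# An isolated peak keeps its response; a uniformly bright cue is suppressed.
peak = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
flat = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
s = combine([suppress(peak), suppress(flat)])
print(s[1][1], s[0][0])  # 0.5 0.0
```

The flat map, despite having the largest raw responses, contributes nothing after suppression, while the locally distinctive peak dominates the combined map.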