The Feature-Based Approach

The method of finding image displacements which is easiest to understand is the feature-based approach. This finds features (for example, image edges, corners, and other structures well localized in two dimensions) and tracks these as they move from frame to frame. This involves two stages. Firstly, the features are found in two or more consecutive images. The act of feature extraction, if done well, will both reduce the amount of information to be processed (and so reduce the workload), and also go some way towards obtaining a higher level of understanding of the scene, by its very nature of eliminating the unimportant parts. Secondly, these features are matched between the frames. In the simplest and commonest case, two frames are used and two sets of features are matched to give a single set of motion vectors. Alternatively, the features in one frame can be used as seed points at which to use other methods (for example, gradient-based methods -- see the following section) to find the flow.
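The two stages can be sketched in outline. The following is a minimal illustration in Python/NumPy written for this survey, not the method of any system cited here: a crude Harris-style detector stands in for the feature-finding stage, and exhaustive patch SSD matching stands in for the correspondence stage; all function names, window sizes and thresholds are illustrative choices.

```python
import numpy as np

def box_sum(a, r):
    """Window sums over a (2r+1)x(2r+1) neighbourhood via an integral image."""
    p = np.pad(a, ((r + 1, r), (r + 1, r)))
    c = p.cumsum(0).cumsum(1)
    n = 2 * r + 1
    return c[n:, n:] - c[:-n, n:] - c[n:, :-n] + c[:-n, :-n]

def harris_corners(img, n_best=50, k=0.04, r=2):
    """Stage 1: find well-localized 2D features (a crude Harris-style detector)."""
    Iy, Ix = np.gradient(img.astype(float))
    Sxx, Syy, Sxy = box_sum(Ix * Ix, r), box_sum(Iy * Iy, r), box_sum(Ix * Iy, r)
    R = Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2        # corner response
    R[:r + 1, :] = R[-r - 1:, :] = 0                       # ignore image borders
    R[:, :r + 1] = R[:, -r - 1:] = 0
    idx = np.argsort(R.ravel())[-n_best:]                  # strongest responses
    return np.column_stack(np.unravel_index(idx, R.shape)) # (row, col) pairs

def match_corners(img1, img2, pts1, pts2, patch=3):
    """Stage 2: match features between frames by minimising the SSD of
    small patches; returns one motion vector per matched feature."""
    side = 2 * patch + 1
    vectors = []
    for (r1, c1) in pts1:
        p1 = img1[r1 - patch:r1 + patch + 1, c1 - patch:c1 + patch + 1].astype(float)
        if p1.shape != (side, side):
            continue                                       # too close to the border
        best, best_ssd = None, np.inf
        for (r2, c2) in pts2:
            p2 = img2[r2 - patch:r2 + patch + 1, c2 - patch:c2 + patch + 1]
            if p2.shape != (side, side):
                continue
            ssd = np.sum((p1 - p2) ** 2)
            if ssd < best_ssd:
                best, best_ssd = (r2, c2), ssd
        if best is not None:
            vectors.append(((int(r1), int(c1)),
                            (int(best[0] - r1), int(best[1] - c1))))
    return vectors
```

Run on two frames containing a translated bright square, the matched features yield a consistent set of motion vectors equal to the translation.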

The two stages of feature-based flow estimation each have their own problems. The feature detection stage requires features to be located accurately and reliably. This has proved to be a non-trivial task, and much work has been carried out on feature detectors; see the reviews in Sections 3 and 4. If a human is shown, instead of the original image sequence, a sequence of the detected features (drawn onto an empty image), then a smoothly moving set of features should be observable, with little feature flicker. The feature matching stage has the well known correspondence problem of ambiguous potential matches occurring; unless image displacement is known to be smaller than the distance between features, some method must be found to choose between different potential matches.

Finding optic flow using edges has the advantage (over using two dimensional features) that edge detection theory is well advanced, compared with that of two dimensional feature detection. It has the advantage over approaches which attempt to find flow everywhere in the image, such as the method developed by Horn and Schunck in [52] (see Section 1.2), that these other methods are poorly conditioned except at image edges; thus it is sensible to only find flow at the edges in the first place.

Most edge detection methods which have been used in the past for finding optic flow are related to either finding maxima in the first image derivative, see [21], or finding zero-crossings in the Laplacian of a Gaussian of the image, see [59]. The major problem with using edges is that unless assumptions about the nature of the flow are made, and further processing (after the matching) is performed, only the component of flow perpendicular to each edge element can be found. Unfortunately, the assumptions made will invariably lead to inaccuracies in the estimated flow, particularly at motion boundaries, etc.
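This perpendicular ("normal") component needs very little arithmetic. The sketch below, written for this survey rather than taken from any cited method, derives the normal flow at every pixel from the brightness change constraint equation $I_x u + I_y v + I_t = 0$; the finite-difference choices and the epsilon regularizer are illustrative.

```python
import numpy as np

def normal_flow(frame1, frame2, eps=1e-6):
    """Component of optic flow perpendicular to the local edge, from the
    brightness change constraint Ix*u + Iy*v + It = 0.  Only the projection
    of (u, v) onto the gradient direction is recoverable -- the aperture
    problem described in the text."""
    Iy, Ix = np.gradient(frame1.astype(float))
    It = frame2.astype(float) - frame1
    mag = np.sqrt(Ix ** 2 + Iy ** 2)
    vn = -It / (mag + eps)                      # signed speed along the gradient
    nx, ny = Ix / (mag + eps), Iy / (mag + eps) # unit edge normal
    return vn * nx, vn * ny                     # normal-flow vector field
```

For a pure one-pixel translation of a linear ramp, the recovered normal flow equals the true motion, since here the gradient is parallel to the displacement.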

The first major work taking this approach was that of Hildreth; see [51]. The edges are found using a Laplacian ($\nabla^2$) of a Gaussian edge detector, and the image motion is found at these edges by using the brightness change constraint equation (see the following section). Various additional constraints necessary to recover the motion parallel to the edge direction are discussed. The constraints all make the basic assumption that the image flow is smoothly varying across the image. Thus each minimizes some measure of flow variation. The measure chosen is the variation in velocity along the edge contours. If this is completely minimized, the data is largely ignored, so a balancing term is added to the minimization equation which gives a measure of the fit of the estimated flow to the data. Thus the following expression is minimized:

$$ \int \left| \frac{\partial \mathbf{v}}{\partial s} \right|^2 ds \;+\; \beta \int \left( \mathbf{v} \cdot \mathbf{n}^{\perp} - v^{\perp} \right)^2 ds $$

where $\mathbf{v}$ is the estimated optic flow, $\mathbf{n}^{\perp}$ is the unit vector perpendicular to the edge, $v^{\perp}$ is the measured component of flow perpendicular to the edge, $s$ is the ``arc length'' parameter and $\beta$ is the weighting factor which determines the relative importance of the two terms. The first term expresses the variation in estimated flow, and the second term expresses the quality of the fit of the estimated flow to the measured edge displacements. The expression is minimized using the conjugate gradient approach, also described in [51]. The overall algorithm is intrinsically sequential, and is fairly computationally expensive.
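Once the contour is discretized, this minimization can be written down in a few lines. The sketch below is illustrative Python, not Hildreth's implementation: it minimizes the same two-term functional over flow vectors sampled along an open contour, using plain gradient descent in place of conjugate gradients; the value of beta, the step size and the iteration count are arbitrary choices.

```python
import numpy as np

def hildreth_flow(normals, vperp, beta=1.0, iters=2000, lr=0.1):
    """Minimise sum_i |v_{i+1}-v_i|^2 + beta * sum_i (v_i.n_i - vperp_i)^2
    over flow vectors v_i along an open edge contour, by gradient descent.

    normals : (N, 2) unit vectors perpendicular to the edge at each sample
    vperp   : (N,)  measured perpendicular flow components
    """
    v = normals * vperp[:, None]            # initialise with the normal flow
    for _ in range(iters):
        grad = np.zeros_like(v)
        # smoothness term: penalise variation between neighbouring samples
        grad[:-1] += 2 * (v[:-1] - v[1:])
        grad[1:] += 2 * (v[1:] - v[:-1])
        # data term: pull v.n toward the measured perpendicular component
        err = (v * normals).sum(1) - vperp
        grad += 2 * beta * err[:, None] * normals
        v -= lr * grad
    return v
```

When the normal directions along the contour span two dimensions (as on a curved edge) and the true motion is a single translation, the minimizer recovers the full flow, since the translation makes both terms zero.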

The use of edges not just in space but in space-time was introduced by Buxton and Buxton; the theory is developed in [18], and testing is reported in [17]. Here they use the d'Alembertian ($\nabla^2 - \frac{1}{u^2}\frac{\partial^2}{\partial t^2}$) of a spatiotemporal Gaussian, instead of the Laplacian. The velocity parameter u and the scale of the Gaussian filter need to be applied at different channels to search for different motion possibilities. Although noise suppression was claimed to be greater than that achieved using Hildreth's approach due to the extended dimensionality of the filter, a corresponding increase in computational expense also resulted. In [19] this work is extended; the negative metric is replaced with the positive one (i.e. $\nabla^2 + \frac{1}{u^2}\frac{\partial^2}{\partial t^2}$) to reduce noise sensitivity and remove ``curious'' cross-over effects (mentioned in [18], but never utilized). In [19], instead of using Hildreth's velocity smoothness constraint, an object's motion and position parameters are estimated directly from the measurements of the normal components of flow. However, this ``3D solution to the aperture problem'', solved with a least squares fit to the normal flow, assumes that the objects in question are planar, and that the image is already segmented into differently moving objects.

In [22] Castelow et al. use Canny edgels (edge elements found using the Canny edge detector) to find optic flow at edges. In this method, developed originally by Scott (see [90]), the flow parallel to the edges is recovered by iterative relaxation of the edgel matching surface; the edgels near the high curvature parts of the contours can have both components of flow found from the data, so these cues to the full flow are propagated along the contour into less highly curved sections by relaxation. Thus local support of optic flow is used to improve the motion estimate iteratively. The results presented did not appear to be as good as those shown in [51] and [19], although the method has been successfully used by Murray et al. to reconstruct three dimensional objects in structure from motion work; see [70] and [68]. This method of finding optic flow has also been used to track cataracts, in [45].

In [44], Gong and Brady found the full flow at points of high curvature on zero-crossing contours using their ``curve motion constraint equation''. The flow is then propagated along the contours by using a wave/diffusion process. This method was developed further by Wang et al. in [119]. Edges are found using Canny's edge detector, and are then segmented and grouped. ``Corners'' in the edges are found by fitting local B-splines, and the edges at these points are matched from one frame to the next (using a cross-correlation method to achieve correct matches) to give the full flow. This flow is then propagated along the edges using the wave/diffusion process. The algorithm was implemented in parallel, and so achieved an impressive speed compared with the original algorithm, and compared with Hildreth's approach. The results obtained were good (for an edge-based method), although the full flow was not always correctly estimated along straight portions of edges. The use of high curvature parts of edges as two dimensional features can lead to the rejection of features which are well localized in two dimensions (and therefore useful for recovery of optic flow) but are not placed on well defined edges. ASSET [103,102,100] finds two dimensional features directly from the image, and not via an intermediate edge finding stage, and therefore avoids this problem.
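The propagation step can be illustrated with a toy version: clamp the full flow at the matched high-curvature points and let the remaining contour points repeatedly average their neighbours until the values diffuse along the edge. This is a sketch written for illustration only (a closed contour, pure diffusion, no wave term), not the scheme of [44] or [119].

```python
import numpy as np

def propagate_flow(flow, known, iters=500):
    """Diffuse full-flow estimates along a closed edge contour.

    flow  : (N, 2) array; rows where known is True hold measured full flow
    known : (N,) boolean mask marking the high-curvature (corner) points
    """
    v = flow.copy()
    for _ in range(iters):
        # each free point moves toward the mean of its two contour neighbours
        avg = 0.5 * (np.roll(v, 1, axis=0) + np.roll(v, -1, axis=0))
        v = np.where(known[:, None], flow, avg)   # clamped points keep their value
    return v
```

With consistent measurements at the clamped points, the whole contour relaxes to the measured flow; with differing measurements it converges to a piecewise-linear interpolation between them.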

Some of the ways in which edges have been used to find optic flow for different purposes are now given.

In [30] Deriche and Faugeras track straight edges in two dimensions using Kalman filters for each edge. The edges are found using the Deriche filter (see [29]) and are represented by their length, orientation and position. The resulting matched and tracked line segments are fed into a structure from motion algorithm; however, this approach is limited to fairly structured environments, as it only uses straight lines.

In [12] Bolles et al. use spatiotemporal edge filters as developed by Buxton and Buxton (see earlier) to track features in single raster lines over time. The tracks of features within these raster lines create ``epipolar-plane images''; these contain information about the three dimensional structure of the world. The assumption is made that the images are taken in quick succession compared with the motion of the features, so that there is no correspondence problem. In [112] Tomasi and Kanade track edges found using the Laplacian of a Gaussian in an approach very similar to that of Bolles. From the tracked edges the shape of an object is computed ``without first computing depth''. Very accurate results are obtained, using highly constrained camera motion.

In [89] Sandini and Tistarelli use Laplacian of a Gaussian edge contours, tracked over time by a camera moving with constrained motion. The tracked contours are used to construct the three dimensional structure of objects in view. In [8] Blake and Cipolla (again, given a deliberate, constrained and known camera motion) use the deformation of edge contours to recover surface curvature in the world. In [36] Faugeras discusses the problems of tracking image edges associated with non-rigid bodies. In [24] Chen and Huang match straight line segments in stereo sequences for object motion estimation.

Optic flow methods based on the detection of two dimensional features (which shall be known as corners for the purpose of this discussion) have the advantage that the full optic flow is known at every measurement position. The (related) disadvantage is that only a sparse set of measurements is available, compared with methods which give a measurement at every image position. However, if the features are found appropriately then the maximum possible number of different optic flow measurements in the image should be recoverable, without the degradation in flow estimate quality introduced by the various assumptions and constraints required to recover the full flow using other methods. Note that it has often been argued that gradient-based (and related) methods give their most well conditioned flow estimates near edges; in the same way, flow estimates found at edges are most accurate near corners. In the ASSET [103,102,100] system, corners are used for finding optic flow, and image segmentation is based on this; the segmentation results are then further refined by using ``intra-frame'' image edges. This appears to be a sensible use of the image data, since the flow is found at those places where this can be most accurately achieved, and remaining (instantaneous) image information about possible object boundaries can be used to make adjustments to the interpretation made using the flow. Another advantage of using corners is that the early data reduction achieved should allow computationally efficient algorithms to be developed.

Some of the published discussions on optic flow found from moving corners and some of the ways in which this information has been recovered and used are now described.

Much work on structure from motion has assumed that the flow field is only recovered at relatively sparse image positions. The work has largely consisted of theory relating how much scene information can be given from different numbers of views of varying numbers of ``world points''. For example, in [38] Faugeras and Maybank show that in general the least number of point matches (found using two viewpoints) which allows only a finite number of interpretations of the camera positions (relative to the world) is five. With this number of matches, there are a maximum of ten possible interpretations of the camera's motion.

With regard to the problem of matching corners from one frame to the next, early work was undertaken by Barnard and Thompson; see [4]. Here the Moravec interest operator is used to find points which are well localized in two dimensions in consecutive images. Next, the (computationally expensive) method of iterative relaxation of the matching surface is used to find the optimal set of point matches. Each point in the first frame is given a probability of matching to each point in the second; the initial probability is inversely proportional to the sum of the squares of the differences between the brightness of each point within a small patch centred on the ``corner'' in the first frame and a small patch centred on the ``corner'' in the second. Then relaxation is applied to force the flow field to vary smoothly. (This matching validation approach could be overly restrictive in certain situations, where flow discontinuities are important sources of information.) In [111] this method is used to successfully detect moving objects as seen by a moving camera; this work is discussed later.
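The initialization step can be sketched as follows. This is an illustrative reconstruction, not Barnard and Thompson's exact formula: the probability here is taken as 1/(1+SSD) and then normalized per frame-1 corner, and the subsequent relaxation iterations are omitted.

```python
import numpy as np

def initial_match_probabilities(img1, img2, pts1, pts2, r=2, eps=1e-9):
    """Initial match probabilities in the style of Barnard and Thompson:
    each corner in frame 1 gets a probability of matching each corner in
    frame 2, inversely related to the sum-of-squared-differences between
    small patches centred on the two corners."""
    P = np.zeros((len(pts1), len(pts2)))
    for i, (r1, c1) in enumerate(pts1):
        p1 = img1[r1 - r:r1 + r + 1, c1 - r:c1 + r + 1].astype(float)
        for j, (r2, c2) in enumerate(pts2):
            p2 = img2[r2 - r:r2 + r + 1, c2 - r:c2 + r + 1].astype(float)
            ssd = np.sum((p1 - p2) ** 2)
            P[i, j] = 1.0 / (1.0 + ssd)          # identical patches score highest
        P[i] /= P[i].sum() + eps                 # normalise per frame-1 corner
    return P
```

On a translated copy of a textured image, the correct correspondences receive the dominant probabilities before any relaxation is applied.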

In [23] Charnley and Blissett describe the vision system DROID. DROID uses two dimensional features found with the Plessey feature detector (see Section 4), and tracks these over many frames in three dimensions using a Kalman filter for each feature. Time-symmetric matching is performed on the basis of feature attributes, using (where available) predicted feature positions from the filters. The tracked features are used to construct a three dimensional surface representing the world. The constraint is applied that the world must contain no moving objects.

In [93] L.S. Shapiro et al. describe a corner tracker using the Wang corner detector (see [118]). Small inter-frame motion is assumed. Corners are matched using correlation of small patches. Where no matches are found, motion is predicted, again using the correlation of small patches, in a manner very similar to the method of Lawton (see the following paragraph). Tracking is achieved over many frames using simple two dimensional filters which model image velocity and (if required) image acceleration.
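A minimal example of such a filter is the classical alpha-beta (g-h) form for one image coordinate, given below as an illustration of the constant-velocity idea rather than as the filter used in [93]; the gain values are arbitrary.

```python
def track_alpha_beta(measurements, dt=1.0, g=0.5, h=0.3):
    """A simple constant-velocity tracking filter (alpha-beta / g-h form):
    predict the next position from position + velocity, then blend in the
    new measurement.  One scalar track for a single image coordinate."""
    x, v = float(measurements[0]), 0.0
    out = []
    for z in measurements[1:]:
        x_pred = x + v * dt           # predict from the motion model
        resid = z - x_pred            # innovation
        x = x_pred + g * resid        # corrected position estimate
        v = v + (h / dt) * resid      # corrected velocity estimate
        out.append(x)
    return out
```

On noiseless constant-velocity input the estimate converges onto the true track within a few frames; image acceleration would be handled by adding a third state in the same pattern.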

In [56] Lawton uses optic flow to recover world structure assuming a static environment and translational camera motion. Image flow is found by taking corners from one frame only, and using correlation of small patches centred at the features with patches placed at all possible positions in the second frame. Corners are found by using an approach similar to that of Moravec (see Section 4), but are only looked for along Laplacian of Gaussian edges. The flow is found to sub-pixel accuracy by using inter-pixel interpolation in both frames, once the approximate displacement has been found. Again, cross-correlation is used for this. In [1] Adiv uses Lawton's method to find the structure and motion of independently moving planar objects; the planar spatial regions are later fused into multi-facet objects.
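The sub-pixel step can be illustrated with a common stand-in: fit a parabola through the match score (here SSD) at the best integer shift and its two neighbours, and take the vertex. The cited work interpolates the images themselves; this three-point fit is just a simple sketch of the idea.

```python
import numpy as np

def subpixel_peak(ssd_row):
    """Refine an integer displacement estimate to sub-pixel accuracy by
    fitting a parabola through the SSD at the best shift and its two
    neighbours, and returning the parabola's vertex."""
    i = int(np.argmin(ssd_row))
    if i == 0 or i == len(ssd_row) - 1:
        return float(i)                              # no neighbours to fit
    l, c, r = ssd_row[i - 1], ssd_row[i], ssd_row[i + 1]
    return i + 0.5 * (l - r) / (l - 2 * c + r)       # vertex of the parabola
```

For a truly quadratic score surface the three-point fit is exact, which is why this refinement works well near a well-localized correlation minimum.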

In [95] Sinclair et al. use the motion of corners to find three dimensional planes in the world (via projective invariants) and also the direction of the camera's motion. Burger and Bhanu, in [14], use the motion of ``endpoints and corners of lines as well as centers of small closed regions'' to find the direction of translation of the camera. Work on image segmentation using the motion of two dimensional features includes [10], [11], [120] and [28].

Some work has been carried out which attempts to integrate the motion of corners and edges. In [37] Faugeras et al. compare recovered three dimensional structure from the motion of points and edges theoretically and experimentally. In [107] Stephens and Harris used edges and corners to recover the three dimensional structure of a static environment (in a project related to the DROID system). They integrated the tracked features with edges in an attempt to provide wire-frame models of static objects in the world. This integration proved difficult, and was dropped in later versions of DROID, where edges were not used.

The details of how ASSET recovers a useful image flow field from sets of two dimensional features are given in [103,102,100], where some justification of the approach taken is also given.
