1995/6 Research Journal
Image, Speech and Intelligent Systems
N. D. Matthews, P. E. An and C. J. Harris
The introduction of new technologies such as autonomous intelligent cruise control or collision avoidance schemes to road vehicles necessitates a high degree of robustness and reliability. Whilst very accurate range estimates may be recovered using conventional sensors, e.g. millimetric radar, these typically suffer from both low bearing resolution and potential ambiguities through, for example, false alarms.
This work details a novel two-stage vehicle detection and recognition algorithm in which an image processing area of interest (AOI) designator cues a secondary recognition process, implemented using principal component analysis (PCA) as input to a Multi-Layered Perceptron (MLP) classifier. The combination of an initial detection phase followed by a recognition process has allowed the classifier design to be greatly simplified. In turn, the classifier performance has allowed some of the image processing assumptions to be relaxed, whilst maintaining a high signal to noise ratio (SNR). Both the image processing system and MLP classifier have been designed for real-time implementation and data-fusion with other information sources such as a range/range rate radar.
Road-vehicles, especially cars and to a lesser extent lorries, comprise a large number of horizontal structures, particularly when viewed approximately from the rear, e.g. rear-window, boot, bumper (Figure 2). Whilst there are obviously other potential sources of horizontal structure, both natural, e.g. the horizon, and man-made, e.g. road-signs, it is generally possible to mask areas within the image where these may occur and where it is physically impossible for vehicles to be present, such as above the horizon. Practical experience suggests that clusters of horizontal edges are a significant cue for areas of interest (AOI's) which potentially contain vehicles.
In practice man-made objects, e.g. vehicles, exhibit a stronger edge response than "natural" objects. Hence, an appropriate technique is to consider the total horizontal edge response over some AOI, rather than using an edge threshold. However, it is extremely difficult to define an appropriate AOI, thus an additional assumption is made that vehicles do not occlude each other greatly, i.e. overlap image columns significantly (Figure 2); this assumption is often implicit in many symmetry-based techniques.
Applying the horizontal overlap assumption, each image column may be considered as a potential AOI to sum over. Firstly, the horizontal edge response in each image column is summed and smoothed with a triangular filter (Figure 1); for the images used here an appropriate support is found to be 10 pixels.
Figure 1: Filtered edge response column sums
Potential vehicle locations are then obtained from locally maximal peaks, which are extracted from the smoothed column responses in decreasing order of height, with an additional constraint to discard maxima which are "too close" to a previously extracted, i.e. higher, peak; for the images used here, maxima within 25 pixels of a previously extracted peak are ignored.
Figure 2: Detected candidate vehicle columns
Although this technique has a number of limitations, such as the triangular filter support width and the proximity constraint which have implications for the potential size of recovered AOI, it is invariant to camera pitching, for example when changing gear.
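The column-based candidate search described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the zero-padded smoothing and the list-of-rows image layout are assumptions, while the default support of 10 pixels and the 25 pixel proximity constraint are the values quoted in the text.

```python
def triangular_kernel(support):
    """Normalised triangular weights over `support` samples."""
    centre = (support - 1) / 2.0
    weights = [(support + 1) / 2.0 - abs(i - centre) for i in range(support)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(values, kernel):
    """Convolve with zero padding; output has the same length as the input.

    With an even support the window is offset by half a pixel.
    """
    offset = len(kernel) // 2
    out = []
    for c in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = c + k - offset
            if 0 <= j < len(values):
                acc += w * values[j]
        out.append(acc)
    return out


def candidate_columns(edge_image, support=10, min_separation=25):
    """Return candidate vehicle columns, strongest peak first.

    edge_image: list of rows of non-negative horizontal edge responses.
    """
    n_cols = len(edge_image[0])
    column_sums = [sum(row[c] for row in edge_image) for c in range(n_cols)]
    smoothed = smooth(column_sums, triangular_kernel(support))
    # Locally maximal, non-zero responses.
    peaks = [c for c in range(n_cols)
             if smoothed[c] > 0
             and (c == 0 or smoothed[c] >= smoothed[c - 1])
             and (c == n_cols - 1 or smoothed[c] > smoothed[c + 1])]
    # Extract in decreasing order, discarding maxima too close to a higher peak.
    peaks.sort(key=lambda c: smoothed[c], reverse=True)
    accepted = []
    for c in peaks:
        if all(abs(c - a) > min_separation for a in accepted):
            accepted.append(c)
    return accepted
```

Because peaks are visited from highest to lowest, a weaker maximum can only be suppressed by a stronger one, which matches the ordering described in the text.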
The vehicle's horizontal extent is determined in a similar manner, using an edge-following technique applied to the horizontal edge response image. Edges are linked left and right from the candidate vehicle horizontal position on each row until the horizontal edge response drops below a given threshold; a threshold of 2 has been found to be effective. The vehicle width is determined by the leftmost and rightmost linked edge response pixels (Figure 3).
Figure 3: Detected candidate vehicle widths
Although this technique will overestimate the vehicle width if a large horizontal feature crosses the candidate vehicle location, in practice it tends to underestimate the true horizontal vehicle extent, since only horizontal edges are considered whereas a vehicle boundary is usually at a vertical edge.
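A simple sketch of the edge linking step is given below. The function name and the image layout (a list of rows of edge responses) are assumptions made for illustration; the default threshold of 2 is the value quoted in the text.

```python
def vehicle_extent(edge_image, column, threshold=2):
    """Return (left, right) column limits linked from `column` over all rows."""
    n_cols = len(edge_image[0])
    left = right = column
    for row in edge_image:
        if row[column] < threshold:
            continue  # no edge to follow on this row
        # Link leftwards until the response drops below the threshold.
        c = column
        while c - 1 >= 0 and row[c - 1] >= threshold:
            c -= 1
        left = min(left, c)
        # Link rightwards in the same way.
        c = column
        while c + 1 < n_cols and row[c + 1] >= threshold:
            c += 1
        right = max(right, c)
    return left, right
```

A row whose edge does not pass through the candidate column contributes nothing, so isolated structure elsewhere in the row does not widen the estimate.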
An important vehicle cue in daylight imagery is the shadow under a vehicle. Consider a small area designated as a road "template" at the bottom of the original input grey level image, whose pixel grey levels are assumed to obey a Gaussian distribution, together with some extraneous pixels belonging to e.g. road markings.
The mean $\mu$ of the underlying Gaussian distribution is taken to be the mode of the grey level patch, since the mean of a purely Gaussian distribution coincides with its mode. The standard deviation $\sigma$ of the underlying normal distribution is given by Equation (1),

$$\sigma = \frac{|x - \mu|}{\sqrt{2 \ln 2}} \qquad (1)$$

where $x$ is the grey level at which the histogram response is half of that observed at the modal value. Pixels in the original grey level image whose grey levels lie in a range determined from $\mu$ and $\sigma$ are classed as due to a potential under-vehicle shadow.
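The road template statistics can be estimated from a grey level histogram as sketched below. For a Gaussian, the response falls to half its peak value at a distance of $\sigma\sqrt{2 \ln 2}$ from the mean, which is the relation used here. Several details are assumptions rather than part of the source: the function names, searching for the half-height level on the darker side of the mode (chosen because road markings contaminate the brighter side), and the shadow rule `g < mode - k * sigma` with `k = 3`, which stands in for the grey level range that is not stated here.

```python
import math


def road_statistics(template_pixels, levels=256):
    """Estimate (mode, sigma) of the road grey level distribution.

    template_pixels: iterable of integer grey levels from the road template.
    """
    histogram = [0] * levels
    for g in template_pixels:
        histogram[g] += 1
    mode = max(range(levels), key=lambda g: histogram[g])
    half = histogram[mode] / 2.0
    # Assumption: walk towards darker levels until the count falls to half height.
    x = mode
    while x > 0 and histogram[x] > half:
        x -= 1
    sigma = (mode - x) / math.sqrt(2.0 * math.log(2.0))
    return mode, sigma


def is_shadow(grey_level, mode, sigma, k=3.0):
    """Hypothetical shadow rule: markedly darker than the road distribution."""
    return grey_level < mode - k * sigma
```

Using the mode rather than the sample mean keeps the estimate insensitive to the bright outliers contributed by road markings.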
Areas corresponding to under-vehicle shadows are located by considering each potential vehicle location in turn. Pixels within a given candidate image strip are considered row by row, starting at the bottom of the image, i.e. closest to the camera. If more pixels on any given row within a given candidate vehicle's image strip are considered to be shadow than non-shadow, then the row is interpreted as part of an under-vehicle shadow and the vertical location noted as the bottom of the candidate vehicle (Figure 4).
Figure 4: Detected under-vehicle shadows
The candidate vehicle location is discarded if no location can be obtained to satisfy the under-vehicle shadow criterion.
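The row-by-row majority test can be sketched as follows. The function name and the `is_shadow` predicate passed in by the caller are illustrative assumptions; any per-pixel shadow test could be supplied.

```python
def shadow_bottom_row(grey_image, left, right, is_shadow):
    """Scan the strip [left, right] upwards from the bottom image row.

    Returns the index of the first row in which shadow pixels outnumber
    non-shadow pixels, or None if the candidate should be discarded.
    """
    for r in range(len(grey_image) - 1, -1, -1):
        strip = grey_image[r][left:right + 1]
        n_shadow = sum(1 for g in strip if is_shadow(g))
        if n_shadow > len(strip) - n_shadow:
            return r
    return None
```

Returning `None` corresponds to discarding the candidate vehicle location when no row satisfies the shadow criterion.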
Obviously, to extract an AOI for subsequent classification, an estimate is required for the location of the top of the vehicle. As there are no consistent cues associated with a vehicle roof, a heuristic is applied that the rears of cars are approximately square, subject to digitisation. Hence, the vehicle height is set equal to its width, and thus the top of the vehicle may be estimated from the vehicle width and the location of its under-vehicle shadow (Figure 5).
Figure 5: Detected candidate vehicle AOI's
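As a small worked illustration of the square heuristic, the AOI can be derived from the horizontal extent and the shadow row as below. The function name and the (top, left, bottom, right) box convention with inclusive pixel indices are assumptions.

```python
def aoi_from_shadow(left, right, bottom_row):
    """Square AOI: height equals width, anchored on the shadow row."""
    width = right - left + 1
    top_row = max(0, bottom_row - width + 1)  # clip at the top of the image
    return top_row, left, bottom_row, right
```

For example, a vehicle spanning columns 10 to 29 (20 pixels wide) with its shadow on row 50 gives an AOI covering rows 31 to 50.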
Whilst the vehicle vertical location derived from the under-vehicle shadow is generally temporally stable, it may fail when road characteristics differ greatly from those in the template patch, for example due to shadows cast by overtaking vehicles. Therefore a second vehicle vertical location algorithm has been developed, based on horizontal edge clustering.
A vehicle-sized patch may be defined by applying the same aspect ratio heuristic used to derive the vehicle height. Within a given candidate vehicle strip (Figure 3), the vehicle vertical location is taken as the position of the vehicle-sized patch with the maximum summed horizontal edge response (Figure 6).
Figure 6: Detected candidate vehicle AOI's
Although this technique generally finds an area close to a vehicle, it is not as temporally stable as the under-vehicle shadow. Since each image strip will by definition generate a maximum vertical patch position, whether or not a vehicle is present, there is an explicit requirement for the car/road classifier.
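The horizontal edge clustering search can be sketched as a sliding square window down the candidate strip. The function name and the choice to return the first maximum when several patches tie are assumptions.

```python
def best_patch_top(edge_image, left, right):
    """Find the square patch in the strip with the largest summed edge response.

    Returns (top_row, score); (None, None) if the strip is wider than the
    image is tall, so that no square patch fits.
    """
    size = right - left + 1
    row_sums = [sum(row[left:right + 1]) for row in edge_image]
    best_top, best_score = None, None
    for top in range(0, len(edge_image) - size + 1):
        score = sum(row_sums[top:top + size])
        if best_score is None or score > best_score:
            best_top, best_score = top, score
    return best_top, best_score
```

Note that the search always returns some patch when one fits, even for a strip containing only road, which is why the subsequent classification stage is needed.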
The candidate AOI's are dilated slightly to ensure that any vehicle is completely extracted. Each AOI is then scaled to a constant size to maintain scale invariance for the classifier. The scaled AOI is kept small, in pixels, to reduce the dimensionality of the classification problem; any further reduction is likely to "oversmooth" car edges and other useful features.
AOI's extracted using the under-vehicle shadow are shown in Figure 7 and AOI's extracted using horizontal edge clustering are shown in Figure 8.
Figure 7: AOI's for MLP classification
Figure 8: AOI's for MLP classification -- 2
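The dilation and rescaling of a candidate AOI might look like the sketch below. Everything here is an assumption made for illustration: the function names, the margin parameter, nearest-neighbour sampling, and the default output size of 20 by 20 pixels, which is merely consistent with the 16 subpatterns of 25 pixels implied by L = 25, N = 5 and the 80-dimensional transformed pattern.

```python
def dilate_box(top, left, bottom, right, margin, n_rows, n_cols):
    """Grow an inclusive (top, left, bottom, right) box, clipped to the image."""
    return (max(0, top - margin), max(0, left - margin),
            min(n_rows - 1, bottom + margin), min(n_cols - 1, right + margin))


def scale_aoi(image, box, size=20):
    """Resample the boxed region to size x size by nearest-neighbour sampling."""
    top, left, bottom, right = box
    height = bottom - top + 1
    width = right - left + 1
    scaled = []
    for i in range(size):
        src_row = image[top + (i * height) // size]
        scaled.append([src_row[left + (j * width) // size] for j in range(size)])
    return scaled
```

Scaling every AOI to the same size means that the classifier input has a fixed dimension regardless of the distance to the vehicle.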
Pattern classification tasks for many image applications are generally considered difficult problems because of the associated high dimensional pattern space and poor signal-to-noise ratio (SNR) due to redundancy of pattern components. Feature extraction is an essential procedure to reduce the problem dimensionality whilst retaining the required SNR. In this combined detection/recognition system a high degree of data reduction has already been performed by the initial image processing stage. With this information, the AOI considered by a classifier may be kept to a reasonable size, greatly improving the SNR and enabling accurate pattern classification.
Since the statistical characteristics of many road patterns tend to be stationary, a set of localised eigenvectors defined in a smaller dimension can be used to reconstruct the entire pattern by forming a union of local patches. Such a local principal components analysis (PCA) is generally more robust when reconstructing a pattern which is not in the training set S. This has been verified in this application, where M = 90 (training images), L = 25 (pixels per local patch) and N = 5 (principal components retained per patch). This is an important property for car detection, where real-time classification at (near) frame rate is required. An example of such localised eigenvectors, or eigenmasks (each of L = 25 pixels), corresponding to the ten largest eigenvalues derived from ninety road/car images, is shown in Figure 9.
Figure 9: Eigenmasks corresponding to the ten largest eigenvalues (in descending, raster order)
Given a set of properly extracted features, criteria for selecting proper classifiers are commonly based on Bayesian Maximum A Posteriori (MAP) or Maximum Likelihood (ML) principles so that the average error probability is minimised. A nonlinear interpolative model, the Multi-Layered Perceptron (MLP), is frequently used to approximate the posterior class distribution because of its nonlinear modelling capability. Although the MLP is generally inappropriate for on-line control applications because of its slow adaptation process and non-unique solution minimum, it generalises efficiently in a high dimensional space by means of basis functions of global extent, which is essential for most classification applications. The MLP is chosen as the classifier for this study due to the high dimensionality of the input space.
Car/road identification is performed by the MLP using features extracted by local PCA from a given scaled AOI (Figures 7 and 8). The scaled AOI is partitioned into subpatterns of 25 pixels each (L = 25). Local PCA is performed using the eigenmasks (Figure 9) on the individual subpatterns, and the principal components associated with the five most dominant eigenvalues are computed (N = 5) for each of the subpatterns. The set of principal components forms a transformed pattern of 80 dimensions, which at five components per subpattern corresponds to 16 subpatterns. These transformed patterns are then used to adapt the MLP network via backpropagation to estimate the car/road decision boundary.
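The feature extraction step can be sketched as a projection of each subpattern onto a fixed set of eigenmasks. The sketch assumes a 20 by 20 scaled AOI split into sixteen 5 by 5 subpatterns, which is consistent with L = 25, N = 5 and the 80-dimensional pattern but is not stated explicitly here. The function name, the raster ordering and the omission of any mean subtraction are also assumptions, and the eigenmasks are taken as given rather than computed.

```python
def local_pca_features(aoi, eigenmasks, patch=5):
    """Project each patch x patch subpattern onto the supplied eigenmasks.

    aoi: square list of rows whose side is a multiple of `patch`.
    eigenmasks: list of flattened masks, each of length patch * patch.
    Returns one coefficient per (subpattern, eigenmask) pair in raster order.
    """
    features = []
    for r0 in range(0, len(aoi), patch):
        for c0 in range(0, len(aoi[0]), patch):
            # Flatten the subpattern in the same raster order as the masks.
            sub = [aoi[r][c]
                   for r in range(r0, r0 + patch)
                   for c in range(c0, c0 + patch)]
            for mask in eigenmasks:
                features.append(sum(m * s for m, s in zip(mask, sub)))
    return features
```

With five eigenmasks and a 20 by 20 AOI this yields 16 x 5 = 80 coefficients, so the MLP input is a fifth of the size of the raw 400-pixel pattern.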
To avoid the common problem of overfitting/parameter drift due to modelling error and measurement noise, the pattern classifier is only trained off-line, and hence its rate of convergence is less critical (this off-line condition obviously requires the training patterns to be truly representative).
By comparing the performance among different pattern distributions, an optimal set of network parameters can be chosen at the point where the classification performance based on the validation set begins to deteriorate (10,000 cycles for 1 hidden node; 9,000 cycles for both 3 and 5 hidden nodes). The mis-classification of car and road patterns was also evaluated to ensure that the classifier was not biased toward any particular class (Table 1). Further details are given in [1].
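The stopping rule described above, i.e. choosing the training cycle count at which validation performance begins to deteriorate, can be sketched as follows. The function name and the representation of the training history as (cycles, validation error) pairs are assumptions.

```python
def stopping_point(history):
    """Return the cycle count just before validation error first rises.

    history: list of (cycles, validation_error) pairs in training order.
    If the error never rises, the last recorded cycle count is returned.
    """
    for (cycles, error), (_, next_error) in zip(history, history[1:]):
        if next_error > error:
            return cycles
    return history[-1][0]
```

In practice the validation curve can be noisy, so a more tolerant rule (for example, requiring several consecutive rises) may be preferable; that refinement is not part of the description above.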
Table 1: MLP classification of training data.
These results indicate that the bias is relatively insignificant (or uniform) for an MLP classifier of 1, 3 or 5 hidden nodes and that classification performance is acceptable, provided that the image characteristics are similar to those in the training patterns.
The integrated system generates a number of candidate AOI's (Figures 7 and 8), potentially two per vertical strip, one due to the recovered under-vehicle shadow position and one due to the maximum horizontal edge response location, and invokes the MLP classifier on all candidate AOI's. If more than one potential vehicle is detected in a given vertical strip then the AOI with the highest classifier confidence is accepted (Figure 12).
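The final selection step can be sketched as keeping, for each vertical strip, the "car" AOI with the highest classifier confidence. The function name and the tuple layout used for the candidates are illustrative assumptions.

```python
def select_per_strip(candidates):
    """Keep the most confident 'car' AOI in each vertical strip.

    candidates: list of (strip_id, source, is_car, confidence) tuples, where
    `source` names the cue that produced the AOI (e.g. "shadow" or "edges").
    Returns a dict mapping strip_id to (source, confidence).
    """
    best = {}
    for strip_id, source, is_car, confidence in candidates:
        if not is_car:
            continue  # AOI's classified as road are not vehicle targets
        if strip_id not in best or confidence > best[strip_id][1]:
            best[strip_id] = (source, confidence)
    return best
```

Strips in which every AOI is classified as road simply do not appear in the result.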
For the example image the MLP classifications are shown in Figures 10 and 11.
Figure 10: AOI's classified as road (76.2%, 100% and 100%)
Figure 11: AOI's classified as car (27.07%, 40.58%, 88.78%, 93.72%, 94.89%, 95.40% and 96.63%)
A background and lorry tail-gate AOI are categorised as "car" since they are significantly "not-road", due to their high degree of internal structure, although the classifier has a low (27.07%) confidence in the classification of the background area.
Figure 12: Detected and classified vehicle targets
The hybrid detection and recognition system has proved to be remarkably successful. The use of the image processing AOI detection enables multiple potential vehicles in a single image to be classified and rejects most extraneous background information from the classification process. Additionally, the use of AOI's has allowed the classifier to be both scale invariant and independent of the AOI position within the input image. The success of the classification system has allowed some of the image processing constraints to be relaxed and hence has enabled an extra degree of robustness to be incorporated into the combined system.
The authors thank Lucas Automotive, Ford-Jaguar and Pilkington for their support of this PROMETHEUS research project.
Copyright (c) 1996 University of Southampton, June 1996.