(c) University of Southampton 1994
In this paper we describe a computer-based system for classifying archaeological objects. The system has two parts. In the first part, image analysis is used to extract representations of the artefacts from digitised images, obtained by using a high-resolution scanner on photographs of the artefacts. The representations form the basis of the classification, and in the second part, statistical techniques are used to cluster the representations into classes. The aim is to achieve a classification which is similar to the classifications obtained by a skilled archaeologist.
When the GHT is used to establish whether a target shape, extracted from image 1, say, is present in another image, image 2, an accumulator matrix is used to accumulate evidence for the possible centroid positions of the shape in image 2. The location of the highest peak in the accumulator gives the centroid of the most likely match, and the peak height relative to the number of edge segments in the shape from image 1 gives a measure of the similarity of the match. Thus, we can define a similarity measure, $S_{12}$, for the similarity of image 1 with image 2:

$$S_{12} = \frac{\max_{x,y} A_{12}(x, y)}{N_1} \qquad (1)$$

where $A_{12}(x, y)$ is the accumulator cell for candidate centroid $(x, y)$ when the shape from image 1 is sought in image 2, and $N_1$ is the number of edge points in that shape.

For a perfect match, each edge point in image 1 will produce one increment in the peak cell of the accumulator, making the fraction in Equation 1 equal to 1. Less perfect matches give smaller values. (In our experience, values greater than 0.4 indicate good matches.)

It should be noted that, unlike Euclidean distance measures, the similarity measure is, in general, asymmetric, i.e.

$$S_{12} \neq S_{21}.$$

The measure indicates the extent to which the shape in image 1 is found in image 2, which, except in the case of a perfect match, may well differ from the extent to which the shape in image 2 is found in image 1.
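The measure can be illustrated with a minimal, translation-only sketch of GHT voting. It assumes that edge points have already been extracted from the scanned images as integer coordinates plus a quantised gradient-direction bin, and it ignores rotation and scale. The function names are hypothetical, not the system's actual code. The example also shows the asymmetry: the shape from image 1 is wholly contained in image 2, so S_12 = 1, while only 12 of the 14 edge points of image 2 are found in image 1.

```python
from collections import Counter, defaultdict

# Assumption: an edge point is (x, y, d), with integer pixel coordinates x, y
# and an already-quantised gradient-direction bin d. Edge detection, rotation
# and scale are outside this sketch.


def build_r_table(shape):
    """Map each direction bin to the displacements from edge point to centroid."""
    n = len(shape)
    cx = round(sum(x for x, _, _ in shape) / n)
    cy = round(sum(y for _, y, _ in shape) / n)
    r_table = defaultdict(list)
    for x, y, d in shape:
        r_table[d].append((cx - x, cy - y))
    return r_table


def ght_similarity(shape1, image2_edges):
    """Similarity of shape 1 with image 2: the highest accumulator peak divided
    by the number of edge points in shape 1 (1.0 for a perfect match)."""
    r_table = build_r_table(shape1)
    accumulator = Counter()                      # sparse accumulator: (x, y) -> votes
    for x, y, d in image2_edges:
        for dx, dy in r_table.get(d, ()):
            accumulator[(x + dx, y + dy)] += 1   # vote for a candidate centroid
    peak = max(accumulator.values(), default=0)
    return peak / len(shape1)


# Usage: a 12-point square outline (one direction bin per side) ...
square = ([(x, 0, 0) for x in (1, 2, 3)] + [(4, y, 1) for y in (1, 2, 3)]
          + [(x, 4, 2) for x in (1, 2, 3)] + [(0, y, 3) for y in (1, 2, 3)])
# ... and an "image" holding the same square shifted by (10, 5) plus 2 extra edge points.
image = [(x + 10, y + 5, d) for x, y, d in square] + [(20, 20, 0), (21, 20, 0)]

s_12 = ght_similarity(square, image)   # 12 / 12: the whole square is found
s_21 = ght_similarity(image, square)   # 12 / 14: the extra points are not found
print(s_12, s_21)
```

With the 0.4 threshold quoted above, both values would count as good matches. A real implementation would vote into a discretised accumulator array rather than a dictionary.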
In order to develop a representation of the collection of artefacts to be classified, for each artefact, $i$, we calculated the set of similarities $\{S_{ij} : j = 1, \dots, n\}$, where $n$ was the total number of artefacts in the collection. This set of similarities was then treated as the set of observations characterising image $i$ within the set. It may be thought of as an $n$-dimensional feature vector representing artefact $i$. As a final step in the extraction of the artefact representation, the dimensionality of the feature space was reduced by a principal component analysis of the $n \times n$ data matrix.
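The construction of the data matrix and its reduction can be sketched with NumPy. Row i collects the similarities of artefact i with every artefact, and PCA is done here by a singular value decomposition of the column-centred matrix. The paper does not say how its PCA was computed, so the SVD route, the function names and the toy matrix are all assumptions for illustration.

```python
import numpy as np


def similarity_matrix(shapes, similarity):
    """Row i holds S_i1 ... S_in, i.e. how far shape i is found in each image.
    `similarity` is any pairwise measure (e.g. a GHT peak ratio); it need not
    be symmetric, so the matrix is generally not symmetric either."""
    n = len(shapes)
    S = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = similarity(shapes[i], shapes[j])
    return S


def pca_reduce(S, k):
    """Project the n-dimensional feature vectors (rows of S) onto their first
    k principal components, via the SVD of the column-centred matrix."""
    X = S - S.mean(axis=0)                     # centre each variable (column)
    _, sing, vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ vt[:k].T                      # n x k component scores
    variances = sing**2 / (S.shape[0] - 1)     # variance along every component
    return scores, variances


# Usage with a hypothetical 4 x 4 similarity matrix (note that S_ij != S_ji).
S = np.array([[1.0, 0.9, 0.2, 0.1],
              [0.8, 1.0, 0.3, 0.2],
              [0.1, 0.2, 1.0, 0.7],
              [0.2, 0.1, 0.9, 1.0]])
scores, variances = pca_reduce(S, k=2)
print(scores.shape, variances[:2])
```

The component variances come out in decreasing order and sum to the total variance of the original columns, which is what makes it reasonable to keep only the first few components.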
A matrix of Euclidean distances between the objects in variable space is calculated from the raw similarity data described above. The unsquared distances are then used with the Group Average method to produce the clustering described below.
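The clustering step can be sketched as agglomerative clustering with the group-average criterion (also known as UPGMA): at each step the two clusters with the smallest mean unsquared Euclidean distance between their members are fused. This is a naive O(n³) illustration with a hypothetical function name, not the package used in the study. It accepts any list of feature vectors, whether raw similarity rows or principal-component scores.

```python
from itertools import combinations
from math import dist


def group_average_clustering(vectors):
    """Agglomerative clustering with the group-average criterion.

    Returns the fusions in order as (cluster_a, cluster_b, height), where each
    cluster is a tuple of row indices and height is the average unsquared
    Euclidean distance between the members of the two clusters being joined.
    """
    d = [[dist(p, q) for q in vectors] for p in vectors]   # unsquared distances
    clusters = [(i,) for i in range(len(vectors))]
    fusions = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            avg = sum(d[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        height, a, b = best
        fusions.append((clusters[a], clusters[b], height))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return fusions


# Usage: two tight pairs of hypothetical 2-D feature vectors.
for fusion in group_average_clustering([(0, 0), (0, 1), (10, 0), (10, 1)]):
    print(fusion)
```

The two pairs fuse first at height 1.0. The final fusion joins them at the mean of the four cross-pair distances, which is what places it far up the dendrogram.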
A scree plot is made to examine the cluster fusions produced by the method. Marked 'elbows' in the plot indicate where a fusion joins two relatively dissimilar clusters, and hence that those clusters are likely to have some validity.
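The elbows in the paper are judged by eye. As a rough numerical stand-in, one can plot the fusion heights in order and flag the largest increases between successive fusions. The sketch below does this; the "largest jump" rule, the function names and the heights are hypothetical assumptions, not part of the original method.

```python
def scree_elbows(heights, top=1):
    """Indices of the `top` largest increases between successive fusion
    heights: a crude numerical stand-in for spotting 'elbows' by eye."""
    jumps = [(heights[i] - heights[i - 1], i) for i in range(1, len(heights))]
    jumps.sort(reverse=True)
    return [i for _, i in jumps[:top]]


def clusters_before(n_objects, fusion_index):
    """Number of clusters present just before the given (0-based) fusion."""
    return n_objects - fusion_index


def text_scree(heights, width=40):
    """Print a crude text scree plot: one bar per fusion."""
    biggest = max(heights)
    for i, h in enumerate(heights):
        print(f"{i:3d} {h:6.2f} " + "#" * round(width * h / biggest))


# Usage with hypothetical fusion heights for 8 objects (7 fusions).
heights = [0.5, 0.6, 0.7, 0.9, 3.2, 3.5, 8.0]
text_scree(heights)
for i in scree_elbows(heights, top=2):
    print(f"elbow at fusion {i}: {clusters_before(8, i)} clusters before it")
```

A large jump at fusion i means that fusion joins dissimilar clusters, so the partition just before it (n − i clusters) is the candidate grouping.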
The classification generated by the computer system was compared with two human classifications of the same pots. The first was made by an archaeologist on the basis of the pots' profiles (SJ Shennan, the 'SJS' classification); the second is the one used by the wholesaler (the 'P&P' classification).
In the first instance, we classified the pots within each of the two major categories, amphorae and pithoi, separately.
Code  Pot file  Cluster  SJS classification  P&P classification
1     pp231     A        2                   826
3     pp235     A        2                   826
8     pp310     A        2                   826
6     pp305     A        2                   826
11    pp315     A        2                   826
2     pp233     B        1                   826
7     pp308     B        1                   826
12    pp318     B        1                   826
13    pp320     B        1                   826
9     pp311     B        1                   826
5     pp304     B        1                   826
10    pp313     B        1                   826
4     pp302     B        1                   826
14    pp321     C        4                   826old
15    pp323     C        4                   826old
16    pp326     C        3                   826old
17    pp327     C        3                   826old
18    pp330     D        5                   836
19    pp331     D        5                   836
20    pp334     D        5                   836
21    pp336     D        5                   836
The clusters agree very well with the two manual classifications. Compared with the SJS classification, cluster A was equivalent to group 2, cluster B to group 1, cluster C to groups 3 and 4, and cluster D to group 5. It is interesting to note that although cluster C combines two of Shennan's groups, those groups are resolved by the clustering algorithm at a higher resolution. This means that, to the clustering method, the difference between these two groups is smaller than the differences between the other groups, whereas the human classification uses different levels of resolution simultaneously.
The P&P classification ostensibly divides these pots into two types: type 836 coincides with cluster D, and the other three clusters are all of type 826. However, further enquiry established that pots within certain types differed because they came from different batches (i.e. different firings). Interestingly, the most recent year's pots had come from two batches, which corresponded exactly with clusters A and B. The two-batch explanation was supported by the fact that pots in cluster A have a white surface colour, whereas those in cluster B are much redder, although the automated classification method does not yet take colour into account. The pots referred to as 'the old style', those from previous years, were those in cluster C (labelled 826old in the table). The fact that the old style could be split into two types, as Shennan did, was attributed to their 'being made on different days' by the same potter.
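The correspondences stated for the amphorae can be checked by cross-tabulating the table by cluster. The sketch below re-keys the table rows (pot file, cluster, SJS group, P&P type). The names `AMPHORAE` and `crosstab` are illustrative, and the data are copied directly from the amphora table.

```python
from collections import Counter, defaultdict

# Pot file, cluster, SJS group and P&P type, copied from the amphora table.
AMPHORAE = """\
pp231 A 2 826
pp235 A 2 826
pp310 A 2 826
pp305 A 2 826
pp315 A 2 826
pp233 B 1 826
pp308 B 1 826
pp318 B 1 826
pp320 B 1 826
pp311 B 1 826
pp304 B 1 826
pp313 B 1 826
pp302 B 1 826
pp321 C 4 826old
pp323 C 4 826old
pp326 C 3 826old
pp327 C 3 826old
pp330 D 5 836
pp331 D 5 836
pp334 D 5 836
pp336 D 5 836
"""


def crosstab(rows, column):
    """For each cluster, count the labels in one column (2 = SJS, 3 = P&P)."""
    table = defaultdict(Counter)
    for line in rows.splitlines():
        fields = line.split()
        table[fields[1]][fields[column]] += 1
    return {cluster: dict(counts) for cluster, counts in sorted(table.items())}


print(crosstab(AMPHORAE, 2))   # cluster -> SJS group counts
print(crosstab(AMPHORAE, 3))   # cluster -> P&P type counts
```

Every cluster maps onto a single SJS group except C, which holds groups 3 and 4. Every cluster also maps onto a single P&P label once the old-style pots are distinguished.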
Code  Pot file  Cluster  SJS classification  P&P classification
13    pp128     A        19 (iv)             984 phaestos
30    pp226     A        20 (iv)             944 beehive
1     pp102     A        15 (iv)             n/a 'big pithos'
14    pp130     B        17 (iv)             n/a
15    pp132     C        18 (iv)             844 Souda oil jar
12    pp126     D        8 (ii)              972 pitharaki
2     pp106     E        14 (iii)            847 pitharaki
4     pp109     E        14 (iii)            847 pitharaki
10    pp122     E        13 (iii)            901 koroni
21    pp209     F        11 (i)              915 koroni-micro
24    pp214     F        9 (i)               915 koroni-micro
26    pp218     F        9 (i)               915 koroni-micro
28    pp223     F        9 (i)               915 koroni-micro
25    pp217     F        9 (i)               915 koroni-micro
22    pp210     F        10 (i)              915 koroni-micro
27    pp221     F        9 (i)               915 koroni-micro
18    pp202     G        6 (ii)              907 koroni-micro
20    pp207     G        7 (ii)              915 koroni-micro
19    pp205     G        6 (ii)              907 koroni-micro
23    pp213     G        6 (ii)              915 koroni-micro
8     pp118     H        13 (iii)            972 pitharaki
17    pp201     H        12 (iii)            n/a 'small pithos'
16    pp134     H        16 (iv)             978 Minoan jar
5     pp112     K        13 (iii)            905 koroniotiko
6     pp114     K        13 (iii)            972 pitharaki
9     pp120     L        13 (iii)            972 pitharaki
11    pp124     L        13 (iii)            972 pitharaki
29    pp224     L        6 (ii)              915 koroni-micro
3     pp108     M        14 (iii)            847 pitharaki
7     pp115     M        13 (iii)            905 koroniotiko
n/a = not available
The pithoi have been grouped into 11 clusters. There are two main groups of clusters: A, B, C and D; and E, F, G, H, K, L and M. The first group has 4 small clusters containing 6 pots. The common feature of these is that they are unlike the other, larger group of pots!
Shennan split the pithoi into 10 related groups and 5 individuals, giving 15 classes. However, only 4 of these classes had more than a single member (6, 9, 13 and 14), and many of the remaining classes were "similar to but separate from" one of the bigger classes. For the purposes of this study, the Shennan classes were manually combined to give 4 groups (i to iv).
It is worth noting that the major division in the result separates group iv from the rest. This group contains those pithoi that Shennan described as individuals, unrelated to each other or to any of the other classes. On the whole, the correspondence between the automatic classification and Shennan's classes is good.
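The class counts and the split of the combined groups between the two main branches can be read off the pithoi table mechanically. The sketch below copies the table rows (pot file, cluster, Shennan class, combined group). It counts the multi-member Shennan classes and tallies the combined groups falling in branch A–D versus branch E–M. The names are illustrative.

```python
from collections import Counter

# Pot file, cluster, Shennan class and combined group (i-iv), from the pithoi table.
PITHOI = """\
pp128 A 19 iv
pp226 A 20 iv
pp102 A 15 iv
pp130 B 17 iv
pp132 C 18 iv
pp126 D 8 ii
pp106 E 14 iii
pp109 E 14 iii
pp122 E 13 iii
pp209 F 11 i
pp214 F 9 i
pp218 F 9 i
pp223 F 9 i
pp217 F 9 i
pp210 F 10 i
pp221 F 9 i
pp202 G 6 ii
pp207 G 7 ii
pp205 G 6 ii
pp213 G 6 ii
pp118 H 13 iii
pp201 H 12 iii
pp134 H 16 iv
pp112 K 13 iii
pp114 K 13 iii
pp120 L 13 iii
pp124 L 13 iii
pp224 L 6 ii
pp108 M 14 iii
pp115 M 13 iii"""

rows = [line.split() for line in PITHOI.splitlines()]

# Shennan classes with more than one member.
class_sizes = Counter(int(cls) for _, _, cls, _ in rows)
multi_member = {cls: k for cls, k in sorted(class_sizes.items()) if k > 1}


def branch(cluster):
    """Assign a cluster letter to one of the two main branches of the dendrogram."""
    return "A-D" if cluster in ("A", "B", "C", "D") else "E-M"


# Count the combined groups found in each branch.
by_branch = Counter((branch(cluster), group) for _, cluster, _, group in rows)

print(len(rows), multi_member)
for (br, group), count in sorted(by_branch.items()):
    print(br, group, count)
```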
A similar result is obtained when the clusters are compared to the P&P classification. As with Shennan's classification, there are a few anomalies but in general the correspondence is good.
As can be observed in Figure 1, the amphorae and pithoi classes were successfully and distinctly separated by the method. The two halves of the dendrogram exhibit a roughly similar structure to the dendrograms created separately. The differences, which were at a low (detailed) level, are probably due to faint similarities between the shapes of the pithoi and amphorae.
Cluster analysis has been used before in archaeology to classify artefacts. When classifying according to shape, these approaches have used a set of physical measurements of the pots, such as height and width, as input to the cluster analysis. Such variables describe 'intrinsic' attributes, that is, properties of the pot itself.
The present method differs from previous work both in the automated nature of the image analysis and in considering how each pot relates to each of the others in the collection. Under this approach, similar pots are those whose relationships to the other pots are similar, not merely those which are similar to each other.
The comparison of the automatic classifications with the human classifications has shown remarkably similar results. The minor differences show that the level of resolution at which humans distinguish classes is not consistent. This is most marked in the amphora classification. Shennan's groups 3 and 4 appear very similar to the machine at the resolution that separates the main clusters, but the human distinguishes them easily. The machine does distinguish between these two classes eventually, but only at a much finer resolution. This is because the human homes in on the marked difference between the rim shapes of the two types. As the rim is only a small proportion of the overall shape, this difference carries less weight in the machine method, which uses the whole of the pot's shape. This shows both the power and the weakness of the human mind for the classification task: it can work simultaneously on several levels (and of course uses more information than shape alone), but those levels are personal and shifting.