Classification of Archaeological Artefacts using Shape

P. Durham, P. H. Lewis and S. J. Shennan

(c) University of Southampton 1994


Contents

  1. Introduction
  2. Artefact representation
  3. Clustering
  4. Test data
  5. Results and comparison with human classifications
    1. Amphorae
    2. Pithoi
    3. All pots
  6. Conclusions
  7. References

1. Introduction

The classification of archaeological artefacts involves the arrangement of artefacts into meaningful groups perhaps on the basis of time period, country of origin or the individual craftsmen responsible for the manufacture of the artefact. Traditionally the process of classification has been very time consuming and at least partially subjective, relying on the skills of individual archaeologists to identify features of the artefacts which are the basis of the grouping criteria.

In this paper we describe a computer based system for classifying archaeological objects. The system has two parts. In the first part, image analysis is used to extract representations of the artefacts from digitised images obtained using a high resolution scanner on photographs of the artefacts. The representations form the basis of the classification, and in the second part, statistical techniques are used to cluster the representations into classes. The aim is to achieve a classification which is similar to the classifications obtained by a skilled archaeologist.

2. Artefact representation

In a previous paper [1] we have described the use of the Generalised Hough Transform (GHT) [2] for comparing two images of artefacts in the SMART system (System for Matching ARTefacts).

When the GHT is used to establish whether a target shape, extracted from image 1 say, is present in another image, image 2, an accumulator matrix is used to accumulate evidence for the possible centroid position of the shape in image 2. The location of the highest peak in the accumulator gives the centroid of the most likely match, and the peak height relative to the number of edge segments in the shape from image 1 gives a measure of the similarity of the match. Thus, we can define a similarity measure, $S_{12}$, for the similarity of image 1 with image 2:

$$S_{12} = \frac{\text{height of the highest accumulator peak when the shape from image 1 is sought in image 2}}{\text{number of edge points in the shape from image 1}} \qquad (1)$$

For a perfect match, each edge point in image 1 will produce one increment in the peak cell of the accumulator, making the fraction in Equation 1 equal to 1. Less perfect matches give smaller values. In our experience, values greater than 0.4 indicate good matches.

It should be noted that, unlike Euclidean distance measures, the similarity measure is, in general, asymmetrical, i.e. the similarity of image 1 with image 2, $S_{12}$, is not necessarily equal to the similarity of image 2 with image 1, $S_{21}$. The measure indicates the extent to which the shape in image 1 is found in image 2, which, with the exception of a perfect match, may well differ from the extent to which the shape in 2 is found in 1.
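The measure and its asymmetry can be illustrated with a minimal sketch. It assumes a simplified, translation-only voting scheme over edge-point coordinates and omits the edge-orientation indexing of the full GHT [2]. The function name `ght_similarity` and the toy point sets are hypothetical and are not taken from the SMART system.

```python
from collections import Counter

def ght_similarity(shape_points, image_points):
    """Peak accumulator height divided by the number of shape edge points.

    Simplified, translation-only voting (an assumption of this sketch):
    every image edge point votes for each centroid position that some
    shape edge point would imply.
    """
    shape = set(shape_points)
    image = set(image_points)
    n = len(shape)
    # Reference point: the shape centroid, rounded so votes fall on integer cells.
    cx = round(sum(x for x, _ in shape) / n)
    cy = round(sum(y for _, y in shape) / n)
    accumulator = Counter()
    for px, py in image:
        for sx, sy in shape:
            # Centroid implied if image point (px, py) were shape point (sx, sy).
            accumulator[(px + cx - sx, py + cy - sy)] += 1
    peak = max(accumulator.values(), default=0)
    return peak / n

square = [(2, 2), (2, 3), (3, 2), (3, 3)]
# The same square shifted by (3, 3), plus two unrelated edge points.
cluttered = [(5, 5), (5, 6), (6, 5), (6, 6), (8, 8), (9, 1)]

s12 = ght_similarity(square, cluttered)  # the square is fully present
s21 = ght_similarity(cluttered, square)  # only 4 of the 6 points are found
print(s12, s21)
```

In this toy case the square is found completely in the cluttered image, while only four of the six cluttered points can be matched in the square, so the two directions give different values.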

In order to develop a representation of the collection of artefacts to be classified, for each artefact, $i$, we calculated the set of similarities

$$\{ S_{ij} : j = 1, \dots, n \}$$

where $S_{ij}$ is the similarity of image $i$ with image $j$ and $n$ was the total number of artefacts in the collection. This set of similarities was then treated as the set of observations characterising image $i$ within the set. It may be thought of as an $n$-dimensional feature vector representing artefact $i$. As a final step in the extraction of the artefact representation, the dimensionality of the feature space was reduced by a principal component analysis of the data matrix.
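The construction of the feature vectors and the projection onto a principal component can be sketched as follows. This is an illustration under stated assumptions: the similarity values are invented, the helper names are hypothetical, and only the first principal component is extracted, by power iteration on the covariance matrix. The paper does not say how its principal component analysis was computed.

```python
import math

def similarity_matrix(artefacts, similarity):
    """Row i is the feature vector of artefact i: its similarity to every artefact."""
    return [[similarity(a, b) for b in artefacts] for a in artefacts]

def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the covariance matrix and the scores along it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[j] * c[k] for c in centred) / (n - 1) for k in range(d)]
           for j in range(d)]
    # Deterministic, non-uniform start vector for the power iteration.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[j][k] * v[k] for k in range(d)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance at all in the data
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores

# Hypothetical, asymmetric similarities for four artefacts of two types.
toy = [
    [1.0, 0.8, 0.2, 0.1],
    [0.7, 1.0, 0.1, 0.2],
    [0.2, 0.1, 1.0, 0.9],
    [0.1, 0.2, 0.8, 1.0],
]
rows = similarity_matrix(range(4), lambda i, j: toy[i][j])
component, scores = first_principal_component(rows)
print([round(s, 3) for s in scores])
```

With these invented values, artefacts 0 and 1 receive scores of one sign and artefacts 2 and 3 scores of the other, which is the kind of separation a scatterplot of the leading components would be expected to reveal.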

3. Clustering

Scatterplots for the first two principal components are examined to determine whether there is sufficient indication of clustering to warrant proceeding to the cluster analysis stage. The clustering method chosen was the hierarchical agglomerative method, partly because it facilitates the extraction of groupings and sub-groupings and partly because the method is well known and its strengths and weaknesses are well understood.

A matrix of Euclidean distances between the objects in variable space is calculated from the raw similarity data described above. The unsquared distances are then used with the Group Average method to produce the clustering described below.
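The two steps just described, unsquared Euclidean distances followed by Group Average agglomeration, can be sketched in a few lines. The code below is a naive illustration, not the software used in the study; the function names and the four toy similarity vectors are hypothetical.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Unsquared Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def group_average_linkage(vectors):
    """Hierarchical agglomerative clustering with the Group Average method.

    Returns the fusions in order as (members_a, members_b, height).
    """
    indices = range(len(vectors))
    dist = {(i, j): euclidean(vectors[i], vectors[j])
            for i, j in combinations(indices, 2)}

    def average_distance(a, b):
        # Mean of all pairwise distances between members of a and members of b.
        total = sum(dist[min(i, j), max(i, j)] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(i,) for i in indices]
    fusions = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(*pair))
        fusions.append((a, b, average_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return fusions

# Hypothetical similarity vectors for four artefacts of two types.
vectors = [
    [1.0, 0.8, 0.2, 0.1],
    [0.7, 1.0, 0.1, 0.2],
    [0.2, 0.1, 1.0, 0.9],
    [0.1, 0.2, 0.8, 1.0],
]
fusions = group_average_linkage(vectors)
for a, b, height in fusions:
    print(a, b, round(height, 3))
```

The sequence of fusions and their heights is the information a dendrogram displays. The pairwise search is adequate for a few dozen objects but would be slow for large collections.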

A scree plot is made to examine the cluster fusions produced by the method. Marked 'elbows' in the plot indicate where a fusion joins two relatively dissimilar clusters, and hence that those clusters are likely to have some validity.
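A numerical stand-in for reading elbows off a scree plot is to look for the largest rise in fusion height between consecutive fusions. The sketch below assumes that simple criterion, which is only a rough proxy for visual inspection; the function name and the fusion heights are hypothetical.

```python
def largest_elbow(heights):
    """Return (fusion index, rise) for the fusion with the largest rise in height.

    `heights` lists the fusion heights in the order the fusions were made.
    """
    jumps = [later - earlier for earlier, later in zip(heights, heights[1:])]
    k = max(range(len(jumps)), key=jumps.__getitem__)
    return k + 1, jumps[k]

# Hypothetical fusion heights for four objects (three fusions).
heights = [0.26, 0.39, 1.55]
fusion_index, jump = largest_elbow(heights)

# With n objects, n - i clusters exist just before fusion i (counting from 0).
clusters_before_fusion = len(heights) + 1 - fusion_index
print(fusion_index, round(jump, 2), clusters_before_fusion)
```

Here the last fusion joins two clusters that are much further apart than any earlier pair, which suggests stopping with the two clusters that exist before it.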

4. Test data

In order to test the method we used a set of known objects: a collection of Cretan pots. These had several advantages. A British wholesaler, 'Pots and Pithoi' (P&P), has a large number of these pots in stock, and we were able to choose several different types of greater or lesser similarity, and to get several examples of each type. This is important as it enables the variation between pots of the same type to be isolated from the variation between types. The pots are made by hand and there may be significant variation within a given type.

The classification generated by the computer system was compared to two human classifications of the same pots. The first of these was made by an archaeologist on the basis of the pots' profiles (SJ Shennan - the 'SJS' classification), the second is that used by the wholesaler (the 'P&P' classification).

5. Results and comparison with human classifications

Figure 1: Dendrogram showing relationships of pithoi (left) and amphorae (right)

The pots chosen from 'Pots & Pithoi' fell into two broad categories: amphorae and pithoi. 'Amphora' is a technical name for a type of small jug with no spout and two large handles. 'Pithos' is a technical name for a large, wide-mouthed storage jar.

In the first instance, we tried to classify the pots within the two major categories separately.

Code   Pot file    Cluster    SJS classification    P&P classification
  1      pp231        A                2                    826
  3      pp235        A                2                    826
  8      pp310        A                2                    826
  6      pp305        A                2                    826
 11      pp315        A                2                    826
  2      pp233        B                1                    826
  7      pp308        B                1                    826
 12      pp318        B                1                    826
 13      pp320        B                1                    826
  9      pp311        B                1                    826
  5      pp304        B                1                    826
 10      pp313        B                1                    826
  4      pp302        B                1                    826
 14      pp321        C                4                   826old
 15      pp323        C                4                   826old
 16      pp326        C                3                   826old
 17      pp327        C                3                   826old
 18      pp330        D                5                    836
 19      pp331        D                5                    836
 20      pp334        D                5                    836
 21      pp336        D                5                    836

Table 1: Amphora classifications

5.1. Amphorae

A scatterplot of the first two principal components suggested that the amphorae lay in several discrete groups. The cluster analysis put these pots into four clusters, as shown in Table 1. Cluster A has five members, which all have a squared-off rim and a tall, thin body. Cluster B has eight members. These are very similar to those in cluster A, but the rim is sloping and simpler. Cluster C has four members. They have squatter bodies and heavier handles than the previous two groups. Cluster D has four members. These are generally rounder than the others, and have a rounded base where the others have flat bases.

The clusters agree very well with the two manual classifications. Compared to the SJS classification, cluster A was equivalent to group 2, cluster B to group 1, cluster C to groups 3 and 4, and cluster D to group 5. It is interesting to note that although cluster C combines two of Shennan's groups, those groups are resolved by the clustering algorithm at a higher resolution. This means that the difference between these two groups is smaller than between the other groups, but the human classification uses different levels of resolution simultaneously.
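The correspondence between the clusters and the SJS groups can be checked mechanically by cross-tabulating the two columns of Table 1. The sketch below transcribes the (cluster, SJS group) pairs from that table; the variable names are hypothetical.

```python
from collections import Counter

# (cluster, SJS group) for the 21 amphorae, transcribed from Table 1.
table1 = (
    [("A", 2)] * 5
    + [("B", 1)] * 8
    + [("C", 4)] * 2 + [("C", 3)] * 2
    + [("D", 5)] * 4
)

# Number of pots for each (cluster, SJS group) combination.
crosstab = Counter(table1)
for (cluster, group), count in sorted(crosstab.items()):
    print(f"cluster {cluster}  SJS group {group}  pots {count}")

# SJS groups that occur within each cluster.
groups_per_cluster = {}
for cluster, group in crosstab:
    groups_per_cluster.setdefault(cluster, set()).add(group)
print(groups_per_cluster)
```

Every cluster maps to a single SJS group except cluster C, which contains groups 3 and 4.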

The P&P classification ostensibly divides these pots into two types. Type 836 coincides with cluster D, with the other three clusters all being of type 826. However, further enquiry established that pots within certain types were different because they came from different batches (i.e. different firings). Interestingly, the most recent year's pots had come from two batches which corresponded exactly with clusters A and B. The two-batch explanation is supported by the fact that pots in cluster A have a white surface colour, whereas those in cluster B are much redder, although the automated classification method does not yet take colour into account. The pots referred to as 'the old style', those from previous years, were those in cluster C. (The fact that the old style could be split into two types, as Shennan did, was attributed to their 'being made on different days' by the same potter.)

Code   Pot file    Cluster    SJS classification    P&P classification
 13      pp128        A           19    (iv)         984   phaestos
 30      pp226        A           20    (iv)         944   beehive
  1      pp102        A           15    (iv)         n/a   'big pithos'
 14      pp130        B           17    (iv)         n/a          
 15      pp132        C           18    (iv)         844   Souda oil jar
 12      pp126        D            8    (ii)         972   pitharaki
  2      pp106        E           14   (iii)         847   pitharaki
  4      pp109        E           14   (iii)         847   pitharaki
 10      pp122        E           13   (iii)         901   koroni
 21      pp209        F           11     (i)         915   koroni-micro
 24      pp214        F            9     (i)         915   koroni-micro
 26      pp218        F            9     (i)         915   koroni-micro
 28      pp223        F            9     (i)         915   koroni-micro
 25      pp217        F            9     (i)         915   koroni-micro
 22      pp210        F           10     (i)         915   koroni-micro
 27      pp221        F            9     (i)         915   koroni-micro
 18      pp202        G            6    (ii)         907   koroni-micro
 20      pp207        G            7    (ii)         915   koroni-micro
 19      pp205        G            6    (ii)         907   koroni-micro
 23      pp213        G            6    (ii)         915   koroni-micro
  8      pp118        H           13   (iii)         972   pitharaki
 17      pp201        H           12   (iii)         n/a   'small pithos'
 16      pp134        H           16    (iv)         978   Minoan jar
  5      pp112        K           13   (iii)         905   koroniotiko
  6      pp114        K           13   (iii)         972   pitharaki
  9      pp120        L           13   (iii)         972   pitharaki
 11      pp124        L           13   (iii)         972   pitharaki
 29      pp224        L            6    (ii)         915   koroni-micro
  3      pp108        M           14   (iii)         847   pitharaki
  7      pp115        M           13   (iii)         905   koroniotiko

                                                     n/a = not available

Table 2: Pithos classifications

5.2. Pithoi

A scatterplot of the first two principal components did not show such marked clustering for the pithoi as for the amphorae, but still suggested that clustering was present. It can be seen from Table 2 that these pots do not fall easily into a few large, distinct clusters as the amphorae did. The same is true of the manual classification, in which the assignment to classes was more cautious for the pithoi than for the amphorae.

The pithoi have been grouped into 11 clusters. There are two main groups of clusters: A, B, C and D; and E, F, G, H, K, L and M. The first group has 4 small clusters containing 6 pots. The common feature of these is that they are unlike the other, larger group of pots.

Shennan split the pithoi into 10 related groups and 5 individuals, giving 15 classes. However, only 4 of these classes had more than a single member (9, 6, 13, 14), and many of these were "similar to but separate from" one of the bigger classes. For the purposes of this study, the Shennan classes were manually combined to give 4 groups (i to iv).

It is worth noting that the major division in the result is to separate group iv from the rest. This group contains those pithoi that Shennan described as individuals, unrelated to each other or any of the other classes. On the whole the correspondence between the automatic classification and Shennan's classes is good.
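One way to put a number on this correspondence is to count how many pots belong to the most common combined group within their own cluster. The sketch below transcribes the (cluster, combined group) pairs from Table 2; the majority-group criterion is an assumption chosen for illustration and is not a measure used in the study.

```python
from collections import Counter

# (cluster, combined SJS group) for the 30 pithoi, transcribed from Table 2.
table2 = (
    [("A", "iv")] * 3 + [("B", "iv")] + [("C", "iv")] + [("D", "ii")]
    + [("E", "iii")] * 3 + [("F", "i")] * 7 + [("G", "ii")] * 4
    + [("H", "iii")] * 2 + [("H", "iv")]
    + [("K", "iii")] * 2
    + [("L", "iii")] * 2 + [("L", "ii")]
    + [("M", "iii")] * 2
)

# Count the combined groups that occur within each cluster.
by_cluster = {}
for cluster, group in table2:
    by_cluster.setdefault(cluster, Counter())[group] += 1

# Pots whose group is the most common one within their cluster.
agreeing = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
# Clusters that contain pots from more than one combined group.
mixed = sorted(c for c, counts in by_cluster.items() if len(counts) > 1)

print(f"{agreeing} of {len(table2)} pots share their cluster's majority group")
print("clusters mixing groups:", mixed)
```

On this criterion the only clusters that mix combined groups are H and L, each of which contains a single pot from a different group.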

A similar result is obtained when the clusters are compared to the P&P classification. As with Shennan's classification, there are a few anomalies but in general the correspondence is good.

5.3. All pots

The same analysis was repeated using all 51 of the pots together in order to establish whether the method would separate out the two major types and whether the allocation to clusters would be the same as when the major types were considered separately.

As can be observed in Figure 1, the amphorae and pithoi classes were successfully and distinctly separated by the method. The two halves of the dendrogram exhibit a roughly similar structure to the dendrograms created separately. The differences, which were at a low (detailed) level, are probably due to faint similarities between the shapes of the pithoi and amphorae.

6. Conclusions

We have presented a novel method for automatic artefact classification using image analysis techniques to extract the initial information for the cluster analysis.

Cluster analysis has been used before in archaeology to classify artefacts and when concerned with classifying according to shape, these approaches have used a set of physical measurements of the pots such as height and width as input to the cluster analysis. Such variables describe 'intrinsic' attributes, that is properties of the pot itself.

The present method differs from previous work both in the automated nature of the image analysis and also by considering how a pot relates to each of the others in the collection. Under this definition, similar pots are those whose relationships to other pots are similar, not merely those which are similar to each other.

The comparison of the automatic classifications with human classifications has shown remarkably similar results. The minor differences show that the level of resolution between classes distinguished by humans is not consistent. This is most marked in the amphora classification. Shennan's groups 3 and 4 appear very similar to the machine at the resolution that separates the main clusters, but the human distinguishes them easily. The machine does distinguish between these two classes eventually, but at a much more sensitive resolution. This is because the human homes in on the marked difference between the rim shapes of the two types. As the rim is only a small proportion of the overall shape, this difference is not as important to the machine method, which uses the whole of the pot's shape. This shows both the power and the weakness of the human mind for the classification task: it can work simultaneously on several levels (and of course uses more information than shape alone), but those levels are personal and shifting.

7. References

  1. P. Durham, P. Lewis and S. Shennan, "Artefact matching and retrieval using the generalised Hough transform," Dept of Electronics and Computer Science: 1993 Research Journal, ed. Adrian Pickering, University of Southampton, pp. 57-59, 1993.

  2. D. Ballard, "Generalising the Hough transform to detect arbitrary shapes," Pattern Recognition, vol. 13, no. 2, pp. 111-122, 1981.