Chapter 5: Seriation Methods.

Permutation of Similarity or Data Matrices.

The permutation of the similarity matrix, exchanging rows and columns in an attempt to bring the largest values of similarity closest to the diagonal, is one of the earliest methods in this area. In archaeological applications, it was described in the pair of papers by Brainerd and Robinson which appeared in ‘American Antiquity’ in 1951. Although the idea is simple, programs written for this method suffered from a number of problems; a sketch of the basic exchange step is given after the list of problems below.

1. TIME. For a large matrix, the time taken to compare rows and columns, and to keep scanning through and exchanging them until no further improvement was possible, was very large. This could be overcome to some extent by leaving the program running over the weekend, or by reading in the data in some arbitrary order and doing a few hundred iterations before examining the result for any improvement.

2. LOOPING. For some configurations of the data, it would get stuck in a loop instead of converging to an optimum. An example of this could be:
exchange objects 23 and 64.
exchange objects 64 and 92.
exchange objects 92 and 23.

This loop of three objects would continue indefinitely, since at each step the exchange gives an apparent improvement in the configuration. Such loops are very difficult to test for: one never knows in advance how many objects will form the loop, and since the program keeps no record of what has gone before, it will not notice the loop without explicit tests.

3. FALSE OPTIMA. Even in cases where no looping occurred and the method converged to an optimum, there was no way of predicting whether this was the overall global optimum (which was wanted) or a local optimum which satisfied the criteria locally but could be improved if the method started from a different arrangement of the data. The solution suggested for this problem was to run the program several times, starting from a different ordering of the objects on each occasion. This did nothing to ease the problem of the time required to run the method even once.
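To make the basic procedure concrete, here is a minimal sketch in Python with the numpy library. The score used (the sum of similarities weighted by their distance from the diagonal) and the names concentration and seriate_by_exchange are illustrative assumptions, not a reconstruction of the historical programs.

import numpy as np

def concentration(S, order):
    # Score an ordering: large similarities close to the diagonal should give
    # a small weighted sum of |i - j| * similarity.
    P = S[np.ix_(order, order)]
    idx = np.arange(len(order))
    return np.sum(np.abs(idx[:, None] - idx[None, :]) * P)   # smaller is better

def seriate_by_exchange(S, max_passes=100):
    t = S.shape[0]
    order = list(range(t))
    best = concentration(S, order)
    for _ in range(max_passes):
        improved = False
        for i in range(t - 1):
            for j in range(i + 1, t):
                trial = order[:]
                trial[i], trial[j] = trial[j], trial[i]     # exchange two objects
                score = concentration(S, trial)
                if score < best:                            # keep only strict improvements
                    order, best = trial, score
                    improved = True
        if not improved:                                    # converged, possibly to a local optimum
            break
    return order, best

# Example with a small random symmetric similarity matrix.
rng = np.random.default_rng(0)
A = rng.random((6, 6))
print(seriate_by_exchange((A + A.T) / 2))

Because every accepted exchange must strictly improve a single global score, this version cannot cycle in the way described under LOOPING (the early programs compared rows and columns locally, which is why they could); it can, however, still stop at a local optimum, so in practice it would be restarted from several different initial orderings.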

This method was used for a number of archaeological applications. The famous study by David Clarke of beaker pottery, described in several places including his book “Analytical Archaeology”, used this method and presented the results as a shaded similarity matrix with different symbols indicating different similarity levels. By interpreting this matrix, he was able to suggest a taxonomy for the different types of beaker pottery made by the Beaker Folk.

The other method, permutation of the data matrix, was applied to binary (presence/absence) data. It started as a manual method with the attributes listed along the top of a notice-board and each object represented by a strip of paper with present attributes coloured in black and absent ones left white. The strips of paper were then pinned to the board and re-arranged to try to get the black areas as few and as large as possible. The resulting chart, often referred to as “battleships”, gave a type-frequency diagram optimised to place the most similar objects next to one another.

This approach worked well for amounts of data small enough to be pinned on a single board, but ran into serious problems once it was applied to much larger sets.
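One simple way to mechanise this re-arrangement is to order the objects by the average position of the attributes present in each row, so that rows with similar patterns of presences fall next to one another. The sketch below, again in Python with numpy, is only one of many possible heuristics and is not the manual pin-board procedure itself; the name battleship_order is illustrative.

import numpy as np

def battleship_order(X):
    # X is a 0/1 matrix, one row per object and one column per attribute.
    cols = np.arange(X.shape[1])
    totals = X.sum(axis=1)
    # Mean column position of the present attributes in each row.
    centres = np.where(totals > 0, (X * cols).sum(axis=1) / np.maximum(totals, 1), 0.0)
    return np.argsort(centres)

X = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1]])
print(X[battleship_order(X)])   # rows re-arranged so the blocks of 1s line up along the diagonal

In a fuller version the same ordering step could be applied alternately to the rows and to the columns until neither changes.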

Principal Component Analysis.

This only works for numeric data. It assumes the n attributes are measurements in attribute space along mutually perpendicular axes and finds expressions for up to n new attributes, each a linear combination of the original ones, chosen so that the variance in the data is maximised along the new axes taken in turn. The following paragraphs give a recipe for calculating these new coordinates.

1. Assume we have a data matrix X of t objects (rows) and n attributes (columns). Assume all attributes have been scaled to have a mean value of zero and a variance of 1. This requires the following expressions; a short numerical sketch of the whole recipe is given after step 3 below.

For attribute J, with values X(I,J) for objects I = 1 to t, the mean value, XM(J), is given by the equation

XM(J) = (1/t) SUM_I { X(I,J) }

and the variance is given by the equation

V(J) = (1/(t-1)) SUM_I { X(I,J)^2 - XM(J)^2 }

and so replacing all the values of X(I,J) by X(I,J) - XM(J) will give new values of this attribute with the origin shifted to give a mean value of zero, and dividing these shifted values by the standard deviation, the square root of V(J), will scale them to have a variance of 1.

2. Calculate the matrix R = (1/(t-1)) X^T . X , which is a symmetric n x n matrix of product-moment correlation coefficients.

3. Calculate the eigenvalues and eigenvectors of R. Normalise the eigenvectors so that v{i} . v{i}^T = 1.

There are many subroutines, for example in the NAG library, which will calculate eigenvalues and eigenvectors for you. If the eigenvectors are supplied according to some other normalisation rule, it is easy to scale them as required.
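The whole recipe takes only a few lines in practice. The following sketch uses Python with numpy rather than a NAG routine; the function name principal_components is illustrative, and eigh is numpy's eigen-solver for symmetric matrices, which already returns unit-length eigenvectors.

import numpy as np

def principal_components(X):
    # X has t rows (objects) and n columns (attributes).
    t = X.shape[0]
    # Step 1: scale every attribute to mean 0 and variance 1.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: the n x n correlation matrix R = (1/(t-1)) Xs^T Xs.
    R = Xs.T @ Xs / (t - 1)
    # Step 3: eigenvalues and eigenvectors of R, re-ordered so that the
    # largest eigenvalue (and hence the largest variance) comes first.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order], Xs

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))                 # a small made-up data matrix
eigvals, eigvecs, Xs = principal_components(X)
scores = Xs @ eigvecs                        # coordinates of every object along the principal axes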

The RANK of the matrix R, denoted by k, is the number of non-zero eigenvalues and also gives the number of independent dimensions in attribute space. Thus if the original n attributes were not all independent, this will become obvious from the rank of the matrix R. The size of the eigenvalue gives an estimate of the variance along its principal axis, and it is usual to include enough dimensions to account for 75% of the variance.
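For example, given the eigenvalues from such a decomposition (the values below are made up for illustration), the rank and the number of axes needed to reach 75% of the variance can be read off directly:

import numpy as np

eigvals = np.array([2.1, 0.6, 0.2, 0.1, 0.0])        # illustrative eigenvalues
rank = int(np.sum(eigvals > 1e-10))                  # number of non-zero eigenvalues
explained = eigvals / eigvals.sum()                  # proportion of variance per axis
k = int(np.searchsorted(np.cumsum(explained), 0.75)) + 1
print(rank, k)                                       # rank 4; 2 axes give at least 75%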

The eigenvectors give a set of n mutually perpendicular directions in attribute space, which are called the “principal axes”. Distances along these axes are calculated from the original attributes in the following way. If the eigenvector v{1} has the components (0.5,0.4,0.8), then the object with attribute values (0.1,0.2,0.3) will have the value along this axis of
d1 = 0.1*0.5 + 0.2*0.4 + 0.3*0.8
and similarly, any other object with attribute values (a1, a2, a3) will have
d1 = a1*0.5 + a2*0.4 + a3*0.8
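This is simply the scalar product of the two vectors, and the worked example above can be checked directly; the numbers below are exactly those in the text.

import numpy as np

axis = np.array([0.5, 0.4, 0.8])     # components of the eigenvector for this axis
obj = np.array([0.1, 0.2, 0.3])      # attribute values of one object
print(obj @ axis)                    # 0.1*0.5 + 0.2*0.4 + 0.3*0.8 = 0.37 (up to rounding)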

The eigenvectors corresponding to the three largest eigenvalues are called the “main axes” and it is often useful to plot out the data in terms of these main axes and look at the distribution obtained. Sometimes this shows definite clustering, and so hints that some form of cluster analysis is the obvious next step. On other occasions the data will appear as one single group, indicating that cluster analysis is not appropriate.

You will now appreciate that the two-dimensional data supplied for your exercise could well be data from a more general set of attributes which have been recalculated in terms of the first two principal axes. Thus the results from your exercise are applicable to many sets of numeric data.

Principal Coordinate Analysis.

This is applied to a distance matrix D and the resulting eigenvectors each give the coordinates of one object along the principal axes. If all attributes are numeric and the distances are calculated using a Euclidean metric, then this method gives the same results as the Principal Component Analysis. However this method may also be applied to distance matrices obtained from mixed attributes and will give reasonable results in many cases. If the distances are not appropriate for this method, this will be indicated by the eigenvalues, some of which will become large and negative.

The matrix D contains values of d{ij}, the distance between object i and object j. No normalisation is specified at this stage.

1. Calculate the t x t matrix E where e{ij} = -(1/2) d{ij}^2

2. Calculate the t x t matrix F where

f{ij} = e{ij} - (1/t) S_i e{ij} - (1/t) S_j e{ij} + (1/t^2) S_i S_j e{ij}

where S_i denotes the sum over the first subscript, S_j the sum over the second, and the final term is the grand mean of all the e{ij}.

3. Calculate the eigenvalues and eigenvectors of F. If the eigenvectors are normalised so that
v{i} . v{i}^T = l{i}, where l{i} is the corresponding eigenvalue,
then the components of the normalised eigenvectors give the coordinates of the objects along the principal axes. Although the calculation of eigenvalues and eigenvectors can usually be left to a library routine, so you do not need to study methods for this problem yourself, it is still a lengthy and time-consuming process. The accuracy of the results is not guaranteed, but depends on the condition of the problem.
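A minimal sketch of these three steps, under the same assumptions as the earlier examples (Python with numpy, illustrative function name), is given below. Negative eigenvalues, the warning sign mentioned above, are simply dropped.

import numpy as np

def principal_coordinates(D, k=2):
    # D is the t x t matrix of distances d_ij between the objects.
    t = D.shape[0]
    E = -0.5 * D ** 2                         # step 1: e_ij = -(1/2) d_ij^2
    J = np.eye(t) - np.ones((t, t)) / t
    F = J @ E @ J                             # step 2: double-centring of E
    eigvals, eigvecs = np.linalg.eigh(F)      # step 3
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals[:k] > 0                    # drop axes with negative eigenvalues
    # Scale each eigenvector so that v . v^T equals its eigenvalue; its
    # components are then the coordinates of the objects along that axis.
    return eigvecs[:, :k][:, keep] * np.sqrt(eigvals[:k][keep])

# Example: three points recovered from their distances, up to rotation,
# reflection and a shift of origin.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
print(principal_coordinates(D))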

Multiple Factor Analysis.

This is applied to the data matrix and frequently follows a Principal Component Analysis. In an attempt to clarify the structure of the data, the axes are rotated and need not remain mutually perpendicular, so they are no longer independent of each other. This method can become very subjective.

Multi-dimensional Scaling.

These methods are applied to the t x t distance matrix D of distances D{ij}, and the non-metric methods may be applied to any dissimilarity matrix. The method applies a re-scaling transformation to the data to calculate distances d{ij} (corresponding to the original D{ij}) which belong to an attribute space of lower dimension. While different methods use different transformations, in each case the ORDER of the objects relative to one another must not be altered, although the actual distances will be. “METRIC SCALING” uses a linear transformation function and “NON-METRIC SCALING” uses a monotonic transformation function. Non-metric scaling only preserves the order of the objects, and there is no analytic solution to this problem, so some iterative method must be used.

The method also calculates the disparity for each d{ij} in the matrix and uses this to calculate the stress of the configuration at each iteration. Values of distance are obtained which minimise the stress at each iteration, and the method continues to reduce the number of dimensions as long as the stress remains less than some specified tolerance. If it is possible to reduce the problem to two dimensions, then the data may be plotted out and its structure studied.
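The full non-metric algorithm is too long for a short example, but the sketch below shows the simplest metric case: starting from random two-dimensional coordinates, it adjusts them by gradient descent so that the configuration distances d{ij} approach the target distances D{ij}, using the raw stress to monitor convergence. It is a toy illustration under those assumptions (Python with numpy, illustrative names), not Kruskal's non-metric method, which would replace the target distances by monotonically regressed disparities at every iteration and normalise the stress.

import numpy as np

def raw_stress(X, D):
    # Sum over pairs of (configuration distance - target distance)^2.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    resid = d - D
    np.fill_diagonal(resid, 0.0)
    return np.sum(resid ** 2) / 2

def metric_scaling(D, dims=2, steps=1000, rate=0.01, seed=0):
    # D holds the target distances D_ij; X holds the trial coordinates.
    t = D.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(t, dims))
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]      # pairwise coordinate differences
        d = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(d, 1.0)                  # avoid dividing by zero on the diagonal
        resid = d - D
        np.fill_diagonal(resid, 0.0)
        grad = 2 * np.sum((resid / d)[:, :, None] * diff, axis=1)   # gradient of the raw stress
        X -= rate * grad
    return X, raw_stress(X, D)

# Example: recover the corners of a unit square from their distance matrix.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
X, stress = metric_scaling(D)
print(stress)                                     # small if the iteration has converged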

References.

G.Dunn & B.S.Everitt “An Introduction to Mathematical Taxonomy” Chapters 4 & 5.
J.E.Doran & F.R.Hodson “Mathematics and Computers in Archaeology” Chapter 8.
ed K.Enslein, A.Ralston & H.S.Wilf. “Statistical Methods for Digital Computers”.
A.P.M.Coxon. “User's Guide to Multidimensional Scaling”.
ed P.M.Davies & A.P.M.Coxon. “Key Texts in Multidimensional Scaling”.
The Bonn Seriation and Archaeological Statistics Package.

Copyright (c) Susan Laflin 1998.