Chapter 7: Comparing Shapes.

Tangent Method.

Several coding methods have been applied to the problem of comparing open profiles supplied in the same orientation. Some of these can be extended to deal with closed profiles, but in these cases most of them require a designated starting point which corresponds to the same position on each profile.

The first method requires an initial normalisation, calculating a value of si, the percentage arc length, corresponding to each data point (xi,yi). Thus the values of si range from 0 to 100. Then tangents, expressed in degrees, are calculated for each point and graphs of ti against si are plotted for each profile. Because the curves have been normalised, this method gives a good comparison and the values of tangent are more sensitive to small changes in shape than the original data points.

As an example, the shape shown in Figure 7.1 could be represented by plotting the following data set, where the points are x(i),y(i) for values of i from 1 to 20.

(30,60), (30,50), (40,50), (40,40), (30,40), (30,30), (33,20), (40,16), (50,20), (55,10), (50,6), (40,5), (30,10), (21,20), (20,30), (20,40), (10,40), (10,50), (20,50), and (20,60).

The expression for arc-length is given as the length of the polyline curve and the data-points will usually be recorded at closer intervals, giving an even better estimate of the shape.

s(i)=SUM{ sqrt{(x(j)-x(j-1))2+(y(j)-y(j-1))2}}

and so for the above points, the values of arc-length will be :
0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.44, 68.50, 79.27, 90.45, 96.85, 106.90, 118.08, 131.53, 141.57, 151.57, 161.57, 171.57, 181.57, 191.57
The total arc length is thus 191.57 and so dividing each of the above lengths by 100/191.57 will give the values of percentage arc-length for this example.

The tangents may be calculated from the expression

t(i) = { (y(i+1) - y(i)) } / { (x(i+1) - x(i) ) }

but the value t(n) at the last point cannot be calculated in this way and will need to be set equal to t(n-1). To be compatible with the method, it will be necessary to convert these values to degrees, and this will also deal with the case when the x-values are equal and so the above expression would give overflow in the computer. The algorithm for calculating tangent now becomes:
Loop for i from 1 to n-1
If x(i) = x(i+1)
then
If y(i) is greater than y(i+1)
then
set t(i) to 270
else
set t(i) to 90
set gradient to [y(i+1)-y(i)]/[x(i+1)-x(i)]
set t(i) to arctan(gradient)
Endloop
Set t(n) to t(n-1)

There are still some additional points to be considered. The arctan function usually returns values in radians not degrees, but this is a simple scaling. More serious is the fact that it returns values for the principal angle, not the full range from 0 to 360 degrees. To allow for this, consider the following diagram:

The principal angle goes from -pi/2 to +pi/2 and any values from the arctan function are values in radians within this range. Converting to degrees, the principal angle goes from -90o to +90o. i.e. from a position vertically downwards up the right-hand side of the axes to a position vertically upwards. To decide which quadrant is actually needed, we must look at the values of
dy = y(i+1) - y(i)
and
dx = x(i+1)-x(i)

In the first quadrant, dy and dx are both positive and the angle is simply the value given by arctan.
In the second quadrant, dy is positive and dx is negative. A negative value of tangent means the result from arctan will be an angle of -90o+ q and the angle we want is 90o+ q, so we need to calculate (180o+arctan(tangent))
In the third quadrant, dy and dx are both negative making the value of the tangent positive. The value of the angle is (180o + arctan (tangent).
In the fourth quadrant, dy is negative and dx is positive. The angle we want is given by (270o + q) and the angle we have is (-90o + q), so we need to calculate (360o + arctan(tangent)).

Consequently, a possible algorithm might be:

Set dy to [y(i+1)-y(i)]
Set dx to [x(i+1)-x(i)]
Set gradient to dy/dx
Set angle to 57.296*arctan(gradient)
If dx is negative
then Set t(i) to 180 + angle
else if dy is positive
then set t(i) to angle
else set t(i) to 360 + angle
Endif
Endif

With these additions, the set of data discussed above results in the graph shown in Figure 7.3, given below:

This method was developed by P. Main of the British Museum Research Laboratory and has been applied to several archaeological studies.

Chain-Code Method.

This method has been applied to profiles recorded by a digitising camera, where the outlines are obtained automatically in the form of illuminated pixels on a screen.

The chain-code records a sequence of integers indicating the direction of the next illuminated pixel and follows the edge of the profile from start to finish. The simplified version of the object shown in Figure 7.4 would lead to the following Chain-code.
66006644666770007755543323222244220022

An alternative representation, the “Differential Chain-code sequence”, subtracts the previous direction from each one, so the straight sections in any direction show up as patches of zeros. Applying this the above sequence would give the result:
6060(-6)020(-2)00(-1)0700(-7)02001101(-1)1000(-2)02020(-2)0

Once the sequences have been obtained, various pattern-matching techniques can be applied to decide on the similarity between different profiles. This has been applied to the identification of oak leaves of different varieties of oak, by R. White of the Biology Department at Southampton University. Since the overall shape of the leaves remain the same and the identification depends on the number and shape of the serrations along the edges, this method seems good at discriminating this sort of detail. He had also tried an analysisusing an elliptical Fourier series, but found the time required for these calculations was prohibitively large on his system.

B-Spline Method.

This is a method which I have applied to pottery-profiles and in the form I used it was good a recognising similarities of overall shape. I think it would be possible to extend it to give greater weight to small details in specific areas if this were considered desirable.

A B-Spline curve is calculated from the positions of a small number of control points and so once a curve can be represented in this way, the amount of data to be recorded can be reduced substantially, often by a factor of 10. The calculation uses a parameter t which varies from zero to tmax as the curve is drawn, and each pair of coordinates (x(t), y(t)) is calculated from a subset of the control points. For a B-Spline curve of order k with (n+1) control-points, the value of tmax is given by tmax = n - k + 2, and the nearest k control points are used to calculate any point on the curve.  

Fitting such a curve to a set of digitised points (x(i),y(i)) requires some method evaluating of t(i) for each digitised point. One possible method is to scale the arc-length of the whole curve to fit the range 0 to tmax}. Thus for the N digitised points, we have:

smax = SUM{ sqrt{((x(i)-x(i-1))2 + (y(i)-y(i-1))2)}}
t(i) = t(i-1) + sqrt{ (x(i)-x(i-1))2 + (y(i)-y(i-1))2) } * tmax/smax

Then the control points have to be adjusted to minimise the distance between x(i) and x(t(i)) and between y(i) and y(t(i)). Some initial experimentation will be needed to decide on the order and number of control-points appropriate for representing this type of profile and this should be kept constant for the study. Then for each section of the curve in turn, the nearest control-point may be adjusted to minimise the distance between the B-Spline curve and the digitised points. In many CAD packages, this is done interactively, but it is also possible to calculate this automatically. If the values of k and n and the method of calculating t(i) are kept constant, then the resulting set of control points should be unique for any given profile.
Once each profile has a unique set of control points, then applying a scaled Manhattan metric to the control points of different curves will give a good measure of their dissimilarity. This can be used to classify the profiles.

As a simple example of the B-Spline calculation, let us consider a curve of order 3 with 6 control-points. This gives n=5 and tmax=4. Evaluating this for a value of t = 0.5, requires the following table to give the values of the Knot-vector, j, and Nj,k .

 0         0          0          0          0.25     P0  

 0         1          0          0.5       0.5       P1  

 0         2          1          0.5       0.25     P2  

 1         3          0          0          0          P3  

 2         4          0          0          0          P4  

 3         5          0          0          0          P5  

 4         6          0          0  

 4         7          0  

 4         8  

The values of N are calculated from the expressions:
Nj,1 (t) = 1 if } Vj <= t <=V j+1
= 0 otherwise.

(where Vj is the value of the knot-vector) and

Nj,k (t) = (t-Vj)*Nj,k-1}(t) / (Vj+k-1-Vj) + (V j+k-t)*N j+1,k-1(t) / (V j+k-V j+1)

Thus the value at t = 0.5 is determined by the control-points P0, P1 and P2. However, because the end values are repeated 3 times in the knot-vector, the curve is constrained to pass through the points P0 and P5.

Other possible choices for calculating t(i) are either t(i) = (i-1)*N /tmax or some complicated expression involving both arc-length and curvature so that the control-points will be closer together in areas of high curvature.

Comparing Closed Curves.

When comparing closed curves, there is always the problem of obtaining the correct orientation, so that we are comparing like with like. For example, taking the shape initially considered, if we merely ensure that the longest dimension is vertical, there are four possible orientations for comparison (the dotted outlines in Figure 7.6). Only one of these is correct.

In 1986, P. Sneath discussed this problem in a paper at the Computer Applications in Archaeology Conference. He considered two profiles, containing different numbers of digitised points, and recommended the following approach. Calculate the centroid for each profile. Shift the profiles so that the centroids coincide and keep these fixed. Now rotate one of the profiles until the minimum difference has been found. Then take the mirror-image of the rotated profile and repeat the process. The overall minimum will give the orientation which corresponds most closely and the value of the dissimilarity of these profiles.

References

`SHU - an interactive graphics package for the storage, retrieval and analysis of artifact shapes' P.L.Main. C.A.A.81 p75-82.
R.White. Private communication.
`Use of a Sinclair Spectrum for Shape Analysis' S.Laflin C.A.A 86 p83-90.
`Some experiments in fitting pairs of points that lack defined reference points'. P.H.A.Sneath. C.A.A.86 p179-196.

Copyright (c) Susan Laflin 1998.