Chapter 7: Comparing Shapes.
Tangent Method.
Several coding methods have been applied to the problem of comparing
open profiles supplied in the same orientation. Some of these can be extended
to deal with closed profiles, but in these cases most of them require a designated
starting point which corresponds to the same position on each profile.
The first method requires an initial normalisation,
calculating a value of si, the percentage arc length, corresponding
to each data point (xi,yi). Thus the values of si
range from 0 to 100. Then tangents, expressed in degrees, are calculated for
each point and graphs of ti against si are plotted for
each profile. Because the curves have been normalised, this method gives a good
comparison and the values of tangent are more sensitive to small changes in
shape than the original data points.
As an example, the shape shown in Figure 7.1 could be represented by plotting
the following data set, where the points are x(i),y(i) for values of i from 1
to 20.
(30,60), (30,50), (40,50), (40,40), (30,40), (30,30), (33,20), (40,16),
(50,20), (55,10), (50,6), (40,5), (30,10), (21,20), (20,30), (20,40),
(10,40), (10,50), (20,50), and (20,60).
The expression for arc-length is given as
the length of the polyline curve and the data-points will usually be recorded
at closer intervals, giving an even better estimate of the shape.
s(i)=SUM{ sqrt{(x(j)-x(j-1))2+(y(j)-y(j-1))2}}
and so for the above points, the values of arc-length will be :
0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.44, 68.50, 79.27, 90.45, 96.85,
106.90, 118.08, 131.53, 141.57, 151.57, 161.57, 171.57, 181.57, 191.57
The total arc length is thus 191.57 and so dividing each of the
above lengths by 100/191.57 will give the values of percentage arc-length for
this example.
The tangents may be calculated from the expression
t(i) = { (y(i+1) - y(i)) } / { (x(i+1) - x(i) ) }
but the value t(n) at the last point cannot be calculated in this way
and will need to be set equal to t(n-1). To be compatible with the method, it
will be necessary to convert these values to degrees, and this will also deal
with the case when the x-values are equal and so the above expression would
give overflow in the computer. The algorithm for calculating tangent now
becomes:
Loop for i from 1 to n-1
If x(i) = x(i+1)
then
If y(i) is greater than y(i+1)
then
set t(i) to 270
else
set t(i) to 90
set gradient to [y(i+1)-y(i)]/[x(i+1)-x(i)]
set t(i) to arctan(gradient)
Endloop
Set t(n) to t(n-1)
There are still some additional points to
be considered. The arctan function usually returns values in radians not
degrees, but this is a simple scaling. More serious is the fact that it returns
values for the principal angle, not the full range from 0 to 360 degrees. To allow
for this, consider the following diagram:
The principal angle goes from -pi/2 to +pi/2 and any values
from the arctan function are values in radians within this range.
Converting to degrees, the principal angle goes from -90o
to +90o. i.e. from a position vertically downwards up the right-hand
side of the axes to a position vertically upwards. To decide which quadrant is
actually needed, we must look at the values of
dy = y(i+1) - y(i)
and
dx = x(i+1)-x(i)
In the first quadrant, dy and dx are both positive and the angle
is simply the value given by arctan.
In the second quadrant, dy is positive and dx is negative. A negative value of
tangent means the result from arctan will be an angle of -90o+ q
and the angle we want is 90o+ q, so we need to calculate
(180o+arctan(tangent))
In the third quadrant, dy and dx are both negative making the
value of the tangent positive. The value of the angle is (180o +
arctan (tangent).
In the fourth quadrant, dy is negative and dx is positive. The
angle we want is given by (270o + q) and the angle
we have is (-90o + q), so we need to calculate
(360o + arctan(tangent)).
Consequently, a possible algorithm might be:
Set dy to [y(i+1)-y(i)]
Set dx to [x(i+1)-x(i)]
Set gradient to dy/dx
Set angle to 57.296*arctan(gradient)
If dx is negative
then Set t(i) to 180 + angle
else if dy is positive
then set t(i) to angle
else set t(i) to 360 + angle
Endif
Endif
With these additions, the set of data discussed above results in the graph
shown in Figure 7.3, given below:
This method was developed by P. Main of the British Museum Research Laboratory and has been applied to several archaeological studies.
Chain-Code Method.
This method has been applied to profiles recorded by a digitising camera, where the outlines are obtained automatically in the form of illuminated pixels on a screen.
The chain-code records a sequence of integers indicating the
direction of the next illuminated pixel and follows the edge of the profile
from start to finish. The simplified version of the object shown in Figure 7.4
would lead to the following Chain-code.
66006644666770007755543323222244220022
An alternative representation, the “Differential Chain-code sequence”,
subtracts the previous direction from each one, so the straight sections in any
direction show up as patches of zeros. Applying this the above sequence would
give the result:
6060(-6)020(-2)00(-1)0700(-7)02001101(-1)1000(-2)02020(-2)0
Once the sequences have been obtained, various pattern-matching techniques
can be applied to decide on the similarity between different
profiles. This has been applied to the identification of oak leaves of
different varieties of oak, by R. White of the Biology Department at
Southampton University. Since the overall shape of the leaves remain the same
and the identification depends on the number and shape of the serrations along
the edges, this method seems good at discriminating this sort of detail. He had
also tried an analysisusing an elliptical Fourier series, but found the time
required for these calculations was prohibitively large on his system.
B-Spline Method.
This is a method which I have applied to pottery-profiles and in the form I used it was good a recognising similarities of overall shape. I think it would be possible to extend it to give greater weight to small details in specific areas if this were considered desirable.
A B-Spline curve is calculated from the
positions of a small number of control points and so once a curve can be
represented in this way, the amount of data to be recorded can be reduced substantially,
often by a factor of 10. The calculation uses a parameter t which varies from
zero to tmax as the curve is drawn, and each pair of coordinates (x(t),
y(t)) is calculated from a subset of the control points. For a B-Spline curve
of order k with (n+1) control-points, the value of tmax is given by tmax
= n - k + 2, and the nearest k control points are used to calculate any point
on the curve.
Fitting such a curve to a set of digitised points (x(i),y(i)) requires some method
evaluating of t(i) for each digitised point. One possible method is to scale the
arc-length of the whole curve to fit the range 0 to tmax}. Thus for the
N digitised points, we have:
smax = SUM{ sqrt{((x(i)-x(i-1))2 + (y(i)-y(i-1))2)}}
t(i) = t(i-1) + sqrt{ (x(i)-x(i-1))2 + (y(i)-y(i-1))2) } *
tmax/smax
Then the control points have to be adjusted
to minimise the distance between x(i) and x(t(i)) and between y(i) and y(t(i)).
Some initial experimentation will be needed to decide on the order and number
of control-points appropriate for representing this type of profile and this
should be kept constant for the study. Then for each section of the curve in
turn, the nearest control-point may be adjusted to minimise the distance
between the B-Spline curve and the digitised points. In many CAD packages, this
is done interactively, but it is also possible to calculate this automatically.
If the values of k and n and the method of calculating t(i) are kept constant,
then the resulting set of control points should be unique for any given profile.
Once each profile has a unique set of control points, then applying
a scaled Manhattan metric to the control points of different curves will give a
good measure of their dissimilarity. This can be used to classify the profiles.
As a simple example of the B-Spline calculation, let us consider a curve of order
3 with 6 control-points. This gives n=5 and tmax=4.
Evaluating this for a value of t = 0.5, requires the following table to give
the values of the Knot-vector, j, and Nj,k .
0 0 0 0 0.25
P0
0 1 0 0.5 0.5
P1
0 2 1 0.5 0.25
P2
1 3 0 0 0
P3
2 4 0 0 0
P4
3 5 0 0 0
P5
4 6 0 0
4 7 0
4 8
The values of N are calculated from the expressions:
Nj,1 (t) = 1 if } Vj <= t <=V j+1
= 0 otherwise.
(where Vj is the value of the knot-vector) and
Nj,k (t) = (t-Vj)*Nj,k-1}(t)
/ (Vj+k-1-Vj) + (V j+k-t)*N j+1,k-1(t)
/ (V j+k-V j+1)
Thus the value at t = 0.5 is determined by the control-points
P0, P1 and P2. However, because the end values are
repeated 3 times in the knot-vector, the curve is constrained to pass through the
points P0 and P5.
Other possible choices for calculating t(i) are either t(i) = (i-1)*N /tmax
or some complicated expression involving both arc-length and curvature so that
the control-points will be closer together in areas of high curvature.
Comparing Closed Curves.
When comparing closed curves, there is
always the problem of obtaining the correct orientation, so that we are
comparing like with like. For example, taking the shape initially considered,
if we merely ensure that the longest dimension is vertical, there are four
possible orientations for comparison (the dotted outlines in Figure 7.6). Only
one of these is correct.
In 1986, P. Sneath discussed this problem in a paper at the Computer
Applications in Archaeology Conference. He considered two profiles, containing
different numbers of digitised points, and recommended the following approach.
Calculate the centroid for each profile. Shift the profiles so that the
centroids coincide and keep these fixed. Now rotate one of the profiles until
the minimum difference has been found. Then take the mirror-image of the rotated
profile and repeat the process. The overall minimum will give the orientation
which corresponds most closely and the value of the dissimilarity of these
profiles.
References
`SHU - an interactive graphics package for the storage, retrieval
and analysis of artifact shapes' P.L.Main. C.A.A.81 p75-82.
R.White. Private communication.
`Use of a Sinclair Spectrum for Shape Analysis' S.Laflin C.A.A 86 p83-90.
`Some experiments in fitting pairs of points that lack defined reference
points'. P.H.A.Sneath. C.A.A.86 p179-196.
Copyright (c) Susan Laflin 1998.