This post is a discussion of an article published last year
by S. Sinha, A.M. Izhar, R.K. Pan, and B.K. Wells (2010: 1-17). The title of their work indicates that they have
performed network analysis of Indus inscriptions and found syntactic
organization. My first reaction to the
title was that such “syntactic organization” had been found previously, on more
than one occasion. G.L. Possehl
describes a number of analyses that made such claims (1996). As early as 1929, G.R. Hunter noted the
function of the POT and COMB as final elements, or terminal signs, to use M.
Korvink’s terminology (Possehl 1996: 85).
Similar conclusions come from the work of Father H. Heras (Possehl 1996: 110-112),
a Soviet Russian team of scholars headed by Yu. V. Knorozov (ibid.: 115-122), a
Finnish team including A. Parpola (ibid.: 124), and others. M. Korvink’s analysis, which I have referred
to repeatedly, came after the publication of Possehl’s review, but also notes
such organization (2008).
In their preliminary discussion, Sinha et al note that the
ancestral scripts that later became Mesopotamian cuneiform, Egyptian
hieroglyphs, and Mayan writing had limited connections to spoken language
(2010: 2). They state, “In almost all
cases, a writing system became more or less capable of expressing spoken
language only after centuries of development....” This is certainly correct for cuneiform and
hieroglyphs (my knowledge of Mayan writing is too limited to assess the
accuracy of the statement). However, the
earliest hieroglyphs may not have been true writing at all, but rather proto-writing
(O’Connor 2009: 144-147).
Proto-cuneiform, like proto-Elamite, is also considered not closely tied
to speech (Englund 2001: 1-43). The
first tablets contain only numerical symbols; those of the second stage also contain
ideographs representing “owners” (people and/or institutions) and
commodities. When speaking of a symbol
system that eventually evolved into a fully developed writing system, it is
common to extend the term “writing” to the earliest examples of the
script. But that does not really
demonstrate that the earliest inscriptions are indeed true writing. Thus, the fact that the Indus script shares
characteristics of the earliest proto-cuneiform, proto-Elamite, and
hieroglyphic scripts may well suggest that the Indus script itself is
proto-writing rather than a fully developed writing system (as noted in several
previous posts).
That caveat aside, the analysis of Sinha et al is quite
interesting. They begin by defining the
corpus they use, which is based on Wells’ database (2010). This includes the inscriptions from all three
volumes of the corpus, as well as some unpublished data (Sinha et al 2010:
3). These inscriptions come from 3,896
artifacts, from which a subset of 2,393 are selected as being complete,
unbroken, and legible. From this a
further subset is taken for analysis, comprising inscriptions that occur on a
single line. The selected sequences
contain from 1 to 13 signs, with a median length of 4 signs. As a comparison set, the authors created
randomized versions of each inscription, in which the signs were shuffled into random order,
with the statistical measures averaged over the randomized trials. If the signs of the Indus script were arranged
purely at random, the statistics from the comparison set would match
those from the actual corpus. The
fact that results generally differ between the two sets is the evidence the
authors cite for syntactic organization.
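The comparison procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Sinha et al’s code: the toy corpus and the sign labels A, B, C, Z are invented, the example statistic (the number of distinct initial signs) is only one of the measures one might compare, and the 100 trials simply mirror the number the authors mention.

```python
import random
from statistics import mean

def shuffled_copy(corpus, rng):
    """Copy the corpus with the signs of each inscription put in random order.
    Inscription lengths and the signs each one contains are unchanged."""
    out = []
    for seq in corpus:
        s = list(seq)
        rng.shuffle(s)
        out.append(s)
    return out

def null_average(corpus, statistic, trials=100, seed=0):
    """Average `statistic` over `trials` shuffled versions of the corpus."""
    rng = random.Random(seed)
    return mean(statistic(shuffled_copy(corpus, rng)) for _ in range(trials))

def distinct_initial_signs(corpus):
    """Example statistic: how many different signs begin a multi-sign inscription."""
    return len({seq[0] for seq in corpus if len(seq) > 1})

# Invented toy corpus: each list is one inscription, read left to right.
corpus = [["A", "Z"], ["B", "A", "Z"], ["B", "C", "Z"], ["A"]]

print(distinct_initial_signs(corpus))                # 2 (A and B)
print(null_average(corpus, distinct_initial_signs))  # average under the shuffled control
```

If the real value and the shuffled average differ, that is the kind of evidence the authors cite; whether the difference reflects syntax is a separate question.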
To begin with, the authors note 21 signs that only occur
individually, as 1-sign inscriptions (2010: 4).
They then identify 128 signs that only occur at the beginning of an
inscription, a group they term “beginners.”
The third group of 43 signs, which appears only at the end of
inscriptions, is referred to as “enders.”
This leaves 401 signs, some of which sometimes occur at the beginning,
some of which sometimes occur at the end, but all of which also occur in other
positions. Out of this large group, the
authors isolate 127 signs that are never initial or final. By comparison, the randomized set shows very
different beginners and enders, far fewer beginners but considerably more
enders. “We observe that four beginners
in the empirical set...and two enders...never occur as beginners and enders
(respectively) in the 100 randomized trials we carried out. It implies that the occurrence of these signs
always at the beginning or end of a sequence (and never in any other position)
may be highly significant, and certainly not a result of simple chance” (2010:
5).
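The four positional groups can be made concrete with a small classifier. This is a hypothetical sketch of the bookkeeping involved, not the authors’ implementation; the conventions for 1-sign inscriptions and for signs seen in several positions are my assumptions, and the corpus is invented.

```python
from collections import defaultdict

def positional_classes(corpus):
    """Sort signs into groups by the positions in which they ever occur.

    Assumed conventions: a 1-sign inscription is position 'solo';
    otherwise the first sign is 'initial', the last 'final',
    and anything in between 'medial'.
    """
    seen = defaultdict(set)
    for seq in corpus:
        if len(seq) == 1:
            seen[seq[0]].add("solo")
            continue
        last = len(seq) - 1
        for i, sign in enumerate(seq):
            if i == 0:
                seen[sign].add("initial")
            elif i == last:
                seen[sign].add("final")
            else:
                seen[sign].add("medial")

    # A sign joins a group only if it is seen in exactly one kind of position.
    only = {"solo": "solo_only", "initial": "beginners",
            "final": "enders", "medial": "medial_only"}
    classes = {name: set() for name in only.values()}
    for sign, positions in seen.items():
        if len(positions) == 1:
            (pos,) = positions
            classes[only[pos]].add(sign)
    return classes

corpus = [["A", "Z"], ["B", "A", "Z"], ["B", "C", "Z"], ["D"]]
print(positional_classes(corpus))
# {'solo_only': {'D'}, 'beginners': {'B'}, 'enders': {'Z'}, 'medial_only': {'C'}}
```

Sign A, seen both initially and medially, falls into none of the groups, like the remaining signs in Sinha et al’s count.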
One troubling feature of the three lists and of the
conclusion drawn from this analysis is the large number of singletons. As noted in previous posts, one of the striking
characteristics shared by proto-cuneiform, proto-Elamite, and the Indus script
is the very large proportion of signs that occur only once in the corpus. Of approximately 1,900 proto-Elamite signs
(not including numerals), over 1,000 are singletons (Dahl 2002: 1-2). Around 300 additional signs occur only twice,
while the vast majority – 1,700 in all – appear fewer than 10 times. This leaves a relatively small core of 200
“common” signs, out of which just 20 occur 100 or more times. What effect does this characteristic have on
the three lists provided by Sinha et al?
As it happens, of the 21 signs that occur solo, 10 are singletons in the
entire corpus, and the most frequent of them appears only 9 times.
Thus, all are rare. In addition, of the 128 “beginners,” 66 are singletons. A similar proportion are singletons among the
“enders” – 26 out of 43 signs. It seems
misleading to say that a singleton is found only at the beginning or end of an
inscription. If it is a singleton, it
only occurs once, which is not enough to establish a rule!
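Two quick calculations make this objection concrete. The sketch below (with an invented corpus; the function names are mine) counts how many members of a positional group are singletons, and gives the chance that a sign lands at the start or end of an inscription purely by shuffling. Assuming the sign occurs once in an inscription of n signs, that chance is 2/n: already 1/2 at the median length of 4, and certain for a 2-sign inscription.

```python
from collections import Counter
from fractions import Fraction

def singletons_in(group, corpus):
    """Members of `group` that occur exactly once in the whole corpus."""
    freq = Counter(sign for seq in corpus for sign in seq)
    return {s for s in group if freq[s] == 1}

def chance_at_edge(n):
    """Chance that a sign occurring once in an n-sign inscription (n >= 2)
    ends up first or last when the inscription is shuffled at random."""
    return Fraction(2, n)

def chance_always_initial(lengths):
    """Chance that a sign occurring once in each listed inscription
    (given by length) is initial every time under independent shuffles."""
    p = Fraction(1)
    for n in lengths:
        p *= Fraction(1, n)
    return p

corpus = [["A", "Z"], ["B", "A", "Z"], ["E", "C", "Z"], ["D"]]
print(singletons_in({"B", "E"}, corpus))        # both occur only once
print(chance_at_edge(4))                        # 1/2
print(chance_at_edge(2))                        # 1
print(chance_always_initial([4, 4, 4]))         # 1/64
print(round(66 / 128, 2), round(26 / 43, 2))    # singleton shares cited above: 0.52 0.6
```

A single occurrence can be "initial only" with probability 1/n by chance alone; only repeated occurrences drive that probability down.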
After comparing the number of sequences begun
or ended by these signs, the authors observe that “most of the beginner signs
appear in only 1-3 distinct sequences” with a maximum of 8 sequences, while “a particular
ender sign can appear at the end of a maximum of 4 distinct inscriptions” (2010:
5). With such small numbers, the results
may not be random but they do not seem particularly robust either.
Sinha et al also look at the distribution of the number of
links each sign has (its degree) and the strength of these links (2010:
5-6). The authors describe their results
concerning the degree as indicating that “the variation of sign relations is
much more restricted in the empirical sequences than would be the case had each
of the sequences been put together randomly” (2010: 6). Their graph does show a difference in
degree between the real (empirical) set and the randomized set, but that
difference does not appear particularly great.
When it comes to strength, the authors concede that the results are
almost the same for the two data sets, inasmuch as “this is governed by the
frequency of occurrence of individual signs” (2010: 6). Clearly, a common sign will have more and stronger links than a rare sign,
whether or not the signs are arranged randomly.
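Here is a sketch of how degree and strength might be computed for a network of adjacent sign pairs, and of why strength tracks frequency. It assumes that links join immediately adjacent signs and that degree counts distinct neighbours in either direction; Sinha et al’s exact definitions may differ. Each occurrence at the start or end of a multi-sign inscription contributes one link and each medial occurrence contributes two. A sign’s strength is therefore fixed by how often it occurs and how often it sits at an edge, and shuffling can change only the second factor.

```python
from collections import Counter, defaultdict

def pair_weights(corpus):
    """Directed links: weight of (a, b) = times sign a immediately precedes b."""
    return Counter((a, b) for seq in corpus for a, b in zip(seq, seq[1:]))

def degree_and_strength(weights):
    """Degree: number of distinct signs linked to a sign (either direction).
    Strength: total weight of all links touching the sign."""
    neighbours = defaultdict(set)
    strength = Counter()
    for (a, b), w in weights.items():
        neighbours[a].add(b)
        neighbours[b].add(a)
        strength[a] += w
        strength[b] += w  # a self-pair (a, a) therefore counts twice
    degree = {sign: len(ns) for sign, ns in neighbours.items()}
    return degree, strength

def strength_from_positions(sign, corpus):
    """Strength predicted from positions alone: 2 per occurrence in a
    multi-sign inscription, minus 1 for each occurrence at either end."""
    total = 0
    for seq in corpus:
        if len(seq) < 2:
            continue
        for i, s in enumerate(seq):
            if s == sign:
                total += 1 if i in (0, len(seq) - 1) else 2
    return total

corpus = [["A", "Z"], ["B", "A", "Z"], ["B", "C", "Z"], ["D"]]
degree, strength = degree_and_strength(pair_weights(corpus))
print(degree["A"], strength["A"])            # 2 3
print(strength_from_positions("A", corpus))  # 3
```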
Accordingly, the 10 most frequent signs in the empirical set have
the most links. These signs are POT,
BI-QUOTES, FISH, 2 POSTS, COMB, WHISKERED FISH, SPEAR, TRI-FORK, 3 POSTS, and
VEE IN DIAMOND. Arrows in one of the
figures show in which direction the links occur among these signs. For example, VEE IN DIAMOND precedes but does
not follow POT; POT both precedes and follows WHISKERED FISH and COMB; FISH
precedes but does not follow SPEAR, while SPEAR precedes but does not follow 2
POSTS. The profusion of arrows is hard
to interpret, but the description summarizes the situation: “about half of all
potential connections [are] present” (2010: 6).
When Sinha et al discuss medial signs, they seem to be on
firmer ground. The intersection of two
sets of signs – those with the most connections with preceding signs and those
with the most connections with following signs – yields an inner core of 26
signs most likely to appear in medial position (2010: 7). Of these, only 5 occur fewer than 100 times
apiece, so the statistical measures are considerably more robust. Further statistical analysis of the data
yields a group of 35 signs forming the “core lexicon” (2010: 7-8). These include the medial signs SINGLE QUOTE,
BI-QUOTES, 3 QUOTES, 3 POSTS, WINGED MAN, HAIRY HUNCHBACK, FISH, MARKED FISH,
FISH UNDER CHEVRON, WHISKERED FISH, TRI-FORK, RAKE, MALLET, CUPPED POST, POT,
CRAB, CIRCLED FORK. They also include
signs that are most likely to precede this medial group: SINGLE POST, PINCH,
CROSSROADS EX, POTTED ONE, TRI-FORK TOPPED POT, CIRCLED E-FORK, CIRCLED VEE,
CARTWHEEL, OVERLAPPING CIRCLES, VEE IN DIAMOND, BOAT (and three others
occurring slightly fewer than 100 times each).
The authors then focus on determining which signs occur in
pairs by comparing the probability of a given pair’s occurrence in the
empirical data versus the probability in the randomized data set (2010:
9). They find some groupings that
Korvink and others identified previously: FOOTED STOOL + PINWHEEL;
MALLET + FORK; CUPPED SPOON (or CUPPED POST) + 3 POSTS; 3 POSTS + SPEAR;
STACKED 7 + EF TOPPED EXIT; 2 POSTS + FISH; PRAWN + ZEE; ZEE + CROSSROADS EX;
CROSSROADS EX + POT; TRI-FORK TOPPED POT + POT; POT + MAN;
POT + COMB; CIRCLED VEE +
BI-QUOTES; VEE IN DIAMOND + BI-QUOTES;
CARTWHEEL + BI-QUOTES; CARTWHEEL + PINCH; BOAT + PINCH; FAT EX + PINCH. They also find some groupings that seem
counterintuitive in light of Korvink’s analysis, including BI-QUOTES + CIRCLED FORK; BI-QUOTES
+ WHISKERED FISH; BI-QUOTES + FISH
UNDER CHEVRON; FAT CEE + POT.
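The pair comparison can be sketched as the ratio between how often a pair occurs in the real inscriptions and how often it occurs, on average, in shuffled versions of them. This is a simple stand-in that assumes within-inscription shuffling as the null model. It is not necessarily the exact probability measure Sinha et al use, and the corpus is invented.

```python
import random
from collections import Counter

def pair_counts(corpus):
    """Count each ordered pair of immediately adjacent signs."""
    return Counter((a, b) for seq in corpus for a, b in zip(seq, seq[1:]))

def pair_enrichment(corpus, trials=100, seed=0):
    """Observed count of each pair divided by its mean count over
    `trials` within-inscription shuffles. Ratios well above 1 mark
    pairs that occur more often than a chance arrangement predicts."""
    rng = random.Random(seed)
    observed = pair_counts(corpus)
    shuffled = Counter()
    for _ in range(trials):
        for seq in corpus:
            s = rng.sample(seq, len(seq))
            shuffled.update(zip(s, s[1:]))
    return {pair: n / (shuffled[pair] / trials)
            for pair, n in observed.items() if shuffled[pair] > 0}

corpus = [["A", "Z"]] * 5 + [["B", "A", "Z"], ["B", "C", "Z"]]
ratios = pair_enrichment(corpus)
print(round(ratios[("A", "Z")], 1))  # well above 1: shuffling often reverses the pair
```

Note that a pair made of a common sign and a sign with a fixed position will score high under this kind of measure, whatever the reason for the fixed position.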
Here I have underlined the signs identified as terminals by
Korvink and placed his prefix elements in italics. Thus, the reader can see that where Korvink
divides the inscriptions into three segments – prefix, medial portion, and
terminal – Sinha et al simply note which combinations are most frequent. The signs that occur most frequently at the
beginning of the medial segment will thus seem to form pairs with the most
frequent prefix constant (i.e., the final sign in the prefix). The signs that occur most frequently at the
end of the medial segment will similarly seem to form pairs with the most
frequent terminals. Now, how does one
determine whether or not this type of – perhaps fortuitous – occurrence is
significant? Following the method of
Sinha et al, one constructs a control group of fake sequences in which signs
occur in random order. Then one points
to the differences between the set of random sequences and the real sequences,
concluding that this difference indicates syntactic organization. What I am suggesting is that there is
organization, all right, but not necessarily any syntax behind it.
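This point can be illustrated with a toy generator that has positional slots but no syntax. Each invented inscription is a prefix sign, one to three medial signs and a terminal sign, all chosen independently. Compared with shuffled copies, this corpus still looks highly "organized" on a positional statistic. So a difference from the random control group demonstrates organization without, by itself, demonstrating syntax. Everything here (sign labels, slot sizes, corpus size) is a hypothetical assumption.

```python
import random

def template_corpus(n, rng):
    """Invented inscriptions built slot by slot: prefix, 1-3 medial signs,
    terminal. Every choice is independent of every other choice."""
    prefixes = ["P1", "P2"]
    medials = ["M1", "M2", "M3", "M4"]
    terminals = ["T1"]
    corpus = []
    for _ in range(n):
        middle = [rng.choice(medials) for _ in range(rng.randint(1, 3))]
        corpus.append([rng.choice(prefixes)] + middle + [rng.choice(terminals)])
    return corpus

def share_ending_in(corpus, signs):
    """Fraction of inscriptions whose last sign is in `signs`."""
    return sum(seq[-1] in signs for seq in corpus) / len(corpus)

rng = random.Random(1)
corpus = template_corpus(500, rng)
shuffled = [rng.sample(seq, len(seq)) for seq in corpus]

print(share_ending_in(corpus, {"T1"}))    # 1.0, by construction
print(share_ending_in(shuffled, {"T1"}))  # far lower once order is randomized
```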
The main source of my discomfort with the study’s
conclusions lies in the final section, where the authors construct segmentation
trees for the 33 longest inscriptions, each containing 10 or more signs. The second tree they show is M-38 (see illustration above): 2 POSTS / BLANKET / MAN HOLDING DEE-SLASH / TRI-FORK / BI-QUOTES / FISH UNDER CHEVRON / WHISKERED FISH / 2 POSTS / FISH / PRAWN / ZEE / CROSSROADS EX / POT. They identify 2 POSTS + FISH as a strong
sign pair, and PRAWN + ZEE as another, equally strong one. So far, I would agree.
Sinha et al identify a third pair that I find problematic:
FISH UNDER CHEVRON + WHISKERED FISH. Further, they link this pair fairly strongly with the preceding BI-QUOTES, forming a triad. In Korvink's analysis, BI-QUOTES belongs with the first four symbols, all five together forming the prefix. Korvink also notes that the various "fish" signs tend to occur in a semi-standardized order (2008: 37), with FISH UNDER CHEVRON typically the first of any series, followed by FISH, WHISKERED FISH, DOT IN FISH, MARKED FISH in that order (though no inscription contains all of these). He also notes the pair 2 POSTS + FISH, which may precede the otherwise standard sequence (2008: 38). In M-38, though, this pair follows the other "fish."
Sinha et al further show a strong link between the pair PRAWN + ZEE on one side and the following sign, CROSSROADS EX (2010: 11). This seems clear enough. But then they link this triad to the final symbol, the terminal POT. In their view, this link with POT is stronger than the links among the symbols preceding BI-QUOTES, all of which are elements of the prefix according to Korvink's analysis. Thus, Sinha et al would split the prefix into an initial "phrase" of four signs on one side and, on the other, the prefix constant grouped with the first two elements of the medial segment. I find this segmentation tree, as well as the other two they show, unconvincing.
The authors’ description of their first example (with my sign names substituted for
their numerical designations) reads thus:
The 3-sign cluster “LOOP ARMED MAN-DOUBLE
BOATS-SPEAR” at the beginning of the sequence is the initial phrase, and is
separated from the rest by sign SQUARE AY.
The medial sequence is broken into two parts “VEE IN DIAMOND-BI-QUOTES-FISH
UNDER CHEVRON” and “DOT IN FISH-CUPPED POST-3 POSTS.” The sequence ends with the terminal phrase “2
POSTS-FAT EX IN DIAMOND-POT.” We observe
that each of these four sub-sequences obtained by this analysis also occur as units in other inscriptions in the
WUCS dataset, thereby verifying the accuracy of the segmentation procedure (2010: 11, emphasis added).
Let us examine this type of verification by searching the corpus for the sign pairs
that Sinha et al designate. I find 10 examples
of 2 POSTS + BLANKET in the concordance of Koskenniemi and Parpola (1982: 101-102). In two of these other than the cited sequence, BLANKET is followed by MAN HOLDING DEE-SLASH (KP2027, M-119, M-1188). There are other examples of this "man" followed immediately by one or the other FORK (14 instances in KP, 1982: 33). Five of these place the FORK just before BI-QUOTES. Interestingly, other instances have this apparent pair at or near the end of the inscription rather than in the prefix. Since BI-QUOTES is one of the most frequent signs, it is not surprising that it often occurs immediately before FISH UNDER CHEVRON. But if this prefix constant is joined with the pair FISH UNDER CHEVRON + WHISKERED FISH, the occurrences are fewer (6 examples in KP, 1982: 45-46). There are many examples of 2 POSTS + FISH (66 examples in KP, 1982: 99-100). Equally compelling is the pair PRAWN + ZEE (45 examples in KP, 1982: 53). Of these 45 pairs, many but not all are followed by CROSSROADS EX (27 examples). In two of these, another sign or two intervene between CROSSROADS EX and the terminal (POT in each case). In two inscriptions, the sign following CROSSROADS EX is either missing or illegible. Thus, I find Korvink's description more convincing, since it separates the terminal from the medial signs. It seems to me that the "link" between CROSSROADS EX and POT is
essentially fortuitous, a consequence of POT being by far the most frequent sign.
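The check I carried out by hand in the concordance amounts to counting contiguous runs of signs and tallying what follows them. A hypothetical pair of helpers for doing this on a machine-readable corpus might look like the following. The corpus here is invented; with real data one would pass a pattern such as ["PRAWN", "ZEE"], assuming the corpus encodes signs by such names.

```python
from collections import Counter

def inscriptions_containing(corpus, pattern):
    """Number of inscriptions containing `pattern` as a contiguous run."""
    k = len(pattern)
    return sum(
        any(seq[i:i + k] == pattern for i in range(len(seq) - k + 1))
        for seq in corpus
    )

def what_follows(corpus, pattern):
    """Tally the sign right after each occurrence of `pattern`;
    None means the pattern ends the inscription."""
    k = len(pattern)
    tally = Counter()
    for seq in corpus:
        for i in range(len(seq) - k + 1):
            if seq[i:i + k] == pattern:
                tally[seq[i + k] if i + k < len(seq) else None] += 1
    return tally

corpus = [["A", "B", "C", "Z"], ["A", "B", "C"], ["X", "A", "B", "Z"], ["A", "C"]]
print(inscriptions_containing(corpus, ["A", "B"]))  # 3
print(what_follows(corpus, ["A", "B"]))             # Counter({'C': 2, 'Z': 1})
```

Running `what_follows` on a triad and then on the triad plus one more sign shows how quickly the supporting examples thin out, which is the question behind my counts above.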
In conclusion, I am no more impressed with this article
than I was with the work of Yadav et al (2008: 39-52). Sinha et al, like the previous team, demonstrate that there are
some patterns in the sequences of Indus signs.
But neither group seems to have developed a way to determine which of these
patterns are likely to be syntactic and which are the result of other characteristics,
such as frequency. That is, if one finds that feature X correlates with feature Y, this may indicate a direct link between the two. On the other hand, it may only indicate that some other (unobserved) feature Z is directly linked to both, so that the apparent link between X and Y is really an indirect one.
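Frequency is the most obvious candidate for such a Z. In the sketch below, invented inscriptions are generated by drawing each sign independently, with one sign (S1) far more common than the rest, so no sign has any real link to its neighbour. The most frequent adjacent pairs nevertheless all involve S1, and their counts are close to what the sign frequencies alone predict. The sign labels and weights are assumptions chosen for illustration.

```python
import random
from collections import Counter

rng = random.Random(2)
signs = ["S1", "S2", "S3", "S4", "S5"]
weights = [50, 10, 10, 5, 5]  # hypothetical: S1 plays the part of a very common sign

# Each inscription: 2-6 signs drawn independently, so neighbours are unrelated.
corpus = [rng.choices(signs, weights=weights, k=rng.randint(2, 6))
          for _ in range(1000)]

pairs = Counter((a, b) for seq in corpus for a, b in zip(seq, seq[1:]))
freq = Counter(s for seq in corpus for s in seq)
n_signs = sum(freq.values())
n_pairs = sum(pairs.values())

def expected_pairs(a, b):
    """Pair count predicted from the two signs' overall frequencies alone."""
    return n_pairs * (freq[a] / n_signs) * (freq[b] / n_signs)

# Observed count next to the frequency-only prediction for the top pairs.
for (a, b), n in pairs.most_common(3):
    print(a, b, n, round(expected_pairs(a, b)))
```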
The conclusion of Sinha et al is tempered by their recognition
that there is still no generally accepted sign list; where Mahadevan lists 417
signs, Wells enumerates over 700 (2010: 12).
They state that “an important open issue that needs to be settled is the
robustness of these results with respect to the sign list being used” (loc.
cit.). Part of this open issue can
easily be settled, though. Among
the “beginners” that appear only in initial position are some that would
not be included if Wells had not classified them as independent signs rather
than variants. For example, Wells always
distinguishes multiples of a sign from the same sign’s individual occurrences:
DOUBLE SKEWERED CHEVRONS, QUADRUPLE TRI-FORK, DOUBLE CHEVRONS, DOUBLE EF TOPPED
EXITS, DOUBLE CUPS, TRIPLE BOATS. Each
of these signs, when found individually, occurs in non-initial positions. Wells also distinguishes reversed signs from
their mirror images, something not all other list-makers do (unless the
reversed sign appears in an inscription along with its mirror image). If we remove both of these types of examples
and remove all the rare signs (those that occur only once or twice in the
corpus), then we are left with a much smaller inventory.
In fact, if we require a sign to occur at least five times –
which is still far too rare for meaningful statistics – the list of “beginners”
would be significantly reduced. If this reduced list were to be compared to sign lists of other researchers, even fewer would remain. For example, Wells
has ODD STACKED 7 appearing in initial position only. But there are non-initial examples in the KP
list where all variants of ODD STACKED are conflated (1982: 96). MAN WITH WING might remain on the list of
beginners, but KP show one instance where this sign is final (M-1054). Sinha et al do not note this because they analyze only complete, unbroken inscriptions, and in M-1054 the initial sign has been broken off. CAGED MAN HOLDING POST is a singleton in KP’s
concordance although Wells gives it a frequency of 5 (1982: 33). FIGURE EIGHT WITH LADDER indeed appears only
initially, but the similar BEE WITH SLASH, possibly a variant, is medial (1982:
56). The BED is only a beginner if one
separates the variants according to the number of legs; the one with 6 legs occurs
initially as Sinha et al observe, but other variants appear both medially and finally (1982: 148). SKEWERED CIRCLE is probably a variant of
SKEWERED DONUT, which appears medially in 3 inscriptions (1982: 70).
SHISH KEBAB IN TRIANGLE is almost certainly a
variant of BISECTED STRIPED TRIANGLE (1982: 22). The latter sign occurs at least as often in
medial position as initially. Moreover, as far as I can tell, SHISH KEBAB IN TRIANGLE occurs only
in a single inscription that is duplicated multiple times on tablets, and under Sinha et al's stated procedure
duplicate inscriptions should be removed from the frequency count (2010: 3). STRIPED BOWTIE WITH EXTRA
RIBBON occurs once in medial position (Kd-8).
REVERSED FOOTED STOOL is non-initial at least 6 times (Koskenniemi and
Parpola 1982: 131). BLANKET has a number
of variants with differing numbers and arrangements of inner strokes or “ticks.”
Wells’ version with 8 “ticks” may well
be a beginner, but other variants certainly occur medially and even once in
final position (1982: 138-140). Wells
divides ASTERISK UNDER TABLE into two signs, one with an eight-legged star
resembling the keyboard asterisk, the other showing the “X” form with extra
strokes only between the upper and lower “legs,” not across the middle. Presumably the KP list subsumes the two into
one group with all in initial position (1982: 132). The similar EX UNDER TABLE is always final
except for one instance which is initial – perhaps this one instance was an
error for ASTERISK UNDER TABLE! The
version of POTTED 3 with slashes across the sign may be initial, but other
variations are medial (1982: 178). Two
examples with slashes appear in what may be final position on a broken seal
(H-689). CRAB IN STRIPED LEAF TOPPED POT
is another sign that is most likely a variant of several others, one of which
occurs medially (L-11). VEE &
TRI-FORK IN DIAMOND appears in both medial and final position according to KP
(1982: 201).
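The effect of the sign list on the "beginners" can be sketched by recomputing the group after two hypothetical adjustments: mapping variants onto a base sign, as the KP concordance often does, and requiring a minimum frequency such as the five occurrences suggested above. The function, the variant mapping and the corpus below are invented for illustration.

```python
from collections import Counter

def beginners(corpus, variant_of=None, min_count=1):
    """Signs seen only at the start of multi-sign inscriptions.

    variant_of: optional dict mapping a variant to its base sign
    (a hypothetical merge table); min_count: drop rarer signs.
    """
    variant_of = variant_of or {}
    norm = [[variant_of.get(s, s) for s in seq] for seq in corpus]
    freq = Counter(s for seq in norm for s in seq)
    positions = {}
    for seq in norm:
        for i, s in enumerate(seq):
            if len(seq) == 1:
                where = "solo"
            else:
                where = "initial" if i == 0 else "other"
            positions.setdefault(s, set()).add(where)
    return {s for s, where in positions.items()
            if where == {"initial"} and freq[s] >= min_count}

corpus = [["A2", "B", "Z"], ["A", "Z"], ["B", "A", "Z"], ["A2", "C", "Z"]]
print(beginners(corpus))                          # {'A2'}
print(beginners(corpus, min_count=3))             # set(): A2 occurs only twice
print(beginners(corpus, variant_of={"A2": "A"}))  # set(): merged A also occurs medially
```

Either adjustment on its own is enough to empty this toy list, which is the sort of sensitivity the authors leave as an open issue.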
It is rather tedious to examine each symbol in this way, but
the point should be clear. If
researchers use the KP list, the same statistical measures would yield results
that differ from those reported by Sinha et al using Wells’ list. Lest the reader think my objection applies
only to beginners, I will quickly note the enders as well. If we remove signs occurring fewer than 5
times, this group would be reduced to PANTS (and only one variant of it, since the KP
list, which groups all variants together, shows 2 examples in initial
position), EF PRONGED CHEVRON UPON STACKED 6 IN POT (which KP show as occurring
in only 2 distinct inscriptions), CAGED SLASHES IN OVERLAPPING CIRCLES,
SEPTUPLE STACKED ROOFS (which KP show as a singleton), and RAINY CARTWHEEL.
The foundation figure of Lugalkisalsi, bearing an inscription (after Aruz 2003: 65).
There is good reason for defining writing narrowly in this
way, since not all communication systems that use visible marks qualify as
writing, even by the most generous definition. Gelb (1963)
gives examples of such, including carvings and paintings on stone such as those
in caves at Lascaux, petroglyphs in Africa and the Americas, mason’s marks from
Anatolia, the Nsibidi system used in Nigeria (similar to the Adinkra symbols I
referred to in some earlier posts), and so on.
We could also include Navaho sand paintings. They are visible and they communicate – at
least to those who know enough of the Navaho culture to interpret them. But they are not writing and not intended to
be writing. In a similar way, we can
point to symbols in our own culture that are not part of a writing system – the
conventional hearts that appear on Valentines, the cross, star of David, and
star and crescent that represent major religious faiths, the icons on computer
screens, some simple depictions found on traffic signs, etc. These communicate and they are symbols. But they are not writing.
When scholars study proto-cuneiform, they may not always
distinguish between the semasiographical stage, or proto-writing, on the one
hand and the phonetic stage on the other.
After all, the symbols gradually evolved from proto-writing into true
writing. The same may be true of
Egyptian hieroglyphs – the earliest symbols found on pre-dynastic tags mostly
evolved into later glyphs with specific phonetic realizations. But not all early writing systems continued
to evolve in this way. We know, for
example, that proto-Elamite failed to evolve into true writing, dying out
instead. After a lapse of centuries, it
was replaced by Linear Elamite, which is phonetic writing. Similarly, the Indus script did not evolve
continuously into a fully developed writing system used later on. Instead, it died out and after a lapse of
several centuries was replaced by clearly phonetic writing.
The authors note another problem, namely the fact that the
inscriptions are so short (2010: 12).
They attempt to counter this by claiming that “many early writing systems
exhibit such brevity,” citing early Sumerian and early Egyptian. Early proto-cuneiform (if that is what they
mean by early Sumerian) does include very brief texts. Oddly enough, though, even in proto-cuneiform
there are some that are longer than any Indus inscription (see tablets dating
to Uruk III phase in Nissen, Damerow and Englund 1993: 22-23). For example, a single tablet listing pigs
bears 30 or so sections, each containing a single wedge (here functioning
rather like the bullets in a modern bulleted list) and one or more
non-numerical symbols (fig. 26). And
this is just on one side! Even if we
count symbols occurring on multiple sides of objects from the Indus Valley,
there are no inscriptions with 30 signs.
So, while some early proto-cuneiform economic documents are understandably brief,
others are longer without necessarily containing phonetic
writing. The same is probably true of
Egyptian hieroglyphs, as noted previously.
Perhaps early (proto) writing is always brief, the authors suggest, because “the main use of writing was as a mean[s] of maintaining accounts, list and other economic records” (loc. cit.). But this is mainly true in the Near East. In China, the oracle bones record divinations rather than economic transactions. Among the Maya, early writing records astronomical and ritual information. Mesoamerican proto-writing such as that used by the Mixtecs includes symbols for dates (which include numerals) and the names of persons. There are also semi-conventionalized methods for indicating specific places and a few other items of information. The earliest Egyptian glyphs are also mainly indications of specific persons and places. Thus, writing does not always serve first as an economic record, and therefore we cannot assume that this is the reason for the brevity of Indus texts.
In one way, it does not matter what definition one
uses. One can still study a “script”
even though it is not fully developed writing.
Several scholars now focus mainly on proto-cuneiform or proto-Elamite,
for example, and they have been able to make considerable progress in
interpreting the early, enigmatic accounts.
They do not try to excuse the difficulties of their documents by
changing definitions to suit them. Nor
should scholars of the Indus script. It
is whatever it is because the people who once used it found it sufficient for
their purposes. In the end, then, is there syntactic organization in the Indus script? I am not convinced we know, although the signs are not completely random.
REFERENCES
Aruz, J. 2003. Art of the First Cities: The Third Millennium BC from the Mediterranean to the Indus. New York: Metropolitan Museum of Art and Yale University Press.
Dahl, J.L. 2002. “Proto-Elamite Sign Frequencies.” Cuneiform Digital Library Bulletin 1: 1-3. Available online at http://cdli.ucla.edu/pubs/cdlb/2002/001.html
Englund, R. 2001. “The State of Decipherment of Proto-Elamite.” Available online at http://cdli.ucla.edu; to be published in S. Houston, ed., First Writing. Cambridge: Cambridge University Press.
Gelb, I.J. 1963. A Study of Writing. Chicago: University of Chicago Press.
Korvink, M.P. 2008. The Indus Script: A Positional Statistical Approach. Gilund Press.
Koskenniemi, K. and A. Parpola. 1982. A Concordance to the Texts in the Indus Script. Helsinki: Department of Asian and African Studies, University of Helsinki.
Nissen, H.J., P. Damerow, and R.K. Englund. 1993. Archaic Bookkeeping: Early Writing and Techniques of Economic Administration in the Ancient Near East. Paul Larsen, transl. Chicago and London: University of Chicago Press.
O’Connor, D. 2009. Abydos: Egypt’s First Pharaohs and the Cult of Osiris. London: Thames & Hudson.
Pinch, G. 2006 and 1994. Magic in Ancient Egypt. London: The British Museum.
Possehl, G.L. 1996. Indus Age: The Writing System. Philadelphia: University of Pennsylvania.
Sinha, S., A.M. Izhar, R.K. Pan, and B.K. Wells. 2010. “Network analysis of a corpus of undeciphered Indus civilization inscriptions indicates syntactic organization.” Available online at arXiv:1005.4997v1.
Wells, B.K. 2010. Epigraphic Approaches to Indus Writing. Cambridge, MA: Oxbow Press.
Wieger, Dr. L. 1965. Chinese Characters: Their Origin, Etymology, History, Classification and Signification. A Thorough Study from Chinese Documents. New York: Paragon and Dover (reprint of 1927 edition by Catholic Mission Press, originally printed 1915).
Yadav, N., M.N. Vahia, I. Mahadevan, and H. Jogelkar. 2008. “A statistical approach for pattern search in Indus writing.” International Journal of Dravidian Linguistics 37 (1): 39-52.