This post is a discussion of an article published last year by S. Sinha, A.M. Izhar, R.K. Pan, and B.K. Wells (2010: 1-17). The title of their work indicates they have performed network analysis of Indus inscriptions and found syntactic organization. My first reaction to the title was that such “syntactic organization” had been found previously, on more than one occasion. G.L. Possehl describes a number of analyses that made such claims (1996). G.R. Hunter as early as 1929 noted the function of the POT and COMB as final elements – or terminal signs to use M. Korvink’s terminology (Possehl 1996: 85). Similar conclusions come from the work of Father H. Heras (p. 110-112), a Soviet Russian team of scholars headed by Yu. V. Knorozov (p. 115-122), a Finnish team including A. Parpola (p. 124), and others. M. Korvink’s analysis, which I have referred to repeatedly, came after the publication of Possehl’s review, but also notes such organization (2008).
In their preliminary discussion, Sinha et al note that the ancestral scripts that later became Mesopotamian cuneiform, Egyptian hieroglyphs, and Mayan writing had limited connections to spoken language (2010: 2). They state, “In almost all cases, a writing system became more or less capable of expressing spoken language only after centuries of development....” This is certainly correct for cuneiform and hieroglyphs (my knowledge of Mayan writing is too limited to assess the accuracy of the statement). However, the earliest hieroglyphs may not be true writing, but rather proto-writing (O’Connor 2009: 144-147). Proto-cuneiform, like proto-Elamite, is also considered not closely tied to speech (Englund 2001: 1-43). The first tablets contain only numerical symbols, at the second stage containing ideographs as well, representing “owners” (people and/or institutions) and commodities. When speaking of a symbol system that eventually evolved into a fully developed writing system, it is common to extend the term “writing” to the earliest examples of the script. But that does not really demonstrate that the earliest inscriptions are indeed true writing. Thus, the fact that the Indus script shares characteristics of the earliest proto-cuneiform, proto-Elamite, and hieroglyphic scripts may well suggest that the Indus script itself is proto-writing rather than a fully developed writing system (as noted in several previous posts).
That caveat aside, the analysis of Sinha et al is quite interesting. They begin by defining the corpus they use, which is based on Wells’ database (2010). This includes the inscriptions from all three volumes of the corpus, as well as some unpublished data (Sinha et al 2010: 3). These inscriptions come from 3,896 artifacts, from which a subset of 2,393 is selected as being complete, unbroken, and legible. From this a further subset is taken for analysis, comprising inscriptions that occur on a single line. The selected sequences contain from 1 to 13 signs, with a median length of 4 signs. As a comparison set, the authors created randomized variations on each inscription, in which the signs were combined in every possible order, with statistical measures averaged. If the signs of the Indus script were arranged purely randomly, the statistics from the comparison set would match the statistics from the actual corpus. The fact that results generally differ between the two sets is the evidence the authors cite for syntactic organization.
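The logic of such a randomized comparison set can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the authors' code: the toy inscriptions are invented (only the sign names come from this blog), the function names are hypothetical, and the sketch samples random permutations instead of enumerating every possible order.

```python
import random
from collections import Counter

# Invented toy inscriptions; NOT taken from the Wells database.
CORPUS = [
    ["2 POSTS", "FISH", "POT"],
    ["PRAWN", "ZEE", "CROSSROADS EX", "POT"],
    ["VEE IN DIAMOND", "BI-QUOTES", "FISH", "COMB"],
    ["2 POSTS", "FISH", "SPEAR"],
]

def shuffle_corpus(corpus, rng):
    """Return a copy of the corpus with the signs of each inscription permuted at random."""
    shuffled = []
    for inscription in corpus:
        signs = list(inscription)
        rng.shuffle(signs)
        shuffled.append(signs)
    return shuffled

def final_sign_counts(corpus):
    """Count how often each sign closes an inscription."""
    return Counter(inscription[-1] for inscription in corpus)

def mean_final_count(corpus, sign, trials=100, seed=0):
    """Average, over many shuffled corpora, of how often `sign` ends an inscription."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += final_sign_counts(shuffle_corpus(corpus, rng))[sign]
    return total / trials

empirical_pot_final = final_sign_counts(CORPUS)["POT"]
randomized_pot_final = mean_final_count(CORPUS, "POT")
print(empirical_pot_final, randomized_pot_final)
```

In this toy set POT closes both inscriptions that contain it, while the shuffled average is lower. A gap of that kind is what the authors treat as evidence of non-random ordering.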
To begin with, the authors note 21 signs that only occur individually, as 1-sign inscriptions (2010: 4). They then identify 128 signs that only occur at the beginning of an inscription, a group they term “beginners.” The third group of 43 signs, which appears only at the end of inscriptions, is referred to as “enders.” This leaves 401 signs, some of which sometimes occur at the beginning, some of which sometimes occur at the end, but all of which also occur in other positions. Out of this large group, the authors isolate 127 signs that are never initial or final. By comparison, the randomized set shows very different beginners and enders, far fewer beginners but considerably more enders. “We observe that four beginners in the empirical set...and two enders...never occur as beginners and enders (respectively) in the 100 randomized trials we carried out. It implies that the occurrence of these signs always at the beginning or end of a sequence (and never in any other position) may be highly significant, and certainly not a result of simple chance” (2010: 5).
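The grouping into solo signs, beginners, and enders described above can be sketched as follows. This is a minimal illustration, assuming that a sign's class depends only on the set of positions in which it is attested. The toy inscriptions are invented, and the label "mixed" and the function names are my own, not the authors'.

```python
from collections import defaultdict

# Invented toy inscriptions; only the sign names come from this blog.
CORPUS = [
    ["CARTWHEEL", "BI-QUOTES", "FISH", "POT"],
    ["CARTWHEEL", "PINCH", "COMB"],
    ["FISH", "SPEAR", "POT"],
    ["BOAT"],
]

def positions_by_sign(corpus):
    """Map each sign to the set of positions (solo/initial/medial/final) where it occurs."""
    seen = defaultdict(set)
    for inscription in corpus:
        last = len(inscription) - 1
        for i, sign in enumerate(inscription):
            if last == 0:
                seen[sign].add("solo")
            elif i == 0:
                seen[sign].add("initial")
            elif i == last:
                seen[sign].add("final")
            else:
                seen[sign].add("medial")
    return seen

def classify(corpus):
    """Label each sign as solo, beginner, ender, or mixed (attested in other positions too)."""
    labels = {}
    for sign, where in positions_by_sign(corpus).items():
        if where == {"solo"}:
            labels[sign] = "solo"
        elif where == {"initial"}:
            labels[sign] = "beginner"
        elif where == {"final"}:
            labels[sign] = "ender"
        else:
            labels[sign] = "mixed"
    return labels

labels = classify(CORPUS)
print(labels)
```

Note that a sign attested only once necessarily falls into a single positional class, which is why the frequency of each sign matters when reading such lists.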
One troubling feature of the three lists and of the conclusion drawn from this analysis is the large number of singletons. As noted in previous posts, one of the striking characteristics shared by proto-cuneiform, proto-Elamite, and the Indus script is the very large proportion of signs that occur only once in the corpus. Of approximately 1,900 proto-Elamite signs (not including numerals), over 1,000 are singletons (Dahl 2002: 1-2). Around 300 additional signs occur only twice, while the vast majority – 1,700 in all – appear fewer than 10 times. This leaves a relatively small core of 200 “common” signs, out of which just 20 occur 100 or more times. What effect does this characteristic have on the three lists provided by Sinha et al? As it happens, of 21 signs that occur solo, 10 are singletons in the entire corpus and only one appears 9 times. Thus, all are rare. In addition, of the 128 “beginners,” 66 are singletons. A similar proportion are singletons among the “enders” – 26 out of 43 signs. It seems misleading to say that a singleton is found only at the beginning or end of an inscription. If it is a singleton, it only occurs once, which is not enough to establish a rule!
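The proportions just cited are easy to check. The short calculation below uses only the counts reported in the preceding paragraph. The last line is an illustration of my own, not a figure from Sinha et al: it gives the chance that a single sign dropped at random into an inscription of the median length (4 signs) would land in first position.

```python
# (number of signs in the group, number of those that are singletons),
# as reported in the paragraph above.
REPORTED = {
    "solo": (21, 10),
    "beginners": (128, 66),
    "enders": (43, 26),
}

def singleton_share(total, singletons):
    """Fraction of a group made up of signs that occur only once in the corpus."""
    return singletons / total

def chance_initial(length):
    """Probability that one sign placed at random in an inscription of this length comes first."""
    return 1 / length

shares = {group: singleton_share(t, s) for group, (t, s) in REPORTED.items()}

for group, share in shares.items():
    print(f"{group}: {share:.1%} singletons")
print(f"chance of initial position at length 4: {chance_initial(4):.0%}")
```

Roughly half of each group consists of singletons, and a lone occurrence in a four-sign text would be initial about one time in four by chance alone.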
After comparing the number of sequences begun or ended by these signs, the authors observe that “most of the beginner signs appear in only 1-3 distinct sequences” with a maximum of 8 sequences, while “a particular ender sign can appear at the end of a maximum of 4 distinct inscriptions” (2010: 5). With such small numbers, the results may not be random but they do not seem particularly robust either.
Sinha et al also look at the distribution of the number of links each sign has (its degree) and the strength of these links (2010: 5-6). The authors describe their results concerning the degree as indicating that “the variation of sign relations is much more restricted in the empirical sequences than would be the case had each of the sequences been put together randomly” (2010: 6). Their graph indeed shows a difference of degree between the real or empirical set and the randomized set, but that difference does not appear particularly great. When it comes to strength, the authors must agree that the results are almost the same between the two data sets, inasmuch as “this is governed by the frequency of occurrence of individual signs” (2010: 6). Clearly, a common sign will have links with more signs than will a rare sign, whether or not their arrangements are random.
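For readers unfamiliar with the terms, degree and strength can be sketched as follows. I am assuming here that a link joins two signs whenever one immediately precedes the other and that a link's weight is the number of times that happens. That construction, the toy inscriptions, and the function names are assumptions for illustration, not a reproduction of the authors' procedure.

```python
from collections import Counter, defaultdict

# Invented toy inscriptions; only the sign names come from this blog.
CORPUS = [
    ["2 POSTS", "FISH", "POT"],
    ["2 POSTS", "FISH", "SPEAR"],
    ["WHISKERED FISH", "POT"],
    ["FISH", "POT"],
]

def build_links(corpus):
    """Count each ordered pair of immediately adjacent signs."""
    links = Counter()
    for inscription in corpus:
        for a, b in zip(inscription, inscription[1:]):
            links[(a, b)] += 1
    return links

def degree_and_strength(links):
    """Degree = number of distinct neighbours; strength = total weight of a sign's links."""
    neighbours = defaultdict(set)
    strength = Counter()
    for (a, b), weight in links.items():
        neighbours[a].add(b)
        neighbours[b].add(a)
        strength[a] += weight
        strength[b] += weight
    degree = {sign: len(others) for sign, others in neighbours.items()}
    return degree, strength

degree, strength = degree_and_strength(build_links(CORPUS))
print(degree["FISH"], strength["FISH"])
```

Even in this tiny example the most frequent sign, FISH, has both the highest degree and the highest strength, which illustrates why frequency alone can drive these measures.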
Thus, the 10 most frequent signs in the empirical set have the most links. These signs include POT, BI-QUOTES, FISH, 2 POSTS, COMB, WHISKERED FISH, SPEAR, TRI-FORK, 3 POSTS, and VEE IN DIAMOND. Arrows in one of the figures show in which direction the links occur among these signs. For example, VEE IN DIAMOND precedes but does not follow POT; POT both precedes and follows WHISKERED FISH and COMB; FISH precedes but does not follow SPEAR, while SPEAR precedes but does not follow 2 POSTS. The profusion of arrows is hard to interpret, but the description summarizes the situation: “about half of all potential connections [are] present” (2010: 6).
When Sinha et al discuss medial signs, they seem to be on firmer ground. The intersection of two sets of signs – those with the most connections with preceding signs and those with the most connections with following signs – yields an inner core of 26 signs most likely to appear in medial position (2010: 7). Of these, only 5 occur fewer than 100 times apiece, so the statistical measures are more convincingly robust. Further statistical manipulations of the data yield a group of 35 signs forming the “core lexicon” (2010: 7-8). They include medial signs: SINGLE QUOTE, BI-QUOTES, 3 QUOTES, 3 POSTS, WINGED MAN, HAIRY HUNCHBACK, FISH, MARKED FISH, FISH UNDER CHEVRON, WHISKERED FISH, TRI-FORK, RAKE, MALLET, CUPPED POST, POT, CRAB, CIRCLED FORK. They also include signs that are most likely to precede this medial group: SINGLE POST, PINCH, CROSSROADS EX, POTTED ONE, TRI-FORK TOPPED POT, CIRCLED E-FORK, CIRCLED VEE, CARTWHEEL, OVERLAPPING CIRCLES, VEE IN DIAMOND, BOAT (and three others occurring slightly fewer than 100 times each).
The authors then focus on determining which signs occur in pairs by comparing the probability of a given pair’s occurrence in the empirical data versus the probability in the randomized data set (2010: 9). They find some groupings that Korvink and others identified previously: FOOTED STOOL + PINWHEEL; MALLET + FORK; CUPPED SPOON (or CUPPED POST) + 3 POSTS; 3 POSTS + SPEAR; STACKED 7 + EF TOPPED EXIT; 2 POSTS + FISH; PRAWN + ZEE; ZEE + CROSSROADS EX; CROSSROADS EX + POT; TRI-FORK TOPPED POT + POT; POT + MAN; POT + COMB; CIRCLED VEE + BI-QUOTES; VEE IN DIAMOND + BI-QUOTES; CARTWHEEL + BI-QUOTES; CARTWHEEL + PINCH; BOAT + PINCH; FAT EX + PINCH. They also find some groupings that seem unintuitive by Korvink’s analysis, including BI-QUOTES + CIRCLED FORK; BI-QUOTES + WHISKERED FISH; BI-QUOTES + FISH UNDER CHEVRON; FAT CEE + POT.
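The comparison of pair frequencies against a randomized baseline can be sketched like this. It is a simplified stand-in for the authors' probability measure, not their method: I simply count how often a pair is adjacent in the real sequences and compare that with the average count over shuffled copies. The toy inscriptions are invented and the function names are hypothetical.

```python
import random
from collections import Counter

# Invented toy inscriptions; only the sign names come from this blog.
CORPUS = [
    ["2 POSTS", "PRAWN", "ZEE", "POT"],
    ["FISH", "PRAWN", "ZEE", "POT"],
    ["PRAWN", "ZEE", "CROSSROADS EX", "POT"],
]

def pair_counts(corpus):
    """Count every ordered pair of immediately adjacent signs."""
    counts = Counter()
    for inscription in corpus:
        counts.update(zip(inscription, inscription[1:]))
    return counts

def randomized_pair_mean(corpus, pair, trials=200, seed=1):
    """Average count of `pair` over corpora whose inscriptions are randomly permuted."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        shuffled = [rng.sample(ins, len(ins)) for ins in corpus]
        total += pair_counts(shuffled)[pair]
    return total / trials

observed = pair_counts(CORPUS)[("PRAWN", "ZEE")]
baseline = randomized_pair_mean(CORPUS, ("PRAWN", "ZEE"))
print(observed, baseline)
```

A pair whose observed count is well above its shuffled baseline is flagged as a grouping. The sketch says nothing about why the pair recurs, only that it does so more often than random ordering would produce.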
Here I have underlined the signs identified as terminals by Korvink and placed his prefix elements in italics. Thus, the reader can see that where Korvink divides the inscriptions into three segments – prefix, medial portion, and terminal – Sinha et al simply note which combinations are most frequent. The signs that occur most frequently at the beginning of the medial segment will thus seem to form pairs with the most frequent prefix constant (i.e., the final sign in the prefix). The signs that occur most frequently at the end of the medial segment will similarly seem to form pairs with the most frequent terminals. Now, how does one determine whether or not this type of – perhaps fortuitous – occurrence is significant? Following the method of Sinha et al, one constructs a control group of fake sequences in which signs occur in random order. Then one points to the differences between the set of random sequences and the real sequence, concluding that this difference indicates syntactical organization. I suppose I am suggesting that there is organization all right, but not necessarily any syntax behind that organization.
The main source of my discomfort with the study’s conclusions lies in the final section where the authors construct segmentation trees for the longest 33 inscriptions, each containing 10 or more signs. The second tree they show is M-38 (see illustration above): 2 POSTS / BLANKET / MAN HOLDING DEE-SLASH / TRI-FORK / BI-QUOTES / FISH UNDER CHEVRON / WHISKERED FISH / 2 POSTS / FISH / PRAWN / ZEE / CROSSROADS EX / POT. They find a strong sign pair to be 2 POSTS + FISH and another, equally strong, is PRAWN + ZEE. So far, I would agree.
Sinha et al find a third pair that I find problematic: FISH UNDER CHEVRON + WHISKERED FISH. Further, they link this pair fairly strongly with the preceding BI-QUOTES, forming a triad. In Korvink's analysis, BI-QUOTES belongs with the first four symbols, all five together forming the prefix. Korvink also notes that the various "fish" signs tend to occur in a semi-standardized order (2008: 37), with FISH UNDER CHEVRON typically the first of any series, followed by FISH, WHISKERED FISH, DOT IN FISH, MARKED FISH in that order (though no inscription contains all of these). He also notes the pair 2 POSTS + FISH, which may precede the otherwise standard sequence (2008: 38). In M-38, though, this pair follows the other "fish."
Sinha et al further show a strong link between the pair PRAWN + ZEE on one side and the following sign, CROSSROADS EX (2010: 11). This seems clear enough. But then they link this triad to the final symbol, the terminal POT. This link with POT is stronger, in their view, than that between the symbols preceding BI-QUOTES, all elements of the prefix according to Korvink's analysis. Thus, Sinha et al would divide the prefix between an initial "phrase" of four signs on one side, and the prefix constant along with the first two elements of the medial segment on the other side. I find this segmentation tree -- as well as the other two shown -- to be unconvincing.
The authors’ description of their first example (with my sign names substituted for their numerical designations) reads thus:
The 3-sign cluster “LOOP ARMED MAN-DOUBLE BOATS-SPEAR” at the beginning of the sequence is the initial phrase, and is separated from the rest by sign SQUARE AY. The medial sequence is broken into two parts “VEE IN DIAMOND-BI-QUOTES-FISH UNDER CHEVRON” and “DOT IN FISH-CUPPED POST-3 POSTS.” The sequence ends with the terminal phrase “2 POSTS-FAT EX IN DIAMOND-POT.” We observe that each of these four sub-sequences obtained by this analysis also occur as units in other inscriptions in the WUCS dataset, thereby verifying the accuracy of the segmentation procedure (2010: 11, emphasis added).
Let us examine this type of verification, seeking the sign pairs in the corpus that Sinha et al designate. I find 10 examples of 2 POSTS + BLANKET in the concordance of Koskenniemi and Parpola (1982: 101-102). In two of these other than the cited sequence, BLANKET is followed by the MAN HOLDING DEE-SLASH (KP2027, M-119, M-1188). There are other examples of this "man" followed immediately by one or the other FORK (14 instances in KP, 1982: 33). Five of these place the FORK just before BI-QUOTES. Interestingly, other instances have this apparent pair at or near the end of the inscription rather than in the prefix. Since BI-QUOTES is one of the most frequent signs, it is not surprising that it often occurs immediately preceding FISH UNDER CHEVRON. But if this prefix constant is joined with the pair, FISH UNDER CHEVRON + WHISKERED FISH, then the occurrences are fewer (6 examples in KP, 1982: 45-46). There are multiple examples of 2 POSTS + FISH (66 examples in KP, 1982: 99-100). Equally compelling is the link between PRAWN + ZEE (45 examples in KP, 1982: 53). Of these 45 pairs, many but not all are followed by CROSSROADS EX (27 examples). Two of these include another sign or two between CROSSROADS EX and the terminal (POT in each case). In two inscriptions, the sign following CROSSROADS EX is either missing or illegible. Thus, I find Korvink's description more convincing as it separates the terminal from the medial signs. It seems to me that the "link" between CROSSROADS EX and POT is essentially fortuitous and due to the fact that POT is the most frequent sign by far.
In conclusion, I am not any more impressed with this article than I was by the work of Yadav et al (2008: 39-52). Sinha et al, like the previous team, demonstrate that there are some patterns in the sequences of Indus signs. But neither group seems to have developed a way to determine which of these patterns are likely to be syntactic and which are the result of other characteristics, such as frequency. That is, if one finds that feature X correlates with feature Y, this may indicate a direct link between the two. On the other hand, it may only be a clue that some other (unobserved) feature Z directly links with both and thus the apparent link between X and Y is really an indirect one.
The conclusion of Sinha et al is tempered by their recognition that there is still no generally accepted sign list; where Mahadevan lists 417 signs, Wells enumerates over 700 (2010: 12). They state that “an important open issue that needs to be settled is the robustness of these results with respect to the sign list being used” (loc. cit.). Part of this open issue can easily be settled, though. For example, among the “beginners” that appear only in initial position are some that would not be included if Wells had not classified them as independent signs rather than variants. Specifically, Wells always distinguishes multiples of a sign from the same sign’s individual occurrences: DOUBLE SKEWERED CHEVRONS, QUADRUPLE TRI-FORK, DOUBLE CHEVRONS, DOUBLE EF TOPPED EXITS, DOUBLE CUPS, TRIPLE BOATS. Each of these signs, when found individually, occurs in non-initial positions. Wells also distinguishes reversed signs from their mirror images, something not all other list-makers do (unless the reversed sign appears in an inscription along with its mirror image). If we remove both of these types of examples and remove all the rare signs (those that occur only once or twice in the corpus), then we are left with a much smaller inventory.
In fact, if we require a sign to occur at least five times – which is still far too rare for meaningful statistics – the list of “beginners” would be significantly reduced. If this reduced list were to be compared to sign lists of other researchers, even fewer would remain. For example, Wells has ODD STACKED 7 appearing in initial position only. But there are non-initial examples in the KP list where all variants of ODD STACKED are conflated (1982: 96). MAN WITH WING might remain on the list of beginners, but KP show one instance where this sign is final (M-1054). This fact is not noted by Sinha et al because they analyze only complete, unbroken inscriptions and M-1054 shows an initial sign to have been broken off. CAGED MAN HOLDING POST is a singleton in KP’s concordance although Wells gives it a frequency of 5 (1982: 33). FIGURE EIGHT WITH LADDER indeed appears only initially, but the similar BEE WITH SLASH, possibly a variant, is medial (1982: 56). The BED is only a beginner if one separates the variants according to the number of legs; the one with 6 legs occurs initially as Sinha et al observe, but other variants appear both medially and finally (1982: 148). SKEWERED CIRCLE is probably a variant of SKEWERED DONUT, which appears medially in 3 inscriptions (1982: 70).
SHISH KEBAB IN TRIANGLE is almost certainly a variant of BISECTED STRIPED TRIANGLE (1982: 22). The latter sign occurs at least as often in medial position as initially. Besides this fact, KEBAB IN TRIANGLE only occurs in a single inscription that is duplicated multiple times on tablets, as far as I can tell. The duplicate inscriptions should be removed from the frequency count per Sinha et al's stated procedure (2010: 3). STRIPED BOWTIE WITH EXTRA RIBBON occurs once in medial position (Kd-8). REVERSED FOOTED STOOL is non-initial at least 6 times (Koskenniemi and Parpola 1982: 131). BLANKET has a number of variants with differing numbers and arrangements of inner strokes or “ticks.” Wells’ version with 8 “ticks” may well be a beginner, but other variants certainly occur medially and even once in final position (1982: 138-140). Wells divides ASTERISK UNDER TABLE into two signs, one with an eight-legged star resembling the keyboard asterisk, the other showing the “X” form with extra strokes only between the upper and lower “legs,” not across the middle. Presumably the KP list subsumes the two into one group with all in initial position (1982: 132). The similar EX UNDER TABLE is always final except for one instance which is initial – perhaps this one instance was an error for ASTERISK UNDER TABLE! The version of POTTED 3 with slashes across the sign may be initial, but other variations are medial (1982: 178). Two examples with slashes appear in what may be final position on a broken seal (H-689). CRAB IN STRIPED LEAF TOPPED POT is another sign that is most likely a variant of several others, one of which occurs medially (L-11). VEE & TRI-FORK IN DIAMOND appears in both medial and final position according to KP (1982: 201).
It is rather tedious to examine each symbol in this way, but the point should be clear. If researchers use the KP list, the same statistical measures would yield results that differ from those reported for Sinha et al using Wells’ list. Lest the reader think my objection applies only to beginners, I will quickly note the enders as well. If we remove signs occurring fewer than 5 times, this group would then be reduced to PANTS (one variant only since the KP list, which groups together all variants, shows 2 examples in initial position), EF PRONGED CHEVRON UPON STACKED 6 IN POT (which KP show as occurring in only 2 distinct inscriptions), CAGED SLASHES IN OVERLAPPING CIRCLES, SEPTUPLE STACKED ROOFS (which KP show as a singleton), and RAINY CARTWHEEL.
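The dependence of the “beginner” list on the sign list and on a frequency cutoff can be sketched as follows. Everything here is illustrative: the toy inscriptions are invented, the merge table is a hypothetical stand-in for the kind of conflation found in the KP list, and the function names are my own.

```python
from collections import Counter

# Invented toy inscriptions; only the sign names come from this blog.
CORPUS = [
    ["SKEWERED CIRCLE", "FISH", "POT"],
    ["SKEWERED CIRCLE", "2 POSTS", "FISH", "POT"],
    ["FISH", "SKEWERED DONUT", "POT"],
    ["ODD STACKED 7", "SPEAR"],
]

# Assumed merge table: treats one sign as a variant of another.
VARIANTS = {"SKEWERED CIRCLE": "SKEWERED DONUT"}

def merge_variants(corpus, variants):
    """Rewrite each inscription so that listed variants share one sign name."""
    return [[variants.get(sign, sign) for sign in ins] for ins in corpus]

def beginners(corpus, min_count=1):
    """Signs found only in initial position of multi-sign texts, at least min_count times."""
    freq = Counter(sign for ins in corpus for sign in ins)
    initial = {ins[0] for ins in corpus if len(ins) > 1}
    elsewhere = {sign for ins in corpus for sign in ins[1:]}
    solo = {ins[0] for ins in corpus if len(ins) == 1}
    return {s for s in initial - elsewhere - solo if freq[s] >= min_count}

split_list = beginners(CORPUS)
split_list_min2 = beginners(CORPUS, min_count=2)
merged_list = beginners(merge_variants(CORPUS, VARIANTS))
merged_list_min2 = beginners(merge_variants(CORPUS, VARIANTS), min_count=2)
print(split_list, split_list_min2, merged_list, merged_list_min2)
```

With the variants kept apart, two signs qualify as beginners. Merging them removes one and the frequency cutoff removes the other, so the same procedure returns a different list depending on choices made before any statistics are run.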
The foundation figure of Lugalkisalsi, bearing inscription (after Aruz 2003: 65).
There is good reason for defining writing narrowly in this way, since not all communication systems that use visible marks qualify as writing even to the most generous. Gelb gives examples of such, including carvings and paintings on stone such as those in caves at Lascaux, petroglyphs in Africa and the Americas, mason’s marks from Anatolia, the Nsibidi system used in Nigeria (similar to the Adinkra symbols I referred to in some earlier posts), and so on. We could also include Navaho sand paintings. They are visible and they communicate – at least to those who know enough of the Navaho culture to interpret them. But they are not writing and not intended to be writing. In a similar way, we can point to symbols in our own culture that are not part of a writing system – the conventional hearts that appear on Valentines, the cross, star of David, and star and crescent that represent major religious faiths, the icons on computer screens, some simple depictions found on traffic signs, etc. These communicate and they are symbols. But they are not writing.
When scholars study proto-cuneiform, they may not always distinguish between the semasiographical stage, or proto-writing, on the one hand and the phonetic stage on the other. After all, the symbols gradually evolved from proto-writing into true writing. The same may be true of Egyptian hieroglyphs – the earliest symbols found on pre-dynastic tags mostly evolved into later glyphs with specific phonetic realizations. But not all early writing systems continued to evolve in this way. We know, for example, that proto-Elamite failed to evolve into true writing, dying out instead. After a lapse of centuries, it was replaced by Linear Elamite, which is phonetic writing. Similarly, the Indus script did not evolve continuously into a fully developed writing system used later on. Instead, it died out and after a lapse of several centuries was replaced by clearly phonetic writing.
The authors note another problem, namely the fact that the inscriptions are so short (2010: 12). They want to counter this by claiming that “many early writing systems exhibit such brevity,” citing early Sumerian and early Egyptian. Early proto-cuneiform (if that is what they mean by early Sumerian) does include very brief texts. Oddly enough, though, even in proto-cuneiform there are some that are longer than any Indus inscription (see tablets dating to Uruk III phase in Nissen, Damerow and Englund 1993: 22-23). For example, a single tablet listing pigs bears 30 or so sections, each containing a single wedge (here functioning rather like the bullets in a modern bulleted list) and one or more non-numerical symbols (fig. 26). And this is just on one side! Even if we count symbols occurring on multiple sides of objects from the Indus Valley, there are no inscriptions with 30 signs. So, while some brief texts are understandable in early proto-cuneiform economic documents, others are longer without necessarily containing phonetic writing. The same is probably true of Egyptian hieroglyphs, as noted previously.
Perhaps early (proto) writing is always brief, the authors suggest, because “the main use of writing was as a mean[s] of maintaining accounts, list and other economic records” (loc. cit.). But this is mainly true in the Near East. In China, the oracle bones record divinations rather than economic transactions. Among the Maya, early writing records astronomical and ritual information. Mesoamerican proto-writing such as that used by the Mixtecs includes symbols for dates – which include numerals – and the names of persons. There are also semi-conventionalized methods for indicating specific places and a few other items of information. The earliest Egyptian glyphs are also mainly indications of specific persons and places. Thus, writing does not always serve first as an economic account and therefore we cannot assume that this is the reason for the brevity of Indus texts.
In one way, it does not matter what definition one uses. One can still study a “script” even though it is not fully developed writing. Several scholars now focus mainly on proto-cuneiform or proto-Elamite, for example, and they have been able to make considerable progress in interpreting the early, enigmatic accounts. They do not try to excuse the difficulties of their documents by changing definitions to suit them. Neither should scholars of the Indus script. It is whatever it is because the people who once used it found it sufficient for their purposes. In the end, then, is there syntactic organization in the Indus script? I am not convinced we know, although the signs are not completely random.
Aruz, J. 2003. Art of the First Cities: The Third Millennium BC from the Mediterranean to the Indus. New York: Metropolitan Museum of Art and Yale University Press.
Dahl, J.L. 2002. “Proto-Elamite Sign Frequencies” in Cuneiform Digital Library Bulletin 1: 1-3. Available online at http://cdli.ucla.edu/pubs/cdlb/2002/001.html
Englund, R. 2001. “The State of Decipherment of Proto-Elamite” available online at http://cdli.ucla.edu , to be published in S. Houston, ed., First Writing. Cambridge: Cambridge University Press.
Gelb, I.J. 1963. A Study of Writing. Chicago: University of Chicago Press.
Korvink, M.P. 2008. The Indus Script: A Positional Statistical Approach. Gilund Press.
Koskenniemi, K. and A. Parpola. 1982. A Concordance to the Texts in the Indus Script. Helsinki: Department of Asian and African Studies, University of Helsinki.
Nissen, H.J., P. Damerow, R.K. Englund. 1993. Archaic Bookkeeping: Early Writing and Techniques of Economic Administration in the Ancient Near East. Paul Larsen, transl. Chicago and London: University of Chicago Press.
O’Connor, D. 2009. Abydos: Egypt’s First Pharaohs and the Cult of Osiris. London: Thames & Hudson.
Pinch, G. 2006 and 1994. Magic in Ancient Egypt. London: The British Museum.
Possehl, G.L. 1996. Indus Age: The Writing System. Philadelphia: University of Pennsylvania.
Sinha, S., A.M. Izhar, R.K. Pan, B.K. Wells. 2010. “Network analysis of a corpus of undeciphered Indus civilization inscriptions indicates syntactic organization” available online at arXiv:1005.4997v1
Wells, B.K. 2010. Epigraphic Approaches to Indus Writing. Cambridge, MA: Oxbow Press.
Wieger, Dr. L. 1965. Chinese Characters: Their Origin, Etymology, History, Classification and Signification. A Thorough Study from Chinese Documents. New York: Paragon and Dover (reprint of 1927 edition by Catholic Mission Press, originally printed 1915).
Yadav, N., M.N. Vahia, I. Mahadevan, H. Jogelkar. 2008. “A statistical approach for pattern search in Indus writing” in International Journal of Dravidian Linguistics, 37 (1): 39-52.