Monday, May 7, 2012

'Phrase Structure Rules' in Indus Script

Phrase structure tree of the English sentence, "The dog bites the cat."

In English, as in other languages, speech consists of words put together into phrases, which, in turn, are put together into sentences.  (This sequence is neither the beginning nor the end of the things put together, but it is all we need for now.)  Using transformational grammar, linguists analyze sentences into their constituent phrases and the phrases into their constituent words, in the form of phrase structure trees.  To diagram the sentence, “The dog bites the cat,” I begin the tree at the top with “S,” which stands for sentence.  I divide this into two constituent parts, a noun phrase (NP) on the left and a verb phrase (VP) on the right.  The initial NP is further divided into its constituents, the article “the” (Art) on the left and the noun “dog” (N) on the right.  The second element in the sentence, the VP, is also further subdivided, into the verb “bites” (V) on the left and another NP, “the cat,” on the right.  At the base, the final NP subdivides into another sequence of Art + N.  (Additional details can be included, but I chose to simplify the diagram.)
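For readers who prefer text to diagrams, the same tree can be written as a nested structure.  The following is only a minimal sketch in Python; the tuple layout and the function names `bracketed` and `leaves` are my own illustrative choices, not a standard linguistic notation library.

```python
# Phrase structure tree for "The dog bites the cat."
# Each node is a tuple: (label, child, child, ...).
# A node at the base of the tree is (label, word).
tree = (
    "S",
    ("NP", ("Art", "The"), ("N", "dog")),
    ("VP",
        ("V", "bites"),
        ("NP", ("Art", "the"), ("N", "cat"))),
)


def bracketed(node):
    """Render a node in labelled-bracket notation."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return f"[{label} {children[0]}]"
    return f"[{label} " + " ".join(bracketed(c) for c in children) + "]"


def leaves(node):
    """Return the words at the base of the tree, left to right."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]
    return [word for child in children for word in leaves(child)]


print(bracketed(tree))
print(" ".join(leaves(tree)))
```

Reading the words at the base of the tree from left to right gives back the original sentence, which is a quick check that the constituents were divided in the right order.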

The Indus inscriptions are amenable to a similar type of analysis, although I cannot relate the various elements of the script to any specific linguistic elements.  In other words, no one yet knows whether each sign represents a phonetic element (e.g., a sound, a syllable), a semantic element (e.g., a word or affix), or neither one.  Nevertheless, several researchers have demonstrated that there is some structure in the inscriptions.  The following discussion is my attempt to describe the structure using a tree similar to that found in linguistics for phrase structure.  My analysis is based upon Korvink’s statistical positional analysis of the Indus script, although he does not use the tree diagram (2008).

Korvink’s analysis demonstrates the existence of three possible subdivisions or subclasses of an inscription, which I have often discussed in previous posts:

1.       Prefix, typically in initial position, consisting of one or more signs (variables) followed by one of three “constants,” either SINGLE QUOTE, BI-QUOTES, or PINCH (in rare cases by two constants);

2.       Medial section, typically following any prefix and/or preceding any terminal, consisting of one or more signs;

3.       Terminal, typically in final position, consisting of one or two terminal signs, POT, FLANGE-TOPPED POT, FORK-TOPPED POT, COMB, MAN, BEARER, CHEVRON-HATTED BEARER, POT-HATTED BEARER, SPEAR, or PINWHEEL.
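The three subdivisions above can be sketched as a simple segmenting routine.  This is only a rough illustration under my own simplifying assumptions, not Korvink’s procedure: the prefix is taken to end at the first constant that has at least one sign before it (plus a second constant if one follows directly), and the terminal is taken to be at most two terminal signs at the very end.  The function names `segment` and `phrase_types` are hypothetical.

```python
CONSTANTS = {"SINGLE QUOTE", "BI-QUOTES", "PINCH"}
TERMINALS = {
    "POT", "FLANGE-TOPPED POT", "FORK-TOPPED POT", "COMB", "MAN",
    "BEARER", "CHEVRON-HATTED BEARER", "POT-HATTED BEARER",
    "SPEAR", "PINWHEEL",
}


def segment(signs):
    """Split a list of sign names into (prefix, medial, terminal)."""
    signs = list(signs)
    prefix = []
    # Prefix: one or more variables followed by a constant.
    for i, sign in enumerate(signs):
        if i >= 1 and sign in CONSTANTS:
            end = i + 1
            # Rare case: a second constant directly follows the first.
            if end < len(signs) and signs[end] in CONSTANTS:
                end += 1
            prefix, signs = signs[:end], signs[end:]
            break
    # Terminal: up to two terminal signs at the end of what remains.
    cut = len(signs)
    while cut > 0 and len(signs) - cut < 2 and signs[cut - 1] in TERMINALS:
        cut -= 1
    return prefix, signs[:cut], signs[cut:]


def phrase_types(signs):
    """Summarize an inscription as a string such as 'PMT' or 'MT'."""
    p, m, t = segment(signs)
    return ("P" if p else "") + ("M" if m else "") + ("T" if t else "")


print(phrase_types(["CIRCLED VEE", "BI-QUOTES"]))
print(phrase_types(["WINGED MAN", "POT", "MAN"]))
```

Because this sketch only looks for terminal signs at the end of an inscription, it cannot recognize a terminal sign that is followed by further material.  Such cases would have to be analyzed by hand.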

An inscription may include one, two, or all three of these segments, which I will term ‘phrases’ in the following (without implying that they are linguistic phrases).  Based upon Korvink’s discussion, I originally assumed the basic structure could be expressed in the following manner, where parentheses indicate an element that is optional:

First approximation to phrase structure tree for an Indus inscription.

This would indicate that the M phrase is always present and thus obligatory.  Before M, a P phrase optionally occurs.  After M, a T phrase optionally occurs.  This is what Korvink’s analysis suggests, providing a testable hypothesis.

If it is correct that the M phrase is obligatory, there should be no inscriptions consisting only of a P phrase or of a T phrase.  On my first pass through my own database, I did find some P-only and T-only inscriptions, although not many.  Of course, I had to eliminate from consideration all inscriptions on broken objects where another element may originally have appeared.  While this winnowing process leaves over 100 M-only inscriptions, I found just 16 examples containing only a P phrase. 
Seal B-7 with inscription containing only a P phrase: CIRCLED VEE / BI-QUOTES.

Of these 16, 10 appear on seals from Mohenjo daro, Harappa, Kalibangan, and Banawali; 3 examples plus one duplicate are on tablets from Harappa; and 3 examples appear among the graffiti on pot shards and an ivory stick.  All of the graffiti examples are dubious: either one or more symbols are close to a break or difficult to identify.  For this reason, I will ignore the graffiti category for now.  This leaves 13 inscriptions.  In these, the P phrase consists of 2 to 6 signs.  Shorter types outnumber longer ones: 8 inscriptions consist of a 2-sign P, 2 each of a 3-sign and 4-sign P, and 1 each of a 5-sign and a 6-sign P.  Thus, the length of the apparently independent ‘prefix’ is much the same as that of the P in inscriptions that include one or more other elements.
Button seal Dmd-1 with inscription containing only a T phrase: POT.

I found even fewer inscriptions containing only the T phrase, 8 in all.  Only 1 instance appears on a seal – specifically a button seal from Daimabad.  The other 7 examples occur in graffiti on pot shards and a bangle fragment.  Again, it seems wise to discard the data from graffiti for now.  With only one counter-example, I conclude that the button seal with only a POT is an anomaly and that the hypothesis that T is optional is essentially confirmed.  But I must revise my original assessment somewhat as regards M.  There are in fact a few inscriptions consisting only of P, and I cannot easily explain these away or discard them.  So while the M phrase is almost always present, it too is optional in some cases.  While P is clearly optional, being absent in many cases, its status appears somewhat stronger than that of T.  We will come back to this later.

For now, although my initial analysis is not entirely accurate, I will continue using it since it describes the bulk of the data.  The reason for the occasional absence of M is not at all clear at this point, but that is something to investigate further in a future post.
Detail from seal H-43 with inscription containing only an M phrase:

In examining the M-only inscriptions, I found many examples demonstrating greater variety of form than in the other two phrases.

[Table: number of signs in M-only inscription / no. of examples including graffiti / no. of examples excluding graffiti.  Legible entries: 2 (+3 Gulf seals); 1 (+2 Gulf seals); 0 (+2 Gulf seals); 159 (excluding Gulf).]

After excluding such dubious examples as M-only inscriptions on broken objects (the vast majority), on graffiti, and on seals from the Persian/Arabian Gulf, my search yielded 159 good examples.  These contain between 1 and 8 signs, with most examples being at the shorter end (true of the Indus inscriptions as a whole).

Given that an inscription can be M alone – in rare cases P alone – the next step is to determine which combinations of phrases occur.  First, let us consider the possibility of repetition of a single phrase.  Since M can appear alone, if repetition is possible then we would also see MM.  Since the P occurs alone, we might also expect to see PP.  Since T essentially does not appear alone (except for dubious cases), we would not expect to see TT.  Which 2-phrase sequences do occur?

I found two possible examples of a PP phrase within a longer inscription, both of which are better analyzed as PMT.  Analysis of the first example, M-706, depends upon the identification of the initial sign.  Looking only at the relative sizes of the signs themselves, this inscription reads: SINGLE QUOTE // VEE IN DIAMOND / BI-QUOTES // WHISKERED FISH / BI-RAKE WITH ATTACHED TRI-FORK // POT.  This could be analyzed as PPMT, although that would be quite anomalous in that the variable(s) in the first P would be missing.  The icon on this seal is the unicorn, however, and its long horn cramps the first sign, causing it to become smaller than the other signs.  Thus, this is better analyzed as SINGLE POST / VEE IN DIAMOND / BI-QUOTES // WHISKERED FISH / BI-RAKE WITH ATTACHED TRI-FORK // POT.  Analysis then becomes a straightforward PMT.

The second possible example is more problematic, since the inscription takes up two lines (Ns-9).  Reading both lines from left to right, I could transcribe the inscription as follows:


(Here, “X” represents an ambiguous element at the left end of the second row of signs, where the bottom of the element is lost with the broken off corner.  I am assuming this element is not a sign but the top of the tail of the tiger body of the icon.)  This would yield an anomalous analysis as PTP.  Thus, it seems better to read the second line in the opposite direction from the first line, right to left (a way of reading termed boustrophedon, “as the ox plows”):


This provides a normal analysis of PMT (possibly with a rare appearance of two constants, one at the end of line 1, the other at the beginning of line 2).
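The difference between the two readings can be sketched in a few lines of Python.  The function name `read_lines` and the placeholder signs are my own assumptions for illustration; they are not the actual signs of Ns-9.

```python
def read_lines(lines, boustrophedon=False):
    """Flatten a multi-line inscription into one sign sequence.

    Each line is a list of signs in the order they appear from left
    to right.  With boustrophedon=True, every second line is read in
    the opposite direction, right to left.
    """
    sequence = []
    for index, line in enumerate(lines):
        if boustrophedon and index % 2 == 1:
            line = list(reversed(line))
        sequence.extend(line)
    return sequence


# Placeholder signs only, not a real inscription.
lines = [["a", "b", "c"], ["d", "e", "f"]]

print(read_lines(lines))                      # both lines left to right
print(read_lines(lines, boustrophedon=True))  # second line right to left
```

Either reading uses the same signs.  Only the order in which the second line is joined to the first changes, and that order is what decides how the inscription divides into phrases.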
Seal C-17 with inscription analyzable as MT (or MTT):

Since we can analyze the apparent PP sequences in these two examples in more normalized fashion, I tentatively conclude that PP does not occur.  A similar process yields the same result for TT.  There are infrequent examples where two terminal signs appear together on one line.  In two such cases, one might conclude that a terminal sign appears in one part of the inscription and a second terminal appears in another part of the same inscription, yielding a TT phrase.  The first example is C-17, where there are two signs above the iconic unicorn (WINGED MAN // POT) and a third sign below the head of the animal where the cult stand usually appears (MAN).  We could analyze this as MT-T (in other words, two phrases, one MT and the other simply T), due to the spatial separation of POT and MAN.  However, if we assume that the final sign took this unusual position only because of crowding at the top or for stylistic reasons, we can group the 3 signs together in a less anomalous form as indicating simply MT (where T includes two signs). 

The other example is from the Gulf, showing a somewhat odd POT on the left and another on the right, on either side of the iconic animal.  Again, due to their separation, the signs could be analyzed as T-T (two phrases, each simply T).  Other Gulf inscriptions show sequencing and positioning of signs not found in Indus inscriptions.  So, again, it is best simply not to group this inscription with authentic Indus inscriptions.

Since PP and TT do not occur, we must ask whether MM appears.  If it does not, then we can express all three non-occurrences with a single rule: a phrase cannot duplicate the previous phrase type.  If an inscription appears on two lines and each line is an M phrase, then there are indeed MM (or rather, M-M) inscriptions.  Since one cannot know, at the outset, whether an inscription on two lines contains one or two independent messages, I have analyzed each line separately.  But I have not yet attempted to demonstrate whether this is accurate.  For now, I have assumed MM is possible, leaving the test of this assumption to another time.  No other duplication of phrase type is allowed for contiguous phrases.

Based on the postulated (P)M(T) sequence, I expect just two kinds of two-phrase sequences, namely, PM and MT.  That is, I would not expect to find MP, TM, PT, or TP.  In a first pass, I find PM (without any additional elements) in 6 Gulf seals and 1 Mesopotamian example (to be temporarily discarded from further analysis) alongside 235 Indus examples.  Narrowing the search to the first volume of the corpus and to inscriptions that are clear and complete on seals and tablets, I find 72 examples.  This clearly demonstrates the occurrence of PM as a complete inscription.
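The expectation can be worked out mechanically.  The sketch below, with variable names of my own choosing, lists every inscription shape that (P)M(T) licenses and then sorts all nine possible two-phrase sequences into expected and unexpected.

```python
from itertools import product

PHRASES = "PMT"

# Every shape licensed by (P)M(T): M is always present,
# while P and T are each either present or absent.
licensed = {
    ("P" if has_p else "") + "M" + ("T" if has_t else "")
    for has_p, has_t in product([False, True], repeat=2)
}

# All nine conceivable two-phrase sequences, PP through TT.
all_pairs = ["".join(pair) for pair in product(PHRASES, repeat=2)]

expected = [seq for seq in all_pairs if seq in licensed]
unexpected = [seq for seq in all_pairs if seq not in licensed]

print("licensed shapes:", sorted(licensed))
print("expected pairs:", expected)
print("unexpected pairs:", unexpected)
```

The unexpected group contains the three repetitions (PP, MM, TT) along with MP, TM, PT, and TP.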

If the P phrase can occur alone, even if rarely, we would expect a few phrases with 2 constituents to lack M, yielding PT.  As it happens, I do find a few instances of PT, either as a complete inscription or as part of one:


2.       H-829 (tablet):  CARTWHEEL / BI-QUOTES // MAN // (B side) QUAD-FORK / 2 POSTS / CUP (PT-M; although this seems to appear in the KP concordance as CARTWHEEL / BI-QUOTES // MAN BY CHEVRON, which yields analysis as PM).

3.       M-257 (seal): BOAT / PINCH // POT // RECTANGLE (PT-M).


5.       M-1591 (graffiti): BI-QUOTES // POT (PT?)


7.       M-394 (bar seal): CUPPED SPOON / 3 POSTS / SINGLE QUOTE // FLANGE-TOPPED POT / POT (PT).

8.       Laursen 23 (Gulf seal): AY OVER QUOTES / CRAB / SINGLE QUOTE // DOUBLE POTS (PT?).

We can safely discard number 8, the Gulf seal, since this group is often anomalous.  As my notes indicate, we can also probably discard number 2, assuming my original reading was wrong and the KP concordance is correct.  The 5th example appears as graffiti on a pot shard and can also be discarded.  This still leaves 5 examples, 3 PT alone (M-311, M-394, and M-1177, colored yellow above) and 2 with PT followed by something else, PT-M (M-257), and PT-MT (M-267) (colored blue above).  These last two will be dealt with in a later post, when I consider longer inscriptions.  It is interesting that all 5 come from Mohenjo daro, which hints at geographical variation in phrase structure rules.  However, much more material comes from Mohenjo daro than from any other site, so this may only be an accident of preservation.
Seal M-311 with rare inscription containing PT phrases (shown reversed):

My conclusion thus far is that PP does not occur; PM is common; PT occurs but rarely.

Moving to sequences beginning with M, we expect to find MT, but not MP.  My search yielded no examples of MP, so I conclude that it does not occur.  There are a number of inscriptions that appear on two or more lines, or on two or more sides of an object.  One cannot determine whether this spatial separation is meaningful or not.  So I have coded each line separately, in case each is a separate message.  By this criterion, there are indeed MM sequences.  This should not be taken as a definitive finding, though, by any means.

As expected, MT is common, even more so than PM.  Not counting Gulf seals or Indus graffiti, I find a total of 609 occurrences of this pattern.  Again, narrowing the field to the first volume of the corpus and discarding all broken and unclear examples, the number shrinks but remains high with 136 examples.

In conclusion, MP does not occur; MM appears to be common; MT is very common.

Let us move now to sequences with T.  Since this is a terminal element, it should not occur in initial position (except after a break).  Thus, there should be no TP, no TM, and no TT.  Indeed, not counting broken objects which probably contained M originally, there are no instances of TP or TM.  The only example of TT is the Gulf seal mentioned earlier (Laursen 56).  I noted already that PT is rare and MT very frequent.
TP does not occur; TM does not occur; TT does not occur.
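The findings for all nine two-phrase sequences can be set side by side with the original expectation.  This is only a bookkeeping sketch.  The status labels are shorthand for the conclusions stated above, and MM is marked tentative because it depends on treating each line as a separate phrase.

```python
# Findings for two-phrase sequences, as summarized in the text.
observed = {
    "PP": "not found", "PM": "common", "PT": "rare",
    "MP": "not found", "MM": "common (tentative)", "MT": "very common",
    "TP": "not found", "TM": "not found", "TT": "not found",
}

# The two sequences that the (P)M(T) structure leads us to expect.
expected = {"PM", "MT"}

# A surprise is a sequence that was found but not expected,
# or one that was expected but not found.
surprises = sorted(
    seq for seq, status in observed.items()
    if (status != "not found") != (seq in expected)
)

for seq in surprises:
    print(seq, "-", observed[seq])
```

Only two sequences depart from the first approximation: PT, which is rare, and MM, which rests on an assumption still to be tested.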

That is enough for now.  My next post will take up the occurrences of inscriptions with three phrases.


Korvink, M.P. 2008. The Indus Script: A Positional-Statistical Approach. Gilund Press.

Koskenniemi, K. and A. Parpola. 1982. A Concordance to the Texts in the Indus Script. Helsinki: Department of Asian and African Studies, University of Helsinki.

Laursen, S.T. 2010. "The westward transmission of Indus Valley sealing technology: origin and development of the 'Gulf type' seal and other administrative technologies in Early Dilmun, c. 2100-2000 B.C." in Arabian Archaeology and Epigraphy 21: 96-134.  (available online at
