Converting the GBI Syntax Trees to a Dependency Analysis

2nd July 2015 / by James Tauber

With one child on each branch identified as the head, a constituent analysis can be converted to a dependency analysis. Fortunately, the GBI syntax trees have an explicit indication of the head, so I went ahead and converted them to a dependency format.

Non-leaf nodes in the GBI syntax trees have a Head attribute which indicates the index of the child considered the head.

So the algorithm is fairly straightforward. For each leaf-node:

1. walk up the tree until you find a node whose Head attribute is NOT the index of the child you just came from
2. follow the Head attributes back down the tree until you hit another leaf-node
3. that second leaf-node is the head of the leaf-node you started on
4. the “type” of the dependency is the Cat of the second-to-last node you visited walking up in step 1.
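
To make the steps concrete, here is a minimal sketch in Python. It is not the code from the gist. It assumes the trees are XML whose elements carry the Head and Cat attributes described above; the element name Node, the leaf attribute id, and the toy tree are made up for illustration. A missing Head is read as 1, and a leaf whose walk reaches the root gets None as its head.

```python
import xml.etree.ElementTree as ET

# Toy tree, NOT real GBI data. "Node" and "id" are illustrative names.
# The np node deliberately has no Head attribute.
SAMPLE = """
<Node Cat="CL" Head="0">
  <Node Cat="V" Head="0">
    <Node Cat="verb" id="w1"/>
  </Node>
  <Node Cat="S" Head="0">
    <Node Cat="np">
      <Node Cat="det" id="w2"/>
      <Node Cat="noun" id="w3"/>
    </Node>
  </Node>
</Node>
"""

DEFAULT_HEAD = 1  # assumed fallback when a node has no Head attribute


def head_index(node):
    """Index of the child treated as the head of this node."""
    return int(node.get("Head", DEFAULT_HEAD))


def dependencies(root):
    """Return (leaf id, head leaf id or None, dependency type) per leaf."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    rows = []
    for leaf in root.iter():
        if len(leaf):  # has children, so not a leaf
            continue
        # Walk up while we are the head child of our parent.
        node = leaf
        while node in parent_of:
            parent = parent_of[node]
            if list(parent).index(node) != head_index(parent):
                break
            node = parent
        dep_type = node.get("Cat")  # last node reached before stopping
        if node not in parent_of:
            # Reached the root: this leaf has no head.
            rows.append((leaf.get("id"), None, dep_type))
            continue
        # Follow Head attributes back down to another leaf.
        target = parent_of[node]
        while len(target):
            target = target[head_index(target)]
        rows.append((leaf.get("id"), target.get("id"), dep_type))
    return rows


rows = dependencies(ET.fromstring(SAMPLE))
for row in rows:
    print(*row)
```

On the toy tree, the verb ends up as the root, the determiner depends on the noun, and the noun depends on the verb with type S, which is the same shape as the real output further down.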

The only catch is that the source data this script uses omits the Head attribute altogether in three types of cases. The original GBI analysis treated the Head as "1" in these cases, so I special-case that in the code. I don’t necessarily agree with the choice, but it’s easy to change (see below).

I’ve put the code in a gist: https://gist.github.com/jtauber/c02d0928811b7ed21c9a

The result (on the first part of John 3.16) is:

64003016001 Οὕτως 64003016003 ADV
64003016002 γὰρ 64003016003 conj
64003016003 ἠγάπησεν None CL
64003016004 ὁ 64003016005 det
64003016005 θεὸς 64003016003 S
64003016006 τὸν 64003016007 det
64003016007 κόσμον 64003016003 O
64003016008 ὥστε 64003016013 conj
64003016009 τὸν 64003016010 det
64003016010 υἱὸν 64003016013 O
64003016011 τὸν 64003016012 det
64003016012 μονογενῆ 64003016010 np
64003016013 ἔδωκεν, 64003016003 CL

The dependency relationship color highlighting experiment on this site shows a possible way of conveying this dependency information in a text (in this case, 2 John).

As mentioned, I don’t always agree with the GBI choice of head. However, it’s fairly straightforward to alter the code to override the choice of head in certain contexts.

For example, if you consider the complementizer the head, you can just add code that takes Head="0" where Rule="that-VP" and so on. Similarly with prepositions, determiners, etc.
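
One way such an override might look, as a sketch rather than the gist's actual code: a lookup table keyed on the Rule attribute that takes precedence over whatever Head says. The table name HEAD_OVERRIDES, the function head_index, and the toy node are hypothetical; only the that-VP entry comes from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical override table: Rule value -> index of the child to use as head.
# Only "that-VP" is taken from the text; add other rules as you see fit.
HEAD_OVERRIDES = {"that-VP": 0}


def head_index(node, overrides=HEAD_OVERRIDES, default=1):
    """Head index for a node, letting a Rule-based override win over Head."""
    rule = node.get("Rule")
    if rule in overrides:
        return overrides[rule]
    # Fall back to the tree's own Head, or to the assumed default if omitted.
    return int(node.get("Head", default))


# Toy node, not real GBI data: a clause introduced by a complementizer.
node = ET.fromstring(
    '<Node Cat="CL" Rule="that-VP" Head="1">'
    '<Node Cat="conj"/><Node Cat="CL"/>'
    "</Node>"
)
print(head_index(node))                # override applies: complementizer is head
print(head_index(node, overrides={}))  # no override: the tree's own Head is used
```

Keeping the overrides in a table means the tree-walking code does not change when you disagree with a different rule; you only add an entry.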

Finally, note that it’s not quite possible to reconstruct the original tree from the dependency data because the algorithm effectively eliminates information on some intermediate nodes. Some may consider this an advantage.


J. K. Tauber

at the intersection of computing, linguistics, philology, and learning science


By day I’m an entrepreneur, web technologist and open-source developer but my academic background is in linguistics (along with some classics, comparative philology, and educational statistics) and my main avocation is working on text, annotations, analysis and software relating to historical languages with a particular interest in facilitating better learning.

While my focus has mostly been on Biblical Greek, much of the work is highly relevant to other Hellenistic Greek texts, other dialects of Ancient Greek and, indeed, texts in completely different languages as well.

All code written for this endeavour is open source and text and data is made available under a Creative Commons license to the extent allowed by the sources used.

I can be contacted at jtauber@jtauber.com.
