Converting the GBI Syntax Trees to a Dependency Analysis

Once one child of each non-leaf node is identified as the head, a constituent analysis can be converted to a dependency analysis. Fortunately, the GBI syntax trees mark the head explicitly, so I went ahead and converted them to a dependency format.

Non-leaf nodes in the GBI syntax trees have a Head attribute which indicates the index of the child considered the head.

So the algorithm is fairly straightforward. For each leaf-node:

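  • walk up the tree until you find a node whose Head attribute is NOT the index of the child you just came from (if you reach the top of the tree first, the leaf-node has no head, shown as None in the output below)
  • from that node, follow the Head attributes back down the tree until you hit another leaf-node
  • that second leaf-node is the head of the leaf-node you started on
  • the “type” of the dependency is the Cat of the second-to-last node you visited while walking up in the first step, that is, the child you had just come from when you stopped.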
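
The steps above can be sketched roughly as follows. This is not the code from the gist: the Node class, its field names (cat, head, children, word_id, parent) and the toy tree are all made up for illustration, and the sketch assumes every non-leaf node carries a 0-based head index.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical in-memory tree node (not the GBI XML schema)."""
    cat: str
    head: Optional[int] = None          # index of the head child; None on leaves
    children: List["Node"] = field(default_factory=list)
    word_id: Optional[str] = None       # set on leaves only
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def __post_init__(self) -> None:
        # Link children back to this node so we can walk upwards later.
        for child in self.children:
            child.parent = self


def leaves(node: Node):
    """Yield the leaf-nodes under `node` in left-to-right order."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def head_leaf(node: Node) -> Node:
    """Follow head indices downwards until a leaf-node is reached."""
    while node.children:
        node = node.children[node.head]
    return node


def dependency(leaf: Node) -> Tuple[Optional[str], str]:
    """Return (word_id of the head leaf or None, dependency type)."""
    node = leaf
    # Walk up for as long as the node we came from is its parent's head child.
    while node.parent is not None and node.parent.children[node.parent.head] is node:
        node = node.parent
    if node.parent is None:
        # Reached the top of the tree: this leaf has no head.
        return None, node.cat
    # `node` is the second-to-last node visited; its parent is where we stopped.
    return head_leaf(node.parent).word_id, node.cat


# Toy tree, invented for this example (not taken from the GBI data).
tree = Node("CL", head=1, children=[
    Node("S", head=0, children=[
        Node("np", head=1, children=[
            Node("det", word_id="w1"),
            Node("noun", word_id="w2"),
        ]),
    ]),
    Node("V", head=0, children=[Node("verb", word_id="w3")]),
])

for leaf in leaves(tree):
    print(leaf.word_id, *dependency(leaf))
```

On this toy tree the determiner attaches to the noun with type det, the noun attaches to the verb with type S, and the verb has no head and takes the Cat of the top node.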

The only catch is that the source data this script uses omits a Head altogether in three types of cases. The original GBI analysis treated the Head as being "1" in these cases, so I special-case that in the code. I don’t necessarily agree with the choice, but it’s easy to change (see below).

I’ve put the code in a gist: https://gist.github.com/jtauber/c02d0928811b7ed21c9a

The result (on the first part of John 3.16) is:

64003016001 Οὕτως 64003016003 ADV
64003016002 γὰρ 64003016003 conj
64003016003 ἠγάπησεν None CL
64003016004 ὁ 64003016005 det
64003016005 θεὸς 64003016003 S
64003016006 τὸν 64003016007 det
64003016007 κόσμον 64003016003 O
64003016008 ὥστε 64003016013 conj
64003016009 τὸν 64003016010 det
64003016010 υἱὸν 64003016013 O
64003016011 τὸν 64003016012 det
64003016012 μονογενῆ 64003016010 np
64003016013 ἔδωκεν, 64003016003 CL
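
Each row above appears to have four whitespace-separated fields: word id, word, head id (or None) and dependency type. Assuming that layout, here is a small sketch of reading the rows back in and grouping words by their head. The names ROWS, parse and dependents are just for illustration, and only the first seven rows are used to keep it short.

```python
from collections import defaultdict

# The first seven rows of the output above, copied verbatim.
ROWS = """\
64003016001 Οὕτως 64003016003 ADV
64003016002 γὰρ 64003016003 conj
64003016003 ἠγάπησεν None CL
64003016004 ὁ 64003016005 det
64003016005 θεὸς 64003016003 S
64003016006 τὸν 64003016007 det
64003016007 κόσμον 64003016003 O
"""


def parse(text):
    """Yield (word_id, form, head_id, rel) tuples; head_id is None for the root."""
    for line in text.splitlines():
        if not line.strip():
            continue
        word_id, form, head_id, rel = line.split()
        yield word_id, form, (None if head_id == "None" else head_id), rel


# Map each head id to the (form, rel) pairs of its direct dependents.
dependents = defaultdict(list)
for word_id, form, head_id, rel in parse(ROWS):
    dependents[head_id].append((form, rel))

for form, rel in dependents["64003016003"]:
    print(form, rel)
```

Grouping by head like this makes it easy to list, say, everything that depends directly on ἠγάπησεν.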

The dependency relationship color highlighting experiment on this site shows a possible way of conveying this dependency information in a text (in this case, 2 John).

As mentioned, I don’t always agree with the GBI choice of head. However, it’s fairly straightforward to alter the code to override the choice of head in certain contexts.

For example, if you consider the complementizer the head, you can just add code that takes Head="0" where Rule="that-VP" and so on. The same approach works for prepositions, determiners, etc.
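
One way such an override might look is a lookup table consulted before the Head attribute is used. This is only a sketch, not what the gist does: the function name effective_head and the table HEAD_OVERRIDES are hypothetical, and it assumes the Rule and Head attributes arrive as strings (or None when missing).

```python
from typing import Mapping, Optional

# Hypothetical override table: Rule value -> index of the child to treat as head.
HEAD_OVERRIDES: Mapping[str, int] = {
    "that-VP": 0,   # treat the complementizer as the head
}

# Fallback when the source data omits Head, following the GBI treatment as "1".
DEFAULT_HEAD = 1


def effective_head(rule: Optional[str],
                   head: Optional[str],
                   overrides: Mapping[str, int] = HEAD_OVERRIDES) -> int:
    """Pick the head index for a node, letting overrides win over the data."""
    if rule is not None and rule in overrides:
        return overrides[rule]
    if head is None:
        return DEFAULT_HEAD
    return int(head)


print(effective_head("that-VP", "1"))
print(effective_head(None, None))
```

Keeping the overrides in a table means changing your mind about a head choice is a one-line edit rather than a change to the tree-walking logic.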

Finally, note that it’s not quite possible to reconstruct the original tree from the dependency data, because the algorithm effectively eliminates information about some intermediate nodes. Some may consider this an advantage.

