Functional Dependency in the MorphGNT Table

15th December 2015 / by James Tauber

Often it’s useful to see whether certain columns in a table can be entirely determined by others. For example, can you unambiguously get the lemma from just the form? (The answer is no, so a more useful question is which forms are ambiguous as to lemma.) Does knowing the part-of-speech help? Here we provide some code and give some examples.

At the end I provide the script used.

Run from the same directory as the MorphGNT SBLGNT files, it works like this:

$ ./dep.py 6 7
45

What this is telling us is that there are 45 cases where the value of column 6 (the normalized form) gives us multiple possible values for column 7 (the lemma). In relational database terms, we say that column 7 is not functionally dependent on (or not functionally determined by) column 6 because of those 45 cases.
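The check behind that number can be sketched in a few lines of Python. This is a minimal illustration on hand-typed rows, not the script itself: the `violations` helper and the three-row sample are assumptions made for the example, and the sample has only two columns (form, lemma), so the column numbers differ from the real table.

```python
from collections import defaultdict

def violations(rows, det, dep):
    """Map each determinant value to its set of dependent values,
    keeping only those with more than one dependent value.

    `rows` is a list of tuples; `det` and `dep` are 1-based column numbers.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row[det - 1]].add(row[dep - 1])
    return {key: values for key, values in seen.items() if len(values) > 1}

# Hypothetical two-column sample: (normalized form, lemma).
rows = [
    ("καλῶν", "καλός"),
    ("καλῶν", "καλέω"),
    ("λόγος", "λόγος"),
]

bad = violations(rows, det=1, dep=2)
print(len(bad))  # number of forms that map to more than one lemma
```

The dependent column is functionally determined by the determinant exactly when this dictionary comes back empty.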

If you run:

$ ./dep.py -v 6 7

it will actually list all 45, starting with something like:

ἄμωμον {'ἄμωμος', 'ἄμωμον'}
ἴδε {'ἴδε', 'ὁράω'}
ὑποταγῇ {'ὑποταγή', 'ὑποτάσσω'}
καλῶν {'καλός', 'καλέω'}
Ἰουδαίας {'Ἰουδαῖος', 'Ἰουδαία'}
...

You can also give more than one column for either the determinant or dependent.

For example, does knowing the form AND part-of-speech determine the lemma?

Turns out there are only 8 exceptions in the current MorphGNT/SBLGNT:

$ ./dep.py -v 6,2 7
Ἅννα N- {'Ἅννα', 'Ἅννας'}
ἀνώτερον A- {'ἀνώτερος', 'ἀνώτερον'}
ἀλάβαστρον N- {'ἀλάβαστρος', 'ἀλάβαστρον'}
χρυσᾶ A- {'χρύσεος', 'χρυσοῦς'}
μακράν A- {'μακράν', 'μακρός'}
ὕστερον A- {'ὕστερον', 'ὕστερος'}
ταχύ A- {'ταχύ', 'ταχύς'}
ἤρχοντο V- {'ἄρχω', 'ἔρχομαι'}
8
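Supporting several columns on either side only requires that the key and the value be tuples. The sketch below assumes hypothetical `parse_spec` and `project` helpers and a comma-separated column spec like the `6,2` above. The sample rows are cut down to three columns (form, part-of-speech, lemma), so the column numbers differ from the real table.

```python
from collections import defaultdict

def parse_spec(spec):
    """Turn a spec such as "6,2" into zero-based indices (5, 1)."""
    return tuple(int(n) - 1 for n in spec.split(","))

def project(row, indices):
    """Pick the given columns out of a row, in the order requested."""
    return tuple(row[i] for i in indices)

def violations(rows, det_spec, dep_spec):
    """Return determinant tuples that occur with more than one dependent tuple."""
    det, dep = parse_spec(det_spec), parse_spec(dep_spec)
    seen = defaultdict(set)
    for row in rows:
        seen[project(row, det)].add(project(row, dep))
    return {k: v for k, v in seen.items() if len(v) > 1}

# Illustrative rows: (form, part-of-speech, lemma).
rows = [
    ("καλῶν", "A-", "καλός"),
    ("καλῶν", "V-", "καλέω"),
    ("ἤρχοντο", "V-", "ἄρχω"),
    ("ἤρχοντο", "V-", "ἔρχομαι"),
]

form_only = violations(rows, "1", "3")       # both forms are ambiguous
form_and_pos = violations(rows, "1,2", "3")  # only ἤρχοντο V- remains
print(len(form_only), len(form_and_pos))
```

On this sample, adding the part-of-speech to the key resolves καλῶν but leaves ἤρχοντο ambiguous, which is the same pattern as the drop from 45 to 8 on the full text.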

There are other things that can be explored with this. How many lemmas have more than one part-of-speech in the MorphGNT/SBLGNT?

$ ./dep.py 7 2
70

How many forms have more than one parse analysis extant in the text, even if you know the lemma and part-of-speech?

$ ./dep.py 6,7,2 3
903

Given a lemma, part-of-speech and parse analysis, how many cases are there where multiple alternative forms are seen?

$ ./dep.py 7,2,3 6
132

Looking at these with the -v option, you can see some are unavoidable:

ὁράω V- 1AAI-P-- {'εἴδομεν', 'εἴδαμεν'}
κλείς N- ----APF- {'κλεῖς', 'κλεῖδας'}

whereas others are likely corrections that need to be made to the lemmatization:

τις RI ----GSM- {'τινος', 'τινός'}
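A rough first pass at separating the two kinds of cases is to ask whether the competing forms differ only in diacritics, as τινος and τινός do. The sketch below is one such heuristic and is not part of the script; the helper names are made up for illustration. Note that it strips breathings and iota subscripts as well as accents, so it can flag pairs that a human would still want to keep apart.

```python
import unicodedata

def strip_diacritics(word):
    """Remove combining marks after canonical (NFD) decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def differ_only_in_diacritics(forms):
    """True if all forms collapse to one string once marks are removed."""
    return len({strip_diacritics(form) for form in forms}) == 1

candidates = [
    {"εἴδομεν", "εἴδαμεν"},
    {"κλεῖς", "κλεῖδας"},
    {"τινος", "τινός"},
]

for forms in candidates:
    label = "diacritics only" if differ_only_in_diacritics(forms) else "different letters"
    print(sorted(forms), label)
```

Sets flagged as differing only in diacritics are candidates for a closer look; the others are more likely to be genuine alternative forms.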

The most recent set of corrections to MorphGNT/SBLGNT (which will be in release 6.07) stem from this sort of analysis.

There is still more to discuss and resolve, however. See https://github.com/morphgnt/sblgnt/issues/32 and other issues on GitHub for details and to help in the discussion.

The script
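As a stand-in, here is a minimal sketch of a script with the same command-line shape. It is not the original `dep.py`. It assumes the MorphGNT files match `*-morphgnt.txt` in the current directory and hold one word per line in whitespace-separated columns. The function names are made up for illustration, and the output only approximates the listings above.

```python
import argparse
import glob
from collections import defaultdict

def load_rows(pattern="*-morphgnt.txt"):
    """Yield each non-blank line of every matching file as a list of columns.

    The filename pattern is an assumption about how the data files are named.
    """
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield line.split()

def parse_spec(spec):
    """Turn a spec such as "6,2" into zero-based indices (5, 1)."""
    return tuple(int(n) - 1 for n in spec.split(","))

def find_violations(rows, det, dep):
    """Return determinant tuples seen with more than one dependent value."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[i] for i in det)
        seen[key].add(" ".join(row[i] for i in dep))
    return {k: v for k, v in seen.items() if len(v) > 1}

def main(argv):
    parser = argparse.ArgumentParser(description="Check a functional dependency.")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="list each exception before the count")
    parser.add_argument("determinant", help="comma-separated column numbers, e.g. 6,2")
    parser.add_argument("dependent", help="comma-separated column numbers, e.g. 7")
    args = parser.parse_args(argv)

    bad = find_violations(load_rows(),
                          parse_spec(args.determinant),
                          parse_spec(args.dependent))
    if args.verbose:
        for key, values in bad.items():
            print(" ".join(key), values)
    print(len(bad))
    return 0
```

To use it from the shell, one would add `import sys` and a final `raise SystemExit(main(sys.argv[1:]))` under the usual `if __name__ == "__main__":` guard. Keeping `main` as a function that takes its argument list makes it easy to call from tests or a notebook as well.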


J. K. Tauber

at the intersection of computing, linguistics, philology, and learning science


By day I’m an entrepreneur, web technologist and open-source developer but my academic background is in linguistics (along with some classics, comparative philology, and educational statistics) and my main avocation is working on text, annotations, analysis and software relating to historical languages with a particular interest in facilitating better learning.

While my focus has mostly been on Biblical Greek, much of the work is highly relevant to other Hellenistic Greek texts, other dialects of Ancient Greek and, indeed, texts in completely different languages as well.

All code written for this endeavour is open source and text and data is made available under a Creative Commons license to the extent allowed by the sources used.

I can be contacted at jtauber@jtauber.com.
