Functional Dependency in the MorphGNT Table
At the end I provide the script used.
Run in the same directory as the MorphGNT SBLGNT, it runs like this:
$ ./dep.py 6 7
45
What this is telling us is that there are 45 cases where a value of column 6 (the normalized form) corresponds to more than one value of column 7 (the lemma). In relational database terms, we say that column 7 is not functionally dependent on (or not functionally determined by) column 6 because of those 45 cases.
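To make the idea concrete, here is a minimal sketch of such a check. It is not the dep.py script itself; the function name `fd_violations` and the sample rows are illustrative assumptions, and the rows are hand-made pairs rather than real MorphGNT lines.

```python
from collections import defaultdict


def fd_violations(rows, determinant, dependent):
    """Return determinant values that map to more than one dependent value.

    `determinant` and `dependent` are lists of 1-based column numbers.
    (Hypothetical helper for illustration, not the author's script.)
    """
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[i - 1] for i in determinant)
        value = tuple(row[i - 1] for i in dependent)
        seen[key].add(value)
    # A functional dependency holds only if every key has exactly one value.
    return {key: values for key, values in seen.items() if len(values) > 1}


# Illustrative (form, lemma) pairs, not actual lines from the MorphGNT files.
rows = [
    ("καλῶν", "καλός"),
    ("καλῶν", "καλέω"),
    ("ἴδε", "ἴδε"),
    ("ἴδε", "ὁράω"),
    ("λόγος", "λόγος"),
    ("λόγος", "λόγος"),
]

violations = fd_violations(rows, [1], [2])
print(len(violations))  # 2: καλῶν and ἴδε each have two lemmas
```

In this toy data the lemma is not functionally determined by the form, because two forms each map to two lemmas.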
If you run:
$ ./dep.py -v 6 7
it will actually list all 45, starting with something like:
ἄμωμον {'ἄμωμος', 'ἄμωμον'}
ἴδε {'ἴδε', 'ὁράω'}
ὑποταγῇ {'ὑποταγή', 'ὑποτάσσω'}
καλῶν {'καλός', 'καλέω'}
Ἰουδαίας {'Ἰουδαῖος', 'Ἰουδαία'}
...
You can also give more than one column for either the determinant or dependent.
For example, does knowing the form AND part-of-speech determine the lemma?
Turns out there are only 8 exceptions in the current MorphGNT/SBLGNT:
$ ./dep.py -v 6,2 7
Ἅννα N- {'Ἅννα', 'Ἅννας'}
ἀνώτερον A- {'ἀνώτερος', 'ἀνώτερον'}
ἀλάβαστρον N- {'ἀλάβαστρος', 'ἀλάβαστρον'}
χρυσᾶ A- {'χρύσεος', 'χρυσοῦς'}
μακράν A- {'μακράν', 'μακρός'}
ὕστερον A- {'ὕστερον', 'ὕστερος'}
ταχύ A- {'ταχύ', 'ταχύς'}
ἤρχοντο V- {'ἄρχω', 'ἔρχομαι'}
8
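A rough sketch of why adding a column to the determinant reduces the number of exceptions: the key becomes a tuple, so two rows only collide if they agree on every determinant column. The helper name `count_violations` and the three-column rows (part-of-speech, form, lemma) are assumptions made for this illustration; they are not the layout of the real files nor the author's code.

```python
from collections import defaultdict


def count_violations(rows, determinant, dependent):
    """Count determinant tuples that map to more than one dependent tuple.

    Columns are 0-based indexes into each row here (an illustrative choice).
    """
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[i] for i in determinant)
        seen[key].add(tuple(row[i] for i in dependent))
    return sum(1 for values in seen.values() if len(values) > 1)


# Illustrative (part-of-speech, form, lemma) rows, not real MorphGNT lines.
rows = [
    ("A-", "καλῶν", "καλός"),
    ("V-", "καλῶν", "καλέω"),
    ("V-", "ἤρχοντο", "ἄρχω"),
    ("V-", "ἤρχοντο", "ἔρχομαι"),
]

form_only = count_violations(rows, [1], [2])         # form -> lemma
form_and_pos = count_violations(rows, [1, 0], [2])   # form + POS -> lemma
print(form_only, form_and_pos)  # 2 1
```

Here καλῶν stops being an exception once the part-of-speech is known, while ἤρχοντο remains one because both of its lemmas are verbs.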
There are other things that can be explored with this. How many lemmas have more than one part-of-speech in the MorphGNT/SBLGNT?
$ ./dep.py 7 2
70
How many forms have more than one parse analysis extant in the text, even if you know the lemma and part-of-speech:
$ ./dep.py 6,7,2 3
903
Given a lemma, part-of-speech and parse analysis, how many cases are there where multiple alternative forms are seen:
$ ./dep.py 7,2,3 6
132
Looking at these with the -v option, you can see some are unavoidable:
ὁράω V- 1AAI-P-- {'εἴδομεν', 'εἴδαμεν'}
κλείς N- ----APF- {'κλεῖς', 'κλεῖδας'}
whereas others are likely corrections that need to be made to the lemmatization:
τις RI ----GSM- {'τινος', 'τινός'}
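One possible way to triage a long list of such exceptions is to flag the sets whose members differ only in diacritics, since those look more like the τινος / τινός case than like genuine alternative forms. The sketch below is only a heuristic of my own for illustration, not part of dep.py; the helper names `strip_diacritics` and `differ_only_in_diacritics` are hypothetical, and a flagged set is a candidate worth a look, not proof of an error.

```python
import unicodedata


def strip_diacritics(word):
    """Remove accents, breathings and other combining marks from a word."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def differ_only_in_diacritics(forms):
    """True if all forms become identical once diacritics are removed."""
    return len({strip_diacritics(form) for form in forms}) == 1


# Sets of alternative forms taken from the output shown above.
print(differ_only_in_diacritics({"τινος", "τινός"}))
print(differ_only_in_diacritics({"εἴδομεν", "εἴδαμεν"}))
print(differ_only_in_diacritics({"κλεῖς", "κλεῖδας"}))
```

Only the first set is flagged; the other two differ in their letters, which fits the reading that they are real alternative forms.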
The most recent set of corrections to MorphGNT/SBLGNT (which will be in release 6.07) stem from this sort of analysis.
There are still more to discuss and resolve, however. See https://github.com/morphgnt/sblgnt/issues/32 and other issues on GitHub for details and to help in the discussion.