Analysing the Verbs in Nestle 1904

17th April 2017 / by James Tauber

For the last couple of weeks, I’ve been getting my greek-inflexion code working on Ulrik Sandborg-Petersen’s analysis of the Nestle 1904. The first pass is now done.

The motivation for doing this work was (a) to expand the verb stem database and stemming rules; (b) to be able to annotate the Nestle 1904 with additional morphological information for my adaptive reader and some similar work Jonathan Robie is doing.

My usual first step when dealing with a new text is to automatically generate as many new entries in the lexicon / stem-database as I can (see the first step in Update on LXX Progress).

In some cases, this is just a new stem for an existing verb because of a new form of an already known verb. But sometimes it’s an entirely new verb.

I thought the Nestle 1904 would be considerably easier than the LXX because the text is so similar, but numerous challenges arose.

It became clear very quickly that there were considerable differences in lemma choice between the Nestle 1904 and the MorphGNT SBLGNT. This didn’t completely surprise me: I’ve spent quite a bit of time cataloging lemma choice differences between lexical resources and there are considerable differences even between BDAG and Danker’s Concise Lexicon.

But even these aside, there were 7,743 out of 28,352 verbs mismatching after my code had already done its best to automatically fill in missing lexical entries and stems.

A. The normalisation column in Nestle 1904 doesn’t normalise capitalisation, clitic accentuation, or moveable nu, all of which greek-inflexion assumes have been done.

Capitalisation alone accounted for 1042 mismatches. Clitic accentuation alone accounted for 1008 mismatches. Moveable nu alone accounted for 4153 mismatches.
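
A minimal Python sketch of these three normalisation steps might look like the following. It is not the greek-inflexion code: the function names are hypothetical, the "(ν)" notation for moveable nu is an assumption about the target form, and the rules are heuristics that only handle a grave accent, a second accent added before an enclitic, and third-person forms ending in -ε(ν), -σι(ν) or -στι(ν).

```python
import unicodedata

ACUTE, GRAVE, CIRCUMFLEX = "\u0301", "\u0300", "\u0342"


def strip_accents(form):
    """Return the decomposed form without accents (breathings are kept)."""
    decomposed = unicodedata.normalize("NFD", form)
    return "".join(ch for ch in decomposed if ch not in (ACUTE, GRAVE, CIRCUMFLEX))


def normalise_accents(form):
    """Turn a grave into an acute and drop any accent after the first one."""
    out, seen_accent = [], False
    for ch in unicodedata.normalize("NFD", form):
        if ch == GRAVE:
            ch = ACUTE
        if ch in (ACUTE, CIRCUMFLEX):
            if seen_accent:
                continue  # extra accent, e.g. one thrown back by an enclitic
            seen_accent = True
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


def normalise_moveable_nu(form, parse):
    """Write third-person forms with or without final nu as '...(ν)'.

    Heuristic only: the parse code is assumed to end in '.3S' or '.3P'.
    """
    if not parse.endswith((".3S", ".3P")):
        return form
    bare = strip_accents(form)
    if bare.endswith(("σιν", "εν", "στιν")):
        return form[:-1] + "(ν)"
    if bare.endswith(("σι", "ε", "στι")):
        return form + "(ν)"
    return form


def normalise(form, parse):
    """Lowercase, fix clitic-related accentuation, then mark moveable nu."""
    return normalise_moveable_nu(normalise_accents(form.lower()), parse)
```

Enclitic verb forms that have lost their own accent (such as unaccented ἐστιν) are not restored by this sketch; that would need a separate lookup.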

B. Nestle 1904 systematically avoids assimilation of συν and ἐν preverbs.

Taken alone, these accounted for 91 mismatches. Mapping these prior to analysis by greek-inflexion is something of a hack that I’ll address in later passes.
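
As an illustration of what such a mapping could look like, here is a small Python sketch that assimilates the final nu of συν and ἐν before labials, velars and lambda. The function name and the string-prefix approach are assumptions made for this example, not the mapping actually used; other contexts (for example συν before σ or ζ) are left unchanged.

```python
import unicodedata

LABIALS = set("πβφψμ")  # nu becomes mu before these
VELARS = set("κγχξ")    # nu becomes gamma before these


def _nfc(text):
    return unicodedata.normalize("NFC", text)


PREVERBS = (_nfc("συν"), _nfc("ἐν"))


def assimilate_preverb(form):
    """Assimilate an unassimilated συν- or ἐν- preverb (hypothetical helper).

    This is a purely string-based heuristic: any word starting with one of
    the two prefixes is treated as a compound.
    """
    form = _nfc(form)
    for preverb in PREVERBS:
        if form.startswith(preverb) and len(form) > len(preverb):
            head = preverb[:-1]          # the preverb without its final nu
            rest = form[len(preverb):]   # what follows the preverb
            following = rest[0]
            if following in LABIALS:
                return head + "μ" + rest
            if following in VELARS:
                return head + "γ" + rest
            if following == "λ":
                return head + "λ" + rest
    return form
```

Because the rule only fires before a consonant, augmented forms such as συνέλαβον pass through untouched.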

C. There were 8 spelling differences in the endings which required an update to stemming.yaml:

κατασκηνοῖν (PAN) in Matt 13:32
κατασκηνοῖν (PAN) in Mark 4:32
ἀποδεκατοῖν (PAN) in Heb 7:5
φυσιοῦσθε (PMS-2P) in 1Cor 4:6
εἴχαμεν (IAI.1P) in 2John 1:5
εἶχαν (IAI.3P) in Mark 8:7
εἶχαν (IAI.3P) in Rev 9:8
παρεῖχαν (IAI.3P) in Acts 28:2

D. The different parse code scheme (Robinson’s vs CCAT) had to be mapped over.

This should have been straightforward but voice in the formal morphology field sometimes seemed to be messed up (which I corrected as part of G. below).
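
The kind of mapping involved can be sketched in Python as below. This is an illustration under stated assumptions, not the mapping actually used: it assumes Robinson-style verb codes such as V-PAI-3S as input, produces the dotted style seen in the examples in this post (PAN, IAI.1P, PAP.NSM), and the collapsing of Robinson's deponent and middle-or-passive voice codes into M or P is a guess that would need checking form by form.

```python
# Robinson letter -> CCAT-style letter. Perfect is R vs X, pluperfect L vs Y.
TENSE = {"P": "P", "I": "I", "F": "F", "A": "A", "R": "X", "L": "Y"}
# Imperative is M in Robinson's scheme and D in the CCAT-style codes.
MOOD = {"I": "I", "S": "S", "O": "O", "M": "D", "N": "N", "P": "P"}
# ASSUMPTION: E, D and N are collapsed to middle and O to passive.
VOICE = {"A": "A", "M": "M", "P": "P", "E": "M", "D": "M", "N": "M", "O": "P"}


def robinson_to_ccat(code):
    """Convert e.g. 'V-2AAI-3S' to 'AAI.3S' (hypothetical helper)."""
    parts = code.split("-")
    if len(parts) < 2 or parts[0] != "V":
        raise ValueError(f"not a verb parse code: {code!r}")
    tvm = parts[1].lstrip("2")  # drop the 'second aorist/perfect' marker
    if len(tvm) != 3:
        raise ValueError(f"unexpected tense-voice-mood field: {code!r}")
    tense, voice, mood = tvm
    result = TENSE[tense] + VOICE[voice] + MOOD[mood]
    if len(parts) > 2:
        # person/number (3S) or case/number/gender (NSM) carries over as-is;
        # any further suffix is ignored in this sketch
        result += "." + parts[2]
    return result
```

Keeping the voice table separate makes it easy to see where the formal and functional voice values might disagree.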

E. There were 182 differences (type not token) in lemma choice, mostly active vs middle forms.

See https://gist.github.com/jtauber/28ddfeee3175903026dade4ab965ac6c#file-lemma-differences-txt for the full list.

F. I made a small handful of per-form lemma corrections (form, parse code, original lemma, corrected lemma):

ἐπεστείλαμεν AAI.1P ἀποστέλλω ἐπιστέλλω
ἀγαθουργῶν PAP.NSM ἀγαθοεργέω ἀγαθουργέω
συνειδυίης XAP.GSF συνοράω σύνοιδα
γαμίσκονται PMI.3P γαμίζω γαμίσκω
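
Corrections like these are naturally expressed as a lookup keyed on form and parse code. The Python sketch below uses the four rows above as data; the names are hypothetical and it assumes the last column in each row is the corrected lemma.

```python
import unicodedata


def _nfc(text):
    return unicodedata.normalize("NFC", text)


# (form, parse code, corrected lemma), taken from the list above
_CORRECTIONS = [
    ("ἐπεστείλαμεν", "AAI.1P", "ἐπιστέλλω"),
    ("ἀγαθουργῶν", "PAP.NSM", "ἀγαθουργέω"),
    ("συνειδυίης", "XAP.GSF", "σύνοιδα"),
    ("γαμίσκονται", "PMI.3P", "γαμίσκω"),
]

# Keys and values are NFC-normalised so that differently encoded but
# visually identical strings still match.
LEMMA_OVERRIDES = {
    (_nfc(form), parse): _nfc(lemma) for form, parse, lemma in _CORRECTIONS
}


def corrected_lemma(form, parse, lemma):
    """Return the override for this form and parse, else the given lemma."""
    return LEMMA_OVERRIDES.get((_nfc(form), parse), _nfc(lemma))
```

Any form without an entry simply keeps the lemma it came in with.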

G. Finally, I made 69 (type not token) parse code changes.

See https://gist.github.com/jtauber/28ddfeee3175903026dade4ab965ac6c#file-parse-txt for the list.

With all this, the greek-inflexion code (on a branch not yet pushed at the time of writing) can correctly generate all the verbs in the Nestle 1904 morphology.

There are definitely improvements I need to make in a second pass and at least a small number of corrections that I think need to be made to the Nestle 1904 analysis.

But it’s now possible for me to produce an initial verb stem annotation for the Nestle 1904 and I’m a step closer to a morphological lexicon with broader coverage.

UPDATE: I’ve added some more parse corrections but not yet updated the gist.


J. K. Tauber

at the intersection of computing, linguistics, philology, and learning science


By day I’m an entrepreneur, web technologist and open-source developer but my academic background is in linguistics (along with some classics, comparative philology, and educational statistics) and my main avocation is working on text, annotations, analysis and software relating to historical languages with a particular interest in facilitating better learning.

While my focus has mostly been on Biblical Greek, much of the work is highly relevant to other Hellenistic Greek texts, other dialects of Ancient Greek and, indeed, texts in completely different languages as well.

All code written for this endeavour is open source and text and data is made available under a Creative Commons license to the extent allowed by the sources used.

I can be contacted at jtauber@jtauber.com.
