Annotating the Normalization Column in MorphGNT: Part 1

27th November 2015 / by James Tauber

Since the Series-6 release, MorphGNT has had a column that normalizes the word forms in the text for context-dependent variation such as accent changes, elision, movable nu and capitalization. I thought it would be useful to annotate exactly what normalization had been applied to each word in the text, and why.

I wrote a short Python script that applies some heuristics to each case where the “word” and “norm” columns differ, to determine the nature of the in-context change.
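
The script itself isn't reproduced here, so what follows is only a hypothetical sketch of the first step: reading MorphGNT rows and keeping those where the word and norm columns differ. It assumes the seven space-separated columns of the SBLGNT-based files (book/chapter/verse, part of speech, parsing code, text, word, norm, lemma). The function name `differing_rows` is illustrative, not from MorphGNT.

```python
import unicodedata
from typing import Iterable, Iterator, Tuple


def differing_rows(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (bcv, word, norm) for every row whose word and norm columns differ.

    Assumes seven space-separated columns:
    bcv, part of speech, parsing code, text, word, norm, lemma.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 7:
            continue  # skip blank or malformed lines
        bcv, _pos, _parse, _text, word, norm, _lemma = fields
        # Compare in NFC so that differences in Unicode composition alone
        # (precomposed vs combining accents) are not counted as changes.
        word = unicodedata.normalize("NFC", word)
        norm = unicodedata.normalize("NFC", norm)
        if word != norm:
            yield bcv, word, norm
```

Normalizing both columns to NFC first is a design choice: without it, two visually identical forms encoded differently would show up as spurious differences.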

In this post, I’ll just report on some statistics. In later posts, I’ll dive into further details that rely on actually looking at the surrounding context (rather than just the difference in one row).

There are 47,630 cases where the word and norm columns differ.

38,523 times there is a change of accent (clitics, oxytones taking graves, etc.).
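
One plausible heuristic for this category, sketched here as an assumption rather than the author's actual rule: strip the acute, grave and circumflex from both forms while keeping breathings, and check whether what remains is identical. The names `strip_accents` and `is_accent_change` are hypothetical.

```python
import unicodedata

# Combining accent marks after NFD: grave, acute, and Greek perispomeni (circumflex).
ACCENTS = {"\u0300", "\u0301", "\u0342"}


def strip_accents(s: str) -> str:
    """Remove accents but keep breathings, diaeresis and iota subscript."""
    decomposed = unicodedata.normalize("NFD", s)
    kept = "".join(ch for ch in decomposed if ch not in ACCENTS)
    return unicodedata.normalize("NFC", kept)


def is_accent_change(word: str, norm: str) -> bool:
    """True if word and norm differ, but only in their accents."""
    word = unicodedata.normalize("NFC", word)
    norm = unicodedata.normalize("NFC", norm)
    return word != norm and strip_accents(word) == strip_accents(norm)
```

For example, `is_accent_change("καὶ", "καί")` is true, while `ἐν` vs `ἕν` is not, because the breathing differs and breathings are deliberately kept.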

3,721 times there is a change in capitalization.
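
A capitalization check can be simpler still. The hypothetical `is_capitalization_change` below, a sketch rather than the author's heuristic, compares only the case of the first letter. Because it ignores the rest of the word, a sentence-initial word whose accent also changed is caught here too, so one row can fall into more than one category. That is consistent with the figures in this post: the first four totals alone (38,523 + 3,721 + 1,221 + 5,223 = 48,688) exceed the 47,630 differing rows.

```python
import unicodedata


def is_capitalization_change(word: str, norm: str) -> bool:
    """True if the initial letter is upper case in one column but not the other.

    Only the first letter is compared, so a pair like "Καὶ"/"καί", which
    also differs in accent, still counts. The rest of the word is not checked,
    so this is meant for rows already known to differ.
    """
    word = unicodedata.normalize("NFC", word)
    norm = unicodedata.normalize("NFC", norm)
    if not word or not norm:
        return False
    return word[0].isupper() != norm[0].isupper()
```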

1,221 times there is elision: 984 times a straight dropping of a final vowel, 237 times an additional aspiration of the preceding consonant.
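
The two kinds of elision can be told apart mechanically. In this hypothetical sketch, the elided stem is compared with the norm minus its final vowel; if they match only after de-aspirating the stem's last consonant (φ→π, θ→τ, χ→κ, as in ἀφ’ for ἀπό or καθ’ for κατά), it counts as the aspirated kind. Which apostrophe character MorphGNT uses is an assumption here, so several are accepted; `classify_elision` is an illustrative name.

```python
import unicodedata
from typing import Optional

ACCENTS = {"\u0300", "\u0301", "\u0342"}  # combining grave, acute, circumflex
APOSTROPHES = ("\u2019", "'", "\u02bc")   # assumed elision marks: ’ ' ʼ
# Aspirated consonant -> unaspirated counterpart: φ -> π, θ -> τ, χ -> κ
DEASPIRATE = {"\u03c6": "\u03c0", "\u03b8": "\u03c4", "\u03c7": "\u03ba"}


def _bare(s: str) -> str:
    """Drop accents (keeping breathings) and return the result in NFC."""
    decomposed = unicodedata.normalize("NFD", s)
    return unicodedata.normalize("NFC", "".join(c for c in decomposed if c not in ACCENTS))


def classify_elision(word: str, norm: str) -> Optional[str]:
    """Return "elision", "elision+aspiration", or None if word is not elided."""
    word = unicodedata.normalize("NFC", word)
    if not word.endswith(APOSTROPHES):
        return None
    stem = _bare(word[:-1])  # elided form without its apostrophe
    full = _bare(norm)
    if not stem or len(full) < 2:
        return None
    dropped = full[:-1]  # norm minus its final letter (assumed to be the elided vowel)
    if stem == dropped:
        return "elision"
    # Same apart from an aspirated final consonant, e.g. ἀφ’ vs ἀπό
    if stem[:-1] == dropped[:-1] and DEASPIRATE.get(stem[-1]) == dropped[-1]:
        return "elision+aspiration"
    return None
```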

5,223 times there is a movable nu. Note that both the presence and the absence of nu are normalized to (ν), so this count covers every case where a nu could be dropped, as well as the 142 times it actually is.
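
Since every movable-nu norm ends in (ν), telling the kept nu from the dropped one reduces to a string comparison. The sketch below assumes the norm column contains the literal characters "(ν)" as described above; accents are ignored because the word may also differ in accent (an enclitic ἐστιν, say). `movable_nu` is a hypothetical name.

```python
import unicodedata
from typing import Optional

ACCENTS = {"\u0300", "\u0301", "\u0342"}  # combining grave, acute, circumflex
NU = "\u03bd"                             # ν
NU_MARK = "(" + NU + ")"                  # the "(ν)" ending in the norm column


def _bare(s: str) -> str:
    """Drop accents (keeping breathings) and return the result in NFC."""
    decomposed = unicodedata.normalize("NFD", s)
    return unicodedata.normalize("NFC", "".join(c for c in decomposed if c not in ACCENTS))


def movable_nu(word: str, norm: str) -> Optional[str]:
    """Return "present" or "dropped" for a movable-nu row, else None."""
    norm = unicodedata.normalize("NFC", norm)
    if not norm.endswith(NU_MARK):
        return None
    base = _bare(norm[: -len(NU_MARK)])
    bare_word = _bare(word)
    if bare_word == base + NU:
        return "present"
    if bare_word == base:
        return "dropped"
    return None
```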

226 times there is a movable sigma (in 20 of which it is actually dropped). This doesn’t count ἐξ (another 234 times). There are also 825 occurrences of οὐκ and 105 of οὐχ.
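
Unlike the single-row checks, the choice between οὐ, οὐκ and οὐχ (and between ἐκ and ἐξ) depends on the following word. Here is a sketch of the standard textbook rule: οὐ before a consonant, οὐκ before a smooth breathing, οὐχ before a rough breathing; ἐκ before a consonant, ἐξ before a vowel. It is an illustration, not the author's code, and it ignores special cases. The function names are hypothetical.

```python
import unicodedata
from typing import Optional

VOWELS = set("\u03b1\u03b5\u03b7\u03b9\u03bf\u03c5\u03c9")  # α ε η ι ο υ ω
SMOOTH, ROUGH = "\u0313", "\u0314"  # combining psili and dasia (breathing marks)
OU, OUK, OUCH = "\u03bf\u1f50", "\u03bf\u1f50\u03ba", "\u03bf\u1f50\u03c7"  # οὐ οὐκ οὐχ
EK, EX = "\u1f10\u03ba", "\u1f10\u03be"  # ἐκ ἐξ


def initial_breathing(next_word: str) -> Optional[str]:
    """Return SMOOTH or ROUGH for a vowel-initial word, None if it starts with a consonant."""
    d = unicodedata.normalize("NFD", next_word.lower())
    if not d or d[0] not in VOWELS:
        return None
    # In a diphthong the breathing sits on the second vowel (αὐ, υἱ), so scan
    # the whole initial vowel cluster, including its combining marks.
    for ch in d:
        if ch in (SMOOTH, ROUGH):
            return ch
        if ch not in VOWELS and not unicodedata.combining(ch):
            break
    return SMOOTH  # no breathing written: treat as smooth


def expected_ou(next_word: str) -> str:
    """Form of the negative expected before next_word."""
    b = initial_breathing(next_word)
    if b is None:
        return OU
    return OUCH if b == ROUGH else OUK


def expected_ek(next_word: str) -> str:
    """Form of the preposition expected before next_word."""
    return EK if initial_breathing(next_word) is None else EX
```

Comparing each οὐκ, οὐχ or ἐξ in the text against the prediction for the next row's word would be one way to flag exceptions worth a closer look.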

In addition to the 47,630 cases above, there are also 32 other instances of two types of discrepancy that need to be resolved. One is ἑλπίδι with a rough breathing in Romans. The other is the cases where Χριστός appears with lowercase χ. I’m not sure what the solution to the former is, but the latter might just involve having two distinct lemmata for Χριστός vs χριστός.

All these statistics might seem of only trivial interest, but they are side effects of the more important tasks of verifying the normalization and, as subsequent posts will cover, testing context-sensitive accentuation rules.

J. K. Tauber

at the intersection of computing, linguistics, philology, and learning science

By day I’m an entrepreneur, web technologist and open-source developer, but my academic background is in linguistics (along with some classics, comparative philology, and educational statistics), and my main avocation is working on text, annotations, analysis and software relating to historical languages, with a particular interest in facilitating better learning.

While my focus has mostly been on Biblical Greek, much of the work is highly relevant to other Hellenistic Greek texts, other dialects of Ancient Greek and, indeed, texts in completely different languages as well.

All code written for this endeavour is open source, and text and data are made available under a Creative Commons license to the extent allowed by the sources used.

I can be contacted at jtauber@jtauber.com.
