Analyzing Nominal Morphology: Part 2
As well as producing a YAML file with entries for each lexeme, I also now generate a (space-delimited) tabular form that looks like this:
    ἀβαρής a-4a -- M n-3d(2aA) ἀβαρ AS ἀβαρῆ ἀβαρ ῆ εσ+α
    ἄβυσσος n-2b -- F n-2b ἀβυσσ GS ἀβύσσου ἀβύσσ ου ο+ιο
    ἄβυσσος n-2b -- F n-2b ἀβυσσ AS ἄβυσσον ἄβυσσ ον ο+ν
    ἀγαθοποιέω verb PA M n=3c(5b-OU) ἀγαθοποι NS ἀγαθοποιῶν ἀγαθοποι ῶν ουντ+
    ἀγαθοποιέω verb PA M n=3c(5b-OU) ἀγαθοποι NP ἀγαθοποιοῦντες ἀγαθοποι οῦντες ουντ+ες
    ἀγαθοποιέω verb PA M n=3c(5b-OU) ἀγαθοποι AP ἀγαθοποιοῦντας ἀγαθοποι οῦντας ουντ+ας
    ἀγαθοποιέω verb PA F n-1c ἀγαθοποιουσ NP ἀγαθοποιοῦσαι ἀγαθοποιοῦσ αι α+ι
    ἀγαθοποιΐα n-1a -- F n-1a ἀγαθοποιϊ DS ἀγαθοποιΐᾳ ἀγαθοποιΐ ᾳ α+ι
    ἀγαθοποιός a-3a -- M n-2a ἀγαθοποι GP ἀγαθοποιῶν ἀγαθοποι ῶν +ων
    ἀγαθός a-1a(2a) -- M n-2a ἀγαθ NS ἀγαθός ἀγαθ ός ο+ς
The columns are:
- lemma
- Mounce category (or verb for participles) for overall lexeme
- aspect / voice (for participles)
- gender
- Mounce category used for particular sub-paradigm (different from overall lexeme for adjectives or participles)
- lexeme-level theme
- case / number
- form
- form-specific theme
- form-specific distinguisher
- stem ending and suffix
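As an illustration of the column layout above, here is a minimal Python sketch that splits one row into named fields. The class name `Row`, the function `parse_row`, and all of the field names are hypothetical labels chosen for this example; they are not taken from the project's code. It assumes every row has exactly eleven whitespace-separated fields, as in the sample rows.

```python
from typing import NamedTuple


class Row(NamedTuple):
    # Field names are illustrative; they follow the column list in order.
    lemma: str
    lexeme_category: str      # Mounce category, or "verb" for participles
    aspect_voice: str         # "--" when not a participle
    gender: str
    paradigm_category: str    # Mounce category of the sub-paradigm
    lexeme_theme: str
    case_number: str
    form: str
    form_theme: str
    distinguisher: str
    ending_and_suffix: str    # e.g. "α+ι"


def parse_row(line: str) -> Row:
    """Split one space-delimited row into its eleven named fields."""
    fields = line.split()
    if len(fields) != len(Row._fields):
        raise ValueError(f"expected {len(Row._fields)} fields, got {len(fields)}")
    return Row(*fields)


row = parse_row(
    "ἀγαθοποιέω verb PA F n-1c ἀγαθοποιουσ NP ἀγαθοποιοῦσαι ἀγαθοποιοῦσ αι α+ι"
)
print(row.lemma, row.case_number, row.distinguisher, row.ending_and_suffix)
```

Naming the fields makes later queries easier to read than positional indexes, at the cost of an extra definition.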
What’s helpful about this format is that you can use awk, grep, sort, wc and other Unix tools to extract information very quickly. (I may soon put it in SQL and expose a web interface too.) So you can see all the times a particular distinguisher is used, or all the times it’s used for a particular case / number, or what all the sandhi rules are.
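The same kinds of query can be sketched in Python instead of awk. The example below is only an illustration over a few of the sample rows: the helper names `rows`, `distinguisher_uses` and `sandhi_rules` are hypothetical, and treating each (stem ending + suffix, distinguisher) pair as one sandhi rule is an assumption made here rather than a definition taken from the project.

```python
from collections import Counter

# A handful of the sample rows, embedded so the sketch is self-contained.
SAMPLE = """\
ἄβυσσος n-2b -- F n-2b ἀβυσσ GS ἀβύσσου ἀβύσσ ου ο+ιο
ἄβυσσος n-2b -- F n-2b ἀβυσσ AS ἄβυσσον ἄβυσσ ον ο+ν
ἀγαθοποιέω verb PA F n-1c ἀγαθοποιουσ NP ἀγαθοποιοῦσαι ἀγαθοποιοῦσ αι α+ι
ἀγαθοποιΐα n-1a -- F n-1a ἀγαθοποιϊ DS ἀγαθοποιΐᾳ ἀγαθοποιΐ ᾳ α+ι
ἀγαθοποιός a-3a -- M n-2a ἀγαθοποι GP ἀγαθοποιῶν ἀγαθοποι ῶν +ων
ἀγαθοποιέω verb PA M n=3c(5b-OU) ἀγαθοποι NS ἀγαθοποιῶν ἀγαθοποι ῶν ουντ+
"""

# Zero-based column positions, following the column list.
CASE_NUMBER, DISTINGUISHER, ENDING_AND_SUFFIX = 6, 9, 10


def rows(text):
    """Split the tabular text into lists of fields, skipping blank lines."""
    return [line.split() for line in text.splitlines() if line.strip()]


def distinguisher_uses(text, distinguisher):
    """Count how often a distinguisher occurs for each case / number."""
    return Counter(
        r[CASE_NUMBER] for r in rows(text) if r[DISTINGUISHER] == distinguisher
    )


def sandhi_rules(text):
    """Count each (stem ending + suffix, distinguisher) pair."""
    return Counter((r[ENDING_AND_SUFFIX], r[DISTINGUISHER]) for r in rows(text))


# Look up the distinguisher of the fifth sample row across all rows.
target = rows(SAMPLE)[4][DISTINGUISHER]
print(distinguisher_uses(SAMPLE, target))

for (rule, distinguisher), count in sorted(sandhi_rules(SAMPLE).items()):
    print(f"{rule} -> {distinguisher} ({count})")
```

A `Counter` plays the role that `sort | uniq -c` would in a shell pipeline.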
I’ve already written a Python script that generates a list of paradigms based on this (keyed off Mounce category for now, until I’ve finalized my own categories, which will actually be defined by these paradigms).
The paradigms look like:
    n-3b(1) M (10):
        NS: ξ {κ+ς}
        GS: κος {κ+ος}
        DS: κι {κ+ι}
        AS: κα {κ+α}
        NP: κες {κ+ες}
        GP: κων {κ+ων}
        AP: κας {κ+ας}
There’s actually a feedback loop where inconsistencies and errors spotted in this paradigm output inform corrections to the underlying distinguisher rules.
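One way such paradigms could be derived from the tabular rows is sketched below. This is not the script in the repository. It assumes that paradigms are keyed by sub-paradigm category and gender, and that the number in parentheses counts distinct lemmas; the post states neither. The names `build_paradigms`, `format_paradigm` and `CASE_ORDER` are hypothetical, as are any case / number codes beyond those shown in the example output.

```python
from collections import defaultdict

# Display order; codes not shown in the post (DP, VS, VP) are assumptions.
CASE_ORDER = ["NS", "GS", "DS", "AS", "VS", "NP", "GP", "DP", "AP", "VP"]

SAMPLE = """\
ἄβυσσος n-2b -- F n-2b ἀβυσσ GS ἀβύσσου ἀβύσσ ου ο+ιο
ἄβυσσος n-2b -- F n-2b ἀβυσσ AS ἄβυσσον ἄβυσσ ον ο+ν
ἀγαθοποιός a-3a -- M n-2a ἀγαθοποι GP ἀγαθοποιῶν ἀγαθοποι ῶν +ων
ἀγαθός a-1a(2a) -- M n-2a ἀγαθ NS ἀγαθός ἀγαθ ός ο+ς
"""


def build_paradigms(text):
    """Group rows by (sub-paradigm category, gender), then by case / number."""
    paradigms = defaultdict(lambda: {"lemmas": set(), "slots": defaultdict(set)})
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue  # skip blank or malformed lines
        lemma, _, _, gender, category, _, case_number, _, _, distinguisher, rule = fields
        entry = paradigms[(category, gender)]
        entry["lemmas"].add(lemma)
        entry["slots"][case_number].add((distinguisher, rule))
    return paradigms


def slot_rank(case_number):
    """Sort known case / number codes in CASE_ORDER; unknown codes go last."""
    if case_number in CASE_ORDER:
        return CASE_ORDER.index(case_number)
    return len(CASE_ORDER)


def format_paradigm(key, entry):
    """Render one paradigm; a slot with several alternatives is joined by ' / '."""
    category, gender = key
    lines = [f"{category} {gender} ({len(entry['lemmas'])}):"]
    for case_number in sorted(entry["slots"], key=slot_rank):
        alternatives = " / ".join(
            f"{d} {{{rule}}}" for d, rule in sorted(entry["slots"][case_number])
        )
        lines.append(f"    {case_number}: {alternatives}")
    return "\n".join(lines)


paradigms = build_paradigms(SAMPLE)
for key in sorted(paradigms):
    print(format_paradigm(key, paradigms[key]))
```

In this sketch, a slot that ends up with more than one alternative is worth a second look: it may be a legitimate variant, or it may point to a distinguisher rule that needs correcting.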
The code and data are available at https://github.com/morphgnt/morphological-lexicon/tree/master/projects/nominal_distinguishers.