Release of greek-normalisation 0.1

6th July 2019 / by James Tauber

For years I’ve had Python code for normalising Greek forms, checking for stray characters, etc. I finally got around to consolidating it into a library.

It has a few little utilities like:

>>> strip_last_accent_if_two('γυναῖκά')
'γυναῖκα'

>>> grave_to_acute('τὴν')
'τήν'

>>> breathing_check('ἀι')
False

but the core of it is the normalisation of tokens with knowledge of clitics and elision.

>>> normalise('τὴν')
('τήν', ['grave'])

>>> normalise('γυναῖκά')
('γυναῖκα', ['extra'])

>>> normalise('σου')
('σου', ['enclitic'])

>>> normalise('Τὴν')
('τήν', ['grave', 'capitalisation'])

>>> normalise('ὁ')
('ὁ', ['proclitic'])

>>> normalise('μετ’')
('μετά', ['elision'])

>>> normalise('οὐκ')
('οὐ', ['movable', 'proclitic'])

See my previous post “The Normalisation Column in MorphGNT” for the original work this code came from.
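To give a feel for what this kind of normalisation involves, here is a minimal sketch using only the standard library. It is not the library's implementation: the `_sketch` function names, the tiny lookup tables and the naive lowercasing step are all assumptions made for illustration, and movable-consonant handling (as in οὐκ) is left out entirely.

```python
import unicodedata

GRAVE = "\u0300"       # combining grave (varia)
ACUTE = "\u0301"       # combining acute (oxia / tonos)
CIRCUMFLEX = "\u0342"  # combining Greek perispomeni
ACCENTS = {GRAVE, ACUTE, CIRCUMFLEX}


def nfc(s):
    return unicodedata.normalize("NFC", s)


def nfd(s):
    return unicodedata.normalize("NFD", s)


# Hypothetical, deliberately tiny lookup tables (illustration only).
ELISIONS = {nfc("μετ\u2019"): nfc("μετά")}
PROCLITICS = {nfc("ὁ")}
ENCLITICS = {nfc("σου")}


def grave_to_acute_sketch(word):
    """Swap every combining grave for an acute, then recompose."""
    return nfc(nfd(word).replace(GRAVE, ACUTE))


def strip_last_accent_if_two_sketch(word):
    """Drop the final accent when the word carries exactly two accents."""
    chars = list(nfd(word))
    positions = [i for i, ch in enumerate(chars) if ch in ACCENTS]
    if len(positions) == 2:
        del chars[positions[-1]]
    return nfc("".join(chars))


def normalise_sketch(token):
    """Return (normalised form, list of reasons the form was changed or flagged)."""
    form = nfc(token)
    flags = []

    without_grave = grave_to_acute_sketch(form)
    if without_grave != form:
        form = without_grave
        flags.append("grave")

    single_accent = strip_last_accent_if_two_sketch(form)
    if single_accent != form:
        form = single_accent
        flags.append("extra")

    if form in ELISIONS:
        form = ELISIONS[form]
        flags.append("elision")

    if form in PROCLITICS:
        flags.append("proclitic")
    elif form in ENCLITICS:
        flags.append("enclitic")

    # Naive assumption: any capital is sentence-initial, never a proper noun.
    lowered = form.lower()
    if lowered != form:
        form = lowered
        flags.append("capitalisation")

    return form, flags


print(normalise_sketch("Τὴν"))
print(normalise_sketch("μετ\u2019"))
```

Working on the decomposed (NFD) form makes each accent a separate code point, so swapping or deleting one is a plain string operation. Recomposing with NFC afterwards means an acute comes back as the tonos code point rather than the oxia one.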

There are also some regular expressions that I’ve used to check for mistakes in things like the Open Apostolic Fathers.
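As a rough idea of what such a check can look like, the sketch below flags any character that falls outside the Greek and Greek Extended Unicode blocks, allowing the elision apostrophe. The pattern and the `stray_characters` name are my own assumptions for illustration, not the expressions shipped in the library.

```python
import re
import unicodedata

# Anything that is not in the Greek (U+0370-U+03FF) or Greek Extended
# (U+1F00-U+1FFF) blocks, apart from the apostrophe used for elision (U+2019).
STRAY = re.compile(r"[^\u0370-\u03FF\u1F00-\u1FFF\u2019]")


def stray_characters(token):
    """Return the characters in a token that do not look like Greek text."""
    # Compose first so ordinary accented letters become single code points.
    return STRAY.findall(unicodedata.normalize("NFC", token))


print(stray_characters("λόγος"))
# A Latin "o" hiding in a Greek word is easy to miss by eye.
print(stray_characters("\u03bbo\u03b3\u03bf\u03c2"))
```

One limitation of this approach: a letter with a combination of diacritics that has no precomposed form keeps its combining marks after NFC, and those would be reported as stray.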

It’s just an initial 0.1 release but parts of the code have already been in use for years.

The repository is https://github.com/jtauber/greek-normalisation and it’s pip-installable as greek-normalisation.



J. K. Tauber

at the intersection of computing, linguistics, philology, and learning science


By day I’m an entrepreneur, web technologist and open-source developer but my academic background is in linguistics (along with some classics, comparative philology, and educational statistics) and my main avocation is working on text, annotations, analysis and software relating to historical languages with a particular interest in facilitating better learning.

While my focus has mostly been on Biblical Greek, much of the work is highly relevant to other Hellenistic Greek texts, other dialects of Ancient Greek and, indeed, texts in completely different languages as well.

All code written for this endeavour is open source and text and data is made available under a Creative Commons license to the extent allowed by the sources used.

I can be contacted at jtauber@jtauber.com.
