Generating Readers

Back in April 2014, Brian Renshaw posted a Good Friday Greek Reader. It was presumably manually produced but I knew such things could be generated automatically and so went about building a system to do so.

You can see a sample PDF at https://github.com/jtauber/greek-reader/blob/master/example/reader.pdf which roughly looks like what Brian produced.

From a code point of view, it’s a fairly simple Python 3 script that generates LaTeX that is then typeset using XeTeX. There is also an experimental backend using SILE. The code is open source under an MIT license and is available at https://github.com/jtauber/greek-reader. It assumes you’re comfortable with those tools and with editing text files to tweak things, but my hope is that a website could eventually be built around this.

To produce a reader like this, whether manually or automatically, you need:

  1. a text
  2. lemmatization
  3. frequency counts
  4. glosses
  5. full citation forms / headwords (e.g. λαμπάς, άδος, ἡ) for nominals
  6. parsing (e.g. AAI 3S) for verbs

MorphGNT gave me items 1, 2, 3 and 6. The glosses (4) came from Dodson (although you can override them, either globally or per verse) and the headwords (5) came from Danker’s Concise Lexicon.
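
To make the override idea concrete, here is a minimal sketch of how a gloss lookup with global and per-verse overrides might work. It is not the actual greek-reader code: the function name `resolve_gloss`, the dictionary layouts, and the rule that a per-verse override wins over a global one are all assumptions made for illustration, and the glosses and verse reference are toy data.

```python
def resolve_gloss(lemma, verse, default_glosses,
                  global_overrides=None, verse_overrides=None):
    """Pick a gloss for a lemma as it occurs in a given verse.

    Assumed precedence (hypothetical, not taken from greek-reader):
    per-verse override, then global override, then the default lexicon.
    Returns None if no gloss is available anywhere.
    """
    global_overrides = global_overrides or {}
    verse_overrides = verse_overrides or {}

    # Per-verse overrides are keyed by (verse, lemma) in this sketch.
    if (verse, lemma) in verse_overrides:
        return verse_overrides[(verse, lemma)]
    if lemma in global_overrides:
        return global_overrides[lemma]
    return default_glosses.get(lemma)


# Toy data for illustration only.
defaults = {"λαμπάς": "torch, lamp"}
per_verse = {("John 18:3", "λαμπάς"): "torch"}

print(resolve_gloss("λαμπάς", "John 18:3", defaults, verse_overrides=per_verse))
print(resolve_gloss("λαμπάς", "some other verse", defaults, verse_overrides=per_verse))
```

Keeping the default lexicon untouched and layering overrides on top means upstream corrections to the lexicon still flow through to every reader that has not overridden that entry.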

What’s nice about doing this programmatically, besides the fact that you can make corrections upstream and have them applied to all the generated readers, is that you can make this adaptive. In the example, I chose which words to annotate based on frequency but it could just as easily be based on other criteria such as what a particular student has learnt up to this point or what has been covered in a particular textbook up to this point.
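
As a rough sketch of what that selection step could look like (again, not the actual greek-reader code), the function below annotates a lemma when its frequency falls below a cutoff and skips anything in a set of lemmas the student already knows. The name `lemmas_to_annotate`, the `max_frequency` parameter, its default of 30, and the counts in the example are all hypothetical.

```python
def lemmas_to_annotate(text_lemmas, frequency, max_frequency=30,
                       known=frozenset()):
    """Return the lemmas that should get an annotation, in order of
    first appearance, each listed once.

    A lemma is selected when its corpus frequency is below
    `max_frequency` and it is not in `known` (for example, vocabulary
    a student has learnt or a textbook has already covered).
    """
    seen = set()
    selected = []
    for lemma in text_lemmas:
        if lemma in seen or lemma in known:
            continue
        seen.add(lemma)
        # Lemmas missing from the frequency table are treated as rare.
        if frequency.get(lemma, 0) < max_frequency:
            selected.append(lemma)
    return selected


# Toy lemma sequence and made-up counts, for illustration only.
lemmas = ["ὁ", "λαμπάς", "καί", "λαμπάς", "φανός"]
counts = {"ὁ": 1000, "καί": 900, "λαμπάς": 9, "φανός": 1}

print(lemmas_to_annotate(lemmas, counts))
print(lemmas_to_annotate(lemmas, counts, known={"λαμπάς"}))
```

Swapping the frequency test for a different predicate, or passing a different `known` set per student or per textbook chapter, is all it would take to make the same text produce a different reader.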

One major feature I want to add, though, is richer annotation, both morphological and syntactic, so it becomes possible to generate something more akin to Zerwick and Grosvenor’s A Grammatical Analysis of the Greek New Testament.

One major motivation for my continuing work on a Morphological Lexicon is being able to provide more focused, helpful annotations for readers indicating not just a lemma but a principal part or some additional information that helps the student understand the form.

For the syntax, I’d like to eventually develop a catalog of constructions so that, much like forms are only annotated if they are less frequent (or otherwise unknown to the student), particular syntactic constructions in a text can be called out based on similar criteria. Some of this is possible with existing syntactic analyses; the trick is knowing which annotations to include and which are already obvious. (I have some ideas for how to crowdsource difficult constructions, but more on that later.)

The greek-reader project is a great example of a pretty simple tool that can do a lot because it builds on rich data. As we get better and better data, we can build better and better tools.

