Categories of Reader Work
My interest in tools for helping read Greek (especially, but by no means only, the New Testament) goes back at least thirteen or fourteen years. In a 2004 post copied over to this blog, I talk about algorithms for ordering vocabulary to accelerate verse coverage. It was around this time I was also working on what became Quisition, a flashcard site with spaced repetition.
In November 2005, I registered the domain readjohn.com with a view to building a site to help people learn Greek by reading through John’s gospel. The reason for John was not only the simplicity of its Greek but the fact that it was the one text I had the OpenText analysis for at the time. As proof I had more than just the GNT in mind, I point out that I registered readhomer.com just two months later. I wasn’t just thinking Greek either, as I registered readdante.com at the same time.
Vocabulary was just an initial part of the model of what it takes to be able to read a text. It happens to be the easiest part to model because all it takes, to a first approximation, is a lemmatized text. But it illustrates the basic concept: if you model what is needed to read a text and you model what a student knows, you can (a rough sketch of the first point follows the list):
- help order texts (including individual clauses or even phrases) in a way that’s appropriate to the student’s level
- appropriately scaffold the texts with just enough information to fill in the gaps in the student’s understanding
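To make the first point concrete, here’s a rough sketch of the kind of greedy ordering involved. The data and the function are illustrative only, not the actual algorithm from the 2004 post:

```python
# A toy stand-in for a lemmatized text: each verse is represented
# as the set of lemmas occurring in it. Data is illustrative only.
verses = {
    "John 1.1": {"ἐν", "ἀρχή", "εἰμί", "ὁ", "λόγος", "καί", "πρός", "θεός"},
    "John 1.2": {"οὗτος", "εἰμί", "ἐν", "ἀρχή", "πρός", "ὁ", "θεός"},
    "John 1.41": {"εὑρίσκω", "οὗτος", "πρῶτος", "ὁ", "ἀδελφός", "ἴδιος"},
}

def order_vocabulary(verses):
    """Greedily pick, at each step, the lemma that makes the most
    not-yet-readable verses fully readable, breaking ties by how
    frequent the lemma is in the remaining verses."""
    known, order = set(), []
    unread = dict(verses)
    while unread:
        candidates = set().union(*unread.values()) - known
        def score(lemma):
            unlocked = sum(1 for ls in unread.values() if ls <= known | {lemma})
            freq = sum(1 for ls in unread.values() if lemma in ls)
            return (unlocked, freq)
        best = max(candidates, key=score)
        known.add(best)
        order.append(best)
        unread = {ref: ls for ref, ls in unread.items() if not ls <= known}
    return order

print(order_vocabulary(verses))
```

The real analyses work at the clause and phrase level rather than whole verses, but the principle is the same.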
One thing I was experimenting with for scaffolding was inlining Greek that the student could understand (according to the ordering generated by my vocabulary algorithms) within a larger text otherwise kept in English. So in the first lesson, the student might be given John 1.41 in a form like this:
He first found his own brother Simon καὶ λέγει αὐτῷ, “We have found the Messiah!”
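Here’s a rough sketch of how such inlining could be produced, assuming clause-aligned Greek and English with lemma sets. The data structures are invented for illustration and aren’t from the actual OpenText-driven analysis; punctuation handling is omitted:

```python
# Sketch: swap in the Greek for any clause whose lemmas the student
# already knows; keep the rest in English. The aligned clauses below
# are hand-made, illustrative data.
clauses = [
    ("εὑρίσκει οὗτος πρῶτον τὸν ἀδελφὸν τὸν ἴδιον Σίμωνα",
     "He first found his own brother Simon",
     {"εὑρίσκω", "οὗτος", "πρῶτος", "ὁ", "ἀδελφός", "ἴδιος", "Σίμων"}),
    ("καὶ λέγει αὐτῷ", "and said to him", {"καί", "λέγω", "αὐτός"}),
    ("Εὑρήκαμεν τὸν Μεσσίαν", "“We have found the Messiah!”",
     {"εὑρίσκω", "ὁ", "Μεσσίας"}),
]

def inline(clauses, known_lemmas):
    """Render each clause in Greek if all its lemmas are known,
    otherwise in English."""
    parts = []
    for greek, english, lemmas in clauses:
        parts.append(greek if lemmas <= known_lemmas else english)
    return " ".join(parts)

known = {"καί", "λέγω", "αὐτός"}
print(inline(clauses, known))
# He first found his own brother Simon καὶ λέγει αὐτῷ “We have found the Messiah!”
```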
The combination of vocabulary ordering algorithms (driven by clause-level analysis of John’s gospel) with this sort of inlining I was calling a New Kind of Graded Reader, and you can find a lot of posts about it on this blog from around March 2008, including this video. I subsequently gave a full-length talk at BibleTech 2010. There’s also a post with an extended example of the inlining approach.
That initial category of reader work is still alive and by no means abandoned; it’s just taking a long time to broaden the analysis to take into account not just vocabulary but also inflectional morphology, lexical relatedness, syntactic constructions, etc. In fact, a large part of my linguistic analysis work is motivated by the reader work (which was a big theme of my BibleTech 2015 talk).
The second, somewhat independent reader project (although still very much corpus-driven and using much of the same machine-actionable linguistic data) was the semi-automated generation of more traditional print readers: the sort with rarer words glossed in footnotes and perhaps more obscure syntactic constructions or idioms commented on. You can read more about it in this post. One aim with the semi-automated generation of printed readers was being able to customize them quite easily to a particular level. The scaffolding wouldn’t necessarily be adaptive, but it could be personalized.
Again this is still of great interest to me and motivates a lot of work on machine-actionable data. While I might experiment with approaches other than using TeX, I still want to do more in this area, most likely collaborating with people interested in particular texts (and able to help work on glosses and syntactic commentary).
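To give a flavour of what the semi-automated generation might involve, here’s a minimal sketch that footnotes glosses for rare lemmas when emitting TeX. The frequency data, glosses, and threshold are all stand-ins; this isn’t the actual pipeline:

```python
# Sketch: emit LaTeX where any word whose lemma falls below a
# frequency threshold gets a footnoted gloss. All data is illustrative.
glosses = {"Μεσσίας": "Messiah, Anointed One"}
frequencies = {"καί": 9018, "λέγω": 2353, "Μεσσίας": 2}

def tex_reader(tokens, threshold=50):
    """tokens: list of (surface_form, lemma) pairs."""
    out = []
    for form, lemma in tokens:
        if frequencies.get(lemma, 0) < threshold and lemma in glosses:
            out.append(f"{form}\\footnote{{{lemma}: {glosses[lemma]}}}")
        else:
            out.append(form)
    return " ".join(out)

print(tex_reader([("Εὑρήκαμεν", "εὑρίσκω"), ("τὸν", "ὁ"), ("Μεσσίαν", "Μεσσίας")]))
```

Customizing a reader to a particular level then becomes largely a matter of changing the threshold (or the student model behind it).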
A third category of work is a loose collection of little prototypes, built over the years, for ways of presenting information in a reader. This includes things like interlinears, colour-coded texts, various ways of showing dependency relations, etc. Brian Rosner and I consolidated these prototypes into a framework for generating static HTML files in https://github.com/jtauber/online-reader. There are various online demos linked in the README.
That repo did initially include a dynamic reading environment written in Vue.js but that was broken out as the starting point for DeepReader (see below).
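As a taste of the static-generation approach, here’s a minimal sketch of colour-coding tokens by part of speech. The colour scheme is invented for illustration (the tags loosely follow MorphGNT-style codes), and this isn’t code from the online-reader repo:

```python
# Sketch: render a token list as static HTML with colour-coded
# parts of speech. The colour mapping is an invented example.
import html

POS_COLOURS = {"V": "darkred", "C": "gray", "RP": "darkblue"}

def render(tokens):
    """tokens: list of (surface_form, pos_tag) pairs."""
    spans = [
        f'<span style="color: {POS_COLOURS.get(pos, "black")}">'
        f"{html.escape(form)}</span>"
        for form, pos in tokens
    ]
    return "<p>" + " ".join(spans) + "</p>"

with open("reader.html", "w", encoding="utf-8") as f:
    f.write(render([("καὶ", "C"), ("λέγει", "V"), ("αὐτῷ", "RP")]))
```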
The fourth category of work (which goes back to my vision for readjohn.com, readhomer.com and readdante.com when I registered the domains) is an online adaptive reading environment with integrated learning tools. I talked about this at SBL 2016 in San Antonio, a Global Philology workshop in Leipzig in May, and I will be talking about it at SBL International 2017 in Berlin next month.
The idea is to integrate vocabulary and morphological drills with the reading environment so that the text drives what to drill, and the results of the drills help determine the choice of text, the scaffolding needed, etc.
So the adaptive reading environment will model the following (a toy sketch in code follows the list):
- what’s needed to understand an upcoming passage
- what the student has already seen
- what the student has inquired about
- what is at an optimal recall interval
- what the student is good or not so good at understanding (based on explicit assessment including meta-cognitive questions)
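Here’s a heavily simplified sketch of what such a student model might look like. The class, scheduling constants, and method names are all illustrative, not DeepReader code:

```python
# Sketch: a toy student model combining exposure tracking with a
# crude spaced-repetition schedule. Everything here is illustrative.
import time

class StudentModel:
    def __init__(self):
        self.seen = {}       # item -> number of exposures
        self.inquiries = {}  # item -> times the student asked about it
        self.due = {}        # item -> timestamp of next optimal recall

    def record_exposure(self, item, correct, now=None):
        now = now or time.time()
        self.seen[item] = self.seen.get(item, 0) + 1
        # Toy schedule: double the interval on success, reset on failure.
        interval = 86400 * (2 ** self.seen[item]) if correct else 600
        self.due[item] = now + interval

    def record_inquiry(self, item):
        self.inquiries[item] = self.inquiries.get(item, 0) + 1

    def items_to_drill(self, upcoming_items, now=None):
        """Drill what the upcoming passage needs but the student has
        never seen, plus anything past its optimal recall interval."""
        now = now or time.time()
        unseen = [i for i in upcoming_items if i not in self.seen]
        overdue = [i for i, t in self.due.items() if t <= now]
        return unseen + overdue
```

The point is the feedback loop: the upcoming text feeds items_to_drill, and the drill results feed back into what text and scaffolding come next.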
This is what I’m most actively working on at the moment. As with the other categories of readers, it relies heavily on linguistic resources so I’m doing a lot in that area.
From an implementation point of view, this is a Vue.js-based application running in the browser, talking to a range of microservices on the backend. Much of the “heavy lifting” will be done by the microservices. Brian and I are breaking out the generic parts of the frontend application as a framework called DeepReader, which could be used for all sorts of readers (even just Kindle-style EPUB readers). I’ll have a lot more to say about DeepReader in the future, as well as about the specific application of it to building an adaptive reading environment for Greek.
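Purely to indicate the shape of the architecture, here’s a hypothetical example of the kind of small backend service involved. The endpoint, data, and payload are invented and say nothing about the actual services:

```python
# Sketch: a hypothetical gloss microservice the reading frontend
# might call. The route and payload shape are invented examples.
from flask import Flask, jsonify

app = Flask(__name__)

GLOSSES = {"λόγος": ["word", "speech", "account"]}  # illustrative data

@app.route("/gloss/<lemma>")
def gloss(lemma):
    return jsonify({"lemma": lemma, "glosses": GLOSSES.get(lemma, [])})

if __name__ == "__main__":
    app.run(port=8001)
```

The frontend stays generic; the language-specific heavy lifting lives behind little HTTP services along these lines.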
So there are really four distinct categories of reader projects that I’ve been working on, on and off, for the last thirteen or fourteen years:
- a “New Kind of Graded Reader”
- semi-automated generation of printed readers
- a framework for generating static HTML files
- an online adaptive reading environment with integrated learning tools
They are all related in that they build on the same linguistic data (which is where most of the effort actually goes).
Hopefully all that provides a bit of a high-level guide to the reading work discussed on this blog and on Twitter, and implemented in various repositories on GitHub.
I should stress that none of the code is specific to the New Testament or even to Greek. I’d be happy to collaborate with anyone on producing the necessary linguistic data for other texts and other languages.