Mean Dependency Depth
In Mean Log Frequency of Lexemes I mentioned that, as well as mean log word frequency, reading comprehension measures such as the Lexile® framework use average sentence length. Now that we have Dependency Paths calculated, we can explore potentially more useful proxies for syntactic complexity.
As an initial experiment, we’ll simply take the mean dependency depth of each target, where our targets are chapters. By “dependency depth” I simply mean the number of labels in the dependency path. In other words, np-O-CL-CL will count as 4, and we’ll just average across all the words in each chapter.
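As a minimal sketch of that calculation (not the script linked later in this post), assume each word is available as a pair of a target identifier and a hyphen-separated dependency path. The function names, the input layout and every path other than np-O-CL-CL are hypothetical:

```python
from collections import defaultdict


def dependency_depth(path):
    """Number of labels in a hyphen-separated dependency path."""
    return len(path.split("-"))


def mean_depth_by_target(words):
    """Average dependency depth per target.

    `words` is an iterable of (target, path) pairs, one per word.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for target, path in words:
        totals[target] += dependency_depth(path)
        counts[target] += 1
    return {target: totals[target] / counts[target] for target in totals}


# Toy data: only np-O-CL-CL comes from the post; the other paths are made up.
sample = [
    ("67009", "np-O-CL-CL"),
    ("67009", "vp-CL"),
    ("67006", "np-S-CL"),
]
print(dependency_depth("np-O-CL-CL"))
print(mean_depth_by_target(sample))
```

Splitting on the hyphen assumes that no label itself contains a hyphen, which would need checking against the real data.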
An initial run reveals one interesting problem. Luke 3 is given a considerably higher score than anything else because of the analysis of the genealogy (A the son of B the son of C… and so on, which leads to very long paths). Reading that genealogy is arguably not that taxing syntactically, which highlights one flaw in the dependency depth approach (or perhaps in the analysis chosen for the genealogy).
This aside, let’s look at what this measure identifies as the easiest chapters (each score is followed by its chapter identifier):
    2685 67009
    2715 67006
    2746 66014
    2831 67014
    2840 66013
    2840 69005
    2841 67007
    2869 66007
    2888 67016
    2892 69003
Interestingly, the top 10 chapters for lowest mean dependency depth are all in Romans, 1 Corinthians and Galatians.
If we average, instead, across entire books, the ten books with the lowest mean dependency depth are:
- 3 John
- 1 Corinthians
- 1 John
- James
- Galatians
- John
- Romans
- Matthew
- Mark
- 2 John
which is perhaps a little less surprising.
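Switching the target from chapter to book only changes the grouping key. Here is a rough sketch, assuming five-digit identifiers whose first two digits are the book and whose last three are the chapter (an assumption read off identifiers like 67009, not something confirmed by the code). The helper names and sample depths are made up for illustration:

```python
from collections import defaultdict


def book_of(chapter_id):
    """Assumed layout: first two digits are the book, last three the chapter."""
    return chapter_id[:2]


def mean_depth_by_book(words):
    """Average depth over all words in each book.

    `words` is an iterable of (chapter_id, depth) pairs, one per word.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for chapter_id, depth in words:
        book = book_of(chapter_id)
        totals[book] += depth
        counts[book] += 1
    return {book: totals[book] / counts[book] for book in totals}


def rank_easiest(means, n=10):
    """Targets sorted by ascending mean depth (lowest, i.e. easiest, first)."""
    return sorted(means.items(), key=lambda item: item[1])[:n]


# Made-up depths, for illustration only.
sample = [("67009", 2), ("67009", 4), ("67006", 3), ("66014", 5)]
book_means = mean_depth_by_book(sample)
print(rank_easiest(book_means))
```

This version averages over every word in a book, so longer chapters carry more weight. Averaging the per-chapter means instead would weight each chapter equally and could give a slightly different ranking.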
The hardest chapters, Luke 3 aside, are the first chapters of Ephesians, 2 Timothy and Colossians, which probably isn’t much of a surprise either. The hardest books overall are Ephesians and Colossians.
The code is available here (tweak line 13 to get book-level stats).
Note that all of this may be quite sensitive to the choice of analysis. It would be an interesting exercise to see, for example, what the PROIEL dependency analysis yields.
In future posts, we’ll try a few more measures and then try to bring them together to see how chapters (or books, or authors) compare across multiple criteria.