13th November 2015 /
by James Tauber
Over the years, when generating vocab coverage stats or orderings for graded readers, I’ve used either lemmas or inflected forms as the items being learnt.
The problem with using inflected forms is that it assumes knowing one form of a lexeme has nothing to do with knowing any other form of that lexeme. The problem with using lemmas is that it assumes knowing one form of a lexeme is enough to know all of them.
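To make the contrast concrete, here is a minimal sketch of how the two assumptions give different coverage numbers for the same text. The `coverage` function and the toy `(form, lemma)` pairs are hypothetical illustrations, not taken from the post or from any MorphGNT tooling.

```python
def coverage(tokens, known, key):
    """Return the fraction of tokens whose key(token) is in the known set."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if key(token) in known)
    return hits / len(tokens)


# Toy text as (inflected form, lemma) pairs; illustrative data only.
tokens = [
    ("λόγος", "λόγος"),
    ("λόγου", "λόγος"),
    ("λόγον", "λόγος"),
    ("θεοῦ", "θεός"),
]

# The learner has met only the nominative singular of λόγος.
known_forms = {"λόγος"}
known_lemmas = {"λόγος"}

# Form-based counting: only the exact form seen counts as known.
form_cov = coverage(tokens, known_forms, key=lambda t: t[0])

# Lemma-based counting: every form of the lexeme counts as known.
lemma_cov = coverage(tokens, known_lemmas, key=lambda t: t[1])

print(form_cov, lemma_cov)
```

On this toy text, form-based counting credits the learner with one token in four, while lemma-based counting credits three in four. A realistic figure presumably lies somewhere between the two.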
read more...
12th November 2015 /
by James Tauber
While much of my work going back 10 years or more was on the nominals, for the last few years I’ve been focused on verbal morphology. For my SBL paper, however, I decided to revisit some of my noun work and ended up exploring some ideas afresh.
read more...
11th November 2015 /
by James Tauber
In my previous post, I talked about the legal / licensing aspects of open linguistic data, but there are also technical aspects to making linguistic data open.
read more...
10th November 2015 /
by James Tauber
I don’t think I’ve ever articulated why I favour a Creative Commons CC-BY-SA license on all my New Testament Greek data.
read more...
9th November 2015 /
by James Tauber
Adding another potential readability metric, let’s look at the mean log frequency of dependency paths.
read more...
8th November 2015 /
by James Tauber
Exactly two weeks ago I said I’d be blogging every day until my talk at SBL. Well, that’s two weeks away, so I’m at the halfway point. I think the blogging has gone well.
read more...
7th November 2015 /
by James Tauber
Back in April 2014, Brian Renshaw posted a Good Friday Greek Reader. It was presumably manually produced but I knew such things could be generated automatically and so went about building a system to do so.
read more...
6th November 2015 /
by James Tauber
In many Greek morphology projects, I’ve wanted a way of conveying the surface form of an inflected word while also conveying the underlying components prior to the application of sandhi rules. A couple of years ago, I came up with a simple representation for inline annotation.
read more...
5th November 2015 /
by James Tauber
The parts of speech in a particular language can be drawn up on the basis of syntactic properties, morphological properties, and/or (perhaps most problematically) semantic properties.
What if we just want to classify lexemes in the MorphGNT based on what morphosyntactic and morphosemantic features they have?
read more...
4th November 2015 /
by James Tauber
In a previous post, we looked at which chapters had the highest mean log frequency of lexemes. The code provided there was applicable to other items, though, so let’s now take a look at mean log frequency of forms.
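As a rough illustration of the metric (not the code from the earlier post), the sketch below computes a mean log frequency per chapter for any kind of item, whether lexemes or forms. The function name, the use of natural logarithms, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter


def mean_log_frequency(items_by_chapter):
    """Map each chapter to the mean log corpus frequency of its items.

    items_by_chapter: dict of chapter id -> list of items (e.g. forms).
    Frequencies are counted over the whole corpus; natural log is assumed.
    """
    counts = Counter(
        item for items in items_by_chapter.values() for item in items
    )
    return {
        chapter: sum(math.log(counts[item]) for item in items) / len(items)
        for chapter, items in items_by_chapter.items()
        if items  # skip empty chapters to avoid dividing by zero
    }


# Toy corpus: two "chapters" of forms; illustrative data only.
corpus = {
    "A": ["x", "x", "y"],
    "B": ["x", "z"],
}
scores = mean_log_frequency(corpus)

# Rank chapters from highest to lowest mean log frequency.
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Because the function only needs a list of hashable items per chapter, switching from lexemes to forms is just a matter of changing which column of the data is fed in.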
read more...