Initial Thoughts on the Cost of Learning a Form

Over the years, when generating vocab coverage stats or orderings for graded readers, I’ve used either lemmas or inflected forms as the items being learnt.

The problem with using inflected forms is that it assumes knowing one form of a lexeme has nothing to do with knowing any other form of that lexeme. The problem with using lemmas is that it assumes knowing one form of a lexeme is enough to know all of them.
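To make the contrast concrete, here is a minimal sketch of how the two assumptions produce different coverage figures for the same text. The `coverage` function, the `(form, lemma)` token shape, and the toy data are all assumptions made for illustration, not the code actually used for the stats mentioned above.

```python
from typing import Iterable, Set, Tuple

# Hypothetical token shape: (inflected form, lemma).
Token = Tuple[str, str]


def coverage(tokens: Iterable[Token], known: Set[str], by_lemma: bool) -> float:
    """Return the proportion of tokens whose item (form or lemma) is known."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    index = 1 if by_lemma else 0  # pick which field counts as "the item learnt"
    hits = sum(1 for token in tokens if token[index] in known)
    return hits / len(tokens)


# Toy data: three forms of one lexeme plus one form of another.
tokens = [
    ("λόγος", "λόγος"),
    ("λόγου", "λόγος"),
    ("λόγον", "λόγος"),
    ("θεός", "θεός"),
]

# The learner has met only the nominative singular of λόγος.
form_cov = coverage(tokens, {"λόγος"}, by_lemma=False)
lemma_cov = coverage(tokens, {"λόγος"}, by_lemma=True)
```

Counting by form credits the learner with one token in four; counting by lemma credits three in four. The real cost of learning a new form of a known lexeme presumably lies somewhere between those two extremes.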

read more...

Analyzing Nominal Morphology: Part 1

While much of my work going back 10 years or more was on the nominals, for the last few years I’ve been focused on verbal morphology. For my SBL paper, however, I decided to revisit some of my noun work and ended up exploring some ideas afresh.

read more...

Technical Aspects of Openness

In my previous post, I talked about the legal and licensing aspects of open linguistic data, but there are technical aspects to making linguistic data open too.

read more...

Why I Use CC-BY-SA Licenses

I don’t think I’ve ever articulated why I favour a Creative Commons CC-BY-SA license on all my New Testament Greek data.

read more...

Mean Log Frequency of Dependency Paths

Adding another potential readability metric, let’s look at the mean log frequency of dependency paths.

read more...

At the Half Way Point

Exactly two weeks ago I said I’d be blogging every day until my talk at SBL. Well, that’s two weeks away so I’m at the half way point. I think the blogging has gone well.

read more...

Generating Readers

Back in April 2014, Brian Renshaw posted a Good Friday Greek Reader. It was presumably manually produced but I knew such things could be generated automatically and so went about building a system to do so.

read more...

Inline Annotation of Sandhi

In many Greek morphology projects, I’ve wanted a way of conveying the surface form of an inflected word while also conveying the underlying components prior to the application of the sandhi rule. A couple of years ago, I came up with a simple representation for inline annotation.

read more...

Morphological Parts of Speech in Greek

The parts of speech in a particular language can be drawn up on the basis of syntactic properties, morphological properties, and/or (perhaps most problematically) semantic properties.

What if we just want to classify lexemes in the MorphGNT based on what morphosyntactic and morphosemantic features they have?

read more...

Mean Log Frequency of Forms

In a previous post, we looked at which chapters had the highest mean log frequency of lexemes. The code provided there was applicable to other items, though, so let’s now take a look at mean log frequency of forms.
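As a rough illustration of why the same code carries over to other items, here is a minimal sketch of one plausible reading of the metric: for each chapter, the mean over its tokens of the natural log of each item's corpus-wide count. The function name, the `(chapter, item)` input shape, and the toy data are assumptions for illustration and are not the code from the earlier post.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def mean_log_frequency(items: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Map each chapter to the mean log corpus frequency of its items.

    `items` is a sequence of (chapter, item) pairs, where the item can be
    a lexeme, an inflected form, or anything else hashable.
    """
    items = list(items)
    counts = Counter(item for _, item in items)  # corpus-wide counts
    logs_by_chapter = defaultdict(list)
    for chapter, item in items:
        logs_by_chapter[chapter].append(math.log(counts[item]))
    return {ch: sum(logs) / len(logs) for ch, logs in logs_by_chapter.items()}


# Toy data using inflected forms as the items.
items = [
    ("ch1", "καί"),
    ("ch1", "καί"),
    ("ch1", "λόγος"),
    ("ch2", "καί"),
    ("ch2", "θεόν"),
]
scores = mean_log_frequency(items)
```

Because the function only cares about `(chapter, item)` pairs, switching from lexemes to forms is just a matter of which column of the data is passed in as the item.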

read more...