Some Unix Command Line Exercises Using MorphGNT

24th December 2017 / by James Tauber

I thought I’d help a friend learn some Unix command line basics (basic, though pretty comprehensive for this type of work) with some practical graded exercises using MorphGNT. It worked out well, so I thought I’d share the exercises in case they are useful to others.

The point here is not to actually teach how to use bash or commands like grep, awk, cut, sort, uniq, head or wc but rather to motivate their use in a gradual fashion with real use cases and to structure what to actually look up when learning how to use them.

This little set of commands has served me well for over twenty years working with MorphGNT in its various iterations (although I obviously switch to Python for anything more complex).

Task 0

Clone https://github.com/morphgnt/sblgnt using git.

Task 1

Using wc and the concept of wildcards/globbing (and relying on the fact that those files have one line per word), work out how many words are in the main text of SBLGNT.
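
As a rough sketch of the idea (not the answer itself), the commands below run wc -l over a glob on two tiny stand-in files. The file names imitate the NN-Book-morphgnt.txt pattern used in the repository, but the sample lines are hand-typed imitations of the format and the scratch directory is purely an assumption for illustration. In a real clone you would point the glob at the repository's files instead.

```shell
# Scratch directory so the example does not depend on a real clone.
workdir=$(mktemp -d)

# Two stand-in files, one word per line (hand-typed sample lines).
cat > "$workdir/61-Mt-morphgnt.txt" <<'EOF'
010101 N- ----NSF- Βίβλος Βίβλος βίβλος βίβλος
010101 N- ----GSF- γενέσεως γενέσεως γενέσεως γένεσις
EOF
cat > "$workdir/64-Jn-morphgnt.txt" <<'EOF'
040101 P- -------- Ἐν Ἐν ἐν ἐν
040101 N- ----DSF- ἀρχῇ ἀρχῇ ἀρχῇ ἀρχή
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
EOF

# The glob expands to every matching file; wc -l prints one count per
# file plus a "total" line when given more than one file.
wc -l "$workdir"/*-morphgnt.txt

# Concatenating first yields just the single overall number.
total=$(cat "$workdir"/*-morphgnt.txt | wc -l)
echo "$total"
```

Because each line holds exactly one word, the line count is the word count.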

Task 2

Using grep and wc, work out how many times μονογενής appears. (You might be able to do it with just grep and appropriate options, but try using grep without options together with wc, and understand the concept of “piping” the output of one command to the input of another.)
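
A minimal sketch of piping, run on a few hand-typed sample lines rather than the real files (the lines and the temporary file are assumptions for illustration only). grep prints every line containing the search string, and wc -l counts the lines it receives on its input.

```shell
# Hand-typed sample in the one-word-per-line format; not real repository data.
sample=$(mktemp)
cat > "$sample" <<'EOF'
040114 A- ----GSM- μονογενοῦς μονογενοῦς μονογενοῦς μονογενής
040114 N- ----GSM- πατρός, πατρός πατρός πατήρ
040316 N- ----ASM- υἱὸν υἱὸν υἱόν υἱός
040316 A- ----ASM- μονογενῆ μονογενῆ μονογενῆ μονογενής
EOF

# grep writes matching lines to standard output; the pipe (|) feeds
# those lines to wc -l, which counts them.
count=$(grep 'μονογενής' "$sample" | wc -l)
echo "$count"
```

Note that this counts matching lines. Since the dictionary form appears in the last column of each line, inflected forms such as μονογενοῦς are still found.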

Task 3

How many verbs (tokens) are there in John’s gospel? (still doable just with grep and wc)
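
One way this might look, assuming the second column holds a part-of-speech code and that verbs are marked V- there. The sample lines are hand-typed for illustration. With a real clone, the same grep would be run against the file for John.

```shell
# Hand-typed sample lines; column 2 is assumed to be the part-of-speech code.
sample=$(mktemp)
cat > "$sample" <<'EOF'
040101 P- -------- Ἐν Ἐν ἐν ἐν
040101 N- ----DSF- ἀρχῇ ἀρχῇ ἀρχῇ ἀρχή
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040101 RA ----NSM- ὁ ὁ ὁ ὁ
040101 N- ----NSM- λόγος, λόγος λόγος λόγος
040103 V- 3AMI-S-- ἐγένετο, ἐγένετο ἐγένετο γίνομαι
EOF

# The surrounding spaces keep the pattern anchored to a whole column,
# so it cannot match "V-" inside some other field.
verb_tokens=$(grep ' V- ' "$sample" | wc -l)
echo "$verb_tokens"
```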

Task 4

How many unique verbs (lemmas) are there in John’s gospel?

(learn how to use awk to extract fields, and how to use sort and uniq in tandem)
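
A hedged sketch of the awk, sort and uniq combination, assuming the lemma is the seventh whitespace-separated column. The sample lines are hand-typed stand-ins, not repository data.

```shell
# Hand-typed sample lines; column 7 is assumed to be the lemma.
sample=$(mktemp)
cat > "$sample" <<'EOF'
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040101 N- ----NSM- λόγος, λόγος λόγος λόγος
040103 V- 3AMI-S-- ἐγένετο, ἐγένετο ἐγένετο γίνομαι
040104 V- 3IAI-S-- ἦν, ἦν ἦν εἰμί
040105 V- 3PAI-S-- φαίνει, φαίνει φαίνει φαίνω
EOF

# 1. keep only verb lines
# 2. print the 7th field (the lemma)
# 3. sort so identical lemmas end up on adjacent lines
# 4. uniq collapses adjacent duplicates
# 5. count what is left
unique_lemmas=$(grep ' V- ' "$sample" | awk '{print $7}' | sort | uniq | wc -l)
echo "$unique_lemmas"
```

The sort step matters because uniq only removes duplicates that sit next to each other.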

Task 5

What are the 5 most common verbs (lemmas) in John’s gospel? (you might want to use head)
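
A possible shape for a frequency ranking, again on hand-typed sample lines and assuming the lemma is in column 7. uniq -c prefixes each distinct line with its count, sort -rn orders those counts numerically from highest to lowest, and head keeps the first few lines.

```shell
# Hand-typed sample lines; not real repository data.
sample=$(mktemp)
cat > "$sample" <<'EOF'
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040103 V- 3AMI-S-- ἐγένετο, ἐγένετο ἐγένετο γίνομαι
040103 V- 3AMI-S-- ἐγένετο ἐγένετο ἐγένετο γίνομαι
040104 V- 3IAI-S-- ἦν, ἦν ἦν εἰμί
040105 V- 3PAI-S-- φαίνει, φαίνει φαίνει φαίνω
EOF

# Count each lemma, rank by count (descending), keep at most five lines.
top5=$(grep ' V- ' "$sample" | awk '{print $7}' | sort | uniq -c | sort -rn | head -n 5)
printf '%s\n' "$top5"
```

The sample only contains three distinct lemmas, so head -n 5 simply returns all three here.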

Task 6

Get counts in John’s Gospel of how many tokens appear in each tense/aspect (hint: use cut) and write the results to a file called john.txt rather than just outputting them in the terminal.
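
A sketch of how cut and output redirection could fit together. It assumes the third column is a parsing code whose second character encodes tense (for example P, I, A), and it restricts the count to verb lines so that non-verbs do not show up as a separate bucket. The sample lines and the scratch directory are illustrative assumptions.

```shell
# Scratch directory and hand-typed sample lines; not real repository data.
workdir=$(mktemp -d)
cat > "$workdir/sample.txt" <<'EOF'
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040101 N- ----NSM- λόγος, λόγος λόγος λόγος
040103 V- 3AMI-S-- ἐγένετο, ἐγένετο ἐγένετο γίνομαι
040104 V- 3IAI-S-- ἦν, ἦν ἦν εἰμί
040105 V- 3PAI-S-- φαίνει, φαίνει φαίνει φαίνω
EOF

# cut -d' ' -f3 takes the third space-delimited field (the parsing code);
# cut -c2 then takes its second character (assumed to be the tense).
# The > redirects the final output into a file instead of the terminal.
grep ' V- ' "$workdir/sample.txt" \
  | cut -d' ' -f3 \
  | cut -c2 \
  | sort \
  | uniq -c > "$workdir/john.txt"

cat "$workdir/john.txt"
```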

Task 7

Come up with your own question that you think could be answered using these types of operations and try it out.

