“Sum” THATCamp possibilities?

As a participant in the upcoming THATCamp I was asked to outline a session I’d like to have. Hmm… Well, I think I can brainstorm a few possibilities:

  • Exploiting R – R is an increasingly popular open source statistical tool/programming language. I’d like to get together with others to discuss how it can be used in the digital humanities.
  • Graphing texts – There are many ways texts can be “measured”. I can count the number of words, the parts of speech, and the reading level. I’ve begun to count the “greatness” of a book as described in a number of blog postings. Once these sorts of things are measured, I’d like to discuss with people ways these measurements can be illustrated through the use of charts and graphs. A picture is worth a thousand words.
  • Integrating digital humanities with libraries – As a librarian one of my ultimate goals is to figure out ways digital humanities computing techniques can be seamlessly integrated into library collections and services. Instead of a library “catalog” simply pointing a person to a text, I’d like it to offer services allowing the user to… use the text. Maybe we can create a prototype of such a thing.
  • Reducing ambiguity – In one of my “experiments” I wanted to assess a set of works’ use of the word “being”, as in the thing, but the analysis returned too many false positives because the word was being used as a verb and not a noun. Such a problem is not uncommon, and I’m wondering how it can be resolved.
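
To make the “Graphing texts” idea in the list above a little more concrete, here is a minimal sketch in Python of measuring a text and then picturing the measurement. It counts word frequencies and draws a plain-text bar chart. The sample sentence, the function names (word_frequencies, text_bar_chart), and the choice of a text chart rather than a real plotting library are all assumptions made for illustration; the same thing could just as well be done in R.

```python
import re
from collections import Counter

# Sample text invented for this sketch; any plain-text book could be read in instead.
SAMPLE = "the cat sat on the mat and the dog sat too"

def word_frequencies(text):
    """Measure a text: lowercase it, split it into words, and count each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

def text_bar_chart(freqs, top=3, width=30):
    """Illustrate the measurement: one row of '#' marks per word, scaled to `width`."""
    rows = freqs.most_common(top)
    if not rows:
        return []
    biggest = rows[0][1]
    label_width = max(len(word) for word, _ in rows)
    lines = []
    for word, count in rows:
        bar = "#" * max(1, round(width * count / biggest))
        lines.append(f"{word.ljust(label_width)} {bar} {count}")
    return lines

freqs = word_frequencies(SAMPLE)
chart = text_bar_chart(freqs)
print("\n".join(chart))
```

Other measures, such as reading level or parts of speech, could presumably be fed into the same kind of chart once they are computed.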

‘Just some ideas, and please be gentle with me. I’m a noob.

Eric Lease Morgan

One Response to ““Sum” THATCamp possibilities?”

  1. aelang says:

    Hello Eric

    On your ‘reducing ambiguity’ point: this is fixed fairly easily. You need to run your corpus through a POS (part of speech) tagger which will automatically tag all words with the part of speech they belong to (or that the computer thinks they belong to). Tag sets for parts of speech differ, but the one used by the British National Corpus would use NN1 to mark ‘being’ as a noun, and VBG to mark it as a present participle verb.

    The British National Corpus is tagged using CLAWS (http://ucrel.lancs.ac.uk/claws/), which apparently has 96-97% accuracy, but there are also some open-source POS taggers out there. I have never used any of them, though, so I’m afraid I can’t help with recommending any specific ones.
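
Once a corpus has been tagged, separating the noun “being” from the verb “being” comes down to filtering on the tag. The sketch below, in Python, assumes the tagger writes each token as word_TAG and uses the NN1 and VBG tags mentioned above. The sample sentences were tagged by hand for this illustration and are not real CLAWS output, and the function names (parse_tagged, tag_counts) are hypothetical.

```python
from collections import Counter

# Hand-tagged sample (an assumption for illustration, not real tagger output).
# Each token is written word_TAG. NN1 marks a singular noun; VBG marks
# "being" as a form of the verb "be".
TAGGED = (
    "A_AT0 human_AJ0 being_NN1 is_VBZ being_VBG watched_VVN ._PUN "
    "The_AT0 being_NN1 was_VBD being_VBG kind_AJ0 ._PUN"
)

def parse_tagged(text):
    """Split 'word_TAG' tokens into (lowercased word, tag) pairs."""
    pairs = []
    for token in text.split():
        word, _, tag = token.rpartition("_")
        pairs.append((word.lower(), tag))
    return pairs

def tag_counts(pairs, target):
    """Count how often `target` occurs under each part-of-speech tag."""
    return Counter(tag for word, tag in pairs if word == target)

counts = tag_counts(parse_tagged(TAGGED), "being")
print(counts["NN1"], "noun uses;", counts["VBG"], "verb uses")
```

Keeping only the NN1 occurrences would remove the verb uses that caused the false positives, subject to whatever errors the tagger itself makes.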

