Jon Crump
Email: jjcrump at uw.edu
Thanks to all at NW Python Day 2010 for being so welcoming to a newcomer. I was encouraged to give a lightning talk, Historical Records and OCR, on my project at:
The Itinerary of King John & the Rotuli Litterarum Patentium
This is a web page that stitches together the SIMILE Timeline JavaScript library and the Google Maps API. The Pythonic aspect of the project is that I used Python to parse OCR text output and prepare JSON input for the timeline library. The aim was to make a relatively uncommon but essential historical resource more widely available. Hardy's 1835 edition of the rolls was published in "record type", a typeface typical of such 19th-century editions, which sought to preserve the look of the medieval system of scribal abbreviation in modern print. Whatever virtues such a system might have, the typeface unfortunately renders the text completely opaque to ordinary OCR techniques. This is why the Rot.Lit.Pat. has never been properly digitized and indexed.
On the other hand, Hardy's edition included his invaluable itinerary of John's movements throughout his reign, based on the evidence of the patent rolls and other documents then available. The itinerary data was accessible to OCR. It could not only be usefully visualized as a timeline, but could also serve as an interface to the documents, alongside the existing indexes.
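For the curious, here is a minimal sketch of the kind of OCR-to-JSON step described above. It is not the project's actual code. The one-line-per-day input format, the sample rows, and the names `parse_itinerary` and `to_timeline_json` are all assumptions made for illustration, and the output shape (a `dateTimeFormat` key plus an `events` list with `start` and `title`) reflects my understanding of what the SIMILE timeline JSON loader accepts.

```python
import json
import re
from datetime import date

# Made-up sample rows, NOT taken from Hardy's itinerary. The assumed layout is
# "YEAR Month DAY Place"; the printed tables are laid out differently.
SAMPLE_OCR = """\
1204 May 3 Winchester
1204 May 5 Southampton
1204 Mav 9 Porchester
"""

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Year, first three letters of the month, day of month, then the place name.
LINE_RE = re.compile(r"^(\d{4})\s+([A-Za-z]{3})\w*\.?\s+(\d{1,2})\s+(.+)$")


def parse_itinerary(text):
    """Return (events, rejects): parsed event dicts and unparseable raw lines."""
    events, rejects = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        month = MONTHS.get(match.group(2).title()) if match else None
        if month is None:
            # Typical OCR damage (e.g. "Mav" for "May") is set aside for
            # hand correction rather than guessed at.
            rejects.append(line)
            continue
        try:
            day = date(int(match.group(1)), month, int(match.group(3)))
        except ValueError:
            # An impossible date such as 31 June is also left for review.
            rejects.append(line)
            continue
        events.append({"start": day.isoformat(),
                       "title": match.group(4).strip()})
    return events, rejects


def to_timeline_json(events):
    """Serialize events in the assumed timeline event-source shape."""
    return json.dumps({"dateTimeFormat": "iso8601", "events": events}, indent=2)


if __name__ == "__main__":
    parsed, unparsed = parse_itinerary(SAMPLE_OCR)
    print(to_timeline_json(parsed))
    print("Needs hand correction:", unparsed)
```

Keeping the rejected lines in a separate list, rather than dropping them silently, matters with OCR output: the misreadings are exactly the rows that need a human eye before they go on the timeline.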
For those who are interested, the other Pythonic aspect of this project is the use of rdflib. The eventual aim is to define an ontology for describing the Rot.Lit.Pat. and the subjects of its contents, so that the data can be published in a meaningful form and integrated with similar projects such as the Henry III Fine Rolls Project.
The experience of the Itinerary project led me to experiment with a number of other web-based projects, most of which use Python in one way or another, including a little data-visualization animation at Interferometric Qubit Measurement, which uses the Python cgi module.