Materials

Preliminary list of recommended materials for the Developers’ Challenge (items may be added). (Full license and availability information.)

| dataset | format(s) | size | availability | license |
|---|---|---|---|---|
| Archimedes Palimpsest transcriptions | XML: TEI P5 | 5.6 MB | http://www.archimedespalimpsest.net/ | CC-BY |
| Archimedes Palimpsest images | TIFF | approx 1 TB | http://www.archimedespalimpsest.net/ or on HD | CC-BY |
| British Prints Database: http://www.bpi1700.org.uk | MySQL dump + online images | MySQL dump of metadata: 21.7 MB | CD; images http://images.cch.kcl.ac.uk/bpi/ (not to be redistributed) | CC-BY-NC |
| Centre for History and Analysis of Recorded Music (CHARM) catalogues | bespoke XML + METS | 145 MB | CD | CC-BY-NC |
| Clergy of the Church of England: http://www.theclergydatabase.org.uk/index.html | MySQL dump | dump is 474 MB | CD | CC-BY |
| DEMOS (text of articles) | XML: TEI P5 | 2.4 MB | CD | CC-BY-NC-SA |
| Domesday/Prosopography of Anglo-Saxon England project | Spreadsheet | | CD | CC-BY-NC |
| Duke Databank + HGV + APIS (papyri transcriptions, translations + metadata) | XML: TEI P5 (EpiDoc) | 2.2 GB | git clone http://idp.atlantides.org/git/idp.data.git/ | CC-BY (except APIS) |
| Euripides Scholia | pseudo-TEI P5 | 500 KB | http://euripidesscholia.org/sourceFiles/ | CC-BY-NC-SA |
| Greek, Roman and Byzantine Pottery at Ilion | HTML, JPG, RDFa (+KML) | 345 MB | http://classics.uc.edu/troy/grbpottery/ | CC-BY-NC-ND |
| Hofmeister | TEI XML + Authority files | 115 MB + | http://www.hofmeister.rhul.ac.uk/2008/content/reference/thesaurus_download.html | CC-BY-NC-SA |
| Homer Multitext images | TIFF, JPEG2000, JPG, Pyramid TIFF, +c | >500 GB (TIFFs alone), several TB total | http://amphoreus.hpcc.uh.edu/ | CC-BY-NC-SA |
| Inscriptions of Aphrodisias | XML: TEI P4 (EpiDoc) | 6.6 MB | http://insaph.kcl.ac.uk/iaph2007/xml/inscriptions.zip | CC-BY |
| Inscriptions of Aphrodisias: feeds | Atom | 2.2 MB | http://concordia.atlantides.org/examples/iaph2007.atom | CC-BY |
| Inscriptions of Roman Tripolitania | XML: TEI P4 (EpiDoc) | 10.2 MB | http://irt.kcl.ac.uk/irt2009/redist/inscr/irt2009_inscriptions.zip | CC-BY |
| Inscriptions of Roman Tripolitania: feeds | Atom | 2.2 MB | http://irt.kcl.ac.uk/irt2009/index.atom | CC-BY |
| Inscriptions of Roman Tripolitania: geodata | KML | 400 KB | http://irt.kcl.ac.uk/irt2009/redist/maps/tripolitania_earth.kml | CC-BY |
| Jonathan Swift Archive | bespoke XML | 35 MB | CD | CC-BY-NC |
| Khirbat al-Mudayna al-Aliya excavations | Atom + images + structured data | | http://opencontext.org/sets/Jordan/Khirbat+al-Mudayna+al-Aliya | CC-BY |
| Nineteenth Century Serials Edition | Plain text | 2.6 GB | DVD | CC-BY |
| Nomisma.org (ancient coins) | RDFa (+KML) | 2.3 MB | http://nomisma.org/nomisma.org.xml | CC-BY-NC |
| Old Bailey Transcripts | bespoke XML | > 1 GB | FTP | non-commercial (license required) |
| Perseus Greek and Roman texts | XML: TEI P4 | 340 MB | http://nlp.perseus.tufts.edu/hopper/opensource | CC-BY-NC-SA |
| Perseus Treebanks (grammatical markup) | XML | 10 MB | http://nlp.perseus.tufts.edu/syntax/treebank/ | CC-BY-NC-SA |
| Petra Great Temple Excavations | Images + KML + Atom | | http://opencontext.org/sets/Jordan/Petra+Great+Temple | CC-BY |
| Stormont Papers (Hansard): text | XML | 47 MB | CD | non-commercial (license attached) |
| Stormont Papers (Hansard): geodata | KML | 78 MB | CD | non-commercial (license attached) |
| Victoria and Albert Museum Collections | JSON via webservice | | API doc: http://www.vam.ac.uk/api | non-commercial (terms online) |
| Vision of Britain relational data (http://www.visionofbritain.org.uk) | PostgreSQL dump | 2 GB | DVD | CC-BY-NC-SA |
| Vision of Britain historic mapping | georeferenced rasters | | http://www.visionofbritain.org.uk/maps | (images not for redistribution) |
| WGBH OpenVault metadata records | Dublin Core and PBCore | 3000 records | internet access via OAI-PMH from Fedora repository (http://openvault.wgbh.org/fedora/oai), or a Solr request handler (http://openvault.wgbh.org/solr/select) | non-commercial (terms online) |
| WGBH OpenVault Vietnam interview transcripts | TEI with SMIL & RDF | 230 records | http://openvault.wgbh.org/api/dhdev | non-commercial (terms online) |
| WW1 Poetry Archive | JPG + metadata CSV | 60 MB sample; full >10 GB | sample on CD; remainder scrapable from http://www.oucs.ox.ac.uk/ww1lit | non-commercial (license attached) |
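
Several of the materials above are reached through a web protocol rather than a download. As a rough illustration only, the sketch below shows how the WGBH OpenVault metadata records might be requested over OAI-PMH and how Dublin Core titles could be pulled out of a response. The endpoint URL is the one listed in the table; whether it still responds is not known. `ListRecords` and the `oai_dc` metadata prefix come from the OAI-PMH standard. The function names and the `SAMPLE` response are hypothetical, made up for this example, and not taken from OpenVault.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Endpoint as listed in the materials table; it may no longer be live.
OAI_BASE = "http://openvault.wgbh.org/fedora/oai"

# Namespace of simple Dublin Core elements, in ElementTree's {uri}tag form.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base=OAI_BASE, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base + "?" + query


def extract_titles(oai_xml):
    """Return the text of every dc:title element in an OAI-PMH response."""
    root = ET.fromstring(oai_xml)
    return [el.text for el in root.iter(DC_NS + "title")]


# Made-up response fragment for illustration; not real OpenVault data.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Example title</dc:title>
    </oai_dc:dc>
  </metadata></record></ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url())
    print(extract_titles(SAMPLE))
```

To harvest real records, the URL from `list_records_url()` would be fetched with any HTTP client and the response body passed to `extract_titles`. A full harvest would also need to follow OAI-PMH resumption tokens, which this sketch leaves out.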