Preliminary list of recommended materials for Developers’ Challenge (items may be added). (Full license and availability information.)

| Dataset | Format(s) | Size | Availability | License |
|---|---|---|---|---|
| Archimedes Palimpsest transcriptions | XML: TEI P5 | 5.6 MB | | CC-BY |
| Archimedes Palimpsest images | TIFF | approx. 1 TB | or on HD | CC-BY |
| British Prints Database | MySQL dump + online images | MySQL dump of metadata: 21.7 MB | CD; images (not to be redistributed) | CC-BY-NC |
| Centre for History and Analysis of Recorded Music (CHARM) catalogues | bespoke XML + METS | 145 MB | CD | CC-BY-NC |
| Clergy of the Church of England | MySQL dump | 474 MB (dump) | CD | CC-BY |
| DEMOS (text of articles) | XML: TEI P5 | 2.4 MB | CD | CC-BY-NC-SA |
| Domesday/Prosopography of Anglo-Saxon England project | Spreadsheet | | CD | CC-BY-NC |
| Duke Databank + HGV + APIS (papyri transcriptions, translations + metadata) | XML: TEI P5 (EpiDoc) | 2.2 GB | git clone | CC-BY (except APIS) |
| Euripides Scholia | pseudo-TEI P5 | 500 KB | | CC-BY-NC-SA |
| Greek, Roman and Byzantine Pottery at Ilion | HTML, JPG, RDFa (+KML) | 345 MB | | CC-BY-NC-ND |
| Hofmeister | TEI XML + authority files | 115 MB+ | | CC-BY-NC-SA |
| Homer Multitext images | TIFF, JPEG2000, JPG, Pyramid TIFF, etc. | >500 GB (TIFFs alone), several TB total | | CC-BY-NC-SA |
| Inscriptions of Aphrodisias | XML: TEI P4 (EpiDoc) | 6.6 MB | | CC-BY |
| Inscriptions of Aphrodisias: feeds | Atom | 2.2 MB | | CC-BY |
| Inscriptions of Roman Tripolitania | XML: TEI P4 (EpiDoc) | 10.2 MB | | CC-BY |
| Inscriptions of Roman Tripolitania: feeds | Atom | 2.2 MB | | CC-BY |
| Inscriptions of Roman Tripolitania: geodata | KML | 400 KB | | CC-BY |
| Jonathan Swift Archive | bespoke XML | 35 MB | CD | CC-BY-NC |
| Khirbat al-Mudayna al-Aliya excavations | Atom + images + structured data | | | CC-BY |
| Nineteenth Century Serials Edition | Plain text | 2.6 GB | DVD | CC-BY |
| (ancient coins) | RDFa (+KML) | 2.3 MB | | CC-BY-NC |
| Old Bailey Transcripts | bespoke XML | >1 GB | FTP | non-commercial (license required) |
| Perseus Greek and Roman texts | XML: TEI P4 | 340 MB | | CC-BY-NC-SA |
| Perseus Treebanks (grammatical markup) | XML | 10 MB | | CC-BY-NC-SA |
| Petra Great Temple Excavations | Images + KML + Atom | | | CC-BY |
| Stormont Papers (Hansard): text | XML | 47 MB | CD | non-commercial (license attached) |
| Stormont Papers (Hansard): geodata | KML | 78 MB | CD | non-commercial (license attached) |
| Victoria and Albert Museum Collections | JSON via webservice | | API doc | non-commercial (terms online) |
| Vision of Britain relational data | Postgres dump | 2 GB | DVD | CC-BY-NC-SA |
| Vision of Britain historic mapping | georeferenced rasters | | | (images not for redistribution) |
| WGBH OpenVault metadata records | Dublin Core and PBCore | 3000 records | internet access via OAI-PMH from Fedora repository, a Solr request handler | non-commercial (terms online) |
| WGBH OpenVault Vietnam interview transcripts | TEI with SMIL & RDF | 230 records | | non-commercial (terms online) |
| WW1 Poetry Archive | JPG + metadata CSV | 60 MB sample; full >10 GB | sample on CD; remainder scrapable online | non-commercial (license attached) |