Project Details

Project Name: 
Electronic Text Corpus of Sumerian Literature
Principal Investigator / Director: 
Jacob Dahl
Oxford participants: 
Jacob Dahl (Main Contact)
Other Participants: 
not specified
  • Division: Humanities
  • Unit: Oriental Studies Faculty
  • Sub-Unit: Griffith Institute
Start Date: 
1997
End Date: 
2006
Partner organizations (inside or outside Oxford): 
not specified
Funder: 
Leverhulme Trust, AHRB (AHRC)
Subject Area: 
Oriental Studies
Project Description: 

Sumerian is the first language for which we have written evidence, and its literature is the earliest known. The Electronic Text Corpus of Sumerian Literature (ETCSL), a project of the University of Oxford, comprises a selection of nearly 400 literary compositions recorded on sources that come from ancient Mesopotamia (modern Iraq) and date to the late third and early second millennia BCE. The corpus contains Sumerian texts in transliteration, English prose translations and bibliographical information for each composition. The transliterations and the translations can be searched, browsed and read online.
Every Sumerian word in the corpus has been given an English label or gloss. This enables the display of a word-by-word translation of each line of transliterated Sumerian together with the translation (interpretation) of that line in plain English. In the process of attaching a label to every Sumerian word, the base of the word was separated from any grammatical morpheme(s) and given a standardised form. This standardised form is referred to as the lemma, and corresponds to a headword as found in dictionaries.
There are two ways of searching the corpus. With the Simple search, the transliterations and the translations can be searched in a way similar to searching for a word or phrase in a word-processing document. With the Advanced search, only the transliterations can be searched, and the output is always in the form of a Key-Word-In-Context (KWIC) concordance.
The Electronic Text Corpus of Sumerian Literature is encoded in the Extensible Markup Language (XML) in accordance with the Text Encoding Initiative (TEI) guidelines. A number of XML validators have been run on the corpus to ensure that it conforms to XML and TEI. Several technologies and software packages have been used to deliver the corpus on the web. The most important ones are MySQL, Perl, and PHP.
To cope satisfactorily with Sumerian and Akkadian transliterations, Unicode character encoding has been used where needed.
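
As an illustration of what a Key-Word-In-Context concordance looks like, the following is a minimal Python sketch. It is not the project's own search code, which, as noted above, is delivered with MySQL, Perl and PHP. The function names `kwic` and `format_kwic`, the context width, and the two sample lines are assumptions made for this example; the sample lines are toy strings in transliteration style, not quotations from the corpus.

```python
from typing import List, Tuple

Hit = Tuple[str, str, str]  # (left context, keyword, right context)


def kwic(lines: List[str], keyword: str, width: int = 2) -> List[Hit]:
    """Collect every exact-token match of `keyword` with up to `width`
    whitespace-separated tokens of context on each side."""
    hits: List[Hit] = []
    for line in lines:
        tokens = line.split()
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right))
    return hits


def format_kwic(hits: List[Hit]) -> List[str]:
    """Right-align the left contexts so the keywords line up in a column."""
    pad = max((len(left) for left, _, _ in hits), default=0)
    return [f"{left.rjust(pad)}  [{kw}]  {right}" for left, kw, right in hits]


# Toy lines for illustration only (hypothetical, not taken from the corpus).
sample = [
    "lugal e2 gal mu-un-du3",
    "e2 lugal kalam-ma",
]

for row in format_kwic(kwic(sample, "lugal")):
    print(row)
```

Matching here is on whole tokens only. A search over lemmas rather than surface forms would need the lemma annotation described above, which this sketch does not model.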

ICT Methods: 
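Category | Sub-Headings | Details
Data Capture | Data Reuse | Use of existing digital data
Data Capture | Textual Input | Manual input and transcription
Data Capture | Textual Input | Text recognition
Data structuring and enhancement | Other Data Processing | Coding and standardisation
Data structuring and enhancement | Other Data Processing | Data modelling
Data structuring and enhancement | Text Encoding | Lemmatisation
Data structuring and enhancement | Text Encoding | Text encoding - descriptive
Data structuring and enhancement | Text Encoding | Text encoding - referential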
Last updated: 
25/06/2015 16:24:36
Updated by: 
martinw@ox.ac.uk