
Project Details

Project Name: 
Machine-readable grammatical resources for Indonesian
Principal Investigator / Director: 
Mary Dalrymple
Oxford participants: 
Mary Dalrymple (Main Contact); Suriel Mofu
Other Participants: 
not specified
  • Division: Humanities
  • Unit: Linguistics, Philology & Phonetics Faculty
  • Sub-Unit: not specified
Start Date: 
not specified
End Date: 
not specified
Partner organizations (inside or outside Oxford): 
Australian National University
Funder: 
not specified
Subject Area: 
Linguistics
Project Description: 

This project has produced grammatical resources for Indonesian that guide the development of computer-implemented grammars and establish a standard against which grammar coverage can be measured. The resources consist of a set of 52 machine-readable (plain text) files containing acceptable and unacceptable sentences of Indonesian, their translations, and comments on their grammatical structure. In listing examples explicitly, the resource differs from standard grammars and textbooks of Indonesian, which assume that the human reader or learner can fill in a full paradigm on the basis of an abstract description or a few representative examples. Unlike corpora assembled from naturally occurring texts, the files contain unacceptable as well as acceptable examples; the unacceptable examples are crucial for ensuring that grammars produce only well-formed analyses and do not accept ungrammatical input.

Our project connects with the project "Understanding Indonesian: developing a machine-usable grammar, dictionary and corpus", based at the Australian National University and funded by the Australian Research Council, with which PI Dalrymple is associated as a partner investigator. The Australian project has produced a broad-coverage grammar, lexicon, and balanced corpus of Indonesian as part of the Parallel Grammar Project (PARGRAM), an international consortium of academic and commercial research institutions that develops computational grammars and lexicons within the shared linguistic framework of Lexical Functional Grammar (LFG). The testsuites have been essential to that work: they guide the development of the grammar, ensure coverage of less common as well as basic constructions, test the full paradigm of constructions and their interactions, and test the "tightness" of the grammar, that is, whether it excludes impossible analyses while producing well-formed analyses for the constructions under examination. Feedback from the "Understanding Indonesian" project has in turn guided development of the testsuites and ensured that their coverage is comprehensive.
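
As a rough illustration of how a testsuite of acceptable and unacceptable sentences can measure both coverage and "tightness", the Python sketch below scores a grammar against a list of items. Everything in it is an assumption made for illustration: the file layout (one sentence per line, a leading `*` marking an unacceptable sentence, `#` starting a comment), the names `SuiteItem`, `parse_items` and `evaluate`, and the toy `accepts` function that stands in for a real parser. The actual format of the project's 52 files is not described here, and the sentences are placeholders, not real Indonesian data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class SuiteItem:
    """One testsuite entry: a sentence and its acceptability judgement."""
    sentence: str
    acceptable: bool


def parse_items(lines: Iterable[str]) -> List[SuiteItem]:
    """Read items from lines, assuming a hypothetical layout:
    '#' starts a comment line, and a leading '*' marks an unacceptable sentence."""
    items = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("*"):
            items.append(SuiteItem(line[1:].strip(), False))
        else:
            items.append(SuiteItem(line, True))
    return items


def evaluate(
    items: List[SuiteItem], accepts: Callable[[str], bool]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (coverage, tightness).

    coverage  = share of acceptable sentences the grammar accepts
    tightness = share of unacceptable sentences the grammar rejects
    Either value is None if the suite has no items of that kind.
    """
    good = [i for i in items if i.acceptable]
    bad = [i for i in items if not i.acceptable]
    coverage = sum(accepts(i.sentence) for i in good) / len(good) if good else None
    tightness = sum(not accepts(i.sentence) for i in bad) / len(bad) if bad else None
    return coverage, tightness


# Placeholder suite: the "sentences" are abstract tokens, not Indonesian.
demo_lines = [
    "# hypothetical testsuite fragment",
    "sentence-1",
    "sentence-2",
    "* sentence-3",
    "* sentence-4",
]

# Toy stand-in for a parser: it wrongly accepts the unacceptable sentence-4.
toy_accepted = {"sentence-1", "sentence-2", "sentence-4"}
demo_coverage, demo_tightness = evaluate(
    parse_items(demo_lines), lambda s: s in toy_accepted
)
print(f"coverage={demo_coverage}, tightness={demo_tightness}")
```

A corpus of naturally occurring text could supply only the first of these two figures; the second requires the unacceptable examples.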

ICT Methods: 
Category | Sub-Headings | Details
Data Capture | Data Reuse | Use of existing digital data
Data analysis | Searching/Linking | Searching and querying
- | Text Analysis | Text mining
Data structuring and enhancement | Text Encoding | Lemmatisation
- | - | Text encoding - descriptive
- | - | Text encoding - referential
Last updated: 
25/06/2015 16:24:50
Updated by: 
martinw@ox.ac.uk