Department of Computer Science, Vassar College, Poughkeepsie, NY, USA | Equipe Langue et Dialogue, LORIA/CNRS, Vandoeuvre-lès-Nancy, France

This is a Beta release of XCES, which instantiates the EAGLES Corpus Encoding Standard (CES) DTDs for linguistic corpora, developed by the Department of Computer Science, Vassar College, and Equipe Langue et Dialogue, LORIA/CNRS. XCES is under development and subject to change. We are developing documentation to support XCES. However, the existing CES documentation supporting general encoding practices for linguistic corpora and tag usage is largely relevant to the XCES instantiation, and should be consulted.
XCES is continually under development. Because the XML framework provides us with means to go well beyond the capabilities of SGML, this development is taking several forms:
- XML support for additional types of annotation and resources,including discourse/dialogue, lexicons, and speech;
- creation of additional XSLT scripts to perform common operations and transduce among formats (including different annotation formats);
- creation of a repository of annotation formats for "off the shelf"use or easy modification via the XCES schemas (in collaboration with ISO TC 37 SC4 Language Resources).
Contents
- XCES SCHEMAS
- XCES Schema Overview and Download
- Validation
- Usage
- XCES Header
- XCES stand-alone header
- New Elements (for spoken transcriptions)
- XCES Base Types
- Attribute types
- Element types
- String types
- XCES Linking
- Linking in cesAna Documents
- Linking in cesAlign Documents
- Example: Linking with stand-off annotation
- XCES DTDs
- XSLT scripts
Questions/comments to ide@cs.vassar.edu or suderman@cs.vassar.edu