Terminology and XML
One of the main objectives of the Slovo ASO project was to develop or choose an XML framework for the terminology of Medieval Slavic Studies. Although several projects exist in the field, their approaches sometimes differ so widely that it is not possible to decide which one to adopt.
The initial terminology work was based on the TEI P4 Guidelines. The next edition of the TEI Guidelines, P5, removed the entire chapter devoted to terminology. Because all the work on the description of medieval manuscripts is done within the TEI framework, this change was not easy to accept. The group's preliminary decision is to migrate gradually to the new edition. In the meantime, several XML vocabularies are being developed. The best approach would be to implement the terminology for Medieval Slavic culture, literature, history, and languages in a model that is closely aligned with the standards in the field.
The Slovo ASO project gives us an opportunity to test the best practices in the field in order to choose the most suitable XML model for encoding terminology data.
The Slovo ASO project began evaluating the proposed standard ISO/DIS 30042:2008(E), Terminology and other language and content resources – Computer applications in terminology – TermBase eXchange Format Specification (TBX). This proposal, prepared by the Localization Industry Standards Association (LISA), builds on earlier work within ISO (the International Organization for Standardization): ISO 12200:1999, Machine-readable terminology interchange format (MARTIF); ISO 16642:2003, TMF (Terminological Markup Framework); and ISO 12620:1999, Computer applications in terminology – Data categories.
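To give a concrete idea of what a TBX-encoded entry looks like, the following minimal Python sketch builds one concept entry with the standard library's xml.etree.ElementTree. It assumes the general TBX core structure (a martif root with a martifHeader, and a text/body containing termEntry, langSet, tig and term elements) and has not been validated against the ISO/DIS 30042 schema or any XCS file. The function names, the entry identifier sl-0001, the header note and the sample terms are hypothetical, chosen only for illustration; language codes follow ISO 639-1.

```python
import xml.etree.ElementTree as ET

# xml:lang belongs to the reserved XML namespace; ElementTree serializes it as "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tbx(entries, note="Sample terminology file (hypothetical)"):
    """Build a minimal TBX-style document (hypothetical helper, not a validated TBX writer).

    entries: list of (entry_id, {iso_639_1_code: term}) pairs, one pair per concept.
    """
    martif = ET.Element("martif", {"type": "TBX", XML_LANG: "en"})

    # Header: only a free-text source description; no XCS declaration is included.
    header = ET.SubElement(martif, "martifHeader")
    source_desc = ET.SubElement(ET.SubElement(header, "fileDesc"), "sourceDesc")
    ET.SubElement(source_desc, "p").text = note

    body = ET.SubElement(ET.SubElement(martif, "text"), "body")
    for entry_id, terms in entries:
        # One termEntry per concept ...
        term_entry = ET.SubElement(body, "termEntry", {"id": entry_id})
        for lang, term in terms.items():
            # ... one langSet per language, one tig (term information group) per term.
            lang_set = ET.SubElement(term_entry, "langSet", {XML_LANG: lang})
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return martif


def terms_for_language(martif, lang):
    """Return {entry_id: [terms]} for every langSet written in the given language."""
    found = {}
    for term_entry in martif.iter("termEntry"):
        for lang_set in term_entry.findall("langSet"):
            if lang_set.get(XML_LANG) == lang:
                found[term_entry.get("id")] = [t.text for t in lang_set.iter("term")]
    return found


if __name__ == "__main__":
    doc = build_tbx([("sl-0001", {"en": "manuscript", "bg": "ръкопис", "ru": "рукопись"})])
    print(ET.tostring(doc, encoding="unicode"))
    print(terms_for_language(doc, "bg"))
```

The design is concept-oriented: each concept gets one termEntry, and adding a new language means adding a langSet inside the existing entry rather than creating a new entry. This layout roughly mirrors the entry / language section / term section levels of the TMF metamodel (ISO 16642) on which TBX is based.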
The aim is for future terminology development in the field to be closely tied to standardized best practices and synchronized with related projects.
Future initiatives and projects should consult the following standards, references, and works in the field of terminology:
- ISO 639-1:2002, Codes for the representation of names of languages – Part 1: Alpha-2 Code
- ISO 639-2:1998, Codes for the representation of names of languages – Part 2: Alpha-3 Code
- ISO/IEC 646:1991, Information technology – ISO 7-bit coded character set for information interchange
- ISO 1087-1:2000, Terminology work – Vocabulary – Part 1: Theory and applications
- ISO 1087-2:1999, Terminology work – Vocabulary – Part 2: Computer applications
- ISO 3166-1:2006, Codes for the representation of names of countries and their subdivisions – Part 1: Country codes
- ISO 8601:2004, Data elements and interchange formats – Information interchange – Representation of dates and times
- ISO 8879:1986, Information processing – Text and office systems – Standard Generalized Markup Language (SGML)
- ISO 8879:1986/Cor 2:1999, Information processing – Text and office systems – Standard Generalized Markup Language (SGML) – Technical Corrigendum 2
- ISO/IEC 10646-1:2003, Information technology – Universal Multiple-Octet Coded Character Set (UCS) – Part 1: Architecture and basic multilingual plane
- ISO 12200:1999, Computer applications in terminology – Machine-readable terminology interchange format (MARTIF) – Negotiated interchange
- ISO 16642:2003, Computer applications in terminology – TMF (Terminological Markup Framework)
- ISO 12620:1999, Computer applications in terminology – Data categories
- ISO/IEC 19757-2:2003, Information technology – Document Schema Definition Language (DSDL) – Part 2: Regular-grammar-based validation – RELAX NG
- ISO/DIS 30042:2008(E), Terminology and other language and content resources – Computer applications in terminology – TermBase eXchange Format Specification (TBX)
Recommendations and Specifications by Other Authorities
- TEI P4: Guidelines for Electronic Text Encoding and Interchange, 2000. Available at <http://www.tei-c.org/Guidelines/P4>
- TEI P5: Guidelines for Electronic Text Encoding and Interchange, November 2007. Available at <http://www.tei-c.org/Guidelines/P5>
- TMX: Translation Memory eXchange (TMX), 2007. Available at <http://www.lisa.org/standards/tmx/>
- XLIFF: XML Localization Interchange File Format 1.2, 2008. Available at <http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html>
- XML: Extensible Markup Language 1.0 (Fourth Edition), August 2006. W3C Recommendation. Available at <http://www.w3.org/TR/REC-xml/>
Institutions and Projects
- IEC: International Electrotechnical Commission (IEC)
- ISO: International Organization for Standardization (ISO)
- ISO TC37/3: ISO Technical Committee 37, Terminology and other language and content resources, Subcommittee 3, Computer applications in terminology.
- LISA: Localization Industry Standards Association, <http://www.lisa.org>
- OASIS: Organization for the Advancement of Structured Information Standards
- OSCAR: Open Standards for Container/Content Allowing Reuse
- W3C: World Wide Web Consortium