OGA

OGA: A Corpus of Annotated Greek Texts

Tokenization

Tokenization is the first layer of annotation. It identifies the "atoms" to which annotation units are attached. Different tokenization schemes are possible, depending on how "token" is defined (e.g., as a morpheme, a morphosyntactic word, or a prosodic word). The corpus currently contains one layer of morphosyntactic tokenization.
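The idea of tokens as "atoms" that other annotations point to can be illustrated with a minimal Python sketch. It is not the format or the tokenizer used by OGA: the `Token` class, the `tokenize` function, the splitting rule (runs of word characters, plus single punctuation marks), and the sample annotation keys are all assumptions made for illustration. Tokens are stored as character offsets into the unchanged text, and a later layer refers to tokens only by their ids.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A token as a span of the text (hypothetical structure, not OGA's)."""
    id: int
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def tokenize(text: str) -> list[Token]:
    """Naive tokenizer: runs of word characters or single punctuation marks."""
    pattern = re.compile(r"\w+|[^\w\s]")
    return [
        Token(i, m.start(), m.end())
        for i, m in enumerate(pattern.finditer(text), start=1)
    ]


# Normalize to NFC so that accented Greek letters are single code points;
# otherwise combining marks would split words under the \w rule above.
text = unicodedata.normalize("NFC", "μῆνιν ἄειδε, θεά")
tokens = tokenize(text)

# A later annotation layer attaches to token ids, never to the raw text.
# The keys and values here are illustrative only.
annotations = {1: {"pos": "noun"}, 2: {"pos": "verb"}}

# The surface form of each token is recovered from its offsets.
surface = [text[t.start:t.end] for t in tokens]
```

Because the text itself is never modified, several tokenization schemes (morphemic, morphosyntactic, prosodic) could coexist as separate lists of spans over the same text.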

Contacts

GitHub project
email prefix: celano
email suffix: informatik.uni-leipzig.de
Leipzig University
Department of Computer Science (NLP)
Augustplatz 10
04109, Leipzig

The content is licensed under Creative Commons Attribution-ShareAlike 4.0