OGA: A Corpus of Annotated Greek Texts

Tokenization

Tokenization is the first layer of annotation. It identifies the "atoms" to which annotation units are attached. Different tokenization schemes are possible, depending on how "token" is defined (e.g., as a morpheme, a morphosyntactic word, or a prosodic word). The corpus currently provides a morphosyntactic level of tokenization.
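To make the idea of tokens as "atoms" concrete, here is a minimal Python sketch of a tokenizer that records character offsets, so that later annotation layers can refer to tokens by id instead of copying the text. It is an illustration only, not the tokenizer used for the corpus: the `Token` class, the `tokenize` function, and the rule "a run of word characters or a single punctuation mark" are assumptions made for this example. A real morphosyntactic scheme would also need decisions on cases such as elision and crasis, which this sketch ignores.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token record; the field names are assumptions."""
    id: int      # 1-based position of the token in the sequence
    start: int   # offset of the first character in the normalized text
    end: int     # offset one past the last character
    form: str    # the token string itself


# A run of word characters, or a single character that is neither
# a word character nor whitespace (e.g. a punctuation mark).
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Return the NFC-normalized text and its tokens with offsets."""
    # Normalize first so that accented Greek letters are single
    # precomposed characters and the offsets are stable.
    normalized = unicodedata.normalize("NFC", text)
    tokens = [
        Token(i, m.start(), m.end(), m.group())
        for i, m in enumerate(TOKEN_PATTERN.finditer(normalized), start=1)
    ]
    return normalized, tokens


normalized, tokens = tokenize("μῆνιν ἄειδε, θεά")
for t in tokens:
    print(t.id, t.start, t.end, t.form)
```

Because each token carries its own offsets into the normalized text, a different scheme (for example, one based on prosodic words) could be stored as a second token list over the same text without altering the first.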

Contacts

  • GitHub project
  • email prefix: celano
    email suffix: informatik.uni-leipzig.de
  • Leipzig University
    Department of Computer Science (NLP)
    Augustplatz 10
    04109, Leipzig

The content is licensed under Creative Commons Attribution-ShareAlike 4.0

Design: adapted from HTML5 UP
