Tokenization

Tokenization is the first layer of annotation. It identifies the "atoms" which annotation units are attached to. There can be different tokenization schemes, depending on the definition given to "token" (e.g., morphosyntactic or prosodic word). The corpus currently contains a layer of morphosyntactic tokenization.