Tutorial

KorAP: Corpus Data

KorAP is being developed as the main access point to DeReKo, succeeding COSMAS II in that role. KorAP is not tied to any specific corpus, however; it is, for example, also used for the Romanian national corpus CoRoLa.

In KorAP, corpus texts can carry arbitrary metadata, part of which can be used to define subcorpora (so-called virtual corpora).
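
As a rough illustration of the idea, a virtual corpus can be thought of as a metadata filter over the texts of a corpus. The following Python sketch is not KorAP's implementation or query syntax; the Text class, the metadata field names (pubDate, textClass), the text identifiers, and the virtual_corpus helper are all hypothetical and only serve to show the principle.

```python
from dataclasses import dataclass, field


@dataclass
class Text:
    """A corpus text with arbitrary metadata (hypothetical model)."""
    text_id: str
    metadata: dict = field(default_factory=dict)


def virtual_corpus(texts, **constraints):
    """Return the texts whose metadata match every given constraint."""
    return [
        t for t in texts
        if all(t.metadata.get(key) == value for key, value in constraints.items())
    ]


# Made-up example texts; field names and values are assumptions.
texts = [
    Text("A/001", {"pubDate": "2005", "textClass": "politics"}),
    Text("A/002", {"pubDate": "2005", "textClass": "sports"}),
    Text("B/001", {"pubDate": "2010", "textClass": "politics"}),
]

# A "virtual corpus" of all politics texts published in 2005.
politics_2005 = virtual_corpus(texts, pubDate="2005", textClass="politics")
print([t.text_id for t in politics_2005])
```

Only metadata fields that are actually present on a text can match, which mirrors the point that just part of the metadata is usable for building subcorpora.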

KorAP also supports an arbitrary number of annotations from different sources (called foundries), organized in different layers.

The following kinds of annotations are supported:

Tokens
Annotations associated with single tokens (e.g. words or numbers)
Spans
Annotations of a sequence of words or nodes (e.g. sentences, phrases, constituency annotations)
Relations
Annotations of relations between tokens or spans (e.g. dependency annotations)
Attributes
Attribute information for tokens, spans, or relations (e.g. attributes of HTML elements)
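
The four kinds of annotations above, together with the foundry and layer they belong to, can be pictured as a small data model. The Python sketch below is an assumption made for illustration, not KorAP's internal representation: the class names, the foundry names (tagger_x, parser_y, markup_z), the layer names, and the select helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union

# A token is addressed by its index; a span by a (start, end) pair, end exclusive.
Anchor = Union[int, Tuple[int, int]]


@dataclass
class Annotation:
    """Common part: every annotation comes from a foundry and sits on a layer."""
    foundry: str
    layer: str
    value: str
    # Attributes can be attached to tokens, spans, and relations alike.
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class TokenAnnotation(Annotation):
    position: int = 0  # index of the annotated token


@dataclass
class SpanAnnotation(Annotation):
    start: int = 0  # first token of the span
    end: int = 0    # one past the last token of the span


@dataclass
class RelationAnnotation(Annotation):
    source: Anchor = 0  # token index or span the relation starts from
    target: Anchor = 0  # token index or span the relation points to


tokens = ["The", "tree", "grows"]

# Made-up annotations from three hypothetical foundries.
annotations = [
    TokenAnnotation("tagger_x", "pos", "NOUN", position=1),
    TokenAnnotation("tagger_x", "pos", "VERB", position=2),
    SpanAnnotation("parser_y", "constituency", "NP", start=0, end=2),
    SpanAnnotation("markup_z", "structure", "s",
                   attributes={"lang": "en"}, start=0, end=3),
    RelationAnnotation("parser_y", "dependency", "nsubj", source=2, target=1),
]


def select(annotations, foundry, layer):
    """Pick the annotations of one layer from one foundry."""
    return [a for a in annotations if a.foundry == foundry and a.layer == layer]


for a in select(annotations, "parser_y", "constituency"):
    print(a.value, tokens[a.start:a.end])
```

Keeping foundry and layer on every annotation is what allows several sources to annotate the same text side by side without overwriting each other.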