Language identification development and test corpora for Suomi24 and NLF corpora


Persistent Identifier of this resource:

Access location:

This corpus includes files for evaluating the efficacy of language identification on the suomi24-2018-2020 corpus and on the new part of the klk-v2 corpus.
The lines are randomly selected "sentences" from the new material processed by the Language Bank of Finland during 2021-2022.
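The following minimal Python sketch shows how files like these might be used to score a language identifier. It assumes a hypothetical tab-separated layout with one gold language label and one sentence per line. The actual file format of this corpus is not described here and may differ. The function names `evaluate_lid` and `evaluate_lines` and the `classify` callable are illustrative only.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple


def evaluate_lid(pairs: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Compute overall accuracy and per-language recall.

    `pairs` yields (gold_label, predicted_label) tuples; the label
    scheme (e.g. ISO 639 codes such as "fi" or "sv") is an assumption.
    """
    total = correct = 0
    gold_counts: Counter = Counter()
    hit_counts: Counter = Counter()
    for gold, pred in pairs:
        total += 1
        gold_counts[gold] += 1
        if gold == pred:
            correct += 1
            hit_counts[gold] += 1
    accuracy = correct / total if total else 0.0
    # Recall per gold language: share of its lines identified correctly.
    recall = {lang: hit_counts[lang] / n for lang, n in gold_counts.items()}
    return {"accuracy": accuracy, "recall": recall}


def evaluate_lines(lines: Iterable[str],
                   classify: Callable[[str], str]) -> Dict[str, object]:
    """Score `classify` on lines in the HYPOTHETICAL "label<TAB>sentence" format."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        gold, text = line.split("\t", 1)
        pairs.append((gold, classify(text)))
    return evaluate_lid(pairs)
```

With a real file, one would pass the open file object, e.g. `with open(path, encoding="utf-8") as f: evaluate_lines(f, my_classifier)`, where `my_classifier` is whatever identifier is being tested. Per-language recall is reported alongside accuracy because a corpus dominated by one language can make overall accuracy look high even when minority languages are missed.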

The files originating from suomi24 are licensed under CC-BY-NC, and the files originating from klk-v2 under CC-BY.
