Language identification development and test corpora for Suomi24 and NLF corpora

li-eval-suomi24-nlf

Persistent Identifier of this resource:

http://urn.fi/urn:nbn:fi:lb-2022021301

Access location:

This corpus includes files for evaluating language identification performance on the suomi24-2018-2020 corpus (http://urn.fi/urn:nbn:fi:lb-2021101521) and on the new part of the klk-v2 corpus (http://urn.fi/urn:nbn:fi:lb-202009152).
The lines are random "sentences" from the new material processed by the Language Bank of Finland during 2021-2022.

The files originating from suomi24 are licensed under CC-BY-NC, and the files originating from klk-v2 under CC-BY.
