The Suomi 24 Corpus (2017H2)
Suomi 24 -korpus (2017H2)
Persistent Identifier of this resource:
The corpus is available for download from Kielipankki, the Language Bank of Finland: http://urn.fi/urn:nbn:fi:lb-2019010801. License details: http://urn.fi/urn:nbn:fi:lb-20150304151
The corpus contains all texts available through the Suomi24 API from the discussion forums of the Suomi24 online social networking website, covering 1.1.2001 to 31.12.2017. Jussi Piitulainen created the tokenized version and then carried out the annotation.
Researchers who have a user name and a password can download the entire corpus in the VRT format.
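As a rough illustration of working with a downloaded VRT file, the sketch below reads sentences from VRT-style input. It assumes the usual VRT layout: one token per line with tab-separated attributes, and XML-like structural tags on their own lines. The element name `sentence`, the function name `iter_sentences`, and the three-column sample rows are assumptions made for this example; the actual element names and token attributes should be checked against the corpus documentation.

```python
import re
from typing import Iterable, Iterator

# Matches attribute="value" pairs inside a structural tag line.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def iter_sentences(
    lines: Iterable[str],
) -> Iterator[tuple[dict[str, str], list[list[str]]]]:
    """Yield (sentence attributes, token rows) for each sentence element.

    Assumes sentences are delimited by <sentence ...> and </sentence>
    lines (an assumption, not confirmed by the corpus description).
    """
    attrs = {}
    tokens = None  # None means we are currently outside a sentence
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("<sentence"):
            attrs = dict(ATTR_RE.findall(line))
            tokens = []
        elif line.startswith("</sentence>"):
            if tokens is not None:
                yield attrs, tokens
            tokens = None
        elif line.startswith("<") and line.endswith(">"):
            # Any other structural tag (for example text or paragraph level).
            continue
        elif line and tokens is not None:
            # A token line: split into its tab-separated attribute values.
            tokens.append(line.split("\t"))


# Hypothetical sample input; real files will have different attributes.
sample = [
    '<text title="t">',
    '<sentence id="1">',
    "Hei\thei\tN",
    "maailma\tmaailma\tN",
    "</sentence>",
    "</text>",
]
sentences = list(iter_sentences(sample))
```

For a real file, the same function can be fed an open file handle, for example `with open(path, encoding="utf-8") as f: ...`, where `path` points to the downloaded VRT file. Processing the file line by line keeps memory use low, which matters for a corpus of this size.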
10.1.2019 Corpus size changed from the number of messages (85 475 616) to the number of corpora (1), because the published data may still change slightly; the size will be added at the time of publication.
16.1.2019 Corpus size data added.
- Turku Dependency Treebank parser