The Suomi 24 Corpus (2017H2)

Suomi 24 -korpus (2017H2)


Persistent Identifier of this resource:

Access location:

The corpus is available in Kielipankki - the Language Bank of Finland, download:

License details:

The corpus contains all the texts available through the Suomi24 API from the discussion forums of the Suomi24 online social networking website, dated from 1.1.2001 to 31.12.2017. The texts were tokenized, and the annotation was then carried out by Jussi Piitulainen.

Researchers who have a user name and a password can download the entire corpus in the VRT format.
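
As a rough illustration of working with a VRT download, the sketch below reads sentences from a VRT (verticalized text) stream. It assumes the general layout used by the IMS Corpus Workbench and Korp: one token per line with tab-separated annotation fields, structural tags such as <text> and <sentence> on their own lines, and a literal "<" inside tokens written as an XML entity. The function name iter_sentences, the attribute names, and the two-column sample (word form, lemma) are hypothetical and not taken from this corpus; check the corpus documentation for the actual column order.

```python
import io
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Matches name="value" pairs inside a structural tag such as <text id="1">.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def iter_sentences(
    lines: Iterable[str],
) -> Iterator[Tuple[Dict[str, str], List[List[str]]]]:
    """Yield (text_attributes, tokens) for each <sentence> element.

    Each token is the list of its tab-separated annotation fields.
    """
    text_attrs: Dict[str, str] = {}
    tokens: List[List[str]] = []
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("<text"):
            # Remember the attributes of the enclosing text (message).
            text_attrs = dict(ATTR_RE.findall(line))
        elif line.startswith("<sentence"):
            tokens = []
            in_sentence = True
        elif line.startswith("</sentence"):
            yield text_attrs, tokens
            in_sentence = False
        elif line.startswith("<"):
            # Any other structural tag or comment line is skipped.
            continue
        elif line and in_sentence:
            tokens.append(line.split("\t"))


# Hypothetical two-column sample: word form, lemma.
SAMPLE = (
    '<text id="1" title="Esimerkki">\n'
    '<sentence id="1">\n'
    'Hei\thei\n'
    'maailma\tmaailma\n'
    '</sentence>\n'
    '</text>\n'
)

for attrs, toks in iter_sentences(io.StringIO(SAMPLE)):
    print(attrs["id"], [t[0] for t in toks])
```

For a real file, an open file handle can be passed in place of the io.StringIO object; reading line by line keeps memory use low, which matters for a corpus of this size.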

Change Log:
10.1.2019 Corpus size changed from the number of messages (85 475 616) to the number of corpora (1), because the published data may still change slightly. The size will be added at the time of publication.
16.1.2019 Corpus size data added.

  • Turku Dependency Treebank parser