The Suomi 24 Corpus (2015H1)


Suomi 24 -korpus (2015H1)

Suomi24-2015H1

Persistent Identifier of this resource:

http://urn.fi/urn:nbn:fi:lb-201412171

Access location:

The corpus is available for download from Kielipankki, the Language Bank of Finland, at http://urn.fi/urn:nbn:fi:lb-2015040801 (see Suomi24-2015-10-29_VRT there). License details: http://urn.fi/urn:nbn:fi:lb-20150304151

The corpus contains all the texts available in the Suomi24 API from the discussion forums of the Suomi24 online social networking website from 2001 to June 2015. The corpus has been tokenized and annotated with morpho-syntactic analysis by FIN-CLARIN at the Department of Modern Languages, University of Helsinki.

The tokenized version was created by Aleksi Sahala. The annotation was then carried out by Jussi Piitulainen on CSC's Taito cluster. The morpho-syntactic analysis was produced with the Turku Dependency Parser.

Researchers who have a user name and a password can download the entire corpus in the VRT format at http://urn.fi/urn:nbn:fi:lb-2015040801. University students must first apply for access rights at https://lbr.csc.fi/ (sign in with university credentials) before they can download the corpus from the same address.
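For orientation, here is a minimal sketch of how a downloaded VRT file might be read sentence by sentence. It assumes the common VRT layout of one token per line with tab-separated annotation fields and XML-like structural tags on their own lines. The function name `read_vrt_sentences`, the `sentence` tag name, and the two-column sample are illustrative assumptions, not taken from this corpus; check the actual element names and column order in the downloaded files.

```python
import io
from typing import Iterator, List, TextIO


def read_vrt_sentences(
    stream: TextIO, sentence_tag: str = "sentence"
) -> Iterator[List[List[str]]]:
    """Yield each sentence as a list of tokens.

    Each token is the list of its tab-separated fields. The tag name
    "sentence" is an assumption about the file's structural markup.
    """
    open_plain = "<" + sentence_tag + ">"
    open_attrs = "<" + sentence_tag + " "
    close_tag = "</" + sentence_tag + ">"
    current = None
    for raw in stream:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("<"):
            # Structural line: start or end a sentence, skip everything else.
            if line.startswith(open_plain) or line.startswith(open_attrs):
                current = []
            elif line.startswith(close_tag):
                if current is not None:
                    yield current
                current = None
            continue
        if current is not None:
            # Token line: keep all annotation columns as they are.
            current.append(line.split("\t"))


# Hypothetical two-column sample (word form, lemma), for illustration only.
sample = (
    '<text title="example">\n'
    "<sentence>\n"
    "Hei\thei\n"
    "maailma\tmaailma\n"
    "</sentence>\n"
    "</text>\n"
)
sentences = list(read_vrt_sentences(io.StringIO(sample)))
```

Reading from a stream rather than loading the whole file keeps memory use flat, which matters for a corpus of this size. For a real file, pass an opened file object instead of the `io.StringIO` sample.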

Earlier version: Suomi24-2015-05-25, containing 123 319 920 tokens / 10 000 sentences


The corpus is available in Kielipankki, download: http://urn.fi/urn:nbn:fi:lb-2015040801 (see Suomi24-2015-10-29_VRT there).

The corpus contains the discussion forums of the Suomi24 discussion service that are available in the Suomi24 API, covering 2001 to June 2015.

Researchers at higher education institutions may download the entire corpus to their own computer from http://urn.fi/urn:nbn:fi:lb-2015040801 in the VRT format.

License: see http://urn.fi/urn:nbn:fi:lb-20150304251
