amph-Corpus


Last update: 2023-09-28


Ajatella, miettiä, pohtia, harkita -korpus

amph

Persistent Identifier of this resource:

http://urn.fi/urn:nbn:fi:lb-2015021301

Access location: http://urn.fi/urn:nbn:fi:lb-2017022403

The corpus is available for download from Kielipankki, the Language Bank of Finland (http://urn.fi/urn:nbn:fi:lb-2017022403). You should be able to download it simply by logging in with your university credentials. If you cannot log in even though you are affiliated with a university, see the instructions at https://www.kielipankki.fi/lic/aca-status/

A copy of the uncompressed corpus is also available on puhti.csc.fi. For instructions on how to gain access rights, see https://www.kielipankki.fi/support/access/

The amph micro-corpus consists of a total of 3,404 occurrences of the four most common Finnish THINK lexemes: ajatella, miettiä, pohtia, and harkita ('think, reflect, ponder, consider').

These occurrences were extracted from a research corpus with two parts: two months' worth (January–February 1995) of written text from Helsingin Sanomat (1995), Finland’s major daily newspaper, and six months' worth (October 2002 – April 2003) of written discussion in the SFNET (2002-2003) Internet discussion forum, specifically the newsgroups on (personal) relationships (sfnet.keskustelu.ihmissuhteet) and politics (sfnet.keskustelu.politiikka). The newspaper corpus contained 3,304,512 words of body text, excluding headers and captions (as well as punctuation tokens), and included 1,750 instances of the studied THINK verbs. The Internet corpus contained 1,174,693 words of body text, excluding quotes of previous postings as well as punctuation tokens, and included 1,654 instances of the studied THINK verbs. The overall frequencies of the individual THINK lexemes in the corpus were 1,492 for ajatella, 812 for miettiä, 713 for pohtia, and 387 for harkita.
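The figures above can be cross-checked and put on a common scale with a few lines of Python. This is only an illustrative sketch: the normalisation to occurrences per million words and the names `subcorpora`, `lexeme_counts`, and `per_million` are assumptions made for this example and are not part of the corpus or its accompanying scripts.

```python
# Figures quoted in the corpus description; nothing here is read from the corpus itself.
subcorpora = {
    "Helsingin Sanomat": {"words": 3_304_512, "think_verbs": 1_750},
    "SFNET": {"words": 1_174_693, "think_verbs": 1_654},
}
lexeme_counts = {"ajatella": 1_492, "miettiä": 812, "pohtia": 713, "harkita": 387}


def per_million(hits: int, words: int) -> float:
    """Relative frequency as occurrences per million words (hypothetical helper)."""
    return hits / words * 1_000_000


# The two subcorpus totals and the four lexeme totals should describe the same 3,404 occurrences.
total_hits = sum(part["think_verbs"] for part in subcorpora.values())
assert total_hits == sum(lexeme_counts.values()) == 3_404

# Relative frequency of the THINK verbs in each subcorpus.
rates = {
    name: per_million(part["think_verbs"], part["words"])
    for name, part in subcorpora.items()
}

# Share of each lexeme among all extracted occurrences.
shares = {lexeme: count / total_hits for lexeme, count in lexeme_counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} THINK verbs per million words")
for lexeme, share in shares.items():
    print(f"{lexeme}: {share:.1%} of all occurrences")
```

On these figures the THINK verbs are considerably more frequent per million words in the SFNET discussion text than in the newspaper text, although the two subcorpora contribute a similar number of occurrences in absolute terms.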

The corpus contents were first analyzed automatically, both syntactically and morphologically, using a computational implementation of Functional Dependency Grammar (Tapanainen and Järvinen 1997; Järvinen and Tapanainen 1997) for Finnish, namely the FI-FDG parser (Connexor 2007). All instances of the studied THINK lexemes, together with their syntactic arguments, were then manually validated, corrected where necessary, and supplemented with semantic classifications. In addition, some extra-linguistic features (newspaper section or specific newsgroup, author ID when available, unique document index) were incorporated when they could be identified and extracted from the original corpora.

For each occurrence of the four selected THINK verbs in the original research corpora, the amph micro-corpus contains all relevant contextual features, including the verb itself, analyzed at the aforementioned morphological, syntactic and semantic levels within the immediate sentential context, as well as all pertinent extra-linguistic features. The micro-corpus also includes scripts for processing this data, R functions for its statistical analysis, and a comprehensive set of the ensuing results as data tables in R format.

For a more detailed description of the corpus see https://www.kielipankki.fi/corpora/amph/


Distribution

Availability

Available - Restricted Use

Licence

CLARIN ACA - NC

Restrictions: Academic - Non Commercial Use, Attribution, No Redistribution

User Nature: Academic

Attribution Details: See Documentation section.

Execution location: hidden

Licensors:

Antti Arppe

Distribution rights holders:

CSC - Tieteen tietotekniikan keskus Oy, CSC - IT Center for Science Ltd

IPR Holder

Antti Arppe

Contact Person

User support at CSC - IT Center for Science Ltd. The Language Bank of Finland

text

Monolingual text corpus

Languages

Finnish

Linguality

Linguality type: Monolingual

Size

777,288 Kb

Modalities

Written Language

Time Coverage

1995-2003

Metadata

Created: 02/13/2015

Last Updated: 09/28/2023

Revision: Link to resource group page added

Metadata Creator

Ute Dieckmann

Imre Bartis

Usage

Foreseen Use - Human Use

Use NLP Specific: Linguistic Research

Actual Use - Human Use

Use NLP Specific: Linguistic Research

Relation

Related Resource: SFNET Corpus http://urn.fi/urn:nb...

Relation Type: IsDerivedFrom

Related Resource: Finnish Text Collection http://urn.fi/urn:nb...

Relation Type: IsDerivedFrom

Related Resource: amph-Corpus, Helsinki Korp Version http://urn.fi/urn:nb...

Relation Type: IsOriginalFormOf

Documentation

Resource group page: http://urn.fi/urn:nb...

How to cite: www.kielipankki.fi/v...
