Word embeddings trained with word2vec from the Suomi24 corpus


Last update: 2023-03-03


Resource name (Finnish): word2vec-menetelmällä harjoitetut sanaupotukset Suomi24-korpuksesta

Short name: suomi24-wordvec

Persistent Identifier of this resource:

http://urn.fi/urn:nbn:fi:lb-2022061701

Access location: http://urn.fi/urn:nbn:fi:lb-2022061702

This package contains word embeddings trained with word2vec from Finnish Internet forum discussions from the Suomi24 corpus.

The embeddings were trained on the lemmas from the text annotations rather than on surface forms. Inflected forms such as "koiralta" are therefore absent from the vocabulary; they are all represented by the base form "koira".

The embedding file contains 633 758 entries. The dimension of the vector space is 128.
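To give a feel for what these figures mean in practice, the following sketch estimates the size of the full embedding matrix. The entry count and dimension come from the description above; the choice of 4 bytes per value (single-precision floats) is an assumption made for illustration, not something the package specifies.

```python
# Figures stated in the resource description
ENTRIES = 633_758
DIMENSION = 128

# Assumption for illustration: 4 bytes per value (single-precision float)
BYTES_PER_VALUE = 4

total_values = ENTRIES * DIMENSION
total_bytes = total_values * BYTES_PER_VALUE

print(f"{total_values:,} values, roughly {total_bytes / 2**20:.0f} MiB")
```

With double-precision values the estimate would simply double.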

The embedding file uses the simple, easily parsed textual format produced by word2vec. The first line gives the vocabulary size and the dimension. Every subsequent line begins with a vocabulary item, followed by a space and then 128 floating-point numbers in textual form, each followed by a space. For efficient processing, consider converting the file into a binary representation.
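The format described above can be read with a few lines of standard-library Python. The sketch below is one possible approach, not a reference implementation: the function names `read_word2vec_text` and `pack_vectors` are hypothetical, the sample data is made up (dimension 3 instead of 128), and the binary layout shown (raw single-precision floats in vocabulary order) is just one assumed option among many.

```python
import io
from array import array


def read_word2vec_text(handle):
    """Parse word2vec's textual format from an open text-mode handle."""
    # Header line: "<vocabulary size> <dimension>"
    vocab_size, dim = (int(field) for field in handle.readline().split())
    vectors = {}
    rows = 0
    for line in handle:
        # rstrip() drops the newline and the space after the last number;
        # rsplit keeps the vocabulary item intact as the first field.
        fields = line.rstrip().rsplit(" ", dim)
        if len(fields) != dim + 1:
            raise ValueError(f"row {rows + 1}: expected {dim} numbers")
        # array("f") stores single-precision floats compactly
        vectors[fields[0]] = array("f", map(float, fields[1:]))
        rows += 1
    if rows != vocab_size:
        raise ValueError(f"header promised {vocab_size} rows, found {rows}")
    return dim, vectors


def pack_vectors(vectors):
    """Return (vocabulary list, raw bytes): one assumed binary layout."""
    words = list(vectors)
    blob = b"".join(vectors[word].tobytes() for word in words)
    return words, blob


# Made-up sample in the described layout (not taken from the real file)
SAMPLE = "2 3\nkoira 0.5 -1.25 2.0 \nkissa 1.0 0.0 -0.5 \n"

dim, vectors = read_word2vec_text(io.StringIO(SAMPLE))
words, blob = pack_vectors(vectors)
print(dim, words, len(blob))
```

For the real file, the same function could be given a handle from `open(path, encoding="utf-8")`, where both the path and the UTF-8 encoding are assumptions to verify against the downloaded package.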


Distribution

Availability: Available - Unrestricted Use

Start date: 03/03/2023

Licence

CC - BY

Restrictions: Attribution

Distribution Access/Medium: Downloadable

Licensors:

University of Helsinki

User support at CSC - IT Center for Science Ltd. The Language Bank of Finland

IPR Holder

CSC - Tieteen tietotekniikan keskus Oy, CSC - IT Center for Science Ltd

Contact Person

User support at CSC - IT Center for Science Ltd. The Language Bank of Finland

text

Lexical Conceptual Resource General Information

Other

Monolingual text

Languages: Finnish

Linguality

Linguality type: Monolingual

Size

633,758 Entries

Metadata

Created: 06/17/2022

Last Updated: 02/23/2023

Metadata Language: English (en)

Metadata Creator

Ute Dieckmann

Relation

Related Resource: http://urn.fi/urn:nb...

Relation Type: isDerivedFrom

Documentation

Resource group page: http://urn.fi/urn:nb...

Document Type: Other

License (wordvec): http://urn.fi/urn:nb...

Editor: FIN-CLARIN

How to cite: https://www.kielipan...
