FinEst BERT – META-SHARE

Last view: 2024-04-23

Last update: 2021-09-15

FinEst BERT

finestbert

Persistent Identifier of this resource:

http://urn.fi/urn:nbn:fi:lb-2020061201

Access location: http://urn.fi/urn:nbn:fi:lb-2020061202

FinEst BERT is a multilingual Bidirectional Encoder Representations from Transformers (BERT) model trained from scratch on three languages: Finnish, Estonian, and English. It can be used for various NLP classification tasks in these three languages and supports both monolingual and multilingual or cross-lingual (knowledge transfer) tasks. Whole-word masking was used during data preparation and training. The model was trained for 40 epochs with a sequence length of 128, followed by another 4 epochs with a sequence length of 512. The FinEst BERT model published here is in PyTorch format.
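As an illustration of the whole-word masking mentioned above, here is a minimal sketch in plain Python. It assumes the usual BERT WordPiece convention in which a `##` prefix marks the continuation of a word, and it always substitutes `[MASK]`, which is simpler than the mixed replacement strategy commonly used in BERT pre-training. The function names, the 15% masking rate, and the example word pieces are hypothetical and are not taken from the FinEst BERT training code or vocabulary.

```python
import random


def group_whole_words(tokens):
    """Group token indices into whole words; a '##' prefix marks a continuation piece."""
    groups = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)   # continuation: attach to the current word
        else:
            groups.append([i])     # start of a new word
    return groups


def whole_word_mask(tokens, mask_prob=0.15, seed=0, mask_token="[MASK]"):
    """Mask complete words (all of their pieces) until about mask_prob of tokens are masked.

    Assumption: the input contains no special tokens such as [CLS] or [SEP].
    """
    rng = random.Random(seed)
    groups = group_whole_words(tokens)
    rng.shuffle(groups)

    budget = max(1, round(len(tokens) * mask_prob))
    masked = list(tokens)
    n_masked = 0
    for group in groups:
        if n_masked >= budget:
            break
        # Skip a word that would overshoot the budget, unless nothing is masked yet.
        if n_masked > 0 and n_masked + len(group) > budget:
            continue
        for i in group:
            masked[i] = mask_token
        n_masked += len(group)
    return masked


# Hypothetical word pieces for a short Finnish sentence.
example = ["hel", "##sin", "##ki", "on", "suomen", "pää", "##kaupunki"]
print(whole_word_mask(example, seed=1))
```

The point of the sketch is that a word is never partially masked: if one piece of a word is selected, every piece of that word is replaced, so the model cannot recover the masked piece from its visible neighbours within the same word.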

Corpora used:
Finnish - STT articles, CoNLL 2017 shared task, Ylilauta downloadable version;
Estonian - Ekspress Meedia articles, CoNLL 2017 shared task;
English - English Wikipedia.

More information in the article "FinEst BERT and CroSloEngual BERT: less is more in multilingual models" by Matej Ulčar and Marko Robnik-Šikonja, published in the proceedings of the TSD 2020 conference.

The "FinEst BERT" model by Matej Ulčar and Marko Robnik-Šikonja is published under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

DistributionAvailability

Available - Unrestricted Use

Licence

CC - BY

Licensors:

University of Ljubljana

Distribution rights holders:

University of Ljubljana

IPR Holder

University of Ljubljana

Contact Person

User support FIN-CLARIN

text

Lexical Conceptual Resource General Information

Other

Encoding

Theoretic model: BERT

Encoding level: Other

Multilingual text lexicalConceptualResourceLanguages

Estonian Finnish English

Linguality

Linguality type: Multilingual

Multi-linguality type: Other

Size

74,986 Tokens

Metadata

Created: 2020-06-12

Last Updated: 2021-09-15

Metadata Language: English (en)

Revision: Link to resource group page added

Metadata Creator

Tommi Jauhiainen

Documentation

Resource group page: http://urn.fi/urn:nb...

Attribution details https://www.kielipan...
