Donate Speech Corpus: Training data (100h)
Lahjoita puhetta -aineisto: Opetusdata (100h)
Persistent Identifier of this resource:
This resource is available for download in Kielipankki - The Language Bank of Finland as part of "Donate Speech: Selected dataset", http://urn.fi/urn:nbn:fi:lb-2022060127.
The resource contains a 100-hour subset of transcribed speech that was selected from the Donate Speech Corpus and used to train an automatic speech recognition (ASR) system at Aalto University.
The training data includes speech from 1129 different speakers (according to the metadata accompanying the original recordings). Note that just over 20% of the speakers in the training dataset are male, whereas the puhelahjat-test and puhelahjat-dev sets contain 40% male speakers.
For speech technology development, the training dataset can be used together with the puhelahjat-test and puhelahjat-dev datasets. The three sets have no speakers in common.