Aalto University Automatic Speech Recognition System 2.1

Aalto-yliopiston automaattinen puheentunnistin 2.1


Aalto-ASR version 2 is a toolkit for automatic speech recognition from audio files and for automatic forced alignment of text and speech (that is, for aligning a manually written plain-text transcript with the corresponding audio file).

The current version includes models for recognizing Finnish speech and for aligning speech recordings with transcripts in Finnish, Swedish or Northern Sami. The aligner may also be of some use for aligning transcripts in other languages.

Aalto-ASR is currently available as a loadable module in the Puhti computing environment. A corresponding Docker container is also available for more advanced users who wish to install the software on their local system. For further instructions, please see Documentation.

Note that this version is still being tested and the instructions are not yet complete. You are welcome to contact FIN-CLARIN if you find bugs.

The quality of automatic speech recognition results depends strongly on the technical quality of the original recording and on the speech genre it contains. The current models for Finnish tend to work best with relatively standard spoken Finnish and with only one speaker per recording. Special vocabulary, loan words or colloquial expressions may not be recognized correctly. If the recording contains a lot of background noise or has several speakers, the result may not be useful for your purpose. To find out how well the system performs on your data, you are advised to test it.
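One common way to test recognition quality on your own data is to transcribe a short sample by hand and compute the word error rate (WER) against the recognizer output. The sketch below is not part of Aalto-ASR; the function name word_error_rate and the simple lowercasing and whitespace tokenization are assumptions made for illustration, and real evaluations usually also normalize punctuation and numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance divided by reference length.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase + whitespace split, which is an assumption.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    manual = "tämä on lyhyt testi"
    recognized = "tämä on testi"
    print(f"WER: {word_error_rate(manual, recognized):.2f}")
```

A WER of 0 means the recognizer output matches the manual transcript word for word; values can exceed 1 when the output contains many extra words.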

