The Corpus of Spoken and Written Ter Saami


Ter Saami belongs to the Peninsular subbranch of East Saamic and is no longer used in everyday communication.

Ter Saami used to be spoken in the eastern inland and eastern coastal parts of the Kola Peninsula. Today, however, no children learn the language at home; the handful of remaining speakers belong to the grandparent generation and live scattered across the Kola Peninsula and even in other parts of Russia.

Ter Saami has no standardized written form. However, Ter Saami texts have occasionally been printed in Cyrillic or Latin script, or even in phonemic transcription.

This corpus contains spoken and written samples of Ter Saami. The written texts come from three kinds of sources: a small booklet of Ter Saami poems by Oktyabrina Voronova (1989); a Ter Saami translation of Pushkin's "Tale of the Fisherman and the Fish", published in phonological transcription in 1971; and a few other short texts, among them Ter Saami words, phrases and sentences in Latin script from Chernyakov's short manuscript for a Ter Saami primer (1929). The spoken samples come from recordings collected and transcribed by various researchers since the 1850s.

To make corpus searches easier, all texts in this corpus are meant to be represented in a single, uniform orthography, namely the contemporary Kildin Saami alphabet with slight modifications. The unification of the different original orthographies, however, is still in progress. Note also that the current version of the corpus contains only the orthographic representation, with no morphosyntactic annotation. A new version of the corpus will be annotated for parts of speech, and in the future we also plan to annotate the corpus morphologically and syntactically.

All data, including any linked audio and video files, are also available from the DoBeS archive of the Kola Saami Documentation Project at The Language Archive. Note, however, that access to the raw data and annotations through the archive may be restricted.

Scrambled sentences from the corpus will be made available in Korp.

