Khanty Corpus (North Khanty, Corpora and Translations) (UHLCS)

Hantin korpus (pohjoishantin aineistot ja käännökset) (UHLCS)

khanty-uhlcs

Persistent Identifier of this resource:

http://urn.fi/urn:nbn:fi:lb-2014032613

The corpus is available in Kielipankki - the Language Bank of Finland (puhti.csc.fi, access rights instructions: http://www.kielipankki.fi/access).

Location: /appl/data/kielipankki/mrc-uhlcs/multilingual-language-archive/uralic-lgs/finno-ugric-lgs/ugric-lgs/khanty

The Khanty computer corpus contains the following sub-corpora:

Khanty, Atlym dialect, 519 words, 3967 characters
Khanty, Kazym dialect, 62766 words, 585659 characters
Khanty, Konda dialect, 1115 words, 10234 characters
Khanty, Nizjam dialect, 17681 words, 259732 characters
Khanty, Obdorsk dialect, 10939 words, 200358 characters
Khanty, Synja dialect, 10939 words, 200358 characters.

The corpora of the Khanty dialects are samples taken from the following text collections:

Rédei, Károly (1968).
Nord-ostjakische Texte (Kazym-Dialekt) mit Skizze der Grammatik.
Gesammelt und herausgegeben von Károly Rédei. Abhandlung der Akademie
der Wissenschaften in Göttingen, philologisch-historische Klasse, dritte Folge 71.
Göttingen.

Steinitz, Wolfgang (1989).
Ostjakologische Arbeiten III. Texte aus dem Nachlass.
Eds.: Hartung, Liselotte, Hauel, Petra, Sauer, Gert & Schulze, Birgitte.
Janua Linguarum, Series Practica 256.
Mouton de Gruyter, Berlin.

Vértes, Edith (1980).
H. Paasonens südostjakische Textsammlungen.
Suomalais-Ugrilaisen Seuran Toimituksia 175.
Suomalais-Ugrilainen Seura, Helsinki.

The corpora consist of running text, and several of them are morphologically analyzed. Morphologically encoded texts are in word-per-line format, and plain texts are in sentence-per-line format. In some texts the clauses and sentences are also marked with information about their location in the text.
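As a rough illustration of working with the sentence-per-line format, the sketch below counts sentences, words and characters in such a text. It is not part of the corpus documentation. The function name, the whitespace-based word definition and the sample lines are all assumptions made for this example, and the counts published for the sub-corpora may have been produced with different rules.

```python
from typing import Iterable, Tuple


def count_sentence_per_line(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (sentences, words, characters) for sentence-per-line text.

    Assumptions (not from the corpus documentation): each non-empty line
    is one sentence, words are separated by whitespace, and characters
    are counted without the trailing line break.
    """
    sentences = words = characters = 0
    for line in lines:
        text = line.rstrip("\r\n")
        if not text.strip():
            continue  # skip blank lines
        sentences += 1
        words += len(text.split())
        characters += len(text)
    return sentences, words, characters


# Invented sample lines, not taken from the corpus.
sample = ["first sentence here\n", "\n", "second one\n"]
print(count_sentence_per_line(sample))
```

The same function can be applied to an open file object, since iterating over a text file yields its lines. The encoding of the corpus files is not stated here, so it would have to be checked before opening them.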

Khanty, Textbook:
Rugin, R.P. (1990).
Shum jôxan sjun'öng xâtLöt.
(Shchastlivye den'ki na Shum-jugane.) [Onnellisia päiviä Shum-joella.]
Kniga dlja dopol'nitel'nogo chtenija v 3-4 klassax xantyjskix shkol (shuryshkarskij dialekt).
Prosveshchenie, Leningrad.

The text is available in six versions: (1) the original text in the Cyrillic alphabet; (2) the same text converted to the Latin alphabet; translations of the text into (3) Finnish, (4) English and (5) Russian; and (6) the Latin-alphabet text with morphological coding and an English translation.

Children's books:

Life of Jesus in Khanty (the Kazim dialect). (Trial edition).
Translation: Nyomysova, Yevdokiya Andreyevna &
Lozyamova, Zoya Nikiforovna.
ISBN 952-9790-25-2, ISBN 91-88394-97-2. 63 pp.
Institute for Bible Translation.
Stockholm & Helsinki 1995.

Life of Jesus in Khanty (the Kazim dialect). (Second edition).
Translation: Nyomysova, Yevdokiya Andreyevna &
Lozyamova, Zoya Nikiforovna.
ISBN 952-9790-40-6, ISBN 91-88794-83-0. 63 pp.
Institute for Bible Translation.
Stockholm & Helsinki 1997.

The computer corpora of the Khanty dialects and the textbook were compiled and edited by Merja Salo with the financial support of the Academy of Finland. The texts were adapted for public use with the financial support of the Department of General Linguistics, University of Helsinki. The children's books were donated to the University of Helsinki by the Institute for Bible Translation, Helsinki and Stockholm.

The Khanty Corpus is a part of the UHLCS corpus collection.

UHLCS has many different IPR holders. Should you have any questions regarding the collection, please contact Pirkko Suihkonen (suihkonen.pirkko@gmail.com).

License details: http://urn.fi/urn:nbn:fi:lb-20150304115
Detailed information:
http://urn.fi/urn:nbn:fi:lb-2014060214
http://www.ling.helsinki.fi/uhlcs/metadata/corpus-metadata/uralic-lgs/ugric-lgs/khanty

The intended use of the resource must be outlined in a research plan.

log
25.11.2018 link http://islrn.org/resources/156-041-809-270-6 removed

Distribution

Availability

Available - Restricted Use

Licence

CLARIN RES

Restrictions: Academic - Non Commercial Use, No Redistribution, Other

User Nature: Academic

Attribution Details: See Documentation section.

Distribution Access/Medium: Accessible Through Interface

Licensors:

Pirkko Suihkonen

Distribution rights holders:

CSC - Tieteen tietotekniikan keskus Oy, CSC - IT Center for Science Ltd

University of Helsinki

IPR Holder

Pirkko Suihkonen

Contact Person

User support at CSC - IT Center for Science Ltd. The Language Bank of Finland

text

Multilingual text corpus

Languages

Finnish Khanty Russian English

Linguality

Linguality type: Multilingual

Multi-linguality type: Parallel

Size

161,224 Words

Modalities

Written Language

Resource Creation

Resource Creator

Merja Salo

Metadata

Created: 10/16/2012

Last Updated: 03/15/2023

Metadata Language: English (en)

Revision: short-name, links to group page and license added

Metadata Creator

Ute Dieckmann

Imre Bartis

Relation

Related Resource: http://metashare.csc...

Relation Type: sub-corpus of

Related Resource: Khanty Corpus (North Khanty, Corpora and Translations) (UHLCS), Helsinki Korp Version http://urn.fi/urn:nb...

Relation Type: IsOriginalFormOf

Documentation

License: http://urn.fi/urn:nb...

Document Type: Other

Attribution Details, https://www.kielipan...

Resource group page: http://urn.fi/urn:nb...
