ELFA Corpus


Finnish name: ELFA-korpus

Short name: ELFA

Persistent Identifier of this resource:

http://urn.fi/urn:nbn:fi:lb-201403262

The corpus is available in Kielipankki - the Language Bank of Finland in several versions (see Related resources).

Altogether, the ELFA (English as a Lingua Franca in Academic Settings) corpus contains 1 million words of transcribed spoken academic ELF, corresponding to approximately 131 hours of recorded speech. The data consists of both the recordings and their transcripts. The transcripts are publicly available, but access to the audio files requires individual permission, which can be granted on the basis of a research plan. The recordings were made at the University of Tampere, the University of Helsinki, Tampere University of Technology, and Helsinki University of Technology.

The speech events in the corpus include both monologic events, such as lectures and presentations (33% of the data), and dialogic or polylogic events, such as seminars, thesis defences, and conference discussions. The dialogic and polylogic events are emphasised in the data (67%).

By disciplinary domain, the ELFA corpus covers social sciences (29% of the recorded data), technology (19%), humanities (17%), natural sciences (13%), medicine (10%), behavioural sciences (7%), and economics and administration (5%).

The speakers in ELFA also represent a wide range of first-language backgrounds. The data comprises approximately 650 speakers with 51 different first languages. These include African languages (e.g. Akan, Dagbani, Igbo, Kikuyu, Somali, Swahili), Asian languages (e.g. Arabic, Bengali, Chinese, Hindi, Japanese, Persian, Turkish, Uzbek), and European languages (e.g. Czech, Danish, Dutch, French, German, Italian, Lithuanian, Polish, Portuguese, Russian, Romanian, Swedish). Native English speakers account for 5% of the speech. Although the recordings were made at Finnish-speaking universities, the share of speech by speakers whose mother tongue is Finnish is relatively low, at 28.5%.

IMPORTANT: This corpus contains personal data. By using the corpus, you agree to follow the guidelines of the Language Bank of Finland (for the link, see Documentation).

For detailed information on the license of the resource, see http://urn.fi/urn:nbn:fi:lb-2016042203 (in Finnish: http://urn.fi/urn:nbn:fi:lb-2016042204).
