The Newspaper and Periodical OCR Corpus of the National Library of Finland (1875-1920)


Name in Finnish: Kansalliskirjaston sanoma- ja aikakauslehtikokoelman OCR-korpus (1875-1920)

Short name: Digilib-1920-dl

Persistent Identifier of this resource:

http://urn.fi/urn:nbn:fi:lb-201801192

Access location:

The corpus is available in Kielipankki, the Language Bank of Finland (download link: https://korp.csc.fi/download/Digilib/1875_1920/)

This corpus consists of the OCR results for the newspapers and periodicals published between 1875 and 1920 that have been digitized by the National Library of Finland. Note that parts of the resource are protected by copyright.


The full corpus, as held by FIN-CLARIN, is organized into eleven branches named arc01, ..., arc11. Each document is stored as a zip archive containing the scanned image files in several resolutions and the OCR results as XML documents. This distribution has the same structure but contains only the OCR results.

Each of the distribution files arc01.zip, ..., arc11.zip contains the material extracted from one branch of the full corpus. The distribution file "digilib_pub_1875-1920_every.zip" contains all 11 branches in one archive.
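As an illustration of the archive layout described above, here is a minimal Python sketch that walks one distribution file and yields its OCR XML documents. It assumes, because the description does not say exactly where the XML files sit, that they may appear either directly in an arcNN.zip file or inside per-document zip archives nested within it. The names `BRANCH_FILES`, `iter_ocr_xml` and `count_ocr_xml` are hypothetical and are not part of the corpus distribution.

```python
import io
import zipfile
from typing import Iterator, Tuple

# File names of the eleven per-branch distribution files (arc01.zip ... arc11.zip).
BRANCH_FILES = [f"arc{i:02d}.zip" for i in range(1, 12)]


def iter_ocr_xml(zf: zipfile.ZipFile, prefix: str = "") -> Iterator[Tuple[str, bytes]]:
    """Yield (path, raw bytes) for every .xml member, descending into nested .zip members.

    Assumption (not stated in the corpus description): XML files may be stored
    directly in the distribution zip or inside per-document zip archives within it.
    """
    for name in zf.namelist():
        lower = name.lower()
        if lower.endswith(".xml"):
            yield prefix + name, zf.read(name)
        elif lower.endswith(".zip"):
            # Open the nested per-document archive in memory and recurse into it.
            with zipfile.ZipFile(io.BytesIO(zf.read(name))) as inner:
                yield from iter_ocr_xml(inner, prefix + name + "/")


def count_ocr_xml(source) -> int:
    """Count the OCR XML documents in one distribution file (a path or a file-like object)."""
    with zipfile.ZipFile(source) as zf:
        return sum(1 for _ in iter_ocr_xml(zf))
```

For example, once arc01.zip has been downloaded, `count_ocr_xml("arc01.zip")` would report how many XML files the first branch contains. Reading nested archives into memory keeps the sketch short; for very large per-document archives, extracting them to disk first may be preferable.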

Change log:
10.4.2018: README.txt and license.txt added to the zip files.
20.4.2018: Short name corrected from Digilib-Pub-1920-dl to Digilib-1920-dl.


The Newspaper and Periodical OCR Corpus of the National Library of Finland (1875-1920) can be downloaded from the Language Bank of Finland's download service: https://korp.csc.fi/download/Digilib/1875_1920/

This corpus consists of the OCR results of newspapers and periodicals published between 1875 and 1920 and digitized by the National Library of Finland. Part of the corpus is subject to copyright and therefore cannot be freely distributed.

