Dictionary for Old Church Slavonic Optical Character Recognition
The specialized public database Dictionary for Old Church Slavonic Optical Character Recognition was created within the project Gorazd: An Old Church Slavonic Digital Hub (carried out under the NAKI II programme of the Ministry of Culture of the Czech Republic, DG16P02H024, 2016-2020).
The goal of the database is to improve the accuracy of optical character recognition (OCR) of printed Old Church Slavonic Cyrillic texts, such as dictionaries or editions. The database was developed during the OCR-based digitization of the Old Church Slavonic Dictionary (OCSD). It is intended for use in ABBYY FineReader 12 or a newer version.
The database includes more than 130 000 unique Old Church Slavonic lexical items excerpted from the OCSD (Czech abbreviation: SJS). It thus contains both normalized lemmas and manuscript citations from canonical Old Church Slavonic as well as later Church Slavonic texts.
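Before importing the word list, it can be useful to check how many unique items it actually contains. The following is only a minimal sketch: it assumes that the word list is a plain-text file with one lexical item per line and that it is encoded in UTF-8, neither of which is stated above. The function names unique_entries and load_word_list are hypothetical helpers, not part of the package.

```python
def unique_entries(lines):
    """Return the set of non-empty entries, with surrounding whitespace removed."""
    return {line.strip() for line in lines if line.strip()}


def load_word_list(path, encoding="utf-8-sig"):
    """Read a word list from disk.

    ASSUMPTION: one entry per line, UTF-8 (with or without a byte order mark).
    Change the encoding argument if the file turns out to be stored differently.
    """
    with open(path, encoding=encoding) as handle:
        return unique_entries(handle)


# Small in-memory sample standing in for the real file (hypothetical data).
sample = ["богъ\n", "слово\n", "\n", "богъ\n"]
print(len(unique_entries(sample)))  # prints 2
```

With the unzipped package at hand, len(load_word_list("gorazd_ocr-1.txt")) should give a figure comparable to the number quoted above, provided the assumptions about the file layout hold.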
User manual (for ABBYY FineReader 12):
- Unzip the contents of the package gorazd_ocr-1.zip.
- Launch the application ABBYY FineReader 12 and choose Nástroje in the main menu and then Jazykový editor.
- Open the dialogue Nový… and choose Vytvořit nový jazyk na základě existujícího jazyka. In the menu, choose Ruština (Starý Pravopis).
- Choose a name for the language, e. g. Old Church Slavonic.
- In the field Abeceda click on … and mark the characters that need to be recognized in the document. We recommend including in the alphabet only the characters that actually appear in the text; this improves recognition accuracy.
- In the menu Slovník choose Uživatelský slovník and click on the button Upravit….
- Click on the button Importovat… and choose the file gorazd_ocr-1.txt. Loading the file can take a while.
- As soon as the loading has finished, you can close the dialogue windows and choose your defined language (e. g. Old Church Slavonic) as Jazyk dokumentu.
- In the menu Nástroje, dialogue Možnosti, panel Číst, enable the function Číst s výukou and allow the use of user models. This is necessary for training the recognition of characters that are not included in the source language.
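The recommendation to restrict the alphabet to the characters that really occur in the text can be supported by a small character inventory. The sketch below is an illustration only: it assumes that a sample of the text to be recognized (or the word list itself) is available as a Unicode string, and the helper names character_inventory and describe are hypothetical. The characters still have to be marked by hand in the Abeceda dialogue.

```python
import unicodedata
from collections import Counter


def character_inventory(text):
    """Count every non-whitespace character that occurs in the text."""
    return Counter(ch for ch in text if not ch.isspace())


def describe(inventory):
    """Return (code point, Unicode name, count) rows sorted by code point."""
    rows = []
    for ch, count in sorted(inventory.items()):
        name = unicodedata.name(ch, "UNNAMED")
        rows.append((f"U+{ord(ch):04X}", name, count))
    return rows


# Hypothetical sample; replace it with text from the document being processed.
sample = "богъ слово"
for code, name, count in describe(character_inventory(sample)):
    print(code, name, count)
```

Listing code points and Unicode names rather than bare glyphs makes it easier to tell apart similar-looking letters and combining diacritics when selecting characters in the dialogue.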
Requirements: ABBYY FineReader 12 or a newer version.
© 2020, Institute of Slavonic Studies of the Czech Academy of Sciences.