The Document Imaging SDK installer installs only the English language files for the OCR. The other languages can be installed with the Document Imaging OCR Language Pack. The language files are placed in the “Bin\BiOCR\tessdata” folder of the Document Imaging installation directory.
Download the OCR Language Pack for Document Imaging SDK version 12.70 or higher
Download the OCR Language Pack for Document Imaging SDK version 12.65 or older
The language file names start with the language code (see the codes in the table below). Every language has at least one language file ([language code].traineddata). Some languages have more than one language file. Note: osd.traineddata is not a language file; it is needed for orientation detection.
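Because every language has a [language code].traineddata file, the installed languages can be inferred from the file names in the tessdata folder. The following is a minimal sketch, not part of the SDK: the helper name `installed_language_codes` is hypothetical, and it assumes you pass the path of the “Bin\BiOCR\tessdata” folder. It skips osd.traineddata, as described above, and does not look into subfolders.

```python
from pathlib import Path


def installed_language_codes(tessdata_dir):
    """Return the codes of the *.traineddata files found directly in tessdata_dir.

    The code is the file name without the .traineddata extension, so variant
    models such as jpn_vert are reported as separate entries. osd is skipped
    because it is used for orientation detection, not as a language.
    Subfolders (for example Script) are not scanned.
    """
    codes = {
        path.stem
        for path in Path(tessdata_dir).glob("*.traineddata")
        if path.is_file() and path.stem != "osd"
    }
    return sorted(codes)
```

On an installation without the OCR Language Pack, the result would be expected to contain only the English code, since the SDK installer ships only the English files.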
The OCR Language Pack automatically copies the language files to the Document Imaging installation directory.
For example, to use German for the OCR, the German language files have to be placed in the “Bin\BiOCR\tessdata” folder (the default location used by the OCR Language Pack installation), and the German language code “deu” has to be used in the OCR functions.
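A quick check before passing “deu” to the OCR functions can turn a missing language pack into a clear error. This is a minimal sketch that relies only on the folder layout described here; the OCR function names are not covered in this section, so it stops at verifying that deu.traineddata exists. `tessdata_dir` and `require_language` are hypothetical helper names, and `install_dir` stands for your Document Imaging installation directory.

```python
from pathlib import Path


def tessdata_dir(install_dir):
    """Build the default tessdata path: <install_dir>/Bin/BiOCR/tessdata."""
    return Path(install_dir) / "Bin" / "BiOCR" / "tessdata"


def require_language(tessdata, code):
    """Return the path of <code>.traineddata, or raise if it is missing."""
    model = Path(tessdata) / f"{code}.traineddata"
    if not model.is_file():
        raise FileNotFoundError(
            f"OCR language '{code}' is not installed: {model} is missing"
        )
    return model
```

For German, calling `require_language(tessdata_dir(install_dir), "deu")` before the OCR call would fail early if the language pack has not been installed.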
For example, the following files belong to the English language:
eng.traineddata
eng.user-patterns
eng.user-words
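Since all files of a language share the “[language code].” prefix, they can be collected with a simple file-name pattern, for example to confirm that all English files listed above are present. This is a sketch under that naming assumption; `language_files` is a hypothetical helper, not an SDK function.

```python
from pathlib import Path


def language_files(tessdata, code):
    """Return the names of files in tessdata whose name starts with '<code>.'.

    For eng this matches eng.traineddata, eng.user-patterns and
    eng.user-words. Variant models such as jpn_vert.traineddata and files in
    subfolders are not matched, because the pattern requires a dot directly
    after the code.
    """
    return sorted(p.name for p in Path(tessdata).glob(f"{code}.*") if p.is_file())
```

Note that glob matching is case-insensitive on Windows but case-sensitive on most other systems.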
Note that some languages have additional files in the Script subfolder. For example, the files for the Japanese language are:
jpn.traineddata
jpn_vert.traineddata
Script\Japanese.traineddata
Script\Japanese_vert.traineddata
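Because languages such as Japanese also need files in the Script subfolder, a prefix-based check is not enough for them. One option, sketched below under the assumption that you know the required file list for each language (here the Japanese list from above), is to check the relative paths explicitly. `JAPANESE_FILES` and `missing_files` are hypothetical names; forward slashes are used so the same strings work with pathlib on Windows and other systems.

```python
from pathlib import Path

# Files listed for Japanese in this section, relative to the tessdata folder.
JAPANESE_FILES = [
    "jpn.traineddata",
    "jpn_vert.traineddata",
    "Script/Japanese.traineddata",
    "Script/Japanese_vert.traineddata",
]


def missing_files(tessdata, required):
    """Return the entries of required that do not exist as files under tessdata."""
    root = Path(tessdata)
    return [rel for rel in required if not (root / rel).is_file()]
```

An empty result means every listed file is in place.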
The following table contains the available languages for the OCR.
Language code (ISO 639-3) | Language name
afr | Afrikaans
sqi | Albanian
grc | Ancient Greek
ara | Arabic
aze | Azerbaijani
eus | Basque
bel | Belarusian
ben | Bengali
bul | Bulgarian
cat | Catalan
chr | Cherokee
chi_sim | Chinese (Simplified)
chi_tra | Chinese (Traditional)
hrv | Croatian
ces | Czech
dan | Danish
nld | Dutch
eng | English
epo | Esperanto
epo_alt | Esperanto alternative
est | Estonian
fin | Finnish
frk | Frankish
fra | French
glg | Galician
deu | German
ell | Greek
heb | Hebrew
hin | Hindi
hun | Hungarian
isl | Icelandic
ind | Indonesian
ita | Italian
ita_old | Italian (Old)
jpn | Japanese
kan | Kannada
kor | Korean
lav | Latvian
lit | Lithuanian
mkd | Macedonian
msa | Malay
mal | Malayalam
mlt | Maltese
enm | Middle English (1100-1500)
frm | Middle French (ca. 1400-1600)
nor | Norwegian
pol | Polish
por | Portuguese
ron | Romanian
rus | Russian
srp | Serbian (Latin)
slk | Slovakian
slv | Slovenian
spa | Spanish
spa_old | Spanish (Old)
swa | Swahili
swe | Swedish
tam | Tamil
tel | Telugu
tha | Thai
tur | Turkish
ukr | Ukrainian
vie | Vietnamese
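When an application lets users pick a language by name, the table above can be turned into a lookup from language name to the code expected by the OCR functions. The sketch below copies only a few rows from the table as an illustration; `LANGUAGE_CODES` and `language_code` are hypothetical names, and you would add the rows your application needs.

```python
# A few rows from the language table; extend with the rows you need.
LANGUAGE_CODES = {
    "English": "eng",
    "German": "deu",
    "French": "fra",
    "Japanese": "jpn",
    "Chinese (Simplified)": "chi_sim",
    "Chinese (Traditional)": "chi_tra",
}


def language_code(name):
    """Look up the OCR language code for a language name (case-insensitive)."""
    for language, code in LANGUAGE_CODES.items():
        if language.casefold() == name.casefold():
            return code
    raise KeyError(f"No OCR language code listed for {name!r}")
```

The returned code still has to match an installed [language code].traineddata file in the tessdata folder.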