OCR Languages

The Document Imaging SDK installer installs only the English language files for OCR. Other languages are installed with the Document Imaging OCR Language Pack. The language files are placed in the “Bin\BiOCR\tessdata” folder of the Document Imaging installation directory.

Download the OCR Language Pack for Document Imaging SDK version 12.70 or higher

Download the OCR Language Pack for Document Imaging SDK version 12.65 or older

 


The language file names start with the language code (see the codes in the table below). Every language has at least one language file ([language code].traineddata), and some languages have more than one. Note: osd.traineddata is not a language file; it is needed for orientation detection.
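The naming rule above can be used to find out which languages are present in a tessdata folder. The following is a minimal Python sketch, not part of the Document Imaging SDK: the function name installed_language_codes is hypothetical, and the folder path is supplied by the caller. It treats every top-level [name].traineddata file except osd.traineddata as a language file.

```python
from pathlib import Path


def installed_language_codes(tessdata_dir):
    """Return the sorted names of the *.traineddata files in tessdata_dir.

    Hypothetical helper, not an SDK function. osd.traineddata is skipped
    because it is used for orientation detection, not for a language.
    Subfolders are not searched.
    """
    codes = set()
    for path in Path(tessdata_dir).glob("*.traineddata"):
        if path.stem != "osd":
            codes.add(path.stem)
    return sorted(codes)
```

For instance, an application could check that "deu" is in the returned list before requesting German OCR. Note that a variant file such as jpn_vert.traineddata is reported under its own name.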

The OCR Language Pack installer automatically copies the language files to the Document Imaging installation directory.

For example, to use German for OCR, the German language files must be in the “Bin\BiOCR\tessdata” folder (the default location in the OCR Language Pack installation), and the German language code “deu” must be used in the OCR functions.

For example, the following files belong to the English language:

eng.traineddata

eng.user-patterns

eng.user-words
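As an illustration of the grouping shown above, the files that belong to one language can be collected by matching the language code followed by a dot. This is a sketch under that assumption only; files_for_language is a hypothetical name, not an SDK function, and the match deliberately ignores names such as epo_alt.traineddata when the code is “epo”.

```python
from pathlib import Path


def files_for_language(tessdata_dir, code):
    """Return the sorted top-level file names that start with '<code>.'.

    Hypothetical helper. Matching on the code plus a dot keeps codes that
    share a prefix (for example epo and epo_alt) apart.
    """
    prefix = code + "."
    return sorted(
        p.name
        for p in Path(tessdata_dir).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )
```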

 

Note that some languages have additional files in the Script subfolder. For example, the files for the Japanese language are:

jpn.traineddata

jpn_vert.traineddata

Script\Japanese.traineddata

Script\Japanese_vert.traineddata
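Because a language such as Japanese spans several files, including files in the Script subfolder, it can help to verify the whole set before running OCR. The sketch below assumes the Japanese file list given above; missing_files and JAPANESE_FILES are hypothetical names, and the file lists for other languages would have to be supplied by the caller.

```python
from pathlib import Path

# File set taken from the Japanese example above, relative to the tessdata folder.
JAPANESE_FILES = [
    "jpn.traineddata",
    "jpn_vert.traineddata",
    "Script/Japanese.traineddata",
    "Script/Japanese_vert.traineddata",
]


def missing_files(tessdata_dir, relative_paths):
    """Return the entries of relative_paths that are not files under tessdata_dir.

    Hypothetical helper, not an SDK function. The order of the input is kept.
    """
    root = Path(tessdata_dir)
    return [rel for rel in relative_paths if not (root / rel).is_file()]
```

An empty result means every listed file was found in the folder.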

 

The following table lists the languages available for OCR.

Language code (ISO 639-3)   Language name
afr                         Afrikaans
sqi                         Albanian
grc                         Ancient Greek Language
ara                         Arabic
aze                         Azerbaijani
eus                         Basque
bel                         Belarusian
ben                         Bengali
bul                         Bulgarian
cat                         Catalan
chr                         Cherokee
chi_sim                     Chinese (Simplified)
chi_tra                     Chinese (Traditional)
hrv                         Croatian
ces                         Czech
dan                         Danish
nld                         Dutch
eng                         English
epo                         Esperanto
epo_alt                     Esperanto alternative
est                         Estonian
fin                         Finnish
frk                         Frankish
fra                         French
glg                         Galician
deu                         German
ell                         Greek
heb                         Hebrew
hin                         Hindi
hun                         Hungarian
isl                         Icelandic
ind                         Indonesian
ita                         Italian
ita_old                     Italian (Old)
jpn                         Japanese
kan                         Kannada
kor                         Korean
lav                         Latvian
lit                         Lithuanian
mkd                         Macedonian
msa                         Malay
mal                         Malayalam
mlt                         Maltese
enm                         Middle English (1100-1500)
frm                         Middle French (ca. 1400-1600)
nor                         Norwegian
pol                         Polish
por                         Portuguese
ron                         Romanian
rus                         Russian
srp                         Serbian (Latin)
slk                         Slovakian
slv                         Slovenian
spa                         Spanish
spa_old                     Spanish (Old)
swa                         Swahili
swe                         Swedish
tam                         Tamil
tel                         Telugu
tha                         Thai
tur                         Turkish
ukr                         Ukrainian
vie                         Vietnamese