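Introduction
Tesseract-OCR is an open-source optical character recognition engine originally developed at HP Labs and now maintained by Google. It supports more than 100 languages and can be trained on custom samples.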
Preparation
Dependencies
    yum install autoconf automake libtool libjpeg-devel libpng-devel libtiff-devel zlib-devel
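Before building, it can help to confirm that the build tools are actually on the PATH. The following is a minimal sketch; the function name `missing_tools` is made up for illustration, and `make` is assumed to be needed because the build steps in this guide call it.

```shell
# Print the name of every command in the argument list that is not on PATH.
# (missing_tools is a hypothetical helper, not part of any package.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage: list whichever build tools still need to be installed.
# missing_tools autoconf automake libtool make
```

If the call prints nothing, every listed tool was found. The -devel packages are libraries rather than commands, so this check does not cover them.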
Download
    wget https://github.com/tesseract-ocr/tesseract/archive/4.1.0.tar.gz
    wget https://github.com/DanBloomberg/leptonica/releases/download/1.78.0/leptonica-1.78.0.tar.gz
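An interrupted download can leave a truncated archive that only fails later during unpacking. As a rough sketch, the hypothetical helper below (the name `archive_ok` is an assumption) checks that a file exists and that tar can list it as a gzip archive. It does not verify a checksum.

```shell
# Return success only if the file exists and tar can list it as a gzip archive.
# (archive_ok is a hypothetical helper name.)
archive_ok() {
    [ -f "$1" ] && tar -tzf "$1" >/dev/null 2>&1
}

# Usage: check both downloads before unpacking.
# for f in 4.1.0.tar.gz leptonica-1.78.0.tar.gz; do
#     archive_ok "$f" || echo "download looks broken: $f"
# done
```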
Installation
Leptonica
1. Build and install
    tar -xzvf leptonica-1.78.0.tar.gz
    cd leptonica-1.78.0
    ./configure --prefix=/usr/local/leptonica
    make && make install
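The Tesseract build later locates Leptonica through pkg-config, so it is worth checking that the install produced a pkg-config file. This sketch assumes the file is named `lept.pc` and lands in `lib/pkgconfig` under the chosen prefix; the helper name `lept_pc_path` is hypothetical.

```shell
# Print the path of lept.pc under the given prefix, or fail if it is absent.
# Assumption: the install put the file in <prefix>/lib/pkgconfig.
lept_pc_path() {
    pc="$1/lib/pkgconfig/lept.pc"
    [ -f "$pc" ] && printf '%s\n' "$pc"
}

# Usage:
# lept_pc_path /usr/local/leptonica || echo "lept.pc not found; check the install"
```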
2. Environment variables
    # Open /etc/profile and append the configuration below
    vi /etc/profile

    PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/leptonica/lib/pkgconfig
    export PKG_CONFIG_PATH
    CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/usr/local/leptonica/include/leptonica
    export CPLUS_INCLUDE_PATH
    C_INCLUDE_PATH=$C_INCLUDE_PATH:/usr/local/leptonica/include/leptonica
    export C_INCLUDE_PATH
    LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/leptonica/lib
    export LD_LIBRARY_PATH
    LIBRARY_PATH=$LIBRARY_PATH:/usr/local/leptonica/lib
    export LIBRARY_PATH
    LIBLEPT_HEADERSDIR=/usr/local/leptonica/include/leptonica
    export LIBLEPT_HEADERSDIR

    # Apply the configuration
    source /etc/profile
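The plain `VAR=$VAR:/dir` form leaves a leading colon when the variable was empty and adds a duplicate entry each time /etc/profile is sourced again. One possible way around both, sketched with a hypothetical helper named `append_path`:

```shell
# Append a directory to a colon-separated list, skipping duplicates and
# avoiding a leading colon when the list is empty.
# $1 = current value of the variable, $2 = directory to add; prints the new value.
# (append_path is a hypothetical helper name.)
append_path() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;       # already present: unchanged
        "::")     printf '%s\n' "$2" ;;       # list was empty
        *)        printf '%s\n' "$1:$2" ;;    # normal append
    esac
}

# Usage:
# PKG_CONFIG_PATH=$(append_path "$PKG_CONFIG_PATH" /usr/local/leptonica/lib/pkgconfig)
# export PKG_CONFIG_PATH
```

The same call pattern would apply to the other colon-separated variables, such as LD_LIBRARY_PATH and LIBRARY_PATH.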
Tesseract
Build and install
    tar -xzvf 4.1.0.tar.gz
    cd tesseract-4.1.0
    # The GitHub source archive has no configure script; generate it first
    ./autogen.sh
    ./configure --prefix=/usr/local/tesseract
    make && make install
Environment variables
    # Open /etc/profile and append the configuration below
    vi /etc/profile

    PATH=$PATH:/usr/local/tesseract/bin
    export PATH

    # Apply the configuration
    source /etc/profile
Languages
    # All languages: https://github.com/tesseract-ocr/tessdata
    # Download a language (English as an example)
    cd /usr/local/tesseract/share/tessdata
    wget https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/eng.traineddata
    # List available languages
    tesseract --list-langs
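When more than one language is needed, the download step can be wrapped in a loop. The sketch below reuses the base URL from the command above; the names `TESSDATA_URL`, `traineddata_url` and `fetch_langs` are made up for illustration, and each language code is assumed to match a file name in the tessdata repository.

```shell
# Base URL taken from the single-language download command.
TESSDATA_URL="https://raw.githubusercontent.com/tesseract-ocr/tessdata/master"

# Print the download URL for one language code, e.g. eng or chi_sim.
traineddata_url() {
    printf '%s/%s.traineddata\n' "$TESSDATA_URL" "$1"
}

# Download each listed language into the given tessdata directory.
# $1 = target directory, remaining arguments = language codes.
fetch_langs() {
    dir="$1"; shift
    for lang in "$@"; do
        wget -P "$dir" "$(traineddata_url "$lang")" || return 1
    done
}

# Usage:
# fetch_langs /usr/local/tesseract/share/tessdata eng chi_sim
```

The loop stops at the first failed download, so a mistyped language code is noticed immediately.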
Done
    # Version
    tesseract -v
    # Test
    tesseract test.jpg stdout -l eng
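The single-image test extends naturally to a whole folder. This is only a sketch: the helper name `ocr_dir` is hypothetical, it handles only files ending in .jpg, and it relies on tesseract writing its result to the output base name plus .txt.

```shell
# Run OCR on every .jpg in a directory, writing <name>.txt next to each image.
# $1 = directory, $2 = language code (defaults to eng).
# (ocr_dir is a hypothetical helper name.)
ocr_dir() {
    for img in "$1"/*.jpg; do
        [ -f "$img" ] || continue        # no match: the glob stays literal
        tesseract "$img" "${img%.jpg}" -l "${2:-eng}" || return 1
    done
}

# Usage:
# ocr_dir ./scans eng
```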