ocr_study

A simple project that demonstrates how to use a Python OCR library to recognize text in a PDF file.
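As a rough idea of what such a demo can look like, here is a minimal sketch. It is not the project's `main.py`; it assumes the `pdf2image` and `pytesseract` packages (which wrap Poppler and Tesseract), and the helper name `ocr_pdf` and the input file `0.pdf` in the working directory are only illustrative.

```python
from typing import List


def ocr_pdf(pdf_path: str, lang: str = "chi_sim", dpi: int = 300) -> List[str]:
    """Return the recognized text of each page of a PDF (hypothetical helper)."""
    # Assumption: pdf2image and pytesseract are the libraries in use.
    # They are imported lazily so this file loads even before they are installed.
    from pdf2image import convert_from_path
    import pytesseract

    # Render every PDF page to a PIL image (needs Poppler's pdftoppm on PATH).
    pages = convert_from_path(pdf_path, dpi=dpi)
    # Run Tesseract on each page image with the requested language data.
    return [pytesseract.image_to_string(page, lang=lang) for page in pages]


if __name__ == "__main__":
    for number, text in enumerate(ocr_pdf("0.pdf"), start=1):
        print(f"--- page {number} ---")
        print(text)
```

A higher `dpi` usually helps recognition of small text but makes page rendering slower.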

How to install:

Set the environment variable:

variable name: TESSDATA_PREFIX
variable value: D:/tessdata

Download the language data for Tesseract:

d:
cd tessdata
wget https://github.com/tesseract-ocr/tessdata/raw/main/chi_sim.traineddata
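To confirm that the variable and the downloaded file line up, a small check along these lines may help. It is only a sketch: the helper name `find_traineddata` is hypothetical, and it assumes `TESSDATA_PREFIX` points directly at the folder that contains `chi_sim.traineddata`.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_traineddata(lang: str, env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the path to <lang>.traineddata under TESSDATA_PREFIX, or None."""
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        return None  # the variable is not set (or is empty)
    candidate = Path(prefix) / f"{lang}.traineddata"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_traineddata("chi_sim")
    print(found if found else "chi_sim.traineddata not found; check TESSDATA_PREFIX")
```

Open a new terminal after setting the variable, since terminals that are already open do not pick up the change.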

wiki url
download url
Extract the downloaded file to "D:/tools/poppler/".

Set the PATH environment variable:
- add "D:/tools/poppler/Release-24.07.0-0/poppler-24.07.0/Library/bin" to PATH

Test the installation:
```
pdftoppm -v
```
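The same check can be done from Python, which is handy when the script is launched from an IDE whose PATH may differ from the terminal's. This is a minimal sketch using only the standard library; the helper name `find_tool` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def find_tool(name: str = "pdftoppm") -> Optional[str]:
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("pdftoppm not found on PATH; check the Poppler bin folder entry")
    else:
        print(f"pdftoppm found at {path}")
        # Run `pdftoppm -v` and show whichever stream carries the version text.
        result = subprocess.run([path, "-v"], capture_output=True, text=True)
        print(result.stderr or result.stdout)
```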
