Langchain-Chatchat/test_image.py
zhenkaivip d2716addd6
Use paddleocr to implement UnstructuredPaddlePDFLoader and UnstructuredPaddleImageLoader (#344)
* jpg and png ocr

* fix

* write docs to tmp file

* fix

* image loader

* fix

* fix

* add pdf_loader

* fix

* update INSTALL.md

---------

Co-authored-by: imClumsyPanda <littlepanda0716@gmail.com>
2023-05-13 11:13:40 +08:00

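import nltk

from configs.model_config import NLTK_DATA_PATH

# Search the project's bundled NLTK data before NLTK's default locations.
nltk.data.path = [NLTK_DATA_PATH] + nltk.data.path

# Import the loader only after the NLTK search path is configured.
from loader import UnstructuredPaddleImageLoader

filepath = "./img/test.jpg"  # sample image to run through OCR
# mode="elements" keeps each element extracted by unstructured as its own Document.
loader = UnstructuredPaddleImageLoader(filepath, mode="elements")
docs = loader.load()
for doc in docs:
    print(doc)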
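The commit message mentions writing the OCR output to a temporary file. Below is a minimal sketch of that step, assuming PaddleOCR's `ocr()` result shape of one entry per page, each either `None` (nothing detected) or a list of `[box, (text, confidence)]` items. The helper names `ocr_result_to_lines` and `write_ocr_text` and the `tmp_files` directory are hypothetical; they are not taken from the repository's `loader` module. The sketch works on an already-computed result, so it runs without PaddleOCR installed.

```python
import os
from typing import Any, List, Optional, Sequence


def ocr_result_to_lines(result: Sequence[Optional[Sequence[Any]]]) -> List[str]:
    """Flatten a PaddleOCR-style result into plain text lines.

    Assumed shape: one entry per page/image, each either None or a list of
    [box, (text, confidence)] items. Confidence scores are ignored here.
    """
    lines: List[str] = []
    for page in result:
        if not page:  # skip pages where no text was detected
            continue
        for _box, (text, _score) in page:
            lines.append(text)
    return lines


def write_ocr_text(result, image_path: str, tmp_dir: str = "tmp_files") -> str:
    """Hypothetical helper: write OCR text to <image dir>/<tmp_dir>/<image name>.txt.

    Returns the path of the written text file.
    """
    out_dir = os.path.join(os.path.dirname(image_path), tmp_dir)
    os.makedirs(out_dir, exist_ok=True)
    txt_path = os.path.join(out_dir, os.path.basename(image_path) + ".txt")
    with open(txt_path, "w", encoding="utf-8") as fout:
        fout.write("\n".join(ocr_result_to_lines(result)))
    return txt_path
```

Keeping the OCR call separate from the text handling means the resulting `.txt` path could be passed to a text partitioner such as unstructured's `partition_text(filename=...)`, which is one plausible way to obtain element-level Documents; whether `UnstructuredPaddleImageLoader` does exactly this is not shown in this file.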