Langchain-Chatchat/test_image.py at 6a273501ee281ab264064dbb56a672bbde3b8763 - Langchain-Chatchat - Gitea4PDT

RYDE-WORK/Langchain-Chatchat

mirror of https://github.com/RYDE-WORK/Langchain-Chatchat.git synced 2026-01-26 16:53:36 +08:00

zhenkaivip d2716addd6

Implement UnstructuredPaddlePDFLoader and UnstructuredPaddleImageLoader using paddleocr (#344)

* jpg and png ocr

* fix

* write docs to tmp file

* fix

* image loader

* fix

* fix

* add pdf_loader

* fix

* update INSTALL.md

---------

Co-authored-by: imClumsyPanda <littlepanda0716@gmail.com>

2023-05-13 11:13:40 +08:00

13 lines

297 B

Python


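# Smoke test: OCR a sample image with the PaddleOCR-backed loader and print each result.
import nltk

# Only NLTK_DATA_PATH is used here, so import it explicitly instead of via `*`.
from configs.model_config import NLTK_DATA_PATH

# Search the project's NLTK data directory before the default locations.
nltk.data.path = [NLTK_DATA_PATH] + nltk.data.path

# Imported after the NLTK path is set, matching the order of the original script.
from loader import UnstructuredPaddleImageLoader

filepath = "./img/test.jpg"
# mode="elements" is passed through unchanged from the original script.
loader = UnstructuredPaddleImageLoader(filepath, mode="elements")
docs = loader.load()
for doc in docs:
    print(doc)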