Langchain-Chatchat/test_pdf.py at 82f1b7f2e3e7ed4ba530156e4cfea5ec4d1e90c9 - Langchain-Chatchat - Gitea4PDT

RYDE-WORK/Langchain-Chatchat

mirror of https://github.com/RYDE-WORK/Langchain-Chatchat.git synced 2026-01-28 01:33:17 +08:00

zhenkaivip d2716addd6

Implement UnstructuredPaddlePDFLoader and UnstructuredPaddleImageLoader using paddleocr (#344)

* jpg and png ocr

* fix

* write docs to tmp file

* fix

* image loader

* fix

* fix

* add pdf_loader

* fix

* update INSTALL.md

---------

Co-authored-by: imClumsyPanda <littlepanda0716@gmail.com>

2023-05-13 11:13:40 +08:00

13 lines

292 B

Python


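import nltk

from configs.model_config import NLTK_DATA_PATH

# Search the project's NLTK data directory before the default locations.
nltk.data.path = [NLTK_DATA_PATH] + nltk.data.path

# Imported after the NLTK path is set, matching the original order.
from loader import UnstructuredPaddlePDFLoader

filepath = "docs/test.pdf"

# mode="elements" keeps each extracted element as a separate document.
loader = UnstructuredPaddlePDFLoader(filepath, mode="elements")
docs = loader.load()
for doc in docs:
    print(doc)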
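
The script above only prints each loaded document. As a rough sketch of a next step, the snippet below tallies documents by element type. It assumes each document exposes a `metadata` dict that may carry a `"category"` key; whether `UnstructuredPaddlePDFLoader` fills that key is not confirmed by this file. `FakeDoc` and `count_by_category` are hypothetical names used only for illustration, so the example runs without the loader or paddleocr installed.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a loaded document (text plus metadata)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def count_by_category(docs: Iterable) -> Counter:
    """Count documents per metadata["category"]; use "unknown" when the key is absent."""
    return Counter(doc.metadata.get("category", "unknown") for doc in docs)


# Sample data standing in for the result of loader.load() (assumed shape).
sample = [
    FakeDoc("Title line", {"category": "Title"}),
    FakeDoc("Body text", {"category": "NarrativeText"}),
    FakeDoc("More body text", {"category": "NarrativeText"}),
    FakeDoc("No category here"),
]
summary = count_by_category(sample)
print(summary)
```

With real output, `count_by_category(docs)` could replace the print loop when a quick overview of what the OCR step extracted is more useful than the full text.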