Langchain-Chatchat/test_pdf.py
zhenkaivip dd93837343
Implement using paddleocr (#342)
* jpg and png ocr

* fix

* write docs to tmp file

* fix

* [BUGFIX] local_doc_qa.py line 172: logging have no end args. (#323)

* image loader

* fix

* fix

* update api.py

* update api.py

* update api.py

* update README.md

* update api.py

* add pdf_loader

* fix

---------

Co-authored-by: RainGather <3255329+RainGather@users.noreply.github.com>
Co-authored-by: imClumsyPanda <littlepanda0716@gmail.com>
2023-05-13 08:45:17 +08:00


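import nltk

from configs.model_config import NLTK_DATA_PATH
from loader import UnstructuredPaddlePDFLoader

# Search the project's NLTK data directory before the default locations.
nltk.data.path = [NLTK_DATA_PATH] + nltk.data.path

filepath = "docs/test.pdf"

# mode="elements" keeps each extracted element as a separate document.
loader = UnstructuredPaddlePDFLoader(filepath, mode="elements")
docs = loader.load()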
for doc in docs:
    print(doc)