mirror of https://github.com/RYDE-WORK/Langchain-Chatchat.git synced 2026-01-28 01:33:17 +08:00

Implement UnstructuredPaddlePDFLoader and UnstructuredPaddleImageLoader using paddleocr (#344)

* jpg and png ocr

* fix

* write docs to tmp file

* fix

* image loader

* fix

* fix

* add pdf_loader

* fix

* update INSTALL.md

---------

Co-authored-by: imClumsyPanda <littlepanda0716@gmail.com>

2023-05-13 11:13:40 +08:00


Installation

Environment check

# First, make sure Python 3.8 or later is installed on your machine
$ python --version
Python 3.8.13

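# If your version is older, you can create an environment with conda
$ conda create -p /your_path/env_name python=3.8

# Activate the environment (conda 4.4+; older releases use `source activate`)
$ conda activate /your_path/env_name

# Deactivate the environment (takes no path argument; older releases use `source deactivate`)
$ conda deactivate

# Remove the environment
$ conda env remove -p /your_path/env_name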
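
The version check above can also be scripted, which is handy in setup scripts. The following is a minimal sketch, not part of the repository; the names `MIN_VERSION` and `python_is_supported` are illustrative assumptions.

```python
import sys

# Minimum version stated in this guide (Python 3.8 or later).
MIN_VERSION = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when (major, minor) is at least the minimum version."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} is supported")
    else:
        print(f"Python {current} is too old; 3.8 or later is required")
```

Comparing tuples avoids parsing the output of `python --version`, and it handles two-digit minor versions such as 3.10 correctly.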

Project dependencies

# Clone the repository
$ git clone https://github.com/imClumsyPanda/langchain-ChatGLM.git

# Enter the directory
$ cd langchain-ChatGLM

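# To use paddleocr, uninstall detectron2 first to avoid a conflict over the `tools` module
$ pip uninstall detectron2

# Install the dependencies
$ pip install -r requirements.txt

# Verify that paddleocr works; the first run downloads about 18 MB of models to ~/.paddleocr
$ python test_image.py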
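
Before running the OCR test, it may help to confirm that `paddleocr` is importable and that `detectron2` is really gone. The sketch below uses only the standard library; the function name `find_dependency_problems` and its defaults are assumptions made for illustration, not part of the repository.

```python
import importlib.util


def find_dependency_problems(required=("paddleocr",), conflicting=("detectron2",)):
    """Return a list of human-readable problems; an empty list means all is well.

    Only top-level package names are supported here.
    """
    problems = []
    for name in required:
        # find_spec returns None when a top-level package cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing: {name}")
    for name in conflicting:
        if importlib.util.find_spec(name) is not None:
            problems.append(f"conflicting: {name}")
    return problems


if __name__ == "__main__":
    issues = find_dependency_problems()
    print("\n".join(issues) if issues else "dependencies look consistent")
```

`importlib.util.find_spec` only locates a package without importing it, so the check does not trigger the model download.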

Note: when ingesting unstructured files with langchain.document_loaders.UnstructuredFileLoader, you may need to install additional dependency packages as described in the langchain documentation.
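
For orientation, loading a file through the loader named in the note might look like the sketch below. It assumes a langchain release that still exposes `langchain.document_loaders`; the wrapper name `load_unstructured` and the example path are hypothetical.

```python
def load_unstructured(path, mode="single"):
    """Load one file into a list of langchain Document objects.

    Hypothetical helper: langchain is imported lazily, so a missing or
    incomplete installation surfaces as an ImportError at call time.
    """
    from langchain.document_loaders import UnstructuredFileLoader

    # mode="single" yields one Document per file; "elements" splits it up.
    loader = UnstructuredFileLoader(path, mode=mode)
    return loader.load()


if __name__ == "__main__":
    # "docs/example.pdf" is a placeholder path, not a file in the repository.
    for doc in load_unstructured("docs/example.pdf"):
        print(doc.metadata, len(doc.page_content))
```

If the call fails with an import error from `unstructured` or one of its parsers, that is the situation the note describes, and the missing package needs to be installed separately.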
