liunux4odoo d316efe8d3
release 0.2.6 (#1815)
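## 🛠 New Features

- Support for Baichuan online models (@hzg0601 @liunux4odoo in #1623)
- Support for Azure OpenAI, Claude, and other models built into Langchain (@zRzRzRzRzRzRzR in #1808)
- Major Agent update: more tools, switchable prompts, and knowledge base retrieval (@zRzRzRzRzRzRzR in #1626 #1666 #1785)
- Longer conversation history for 32k models (@zRzRzRzRzRzRzR in #1629 #1630)
- `*_chat` endpoints support the `max_tokens` parameter (@liunux4odoo in #1744)
- Separate the API (backend) from the WebUI (frontend) (@liunux4odoo in #1772)
- Support for the Zilliz vector store (@zRzRzRzRzRzRzR in #1785)
- Support for the Metaphor search engine (@liunux4odoo in #1792)
- Support for p-tuning models (@hzg0601 in #1810)
- Updated and improved the documentation and Wiki (@imClumsyPanda @zRzRzRzRzRzRzR @glide-the in #1680 #1811)

## 🐞 Bug Fixes

- Fix bge-* model match scores exceeding 1 (@zRzRzRzRzRzRzR in #1652)
- Fix errors when the system proxy is empty (@glide-the in #1654)
- Fix `d == self.d assert error` when rebuilding the knowledge base (@liunux4odoo in #1766)
- Fix incorrect chat history messages (@liunux4odoo in #1801)
- Fix a bug that prevented calling OpenAI (@zRzRzRzRzRzRzR in #1808)
- Fix chat errors on Windows when BIND_HOST=0.0.0.0 (@hzg0601 in #1810)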
2023-10-20 23:16:06 +08:00


![](img/logo-long-chatchat-trans-v2.png)
🌍 [中文文档](README.md)
📃 **LangChain-Chatchat** (formerly Langchain-ChatGLM):
An LLM application that implements knowledge-base and search-engine based QA, built on Langchain and open-source or
remote LLM APIs.

---
## Table of Contents
- [Introduction](#introduction)
- [Pain Points Addressed](#pain-points-addressed)
- [Quick Start](#quick-start)
  - [1. Environment Setup](#environment-setup)
  - [2. Model Download](#model-download)
  - [3. Initializing the Knowledge Base and Config File](#initializing-the-knowledge-base-and-config-file)
  - [4. One-Click Launch](#one-click-launch)
  - [5. Example of Launch Interface](#example-of-launch-interface)
- [Contact Us](#contact-us)
## Introduction
🤖️ A Q&A application based on a local knowledge base, implemented using the ideas
of [langchain](https://github.com/hwchase17/langchain). The goal is to build a KBQA (Knowledge-Based Q&A) solution that
is friendly to Chinese-language scenarios and open-source models, and that can run both offline and online.

💡 Inspired by [document.ai](https://github.com/GanymedeNil/document.ai)
and [ChatGLM-6B Pull Request](https://github.com/THUDM/ChatGLM-6B/pull/216), we built a local knowledge base question
answering application whose entire pipeline can run on open-source models or remote LLM APIs. In the latest version of
this project, [FastChat](https://github.com/lm-sys/FastChat) is used to access Vicuna, Alpaca, LLaMA, Koala, RWKV and
many other models. Relying on [langchain](https://github.com/langchain-ai/langchain), this project supports calling
services through an API built on [FastAPI](https://github.com/tiangolo/fastapi), or through a WebUI built
on [Streamlit](https://github.com/streamlit/streamlit).

✅ Relying on open-source LLM and Embedding models, this project can achieve fully **offline private
deployment**. The project also supports calling the OpenAI GPT API and the Zhipu API, and will continue to expand
access to more models and remote APIs in the future.

⛓️ The implementation principle of this project is shown in the diagram below. The main process is: loading files ->
reading text -> text segmentation -> text vectorization -> question vectorization -> matching the `top-k` text vectors
most similar to the question vector -> adding the matched text to the `prompt` as context together with the question ->
submitting the prompt to the `LLM` to generate an answer.

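
The retrieval steps above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the project's implementation: `embed` is a hypothetical bag-of-words stand-in for a real Embedding model such as m3e-base, `split_text` is a naive sentence splitter, and the prompt template is invented for the example. The final step, submitting the prompt to the LLM, is left out.

```python
import math
import re
from collections import Counter


def split_text(text: str) -> list[str]:
    """Text segmentation: naive sentence split (hypothetical, for illustration)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def embed(text: str) -> Counter:
    """Vectorization: toy word counts standing in for a real Embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(chunks: list[str], question: str, top_k: int = 3) -> list[str]:
    """Return the top-k chunks whose vectors are most similar to the question vector."""
    chunk_vectors = [embed(c) for c in chunks]  # text vectorization
    question_vector = embed(question)  # question vectorization
    ranked = sorted(
        range(len(chunks)),
        key=lambda i: cosine(chunk_vectors[i], question_vector),
        reverse=True,
    )
    return [chunks[i] for i in ranked[:top_k]]


def build_prompt(context: list[str], question: str) -> str:
    """Add the matched text to the prompt as context, followed by the question."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"


if __name__ == "__main__":
    doc = (
        "FAISS is the default vector store. "
        "Milvus and pg_vector need extra dependencies. "
        "The WebUI is built with Streamlit."
    )
    question = "Which vector store is the default?"
    print(build_prompt(retrieve(split_text(doc), question, top_k=1), question))
```

Running it prints a prompt whose context is the sentence about FAISS, because that sentence shares the most words with the question. A real deployment swaps the toy `embed` for an Embedding model and the linear scan for a vector store.
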
📺 [Video introduction](https://www.bilibili.com/video/BV13M4y1e7cN/?share_source=copy_web&vd_source=e6c5aafe684f30fbe41925d61ca6d514)

![Implementation principle diagram](img/langchain+chatglm.png)

The main process, viewed from the document-processing side:

![Implementation principle diagram 2](img/langchain+chatglm2.png)

🚩 This project does not involve training or fine-tuning, but you can always improve performance by doing either.

🌐 An [AutoDL image](registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5) is supported; in v9 the code has been
updated to v0.2.5.

🐳 [Docker image](registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5)
## Pain Points Addressed
This project is a knowledge-base enhancement solution with fully localized inference, specifically addressing the
pain points of data security and private deployment for businesses.

This open-source solution is released under the Apache License and can be used commercially free of charge.

We support the mainstream local large language models and Embedding models on the market, as well as open-source
local vector databases. For a detailed list of supported models and databases, please refer to
our [Wiki](https://github.com/chatchat-space/Langchain-Chatchat/wiki/).
## Quick Start
### Environment Setup
First, make sure your machine has Python 3.10 installed.
```shell
$ python --version
Python 3.10.12
```
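
The same check can be done from inside Python, for example at the top of a setup script. A minimal sketch: `python_version_ok` is a hypothetical helper, and the `(3, 10)` minimum comes from the sentence above. It only checks a lower bound; this README does not say whether newer Python versions are supported.

```python
import sys


def python_version_ok(minimum=(3, 10), version_info=None) -> bool:
    """Return True if the interpreter (or the given version tuple) is at least `minimum`."""
    info = version_info if version_info is not None else sys.version_info
    return tuple(info[:2]) >= tuple(minimum)
```

Call `python_version_ok()` early and exit with a clear message if it returns `False`, rather than failing later with an obscure import error.
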
Then, create a virtual environment and install the project's dependencies within the virtual environment.
```shell
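# Clone the repository
$ git clone https://github.com/chatchat-space/Langchain-Chatchat.git
# Enter the directory
$ cd Langchain-Chatchat
# Install all dependencies
$ pip install -r requirements.txt
$ pip install -r requirements_api.txt
$ pip install -r requirements_webui.txt
# The default dependencies include the basic runtime environment and the FAISS vector store. To use vector stores such as milvus/pg_vector, uncomment the corresponding dependencies in requirements.txt before installing.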
```
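
Since the optional vector-store dependencies are shipped commented out, it can help to list them before deciding which to enable. A minimal sketch, assuming each commented-out dependency is a line of the form `# package==version` or `# package`; `commented_requirements` is a hypothetical helper, not part of the project. Prose comments are skipped because the name is not followed by a version specifier or the end of the line, though a one-word comment would still be misread as a package name.

```python
import re
from pathlib import Path

# "#", optional spaces, a package name, then a version specifier, extras, marker, or end of line.
COMMENTED_REQ = re.compile(r"^#\s*([A-Za-z0-9][A-Za-z0-9._-]*)(?=\s*(?:[<>=!~\[;@]|$))")


def commented_requirements(text: str) -> list[str]:
    """Return the package names of commented-out requirement lines."""
    names = []
    for line in text.splitlines():
        match = COMMENTED_REQ.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    req_file = Path("requirements.txt")
    if req_file.exists():
        for name in commented_requirements(req_file.read_text(encoding="utf-8")):
            print("commented out:", name)
```
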
### Model Download
If you need to run this project locally or in an offline environment, you must first download the required models for
the project. Typically, open-source LLM and Embedding models can be downloaded from HuggingFace.
Taking the default LLM model used in this project, [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b), and
the Embedding model [moka-ai/m3e-base](https://huggingface.co/moka-ai/m3e-base) as examples:
To download the models, you need to first
install [Git LFS](https://docs.github.com/zh/repositories/working-with-files/managing-large-files/installing-git-large-file-storage)
and then run:
```shell
$ git lfs install
$ git clone https://huggingface.co/THUDM/chatglm2-6b
$ git clone https://huggingface.co/moka-ai/m3e-base
```
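
If Git LFS was not installed before cloning, the large weight files in the cloned folders are small text pointer files rather than real models, which tends to cause confusing load errors later. A minimal sketch to detect this, based on the Git LFS pointer format (every pointer file starts with `version https://git-lfs.github.com/spec/v1`). The function name and the 1 KiB size cutoff are assumptions for illustration.

```python
from __future__ import annotations

from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
MAX_POINTER_SIZE = 1024  # assumption: pointers are tiny; real weight files are far larger


def find_lfs_pointers(model_dir: str | Path) -> list[Path]:
    """Return files under model_dir that are still Git LFS pointer stubs."""
    pointers = []
    for path in sorted(Path(model_dir).rglob("*")):
        if ".git" in path.parts or not path.is_file():
            continue  # skip git metadata and directories
        if path.stat().st_size > MAX_POINTER_SIZE:
            continue  # too large to be a pointer
        with path.open("rb") as f:
            if f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                pointers.append(path)
    return pointers


if __name__ == "__main__":
    for folder in ("chatglm2-6b", "m3e-base"):
        if Path(folder).is_dir():
            stubs = find_lfs_pointers(folder)
            if stubs:
                print(f"{folder}: {len(stubs)} pointer file(s); run `git lfs pull` inside it")
            else:
                print(f"{folder}: OK")
```
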
### Initializing the Knowledge Base and Config File
Follow the steps below to initialize your own knowledge base and config file:
```shell
$ python copy_config_example.py
$ python init_database.py --recreate-vs
```
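
As its name suggests, `copy_config_example.py` prepares working config files from bundled examples. The sketch below only illustrates that idea and is not the project's script: the `configs/` folder, the `<name>.example` naming, the example file names, and the choice to never overwrite an existing config are all assumptions, so check the repository for the actual behavior.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def copy_example_configs(config_dir: str | Path = "configs", overwrite: bool = False) -> list[Path]:
    """Copy each `<name>.example` file to `<name>`; return the files written."""
    written = []
    for example in sorted(Path(config_dir).glob("*.example")):
        target = example.with_suffix("")  # e.g. model_config.py.example -> model_config.py
        if target.exists() and not overwrite:
            continue  # keep a config the user may already have edited
        shutil.copyfile(example, target)
        written.append(target)
    return written


if __name__ == "__main__":
    for path in copy_example_configs():
        print("created", path)
```

With `overwrite=False`, rerunning the sketch leaves edited configs untouched; whether the real script behaves the same way is not stated in this README.
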
### One-Click Launch
To start the project, run the following command:
```shell
$ python startup.py -a
```
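
`startup.py -a` starts several services, and they may take a while to become reachable while models load. A small sketch that waits until a TCP port accepts connections, useful in scripts that should only proceed once the API or WebUI is up. `wait_for_port` is a hypothetical helper; the ports you pass depend on your server configuration.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until host:port accepts a TCP connection or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is listening
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)  # not up yet; retry
```

For example, assuming the WebUI runs on Streamlit's default port 8501, `wait_for_port("127.0.0.1", 8501, timeout=120)` returns `True` once it is listening.
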
### Example of Launch Interface
1. FastAPI docs interface

   ![](img/fastapi_docs_026.png)

2. Web UI pages

   - Web UI chat page:

     ![img](img/LLM_success.png)

   - Web UI knowledge base management page:

     ![](img/init_knowledge_base.jpg)
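
Because the API is served by FastAPI, it can be called from any HTTP client, and the FastAPI docs page lists every endpoint with its request schema. The sketch below uses only the standard library. The address `http://127.0.0.1:7861`, the `/chat/chat` path, and the `query`/`stream` body fields are assumptions for a default setup; confirm them on your docs page before relying on them. Extra fields such as `max_tokens` are passed through unchanged.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:7861"  # assumption: default API address


def build_chat_request(query: str, base_url: str = API_BASE, **fields) -> urllib.request.Request:
    """Build a POST request for the (assumed) /chat/chat endpoint."""
    body = {"query": query, "stream": False, **fields}
    return urllib.request.Request(
        f"{base_url}/chat/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(query: str, timeout: float = 60.0, **fields) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_chat_request(query, **fields), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the API running, `chat("Hello", max_tokens=256)` returns the raw response body. Its format depends on the server version, so inspect it before parsing.
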
### Note
The instructions above are meant for a quick start. For more features or a customized launch,
please refer to the [Wiki](https://github.com/chatchat-space/Langchain-Chatchat/wiki/).

---
## Contact Us
### Telegram
[![Telegram](https://img.shields.io/badge/Telegram-2CA5E0?style=for-the-badge&logo=telegram&logoColor=white "langchain-chatglm")](https://t.me/+RjliQ3jnJ1YyN2E9)
### WeChat Group
<img src="img/qr_code_67.jpg" alt="QR code" width="300" height="300" />
### WeChat Official Account
<img src="img/official_wechat_mp_account.png" alt="WeChat Official Account" width="900" height="300" />