liunux4odoo ba8d0f8e17
Release: v0.2.5 (#1620)
* Optimize configs (#1474)

* remove llm_model_dict

* optimize configs

* fix get_model_path

* Change some default parameters and add a default configuration for Qianfan

* Update server_config.py.example

* fix merge conflict for #1474 (#1494)

* Fix wrong ChatGPT api_base_url; users can override the default api_base_url in the online model settings of model_config (#1496)

* Improve the logic for listing and switching LLM models: (#1497)

1. Detect available but not-running models more accurately
2. Improve the WebUI logic for displaying and switching the model list

* Update migrate.py and init_database.py to strengthen the knowledge base migration tool: (#1498)

1. Add --update-in-db parameter: update the vector store from local files according to the database records
2. Add --increament parameter: incrementally update the vector store from local files
3. Add --prune-db parameter: automatically clean up vector store entries whose local files have been deleted
4. Add --prune-folder parameter: clean up unused local files according to the database records
5. Remove the --update-info-only parameter; since vector store info is stored in the database, this operation is of little use
6. Add --kb-name parameter: all operations can be limited to the specified knowledge bases; if not given, all local knowledge bases are processed
7. Add test cases for knowledge base migration
8. Remove the save_vector_store method from milvus_kb_service

* feat: support volc fangzhou

* Get Volcano Ark (volc fangzhou) working properly; add error handling and test cases

* feat: support volc fangzhou (#1501)

* feat: support volc fangzhou

---------

Co-authored-by: liunux4odoo <41217877+liunux4odoo@users.noreply.github.com>
Co-authored-by: liqiankun.1111 <liqiankun.1111@bytedance.com>

* First preliminary agent implementation (#1503)

* First preliminary agent implementation

* Add streaming parameter

* Modify weather.py

---------

Co-authored-by: zR <zRzRzRzRzRzRzR>

* Add configs/prompt_config.py to let users customize prompt templates: (#1504)

1. Two templates are included by default, for plain LLM chat and for knowledge base / search engine chat
2. server/utils.py provides get_prompt_template to fetch the content of a named prompt template (hot reload supported)
3. The chat/knowledge_base_chat/search_engine_chat endpoints in api.py accept a prompt_name parameter

* Add parameter adaptation for other models

* Support loading by a passed-in vector store name

* 1. Search engine Q&A now supports chat history;
2. Fix a history-passing bug in knowledge base Q&A: the user input was appended to history. The cause was the webui fetching history messages twice; the API knowledge base chat endpoint itself was fine.

* Add a switch for langchain logging

* move wrap_done & get_ChatOpenAI from server.chat.utils to server.utils (#1506)

* Fix wrong cache key for the faiss_pool knowledge base cache (#1507)

* fix ReadMe anchor link (#1500)

* fix : Duplicate variable and function name (#1509)

Co-authored-by: Jim <zhangpengyi@taijihuabao.com>

* Update README.md

* fix #1519: bug in old versions of streamlit-chatbox; the new version has compatibility issues, so handle it in webui for now and pin the chatbox version (#1525)

close #1519

* [New feature] Online LLM models now support Alibaba Cloud Tongyi Qianwen (#1534)

* feat: add qwen-api

* Make the Qwen API support the temperature parameter; add test cases

* Make the online-api SDKs optional dependencies

---------

Co-authored-by: liunux4odoo <liunux@qq.com>

* Handle the logic of serializing to disk

* remove dependency on volcengine

* update kb_doc_api: use Form instead of Body when uploading files

* Change all httpx requests to use a Client, improving efficiency and making it easier to configure proxies later (#1554)

Change all httpx requests to use a Client, improving efficiency and making it easier to configure proxies later.

Add the project's own services to the no-proxy list to avoid fastchat server request errors (has no effect on Windows).

* update QR code

* update readme_en, readme, requirements_api, requirements, model_config.py.example: test baichuan2-7b; update related docs

* New features: 1. support the vllm inference acceleration framework; 2. update the list of supported models

* Update files: startup, model_config.py.example, serve_config.py.example, FAQ

* 1. Finish debugging the vllm acceleration framework; 2. adjust the vllm dependency in requirements and requirements_api; 3. comment out the device=cpu setting for baichuan-7b in serve_config

* 1. Update the notes on the vllm backend in the configs; 2. update requirements and requirements_api

* Add GPT-4-only agent functionality, to be extended gradually; the Chinese readme is written (#1611)

* Dev (#1613)

* Add GPT-4-only agent functionality, to be extended gradually; the Chinese readme is written

* Fix a bug mentioned in an issue

* Change the minimum temperature to 0; negative values should not be supported

* Adjust the minimum temperature

* fix: set vllm based on platform to avoid error on windows

* fix: langchain warnings for import from root

* Fix UI errors in the webui when rebuilding knowledge bases and in the chat page (#1615)

* Fix bug: when rebuilding a knowledge base from the webui, unsupported files caused the whole endpoint to fail; CHUNK_SIZE was not imported in migrate

* Fix: the expander in the webui chat page stayed in the running state; simplify how history messages are fetched

* Add the instruction template for the English bge embedding, following the official documentation (#1585)

Co-authored-by: zR <2448370773@qq.com>

* Dev (#1618)

* Add GPT-4-only agent functionality, to be extended gradually; the Chinese readme is written

* Fix a bug mentioned in an issue

* Change the minimum temperature to 0; negative values should not be supported

* Adjust the minimum temperature

* Add partial agent support and fix some bugs in the startup file

* Modify the GPU count configuration

* 1

1

* Fix configuration file errors

* Update readme; stability testing

* Update readme 0928 (#1619)

* Add GPT-4-only agent functionality, to be extended gradually; the Chinese readme is written

* Fix a bug mentioned in an issue

* Change the minimum temperature to 0; negative values should not be supported

* Adjust the minimum temperature

* Add partial agent support and fix some bugs in the startup file

* Modify the GPU count configuration

* 1

1

* Fix configuration file errors

* Update readme; stability testing

* Update readme

* fix readme

* Handle the logic of serializing to disk

* update version number to v0.2.5

---------

Co-authored-by: qiankunli <qiankun.li@qq.com>
Co-authored-by: liqiankun.1111 <liqiankun.1111@bytedance.com>
Co-authored-by: zR <2448370773@qq.com>
Co-authored-by: glide-the <2533736852@qq.com>
Co-authored-by: Water Zheng <1499383852@qq.com>
Co-authored-by: Jim Zhang <dividi_z@163.com>
Co-authored-by: Jim <zhangpengyi@taijihuabao.com>
Co-authored-by: imClumsyPanda <littlepanda0716@gmail.com>
Co-authored-by: Leego <leegodev@hotmail.com>
Co-authored-by: hzg0601 <hzg0601@163.com>
Co-authored-by: WilliamChen-luckbob <58684828+WilliamChen-luckbob@users.noreply.github.com>
2023-09-28 23:30:21 +08:00


import pydantic
from pydantic import BaseModel
from typing import List
from fastapi import FastAPI
from pathlib import Path
import asyncio
from configs import (LLM_MODEL, LLM_DEVICE, EMBEDDING_DEVICE,
                     MODEL_PATH, MODEL_ROOT_PATH, ONLINE_LLM_MODEL,
                     logger, log_verbose,
                     FSCHAT_MODEL_WORKERS, HTTPX_DEFAULT_TIMEOUT)
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from langchain.chat_models import ChatOpenAI
import httpx
from typing import Literal, Optional, Callable, Generator, Dict, Any, Awaitable, Union
thread_pool = ThreadPoolExecutor(os.cpu_count())


async def wrap_done(fn: Awaitable, event: asyncio.Event):
    """Wrap an awaitable with an event to signal when it's done or an exception is raised."""
    try:
        await fn
    except Exception as e:
        # TODO: handle exception
        msg = f"Caught exception: {e}"
        logger.error(f'{e.__class__.__name__}: {msg}',
                     exc_info=e if log_verbose else None)
    finally:
        # Signal the aiter to stop.
        event.set()


def get_ChatOpenAI(
        model_name: str,
        temperature: float,
        streaming: bool = True,
        callbacks: List[Callable] = [],
        verbose: bool = True,
        **kwargs: Any,
) -> ChatOpenAI:
    config = get_model_worker_config(model_name)
    model = ChatOpenAI(
        streaming=streaming,
        verbose=verbose,
        callbacks=callbacks,
        openai_api_key=config.get("api_key", "EMPTY"),
        openai_api_base=config.get("api_base_url", fschat_openai_api_address()),
        model_name=model_name,
        temperature=temperature,
        openai_proxy=config.get("openai_proxy"),
        **kwargs
    )
    return model
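
# Usage sketch (hypothetical caller, not part of this module): build a ChatOpenAI
# instance pointed at the fastchat openai-compatible endpoint for the configured
# default model, e.g.
#   model = get_ChatOpenAI(model_name=LLM_MODEL, temperature=0.7, streaming=False)
#   print(model.predict("Hello"))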


class BaseResponse(BaseModel):
    code: int = pydantic.Field(200, description="API status code")
    msg: str = pydantic.Field("success", description="API status message")
    data: Any = pydantic.Field(None, description="API data")

    class Config:
        schema_extra = {
            "example": {
                "code": 200,
                "msg": "success",
            }
        }


class ListResponse(BaseResponse):
    data: List[str] = pydantic.Field(..., description="List of names")

    class Config:
        schema_extra = {
            "example": {
                "code": 200,
                "msg": "success",
                "data": ["doc1.docx", "doc2.pdf", "doc3.txt"],
            }
        }


class ChatMessage(BaseModel):
    question: str = pydantic.Field(..., description="Question text")
    response: str = pydantic.Field(..., description="Response text")
    history: List[List[str]] = pydantic.Field(..., description="History text")
    source_documents: List[str] = pydantic.Field(
        ..., description="List of source documents and their scores"
    )

    class Config:
        schema_extra = {
            "example": {
                "question": "工伤保险如何办理?",
                "response": "根据已知信息,可以总结如下:\n\n1. 参保单位为员工缴纳工伤保险费,以保障员工在发生工伤时能够获得相应的待遇。\n"
                            "2. 不同地区的工伤保险缴费规定可能有所不同,需要向当地社保部门咨询以了解具体的缴费标准和规定。\n"
                            "3. 工伤从业人员及其近亲属需要申请工伤认定,确认享受的待遇资格,并按时缴纳工伤保险费。\n"
                            "4. 工伤保险待遇包括工伤医疗、康复、辅助器具配置费用、伤残待遇、工亡待遇、一次性工亡补助金等。\n"
                            "5. 工伤保险待遇领取资格认证包括长期待遇领取人员认证和一次性待遇领取人员认证。\n"
                            "6. 工伤保险基金支付的待遇项目包括工伤医疗待遇、康复待遇、辅助器具配置费用、一次性工亡补助金、丧葬补助金等。",
                "history": [
                    [
                        "工伤保险是什么?",
                        "工伤保险是指用人单位按照国家规定,为本单位的职工和用人单位的其他人员,缴纳工伤保险费,"
                        "由保险机构按照国家规定的标准,给予工伤保险待遇的社会保险制度。",
                    ]
                ],
                "source_documents": [
                    "出处 [1] 广州市单位从业的特定人员参加工伤保险办事指引.docx\n\n\t"
                    "( 一) 从业单位 (组织) 按“自愿参保”原则, 为未建 立劳动关系的特定从业人员单项参加工伤保险 、缴纳工伤保 险费。",
                    "出处 [2] ...",
                    "出处 [3] ...",
                ],
            }
        }


def torch_gc():
    import torch
    if torch.cuda.is_available():
        # with torch.cuda.device(DEVICE):
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()
    elif torch.backends.mps.is_available():
        try:
            from torch.mps import empty_cache
            empty_cache()
        except Exception as e:
            msg = ("如果您使用的是 macOS 建议将 pytorch 版本升级至 2.0.0 或更高版本,"
                   "以支持及时清理 torch 产生的内存占用。")
            logger.error(f'{e.__class__.__name__}: {msg}',
                         exc_info=e if log_verbose else None)


def run_async(cor):
    '''
    Run async code in a synchronous environment.
    '''
    try:
        loop = asyncio.get_event_loop()
    except:
        loop = asyncio.new_event_loop()
    return loop.run_until_complete(cor)


def iter_over_async(ait, loop):
    '''
    Wrap an async generator into a synchronous generator.
    '''
    ait = ait.__aiter__()

    async def get_next():
        try:
            obj = await ait.__anext__()
            return False, obj
        except StopAsyncIteration:
            return True, None

    while True:
        done, obj = loop.run_until_complete(get_next())
        if done:
            break
        yield obj
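
# Usage sketch (hypothetical async generator, not part of this module): consume an
# async generator from synchronous code, e.g.
#   async def agen():
#       yield 1
#       yield 2
#   loop = asyncio.new_event_loop()
#   assert list(iter_over_async(agen(), loop)) == [1, 2]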


def MakeFastAPIOffline(
        app: FastAPI,
        static_dir=Path(__file__).parent / "static",
        static_url="/static-offline-docs",
        docs_url: Optional[str] = "/docs",
        redoc_url: Optional[str] = "/redoc",
) -> None:
    """patch the FastAPI obj so that its documentation pages don't rely on a CDN"""
    from fastapi import Request
    from fastapi.openapi.docs import (
        get_redoc_html,
        get_swagger_ui_html,
        get_swagger_ui_oauth2_redirect_html,
    )
    from fastapi.staticfiles import StaticFiles
    from starlette.responses import HTMLResponse

    openapi_url = app.openapi_url
    swagger_ui_oauth2_redirect_url = app.swagger_ui_oauth2_redirect_url

    def remove_route(url: str) -> None:
        '''
        remove original route from app
        '''
        index = None
        for i, r in enumerate(app.routes):
            if r.path.lower() == url.lower():
                index = i
                break
        if isinstance(index, int):
            app.routes.pop(index)

    # Set up static file mount
    app.mount(
        static_url,
        StaticFiles(directory=Path(static_dir).as_posix()),
        name="static-offline-docs",
    )

    if docs_url is not None:
        remove_route(docs_url)
        remove_route(swagger_ui_oauth2_redirect_url)

        # Define the doc and redoc pages, pointing at the right files
        @app.get(docs_url, include_in_schema=False)
        async def custom_swagger_ui_html(request: Request) -> HTMLResponse:
            root = request.scope.get("root_path")
            favicon = f"{root}{static_url}/favicon.png"
            return get_swagger_ui_html(
                openapi_url=f"{root}{openapi_url}",
                title=app.title + " - Swagger UI",
                oauth2_redirect_url=swagger_ui_oauth2_redirect_url,
                swagger_js_url=f"{root}{static_url}/swagger-ui-bundle.js",
                swagger_css_url=f"{root}{static_url}/swagger-ui.css",
                swagger_favicon_url=favicon,
            )

        @app.get(swagger_ui_oauth2_redirect_url, include_in_schema=False)
        async def swagger_ui_redirect() -> HTMLResponse:
            return get_swagger_ui_oauth2_redirect_html()

    if redoc_url is not None:
        remove_route(redoc_url)

        @app.get(redoc_url, include_in_schema=False)
        async def redoc_html(request: Request) -> HTMLResponse:
            root = request.scope.get("root_path")
            favicon = f"{root}{static_url}/favicon.png"
            return get_redoc_html(
                openapi_url=f"{root}{openapi_url}",
                title=app.title + " - ReDoc",
                redoc_js_url=f"{root}{static_url}/redoc.standalone.js",
                with_google_fonts=False,
                redoc_favicon_url=favicon,
            )
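
# Usage sketch (hypothetical app, not part of this module): patch a FastAPI app so
# its /docs and /redoc pages are served from the local static directory, e.g.
#   app = FastAPI(title="example")
#   MakeFastAPIOffline(app)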


# Get model information from model_config
def list_embed_models() -> List[str]:
    '''
    get names of configured embedding models
    '''
    return list(MODEL_PATH["embed_model"])


def list_llm_models() -> Dict[str, List[str]]:
    '''
    get names of configured llm models, grouped by type:
    {"local": [...], "online": [...], "worker": [...]}
    '''
    workers = list(FSCHAT_MODEL_WORKERS)
    if "default" in workers:
        workers.remove("default")
    return {
        "local": list(MODEL_PATH["llm_model"]),
        "online": list(ONLINE_LLM_MODEL),
        "worker": workers,
    }


def get_model_path(model_name: str, type: str = None) -> Optional[str]:
    if type in MODEL_PATH:
        paths = MODEL_PATH[type]
    else:
        paths = {}
        for v in MODEL_PATH.values():
            paths.update(v)

    if path_str := paths.get(model_name):
        # Using "chatglm-6b": "THUDM/chatglm-6b-new" as an example, all of the following paths are supported.
        path = Path(path_str)
        if path.is_dir():  # any absolute path
            return str(path)

        root_path = Path(MODEL_ROOT_PATH)
        if root_path.is_dir():
            path = root_path / model_name
            if path.is_dir():  # use key, {MODEL_ROOT_PATH}/chatglm-6b
                return str(path)
            path = root_path / path_str
            if path.is_dir():  # use value, {MODEL_ROOT_PATH}/THUDM/chatglm-6b-new
                return str(path)
            path = root_path / path_str.split("/")[-1]
            if path.is_dir():  # use value split by "/", {MODEL_ROOT_PATH}/chatglm-6b-new
                return str(path)
        return path_str  # fall back to the raw value, e.g. the repo id THUDM/chatglm-6b-new
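
# Usage sketch (hypothetical model name, not part of this module): resolve the local
# path (or hub repo id) configured for an LLM, e.g.
#   print(get_model_path("chatglm-6b", type="llm_model"))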


# Get service information from server_config
def get_model_worker_config(model_name: str = None) -> dict:
    '''
    Load the configuration for a model worker.
    Priority: FSCHAT_MODEL_WORKERS[model_name] > ONLINE_LLM_MODEL[model_name] > FSCHAT_MODEL_WORKERS["default"]
    '''
    from configs.model_config import ONLINE_LLM_MODEL
    from configs.server_config import FSCHAT_MODEL_WORKERS
    from server import model_workers

    config = FSCHAT_MODEL_WORKERS.get("default", {}).copy()
    config.update(ONLINE_LLM_MODEL.get(model_name, {}))
    config.update(FSCHAT_MODEL_WORKERS.get(model_name, {}))

    # online model APIs
    if model_name in ONLINE_LLM_MODEL:
        config["online_api"] = True
        if provider := config.get("provider"):
            try:
                config["worker_class"] = getattr(model_workers, provider)
            except Exception as e:
                msg = f"在线模型 {model_name} 的provider没有正确配置"
                logger.error(f'{e.__class__.__name__}: {msg}',
                             exc_info=e if log_verbose else None)

    config["model_path"] = get_model_path(model_name)
    config["device"] = llm_device(config.get("device"))
    return config
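
# Usage sketch (assuming LLM_MODEL is configured; not part of this module): inspect
# the merged worker configuration for the default model, e.g.
#   cfg = get_model_worker_config(LLM_MODEL)
#   print(cfg.get("device"), cfg.get("model_path"))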


def get_all_model_worker_configs() -> dict:
    result = {}
    model_names = set(FSCHAT_MODEL_WORKERS.keys())
    for name in model_names:
        if name != "default":
            result[name] = get_model_worker_config(name)
    return result


def fschat_controller_address() -> str:
    from configs.server_config import FSCHAT_CONTROLLER
    host = FSCHAT_CONTROLLER["host"]
    port = FSCHAT_CONTROLLER["port"]
    return f"http://{host}:{port}"


def fschat_model_worker_address(model_name: str = LLM_MODEL) -> str:
    if model := get_model_worker_config(model_name):
        host = model["host"]
        port = model["port"]
        return f"http://{host}:{port}"
    return ""


def fschat_openai_api_address() -> str:
    from configs.server_config import FSCHAT_OPENAI_API
    host = FSCHAT_OPENAI_API["host"]
    port = FSCHAT_OPENAI_API["port"]
    return f"http://{host}:{port}/v1"


def api_address() -> str:
    from configs.server_config import API_SERVER
    host = API_SERVER["host"]
    port = API_SERVER["port"]
    return f"http://{host}:{port}"


def webui_address() -> str:
    from configs.server_config import WEBUI_SERVER
    host = WEBUI_SERVER["host"]
    port = WEBUI_SERVER["port"]
    return f"http://{host}:{port}"


def get_prompt_template(name: str) -> Optional[str]:
    '''
    Load a prompt template from prompt_config.
    '''
    from configs import prompt_config
    import importlib
    importlib.reload(prompt_config)  # TODO: only reload when configs/prompt_config.py has actually been modified
    return prompt_config.PROMPT_TEMPLATES.get(name)
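
# Usage sketch (template name is illustrative; templates are hot-reloaded from
# configs/prompt_config.py):
#   template = get_prompt_template("knowledge_base_chat")
#   print(template)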


def set_httpx_config(
        timeout: float = HTTPX_DEFAULT_TIMEOUT,
        proxy: Union[str, Dict] = None,
):
    '''
    Set the default httpx timeout. The httpx default of 5 seconds is not enough when
    waiting for LLM answers.
    Add the project's own services to the no-proxy list to avoid fastchat server
    request errors (has no effect on Windows).
    Online APIs such as ChatGPT need the proxy configured manually if required; how
    to handle proxies for search engines still needs consideration.
    '''
    import httpx
    import os

    httpx._config.DEFAULT_TIMEOUT_CONFIG.connect = timeout
    httpx._config.DEFAULT_TIMEOUT_CONFIG.read = timeout
    httpx._config.DEFAULT_TIMEOUT_CONFIG.write = timeout

    # set system-level proxies for the whole process
    proxies = {}
    if isinstance(proxy, str):
        for n in ["http", "https", "all"]:
            proxies[n + "_proxy"] = proxy
    elif isinstance(proxy, dict):
        for n in ["http", "https", "all"]:
            if p := proxy.get(n):
                proxies[n + "_proxy"] = p
            elif p := proxy.get(n + "_proxy"):
                proxies[n + "_proxy"] = p

    for k, v in proxies.items():
        os.environ[k] = v

    # set hosts to bypass proxy
    no_proxy = [x.strip() for x in os.environ.get("no_proxy", "").split(",") if x.strip()]
    no_proxy += [
        # do not use proxy for localhost
        "http://127.0.0.1",
        "http://localhost",
    ]
    # do not use proxy for user deployed fastchat servers
    for x in [
        fschat_controller_address(),
        fschat_model_worker_address(),
        fschat_openai_api_address(),
    ]:
        host = ":".join(x.split(":")[:2])
        if host not in no_proxy:
            no_proxy.append(host)
    os.environ["NO_PROXY"] = ",".join(no_proxy)

    # TODO: simply clearing the system proxy affects too much; modifying the proxy
    # server's bypass list would probably be better.
    # patch requests to use custom proxies instead of system settings
    # def _get_proxies():
    #     return {}
    # import urllib.request
    # urllib.request.getproxies = _get_proxies
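
# Usage sketch (call once at process start-up; the proxy value here is illustrative):
#   set_httpx_config(timeout=300.0, proxy="http://127.0.0.1:7890")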


# Automatically detect the device available to torch. In distributed deployments,
# machines that do not run the LLM do not need torch installed.
def detect_device() -> Literal["cuda", "mps", "cpu"]:
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
        if torch.backends.mps.is_available():
            return "mps"
    except:
        pass
    return "cpu"


def llm_device(device: str = None) -> Literal["cuda", "mps", "cpu"]:
    device = device or LLM_DEVICE
    if device not in ["cuda", "mps", "cpu"]:
        device = detect_device()
    return device


def embedding_device(device: str = None) -> Literal["cuda", "mps", "cpu"]:
    device = device or EMBEDDING_DEVICE
    if device not in ["cuda", "mps", "cpu"]:
        device = detect_device()
    return device


def run_in_thread_pool(
        func: Callable,
        params: List[Dict] = [],
        pool: ThreadPoolExecutor = None,
) -> Generator:
    '''
    Run a batch of tasks in a thread pool and yield the results as a generator.
    Make sure all operations in the tasks are thread safe, and pass all task
    arguments as keyword arguments.
    '''
    tasks = []
    pool = pool or thread_pool
    for kwargs in params:
        thread = pool.submit(func, **kwargs)
        tasks.append(thread)

    for obj in as_completed(tasks):
        yield obj.result()
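
# Usage sketch (hypothetical task function, not part of this module): run several
# keyword-argument tasks concurrently and collect the results, e.g.
#   def add(x: int, y: int) -> int:
#       return x + y
#   results = list(run_in_thread_pool(add, params=[{"x": 1, "y": 2}, {"x": 3, "y": 4}]))
#   print(sorted(results))   # [3, 7]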


def get_httpx_client(
        use_async: bool = False,
        proxies: Union[str, Dict] = None,
        timeout: float = HTTPX_DEFAULT_TIMEOUT,
        **kwargs,
) -> Union[httpx.Client, httpx.AsyncClient]:
    '''
    helper to get an httpx client with default proxies that bypass local addresses.
    '''
    default_proxies = {
        # do not use proxy for localhost
        "all://127.0.0.1": None,
        "all://localhost": None,
    }
    # do not use proxy for user deployed fastchat servers
    for x in [
        fschat_controller_address(),
        fschat_model_worker_address(),
        fschat_openai_api_address(),
    ]:
        host = ":".join(x.split(":")[:2])
        default_proxies.update({host: None})

    # get proxies from the system environment
    default_proxies.update({
        "http://": os.environ.get("http_proxy"),
        "https://": os.environ.get("https_proxy"),
        "all://": os.environ.get("all_proxy"),
    })
    for host in os.environ.get("no_proxy", "").split(","):
        if host := host.strip():
            default_proxies.update({host: None})

    # merge default proxies with user provided proxies
    if isinstance(proxies, str):
        proxies = {"all://": proxies}
    if isinstance(proxies, dict):
        default_proxies.update(proxies)

    # construct Client
    kwargs.update(timeout=timeout, proxies=default_proxies)
    if use_async:
        return httpx.AsyncClient(**kwargs)
    else:
        return httpx.Client(**kwargs)
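
# Usage sketch (assuming the API server from api_address() is running; the URL is
# illustrative):
#   with get_httpx_client() as client:
#       resp = client.get(api_address() + "/docs")
#       print(resp.status_code)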