Update README.md

Fangyu Liu 2023-04-15 05:55:44 +01:00 committed by GitHub
parent 3b4b80013c
commit e8f8c4a681

@@ -5,7 +5,7 @@
# Visual Med-Alpaca: A Parameter-Efficient Biomedical LLM with Visual Capabilities [[BLOG](https://cambridgeltl.github.io/visual-med-alpaca/)]
-[Chang Shu](https://ciaranshu.github.io)<sup>1\*</sup>, [Baian Chen](https://scholar.google.com/citations?user=IFKToXUAAAAJ&hl=en&oi=ao)<sup>2\*</sup>, [Fangyu Liu](http://fangyuliu.me)<sup>1</sup>, [Zihao Fu](https://fuzihaofzh.github.io)<sup>1</sup>, [Ehsan Shareghi](https://eehsan.github.io)<sup>3</sup>, [Nigel Collier](https://sites.google.com/site/nhcollier/home/)<sup>1</sup>
+[Chang Shu](https://ciaranshu.github.io)<sup>1\*</sup>, [Baian Chen](https://scholar.google.com/citations?user=IFKToXUAAAAJ&hl=en&oi=ao)<sup>2\*</sup>, [Fangyu Liu](http://fangyuliu.me/about)<sup>1</sup>, [Zihao Fu](https://fuzihaofzh.github.io)<sup>1</sup>, [Ehsan Shareghi](https://eehsan.github.io)<sup>3</sup>, [Nigel Collier](https://sites.google.com/site/nhcollier/home/)<sup>1</sup>
[University of Cambridge](https://ltl.mmll.cam.ac.uk)<sup>1</sup> Ruiping Health<sup>2</sup> [Monash University](https://www.monash.edu/it/dsai)<sup>3</sup>
@@ -28,9 +28,9 @@ Domain-specific foundation models play a critical role in the biomedical field,
Modern large language models (LLMs) necessitate an unprecedented level of computational resources for full-model fine-tuning. The cost of fine-tuning even a 7-billion-parameter LLM exclusively on PubMed is prohibitively expensive for the majority of academic institutions. Pretraining models on extensive medical image datasets to attain multimodal capabilities incurs even higher costs. Consequently, researchers are exploring more cost-effective techniques such as Adapter, Instruct-Tuning, and Prompt Augmentation to develop models that can be trained and deployed on gaming-level graphics cards while maintaining adequate performance. In the context of bridging text and vision for multimodal applications, training can also be similarly expensive ([Alayrac et al., 2022](https://arxiv.org/abs/2204.14198)). Besides, to the best of our knowledge, there is no publicly available multimodal generative foundation model specifically designed for biomedical applications.
-In response to these challenges, we introduce [**Visual Med-Alpaca**](https://github.com/cambridgeltl/visual-med-alpaca), an open-source, parameter-efficient biomedical foundation model that features a plug-and-play visual extension framework. To develop the Visual Med-Alpaca model, we initially create a biomedical instruction set by extracting medical questions from various medical datasets within the [BigBIO](https://github.com/bigscience-workshop/biomedical) repository. Subsequently, we prompt GPT-3.5-turbo to synthesize answers for these questions. Multiple rounds of human filtering and editing are performed to refine the question-answer pairs, resulting in a high-quality instruction set comprising 54k data points. Next, we expand Med-Alpaca into Visual Med-Alpaca by connecting the textual model with "visual medical experts," which are specialized medical computer vision models. For instance, in radiology-domain applications, we train an in-house radiology image captioning model called Med-GIT (see later for details). When given an input image, a classifier determines if or which medical visual expert is responsible for the image. The designated medical expert then converts the image into a text prompt. The prompt manager subsequently merges the converted visual information with the textual query, enabling Med-Alpaca to generate an appropriate response.
+In response to these challenges, we introduce [**Visual Med-Alpaca**](https://github.com/cambridgeltl/visual-med-alpaca), an open-source, parameter-efficient biomedical foundation model that features a plug-and-play visual extension framework. To develop the Visual Med-Alpaca model, we initially create a biomedical instruction set by extracting medical questions from various medical datasets within the [BigBIO](https://github.com/bigscience-workshop/biomedical) repository ([Fries et al., 2022](https://arxiv.org/abs/2206.15076)). Subsequently, we prompt GPT-3.5-turbo to synthesize answers for these questions. Multiple rounds of human filtering and editing are performed to refine the question-answer pairs, resulting in a high-quality instruction set comprising 54k data points. Next, we expand Med-Alpaca into Visual Med-Alpaca by connecting the textual model with "visual medical experts," which are specialized medical computer vision models. For instance, in radiology-domain applications, we train an in-house radiology image captioning model called Med-GIT (see later for details). When given an input image, a classifier determines if or which medical visual expert is responsible for the image. The designated medical expert then converts the image into a text prompt. The prompt manager subsequently merges the converted visual information with the textual query, enabling Med-Alpaca to generate an appropriate response.
-A paramount objective for the future is to thoroughly assess the medical proficiency and potential shortcomings of Visual Med-Alpaca, encompassing issues such as misleading medical advice and incorrect medical information. Moving beyond traditional benchmarking and manual evaluation methods, we aim to focus on different user groups, including doctors and patients, and evaluate all facets of the model through a user-centered approach. This comprehensive assessment will enable us to ensure the reliability and effectiveness of Visual Med-Alpaca in addressing various biomedical tasks and catering to the diverse needs of its users.
+**Ongoing work.** A paramount objective for the future is to thoroughly assess the medical proficiency and potential shortcomings of Visual Med-Alpaca, encompassing issues such as misleading medical advice and incorrect medical information. Moving beyond traditional benchmarking and manual evaluation methods, we aim to focus on different user groups, including doctors and patients, and evaluate all facets of the model through a user-centered approach. This comprehensive assessment will enable us to ensure the reliability and effectiveness of Visual Med-Alpaca in addressing various biomedical tasks and catering to the diverse needs of its users.
**It is also important to note that Visual Med-Alpaca is strictly intended for academic research purposes and not legally approved for medical use in any country.**
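
The paragraph above mentions parameter-efficient techniques only in passing. As a rough illustration, here is a minimal sketch of LoRA-style adapter fine-tuning with the Hugging Face `peft` library, in the spirit of the Alpaca-LoRA project acknowledged at the end of the README; the checkpoint path and hyperparameters are placeholders, not the project's actual training recipe.

```python
# Hypothetical sketch: LoRA adapter fine-tuning with Hugging Face peft.
# The base checkpoint path and all hyperparameters are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base = "path/to/llama-7b-hf"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adapt only the attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a few million trainable params vs. ~7B frozen
```

Only the small adapter matrices are updated during instruction tuning, which is what makes training feasible on gaming-level graphics cards.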
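
The plug-and-play visual extension described above (a classifier routes the input image to a visual medical expert, the expert converts the image into text, and a prompt manager merges that text with the user's question before it reaches Med-Alpaca) can be sketched roughly as follows. Every component name and the prompt template here are illustrative assumptions, not the released implementation.

```python
# Hypothetical sketch of the visual pipeline: classifier -> visual expert ->
# prompt manager -> LLM. All component interfaces are placeholders.

def answer(question, image, classifier, experts, llm):
    """Optionally route an image through a visual medical expert, then query the LLM."""
    visual_context = ""
    if image is not None:
        domain = classifier.predict(image)          # e.g. "radiology" or "plot"
        expert = experts.get(domain)                # e.g. Med-GIT, DePlot
        if expert is not None:
            visual_context = expert.caption(image)  # image -> textual description

    # Prompt manager: merge the converted visual information with the textual query.
    # (The exact prompt template is an assumption.)
    prompt = (
        f"Visual context: {visual_context}\n"
        f"Question: {question}\n"
        f"Answer:"
    )
    return llm.generate(prompt)
```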
@@ -172,5 +172,5 @@ Visual Med-Alpaca, is intended for academic research purposes only. Any commerci
## Acknowledgement
-We are deeply grateful for the contributions made by open-source projects: [LLaMA](https://github.com/facebookresearch/llama), [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca), [Alpaca-LoRA](https://github.com/tloen/alpaca-lora), [Deplot](https://huggingface.co/docs/transformers/main/model_doc/deplot), [BigBio](https://huggingface.co/bigbio), [ROCO](https://github.com/razorx89/roco-dataset), [Visual-ChatGPT](https://github.com/microsoft/visual-chatgpt), [GenerativeImage2Text](https://github.com/microsoft/GenerativeImage2Text).
+We are deeply grateful for the contributions made by open-source projects: [LLaMA](https://github.com/facebookresearch/llama), [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca), [Alpaca-LoRA](https://github.com/tloen/alpaca-lora), [DePlot](https://huggingface.co/docs/transformers/main/model_doc/deplot), [BigBio](https://huggingface.co/bigbio), [ROCO](https://github.com/razorx89/roco-dataset), [Visual-ChatGPT](https://github.com/microsoft/visual-chatgpt), [GenerativeImage2Text](https://github.com/microsoft/GenerativeImage2Text).