Visual Med-Alpaca Code
Important Notice:
This project builds on LLaMA, Stanford Alpaca, Alpaca-LoRA, Deplot, BigBio, ROCO, GenerativeImage2Text, and GPT-3.5-Turbo, and is intended for academic purposes only.
We are currently working on the necessary ethical clearance from the University of Cambridge to determine whether, when, and how we can provide the complete inference code as well as an online interactive demo.
In the meantime, you should still be able to reproduce the system by following the instructions below. We apologize for the inconvenience.
Installation
Hardware Requirements:
Storing all data and models: 100 GB of free disk space.
Deploying the full system (8-bit inference): 32 GB RAM and 24 GB GPU memory (see the loading sketch below).
Training Med-Alpaca LoRA and/or Med-GIT: 32 GB RAM and 24 GB GPU memory.
Training Med-Alpaca: 4x NVIDIA A100-SXM4-80GB GPUs.
The CUDA Toolkit needs to be installed to take advantage of GPU devices.
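The 8-bit inference figure above assumes the language model is quantized with bitsandbytes through Hugging Face Transformers. A minimal loading sketch, assuming a local LLaMA/Alpaca-style checkpoint (the path below is a placeholder, not a released Visual Med-Alpaca artifact):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: point this at your own LLaMA/Alpaca-style checkpoint.
model_path = "/your/dir/to/med-alpaca-weights"

tokenizer = AutoTokenizer.from_pretrained(model_path)
# load_in_8bit requires the bitsandbytes package and a CUDA-capable GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)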
Environment (optional):
Install conda and create conda virtual environment:
conda create -n visual-med-alpaca python=3.9
conda activate visual-med-alpaca
Set up the Hugging Face model cache directory:
export TRANSFORMERS_CACHE=/your/dir/to/huggingface_cache
Installation:
git clone https://github.com/cambridgeltl/visual-med-alpaca.git
cd visual-med-alpaca/code
pip install -r requirements.txt
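After installing the requirements, a quick sanity check (assuming PyTorch is installed as part of requirements.txt) confirms that the GPU and CUDA Toolkit are visible:

python -c "import torch; print(torch.cuda.is_available())"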
Data
In addition to the datasets provided in the data folder, you can download the ROCO image data with the official code below (which might be slow) or from the unofficial Kaggle mirror.
git clone https://github.com/razorx89/roco-dataset.git
cd roco-dataset
python scripts/fetch.py
The ROCO dataset can take up to 20 GB of space, and up to 8 GB when zipped.
More details about these data can be found here.
Training
Med-Alpaca Lora
Configure data_path in finetune-med.sh, then:
cd med-alpaca-lora
bash finetune-med.sh
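The fine-tuned adapter can later be attached to the base model for inference via the peft library, in the same way Alpaca-LoRA does. A rough sketch, where the base model path and the LoRA output directory are placeholders for your own locations:

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

# Placeholders: replace with your converted LLaMA weights and your LoRA output directory.
base_model_path = "/your/dir/to/llama-7b-hf"
lora_weights_path = "./med-alpaca-lora-output"

tokenizer = LlamaTokenizer.from_pretrained(base_model_path)
model = LlamaForCausalLM.from_pretrained(
    base_model_path,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Attach the fine-tuned LoRA weights on top of the frozen base model.
model = PeftModel.from_pretrained(model, lora_weights_path, torch_dtype=torch.float16)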
Med-Alpaca
Configure data_path in train.sh, then:
cd med-alpaca
bash train.sh
Med-GIT
Configure train_data_csv, train_data_folder, validation_data_csv, and validation_data_folder in Fine_tune_GIT_on_an_image_captioning_dataset.py, then:
cd med-git
python Fine_tune_GIT_on_an_image_captioning_dataset.py
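After training, the Med-GIT checkpoint can be used to caption a medical image through the standard Transformers GIT interface. A minimal sketch, assuming the fine-tuned model was saved to a local directory (the path, image file, and generation length are placeholders):

from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Placeholder: directory where the fine-tuned Med-GIT checkpoint was saved.
checkpoint_dir = "./med-git-checkpoint"

processor = AutoProcessor.from_pretrained(checkpoint_dir)
model = AutoModelForCausalLM.from_pretrained(checkpoint_dir)

image = Image.open("example_radiology_image.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Generate a caption for the uploaded medical image.
generated_ids = model.generate(pixel_values=pixel_values, max_length=64)
image_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(image_caption)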
For more information, refer to Stanford Alpaca, Alpaca-LoRA, and Transformers-Tutorials.
Connect all pieces together
Finally, all of these pieces can be combined in the workflow illustrated in the following diagram:
[Workflow diagram: the Medical Visual Foundation Model produces an image caption, which is combined with the user question to prompt the Medical Language Model]
We have tested Deplot and Med-GIT as the Medical Visual Foundation Model, and Med-Alpaca LoRA, Med-Alpaca, and GPT-3.5-Turbo as the Medical Language Model.
You can use the image_caption output by any Medical Visual Foundation Model to prompt any Medical Language Model with one of the following templates.
prompt_input = f"Below is an instruction that describes a task, paired with an input that provides further context of an uploaded image. Write a response that appropriately completes the request.\n\n### Instruction:\n{question}\n\n### Input:\n{image_caption}\n\n### Response:\n"
prompt_no_input = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{question}\n\n### Response:\n"