Visual Med-Alpaca: Bridging Modalities in Biomedical Language Models
Chang Shu1* Baian Chen2* Fangyu Liu1 Zihao Fu1 Ehsan Shareghi 1 Nigel Collier1
1University of Cambridge      2Ruiping Health


Abstract Here

Demo (insert GIF here) (Baian)

Please register for a Hugging Face account and fill out this form [link] to access the online demo of Visual Med-Alpaca. Warning: the demo is for academic use only and must not be applied in real clinical scenarios!
Introduction

We introduce Alpaca 7B, a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. In our preliminary evaluation of single-turn instruction following, Alpaca behaves qualitatively similarly to OpenAI’s text-davinci-003, while being surprisingly small and both easy and cheap to reproduce.
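For reference, single-turn instruction-following demonstrations of this kind are commonly serialized with Stanford Alpaca's prompt template; the sketch below shows the no-input variant and should be read as an illustration, not necessarily this project's exact preprocessing:

```python
# Stanford Alpaca's single-turn prompt template (no-input variant).
ALPACA_PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(instruction: str) -> str:
    """Format one instruction-following demonstration into the template."""
    return ALPACA_PROMPT.format(instruction=instruction)
```

At training time the target response is appended after the final `### Response:` marker, so the model learns to continue the prompt with an answer.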


Assets released:
Overview: Model and Training Recipe

Overview of the model architecture and training procedure.
Domain Adaptation: Self-Instruct in the Biomedical Domain (Baian)

How to generate the instruction-tuning set
Visual Adaptation: DePlot and Medical VQA (Baian)

We also build a large-scale, high-quality video dataset, Vimeo-90K. This dataset consists of 89,800 video clips downloaded from vimeo.com, covering a large variety of scenes and actions. It is designed for the following four video processing tasks: temporal frame interpolation, video denoising, video deblocking, and video super-resolution.

Sampled Frames (Full-resolution samples are here):



The list of original videos

The list of all full-length original videos can be found here, and youtube-dl can be used to batch-download them. We reused some utilities from the AoT Dataset for scene detection and camera stabilization to generate these video clips; please refer to this repository for more details.

We further process these 89,800 video clips to generate the following two subsets.

Triplet dataset (for temporal frame interpolation):

The triplet dataset consists of 73,171 3-frame sequences with a fixed resolution of 448 x 256, extracted from 15K selected video clips from Vimeo-90K. This dataset is designed for temporal frame interpolation. Download links:
  • Testing set only (17GB): zip
  • Both training and test set (33GB): zip
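The triplet entries can be enumerated programmatically once the archive is unpacked. The sketch below assumes a layout of `root/sequences/<seq>/<clip>/im1.png`–`im3.png` with a split file (e.g. `tri_trainlist.txt`) listing one `<seq>/<clip>` entry per line; these file and directory names are assumptions, so check them against the downloaded archive:

```python
import os

def load_triplet_paths(root, list_file):
    """Build (im1, im2, im3) path triplets from a split-list file.

    Assumed layout: root/sequences/<seq>/<clip>/im{1,2,3}.png, with the
    split file containing one "<seq>/<clip>" entry per line.
    """
    triplets = []
    with open(list_file) as f:
        for line in f:
            clip = line.strip()
            if not clip:
                continue  # skip blank lines at the end of the list
            clip_dir = os.path.join(root, "sequences", clip)
            triplets.append(tuple(
                os.path.join(clip_dir, f"im{i}.png") for i in (1, 2, 3)
            ))
    return triplets
```

For frame interpolation, `im1` and `im3` serve as inputs and `im2` as the ground-truth middle frame.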
Septuplet dataset (for video denoising, deblocking, and super-resolution):

Notice: we have recently updated the denoising test set to fix a bug in its generation. The new quantitative results of our algorithm are reported in our updated paper.

The septuplet dataset consists of 91,701 7-frame sequences with a fixed resolution of 448 x 256, extracted from 39K selected video clips from Vimeo-90K. This dataset is designed for video denoising, deblocking, and super-resolution.
  • The test set for video denoising (16GB): zip
  • The test set for video deblocking (11GB): zip
  • The test set for video super-resolution (6GB): zip
  • The original test set (not downsampled or downgraded by noise) (15GB): zip
  • The original training + test set (82GB): zip
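As a toy illustration of how degraded inputs for the denoising task could be produced from the clean septuplet frames, the sketch below adds clipped Gaussian noise to a frame. The benchmark's actual noise model and noise level are not stated here, so both are assumptions for illustration only:

```python
import random

def add_gaussian_noise(frame, sigma=20.0, seed=None):
    """Add clipped Gaussian noise to a frame.

    `frame` is a nested list of pixel intensities in [0, 255];
    `sigma` is an assumed noise level, not the one used for the benchmark.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    noisy = []
    for row in frame:
        noisy.append([
            min(255.0, max(0.0, p + rng.gauss(0.0, sigma)))  # clip to valid range
            for p in row
        ])
    return noisy
```

A denoising model would then be trained to map such noisy frames back to the clean originals.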
Implementation Details

Hyper-parameters and training time
Evaluation and Known Limitations

We conduct a limited human evaluation (links here).
Comparison with Other Methods

Comparison with ChatGPT, Alpaca, and Galactica
Future Work

Acknowledgement

Thanks to