mirror of
https://github.com/RYDE-WORK/visual-med-alpaca.git
synced 2026-01-28 19:33:23 +08:00
344 lines
15 KiB
HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Visual Med-Alpaca: Bridging Modalities in Biomedical Language Models</title>
<link rel="shortcut icon" href="favicon.ico">
<link rel="stylesheet" href="files/style.css">
<link rel="stylesheet" href="files/font.css">
</head>

<style type="text/css">
#myvalignContainer1O { position:relative }
#myvalignContainer1I { position:absolute; top:50%; height:10em; margin-top:-5em }
</style>
<style type="text/css">
#myvalignContainer2 { line-height:4em }
</style>
<body>

<script src="files/analytics.js" async=""></script><script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-40457306-2', 'mit.edu');
ga('send', 'pageview');

</script>

<!-- Title -->
<div class="container">

<span class="title">Visual Med-Alpaca: Bridging Modalities in Biomedical Language Models</span>

<table align="center" border="0" width="1000" class="authors">
<tbody><tr>
<td class="author"> Chang Shu<sup>1*</sup></td>
<td class="author"> Baian Chen<sup>2*</sup></td>
<td class="author"> Fangyu Liu<sup>1</sup></td>
<td class="author"> Zihao Fu<sup>1</sup></td>
<td class="author"> Ehsan Shareghi<sup>1</sup></td>
<td class="author"> <a href="https://sites.google.com/site/nhcollier/home/">Nigel Collier</a><sup>1</sup></td>
</tr></tbody>
</table>

<table align="center" border="0" width="1000" class="affiliations">
<tbody>
<tr>
<td class="affliation" align="center">
<sup>1</sup><a href="https://www.cam.ac.uk/">University of Cambridge</a>
&nbsp;&nbsp;&nbsp;&nbsp;
<sup>2</sup>Ruiping Health
</td>
</tr>
</tbody>
</table>

<br>
<br>
<table align="center"><tbody>
<tr>
<td><center><img src="files/ltl_logo.jpg" width="1100" ></center></td>
</tr>
<tr><td class="caption">Abstract Here</td></tr>
</tbody></table>
<br>

<!-- Demo -->
<div class="section">
<span class="section-title"> Demo </span>
<br><br>
<table align="center"><tbody>
<tr><td><center>
<iframe width="900" height="506" src="xxxx.gif" frameborder="0" allowfullscreen></iframe>
</center></td></tr>
<tr><td><center>
Please register for a Hugging Face account and fill out this form [link] to access the online demo of Visual Med-Alpaca. Warning: the demo is for academic use only; do not apply it to real clinical scenarios!
</center></td></tr>
</tbody></table>
</div>

<!-- Introduction -->
<div class="section">
<span class="section-title">Introduction </span>
<p> We introduce Alpaca 7B, a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. On our preliminary evaluation of single-turn instruction following, Alpaca behaves qualitatively similarly to OpenAI’s text-davinci-003, while being surprisingly small and easy/cheap to reproduce. </p>
<br>
<b>Assets released:</b><br>
<ul>
<li> Demo: <a href="https://">HuggingFace Space</a> </li>
<li> Data: <a href="https://github.com/cambridgeltl/">Github</a> </li>
<li> Data Generation: <a href="https://github.com/cambridgeltl/">Github</a> </li>
<li> Visual Adaptation: <a href="https://github.com/cambridgeltl/">Github</a> </li>
<li> Training Code: <a href="https://github.com/cambridgeltl/">Github</a> </li>
</ul>
</div>

<div class="section">
<span class="section-title"> Overview: Model and Training Recipe </span>
<br><br>
Overview of the model architecture and training procedure.
</div>

<div class="section">
<span class="section-title"> Domain Adaptation: Self-Instruct in the Biomedical Domain </span>
<br><br>
How to generate the instruction-tuning set.
</div>

<div class="section">
<span class="section-title"> Visual Adaptation: DePlot and Medical VQA </span>
<br><br>
We also build a large-scale, high-quality video dataset, Vimeo90K. This dataset consists of 89,800 video clips downloaded from <a href='https://vimeo.com'>vimeo.com</a>, covering a large variety of scenes and actions. It is designed for the following four video processing tasks: temporal frame interpolation, video denoising, video deblocking, and video super-resolution.

<br><br>

Sampled Frames (Full-resolution samples are <a href="http://data.csail.mit.edu/tofu/dataset.html">here</a>):<br>
<br>
<table>
<tr>
<td><img src="files/dataset/0059.png" width="264" ></td>
<td><img src="files/dataset/0043.png" width="264" ></td>
<td><img src="files/dataset/0039.png" width="264" ></td>
<td><img src="files/dataset/0028.png" width="264" ></td>
</tr>
<tr>
<td><img src="files/dataset/0017.png" width="264" ></td>
<td><img src="files/dataset/0008.png" width="264" ></td>
<td><img src="files/dataset/0045.png" width="264" ></td>
<td><img src="files/dataset/0035.png" width="264" ></td>
</tr>
</table>
<table>
<tr>
<td><img src="files/datasetS/0001.png" width="86" ></td>
<td><img src="files/datasetS/0002.png" width="86" ></td>
<td><img src="files/datasetS/0003.png" width="86" ></td>
<td><img src="files/datasetS/0004.png" width="86" ></td>
<td><img src="files/datasetS/0005.png" width="86" ></td>
<td><img src="files/datasetS/0006.png" width="86" ></td>
<td><img src="files/datasetS/0007.png" width="86" ></td>
<td><img src="files/datasetS/0008.png" width="86" ></td>
<td><img src="files/datasetS/0009.png" width="86" ></td>
<td><img src="files/datasetS/0010.png" width="86" ></td>
<td><img src="files/datasetS/0011.png" width="86" ></td>
<td><img src="files/datasetS/0012.png" width="86" ></td>
</tr>
<tr>
<td><img src="files/datasetS/0013.png" width="86" ></td>
<td><img src="files/datasetS/0014.png" width="86" ></td>
<td><img src="files/datasetS/0015.png" width="86" ></td>
<td><img src="files/datasetS/0016.png" width="86" ></td>
<td><img src="files/datasetS/0017.png" width="86" ></td>
<td><img src="files/datasetS/0018.png" width="86" ></td>
<td><img src="files/datasetS/0019.png" width="86" ></td>
<td><img src="files/datasetS/0020.png" width="86" ></td>
<td><img src="files/datasetS/0021.png" width="86" ></td>
<td><img src="files/datasetS/0022.png" width="86" ></td>
<td><img src="files/datasetS/0023.png" width="86" ></td>
<td><img src="files/datasetS/0024.png" width="86" ></td>
</tr>
<tr>
<td><img src="files/datasetS/0025.png" width="86" ></td>
<td><img src="files/datasetS/0026.png" width="86" ></td>
<td><img src="files/datasetS/0027.png" width="86" ></td>
<td><img src="files/datasetS/0028.png" width="86" ></td>
<td><img src="files/datasetS/0029.png" width="86" ></td>
<td><img src="files/datasetS/0030.png" width="86" ></td>
<td><img src="files/datasetS/0031.png" width="86" ></td>
<td><img src="files/datasetS/0032.png" width="86" ></td>
<td><img src="files/datasetS/0033.png" width="86" ></td>
<td><img src="files/datasetS/0034.png" width="86" ></td>
<td><img src="files/datasetS/0035.png" width="86" ></td>
<td><img src="files/datasetS/0036.png" width="86" ></td>
</tr>
<tr>
<td><img src="files/datasetS/0037.png" width="86" ></td>
<td><img src="files/datasetS/0038.png" width="86" ></td>
<td><img src="files/datasetS/0039.png" width="86" ></td>
<td><img src="files/datasetS/0040.png" width="86" ></td>
<td><img src="files/datasetS/0041.png" width="86" ></td>
<td><img src="files/datasetS/0042.png" width="86" ></td>
<td><img src="files/datasetS/0043.png" width="86" ></td>
<td><img src="files/datasetS/0044.png" width="86" ></td>
<td><img src="files/datasetS/0045.png" width="86" ></td>
<td><img src="files/datasetS/0046.png" width="86" ></td>
<td><img src="files/datasetS/0047.png" width="86" ></td>
<td><img src="files/datasetS/0048.png" width="86" ></td>
</tr>
<tr>
<td><img src="files/datasetS/0049.png" width="86" ></td>
<td><img src="files/datasetS/0050.png" width="86" ></td>
<td><img src="files/datasetS/0051.png" width="86" ></td>
<td><img src="files/datasetS/0052.png" width="86" ></td>
<td><img src="files/datasetS/0053.png" width="86" ></td>
<td><img src="files/datasetS/0054.png" width="86" ></td>
<td><img src="files/datasetS/0055.png" width="86" ></td>
<td><img src="files/datasetS/0056.png" width="86" ></td>
<td><img src="files/datasetS/0057.png" width="86" ></td>
<td><img src="files/datasetS/0058.png" width="86" ></td>
<td><img src="files/datasetS/0059.png" width="86" ></td>
<td><img src="files/datasetS/0060.png" width="86" ></td>
</tr>
<tr>
<td><img src="files/datasetS/0061.png" width="86" ></td>
<td><img src="files/datasetS/0062.png" width="86" ></td>
<td><img src="files/datasetS/0063.png" width="86" ></td>
<td><img src="files/datasetS/0064.png" width="86" ></td>
<td><img src="files/datasetS/0065.png" width="86" ></td>
<td><img src="files/datasetS/0066.png" width="86" ></td>
<td><img src="files/datasetS/0067.png" width="86" ></td>
<td><img src="files/datasetS/0068.png" width="86" ></td>
<td><img src="files/datasetS/0069.png" width="86" ></td>
<td><img src="files/datasetS/0070.png" width="86" ></td>
<td><img src="files/datasetS/0071.png" width="86" ></td>
<td><img src="files/datasetS/0072.png" width="86" ></td>
</tr>
<tr>
<td><img src="files/datasetS/0073.png" width="86" ></td>
<td><img src="files/datasetS/0074.png" width="86" ></td>
<td><img src="files/datasetS/0075.png" width="86" ></td>
<td><img src="files/datasetS/0076.png" width="86" ></td>
<td><img src="files/datasetS/0077.png" width="86" ></td>
<td><img src="files/datasetS/0078.png" width="86" ></td>
<td><img src="files/datasetS/0079.png" width="86" ></td>
<td><img src="files/datasetS/0080.png" width="86" ></td>
<td><img src="files/datasetS/0081.png" width="86" ></td>
<td><img src="files/datasetS/0082.png" width="86" ></td>
<td><img src="files/datasetS/0083.png" width="86" ></td>
<td><img src="files/datasetS/0084.png" width="86" ></td>
</tr>
</table>
<br><br>

<a name='original_video'></a>
<b><a href="http://toflow.csail.mit.edu/index.html#original_video">The list of original videos</a></b><br><br>

The list of all full-length original videos can be found <a href="http://data.csail.mit.edu/tofu/dataset/original_video_list.txt">here</a>, and <a href="https://rg3.github.io/youtube-dl/">youtube-dl</a> can be used to batch download them. We reused some of the utilities from the AoT Dataset for scene detection and camera stabilization to generate these video clips; please refer to this <a href="https://github.com/donglaiw/AoT_Dataset">repository</a> for more details.<br><br>
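The batch download described above can be sketched in Python (a minimal sketch, assuming the list file has been saved locally as `original_video_list.txt`, that the `youtube-dl` binary is on the PATH, and that the helper name is ours):

```python
import subprocess  # used to invoke the youtube-dl CLI
from pathlib import Path

def build_download_commands(list_path, out_dir="clips"):
    """Read one video URL per line and build a youtube-dl command for each."""
    commands = []
    for line in Path(list_path).read_text().splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines in the list
        # youtube-dl's -o output template names each file by its video id
        commands.append(["youtube-dl", "-o", f"{out_dir}/%(id)s.%(ext)s", url])
    return commands

# Usage: after saving the list linked above, run each command, e.g.
#   for cmd in build_download_commands("original_video_list.txt"):
#       subprocess.run(cmd, check=True)
```

Running the commands sequentially keeps the sketch simple; for tens of thousands of clips a worker pool over the same command list would be the natural extension.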

We further process these 89,800 video clips to generate the following two subsets.<br><br>

<a name='triplet'></a>
<b><a href="http://toflow.csail.mit.edu/index.html#triplet">Triplet dataset (for temporal frame interpolation):</a></b><br><br>
The triplet dataset consists of 73,171 3-frame sequences with a fixed resolution of 448 x 256, extracted from 15K selected video clips from Vimeo-90K. This dataset is designed for temporal frame interpolation. Download links:
<ul>
<li> Test set only (17GB): <a href="http://data.csail.mit.edu/tofu/testset/vimeo_interp_test.zip">zip</a> </li>
<li> Both training and test set (33GB): <a href="http://data.csail.mit.edu/tofu/dataset/vimeo_triplet.zip">zip</a> </li>
</ul>

<a name='septuplet'></a>
<b><a href="http://toflow.csail.mit.edu/index.html#septuplet">Septuplet dataset (for video denoising, deblocking, and super-resolution):</a></b><br><br>
<b> Notice: we have recently updated our denoising test set to fix a bug in the test-data generation. The new quantitative results of our algorithm are reported in <a href="http://toflow.csail.mit.edu/toflow_ijcv.pdf">our updated paper</a>. </b><br><br>
The septuplet dataset consists of 91,701 7-frame sequences with a fixed resolution of 448 x 256, extracted from 39K selected video clips from Vimeo-90K. This dataset is designed for video denoising, deblocking, and super-resolution.
<ul>
<li> The test set for video denoising (16GB): <a href="http://data.csail.mit.edu/tofu/testset/vimeo_denoising_test_20191102.zip">zip</a> </li>
<li> The test set for video deblocking (11GB): <a href="http://data.csail.mit.edu/tofu/testset/vimeo_deblocking_test.zip">zip</a> </li>
<li> The test set for video super-resolution (6GB): <a href="http://data.csail.mit.edu/tofu/testset/vimeo_super_resolution_test.zip">zip</a> </li>
<li> The original test set (not downsampled or degraded by noise) (15GB): <a href="http://data.csail.mit.edu/tofu/testset/vimeo_test_clean.zip">zip</a> </li>
<li> The original training + test set (82GB): <a href="http://data.csail.mit.edu/tofu/dataset/vimeo_septuplet.zip">zip</a> </li>
</ul>
</div>
<div class="section">
<span class="section-title"> Implementation Details </span>
<br><br>
Hyper-parameters<br>
Training time
</div>

<div class="section">
<span class="section-title"> Evaluation and Known Limitations </span>
<br><br>
We evaluate<br>
Limited human evaluation (Links Here)
</div>

<div class="section">
<span class="section-title"> Comparison with Other Methods </span>
<br><br>
Compare with ChatGPT / Alpaca / Galactica
</div>

<div class="section">
<span class="section-title"> Future Work </span>
<br><br>
</div>

<div class="section">
<span class="section-title"> Acknowledgement </span>
<br><br>
Thanks to
</div>

<p> </p>
<!-- end .container --></div>

</body></html>