ReXrank is a public leaderboard for chest X-ray image interpretation, including both radiology report generation (RRG) and visual question answering (VQA) tasks.
ReXrank Challenge V1.0 is a competition in chest radiograph report generation that uses ReXGradient, the largest private test dataset, consisting of 10,000 studies across 67 sites. The challenge attracted diverse participants from academic institutions, industry, and independent research teams, with 24 state-of-the-art models benchmarked so far.
ReXrank Challenge V2.0 is a competition on the VQA task that uses a VQA dataset constructed from ReXGradient, comprising 41,007 VQA pairs from 10,000 radiological studies. We benchmarked 8 state-of-the-art models.
ReXGradient-160K is the largest publicly available multi-site chest X-ray dataset, containing 273,004 unique chest X-ray images from 160,000 radiological studies, collected from 109,487 unique patients across 3 U.S. health systems (79 medical sites). In ReXrank, we use an additional private test set, ReXGradient, containing 10,000 studies, for benchmarking.
ReXVQA is the largest and most comprehensive benchmark for VQA in chest radiology, comprising 653,834 questions paired with 160,000 radiological studies. The dataset is constructed from ReXGradient-160K.
Rank | ReXGradient | MIMIC-CXR | IU-Xray | CheXpert Plus |
---|---|---|---|---|
1 | UniRG-CXR (Microsoft Research) | UniRG-CXR (Microsoft Research) | UniRG-CXR (Microsoft Research) | UniRG-CXR (Microsoft Research) |
2 | MoERad-IU (IIT Madras) | MedVersa (Harvard) | MoERad-IU (IIT Madras) | RadPhi3.5Vision (Microsoft) |
3 | MedGemma | Libra (University of Glasgow) | MedVersa (Harvard) | CheXpertPlus-CheX-MIMIC (Stanford) |
4 | MedVersa (Harvard) | RadPhi3.5Vision (Microsoft) | CXRMate-RRG24 (CSIRO) | CXRMate-RRG24 (CSIRO) |
5 | MAIRA-2 (Microsoft) | CXRMate-ED (CSIRO) | VLCI-IU (SYSU) | MAIRA-2 (Microsoft) |
6 | VLCI-IU (SYSU) | CXRMate-RRG24 (CSIRO) | MedGemma | CheXpertPlus-CheX (Stanford) |
7 | RadPhi3.5Vision (Microsoft) | CheXpertPlus-CheX-MIMIC (Stanford) | MAIRA-2 (Microsoft) | DD-LLaVA-X (SNUH) |
8 | RGRG (TUM) | DD-LLaVA-X (SNUH) | Cvt2distilgpt2-IU (CSIRO) | CXRMate-ED (CSIRO) |
9 | DD-LLaVA-X (SNUH) | RaDialog (TUM) | CXRMate-ED (CSIRO) | MedVersa (Harvard) |
10 | Libra (University of Glasgow) | CheXpertPlus-MIMIC (Stanford) | DD-LLaVA-X (SNUH) | Libra (University of Glasgow) |
11 | RaDialog (TUM) | RGRG (TUM) | RadFM (SJTU) | RaDialog (TUM) |
12 | CXRMate-ED (CSIRO) | MedGemma | CheXpertPlus-CheX-MIMIC (Stanford) | MedGemma |
13 | Cvt2distilgpt2-MIMIC (CSIRO) | CheXagent (Stanford) | Libra (University of Glasgow) | RGRG (TUM) |
14 | Cvt2distilgpt2-IU (CSIRO) | MoERad-MIMIC (IIT Madras) | RGRG (TUM) | CheXpertPlus-MIMIC (Stanford) |
15 | CheXpertPlus-CheX-MIMIC (Stanford) | Cvt2distilgpt2-MIMIC (CSIRO) | RadPhi3.5Vision (Microsoft) | MoERad-MIMIC (IIT Madras) |
16 | CXRMate-RRG24 (CSIRO) | CheXpertPlus-CheX (Stanford) | Cvt2distilgpt2-MIMIC (CSIRO) | CheXagent (Stanford) |
17 | CheXpertPlus-CheX (Stanford) | MAIRA-2 (Microsoft) | RaDialog (TUM) | Cvt2distilgpt2-MIMIC (CSIRO) |
18 | CheXpertPlus-MIMIC (Stanford) | VLCI-MIMIC (SYSU) | MoERad-MIMIC (IIT Madras) | MoERad-IU (IIT Madras) |
19 | RadFM (SJTU) | RadFM (SJTU) | CheXpertPlus-MIMIC (Stanford) | VLCI-MIMIC (SYSU) |
20 | BiomedGPT-IU (Lehigh University) | MoERad-IU (IIT Madras) | BiomedGPT-IU (Lehigh University) | Cvt2distilgpt2-IU (CSIRO) |
21 | MoERad-MIMIC (IIT Madras) | Cvt2distilgpt2-IU (CSIRO) | CheXpertPlus-CheX (Stanford) | RadFM (SJTU) |
22 | VLCI-MIMIC (SYSU) | VLCI-IU (SYSU) | VLCI-MIMIC (SYSU) | GPT4V (OpenAI) |
23 | CheXagent (Stanford) | GPT4V (OpenAI) | CheXagent (Stanford) | VLCI-IU (SYSU) |
24 | GPT4V (OpenAI) | BiomedGPT-IU (Lehigh University) | GPT4V (OpenAI) | BiomedGPT-IU (Lehigh University) |
25 | LLM-CXR (KAIST) | LLM-CXR (KAIST) | LLM-CXR (KAIST) | LLM-CXR (KAIST) |
Rank | ReXVQA |
---|---|
1 | MedGemma-4B-it |
2 | Janus-Pro-7B (DeepSeek) |
3 | Qwen2.5VL-7B-Instruct (Qwen) |
4 | Eagle2-9B (NVIDIA) |
5 | Gemini-1.5-Pro |
6 | Qwen2VL-7B-Instruct (Alibaba) |
7 | Phi35-Vision-Instruct (Microsoft) |
8 | LLaVA-1.5-7B (Meta) |
ReXGradient is a large-scale private test dataset containing 10,000 studies collected from different medical centers in the US.
Rank | Model | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN |
---|---|---|---|---|---|---|---|---|
1 2024 | CheXagent (Stanford) | 0.674 | 0.093 | 0.305 | 0.366 | 0.08 | 0.428 | 0.241 |
2 2024 | CheXpertPlus-MIMIC (Stanford) | 0.777 | 0.154 | 0.341 | 0.442 | 0.13 | 0.501 | 0.52 |
3 2024 | CheXpertPlus-CheX (Stanford) | 0.787 | 0.143 | 0.361 | 0.431 | 0.124 | 0.476 | 0.411 |
4 2024 | CheXpertPlus-CheX-MIMIC (Stanford) | 0.83 | 0.169 | 0.372 | 0.442 | 0.154 | 0.517 | 0.489 |
5 2023 | Cvt2distilgpt2-MIMIC (CSIRO) | 0.866 | 0.186 | 0.374 | 0.46 | 0.176 | 0.524 | 0.514 |
6 2023 | Cvt2distilgpt2-IU (CSIRO) | 0.842 | 0.178 | 0.395 | 0.405 | 0.167 | 0.52 | 0.47 |
7 2024 | MedVersa (Harvard) | 1.008 | 0.21 | 0.431 | 0.498 | 0.202 | 0.527 | 0.532 |
8 2023 | RadFM (SJTU) | 0.775 | 0.157 | 0.365 | 0.392 | 0.135 | 0.504 | 0.406 |
9 2023 | RaDialog (TUM) | 0.876 | 0.188 | 0.402 | 0.45 | 0.158 | 0.522 | 0.435 |
10 2023 | RGRG (TUM) | 0.888 | 0.19 | 0.391 | 0.47 | 0.169 | 0.54 | 0.487 |
11 2023 | VLCI-MIMIC (SYSU) | 0.721 | 0.157 | 0.31 | 0.402 | 0.122 | 0.488 | 0.477 |
12 2023 | VLCI-IU (SYSU) | 0.897 | 0.214 | 0.365 | 0.467 | 0.215 | 0.573 | 0.536 |
13 2024 | LLM-CXR (KAIST) | 0.507 | 0.043 | 0.182 | 0.142 | 0.029 | 0.317 | 0.044 |
14 2024 | GPT4V (OpenAI) | 0.629 | 0.075 | 0.214 | 0.337 | 0.138 | 0.47 | 0.497 |
15 2024 | BiomedGPT-IU (Lehigh University) | 0.771 | 0.099 | 0.317 | 0.437 | 0.157 | 0.472 | 0.388 |
16 2024 | MAIRA-2 (Microsoft) | 0.963 | 0.205 | 0.436 | 0.462 | 0.187 | 0.559 | 0.531 |
17 2024 | CXRMate-ED (CSIRO) | 0.872 | 0.202 | 0.398 | 0.415 | 0.187 | 0.564 | 0.518 |
18 2024 | CXRMate-RRG24 (CSIRO) | 0.792 | 0.15 | 0.327 | 0.462 | 0.152 | 0.518 | 0.408 |
19 2024 | Libra (University of Glasgow) | 0.881 | 0.165 | 0.385 | 0.474 | 0.168 | 0.544 | 0.555 |
20 2025 | MoERad-IU (IIT Madras) | 1.018 | 0.227 | 0.434 | 0.446 | 0.247 | 0.575 | 0.494 |
21 2025 | MoERad-MIMIC (IIT Madras) | 0.756 | 0.145 | 0.351 | 0.406 | 0.116 | 0.508 | 0.431 |
22 2025 | RadPhi3.5Vision (Microsoft) | 0.891 | 0.209 | 0.383 | 0.488 | 0.169 | 0.544 | 0.453 |
23 2025 | DD-LLaVA-X (SNUH) | 0.886 | 0.166 | 0.387 | 0.469 | 0.174 | 0.542 | 0.504 |
24 2025 | MedGemma | 1.008 | 0.2 | 0.427 | 0.479 | 0.223 | 0.617 | 0.566 |
25 2025 | UniRG-CXR (Microsoft Research) | 1.621 | 0.291 | 0.538 | 0.576 | 0.298 | 0.622 | 0.476 |

Rank | Model | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN |
---|---|---|---|---|---|---|---|---|
1 2024 | CheXpertPlus-MIMIC (Stanford) | 0.791 | 0.177 | 0.364 | 0.431 | 0.139 | 0.481 | 0.523 |
2 2024 | CheXpertPlus-CheX (Stanford) | 0.748 | 0.165 | 0.333 | 0.395 | 0.148 | 0.502 | 0.468 |
3 2024 | CheXpertPlus-CheX-MIMIC (Stanford) | 0.838 | 0.196 | 0.389 | 0.429 | 0.166 | 0.5 | 0.508 |
4 2024 | MedVersa (Harvard) | 0.984 | 0.172 | 0.438 | 0.48 | 0.188 | 0.527 | 0.524 |
5 2023 | RadFM (SJTU) | 0.737 | 0.132 | 0.338 | 0.375 | 0.131 | 0.466 | 0.405 |
6 2024 | GPT4V (OpenAI) | 0.605 | 0.072 | 0.214 | 0.364 | 0.175 | 0.456 | 0.356 |
7 2025 | UniRG-CXR (Microsoft Research) | 1.59 | 0.3 | 0.532 | 0.573 | 0.3 | 0.612 | 0.494 |
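The ReXGradient column of the summary ranking at the top of the page appears consistent with sorting the first (25-row) ReXGradient table by 1/RadCliQ-v1 in descending order. The sketch below illustrates that ordering on a small subset of rows copied from that table. It assumes that higher 1/RadCliQ-v1 is better; the helper name `rank_by_score` is hypothetical and not part of any ReXrank tooling.

```python
# Illustrative subset of (model, 1/RadCliQ-v1) pairs copied from the
# first ReXGradient table. Listed in the table's own row order, which
# is not sorted by score.
rows = [
    ("CheXagent", 0.674),
    ("LLM-CXR", 0.507),
    ("MAIRA-2", 0.963),
    ("MoERad-IU", 1.018),
    ("UniRG-CXR", 1.621),
]


def rank_by_score(rows):
    """Return model names ordered best-first.

    Assumption: a higher 1/RadCliQ-v1 value means a better model.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [name for name, _ in ordered]


ranking = rank_by_score(rows)
for position, name in enumerate(ranking, start=1):
    print(position, name)
```

Models whose scores tie at three decimals (for example MedVersa and MedGemma, both 1.008 on ReXGradient) cannot be ordered from the published values alone, so they are left out of this subset.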
MIMIC-CXR contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center in Boston, MA. We follow the official split of MIMIC-CXR in the following experiments.
Rank | Model | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN |
---|---|---|---|---|---|---|---|---|
1 2024 | CheXagent (Stanford) | 0.741 | 0.113 | 0.346 | 0.347 | 0.148 | 0.474 | 0.257 |
2 2024 | CheXpertPlus-MIMIC (Stanford) | 0.788 | 0.145 | 0.361 | 0.375 | 0.17 | 0.485 | 0.311 |
3 2024 | CheXpertPlus-CheX (Stanford) | 0.698 | 0.077 | 0.314 | 0.325 | 0.142 | 0.469 | 0.225 |
4 2024 | CheXpertPlus-CheX-MIMIC (Stanford) | 0.805 | 0.142 | 0.367 | 0.379 | 0.181 | 0.49 | 0.305 |
5 2023 | Cvt2distilgpt2-MIMIC (CSIRO) | 0.719 | 0.126 | 0.331 | 0.329 | 0.149 | 0.432 | 0.268 |
6 2023 | Cvt2distilgpt2-IU (CSIRO) | 0.613 | 0.055 | 0.303 | 0.191 | 0.103 | 0.448 | 0.164 |
7 2024 | MedVersa (Harvard) | 1.103 | 0.209 | 0.448 | 0.466 | 0.273 | 0.55 | 0.374 |
8 2023 | RadFM (SJTU) | 0.65 | 0.087 | 0.313 | 0.259 | 0.109 | 0.45 | 0.185 |
9 2023 | RaDialog (TUM) | 0.799 | 0.127 | 0.363 | 0.387 | 0.172 | 0.485 | 0.273 |
10 2023 | RGRG (TUM) | 0.755 | 0.13 | 0.348 | 0.344 | 0.168 | 0.491 | 0.273 |
11 2023 | VLCI-MIMIC (SYSU) | 0.68 | 0.136 | 0.304 | 0.305 | 0.14 | 0.45 | 0.256 |
12 2023 | VLCI-IU (SYSU) | 0.599 | 0.075 | 0.263 | 0.212 | 0.109 | 0.449 | 0.21 |
13 2024 | LLM-CXR (KAIST) | 0.516 | 0.037 | 0.181 | 0.156 | 0.046 | 0.341 | 0.043 |
14 2024 | GPT4V (OpenAI) | 0.558 | 0.068 | 0.207 | 0.214 | 0.084 | 0.423 | 0.161 |
15 2024 | BiomedGPT-IU (Lehigh University) | 0.544 | 0.02 | 0.192 | 0.224 | 0.059 | 0.36 | 0.123 |
16 2024 | MAIRA-2 (Microsoft) | 0.694 | 0.088 | 0.308 | 0.339 | 0.131 | 0.517 | 0.224 |
17 2024 | CXRMate-ED (CSIRO) | 0.872 | 0.208 | 0.383 | 0.396 | 0.223 | 0.531 | 0.327 |
18 2024 | CXRMate-RRG24 (CSIRO) | 0.87 | 0.198 | 0.367 | 0.423 | 0.22 | 0.521 | 0.338 |
19 2024 | Libra (University of Glasgow) | 0.898 | 0.232 | 0.402 | 0.403 | 0.218 | 0.523 | 0.356 |
20 2025 | MoERad-IU (IIT Madras) | 0.643 | 0.064 | 0.321 | 0.213 | 0.122 | 0.455 | 0.174 |
21 2025 | MoERad-MIMIC (IIT Madras) | 0.726 | 0.163 | 0.341 | 0.334 | 0.143 | 0.465 | 0.24 |
22 2025 | RadPhi3.5Vision (Microsoft) | 0.888 | 0.223 | 0.386 | 0.431 | 0.207 | 0.534 | 0.294 |
23 2025 | DD-LLaVA-X (SNUH) | 0.801 | 0.154 | 0.348 | 0.402 | 0.182 | 0.505 | 0.301 |
24 2025 | MedGemma | 0.744 | 0.165 | 0.346 | 0.339 | 0.159 | 0.549 | 0.293 |
25 2025 | UniRG-CXR (Microsoft Research) | 1.217 | 0.248 | 0.493 | 0.487 | 0.265 | 0.596 | 0.352 |

Rank | Model | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN |
---|---|---|---|---|---|---|---|---|
1 2024 | CheXpertPlus-MIMIC (Stanford) | 0.802 | 0.165 | 0.353 | 0.382 | 0.193 | 0.511 | 0.377 |
2 2024 | CheXpertPlus-CheX (Stanford) | 0.715 | 0.127 | 0.3 | 0.342 | 0.173 | 0.51 | 0.302 |
3 2024 | CheXpertPlus-CheX-MIMIC (Stanford) | 0.825 | 0.166 | 0.362 | 0.391 | 0.203 | 0.52 | 0.367 |
4 2024 | MedVersa (Harvard) | 0.919 | 0.193 | 0.43 | 0.315 | 0.273 | 0.554 | 0.421 |
5 2023 | RadFM (SJTU) | 0.625 | 0.081 | 0.281 | 0.245 | 0.111 | 0.448 | 0.214 |
6 2024 | GPT4V (OpenAI) | 0.549 | 0.065 | 0.204 | 0.19 | 0.085 | 0.429 | 0.127 |
7 2025 | UniRG-CXR (Microsoft Research) | 1.108 | 0.193 | 0.443 | 0.485 | 0.269 | 0.612 | 0.355 |
IU-Xray contains 7,470 pairs of radiology reports and chest X-rays from Indiana University. We follow the split given by R2Gen.
Rank | Model | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN |
---|---|---|---|---|---|---|---|---|
1 2024 | CheXagent (Stanford) | 0.827 | 0.116 | 0.353 | 0.488 | 0.139 | 0.503 | 0.389 |
2 2024 | CheXpertPlus-MIMIC (Stanford) | 0.988 | 0.178 | 0.386 | 0.593 | 0.169 | 0.585 | 0.661 |
3 2024 | CheXpertPlus-CheX (Stanford) | 0.92 | 0.157 | 0.413 | 0.495 | 0.153 | 0.534 | 0.541 |
4 2024 | CheXpertPlus-CheX-MIMIC (Stanford) | 1.179 | 0.198 | 0.453 | 0.593 | 0.211 | 0.618 | 0.648 |
5 2023 | Cvt2distilgpt2-MIMIC (CSIRO) | 1.126 | 0.199 | 0.422 | 0.609 | 0.209 | 0.606 | 0.682 |
6 2023 | Cvt2distilgpt2-IU (CSIRO) | 1.283 | 0.244 | 0.482 | 0.548 | 0.265 | 0.62 | 0.686 |
7 2024 | MedVersa (Harvard) | 1.46 | 0.206 | 0.527 | 0.606 | 0.235 | 0.65 | 0.631 |
8 2023 | RadFM (SJTU) | 1.187 | 0.2 | 0.459 | 0.566 | 0.23 | 0.627 | 0.615 |
9 2023 | RaDialog (TUM) | 1.086 | 0.201 | 0.444 | 0.544 | 0.205 | 0.586 | 0.586 |
10 2023 | RGRG (TUM) | 1.174 | 0.216 | 0.437 | 0.602 | 0.223 | 0.62 | 0.665 |
11 2023 | VLCI-MIMIC (SYSU) | 0.913 | 0.139 | 0.364 | 0.483 | 0.22 | 0.578 | 0.474 |
12 2023 | VLCI-IU (SYSU) | 1.381 | 0.268 | 0.455 | 0.619 | 0.288 | 0.679 | 0.698 |
13 2024 | LLM-CXR (KAIST) | 0.486 | 0.033 | 0.186 | 0.057 | 0.023 | 0.28 | 0.025 |
14 2024 | GPT4V (OpenAI) | 0.708 | 0.076 | 0.274 | 0.405 | 0.146 | 0.517 | 0.651 |
15 2024 | BiomedGPT-IU (Lehigh University) | 0.956 | 0.142 | 0.375 | 0.522 | 0.213 | 0.543 | 0.523 |
16 2024 | MAIRA-2 (Microsoft) | 1.298 | 0.219 | 0.477 | 0.604 | 0.233 | 0.627 | 0.194 |
17 2024 | CXRMate-ED (CSIRO) | 1.22 | 0.225 | 0.464 | 0.557 | 0.249 | 0.655 | 0.685 |
18 2024 | CXRMate-RRG24 (CSIRO) | 1.458 | 0.245 | 0.456 | 0.638 | 0.302 | 0.666 | 0.68 |
19 2024 | Libra (University of Glasgow) | 1.176 | 0.183 | 0.441 | 0.614 | 0.21 | 0.624 | 0.698 |
20 2025 | MoERad-IU (IIT Madras) | 1.922 | 0.277 | 0.525 | 0.641 | 0.341 | 0.684 | 0.665 |
21 2025 | MoERad-MIMIC (IIT Madras) | 1.02 | 0.171 | 0.42 | 0.559 | 0.178 | 0.603 | 0.584 |
22 2025 | RadPhi3.5Vision (Microsoft) | 1.166 | 0.248 | 0.433 | 0.607 | 0.22 | 0.634 | 0.597 |
23 2025 | DD-LLaVA-X (SNUH) | 1.204 | 0.189 | 0.443 | 0.6 | 0.233 | 0.636 | 0.671 |
24 2025 | MedGemma | 1.34 | 0.217 | 0.475 | 0.6 | 0.26 | 0.678 | 0.724 |
25 2025 | UniRG-CXR (Microsoft Research) | 1.977 | 0.265 | 0.565 | 0.659 | 0.286 | 0.69 | 0.639 |

Rank | Model | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN |
---|---|---|---|---|---|---|---|---|
1 2024 | CheXpertPlus-MIMIC (Stanford) | 1.111 | 0.227 | 0.449 | 0.594 | 0.187 | 0.57 | 0.681 |
2 2024 | CheXpertPlus-CheX (Stanford) | 0.995 | 0.198 | 0.394 | 0.55 | 0.211 | 0.604 | 0.706 |
3 2024 | CheXpertPlus-CheX-MIMIC (Stanford) | 1.249 | 0.244 | 0.476 | 0.598 | 0.232 | 0.606 | 0.694 |
4 2024 | MedVersa (Harvard) | 1.452 | 0.195 | 0.518 | 0.601 | 0.244 | 0.628 | 0.658 |
5 2023 | RadFM (SJTU) | 1.22 | 0.196 | 0.479 | 0.556 | 0.234 | 0.596 | 0.644 |
6 2024 | GPT4V (OpenAI) | 0.683 | 0.079 | 0.235 | 0.403 | 0.16 | 0.519 | 0.399 |
7 2025 | UniRG-CXR (Microsoft Research) | 2.007 | 0.295 | 0.574 | 0.653 | 0.285 | 0.669 | 0.668 |
CheXpert Plus contains 223,228 unique pairs of radiology reports and chest X-rays from 187,711 studies and 64,725 patients. We follow the official split of CheXpert Plus in the following experiments and use the validation set for evaluation.
Rank | Model | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN |
---|---|---|---|---|---|---|---|---|
1 2024 | CheXagent (Stanford) | 0.638 | 0.123 | 0.278 | 0.269 | 0.125 | 0.434 | 0.183 |
2 2024 | CheXpertPlus-MIMIC (Stanford) | 0.663 | 0.14 | 0.292 | 0.294 | 0.134 | 0.43 | 0.238 |
3 2024 | CheXpertPlus-CheX (Stanford) | 0.786 | 0.15 | 0.342 | 0.377 | 0.191 | 0.487 | 0.237 |
4 2024 | CheXpertPlus-CheX-MIMIC (Stanford) | 0.808 | 0.153 | 0.335 | 0.404 | 0.207 | 0.497 | 0.274 |
5 2023 | Cvt2distilgpt2-MIMIC (CSIRO) | 0.626 | 0.124 | 0.267 | 0.266 | 0.119 | 0.42 | 0.215 |
6 2023 | Cvt2distilgpt2-IU (CSIRO) | 0.577 | 0.084 | 0.267 | 0.155 | 0.098 | 0.382 | 0.147 |
7 2024 | MedVersa (Harvard) | 0.719 | 0.129 | 0.323 | 0.344 | 0.147 | 0.47 | 0.243 |
8 2023 | RadFM (SJTU) | 0.572 | 0.081 | 0.235 | 0.216 | 0.08 | 0.396 | 0.096 |
9 2023 | RaDialog (TUM) | 0.709 | 0.131 | 0.312 | 0.353 | 0.138 | 0.445 | 0.211 |
10 2023 | RGRG (TUM) | 0.674 | 0.154 | 0.315 | 0.274 | 0.14 | 0.453 | 0.216 |
11 2023 | VLCI-MIMIC (SYSU) | 0.589 | 0.12 | 0.229 | 0.251 | 0.101 | 0.384 | 0.165 |
12 2023 | VLCI-IU (SYSU) | 0.555 | 0.106 | 0.22 | 0.17 | 0.094 | 0.418 | 0.194 |
13 2024 | LLM-CXR (KAIST) | 0.519 | 0.041 | 0.162 | 0.211 | 0.037 | 0.321 | 0.022 |
14 2024 | GPT4V (OpenAI) | 0.568 | 0.081 | 0.215 | 0.234 | 0.082 | 0.415 | 0.152 |
15 2024 | BiomedGPT-IU (Lehigh University) | 0.552 | 0.022 | 0.2 | 0.241 | 0.056 | 0.351 | 0.118 |
16 2024 | MAIRA-2 (Microsoft) | 0.788 | 0.163 | 0.359 | 0.355 | 0.189 | 0.485 | 0.273 |
17 2024 | CXRMate-ED (CSIRO) | 0.723 | 0.157 | 0.324 | 0.316 | 0.175 | 0.498 | 0.265 |
18 2024 | CXRMate-RRG24 (CSIRO) | 0.801 | 0.157 | 0.315 | 0.411 | 0.218 | 0.521 | 0.276 |
19 2024 | Libra (University of Glasgow) | 0.718 | 0.157 | 0.319 | 0.323 | 0.169 | 0.466 | 0.253 |
20 2025 | MoERad-IU (IIT Madras) | 0.595 | 0.075 | 0.284 | 0.175 | 0.102 | 0.39 | 0.127 |
21 2025 | MoERad-MIMIC (IIT Madras) | 0.641 | 0.122 | 0.267 | 0.3 | 0.12 | 0.434 | 0.166 |
22 2025 | RadPhi3.5Vision (Microsoft) | 0.86 | 0.198 | 0.353 | 0.437 | 0.217 | 0.51 | 0.243 |
23 2025 | DD-LLaVA-X (SNUH) | 0.753 | 0.085 | 0.318 | 0.385 | 0.172 | 0.476 | 0.206 |
24 2025 | MedGemma | 0.706 | 0.147 | 0.328 | 0.325 | 0.137 | 0.511 | 0.246 |
25 2025 | UniRG-CXR (Microsoft Research) | 1.008 | 0.19 | 0.428 | 0.46 | 0.236 | 0.564 | 0.279 |

Rank | Model | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN |
---|---|---|---|---|---|---|---|---|
1 2024 | CheXpertPlus_MIMIC (Stanford) | 0.482 | 0.103 | 0.002 | 0.318 | 0.049 | 0.429 | 0.293 |
2 2024 | CheXpertPlus_CheX (Stanford) | 0.512 | 0.142 | 0.02 | 0.38 | 0.07 | 0.492 | 0.363 |
3 2024 | CheXpertPlus_CheX_MIMIC (Stanford) | 0.511 | 0.14 | 0.011 | 0.388 | 0.071 | 0.503 | 0.382 |
4 2024 | MedVersa (Harvard) | 0.493 | 0.09 | 0.013 | 0.337 | 0.05 | 0.452 | 0.334 |
5 2023 | RadFM (SJTU) | 0.443 | 0.067 | -0.038 | 0.229 | 0.027 | 0.39 | 0.137 |
6 2024 | GPT4V (OpenAI) | 0.431 | 0.055 | -0.065 | 0.208 | 0.028 | 0.393 | 0.182 |
7 2025 | UniRG-CXR (Microsoft Research) | 0.53 | 0.13 | 0.026 | 0.433 | 0.072 | 0.536 | 0.298 |
Performance comparison of various vision-language models on medical VQA tasks.
Rank | Model | Overall Accuracy | Differential Diagnosis | Geometric Information | Location Assessment | Negation Assessment | Presence Assessment |
---|---|---|---|---|---|---|---|
1 | MedGemma-4B-it | 0.8217 | 0.7671 | 0.8045 | 0.8347 | 0.8503 | 0.8521 |
2 | Janus-Pro-7B (DeepSeek) | 0.6656 | 0.5634 | 0.7542 | 0.6462 | 0.7573 | 0.6070 |
3 | Qwen2.5VL-7B-Instruct (Qwen) | 0.6555 | 0.6361 | 0.6648 | 0.6324 | 0.8327 | 0.5114 |
4 | Eagle2-9B (NVIDIA) | 0.6443 | 0.6817 | 0.5698 | 0.5695 | 0.8632 | 0.5375 |
5 | Gemini-1.5-Pro | 0.6331 | 0.6221 | 0.4689 | 0.5960 | 0.8568 | 0.6217 |
6 | Qwen2VL-7B-Instruct (Alibaba) | 0.5470 | 0.5265 | 0.4494 | 0.5405 | 0.6269 | 0.5915 |
7 | Phi35-Vision-Instruct (Microsoft) | 0.4749 | 0.6224 | 0.2215 | 0.3711 | 0.7950 | 0.3644 |
8 | LLaVA-1.5-7B (Meta) | 0.2661 | 0.2161 | 0.2346 | 0.2761 | 0.2402 | 0.3633 |
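The page does not state how Overall Accuracy is derived. In the VQA table above, each reported value matches the unweighted mean of the five category accuracies to four decimal places, so the sketch below checks that relationship. Treat it as an observation about these numbers rather than a documented definition; `macro_average` and `TOLERANCE` are hypothetical names introduced for illustration.

```python
from statistics import mean

# (model, overall accuracy, [differential diagnosis, geometric information,
#  location, negation, presence]) copied from the VQA table.
rows = [
    ("MedGemma-4B-it", 0.8217, [0.7671, 0.8045, 0.8347, 0.8503, 0.8521]),
    ("Janus-Pro-7B", 0.6656, [0.5634, 0.7542, 0.6462, 0.7573, 0.6070]),
    ("Qwen2.5VL-7B-Instruct", 0.6555, [0.6361, 0.6648, 0.6324, 0.8327, 0.5114]),
    ("Eagle2-9B", 0.6443, [0.6817, 0.5698, 0.5695, 0.8632, 0.5375]),
    ("Gemini-1.5-Pro", 0.6331, [0.6221, 0.4689, 0.5960, 0.8568, 0.6217]),
    ("Qwen2VL-7B-Instruct", 0.5470, [0.5265, 0.4494, 0.5405, 0.6269, 0.5915]),
    ("Phi35-Vision-Instruct", 0.4749, [0.6224, 0.2215, 0.3711, 0.7950, 0.3644]),
    ("LLaVA-1.5-7B", 0.2661, [0.2161, 0.2346, 0.2761, 0.2402, 0.3633]),
]

# Half a unit in the fourth decimal place, i.e. the rounding error of
# a value published with four decimals.
TOLERANCE = 5e-5


def macro_average(category_scores):
    """Unweighted mean over categories (each category counts equally)."""
    return mean(category_scores)


# Collect any model whose published overall accuracy differs from the
# macro average by more than rounding error.
mismatches = [
    name
    for name, overall, categories in rows
    if abs(macro_average(categories) - overall) > TOLERANCE
]
print(mismatches)
```

For the eight rows listed, `mismatches` comes out empty. An unweighted mean gives each category equal weight regardless of how many questions it contains, which would explain why a model such as Eagle2-9B can lead on Negation Assessment and still place fourth overall.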