ReXrank is a public leaderboard for chest X-ray image interpretation, including both radiology report generation (RRG) and visual question answering (VQA) tasks.
ReXrank Challenge V1.0 is a competition on chest radiograph report generation using ReXGradient, the largest private test dataset, consisting of 10,000 studies across 67 sites. The challenge attracted participants from academic institutions, industry, and independent research teams, and 24 state-of-the-art models have been benchmarked.
ReXrank Challenge V2.0 is a competition on the VQA task using a VQA dataset constructed from ReXGradient, comprising 41,007 VQA pairs over 10,000 radiological studies. We benchmarked 8 state-of-the-art models.
ReXGradient-160K is the largest publicly available multi-site chest X-ray dataset, containing 273,004 unique chest X-ray images from 160,000 radiological studies collected from 109,487 unique patients across 3 U.S. health systems (79 medical sites). In ReXrank, we additionally use the private test set ReXGradient (10,000 studies) for benchmarking.
ReXVQA is the largest and most comprehensive benchmark for VQA in chest radiology, comprising 653,834 questions paired with 160,000 radiological studies. The dataset is constructed from ReXGradient-160K.
ReXVQA model ranking:

Rank | Model | Organization |
---|---|---|
1 | MedGemma-4B-it | |
2 | Janus-Pro-7B | DeepSeek |
3 | Qwen2.5VL-7B-Instruct | Qwen |
4 | Eagle2-9B | NVIDIA |
5 | Gemini-1.5-Pro | |
6 | Qwen2VL-7B-Instruct | Alibaba |
7 | Phi35-Vision-Instruct | Microsoft |
8 | LLaVA-1.5-7B | Meta |
ReXGradient is a large-scale private test dataset containing 10,000 studies collected from multiple medical centers in the US.
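The report generation tables in this and the following sections score each model's candidate reports against reference reports using lexical and semantic metrics (BLEU, BertScore, SembScore, RadGraph, RaTEScore, GREEN) together with the inverted composite scores 1/RadCliQ-v1 and 1/FineRadScore. As a minimal sketch of how two of these columns could be reproduced offline, assuming the `sacrebleu` and `bert-score` packages (the leaderboard's exact metric configurations may differ):

```python
# Minimal sketch: scoring generated chest X-ray reports against references.
# Assumes `pip install sacrebleu bert-score`; the leaderboard's exact settings
# (tokenization, n-gram order, BERTScore backbone) may differ.
import sacrebleu
from bert_score import score as bert_score

candidates = [
    "Heart size is normal. No focal consolidation or pleural effusion.",
    "There is a small right pleural effusion with adjacent atelectasis.",
]
references = [
    "The cardiac silhouette is normal. Lungs are clear without effusion.",
    "Small right pleural effusion and right basilar atelectasis are present.",
]

# Corpus-level BLEU over all candidate/reference report pairs.
bleu = sacrebleu.corpus_bleu(candidates, [references])

# BERTScore F1 per report, averaged over the corpus.
_, _, f1 = bert_score(candidates, references, lang="en", verbose=False)

print(f"BLEU: {bleu.score:.2f}")
print(f"BertScore F1 (mean): {f1.mean().item():.3f}")
```

The model-based clinical metrics (RadGraph, RaTEScore, GREEN, RadCliQ, FineRadScore) each require their own scorer and are not shown here.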
Rank | Year | Model | Organization | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2024 | CheXagent | Stanford | 0.674 | 0.093 | 0.305 | 0.366 | 0.08 | 0.428 | 0.241 | 0.456 |
2 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.777 | 0.154 | 0.341 | 0.442 | 0.13 | 0.501 | 0.52 | 0.473 |
3 | 2024 | CheXpertPlus-CheX | Stanford | 0.787 | 0.143 | 0.361 | 0.431 | 0.124 | 0.476 | 0.411 | 0.414 |
4 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.83 | 0.169 | 0.372 | 0.442 | 0.154 | 0.517 | 0.489 | 0.465 |
5 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 0.866 | 0.186 | 0.374 | 0.46 | 0.176 | 0.524 | 0.514 | 0.47 |
6 | 2023 | Cvt2distilgpt2-IU | CSIRO | 0.842 | 0.178 | 0.395 | 0.405 | 0.167 | 0.52 | 0.47 | 0.457 |
7 | 2024 | MedVersa | Harvard | 1.008 | 0.21 | 0.431 | 0.498 | 0.202 | 0.527 | 0.532 | 0.475 |
8 | 2023 | RadFM | SJTU | 0.775 | 0.157 | 0.365 | 0.392 | 0.135 | 0.504 | 0.406 | 0.438 |
9 | 2023 | RaDialog | TUM | 0.876 | 0.188 | 0.402 | 0.45 | 0.158 | 0.522 | 0.435 | 0.456 |
10 | 2023 | RGRG | TUM | 0.888 | 0.19 | 0.391 | 0.47 | 0.169 | 0.54 | 0.487 | 0.46 |
11 | 2023 | VLCI-MIMIC | SYSU | 0.721 | 0.157 | 0.31 | 0.402 | 0.122 | 0.488 | 0.477 | 0.455 |
12 | 2023 | VLCI-IU | SYSU | 0.897 | 0.214 | 0.365 | 0.467 | 0.215 | 0.573 | 0.536 | 0.452 |
13 | 2024 | LLM-CXR | KAIST | 0.507 | 0.043 | 0.182 | 0.142 | 0.029 | 0.317 | 0.044 | 0.326 |
14 | 2024 | GPT4V | OpenAI | 0.629 | 0.075 | 0.214 | 0.337 | 0.138 | 0.47 | 0.497 | 0.43 |
15 | 2024 | BiomedGPT-IU | Lehigh University | 0.771 | 0.099 | 0.317 | 0.437 | 0.157 | 0.472 | 0.388 | 0.451 |
16 | 2024 | MAIRA-2 | Microsoft | 0.963 | 0.205 | 0.436 | 0.462 | 0.187 | 0.559 | 0.531 | 0.475 |
17 | 2024 | CXRMate-ED | CSIRO | 0.872 | 0.202 | 0.398 | 0.415 | 0.187 | 0.564 | 0.518 | 0.472 |
18 | 2024 | CXRMate-RRG24 | CSIRO | 0.792 | 0.15 | 0.327 | 0.462 | 0.152 | 0.518 | 0.408 | 0.458 |
19 | 2024 | Libra | University of Glasgow | 0.881 | 0.165 | 0.385 | 0.474 | 0.168 | 0.544 | 0.555 | 0.473 |
20 | 2025 | MoERad-IU | IIT Madras | 1.018 | 0.227 | 0.434 | 0.446 | 0.247 | 0.575 | 0.494 | 0.468 |
21 | 2025 | MoERad-MIMIC | IIT Madras | 0.756 | 0.145 | 0.351 | 0.406 | 0.116 | 0.508 | 0.431 | 0.446 |
22 | 2025 | RadPhi3.5Vision | Microsoft | 0.891 | 0.209 | 0.383 | 0.488 | 0.169 | 0.544 | 0.453 | 0.458 |
23 | 2025 | DD-LLaVA-X | SNUH | 0.886 | 0.166 | 0.387 | 0.469 | 0.174 | 0.542 | 0.504 | 0.459 |
24 | 2025 | MedGemma | | 1.008 | 0.2 | 0.427 | 0.479 | 0.223 | 0.617 | 0.566 | 0.457 |
Rank | Year | Model | Organization | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.791 | 0.177 | 0.364 | 0.431 | 0.139 | 0.481 | 0.523 | 0.465 |
2 | 2024 | CheXpertPlus-CheX | Stanford | 0.748 | 0.165 | 0.333 | 0.395 | 0.148 | 0.502 | 0.468 | 0.425 |
3 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.838 | 0.196 | 0.389 | 0.429 | 0.166 | 0.5 | 0.508 | 0.466 |
4 | 2024 | MedVersa | Harvard | 0.984 | 0.172 | 0.438 | 0.48 | 0.188 | 0.527 | 0.524 | 0.467 |
5 | 2023 | RadFM | SJTU | 0.737 | 0.132 | 0.338 | 0.375 | 0.131 | 0.466 | 0.405 | 0.429 |
6 | 2024 | GPT4V | OpenAI | 0.605 | 0.072 | 0.214 | 0.364 | 0.175 | 0.456 | 0.356 | 0.423 |
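The 1/ prefix on RadCliQ-v1 and FineRadScore indicates lower-is-better scores reported as reciprocals, so that higher is better in every column. Below is a small pandas sketch of that convention with purely illustrative numbers (not taken from the tables above); sorting by 1/RadCliQ-v1 is just one way to compare, and the Rank column above is not ordered by any single metric.

```python
# Sketch: turning raw lower-is-better scores into leaderboard-style columns.
# The model names and values below are illustrative only.
import pandas as pd

raw = pd.DataFrame(
    {
        "model": ["model_a", "model_b"],   # hypothetical systems
        "RadCliQ-v1": [1.30, 0.99],        # lower is better
        "FineRadScore": [2.11, 2.10],      # lower is better
        "BLEU": [0.15, 0.21],              # higher is better
    }
)

# Report reciprocals so every displayed column is higher-is-better.
raw["1/RadCliQ-v1"] = 1.0 / raw["RadCliQ-v1"]
raw["1/FineRadScore"] = 1.0 / raw["FineRadScore"]
board = raw.drop(columns=["RadCliQ-v1", "FineRadScore"])

# Sort by one metric as an example comparison.
board = board.sort_values("1/RadCliQ-v1", ascending=False).reset_index(drop=True)
print(board)
```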
MIMIC-CXR contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center in Boston, MA. We follow the official split of MIMIC-CXR in the following experiments. * denotes the model was trained on this dataset.
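Before the MIMIC-CXR results below, a minimal sketch of selecting the official test split, assuming the `mimic-cxr-2.0.0-split.csv.gz` metadata file distributed with MIMIC-CXR-JPG (the column names follow that file; adjust if your copy differs):

```python
# Minimal sketch of applying the official MIMIC-CXR split.
# Assumes the mimic-cxr-2.0.0-split.csv.gz file with columns
# dicom_id, study_id, subject_id, split (train / validate / test).
import pandas as pd

split = pd.read_csv("mimic-cxr-2.0.0-split.csv.gz")

# Keep held-out test studies; reports are evaluated at the study level.
test_studies = (
    split.loc[split["split"] == "test", ["subject_id", "study_id"]]
    .drop_duplicates()
)
print(f"{len(test_studies)} test studies")
```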
Rank | Year | Model | Organization | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2024 | CheXagent | Stanford | 0.741 | 0.113 | 0.346 | 0.347 | 0.148 | 0.474 | 0.257 | 0.355 |
2 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.788 | 0.145 | 0.361 | 0.375 | 0.17 | 0.485 | 0.311 | 0.363 |
3 | 2024 | CheXpertPlus-CheX | Stanford | 0.698 | 0.077 | 0.314 | 0.325 | 0.142 | 0.469 | 0.225 | 0.351 |
4 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.805 | 0.142 | 0.367 | 0.379 | 0.181 | 0.49 | 0.305 | 0.363 |
5 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 0.719 | 0.126 | 0.331 | 0.329 | 0.149 | 0.432 | 0.268 | 0.362 |
6 | 2023 | Cvt2distilgpt2-IU | CSIRO | 0.613 | 0.055 | 0.303 | 0.191 | 0.103 | 0.448 | 0.164 | 0.347 |
7 | 2024 | MedVersa | Harvard | 1.103 | 0.209 | 0.448 | 0.466 | 0.273 | 0.55 | 0.374 | 0.365 |
8 | 2023 | RadFM | SJTU | 0.65 | 0.087 | 0.313 | 0.259 | 0.109 | 0.45 | 0.185 | 0.351 |
9 | 2023 | RaDialog | TUM | 0.799 | 0.127 | 0.363 | 0.387 | 0.172 | 0.485 | 0.273 | 0.359 |
10 | 2023 | RGRG | TUM | 0.755 | 0.13 | 0.348 | 0.344 | 0.168 | 0.491 | 0.273 | 0.352 |
11 | 2023 | VLCI-MIMIC | SYSU | 0.68 | 0.136 | 0.304 | 0.305 | 0.14 | 0.45 | 0.256 | 0.357 |
12 | 2023 | VLCI-IU | SYSU | 0.599 | 0.075 | 0.263 | 0.212 | 0.109 | 0.449 | 0.21 | 0.347 |
13 | 2024 | LLM-CXR | KAIST | 0.516 | 0.037 | 0.181 | 0.156 | 0.046 | 0.341 | 0.043 | 0.307 |
14 | 2024 | GPT4V | OpenAI | 0.558 | 0.068 | 0.207 | 0.214 | 0.084 | 0.423 | 0.161 | 0.343 |
15 | 2024 | BiomedGPT-IU | Lehigh University | 0.544 | 0.02 | 0.192 | 0.224 | 0.059 | 0.36 | 0.123 | 0.341 |
16 | 2024 | MAIRA-2 | Microsoft | 0.694 | 0.088 | 0.308 | 0.339 | 0.131 | 0.517 | 0.224 | 0.359 |
17 | 2024 | CXRMate-ED | CSIRO | 0.872 | 0.208 | 0.383 | 0.396 | 0.223 | 0.531 | 0.327 | 0.358 |
18 | 2024 | CXRMate-RRG24 | CSIRO | 0.87 | 0.198 | 0.367 | 0.423 | 0.22 | 0.521 | 0.338 | 0.359 |
19 | 2024 | Libra | University of Glasgow | 0.898 | 0.232 | 0.402 | 0.403 | 0.218 | 0.523 | 0.356 | 0.362 |
20 | 2025 | MoERad-IU | IIT Madras | 0.643 | 0.064 | 0.321 | 0.213 | 0.122 | 0.455 | 0.174 | 0.347 |
21 | 2025 | MoERad-MIMIC | IIT Madras | 0.726 | 0.163 | 0.341 | 0.334 | 0.143 | 0.465 | 0.24 | 0.354 |
22 | 2025 | RadPhi3.5Vision | Microsoft | 0.888 | 0.223 | 0.386 | 0.431 | 0.207 | 0.534 | 0.294 | 0.356 |
23 | 2025 | DD-LLaVA-X | SNUH | 0.801 | 0.154 | 0.348 | 0.402 | 0.182 | 0.505 | 0.301 | 0.361 |
24 | 2025 | MedGemma | | 0.744 | 0.165 | 0.346 | 0.339 | 0.159 | 0.549 | 0.293 | 0.349 |
Rank | Year | Model | Organization | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.802 | 0.165 | 0.353 | 0.382 | 0.193 | 0.511 | 0.377 | 0.365 |
2 | 2024 | CheXpertPlus-CheX | Stanford | 0.715 | 0.127 | 0.3 | 0.342 | 0.173 | 0.51 | 0.302 | 0.355 |
3 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.825 | 0.166 | 0.362 | 0.391 | 0.203 | 0.52 | 0.367 | 0.365 |
4 | 2024 | MedVersa | Harvard | 0.919 | 0.193 | 0.43 | 0.315 | 0.273 | 0.554 | 0.421 | 0.361 |
5 | 2023 | RadFM | SJTU | 0.625 | 0.081 | 0.281 | 0.245 | 0.111 | 0.448 | 0.214 | 0.346 |
6 | 2024 | GPT4V | OpenAI | 0.549 | 0.065 | 0.204 | 0.19 | 0.085 | 0.429 | 0.127 | 0.331 |
IU X-ray contains 7,470 chest X-ray images paired with radiology reports from Indiana University. We follow the split given by R2Gen. * denotes the model was trained on IU X-ray.
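As with MIMIC-CXR, a minimal sketch of loading the R2Gen split for IU X-ray, assuming an R2Gen-style `annotation.json` with top-level train/val/test lists (the field names below are assumptions about that file's layout):

```python
# Minimal sketch of loading an R2Gen-style IU X-ray annotation file.
# Assumes top-level "train"/"val"/"test" lists whose entries carry
# "id", "report", and "image_path" fields (assumed layout).
import json

with open("annotation.json") as f:
    annotation = json.load(f)

test_set = annotation["test"]
print(f"{len(test_set)} test studies")
for example in test_set[:2]:
    print(example["id"], example["image_path"], example["report"][:60])
```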
Rank | Year | Model | Organization | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2024 | CheXagent | Stanford | 0.827 | 0.116 | 0.353 | 0.488 | 0.139 | 0.503 | 0.389 | 0.574 |
2 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.988 | 0.178 | 0.386 | 0.593 | 0.169 | 0.585 | 0.661 | 0.622 |
3 | 2024 | CheXpertPlus-CheX | Stanford | 0.92 | 0.157 | 0.413 | 0.495 | 0.153 | 0.534 | 0.541 | 0.548 |
4 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 1.179 | 0.198 | 0.453 | 0.593 | 0.211 | 0.618 | 0.648 | 0.576 |
5 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 1.126 | 0.199 | 0.422 | 0.609 | 0.209 | 0.606 | 0.682 | 0.608 |
6 | 2023 | Cvt2distilgpt2-IU | CSIRO | 1.283 | 0.244 | 0.482 | 0.548 | 0.265 | 0.62 | 0.686 | 0.563 |
7 | 2024 | MedVersa | Harvard | 1.46 | 0.206 | 0.527 | 0.606 | 0.235 | 0.65 | 0.631 | 0.569 |
8 | 2023 | RadFM | SJTU | 1.187 | 0.2 | 0.459 | 0.566 | 0.23 | 0.627 | 0.615 | 0.572 |
9 | 2023 | RaDialog | TUM | 1.086 | 0.201 | 0.444 | 0.544 | 0.205 | 0.586 | 0.586 | 0.543 |
10 | 2023 | RGRG | TUM | 1.174 | 0.216 | 0.437 | 0.602 | 0.223 | 0.62 | 0.665 | 0.596 |
11 | 2023 | VLCI-MIMIC | SYSU | 0.913 | 0.139 | 0.364 | 0.483 | 0.22 | 0.578 | 0.474 | 0.488 |
12 | 2023 | VLCI-IU | SYSU | 1.381 | 0.268 | 0.455 | 0.619 | 0.288 | 0.679 | 0.698 | 0.551 |
13 | 2024 | LLM-CXR | KAIST | 0.486 | 0.033 | 0.186 | 0.057 | 0.023 | 0.28 | 0.025 | 0.302 |
14 | 2024 | GPT4V | OpenAI | 0.708 | 0.076 | 0.274 | 0.405 | 0.146 | 0.517 | 0.651 | 0.55 |
15 | 2024 | BiomedGPT-IU | Lehigh University | 0.956 | 0.142 | 0.375 | 0.522 | 0.213 | 0.543 | 0.523 | 0.543 |
16 | 2024 | MAIRA-2 | Microsoft | 1.298 | 0.219 | 0.477 | 0.604 | 0.233 | 0.627 | 0.194 | 0.599 |
17 | 2024 | CXRMate-ED | CSIRO | 1.22 | 0.225 | 0.464 | 0.557 | 0.249 | 0.655 | 0.685 | 0.597 |
18 | 2024 | CXRMate-RRG24 | CSIRO | 1.458 | 0.245 | 0.456 | 0.638 | 0.302 | 0.666 | 0.68 | 0.598 |
19 | 2024 | Libra | University of Glasgow | 1.176 | 0.183 | 0.441 | 0.614 | 0.21 | 0.624 | 0.698 | 0.593 |
20 | 2025 | MoERad-IU | IIT Madras | 1.922 | 0.277 | 0.525 | 0.641 | 0.341 | 0.684 | 0.665 | 0.587 |
21 | 2025 | MoERad-MIMIC | IIT Madras | 1.02 | 0.171 | 0.42 | 0.559 | 0.178 | 0.603 | 0.584 | 0.579 |
22 | 2025 | RadPhi3.5Vision | Microsoft | 1.166 | 0.248 | 0.433 | 0.607 | 0.22 | 0.634 | 0.597 | 0.552 |
23 | 2025 | DD-LLaVA-X | SNUH | 1.204 | 0.189 | 0.443 | 0.6 | 0.233 | 0.636 | 0.671 | 0.574 |
24 | 2025 | MedGemma | | 1.34 | 0.217 | 0.475 | 0.6 | 0.26 | 0.678 | 0.724 | 0.57 |
Rank | Year | Model | Organization | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2024 | CheXpertPlus-MIMIC | Stanford | 1.111 | 0.227 | 0.449 | 0.594 | 0.187 | 0.57 | 0.681 | 0.615 |
2 | 2024 | CheXpertPlus-CheX | Stanford | 0.995 | 0.198 | 0.394 | 0.55 | 0.211 | 0.604 | 0.706 | 0.568 |
3 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 1.249 | 0.244 | 0.476 | 0.598 | 0.232 | 0.606 | 0.694 | 0.588 |
4 | 2024 | MedVersa | Harvard | 1.452 | 0.195 | 0.518 | 0.601 | 0.244 | 0.628 | 0.658 | 0.583 |
5 | 2023 | RadFM | SJTU | 1.22 | 0.196 | 0.479 | 0.556 | 0.234 | 0.596 | 0.644 | 0.551 |
6 | 2024 | GPT4V | OpenAI | 0.683 | 0.079 | 0.235 | 0.403 | 0.16 | 0.519 | 0.399 | 0.528 |
CheXpert Plus contains 223,228 unique pairs of radiology reports and chest X-rays from 187,711 studies and 64,725 patients. We follow the official split of CheXpert Plus in the following experiments and use the validation set for evaluation. * denotes the model was trained on CheXpert Plus.
Rank | Year | Model | Organization | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2024 | CheXagent | Stanford | 0.638 | 0.123 | 0.278 | 0.269 | 0.125 | 0.434 | 0.183 | 0.341 |
2 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.663 | 0.14 | 0.292 | 0.294 | 0.134 | 0.43 | 0.238 | 0.344 |
3 | 2024 | CheXpertPlus-CheX | Stanford | 0.786 | 0.15 | 0.342 | 0.377 | 0.191 | 0.487 | 0.237 | 0.343 |
4 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.808 | 0.153 | 0.335 | 0.404 | 0.207 | 0.497 | 0.274 | 0.348 |
5 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 0.626 | 0.124 | 0.267 | 0.266 | 0.119 | 0.42 | 0.215 | 0.346 |
6 | 2023 | Cvt2distilgpt2-IU | CSIRO | 0.577 | 0.084 | 0.267 | 0.155 | 0.098 | 0.382 | 0.147 | 0.332 |
7 | 2024 | MedVersa | Harvard | 0.719 | 0.129 | 0.323 | 0.344 | 0.147 | 0.47 | 0.243 | 0.343 |
8 | 2023 | RadFM | SJTU | 0.572 | 0.081 | 0.235 | 0.216 | 0.08 | 0.396 | 0.096 | 0.333 |
9 | 2023 | RaDialog | TUM | 0.709 | 0.131 | 0.312 | 0.353 | 0.138 | 0.445 | 0.211 | 0.333 |
10 | 2023 | RGRG | TUM | 0.674 | 0.154 | 0.315 | 0.274 | 0.14 | 0.453 | 0.216 | 0.337 |
11 | 2023 | VLCI-MIMIC | SYSU | 0.589 | 0.12 | 0.229 | 0.251 | 0.101 | 0.384 | 0.165 | 0.33 |
12 | 2023 | VLCI-IU | SYSU | 0.555 | 0.106 | 0.22 | 0.17 | 0.094 | 0.418 | 0.194 | 0.339 |
13 | 2024 | LLM-CXR | KAIST | 0.519 | 0.041 | 0.162 | 0.211 | 0.037 | 0.321 | 0.022 | 0.291 |
14 | 2024 | GPT4V | OpenAI | 0.568 | 0.081 | 0.215 | 0.234 | 0.082 | 0.415 | 0.152 | 0.339 |
15 | 2024 | BiomedGPT-IU | Lehigh University | 0.552 | 0.022 | 0.2 | 0.241 | 0.056 | 0.351 | 0.118 | 0.32 |
16 | 2024 | MAIRA-2 | Microsoft | 0.788 | 0.163 | 0.359 | 0.355 | 0.189 | 0.485 | 0.273 | 0.352 |
17 | 2024 | CXRMate-ED | CSIRO | 0.723 | 0.157 | 0.324 | 0.316 | 0.175 | 0.498 | 0.265 | 0.367 |
18 | 2024 | CXRMate-RRG24 | CSIRO | 0.801 | 0.157 | 0.315 | 0.411 | 0.218 | 0.521 | 0.276 | 0.35 |
19 | 2024 | Libra | University of Glasgow | 0.718 | 0.157 | 0.319 | 0.323 | 0.169 | 0.466 | 0.253 | 0.344 |
20 | 2025 | MoERad-IU | IIT Madras | 0.595 | 0.075 | 0.284 | 0.175 | 0.102 | 0.39 | 0.127 | 0.341 |
21 | 2025 | MoERad-MIMIC | IIT Madras | 0.641 | 0.122 | 0.267 | 0.3 | 0.12 | 0.434 | 0.166 | 0.343 |
22 | 2025 | RadPhi3.5Vision | Microsoft | 0.86 | 0.198 | 0.353 | 0.437 | 0.217 | 0.51 | 0.243 | 0.356 |
23 | 2025 | DD-LLaVA-X | SNUH | 0.753 | 0.085 | 0.318 | 0.385 | 0.172 | 0.476 | 0.206 | 0.343 |
24 | 2025 | MedGemma | | 0.706 | 0.147 | 0.328 | 0.325 | 0.137 | 0.511 | 0.246 | 0.337 |
Rank | Year | Model | Organization | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.482 | 0.103 | 0.002 | 0.318 | 0.049 | 0.429 | 0.293 | 0.347 |
2 | 2024 | CheXpertPlus-CheX | Stanford | 0.512 | 0.142 | 0.02 | 0.38 | 0.07 | 0.492 | 0.363 | 0.353 |
3 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.511 | 0.14 | 0.011 | 0.388 | 0.071 | 0.503 | 0.382 | 0.36 |
4 | 2024 | MedVersa | Harvard | 0.493 | 0.09 | 0.013 | 0.337 | 0.05 | 0.452 | 0.334 | 0.354 |
5 | 2023 | RadFM | SJTU | 0.443 | 0.067 | -0.038 | 0.229 | 0.027 | 0.39 | 0.137 | 0.34 |
6 | 2024 | GPT4V | OpenAI | 0.431 | 0.055 | -0.065 | 0.208 | 0.028 | 0.393 | 0.182 | 0.329 |
Performance comparison of vision-language models on the ReXVQA benchmark, broken down by question category.
Rank | Model | Organization | Overall Accuracy | Differential Diagnosis | Geometric Information | Location Assessment | Negation Assessment | Presence Assessment |
---|---|---|---|---|---|---|---|---|
1 | MedGemma-4B-it | | 0.8217 | 0.7671 | 0.8045 | 0.8347 | 0.8503 | 0.8521 |
2 | Janus-Pro-7B | DeepSeek | 0.6656 | 0.5634 | 0.7542 | 0.6462 | 0.7573 | 0.6070 |
3 | Qwen2.5VL-7B-Instruct | Qwen | 0.6555 | 0.6361 | 0.6648 | 0.6324 | 0.8327 | 0.5114 |
4 | Eagle2-9B | NVIDIA | 0.6443 | 0.6817 | 0.5698 | 0.5695 | 0.8632 | 0.5375 |
5 | Gemini-1.5-Pro | | 0.6331 | 0.6221 | 0.4689 | 0.5960 | 0.8568 | 0.6217 |
6 | Qwen2VL-7B-Instruct | Alibaba | 0.5470 | 0.5265 | 0.4494 | 0.5405 | 0.6269 | 0.5915 |
7 | Phi35-Vision-Instruct | Microsoft | 0.4749 | 0.6224 | 0.2215 | 0.3711 | 0.7950 | 0.3644 |
8 | LLaVA-1.5-7B | Meta | 0.2661 | 0.2161 | 0.2346 | 0.2761 | 0.2402 | 0.3633 |
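Per-category accuracies such as those above can be computed by grouping predictions by question type. A minimal sketch, assuming multiple-choice answers recorded as option letters and hypothetical field names (`question_type`, `answer`, `prediction`); ReXVQA's actual schema and answer-matching rules may differ:

```python
# Sketch: overall and per-category accuracy for multiple-choice VQA predictions.
# Field names (question_type, answer, prediction) are hypothetical.
from collections import defaultdict

predictions = [
    {"question_type": "Presence Assessment", "answer": "A", "prediction": "A"},
    {"question_type": "Negation Assessment", "answer": "C", "prediction": "B"},
    {"question_type": "Location Assessment", "answer": "D", "prediction": "D"},
]

correct = defaultdict(int)
total = defaultdict(int)
for item in predictions:
    total[item["question_type"]] += 1
    correct[item["question_type"]] += int(
        item["prediction"].strip().upper() == item["answer"].strip().upper()
    )

overall = sum(correct.values()) / sum(total.values())
print(f"Overall accuracy: {overall:.4f}")
for question_type, n in total.items():
    print(f"{question_type}: {correct[question_type] / n:.4f}")
```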