ReXrank

Chest X-ray Interpretation Leaderboard

What is ReXrank?

ReXrank is a public leaderboard for chest X-ray image interpretation, including both radiology report generation (RRG) and visual question answering (VQA) tasks.


ReXrank Challenge V1.0 is a competition on chest radiograph report generation that uses ReXGradient, the largest private test dataset, consisting of 10,000 studies across 67 sites. The challenge attracted participants from academic institutions, industry, and independent research teams, and 24 state-of-the-art models have been benchmarked so far.


ReXrank Challenge V2.0 is a competition on the VQA task that uses a VQA dataset constructed from ReXGradient, comprising 41,007 VQA pairs across 10,000 radiological studies. We benchmarked 8 state-of-the-art models.


ReXGradient-160K is the largest publicly available multi-site chest X-ray dataset, containing 273,004 unique chest X-ray images from 160,000 radiological studies, collected from 109,487 unique patients across 3 U.S. health systems (79 medical sites). In ReXrank, we additionally use ReXGradient, a private test set of 10,000 studies, for benchmarking.


ReXVQA is the largest and most comprehensive benchmark for VQA in chest radiology, comprising 653,834 questions paired with 160,000 radiological studies. The dataset is constructed from ReXGradient-160K.

ReXrank Challenge V1.0 Leaderboard (RRG)

Rank | ReXGradient | MIMIC-CXR | IU-Xray | CheXpert Plus
1 | MedVersa (Harvard) | MedVersa (Harvard) | CheXpertPlus-MIMIC (Stanford) | CXRMate-ED (CSIRO)
2 | MAIRA-2 (Microsoft) | CheXpertPlus-MIMIC (Stanford) | Cvt2distilgpt2-MIMIC (CSIRO) | RadPhi3.5Vision (Microsoft)
3 | Libra (University of Glasgow) | CheXpertPlus-CheX-MIMIC (Stanford) | MAIRA-2 (Microsoft) | MAIRA-2 (Microsoft)
4 | CheXpertPlus-MIMIC (Stanford) | Cvt2distilgpt2-MIMIC (CSIRO) | CXRMate-RRG24 (CSIRO) | CXRMate-RRG24 (CSIRO)
5 | CXRMate-ED (CSIRO) | Libra (University of Glasgow) | CXRMate-ED (CSIRO) | CheXpertPlus-CheX-MIMIC (Stanford)
6 | Cvt2distilgpt2-MIMIC (CSIRO) | DD-LLaVA-X (SNUH) | RGRG (TUM) | Cvt2distilgpt2-MIMIC (CSIRO)
7 | MoERad-IU (IIT Madras) | MAIRA-2 (Microsoft) | Libra (University of Glasgow) | CheXpertPlus-MIMIC (Stanford)
8 | CheXpertPlus-CheX-MIMIC (Stanford) | RaDialog (TUM) | MoERad-IU (IIT Madras) | Libra (University of Glasgow)
9 | RGRG (TUM) | CXRMate-RRG24 (CSIRO) | MoERad-MIMIC (IIT Madras) | CheXpertPlus-CheX (Stanford)
10 | DD-LLaVA-X (SNUH) | CXRMate-ED (CSIRO) | CheXpertPlus-CheX-MIMIC (Stanford) | DD-LLaVA-X (SNUH)
11 | RadPhi3.5Vision (Microsoft) | VLCI-MIMIC (SYSU) | CheXagent (Stanford) | MoERad-MIMIC (IIT Madras)
12 | CXRMate-RRG24 (CSIRO) | RadPhi3.5Vision (Microsoft) | DD-LLaVA-X (SNUH) | MedVersa (Harvard)
13 | MedGemma (Google) | CheXagent (Stanford) | RadFM (SJTU) | CheXagent (Stanford)
14 | Cvt2distilgpt2-IU (CSIRO) | MoERad-MIMIC (IIT Madras) | MedGemma (Google) | MoERad-IU (IIT Madras)
15 | CheXagent (Stanford) | RGRG (TUM) | MedVersa (Harvard) | VLCI-IU (SYSU)
16 | RaDialog (TUM) | CheXpertPlus-CheX (Stanford) | Cvt2distilgpt2-IU (CSIRO) | GPT4V (OpenAI)
17 | VLCI-MIMIC (SYSU) | RadFM (SJTU) | RadPhi3.5Vision (Microsoft) | RGRG (TUM)
18 | VLCI-IU (SYSU) | MedGemma (Google) | VLCI-IU (SYSU) | MedGemma (Google)
19 | BiomedGPT-IU (Lehigh University) | Cvt2distilgpt2-IU (CSIRO) | GPT4V (OpenAI) | RaDialog (TUM)
20 | MoERad-MIMIC (IIT Madras) | VLCI-IU (SYSU) | CheXpertPlus-CheX (Stanford) | RadFM (SJTU)
21 | RadFM (SJTU) | MoERad-IU (IIT Madras) | RaDialog (TUM) | Cvt2distilgpt2-IU (CSIRO)
22 | GPT4V (OpenAI) | GPT4V (OpenAI) | BiomedGPT-IU (Lehigh University) | VLCI-MIMIC (SYSU)
23 | CheXpertPlus-CheX (Stanford) | BiomedGPT-IU (Lehigh University) | VLCI-MIMIC (SYSU) | BiomedGPT-IU (Lehigh University)
24 | LLM-CXR (KAIST) | LLM-CXR (KAIST) | LLM-CXR (KAIST) | LLM-CXR (KAIST)

ReXrank Challenge V2.0 Leaderboard (VQA)

Rank | ReXVQA
1 | MedGemma-4B-it (Google)
2 | Janus-Pro-7B (DeepSeek)
3 | Qwen2.5VL-7B-Instruct (Qwen)
4 | Eagle2-9B (NVIDIA)
5 | Gemini-1.5-Pro (Google)
6 | Qwen2VL-7B-Instruct (Alibaba)
7 | Phi35-Vision-Instruct (Microsoft)
8 | LLaVA-1.5-7B (Meta)

ReXrank Challenge V1.0 - Model Performance on ReXGradient

ReXGradient is a large-scale private test dataset containing 10,000 studies collected from different medical centers in the US.

Rank | Year | Model | Institution | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore
1 | 2024 | CheXagent | Stanford | 0.674 | 0.093 | 0.305 | 0.366 | 0.08 | 0.428 | 0.241 | 0.456
2 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.777 | 0.154 | 0.341 | 0.442 | 0.13 | 0.501 | 0.52 | 0.473
3 | 2024 | CheXpertPlus-CheX | Stanford | 0.787 | 0.143 | 0.361 | 0.431 | 0.124 | 0.476 | 0.411 | 0.414
4 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.83 | 0.169 | 0.372 | 0.442 | 0.154 | 0.517 | 0.489 | 0.465
5 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 0.866 | 0.186 | 0.374 | 0.46 | 0.176 | 0.524 | 0.514 | 0.47
6 | 2023 | Cvt2distilgpt2-IU | CSIRO | 0.842 | 0.178 | 0.395 | 0.405 | 0.167 | 0.52 | 0.47 | 0.457
7 | 2024 | MedVersa | Harvard | 1.008 | 0.21 | 0.431 | 0.498 | 0.202 | 0.527 | 0.532 | 0.475
8 | 2023 | RadFM | SJTU | 0.775 | 0.157 | 0.365 | 0.392 | 0.135 | 0.504 | 0.406 | 0.438
9 | 2023 | RaDialog | TUM | 0.876 | 0.188 | 0.402 | 0.45 | 0.158 | 0.522 | 0.435 | 0.456
10 | 2023 | RGRG | TUM | 0.888 | 0.19 | 0.391 | 0.47 | 0.169 | 0.54 | 0.487 | 0.46
11 | 2023 | VLCI-MIMIC | SYSU | 0.721 | 0.157 | 0.31 | 0.402 | 0.122 | 0.488 | 0.477 | 0.455
12 | 2023 | VLCI-IU | SYSU | 0.897 | 0.214 | 0.365 | 0.467 | 0.215 | 0.573 | 0.536 | 0.452
13 | 2024 | LLM-CXR | KAIST | 0.507 | 0.043 | 0.182 | 0.142 | 0.029 | 0.317 | 0.044 | 0.326
14 | 2024 | GPT4V | OpenAI | 0.629 | 0.075 | 0.214 | 0.337 | 0.138 | 0.47 | 0.497 | 0.43
15 | 2024 | BiomedGPT-IU | Lehigh University | 0.771 | 0.099 | 0.317 | 0.437 | 0.157 | 0.472 | 0.388 | 0.451
16 | 2024 | MAIRA-2 | Microsoft | 0.963 | 0.205 | 0.436 | 0.462 | 0.187 | 0.559 | 0.531 | 0.475
17 | 2024 | CXRMate-ED | CSIRO | 0.872 | 0.202 | 0.398 | 0.415 | 0.187 | 0.564 | 0.518 | 0.472
18 | 2024 | CXRMate-RRG24 | CSIRO | 0.792 | 0.15 | 0.327 | 0.462 | 0.152 | 0.518 | 0.408 | 0.458
19 | 2024 | Libra | University of Glasgow | 0.881 | 0.165 | 0.385 | 0.474 | 0.168 | 0.544 | 0.555 | 0.473
20 | 2025 | MoERad-IU | IIT Madras | 1.018 | 0.227 | 0.434 | 0.446 | 0.247 | 0.575 | 0.494 | 0.468
21 | 2025 | MoERad-MIMIC | IIT Madras | 0.756 | 0.145 | 0.351 | 0.406 | 0.116 | 0.508 | 0.431 | 0.446
22 | 2025 | RadPhi3.5Vision | Microsoft | 0.891 | 0.209 | 0.383 | 0.488 | 0.169 | 0.544 | 0.453 | 0.458
23 | 2025 | DD-LLaVA-X | SNUH | 0.886 | 0.166 | 0.387 | 0.469 | 0.174 | 0.542 | 0.504 | 0.459
24 | 2025 | MedGemma | Google | 1.008 | 0.2 | 0.427 | 0.479 | 0.223 | 0.617 | 0.566 | 0.457
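
The rows of the ReXGradient performance table are not listed in score order, so it is not obvious which metric drives the ReXGradient column of the V1.0 leaderboard. The sketch below checks one hypothesis against the published numbers: that the leaderboard order follows the 1/FineRadScore column. This is an observation from the values on this page, not a documented ranking rule, and the names `ROWS` and `is_non_increasing` are illustrative. Because the published scores are rounded to three decimals, tied values cannot be ordered from this data alone, so the check only tests that scores never increase down the leaderboard.

```python
# Each entry: (model, 1/RadCliQ-v1, 1/FineRadScore) on ReXGradient,
# listed in the order of the V1.0 leaderboard's ReXGradient column.
ROWS = [
    ("MedVersa", 1.008, 0.475),
    ("MAIRA-2", 0.963, 0.475),
    ("Libra", 0.881, 0.473),
    ("CheXpertPlus-MIMIC", 0.777, 0.473),
    ("CXRMate-ED", 0.872, 0.472),
    ("Cvt2distilgpt2-MIMIC", 0.866, 0.47),
    ("MoERad-IU", 1.018, 0.468),
    ("CheXpertPlus-CheX-MIMIC", 0.83, 0.465),
    ("RGRG", 0.888, 0.46),
    ("DD-LLaVA-X", 0.886, 0.459),
    ("RadPhi3.5Vision", 0.891, 0.458),
    ("CXRMate-RRG24", 0.792, 0.458),
    ("MedGemma", 1.008, 0.457),
    ("Cvt2distilgpt2-IU", 0.842, 0.457),
    ("CheXagent", 0.674, 0.456),
    ("RaDialog", 0.876, 0.456),
    ("VLCI-MIMIC", 0.721, 0.455),
    ("VLCI-IU", 0.897, 0.452),
    ("BiomedGPT-IU", 0.771, 0.451),
    ("MoERad-MIMIC", 0.756, 0.446),
    ("RadFM", 0.775, 0.438),
    ("GPT4V", 0.629, 0.43),
    ("CheXpertPlus-CheX", 0.787, 0.414),
    ("LLM-CXR", 0.507, 0.326),
]


def is_non_increasing(values):
    """True if no value is larger than the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


inv_radcliq = [row[1] for row in ROWS]
inv_finerad = [row[2] for row in ROWS]

# Is the leaderboard order consistent with each candidate metric?
ranked_by_finerad = is_non_increasing(inv_finerad)
ranked_by_radcliq = is_non_increasing(inv_radcliq)

# Which model would lead if 1/RadCliQ-v1 were the ranking metric instead?
top_by_radcliq = max(ROWS, key=lambda row: row[1])[0]

print("consistent with 1/FineRadScore:", ranked_by_finerad)
print("consistent with 1/RadCliQ-v1: ", ranked_by_radcliq)
print("best 1/RadCliQ-v1:", top_by_radcliq)
```

RadCliQ-v1 and FineRadScore are error-style metrics where lower raw values are better, which is presumably why the tables report their reciprocals: every column can then be read as higher is better.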

ReXrank Challenge V1.0 - Model Performance on MIMIC-CXR

MIMIC-CXR contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center in Boston, MA. We follow the official split of MIMIC-CXR in the following experiments. * denotes the model was trained on this dataset.

Rank | Year | Model | Institution | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore
1 | 2024 | CheXagent | Stanford | 0.741 | 0.113 | 0.346 | 0.347 | 0.148 | 0.474 | 0.257 | 0.355
2 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.788 | 0.145 | 0.361 | 0.375 | 0.17 | 0.485 | 0.311 | 0.363
3 | 2024 | CheXpertPlus-CheX | Stanford | 0.698 | 0.077 | 0.314 | 0.325 | 0.142 | 0.469 | 0.225 | 0.351
4 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.805 | 0.142 | 0.367 | 0.379 | 0.181 | 0.49 | 0.305 | 0.363
5 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 0.719 | 0.126 | 0.331 | 0.329 | 0.149 | 0.432 | 0.268 | 0.362
6 | 2023 | Cvt2distilgpt2-IU | CSIRO | 0.613 | 0.055 | 0.303 | 0.191 | 0.103 | 0.448 | 0.164 | 0.347
7 | 2024 | MedVersa | Harvard | 1.103 | 0.209 | 0.448 | 0.466 | 0.273 | 0.55 | 0.374 | 0.365
8 | 2023 | RadFM | SJTU | 0.65 | 0.087 | 0.313 | 0.259 | 0.109 | 0.45 | 0.185 | 0.351
9 | 2023 | RaDialog | TUM | 0.799 | 0.127 | 0.363 | 0.387 | 0.172 | 0.485 | 0.273 | 0.359
10 | 2023 | RGRG | TUM | 0.755 | 0.13 | 0.348 | 0.344 | 0.168 | 0.491 | 0.273 | 0.352
11 | 2023 | VLCI-MIMIC | SYSU | 0.68 | 0.136 | 0.304 | 0.305 | 0.14 | 0.45 | 0.256 | 0.357
12 | 2023 | VLCI-IU | SYSU | 0.599 | 0.075 | 0.263 | 0.212 | 0.109 | 0.449 | 0.21 | 0.347
13 | 2024 | LLM-CXR | KAIST | 0.516 | 0.037 | 0.181 | 0.156 | 0.046 | 0.341 | 0.043 | 0.307
14 | 2024 | GPT4V | OpenAI | 0.558 | 0.068 | 0.207 | 0.214 | 0.084 | 0.423 | 0.161 | 0.343
15 | 2024 | BiomedGPT-IU | Lehigh University | 0.544 | 0.02 | 0.192 | 0.224 | 0.059 | 0.36 | 0.123 | 0.341
16 | 2024 | MAIRA-2 | Microsoft | 0.694 | 0.088 | 0.308 | 0.339 | 0.131 | 0.517 | 0.224 | 0.359
17 | 2024 | CXRMate-ED | CSIRO | 0.872 | 0.208 | 0.383 | 0.396 | 0.223 | 0.531 | 0.327 | 0.358
18 | 2024 | CXRMate-RRG24 | CSIRO | 0.87 | 0.198 | 0.367 | 0.423 | 0.22 | 0.521 | 0.338 | 0.359
19 | 2024 | Libra | University of Glasgow | 0.898 | 0.232 | 0.402 | 0.403 | 0.218 | 0.523 | 0.356 | 0.362
20 | 2025 | MoERad-IU | IIT Madras | 0.643 | 0.064 | 0.321 | 0.213 | 0.122 | 0.455 | 0.174 | 0.347
21 | 2025 | MoERad-MIMIC | IIT Madras | 0.726 | 0.163 | 0.341 | 0.334 | 0.143 | 0.465 | 0.24 | 0.354
22 | 2025 | RadPhi3.5Vision | Microsoft | 0.888 | 0.223 | 0.386 | 0.431 | 0.207 | 0.534 | 0.294 | 0.356
23 | 2025 | DD-LLaVA-X | SNUH | 0.801 | 0.154 | 0.348 | 0.402 | 0.182 | 0.505 | 0.301 | 0.361
24 | 2025 | MedGemma | Google | 0.744 | 0.165 | 0.346 | 0.339 | 0.159 | 0.549 | 0.293 | 0.349

ReXrank Challenge V1.0 - Model Performance on IU Xray

IU Xray contains 7,470 pairs of radiology reports and chest X-rays from Indiana University. We follow the split given by R2Gen. * denotes the model was trained on IU Xray.

Rank | Year | Model | Institution | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore
1 | 2024 | CheXagent | Stanford | 0.827 | 0.116 | 0.353 | 0.488 | 0.139 | 0.503 | 0.389 | 0.574
2 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.988 | 0.178 | 0.386 | 0.593 | 0.169 | 0.585 | 0.661 | 0.622
3 | 2024 | CheXpertPlus-CheX | Stanford | 0.92 | 0.157 | 0.413 | 0.495 | 0.153 | 0.534 | 0.541 | 0.548
4 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 1.179 | 0.198 | 0.453 | 0.593 | 0.211 | 0.618 | 0.648 | 0.576
5 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 1.126 | 0.199 | 0.422 | 0.609 | 0.209 | 0.606 | 0.682 | 0.608
6 | 2023 | Cvt2distilgpt2-IU | CSIRO | 1.283 | 0.244 | 0.482 | 0.548 | 0.265 | 0.62 | 0.686 | 0.563
7 | 2024 | MedVersa | Harvard | 1.46 | 0.206 | 0.527 | 0.606 | 0.235 | 0.65 | 0.631 | 0.569
8 | 2023 | RadFM | SJTU | 1.187 | 0.2 | 0.459 | 0.566 | 0.23 | 0.627 | 0.615 | 0.572
9 | 2023 | RaDialog | TUM | 1.086 | 0.201 | 0.444 | 0.544 | 0.205 | 0.586 | 0.586 | 0.543
10 | 2023 | RGRG | TUM | 1.174 | 0.216 | 0.437 | 0.602 | 0.223 | 0.62 | 0.665 | 0.596
11 | 2023 | VLCI-MIMIC | SYSU | 0.913 | 0.139 | 0.364 | 0.483 | 0.22 | 0.578 | 0.474 | 0.488
12 | 2023 | VLCI-IU | SYSU | 1.381 | 0.268 | 0.455 | 0.619 | 0.288 | 0.679 | 0.698 | 0.551
13 | 2024 | LLM-CXR | KAIST | 0.486 | 0.033 | 0.186 | 0.057 | 0.023 | 0.28 | 0.025 | 0.302
14 | 2024 | GPT4V | OpenAI | 0.708 | 0.076 | 0.274 | 0.405 | 0.146 | 0.517 | 0.651 | 0.55
15 | 2024 | BiomedGPT-IU | Lehigh University | 0.956 | 0.142 | 0.375 | 0.522 | 0.213 | 0.543 | 0.523 | 0.543
16 | 2024 | MAIRA-2 | Microsoft | 1.298 | 0.219 | 0.477 | 0.604 | 0.233 | 0.627 | 0.194 | 0.599
17 | 2024 | CXRMate-ED | CSIRO | 1.22 | 0.225 | 0.464 | 0.557 | 0.249 | 0.655 | 0.685 | 0.597
18 | 2024 | CXRMate-RRG24 | CSIRO | 1.458 | 0.245 | 0.456 | 0.638 | 0.302 | 0.666 | 0.68 | 0.598
19 | 2024 | Libra | University of Glasgow | 1.176 | 0.183 | 0.441 | 0.614 | 0.21 | 0.624 | 0.698 | 0.593
20 | 2025 | MoERad-IU | IIT Madras | 1.922 | 0.277 | 0.525 | 0.641 | 0.341 | 0.684 | 0.665 | 0.587
21 | 2025 | MoERad-MIMIC | IIT Madras | 1.02 | 0.171 | 0.42 | 0.559 | 0.178 | 0.603 | 0.584 | 0.579
22 | 2025 | RadPhi3.5Vision | Microsoft | 1.166 | 0.248 | 0.433 | 0.607 | 0.22 | 0.634 | 0.597 | 0.552
23 | 2025 | DD-LLaVA-X | SNUH | 1.204 | 0.189 | 0.443 | 0.6 | 0.233 | 0.636 | 0.671 | 0.574
24 | 2025 | MedGemma | Google | 1.34 | 0.217 | 0.475 | 0.6 | 0.26 | 0.678 | 0.724 | 0.57

ReXrank Challenge V1.0 - Model Performance on CheXpert Plus

CheXpert Plus contains 223,228 unique pairs of radiology reports and chest X-rays from 187,711 studies and 64,725 patients. We follow the official split of CheXpert Plus in the following experiments and use the valid set for evaluation. * denotes the model was trained on CheXpert Plus.

Rank | Year | Model | Institution | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore
1 | 2024 | CheXagent | Stanford | 0.638 | 0.123 | 0.278 | 0.269 | 0.125 | 0.434 | 0.183 | 0.341
2 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.663 | 0.14 | 0.292 | 0.294 | 0.134 | 0.43 | 0.238 | 0.344
3 | 2024 | CheXpertPlus-CheX | Stanford | 0.786 | 0.15 | 0.342 | 0.377 | 0.191 | 0.487 | 0.237 | 0.343
4 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.808 | 0.153 | 0.335 | 0.404 | 0.207 | 0.497 | 0.274 | 0.348
5 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 0.626 | 0.124 | 0.267 | 0.266 | 0.119 | 0.42 | 0.215 | 0.346
6 | 2023 | Cvt2distilgpt2-IU | CSIRO | 0.577 | 0.084 | 0.267 | 0.155 | 0.098 | 0.382 | 0.147 | 0.332
7 | 2024 | MedVersa | Harvard | 0.719 | 0.129 | 0.323 | 0.344 | 0.147 | 0.47 | 0.243 | 0.343
8 | 2023 | RadFM | SJTU | 0.572 | 0.081 | 0.235 | 0.216 | 0.08 | 0.396 | 0.096 | 0.333
9 | 2023 | RaDialog | TUM | 0.709 | 0.131 | 0.312 | 0.353 | 0.138 | 0.445 | 0.211 | 0.333
10 | 2023 | RGRG | TUM | 0.674 | 0.154 | 0.315 | 0.274 | 0.14 | 0.453 | 0.216 | 0.337
11 | 2023 | VLCI-MIMIC | SYSU | 0.589 | 0.12 | 0.229 | 0.251 | 0.101 | 0.384 | 0.165 | 0.33
12 | 2023 | VLCI-IU | SYSU | 0.555 | 0.106 | 0.22 | 0.17 | 0.094 | 0.418 | 0.194 | 0.339
13 | 2024 | LLM-CXR | KAIST | 0.519 | 0.041 | 0.162 | 0.211 | 0.037 | 0.321 | 0.022 | 0.291
14 | 2024 | GPT4V | OpenAI | 0.568 | 0.081 | 0.215 | 0.234 | 0.082 | 0.415 | 0.152 | 0.339
15 | 2024 | BiomedGPT-IU | Lehigh University | 0.552 | 0.022 | 0.2 | 0.241 | 0.056 | 0.351 | 0.118 | 0.32
16 | 2024 | MAIRA-2 | Microsoft | 0.788 | 0.163 | 0.359 | 0.355 | 0.189 | 0.485 | 0.273 | 0.352
17 | 2024 | CXRMate-ED | CSIRO | 0.723 | 0.157 | 0.324 | 0.316 | 0.175 | 0.498 | 0.265 | 0.367
18 | 2024 | CXRMate-RRG24 | CSIRO | 0.801 | 0.157 | 0.315 | 0.411 | 0.218 | 0.521 | 0.276 | 0.35
19 | 2024 | Libra | University of Glasgow | 0.718 | 0.157 | 0.319 | 0.323 | 0.169 | 0.466 | 0.253 | 0.344
20 | 2025 | MoERad-IU | IIT Madras | 0.595 | 0.075 | 0.284 | 0.175 | 0.102 | 0.39 | 0.127 | 0.341
21 | 2025 | MoERad-MIMIC | IIT Madras | 0.641 | 0.122 | 0.267 | 0.3 | 0.12 | 0.434 | 0.166 | 0.343
22 | 2025 | RadPhi3.5Vision | Microsoft | 0.86 | 0.198 | 0.353 | 0.437 | 0.217 | 0.51 | 0.243 | 0.356
23 | 2025 | DD-LLaVA-X | SNUH | 0.753 | 0.085 | 0.318 | 0.385 | 0.172 | 0.476 | 0.206 | 0.343
24 | 2025 | MedGemma | Google | 0.706 | 0.147 | 0.328 | 0.325 | 0.137 | 0.511 | 0.246 | 0.337

ReXrank Challenge V2.0 - Model Performance

Performance comparison of various vision-language models on medical VQA tasks.

Rank | Model | Institution | Overall Accuracy | Differential Diagnosis | Geometric Information | Location Assessment | Negation Assessment | Presence Assessment
1 | MedGemma-4B-it | Google | 0.8217 | 0.7671 | 0.8045 | 0.8347 | 0.8503 | 0.8521
2 | Janus-Pro-7B | DeepSeek | 0.6656 | 0.5634 | 0.7542 | 0.6462 | 0.7573 | 0.6070
3 | Qwen2.5VL-7B-Instruct | Qwen | 0.6555 | 0.6361 | 0.6648 | 0.6324 | 0.8327 | 0.5114
4 | Eagle2-9B | NVIDIA | 0.6443 | 0.6817 | 0.5698 | 0.5695 | 0.8632 | 0.5375
5 | Gemini-1.5-Pro | Google | 0.6331 | 0.6221 | 0.4689 | 0.5960 | 0.8568 | 0.6217
6 | Qwen2VL-7B-Instruct | Alibaba | 0.5470 | 0.5265 | 0.4494 | 0.5405 | 0.6269 | 0.5915
7 | Phi35-Vision-Instruct | Microsoft | 0.4749 | 0.6224 | 0.2215 | 0.3711 | 0.7950 | 0.3644
8 | LLaVA-1.5-7B | Meta | 0.2661 | 0.2161 | 0.2346 | 0.2761 | 0.2402 | 0.3633
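
The V2.0 table does not say how Overall Accuracy relates to the five category columns. The sketch below tests one reading against the published numbers: that Overall Accuracy is the unweighted (macro) average of the five category accuracies, rather than an average weighted by the number of questions in each category. This is an observation from the values on this page, not a documented definition, and `ROWS`, `macro_average`, and `ROUNDING_TOLERANCE` are illustrative names.

```python
# Each entry: (model, overall accuracy, [differential diagnosis,
# geometric information, location assessment, negation assessment,
# presence assessment]), copied from the V2.0 performance table.
ROWS = [
    ("MedGemma-4B-it", 0.8217, [0.7671, 0.8045, 0.8347, 0.8503, 0.8521]),
    ("Janus-Pro-7B", 0.6656, [0.5634, 0.7542, 0.6462, 0.7573, 0.6070]),
    ("Qwen2.5VL-7B-Instruct", 0.6555, [0.6361, 0.6648, 0.6324, 0.8327, 0.5114]),
    ("Eagle2-9B", 0.6443, [0.6817, 0.5698, 0.5695, 0.8632, 0.5375]),
    ("Gemini-1.5-Pro", 0.6331, [0.6221, 0.4689, 0.5960, 0.8568, 0.6217]),
    ("Qwen2VL-7B-Instruct", 0.5470, [0.5265, 0.4494, 0.5405, 0.6269, 0.5915]),
    ("Phi35-Vision-Instruct", 0.4749, [0.6224, 0.2215, 0.3711, 0.7950, 0.3644]),
    ("LLaVA-1.5-7B", 0.2661, [0.2161, 0.2346, 0.2761, 0.2402, 0.3633]),
]

# Published values carry four decimals, so agreement is only expected up to
# half a unit in the fourth decimal place (plus a little floating-point slack).
ROUNDING_TOLERANCE = 0.5e-4 + 1e-9


def macro_average(scores):
    """Unweighted mean: every category counts equally."""
    return sum(scores) / len(scores)


matches = {}
for model, overall, categories in ROWS:
    mean = macro_average(categories)
    matches[model] = abs(mean - overall) <= ROUNDING_TOLERANCE
    print(f"{model:24s} reported={overall:.4f} macro={mean:.5f} match={matches[model]}")

all_match = all(matches.values())
print("all rows consistent with a macro average:", all_match)
```

If this reading holds, a model's overall score is as sensitive to a small category as to a large one, which is worth keeping in mind when comparing models whose strengths are concentrated in a single category such as Negation Assessment.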