ReXrank

Chest X-ray Interpretation Leaderboard

What is ReXrank?

ReXrank is a public leaderboard for chest X-ray image interpretation, including both radiology report generation (RRG) and visual question answering (VQA) tasks.


ReXrank Challenge V1.0 is a chest radiograph report generation competition that uses ReXGradient, the largest private test dataset, consisting of 10,000 studies across 67 sites. The challenge attracted diverse participants from academic institutions, industry, and independent research teams, resulting in 24 state-of-the-art models benchmarked to date.


ReXrank Challenge V2.0 is a VQA competition that uses a VQA dataset constructed from ReXGradient, comprising 41,007 VQA pairs across 10,000 radiological studies. We benchmarked 8 state-of-the-art models.


ReXGradient-160K is the largest publicly available multi-site chest X-ray dataset, containing 273,004 unique chest X-ray images from 160,000 radiological studies, collected from 109,487 unique patients across 3 U.S. health systems (79 medical sites). In ReXrank, we use an additional private test set, ReXGradient, comprising 10,000 studies, for benchmarking.


ReXVQA is the largest and most comprehensive benchmark for VQA in chest radiology, comprising 653,834 questions paired with 160,000 radiological studies. The dataset is constructed from ReXGradient-160K.

ReXrank Challenge V1.0 Leaderboard (RRG)

Rank | ReXGradient | MIMIC-CXR | IU-Xray | CheXpert Plus
1 | UniRG-CXR (Microsoft Research) | UniRG-CXR (Microsoft Research) | UniRG-CXR (Microsoft Research) | UniRG-CXR (Microsoft Research)
2 | MoERad-IU (IIT Madras) | MedVersa (Harvard) | MoERad-IU (IIT Madras) | RadPhi3.5Vision (Microsoft)
3 | MedGemma (Google) | Libra (University of Glasgow) | MedVersa (Harvard) | CheXpertPlus-CheX-MIMIC (Stanford)
4 | MedVersa (Harvard) | RadPhi3.5Vision (Microsoft) | CXRMate-RRG24 (CSIRO) | CXRMate-RRG24 (CSIRO)
5 | MAIRA-2 (Microsoft) | CXRMate-ED (CSIRO) | VLCI-IU (SYSU) | MAIRA-2 (Microsoft)
6 | VLCI-IU (SYSU) | CXRMate-RRG24 (CSIRO) | MedGemma (Google) | CheXpertPlus-CheX (Stanford)
7 | RadPhi3.5Vision (Microsoft) | CheXpertPlus-CheX-MIMIC (Stanford) | MAIRA-2 (Microsoft) | DD-LLaVA-X (SNUH)
8 | RGRG (TUM) | DD-LLaVA-X (SNUH) | Cvt2distilgpt2-IU (CSIRO) | CXRMate-ED (CSIRO)
9 | DD-LLaVA-X (SNUH) | RaDialog (TUM) | CXRMate-ED (CSIRO) | MedVersa (Harvard)
10 | Libra (University of Glasgow) | CheXpertPlus-MIMIC (Stanford) | DD-LLaVA-X (SNUH) | Libra (University of Glasgow)
11 | RaDialog (TUM) | RGRG (TUM) | RadFM (SJTU) | RaDialog (TUM)
12 | CXRMate-ED (CSIRO) | MedGemma (Google) | CheXpertPlus-CheX-MIMIC (Stanford) | MedGemma (Google)
13 | Cvt2distilgpt2-MIMIC (CSIRO) | CheXagent (Stanford) | Libra (University of Glasgow) | RGRG (TUM)
14 | Cvt2distilgpt2-IU (CSIRO) | MoERad-MIMIC (IIT Madras) | RGRG (TUM) | CheXpertPlus-MIMIC (Stanford)
15 | CheXpertPlus-CheX-MIMIC (Stanford) | Cvt2distilgpt2-MIMIC (CSIRO) | RadPhi3.5Vision (Microsoft) | MoERad-MIMIC (IIT Madras)
16 | CXRMate-RRG24 (CSIRO) | CheXpertPlus-CheX (Stanford) | Cvt2distilgpt2-MIMIC (CSIRO) | CheXagent (Stanford)
17 | CheXpertPlus-CheX (Stanford) | MAIRA-2 (Microsoft) | RaDialog (TUM) | Cvt2distilgpt2-MIMIC (CSIRO)
18 | CheXpertPlus-MIMIC (Stanford) | VLCI-MIMIC (SYSU) | MoERad-MIMIC (IIT Madras) | MoERad-IU (IIT Madras)
19 | RadFM (SJTU) | RadFM (SJTU) | CheXpertPlus-MIMIC (Stanford) | VLCI-MIMIC (SYSU)
20 | BiomedGPT-IU (Lehigh University) | MoERad-IU (IIT Madras) | BiomedGPT-IU (Lehigh University) | Cvt2distilgpt2-IU (CSIRO)
21 | MoERad-MIMIC (IIT Madras) | Cvt2distilgpt2-IU (CSIRO) | CheXpertPlus-CheX (Stanford) | RadFM (SJTU)
22 | VLCI-MIMIC (SYSU) | VLCI-IU (SYSU) | VLCI-MIMIC (SYSU) | GPT4V (OpenAI)
23 | CheXagent (Stanford) | GPT4V (OpenAI) | CheXagent (Stanford) | VLCI-IU (SYSU)
24 | GPT4V (OpenAI) | BiomedGPT-IU (Lehigh University) | GPT4V (OpenAI) | BiomedGPT-IU (Lehigh University)
25 | LLM-CXR (KAIST) | LLM-CXR (KAIST) | LLM-CXR (KAIST) | LLM-CXR (KAIST)

ReXrank Challenge V2.0 Leaderboard (VQA)

Rank | ReXVQA
1 | MedGemma-4B-it (Google)
2 | Janus-Pro-7B (DeepSeek)
3 | Qwen2.5VL-7B-Instruct (Qwen)
4 | Eagle2-9B (NVIDIA)
5 | Gemini-1.5-Pro (Google)
6 | Qwen2VL-7B-Instruct (Alibaba)
7 | Phi35-Vision-Instruct (Microsoft)
8 | LLaVA-1.5-7B (Meta)

ReXrank Challenge V1.0 - RRG Performance on ReXGradient

ReXGradient is a large-scale private test dataset containing 10,000 studies collected from different medical centers in the US.

No. | Year | Model | Institution | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN
1 | 2024 | CheXagent | Stanford | 0.674 | 0.093 | 0.305 | 0.366 | 0.08 | 0.428 | 0.241
2 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.777 | 0.154 | 0.341 | 0.442 | 0.13 | 0.501 | 0.52
3 | 2024 | CheXpertPlus-CheX | Stanford | 0.787 | 0.143 | 0.361 | 0.431 | 0.124 | 0.476 | 0.411
4 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.83 | 0.169 | 0.372 | 0.442 | 0.154 | 0.517 | 0.489
5 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 0.866 | 0.186 | 0.374 | 0.46 | 0.176 | 0.524 | 0.514
6 | 2023 | Cvt2distilgpt2-IU | CSIRO | 0.842 | 0.178 | 0.395 | 0.405 | 0.167 | 0.52 | 0.47
7 | 2024 | MedVersa | Harvard | 1.008 | 0.21 | 0.431 | 0.498 | 0.202 | 0.527 | 0.532
8 | 2023 | RadFM | SJTU | 0.775 | 0.157 | 0.365 | 0.392 | 0.135 | 0.504 | 0.406
9 | 2023 | RaDialog | TUM | 0.876 | 0.188 | 0.402 | 0.45 | 0.158 | 0.522 | 0.435
10 | 2023 | RGRG | TUM | 0.888 | 0.19 | 0.391 | 0.47 | 0.169 | 0.54 | 0.487
11 | 2023 | VLCI-MIMIC | SYSU | 0.721 | 0.157 | 0.31 | 0.402 | 0.122 | 0.488 | 0.477
12 | 2023 | VLCI-IU | SYSU | 0.897 | 0.214 | 0.365 | 0.467 | 0.215 | 0.573 | 0.536
13 | 2024 | LLM-CXR | KAIST | 0.507 | 0.043 | 0.182 | 0.142 | 0.029 | 0.317 | 0.044
14 | 2024 | GPT4V | OpenAI | 0.629 | 0.075 | 0.214 | 0.337 | 0.138 | 0.47 | 0.497
15 | 2024 | BiomedGPT-IU | Lehigh University | 0.771 | 0.099 | 0.317 | 0.437 | 0.157 | 0.472 | 0.388
16 | 2024 | MAIRA-2 | Microsoft | 0.963 | 0.205 | 0.436 | 0.462 | 0.187 | 0.559 | 0.531
17 | 2024 | CXRMate-ED | CSIRO | 0.872 | 0.202 | 0.398 | 0.415 | 0.187 | 0.564 | 0.518
18 | 2024 | CXRMate-RRG24 | CSIRO | 0.792 | 0.15 | 0.327 | 0.462 | 0.152 | 0.518 | 0.408
19 | 2024 | Libra | University of Glasgow | 0.881 | 0.165 | 0.385 | 0.474 | 0.168 | 0.544 | 0.555
20 | 2025 | MoERad-IU | IIT Madras | 1.018 | 0.227 | 0.434 | 0.446 | 0.247 | 0.575 | 0.494
21 | 2025 | MoERad-MIMIC | IIT Madras | 0.756 | 0.145 | 0.351 | 0.406 | 0.116 | 0.508 | 0.431
22 | 2025 | RadPhi3.5Vision | Microsoft | 0.891 | 0.209 | 0.383 | 0.488 | 0.169 | 0.544 | 0.453
23 | 2025 | DD-LLaVA-X | SNUH | 0.886 | 0.166 | 0.387 | 0.469 | 0.174 | 0.542 | 0.504
24 | 2025 | MedGemma | Google | 1.008 | 0.2 | 0.427 | 0.479 | 0.223 | 0.617 | 0.566
25 | 2025 | UniRG-CXR | Microsoft Research | 1.621 | 0.291 | 0.538 | 0.576 | 0.298 | 0.622 | 0.476
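
The rows in the table above are not listed in score order, while the leaderboard ordering near the top of the page appears consistent with sorting by the 1/RadCliQ-v1 column in descending order. This is an inference from the listed numbers, not a documented rule. The column name suggests it is the reciprocal of RadCliQ-v1, so higher values would be better. A minimal Python sketch of that ranking, using five rows copied from the table; the helper name rank_by_inverse_radcliq is hypothetical:

```python
# (model, 1/RadCliQ-v1) pairs copied from the ReXGradient table above.
scores = [
    ("CheXagent", 0.674),
    ("MedVersa", 1.008),
    ("MAIRA-2", 0.963),
    ("MoERad-IU", 1.018),
    ("UniRG-CXR", 1.621),
]


def rank_by_inverse_radcliq(rows):
    """Sort descending by 1/RadCliQ-v1 and attach 1-based ranks.

    Assumption: the column is the plain reciprocal of RadCliQ-v1,
    so 1.0 / value recovers the underlying (lower-is-better) score.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [
        (rank, name, inv, 1.0 / inv)
        for rank, (name, inv) in enumerate(ordered, start=1)
    ]


ranked = rank_by_inverse_radcliq(scores)
for rank, name, inv, radcliq in ranked:
    print(f"{rank:>2}  {name:<10}  1/RadCliQ-v1={inv:.3f}  RadCliQ-v1~{radcliq:.3f}")
```

Among these five models the resulting order matches their relative order in the ReXGradient leaderboard column. Models with identical displayed scores (for example MedGemma and MedVersa at 1.008) cannot be separated from the rounded values alone.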

ReXrank Challenge V1.0 - RRG Performance on MIMIC-CXR

MIMIC-CXR contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center in Boston, MA. We follow the official split of MIMIC-CXR in the following experiments.

No. | Year | Model | Institution | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN
1 | 2024 | CheXagent | Stanford | 0.741 | 0.113 | 0.346 | 0.347 | 0.148 | 0.474 | 0.257
2 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.788 | 0.145 | 0.361 | 0.375 | 0.17 | 0.485 | 0.311
3 | 2024 | CheXpertPlus-CheX | Stanford | 0.698 | 0.077 | 0.314 | 0.325 | 0.142 | 0.469 | 0.225
4 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.805 | 0.142 | 0.367 | 0.379 | 0.181 | 0.49 | 0.305
5 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 0.719 | 0.126 | 0.331 | 0.329 | 0.149 | 0.432 | 0.268
6 | 2023 | Cvt2distilgpt2-IU | CSIRO | 0.613 | 0.055 | 0.303 | 0.191 | 0.103 | 0.448 | 0.164
7 | 2024 | MedVersa | Harvard | 1.103 | 0.209 | 0.448 | 0.466 | 0.273 | 0.55 | 0.374
8 | 2023 | RadFM | SJTU | 0.65 | 0.087 | 0.313 | 0.259 | 0.109 | 0.45 | 0.185
9 | 2023 | RaDialog | TUM | 0.799 | 0.127 | 0.363 | 0.387 | 0.172 | 0.485 | 0.273
10 | 2023 | RGRG | TUM | 0.755 | 0.13 | 0.348 | 0.344 | 0.168 | 0.491 | 0.273
11 | 2023 | VLCI-MIMIC | SYSU | 0.68 | 0.136 | 0.304 | 0.305 | 0.14 | 0.45 | 0.256
12 | 2023 | VLCI-IU | SYSU | 0.599 | 0.075 | 0.263 | 0.212 | 0.109 | 0.449 | 0.21
13 | 2024 | LLM-CXR | KAIST | 0.516 | 0.037 | 0.181 | 0.156 | 0.046 | 0.341 | 0.043
14 | 2024 | GPT4V | OpenAI | 0.558 | 0.068 | 0.207 | 0.214 | 0.084 | 0.423 | 0.161
15 | 2024 | BiomedGPT-IU | Lehigh University | 0.544 | 0.02 | 0.192 | 0.224 | 0.059 | 0.36 | 0.123
16 | 2024 | MAIRA-2 | Microsoft | 0.694 | 0.088 | 0.308 | 0.339 | 0.131 | 0.517 | 0.224
17 | 2024 | CXRMate-ED | CSIRO | 0.872 | 0.208 | 0.383 | 0.396 | 0.223 | 0.531 | 0.327
18 | 2024 | CXRMate-RRG24 | CSIRO | 0.87 | 0.198 | 0.367 | 0.423 | 0.22 | 0.521 | 0.338
19 | 2024 | Libra | University of Glasgow | 0.898 | 0.232 | 0.402 | 0.403 | 0.218 | 0.523 | 0.356
20 | 2025 | MoERad-IU | IIT Madras | 0.643 | 0.064 | 0.321 | 0.213 | 0.122 | 0.455 | 0.174
21 | 2025 | MoERad-MIMIC | IIT Madras | 0.726 | 0.163 | 0.341 | 0.334 | 0.143 | 0.465 | 0.24
22 | 2025 | RadPhi3.5Vision | Microsoft | 0.888 | 0.223 | 0.386 | 0.431 | 0.207 | 0.534 | 0.294
23 | 2025 | DD-LLaVA-X | SNUH | 0.801 | 0.154 | 0.348 | 0.402 | 0.182 | 0.505 | 0.301
24 | 2025 | MedGemma | Google | 0.744 | 0.165 | 0.346 | 0.339 | 0.159 | 0.549 | 0.293
25 | 2025 | UniRG-CXR | Microsoft Research | 1.217 | 0.248 | 0.493 | 0.487 | 0.265 | 0.596 | 0.352

ReXrank Challenge V1.0 - RRG Performance on IU Xray

IU Xray contains 7,470 pairs of radiology reports and chest X-rays from Indiana University. We follow the split given by R2Gen.

No. | Year | Model | Institution | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN
1 | 2024 | CheXagent | Stanford | 0.827 | 0.116 | 0.353 | 0.488 | 0.139 | 0.503 | 0.389
2 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.988 | 0.178 | 0.386 | 0.593 | 0.169 | 0.585 | 0.661
3 | 2024 | CheXpertPlus-CheX | Stanford | 0.92 | 0.157 | 0.413 | 0.495 | 0.153 | 0.534 | 0.541
4 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 1.179 | 0.198 | 0.453 | 0.593 | 0.211 | 0.618 | 0.648
5 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 1.126 | 0.199 | 0.422 | 0.609 | 0.209 | 0.606 | 0.682
6 | 2023 | Cvt2distilgpt2-IU | CSIRO | 1.283 | 0.244 | 0.482 | 0.548 | 0.265 | 0.62 | 0.686
7 | 2024 | MedVersa | Harvard | 1.46 | 0.206 | 0.527 | 0.606 | 0.235 | 0.65 | 0.631
8 | 2023 | RadFM | SJTU | 1.187 | 0.2 | 0.459 | 0.566 | 0.23 | 0.627 | 0.615
9 | 2023 | RaDialog | TUM | 1.086 | 0.201 | 0.444 | 0.544 | 0.205 | 0.586 | 0.586
10 | 2023 | RGRG | TUM | 1.174 | 0.216 | 0.437 | 0.602 | 0.223 | 0.62 | 0.665
11 | 2023 | VLCI-MIMIC | SYSU | 0.913 | 0.139 | 0.364 | 0.483 | 0.22 | 0.578 | 0.474
12 | 2023 | VLCI-IU | SYSU | 1.381 | 0.268 | 0.455 | 0.619 | 0.288 | 0.679 | 0.698
13 | 2024 | LLM-CXR | KAIST | 0.486 | 0.033 | 0.186 | 0.057 | 0.023 | 0.28 | 0.025
14 | 2024 | GPT4V | OpenAI | 0.708 | 0.076 | 0.274 | 0.405 | 0.146 | 0.517 | 0.651
15 | 2024 | BiomedGPT-IU | Lehigh University | 0.956 | 0.142 | 0.375 | 0.522 | 0.213 | 0.543 | 0.523
16 | 2024 | MAIRA-2 | Microsoft | 1.298 | 0.219 | 0.477 | 0.604 | 0.233 | 0.627 | 0.194
17 | 2024 | CXRMate-ED | CSIRO | 1.22 | 0.225 | 0.464 | 0.557 | 0.249 | 0.655 | 0.685
18 | 2024 | CXRMate-RRG24 | CSIRO | 1.458 | 0.245 | 0.456 | 0.638 | 0.302 | 0.666 | 0.68
19 | 2024 | Libra | University of Glasgow | 1.176 | 0.183 | 0.441 | 0.614 | 0.21 | 0.624 | 0.698
20 | 2025 | MoERad-IU | IIT Madras | 1.922 | 0.277 | 0.525 | 0.641 | 0.341 | 0.684 | 0.665
21 | 2025 | MoERad-MIMIC | IIT Madras | 1.02 | 0.171 | 0.42 | 0.559 | 0.178 | 0.603 | 0.584
22 | 2025 | RadPhi3.5Vision | Microsoft | 1.166 | 0.248 | 0.433 | 0.607 | 0.22 | 0.634 | 0.597
23 | 2025 | DD-LLaVA-X | SNUH | 1.204 | 0.189 | 0.443 | 0.6 | 0.233 | 0.636 | 0.671
24 | 2025 | MedGemma | Google | 1.34 | 0.217 | 0.475 | 0.6 | 0.26 | 0.678 | 0.724
25 | 2025 | UniRG-CXR | Microsoft Research | 1.977 | 0.265 | 0.565 | 0.659 | 0.286 | 0.69 | 0.639

ReXrank Challenge V1.0 - RRG Performance on CheXpert Plus

CheXpert Plus contains 223,228 unique pairs of radiology reports and chest X-rays from 187,711 studies and 64,725 patients. We follow the official split of CheXpert Plus in the following experiments and use the valid set for evaluation.

No. | Year | Model | Institution | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN
1 | 2024 | CheXagent | Stanford | 0.638 | 0.123 | 0.278 | 0.269 | 0.125 | 0.434 | 0.183
2 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.663 | 0.14 | 0.292 | 0.294 | 0.134 | 0.43 | 0.238
3 | 2024 | CheXpertPlus-CheX | Stanford | 0.786 | 0.15 | 0.342 | 0.377 | 0.191 | 0.487 | 0.237
4 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.808 | 0.153 | 0.335 | 0.404 | 0.207 | 0.497 | 0.274
5 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 0.626 | 0.124 | 0.267 | 0.266 | 0.119 | 0.42 | 0.215
6 | 2023 | Cvt2distilgpt2-IU | CSIRO | 0.577 | 0.084 | 0.267 | 0.155 | 0.098 | 0.382 | 0.147
7 | 2024 | MedVersa | Harvard | 0.719 | 0.129 | 0.323 | 0.344 | 0.147 | 0.47 | 0.243
8 | 2023 | RadFM | SJTU | 0.572 | 0.081 | 0.235 | 0.216 | 0.08 | 0.396 | 0.096
9 | 2023 | RaDialog | TUM | 0.709 | 0.131 | 0.312 | 0.353 | 0.138 | 0.445 | 0.211
10 | 2023 | RGRG | TUM | 0.674 | 0.154 | 0.315 | 0.274 | 0.14 | 0.453 | 0.216
11 | 2023 | VLCI-MIMIC | SYSU | 0.589 | 0.12 | 0.229 | 0.251 | 0.101 | 0.384 | 0.165
12 | 2023 | VLCI-IU | SYSU | 0.555 | 0.106 | 0.22 | 0.17 | 0.094 | 0.418 | 0.194
13 | 2024 | LLM-CXR | KAIST | 0.519 | 0.041 | 0.162 | 0.211 | 0.037 | 0.321 | 0.022
14 | 2024 | GPT4V | OpenAI | 0.568 | 0.081 | 0.215 | 0.234 | 0.082 | 0.415 | 0.152
15 | 2024 | BiomedGPT-IU | Lehigh University | 0.552 | 0.022 | 0.2 | 0.241 | 0.056 | 0.351 | 0.118
16 | 2024 | MAIRA-2 | Microsoft | 0.788 | 0.163 | 0.359 | 0.355 | 0.189 | 0.485 | 0.273
17 | 2024 | CXRMate-ED | CSIRO | 0.723 | 0.157 | 0.324 | 0.316 | 0.175 | 0.498 | 0.265
18 | 2024 | CXRMate-RRG24 | CSIRO | 0.801 | 0.157 | 0.315 | 0.411 | 0.218 | 0.521 | 0.276
19 | 2024 | Libra | University of Glasgow | 0.718 | 0.157 | 0.319 | 0.323 | 0.169 | 0.466 | 0.253
20 | 2025 | MoERad-IU | IIT Madras | 0.595 | 0.075 | 0.284 | 0.175 | 0.102 | 0.39 | 0.127
21 | 2025 | MoERad-MIMIC | IIT Madras | 0.641 | 0.122 | 0.267 | 0.3 | 0.12 | 0.434 | 0.166
22 | 2025 | RadPhi3.5Vision | Microsoft | 0.86 | 0.198 | 0.353 | 0.437 | 0.217 | 0.51 | 0.243
23 | 2025 | DD-LLaVA-X | SNUH | 0.753 | 0.085 | 0.318 | 0.385 | 0.172 | 0.476 | 0.206
24 | 2025 | MedGemma | Google | 0.706 | 0.147 | 0.328 | 0.325 | 0.137 | 0.511 | 0.246
25 | 2025 | UniRG-CXR | Microsoft Research | 1.008 | 0.19 | 0.428 | 0.46 | 0.236 | 0.564 | 0.279

ReXrank Challenge V2.0 - Model Performance

Performance comparison of various vision-language models on medical VQA tasks.

Rank | Model | Institution | Overall Accuracy | Differential Diagnosis | Geometric Information | Location Assessment | Negation Assessment | Presence Assessment
1 | MedGemma-4B-it | Google | 0.8217 | 0.7671 | 0.8045 | 0.8347 | 0.8503 | 0.8521
2 | Janus-Pro-7B | DeepSeek | 0.6656 | 0.5634 | 0.7542 | 0.6462 | 0.7573 | 0.6070
3 | Qwen2.5VL-7B-Instruct | Qwen | 0.6555 | 0.6361 | 0.6648 | 0.6324 | 0.8327 | 0.5114
4 | Eagle2-9B | NVIDIA | 0.6443 | 0.6817 | 0.5698 | 0.5695 | 0.8632 | 0.5375
5 | Gemini-1.5-Pro | Google | 0.6331 | 0.6221 | 0.4689 | 0.5960 | 0.8568 | 0.6217
6 | Qwen2VL-7B-Instruct | Alibaba | 0.5470 | 0.5265 | 0.4494 | 0.5405 | 0.6269 | 0.5915
7 | Phi35-Vision-Instruct | Microsoft | 0.4749 | 0.6224 | 0.2215 | 0.3711 | 0.7950 | 0.3644
8 | LLaVA-1.5-7B | Meta | 0.2661 | 0.2161 | 0.2346 | 0.2761 | 0.2402 | 0.3633
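
The ranking above follows overall accuracy, which can hide per-category differences. A minimal Python sketch, using the values copied from the table, that finds the top-scoring model in each question category; the helper name best_per_category is hypothetical and not part of any ReXrank tooling:

```python
CATEGORIES = [
    "Differential Diagnosis",
    "Geometric Information",
    "Location Assessment",
    "Negation Assessment",
    "Presence Assessment",
]

# (model, overall accuracy, per-category accuracies in CATEGORIES order),
# copied from the VQA performance table above.
ROWS = [
    ("MedGemma-4B-it", 0.8217, [0.7671, 0.8045, 0.8347, 0.8503, 0.8521]),
    ("Janus-Pro-7B", 0.6656, [0.5634, 0.7542, 0.6462, 0.7573, 0.6070]),
    ("Qwen2.5VL-7B-Instruct", 0.6555, [0.6361, 0.6648, 0.6324, 0.8327, 0.5114]),
    ("Eagle2-9B", 0.6443, [0.6817, 0.5698, 0.5695, 0.8632, 0.5375]),
    ("Gemini-1.5-Pro", 0.6331, [0.6221, 0.4689, 0.5960, 0.8568, 0.6217]),
    ("Qwen2VL-7B-Instruct", 0.5470, [0.5265, 0.4494, 0.5405, 0.6269, 0.5915]),
    ("Phi35-Vision-Instruct", 0.4749, [0.6224, 0.2215, 0.3711, 0.7950, 0.3644]),
    ("LLaVA-1.5-7B", 0.2661, [0.2161, 0.2346, 0.2761, 0.2402, 0.3633]),
]


def best_per_category(rows, categories):
    """Return {category: (model, accuracy)} for the top model per category."""
    leaders = {}
    for idx, category in enumerate(categories):
        name, _overall, accs = max(rows, key=lambda row: row[2][idx])
        leaders[category] = (name, accs[idx])
    return leaders


leaders = best_per_category(ROWS, CATEGORIES)
for category, (model, acc) in leaders.items():
    print(f"{category:<24} {model:<16} {acc:.4f}")
```

On these numbers MedGemma-4B-it leads four of the five categories, while Eagle2-9B has the highest Negation Assessment accuracy (0.8632).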