ReX-MLE

Autonomous Agent Benchmark for Medical Imaging Challenges

What is ReX-MLE?

ReX-MLE is a benchmark of 20 tasks drawn from high-impact medical imaging competitions spanning segmentation, detection, classification, image quality assessment, and generative enhancement. Each challenge pairs standardized data-preparation scripts and official evaluation code with end-to-end agent runs, enabling reproducible measurement of autonomous ML systems under realistic constraints.

Please refer to the GitHub repository for more details.

ReX-MLE Leaderboard

Rank  Model      Organization  ReX-MLE
1     R&D-Agent  Microsoft     12.15%
2     AIDE       Weco AI       9.05%
3     ML-Master  SJTU          4.53%

Performance Overview

Average performance across task categories (segmentation, detection, classification, and image quality/enhancement). Values correspond to the primary metric for each challenge (failures shown as zero).

[Figure: Average performance by category]
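As a rough illustration, a per-category average could be computed from the leaderboard table under one assumption: each category value is the unweighted mean of an agent's primary-metric values over the challenges in that category, with failed runs scored as zero. The sketch below applies this to the Detection rows only. `DETECTION` and `category_mean` are hypothetical names used for illustration, not part of the ReX-MLE code.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical encoding of the Detection rows of the leaderboard table:
# primary-metric value per agent (AP, F1, or IoU); None marks a failed run ("FAIL").
DETECTION: dict[str, dict[str, float | None]] = {
    "DENTEX":         {"AIDE": 0.09, "ML-Master": 0.08, "R&D-Agent": 0.09},
    "PUMA-T1-Det":    {"AIDE": 0.02, "ML-Master": 0.08, "R&D-Agent": 0.06},
    "PUMA-T2-Det":    {"AIDE": None, "ML-Master": 0.00, "R&D-Agent": 0.01},
    "TopCoW-CTA-Det": {"AIDE": 0.67, "ML-Master": 0.65, "R&D-Agent": 0.70},
    "TopCoW-MRA-Det": {"AIDE": 0.66, "ML-Master": 0.69, "R&D-Agent": 0.19},
}


def category_mean(rows: dict[str, dict[str, float | None]], agent: str) -> float:
    """Unweighted mean of one agent's primary-metric values in a category.

    Failed runs (None) are scored as zero ("failures shown as zero").
    """
    values = [rows[task][agent] for task in rows]
    return mean(0.0 if v is None else v for v in values)


if __name__ == "__main__":
    for agent in ("AIDE", "ML-Master", "R&D-Agent"):
        print(f"{agent}: {category_mean(DETECTION, agent):.3f}")
```

Run as a script, it prints 0.288 for AIDE, 0.300 for ML-Master, and 0.210 for R&D-Agent.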

Example Overview

Browse example AIDE agent runs from the ISLES'22 task.


Leaderboard

Agent performance across the ReX-MLE suite. For each agent, the primary metric value and the percentile rank (Competition Rank) are listed in separate columns; ↑ means higher is better, and FAIL marks a failed run.

Challenge                Metric (↑) AIDE          ML-Master     R&D-Agent     Human
                                    Value  Rank   Value  Rank   Value  Rank   Value
Segmentation Tasks
ISLES'22                 Dice       0.04   0%     0.00   0%     0.02   0%     0.79
NeurIPS-CellSeg          F1         0.04   0%     0.04   0%     0.36   0%     0.88
PANTHER-T1               Dice       0.33   16%    0.13   8%     0.16   8%     0.73
PANTHER-T2               Dice       0.09   10%    0.05   8%     0.28   58%    0.53
PUMA-T1-Seg              Dice       FAIL   --     0.00   0%     0.00   0%     0.78
PUMA-T2-Seg              Dice       0.00   0%     0.00   0%     0.00   0%     0.78
SEG.A                    Dice       0.02   0%     0.02   0%     0.00   0%     0.92
TopBrain-CTA             Mean Dice  0.03   2%     0.26   3%     0.08   2%     0.79
TopBrain-MRA             Mean Dice  0.01   10%    0.26   0%     0.50   0%     0.81
TopCoW-CTA-Seg           Mean Dice  0.09   0%     0.25   3%     0.49   2%     0.87
TopCoW-MRA-Seg           Mean Dice  0.11   0%     0.48   0%     0.73   3%     0.88
Detection Tasks
DENTEX                   AP         0.09   0%     0.08   0%     0.09   0%     0.40
PUMA-T1-Det              F1         0.02   0%     0.08   0%     0.06   0%     0.66
PUMA-T2-Det              F1         FAIL   --     0.00   0%     0.01   0%     0.27
TopCoW-CTA-Det           IoU        0.67   38%    0.65   25%    0.70   56%    0.79
TopCoW-MRA-Det           IoU        0.66   14%    0.69   14%    0.19   14%    0.85
Classification Tasks
TopCoW-CTA-Cls           Accuracy   0.33   33%    0.10   0%     0.28   50%    0.73
TopCoW-MRA-Cls           Accuracy   0.33   25%    0.33   25%    0.09   0%     0.89
Image Quality & Enhancement Tasks
LDCT-IQA                 Score      2.62   33%    2.50   0%     2.66   50%    2.74
USenhance                LNCC       0.11   0%     0.13   0%     FAIL   --     0.91
Overall
Overall Mean Percentile  --         --     9.05%  --     4.53%  --     12.15% --
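The Overall row appears consistent with a simple aggregate of the Rank columns. A minimal sketch, assuming the overall score is the unweighted mean of the 20 per-challenge percentile ranks with failed runs counted as 0% (mirroring "failures shown as zero" in the performance overview). `RANKS` and `overall_mean_percentile` are illustrative names, not the benchmark's own code, and only the AIDE and R&D-Agent columns are encoded here.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical encoding of the "Rank" columns of the leaderboard, in table order
# (11 segmentation, 5 detection, 2 classification, 2 image quality tasks).
# Values are percentile ranks in %; None marks a failed run (FAIL, rank "--").
RANKS: dict[str, list[float | None]] = {
    "AIDE": [0, 0, 16, 10, None, 0, 0, 2, 10, 0, 0,  # segmentation
             0, 0, None, 38, 14,                      # detection
             33, 25,                                  # classification
             33, 0],                                  # image quality & enhancement
    "R&D-Agent": [0, 0, 8, 58, 0, 0, 0, 2, 0, 2, 3,
                  0, 0, 0, 56, 14,
                  50, 0,
                  50, None],
}


def overall_mean_percentile(ranks: list[float | None]) -> float:
    """Unweighted mean of per-challenge percentile ranks; failed runs count as 0%."""
    return mean(0.0 if r is None else r for r in ranks)


if __name__ == "__main__":
    for agent, ranks in RANKS.items():
        assert len(ranks) == 20, "expected one entry per ReX-MLE challenge"
        print(f"{agent}: {overall_mean_percentile(ranks):.2f}%")
```

Run as a script, it prints 9.05% for AIDE and 12.15% for R&D-Agent, matching their Overall Mean Percentile entries.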

Challenge Descriptions