publications
2024
- MRI Parameter Mapping via Gaussian Mixture VAE: Breaking the Assumption of Independent Pixels. Moucheng Xu, Yukun Zhou, Tobias Goodwin-Allcock, and 4 more authors. In NeurIPS Workshop on Machine Learning and the Physical Sciences, Dec 2024.
We introduce and demonstrate a new paradigm for quantitative parameter mapping in MRI. Parameter mapping techniques, such as diffusion MRI and quantitative MRI, have the potential to robustly and repeatably measure biologically-relevant tissue maps that strongly relate to underlying microstructure. Quantitative maps are calculated by fitting a model to multiple images, e.g. with least-squares or machine learning. However, the overwhelming majority of model fitting techniques assume that each voxel is independent, ignoring any co-dependencies in the data. This makes model fitting sensitive to voxelwise measurement noise, hampering reliability and repeatability. We propose a self-supervised deep variational approach that breaks the assumption of independent pixels, leveraging redundancies in the data to effectively perform data-driven regularisation of quantitative maps. We demonstrate that our approach outperforms current model fitting techniques in dMRI simulations and real data, especially with a Gaussian mixture prior. Our approach enables improved quantitative maps and/or reduced acquisition times, and can hence support the clinical adoption of parameter mapping methods such as dMRI and qMRI.
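The sketch below gives a minimal, hypothetical PyTorch illustration of the core idea: a convolutional encoder sees the whole multi-b-value image (so neighbouring voxels are not treated independently), a toy mono-exponential decay acts as the signal-model decoder, and a learnable Gaussian-mixture prior regularises the latent parameter maps. The `GMMPriorMapper` class, the signal model, and all hyperparameters are illustrative assumptions, not the paper's released implementation.

```python
# Hypothetical sketch: self-supervised fitting of a mono-exponential dMRI signal
# model with a convolutional encoder and a Gaussian-mixture prior on the latent
# parameter maps. Names and the toy signal model are illustrative only.
import torch
import torch.nn as nn
import torch.distributions as D

class GMMPriorMapper(nn.Module):
    def __init__(self, n_bvals, n_components=3):
        super().__init__()
        # Encoder sees the whole image, so neighbouring voxels share context.
        self.encoder = nn.Sequential(
            nn.Conv2d(n_bvals, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2 * 2, 3, padding=1),   # mean/log-var for (S0, D)
        )
        # Learnable Gaussian-mixture prior over the 2-D latent (S0, D).
        self.mix_logits = nn.Parameter(torch.zeros(n_components))
        self.mix_means = nn.Parameter(torch.randn(n_components, 2))
        self.mix_logstd = nn.Parameter(torch.zeros(n_components, 2))

    def prior(self):
        mix = D.Categorical(logits=self.mix_logits)
        comp = D.Independent(D.Normal(self.mix_means, self.mix_logstd.exp()), 1)
        return D.MixtureSameFamily(mix, comp)

    def forward(self, signals, bvals):
        # signals: (B, n_bvals, H, W); bvals: 1-D tensor of length n_bvals
        stats = self.encoder(signals)
        mu, logvar = stats[:, :2], stats[:, 2:]
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterise
        s0, diff = z[:, 0:1].exp(), z[:, 1:2].exp()            # positive maps
        b = bvals.view(1, -1, 1, 1)
        recon = s0 * torch.exp(-b * diff)                      # signal-model decoder
        recon_loss = ((recon - signals) ** 2).mean()
        # Monte-Carlo KL surrogate: posterior log-prob minus mixture-prior log-prob.
        post = D.Independent(D.Normal(mu.permute(0, 2, 3, 1),
                                      (0.5 * logvar).exp().permute(0, 2, 3, 1)), 1)
        z_flat = z.permute(0, 2, 3, 1)
        kl = (post.log_prob(z_flat) - self.prior().log_prob(z_flat)).mean()
        return recon_loss + 1e-3 * kl, (s0, diff)
```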
- In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Understanding. Moucheng Xu, Evangelos Chatzaroulas, Luc McCutcheon, and 4 more authors. In NeurIPS Workshop on Video-Language Models, Dec 2024.
A Standard Operating Procedure (SOP) defines a step-by-step written guide for a business software workflow. SOP generation is a crucial step towards automating end-to-end software workflows, and manually creating SOPs can be time-consuming. Recent advancements in large video-language models offer the potential for automating SOP generation by analyzing recordings of human demonstrations. However, current large video-language models face challenges with zero-shot SOP generation. In this work, we first explore in-context learning with video-language models for SOP generation. We then propose In-Context Ensemble Learning to aggregate pseudo labels of SOPs. The proposed in-context ensemble learning increases test-time compute and enables the models to learn beyond their context window limits with an implicit consistency regularisation. We report that in-context learning helps video-language models generate more temporally accurate SOPs, and that the proposed in-context ensemble learning can consistently enhance the capabilities of video-language models in SOP generation.
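As a rough illustration of the aggregation step, the sketch below ensembles several model calls, each conditioned on a different random sample of pseudo-labelled SOPs, and keeps the steps that a majority of ensemble members agree on. The `generate_sop` callable, the majority-vote rule, and the ordering heuristic are assumptions for illustration; the paper's actual prompting and aggregation may differ.

```python
# Hypothetical sketch of in-context ensembling over pseudo-labelled SOPs.
# `generate_sop` stands in for any video-language model call; the aggregation
# by step-level majority vote is an illustrative choice, not the paper's rule.
import random
from collections import Counter
from typing import Callable, List

def in_context_ensemble(
    generate_sop: Callable[[List[str], str], List[str]],  # (context SOPs, video id) -> SOP steps
    pseudo_labelled_pool: List[str],                       # previously generated pseudo-label SOPs
    query_video: str,
    n_members: int = 5,
    context_size: int = 3,
) -> List[str]:
    """Query the model several times, each with a different in-context sample
    of pseudo labels, then keep the steps that most members agree on."""
    drafts = []
    for _ in range(n_members):
        context = random.sample(pseudo_labelled_pool,
                                k=min(context_size, len(pseudo_labelled_pool)))
        drafts.append(generate_sop(context, query_video))

    # Step-level majority vote: keep steps proposed by at least half the members,
    # ordered by their average position across the drafts that contain them.
    votes = Counter(step for draft in drafts for step in set(draft))
    kept = [s for s, c in votes.items() if c >= (n_members + 1) // 2]
    avg_pos = {s: sum(d.index(s) for d in drafts if s in d) / votes[s] for s in kept}
    return sorted(kept, key=avg_pos.get)
```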
- Expectation maximisation pseudo labels. Moucheng Xu, Yukun Zhou, Chen Jin, and 5 more authors. Medical Image Analysis, Feb 2024.
In this paper, we study pseudo-labelling. Pseudo-labelling employs raw inferences on unlabelled data as pseudo-labels for self-training. We elucidate the empirical successes of pseudo-labelling by establishing a link between this technique and the Expectation Maximisation algorithm. Through this, we realise that the original pseudo-labelling serves as an empirical estimation of its more comprehensive underlying formulation. Following this insight, we present a full generalisation of pseudo-labels under Bayes’ theorem, termed Bayesian Pseudo Labels. Subsequently, we introduce a variational approach to generate these Bayesian Pseudo Labels, involving the learning of a threshold to automatically select high-quality pseudo labels. In the remainder of the paper, we showcase the applications of pseudo-labelling and its generalised form, Bayesian Pseudo-Labelling, in the semi-supervised segmentation of medical images. Specifically, we focus on: (1) 3D binary segmentation of lung vessels from CT volumes; (2) 2D multi-class segmentation of brain tumours from MRI volumes; (3) 3D binary segmentation of whole brain tumours from MRI volumes; and (4) 3D binary segmentation of the prostate from MRI volumes. We further demonstrate that pseudo-labels can enhance the robustness of the learned representations. The code is released in the following GitHub repository: https://github.com/moucheng2017/EMSSL.
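As a minimal sketch of the idea of learning a pseudo-label threshold, the snippet below treats the threshold as a variational latent variable in logit space, samples it with the reparameterisation trick, and adds a KL term towards a prior. This is a simplified toy version for a binary task, not the released EMSSL code; in particular, the hard thresholding here blocks gradients to the threshold, which the paper's variational derivation handles properly.

```python
# Hypothetical sketch of pseudo-labelling with a learnable threshold, in the
# spirit of Bayesian Pseudo Labels. The logit-space Gaussian posterior, the
# N(prior_mu, 1) prior and the KL weight are illustrative simplifications.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedThresholdPseudoLabel(nn.Module):
    """Unsupervised loss term: a threshold is sampled from a variational
    posterior, hard pseudo labels are made with that threshold, and a KL term
    pulls the threshold towards a prior mean."""
    def __init__(self, prior_mean=0.5, kl_weight=0.1):
        super().__init__()
        self.mu = nn.Parameter(torch.tensor(0.0))        # threshold logit mean (sigmoid(0) = 0.5)
        self.log_sigma = nn.Parameter(torch.tensor(-2.0))
        self.register_buffer("prior_mu", torch.logit(torch.tensor(prior_mean)))
        self.kl_weight = kl_weight

    def forward(self, logits_unlabelled):
        probs = torch.sigmoid(logits_unlabelled)                  # binary segmentation, (B, 1, H, W)
        eps = torch.randn(())                                     # reparameterisation trick
        thresh = torch.sigmoid(self.mu + eps * self.log_sigma.exp())
        pseudo = (probs.detach() >= thresh).float()               # E-step: hard pseudo labels
        unsup = F.binary_cross_entropy(probs, pseudo)             # M-step: fit the pseudo labels
        # KL( N(mu, sigma^2) || N(prior_mu, 1) ) in logit space.
        kl = (self.mu - self.prior_mu) ** 2 / 2.0 + \
             (self.log_sigma.exp() ** 2 - 1.0 - 2.0 * self.log_sigma) / 2.0
        return unsup + self.kl_weight * kl
```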
- CF-Loss: Clinically-relevant feature optimised loss function for retinal multi-class vessel segmentation and vascular feature measurement. Yukun Zhou, Moucheng Xu, Yipeng Hu, and 5 more authors. Medical Image Analysis, Mar 2024.
Characterising clinically-relevant vascular features, such as vessel density and fractal dimension, can benefit biomarker discovery and disease diagnosis for both ophthalmic and systemic diseases. In this work, we explicitly encode vascular features into an end-to-end loss function for multi-class vessel segmentation, categorising pixels into artery, vein, uncertain pixels, and background. This clinically-relevant feature optimised loss function (CF-Loss) regulates networks to segment accurate multi-class vessel maps that produce precise vascular features. Our experiments first verify that CF-Loss significantly improves both multi-class vessel segmentation and vascular feature estimation, with two standard segmentation networks, on three publicly available datasets. We reveal that pixel-based segmentation performance is not always positively correlated with accuracy of vascular features, thus highlighting the importance of optimising vascular features directly via CF-Loss. Finally, we show that improved vascular features from CF-Loss, as biomarkers, can yield quantitative improvements in the prediction of ischaemic stroke, a real-world clinical downstream task. The code is available at https://github.com/rmaphoh/feature-loss.
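The sketch below illustrates the general shape of such a loss, assuming a differentiable surrogate for vessel density (the mean predicted probability of the artery and vein classes) added to a standard cross-entropy term. The class indices, the omission of the fractal-dimension term, and the weighting are illustrative assumptions rather than the CF-Loss implementation.

```python
# Hypothetical sketch of a clinically-relevant feature loss: cross-entropy plus
# a penalty on the gap between a soft vessel density computed from predictions
# and the density of the reference annotation. Fractal-dimension terms from the
# paper are omitted; class indices, weights and names are illustrative.
import torch
import torch.nn.functional as F

def soft_vessel_density(probs, vessel_classes=(1, 2)):
    """Differentiable surrogate for vessel density: mean probability of
    belonging to any vessel class (here artery = 1, vein = 2 by assumption)."""
    return probs[:, list(vessel_classes)].sum(dim=1).mean(dim=(1, 2))

def cf_style_loss(logits, target, feature_weight=1.0):
    # logits: (B, C, H, W); target: (B, H, W) integer class map
    ce = F.cross_entropy(logits, target)
    probs = torch.softmax(logits, dim=1)
    pred_density = soft_vessel_density(probs)
    true_density = soft_vessel_density(
        F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float())
    feature_term = (pred_density - true_density).abs().mean()
    return ce + feature_weight * feature_term
```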
2023
- A Foundation Model for Generalizable Disease Detection from Retinal Images. Yukun Zhou, Mark A. Chia, Siegfried K. Wagner, and 14 more authors. Nature – Impact Factor: 64.8, Sep 2023.
Medical artificial intelligence (AI) offers great potential for recognizing signs of health conditions in retinal images and expediting the diagnosis of eye diseases and systemic disorders. However, the development of AI models requires substantial annotation and models are usually task-specific with limited generalizability to different clinical applications. Here, we present RETFound, a foundation model for retinal images that learns generalizable representations from unlabelled retinal images and provides a basis for label-efficient model adaptation in several applications. Specifically, RETFound is trained on 1.6 million unlabelled retinal images by means of self-supervised learning and then adapted to disease detection tasks with explicit labels. We show that adapted RETFound consistently outperforms several comparison models in the diagnosis and prognosis of sight-threatening eye diseases, as well as incident prediction of complex systemic disorders such as heart failure and myocardial infarction with fewer labelled data. RETFound provides a generalizable solution to improve model performance and alleviate the annotation workload of experts to enable broad clinical AI applications from retinal imaging.
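A hypothetical sketch of the adaptation step is shown below: load self-supervised weights into a vision-transformer backbone, swap in a task-specific classification head, and fine-tune on a small labelled dataset. A torchvision `vit_b_16` is used purely as a stand-in for the released RETFound backbone, and the checkpoint handling and hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of label-efficient adaptation: a self-supervised backbone
# gets a new task head and is fine-tuned on a small labelled set. torchvision's
# vit_b_16 is a stand-in for the actual RETFound weights; paths and loaders are
# illustrative.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16

def build_disease_classifier(num_classes, checkpoint_path=None):
    model = vit_b_16(weights=None)                        # backbone architecture only
    if checkpoint_path:                                   # e.g. self-supervised state_dict
        state = torch.load(checkpoint_path, map_location="cpu")
        model.load_state_dict(state, strict=False)        # ignore missing head keys
    model.heads = nn.Linear(model.hidden_dim, num_classes)  # new task-specific head
    return model

def finetune(model, loader, epochs=10, lr=1e-4, device="cuda"):
    model.to(device).train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.05)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:                     # small labelled retinal dataset
            opt.zero_grad()
            loss = loss_fn(model(images.to(device)), labels.to(device))
            loss.backward()
            opt.step()
    return model
```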
2022
- Bayesian Pseudo Labels: Expectation Maximization for Robust and Efficient Semi-supervised Segmentation. Moucheng Xu, Yukun Zhou, Chen Jin, and 5 more authors. In MICCAI – Best Paper Finalist, top 0.8%, Sep 2022.
This paper concerns pseudo labelling in segmentation. Our contribution is fourfold. Firstly, we present a new formulation of pseudo-labelling as an Expectation-Maximization (EM) algorithm for clear statistical interpretation. Secondly, we propose a semi-supervised medical image segmentation method purely based on the original pseudo labelling, namely SegPL. We demonstrate that SegPL is a competitive approach against state-of-the-art consistency-regularisation-based methods for semi-supervised segmentation on a 2D multi-class MRI brain tumour segmentation task and a 3D binary CT lung vessel segmentation task. The simplicity of SegPL also results in a lower computational cost compared to prior methods. Thirdly, we demonstrate that the effectiveness of SegPL may originate from its robustness against out-of-distribution noise and adversarial attacks. Lastly, under the EM framework, we introduce a probabilistic generalisation of SegPL via variational inference, which learns a dynamic threshold for pseudo-labelling during training. We show that SegPL with variational inference can perform uncertainty estimation on par with the gold-standard method, Deep Ensemble.
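The snippet below sketches one SegPL-style training step under simplifying assumptions: a supervised binary cross-entropy term on the labelled batch plus a pseudo-label term on the unlabelled batch, with a fixed confidence threshold and a fixed weighting. It is an illustration of the E-step/M-step reading of pseudo-labelling, not the paper's exact losses or schedule.

```python
# Hypothetical sketch of one SegPL-style training step for binary segmentation.
# The fixed threshold, the fixed weighting alpha and the BCE-only losses are
# simplifications of the paper's setup.
import torch
import torch.nn.functional as F

def segpl_step(model, labelled, unlabelled, optimiser, threshold=0.5, alpha=0.05):
    images_l, masks_l = labelled          # masks_l: (B, 1, H, W) in {0, 1}
    images_u = unlabelled

    optimiser.zero_grad()
    # Supervised term on the labelled batch.
    sup = F.binary_cross_entropy_with_logits(model(images_l), masks_l.float())
    # E-step: hard pseudo labels from the model's own predictions.
    with torch.no_grad():
        pseudo = (torch.sigmoid(model(images_u)) > threshold).float()
    # M-step: fit the pseudo labels on the unlabelled batch.
    unsup = F.binary_cross_entropy_with_logits(model(images_u), pseudo)
    loss = sup + alpha * unsup
    loss.backward()
    optimiser.step()
    return loss.item()
```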
- Learning Morphological Feature Perturbations for Calibrated Semi-Supervised Segmentation. Moucheng Xu, Yukun Zhou, Chen Jin, and 6 more authors. In MIDL – Oral, top 16%, Jul 2022.
We propose MisMatch, a novel consistency-driven semi-supervised segmentation framework which produces predictions that are invariant to learnt feature perturbations. MisMatch consists of an encoder and a two-head decoder. One decoder head learns positive attention to the foreground regions of interest (RoI) on unlabelled images, thereby generating dilated features. The other head learns negative attention to the foreground on the same unlabelled images, thereby generating eroded features. We then apply consistency regularisation to the paired predictions. MisMatch outperforms state-of-the-art semi-supervised methods on a CT-based pulmonary vessel segmentation task and an MRI-based brain tumour segmentation task. In addition, we show that the effectiveness of MisMatch comes from better model calibration than its supervised learning counterpart.
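As a minimal illustration of the consistency idea, the sketch below penalises disagreement between the two decoder outputs on the same unlabelled batch and adds it to a supervised term. The mean-squared-error choice, the weighting, and the binary setting are assumptions, and the attention-based feature perturbations themselves are not shown.

```python
# Hypothetical sketch of the consistency term between the two decoder outputs
# on unlabelled images. The encoder/decoder definitions are assumed to exist
# elsewhere; the MSE-on-probabilities choice is illustrative.
import torch
import torch.nn.functional as F

def mismatch_consistency(logits_positive, logits_negative):
    """Penalise disagreement between the dilated-feature head and the
    eroded-feature head on the same unlabelled batch."""
    p_pos = torch.sigmoid(logits_positive)
    p_neg = torch.sigmoid(logits_negative)
    return F.mse_loss(p_pos, p_neg)

def total_loss(sup_logits, sup_target, unl_pos_logits, unl_neg_logits, beta=0.1):
    # Supervised term on labelled data plus weighted consistency on unlabelled data.
    sup = F.binary_cross_entropy_with_logits(sup_logits, sup_target.float())
    return sup + beta * mismatch_consistency(unl_pos_logits, unl_neg_logits)
```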
- Airway Measurement by Refinement of Synthetic Images Improves Mortality Prediction in Idiopathic Pulmonary Fibrosis. Ashkan Pakzad, Moucheng Xu, Wing Keung Cheung, and 7 more authors. In MICCAI Workshop on Deep Generative Models, Oct 2022.
Several chronic lung diseases, such as idiopathic pulmonary fibrosis (IPF), are characterised by abnormal dilatation of the airways. Quantification of airway features on computed tomography (CT) can help characterise disease severity and progression. Physics-based airway measurement algorithms have met with limited success, in part due to the sheer diversity of airway morphology seen in clinical practice, and supervised learning methods are not feasible due to the high cost of obtaining precise airway annotations. We propose synthesising airways by style transfer using perceptual losses to train our model, the Airway Transfer Network (ATN). We compare our ATN model with a state-of-the-art GAN-based network (simGAN) using a) qualitative assessment and b) the ability of ATN- and simGAN-based CT airway metrics to predict mortality in a population of 113 patients with IPF. ATN was quicker and easier to train than simGAN, and ATN-based airway measurements showed consistently stronger associations with mortality than simGAN-derived airway metrics on IPF CTs. Airway synthesis by a transformation network that refines synthetic data using perceptual losses is a realistic alternative to GAN-based methods for clinical CT analyses of idiopathic pulmonary fibrosis. Our source code can be found at https://github.com/ashkanpakzad/ATN and is compatible with the existing open-source airway analysis framework, AirQuant.
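The sketch below shows a generic perceptual loss of the kind referred to above, assuming VGG16 features: a content term keeps the refined patch close to the raw synthetic patch, while a Gram-matrix style term pushes it towards real CT patches. Layer indices, weights, and the three-channel input convention are illustrative, not the ATN configuration.

```python
# Hypothetical sketch of a perceptual loss used to refine synthetic CT patches:
# VGG16 features give a content term against the raw synthetic patch and a
# Gram-matrix style term against real patches. Layer choices and weights are
# illustrative, not the ATN configuration.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

class PerceptualLoss(torch.nn.Module):
    def __init__(self, content_layer=8, style_layer=15):
        super().__init__()
        features = vgg16(weights="IMAGENET1K_V1").features.eval()
        for p in features.parameters():
            p.requires_grad_(False)                       # frozen feature extractor
        self.features = features
        self.content_layer, self.style_layer = content_layer, style_layer

    @staticmethod
    def gram(feat):
        b, c, h, w = feat.shape
        f = feat.reshape(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)

    def _extract(self, x):
        content, style = None, None
        for i, layer in enumerate(self.features):
            x = layer(x)
            if i == self.content_layer:
                content = x
            if i == self.style_layer:
                style = x
                break
        return content, style

    def forward(self, refined, synthetic, real):
        # Inputs are 3-channel tensors (CT patches replicated across channels).
        c_ref, s_ref = self._extract(refined)
        c_syn, _ = self._extract(synthetic)
        _, s_real = self._extract(real)
        content_loss = F.mse_loss(c_ref, c_syn)                       # keep airway geometry
        style_loss = F.mse_loss(self.gram(s_ref), self.gram(s_real))  # look like real CT
        return content_loss + 1e3 * style_loss
```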
- AutoMorph: Automated Retinal Vascular Morphology Quantification Via a Deep Learning Pipeline. Yukun Zhou, Siegfried K. Wagner, Mark A. Chia, and 6 more authors. Translational Vision Science & Technology, Jul 2022.
To externally validate a deep learning pipeline (AutoMorph) for automated analysis of retinal vascular morphology on fundus photographs. AutoMorph has been made publicly available, facilitating widespread research in ophthalmic and systemic diseases. AutoMorph consists of four functional modules: image preprocessing, image quality grading, anatomical segmentation (including binary vessel, artery/vein, and optic disc/cup segmentation), and vascular morphology feature measurement. Image quality grading and anatomical segmentation use the most recent deep learning techniques. We employ a model ensemble strategy to achieve robust results and analyze the prediction confidence to rectify falsely gradable cases in image quality grading. We externally validate the performance of each module on several independent publicly available datasets. The EfficientNet-b4 architecture used in the image grading module achieves performance comparable to that of the state of the art for EyePACS-Q, with an F1-score of 0.86. The confidence analysis reduces the number of images incorrectly assessed as gradable by 76%. Binary vessel segmentation achieves an F1-score of 0.73 on AV-WIDE and 0.78 on DR HAGIS. Artery/vein scores are 0.66 on IOSTAR-AV, and disc segmentation achieves 0.94 on IDRiD. Vascular morphology features measured from the AutoMorph segmentation map and expert annotation show good to excellent agreement. AutoMorph modules perform well even when external validation data show domain differences from training data (e.g., with different imaging devices). This fully automated pipeline can thus allow detailed, efficient, and comprehensive analysis of retinal vascular morphology on color fundus photographs. By making AutoMorph publicly available and open source, we hope to facilitate ophthalmic and systemic disease research, particularly in the emerging field of oculomics.
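For orientation, a hypothetical skeleton of the four-stage pipeline is sketched below, with each module represented as a pluggable function. The real repository organises these stages as separate models and configs, so the names, the result container, and the control flow here are assumptions.

```python
# Hypothetical skeleton of the four-stage pipeline described above. Each stage
# is a plain callable passed in by the caller; this is not the AutoMorph API.
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class FundusResult:
    gradable: bool = False
    segmentations: Dict[str, object] = field(default_factory=dict)
    features: Dict[str, float] = field(default_factory=dict)

def run_pipeline(image, preprocess, grade_quality, segment, measure) -> FundusResult:
    """preprocess / grade_quality / segment / measure stand in for the image
    preprocessing, quality grading, anatomical segmentation and vascular
    feature measurement modules."""
    result = FundusResult()
    img = preprocess(image)                           # crop, resize, normalise
    result.gradable = grade_quality(img)              # ensemble + confidence check
    if not result.gradable:
        return result                                 # skip ungradable images
    for task in ("binary_vessel", "artery_vein", "disc_cup"):
        result.segmentations[task] = segment(img, task)
    result.features = measure(result.segmentations)   # density, fractal dimension, ...
    return result
```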
2021
- Learning to Address Intra-segment Misclassification in Retinal Imaging. Yukun Zhou, Moucheng Xu, Yipeng Hu, and 4 more authors. In MICCAI – top 30%, Sep 2021.
Accurate multi-class segmentation is a long-standing challenge in medical imaging, especially in scenarios where classes share strong similarity. Segmenting retinal blood vessels in retinal photographs is one such scenario, in which arteries and veins need to be identified and differentiated from each other and from the background. Intra-segment misclassification, i.e. veins classified as arteries or vice versa, frequently occurs when arteries and veins intersect, whereas in binary retinal vessel segmentation, error rates are much lower. We thus propose a new approach that decomposes multi-class segmentation into multiple binary segmentation tasks, followed by a binary-to-multi-class fusion network. The network merges representations of artery, vein, and multi-class feature maps, each of which is supervised by expert vessel annotation in adversarial training. A skip-connection-based merging process explicitly maintains class-specific gradients to avoid gradient vanishing in deep layers and to preserve discriminative features. The results show that our model improves the F1-score by 4.4%, 5.1%, and 4.2% over three state-of-the-art deep-learning-based methods on the DRIVE-AV, LES-AV, and HRF-AV datasets, respectively. Code: https://github.com/rmaphoh/Learning-AVSegmentation
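The sketch below illustrates one plausible form of such a fusion head: binary artery and vein probability maps are concatenated with multi-class features, projected, and added back through a skip connection before the multi-class prediction. Channel sizes are assumptions, and the adversarial supervision described above is omitted.

```python
# Hypothetical sketch of a binary-to-multi-class fusion head: binary artery and
# vein probability maps are concatenated with multi-class features and fused,
# with an additive skip connection to keep class-specific gradients alive.
# Channel sizes and the adversarial supervision from the paper are omitted.
import torch
import torch.nn as nn

class BinaryToMultiFusion(nn.Module):
    def __init__(self, feat_channels=32, n_classes=4):   # artery, vein, uncertain, background
        super().__init__()
        self.project = nn.Conv2d(feat_channels + 2, feat_channels, 3, padding=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_channels, n_classes, 1),
        )

    def forward(self, multi_feats, artery_prob, vein_prob):
        # multi_feats: (B, C, H, W); artery_prob, vein_prob: (B, 1, H, W)
        merged = torch.cat([multi_feats, artery_prob, vein_prob], dim=1)
        fused = self.project(merged) + multi_feats        # skip connection
        return self.fuse(fused)                           # multi-class logits
```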
2020
- Foveation for Segmentation of Mega-Pixel Histology Images. Chen Jin, Ryutaro Tanno, Moucheng Xu, and 2 more authors. In MICCAI – top 30%, Sep 2020.
Segmenting histology images is challenging because of the sheer size of the images, with millions or even billions of pixels. Typical solutions pre-process each histology image by dividing it into patches of fixed size and/or down-sampling to meet memory constraints. Such operations incur information loss in the field-of-view (FoV) (i.e., spatial coverage) and the image resolution. The impact on segmentation performance is, however, as yet understudied. In this work, we first show, under typical memory constraints (e.g., 10 GB of GPU memory), that the trade-off between FoV and resolution considerably affects segmentation performance on histology images, and that its influence also varies spatially according to local patterns in different areas (see Fig. 1). Based on this insight, we then introduce the foveation module, a learnable “dataloader” which, for a given histology image, adaptively chooses the appropriate configuration (FoV/resolution trade-off) of the input patch to feed to the downstream segmentation model at each spatial location (Fig. 1). The foveation module is jointly trained with the segmentation network to maximise task performance. We demonstrate, on the Gleason2019 challenge dataset for histopathology segmentation, that the foveation module improves segmentation performance over models trained with patches of a fixed FoV/resolution trade-off. Moreover, our model achieves segmentation accuracy that is 13.1% and 7.5% higher than the top performers in the challenge for the two most clinically important and ambiguous classes (Gleason Grade 3 and 4), and improves on the average performance of 6 human experts by 6.5% and 7.5%.
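The sketch below gives a hypothetical, simplified picture of the mechanism: a small scorer ranks a handful of candidate FoV/resolution configurations for a given location, and a patch is cropped and resized accordingly. The candidate list, the scorer, and the hard selection are illustrative assumptions; the paper's module is trained jointly with the segmentation network, which is not shown here.

```python
# Hypothetical sketch of a foveation-style patch selector: a tiny CNN scores a
# few candidate FoV/resolution configurations, and a patch is cropped and
# resized according to the chosen one. Joint training with the segmentation
# network (as in the paper) is not shown.
import torch
import torch.nn as nn
import torch.nn.functional as F

CONFIGS = [(64, 256), (128, 256), (256, 256)]   # (field of view px, output resolution px)

class FoveationSelector(nn.Module):
    def __init__(self, in_channels=3, n_configs=len(CONFIGS)):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, n_configs),
        )

    def forward(self, thumbnail):
        return self.scorer(thumbnail)            # (B, n_configs) configuration logits

def extract_patch(image, centre_yx, config):
    """Crop a (fov x fov) window around `centre_yx` and resize it to the target
    resolution, trading spatial coverage against detail."""
    fov, res = config
    y, x = centre_yx
    h, w = image.shape[-2:]
    top = max(0, min(y - fov // 2, h - fov))
    left = max(0, min(x - fov // 2, w - fov))
    crop = image[..., top:top + fov, left:left + fov]
    return F.interpolate(crop.unsqueeze(0), size=(res, res),
                         mode="bilinear", align_corners=False).squeeze(0)
```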
- Learning To Pay Attention To Mistakes. Moucheng Xu, Neil Oxtoby, Daniel C. Alexander, and 1 more author. In BMVC – top 29%, Sep 2020.