
¹Qinghai University, ²Tsinghua University
Hepatic echinococcosis (HE) is a widespread parasitic disease in underdeveloped pastoral areas with limited medical resources. CNN-based and Transformer-based models have been widely applied to medical image segmentation, but each has a drawback: CNNs have limited capacity for global context modeling because of their local receptive fields, and Transformers, although capable of capturing long-range dependencies, are computationally expensive. Recently, state space models (SSMs) such as Mamba have gained attention for their ability to model long sequences with linear complexity. In this paper, we propose EAGLE, a U-shaped network that integrates CNNs, Transformers, and SSMs. We introduce the Convolutional Vision State Space Block (CVSSB) to fuse local and global features and employ the Haar Wavelet Transformation Block (HWTB) for lossless downsampling. EAGLE leverages the synergy between the proposed Progressive Visual State Space (PVSS) Encoder and Hybrid Visual State Space (HVSS) Decoder to achieve efficient and accurate segmentation of HE lesions. Because no public HE datasets are available, we collected CT slices from 260 patients at a local hospital. Experimental results show that EAGLE achieves state-of-the-art performance with a Dice Similarity Coefficient (DSC) of 89.76%, surpassing MSVM-UNet by 1.61%.
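The abstract describes HWTB as a lossless downsampling step. As an illustration of why a Haar wavelet transform loses no information, here is a minimal NumPy sketch of a single-level, orthonormal 2D Haar decomposition. It assumes the four sub-bands are stacked along the channel axis (halving H and W while quadrupling C); the actual HWTB may add convolutions or normalization around this transform, which the text above does not describe. The function names `haar_downsample` and `haar_upsample` are hypothetical.

```python
import numpy as np


def haar_downsample(x: np.ndarray) -> np.ndarray:
    """Single-level orthonormal 2D Haar transform of a (C, H, W) array.

    Assumption: sub-bands are stacked along channels, giving (4*C, H/2, W/2).
    Every 2x2 patch is mapped to 4 coefficients, so no values are discarded.
    """
    _, h, w = x.shape
    assert h % 2 == 0 and w % 2 == 0, "H and W must be even"
    a = x[:, 0::2, 0::2]  # top-left pixel of each 2x2 patch
    b = x[:, 0::2, 1::2]  # top-right
    c = x[:, 1::2, 0::2]  # bottom-left
    d = x[:, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2  # low-frequency approximation
    lh = (a - b + c - d) / 2  # differences across columns
    hl = (a + b - c - d) / 2  # differences across rows
    hh = (a - b - c + d) / 2  # diagonal differences
    return np.concatenate([ll, lh, hl, hh], axis=0)


def haar_upsample(y: np.ndarray) -> np.ndarray:
    """Exact inverse of haar_downsample: (4*C, H/2, W/2) -> (C, H, W)."""
    c4, h2, w2 = y.shape
    ch = c4 // 4
    ll, lh, hl, hh = y[:ch], y[ch:2 * ch], y[2 * ch:3 * ch], y[3 * ch:]
    x = np.empty((ch, 2 * h2, 2 * w2), dtype=y.dtype)
    x[:, 0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[:, 0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[:, 1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[:, 1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x
```

For example, a `(3, 8, 8)` feature map becomes `(12, 4, 4)`, and `haar_upsample` recovers the input exactly. Because the 2x2 transform matrix is orthonormal, the total energy (sum of squares) is also preserved, in contrast to max pooling or strided convolution, which discard information.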
We collected abdominal CT scans from 260 patients clinically diagnosed with hepatic echinococcosis (HE), including 130 cases of cystic echinococcosis (CE) and 130 cases of alveolar echinococcosis (AE). Representative samples from the dataset are shown in the accompanying figure.
Comparison of different methods on the HE dataset. The best values are highlighted in bold and the second-best values are underlined. Compared with the CNN-based UNet, EAGLE improves DSC by 4.54%. Compared with the Transformer-based SwinUNet, EAGLE yields a 2.21% gain in DSC, and it outperforms the SSM-based MSVM-UNet by 1.61% in DSC. EAGLE also achieves the highest Precision and Recall.
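For reference, DSC, Precision, and Recall can all be derived from the true-positive (TP), false-positive (FP), and false-negative (FN) pixel counts of a binary lesion mask. The sketch below uses the standard definitions; the small `eps` term that guards against division by zero is an assumption, and the text above does not say whether scores are averaged per slice, per patient, or over the whole test set, so aggregation is left out. The function name `segmentation_metrics` is hypothetical.

```python
import numpy as np


def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> dict:
    """Compute DSC, Precision, and Recall for one pair of binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()   # lesion pixels correctly predicted
    fp = np.logical_and(pred, ~gt).sum()  # background predicted as lesion
    fn = np.logical_and(~pred, gt).sum()  # lesion pixels that were missed
    dsc = 2 * tp / (2 * tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return {"DSC": float(dsc), "Precision": float(precision), "Recall": float(recall)}


pred = np.array([[1, 1, 0], [0, 1, 0]])
gt = np.array([[1, 0, 0], [0, 1, 1]])
print(segmentation_metrics(pred, gt))  # TP=2, FP=1, FN=1, so all three are about 0.667
```

DSC equals 2TP / (2TP + FP + FN), so it penalizes both over-segmentation (FP) and under-segmentation (FN), while Precision and Recall isolate each error type.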
Ablation study on different CVSSB stacking configurations, where L gives the number of stacked CVSSBs in each of the four stages. Configurations marked "CUDA Out of Memory" could not be evaluated due to GPU memory limits. We focused on strengthening stage 3 while keeping the overall parameter count within a manageable range. The final configuration, L = [2, 2, 4, 2], gave the best results.
Ablation study of different modules. The best values are highlighted in bold. We use a vanilla UNet as the baseline. When all three modules are applied together, DSC, Precision, and Recall improve by 3.1%, 1.68%, and 2.24%, respectively.
The following examples compare EAGLE with other existing segmentation models on our dataset. In the visualization, true positives (TP) are shown in green, false positives (FP) in blue, and false negatives (FN) in hot pink.
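A color-coded error map of this kind can be produced by blending a fixed color into a grayscale CT slice wherever each error type occurs. The sketch below is an illustration under assumptions, not the authors' plotting code: it assumes the slice is already scaled to [0, 1], uses pure RGB green and blue, takes "hot pink" to be the CSS `hotpink` color (#FF69B4), and uses a hypothetical `alpha` blending weight. The function name `error_overlay` is also hypothetical.

```python
import numpy as np

GREEN = (0, 255, 0)         # true positives
BLUE = (0, 0, 255)          # false positives
HOT_PINK = (255, 105, 180)  # false negatives (CSS "hotpink", #FF69B4)


def error_overlay(image: np.ndarray, pred: np.ndarray, gt: np.ndarray,
                  alpha: float = 0.5) -> np.ndarray:
    """Blend TP/FP/FN colors onto a grayscale slice.

    image: (H, W) float array in [0, 1]; pred, gt: (H, W) binary masks.
    Returns an (H, W, 3) uint8 RGB image; true negatives keep the gray value.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    # Replicate the gray slice into three channels on a 0-255 scale.
    rgb = np.repeat(np.clip(image, 0.0, 1.0)[..., None] * 255.0, 3, axis=-1)
    for mask, color in ((pred & gt, GREEN), (pred & ~gt, BLUE), (~pred & gt, HOT_PINK)):
        # Alpha-blend the category color into the selected pixels only.
        rgb[mask] = (1 - alpha) * rgb[mask] + alpha * np.asarray(color, dtype=float)
    return rgb.round().astype(np.uint8)
```

The resulting array can be displayed with any image viewer (for example `matplotlib.pyplot.imshow`). Setting `alpha=1.0` paints the error categories as solid colors, which makes small FP and FN regions easier to spot.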
@article{chen2025eagle,
title={EAGLE: An Efficient Global Attention Lesion Segmentation Model for Hepatic Echinococcosis},
author={Chen, Jiayan and Li, Kai and Zhao, Yulu and Huang, Jianqiang and Wang, Zhan},
journal={arXiv preprint arXiv:2506.20333},
year={2025}
}