• School of Computer Science and Technology (School of Artificial Intelligence), Zhejiang Sci-Tech University, Hangzhou 310018, P. R. China;
LI Yang, Email: yangli@zstu.edu.cn

Three-dimensional (3D) deformable image registration plays a critical role in 3D medical image processing. This technique aligns images from different time points, modalities, or individuals in 3D space, enabling the comparison and fusion of anatomical or functional information. To simultaneously capture the local details of anatomical structures and the long-range dependencies in 3D medical images, while reducing the high cost of manual annotation, this paper proposes an unsupervised 3D medical image registration method based on the shifted window Transformer and convolutional neural network (CNN), termed the Swin Transformer-CNN hybrid network (STCHnet). In the encoder, STCHnet uses the Swin Transformer and a CNN to extract global and local features from 3D images, respectively, and optimizes the feature representation through feature fusion. In the decoder, STCHnet utilizes the Swin Transformer to integrate information globally and a CNN to refine local details, reducing the complexity of the deformation field while maintaining registration accuracy. Experiments on the information extraction from images (IXI) and open access series of imaging studies (OASIS) datasets, together with qualitative and quantitative comparisons against existing registration methods, demonstrate that the proposed STCHnet outperforms baseline methods in terms of Dice similarity coefficient (DSC) and standard deviation of the log-Jacobian determinant (SDlogJ), achieving improved 3D medical image registration performance under unsupervised conditions.
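The dual-branch encoder idea described above can be illustrated with a toy numpy sketch: a small 3D convolution stands in for the CNN branch (local detail), full self-attention over flattened voxels stands in for the Transformer branch (long-range dependencies), and the two feature maps are stacked for fusion. This is a minimal conceptual sketch only, not STCHnet itself; shifted-window partitioning, multi-scale features, and the deformation-field decoder are omitted, and all function names are illustrative.

```python
import numpy as np

def local_conv3d(x, kernel):
    # Naive zero-padded 3D convolution (CNN branch): captures local
    # anatomical detail within a small receptive field.
    kd, kh, kw = kernel.shape
    pad = kd // 2
    xp = np.pad(x, pad)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            for k in range(x.shape[2]):
                out[i, j, k] = np.sum(xp[i:i+kd, j:j+kh, k:k+kw] * kernel)
    return out

def global_self_attention(x):
    # Single-head self-attention over flattened voxels (Transformer
    # branch): every voxel attends to every other voxel, modeling
    # long-range dependencies (real Swin blocks restrict this to
    # shifted local windows for efficiency).
    tokens = x.reshape(-1, 1)                    # (N, 1) voxel tokens
    scores = tokens @ tokens.T                   # (N, N) similarities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over keys
    return (attn @ tokens).reshape(x.shape)

rng = np.random.default_rng(0)
vol = rng.standard_normal((8, 8, 8))             # toy 3D "image"
kernel = np.full((3, 3, 3), 1.0 / 27)            # averaging kernel

local_feat = local_conv3d(vol, kernel)
global_feat = global_self_attention(vol)

# Fuse the two branches along a new channel axis, mirroring the
# encoder's feature-fusion step.
fused = np.stack([local_feat, global_feat], axis=0)
print(fused.shape)  # (2, 8, 8, 8)
```

In the full method, this fusion would be followed by further encoding stages and a decoder that predicts a dense 3D deformation field warping the moving image onto the fixed image.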

Copyright © the editorial department of Journal of Biomedical Engineering of West China Medical Publisher. All rights reserved