Three-dimensional (3D) deformable image registration plays a critical role in 3D medical image processing. It aligns images acquired at different time points, from different modalities, or from different individuals in 3D space, enabling the comparison and fusion of anatomical or functional information. To capture both the local details of anatomical structures and the long-range dependencies in 3D medical images, while avoiding the high cost of manual annotation, this paper proposes an unsupervised 3D medical image registration method based on the shifted-window (Swin) Transformer and convolutional neural networks (CNNs), termed the Swin Transformer-CNN hybrid network (STCHnet). In the encoder, STCHnet uses a Swin Transformer and a CNN to extract global and local features from 3D images, respectively, and optimizes the feature representation through feature fusion. In the decoder, STCHnet uses a Swin Transformer to integrate information globally and a CNN to refine local details, reducing the complexity of the deformation field while maintaining registration accuracy. Experiments on the Information eXtraction from Images (IXI) and Open Access Series of Imaging Studies (OASIS) datasets, together with qualitative and quantitative comparisons against existing registration methods, show that STCHnet outperforms the baselines in terms of the Dice similarity coefficient (DSC) and the standard deviation of the log-Jacobian determinant (SDlogJ), achieving improved 3D medical image registration under unsupervised conditions.
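The encoder design described above, with a Transformer branch for global context and a CNN branch for local detail fused into a single representation, can be illustrated loosely with the PyTorch sketch below. This is not the paper's implementation: plain full self-attention stands in for windowed/shifted (Swin) attention, and the module names, channel counts, and concatenation-based fusion are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class HybridFusionBlock(nn.Module):
    """Hypothetical hybrid block: a 3D-conv branch for local features and a
    plain multi-head self-attention branch (a stand-in for a Swin stage's
    windowed attention) for global context, fused by concatenation."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.local = nn.Sequential(           # local (CNN) branch
            nn.Conv3d(channels, channels, 3, padding=1),
            nn.InstanceNorm3d(channels),
            nn.LeakyReLU(0.2))
        self.norm = nn.LayerNorm(channels)    # global (attention) branch
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.fuse = nn.Conv3d(2 * channels, channels, 1)  # feature fusion

    def forward(self, x):                      # x: (B, C, D, H, W)
        local = self.local(x)
        b, c, d, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # voxels as tokens: (B, DHW, C)
        glob, _ = self.attn(*[self.norm(tokens)] * 3)
        glob = glob.transpose(1, 2).reshape(b, c, d, h, w)
        return self.fuse(torch.cat([local, glob], dim=1))

# Usage on a tiny volume (full attention over all voxels is only feasible
# at small spatial sizes; windowed attention avoids this in practice):
block = HybridFusionBlock(16)
y = block(torch.randn(1, 16, 8, 8, 8))  # -> (1, 16, 8, 8, 8)
```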
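Both reported metrics have standard definitions, so a minimal NumPy sketch is given here for reference (again, not the paper's evaluation code): DSC measures the overlap between warped and fixed label masks, while SDlogJ measures the irregularity of the deformation field as the standard deviation of the log of its per-voxel Jacobian determinant, with lower values indicating a smoother field. The (3, D, H, W) displacement layout and the clamping of non-positive determinants before the log are our assumptions.

```python
import numpy as np

def dice_score(seg_a, seg_b):
    """Dice similarity coefficient, 2|A∩B| / (|A|+|B|), for binary 3D masks."""
    inter = np.logical_and(seg_a, seg_b).sum()
    return 2.0 * inter / (seg_a.sum() + seg_b.sum())

def sd_log_jacobian(disp):
    """SDlogJ of a dense displacement field u, shape (3, D, H, W), in voxels.

    The deformation is phi(x) = x + u(x), so its Jacobian is J = I + du/dx,
    estimated here with finite differences (np.gradient)."""
    # Spatial gradients of each displacement component: grads[i, j] = du_i/dx_j
    grads = np.stack([np.stack(np.gradient(disp[i], axis=(0, 1, 2)), axis=0)
                      for i in range(3)], axis=0)          # (3, 3, D, H, W)
    # Add the identity to form the full Jacobian at every voxel
    jac = grads + np.eye(3).reshape(3, 3, 1, 1, 1)
    # Per-voxel determinant: move the 3x3 axes to the end for np.linalg.det
    det = np.linalg.det(np.moveaxis(jac, (0, 1), (-2, -1)))  # (D, H, W)
    # Clamp non-positive determinants (folded voxels) before taking the log
    det = np.clip(det, 1e-9, None)
    return np.log(det).std()

# Usage: a small random displacement field yields an SDlogJ near zero,
# since phi stays close to the identity transform.
print(sd_log_jacobian(0.1 * np.random.randn(3, 32, 32, 32)))
```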