Author manuscript; available in PMC 2024 Jun 28. Published in final edited form as: Proc Int Conf Image Proc. 2023 Sep 11;2023:2750–2754. doi: 10.1109/ICIP49359.2023.10223163

ACCURATE REGISTRATION BETWEEN ULTRA-WIDE-FIELD AND NARROW ANGLE RETINA IMAGES WITH 3D EYEBALL SHAPE OPTIMIZATION

Junkang Zhang 1,*, Bo Wen 1,*, Fritz Gerald P Kalaw 2, Melina Cavichini 2, Dirk-Uwe G Bartsch 2, William R Freeman 2, Truong Q Nguyen 1, Cheolhong An 1
PMCID: PMC11211856  NIHMSID: NIHMS2002977  PMID: 38946915

Abstract

Ultra-Wide-Field (UWF) retina images have attracted wide attention in recent years in the study of the retina. However, accurate registration between UWF images and other types of retina images can be challenging due to the distortion in the peripheral areas of a UWF image, which 2D warping cannot handle. In this paper, we propose a novel 3D distortion correction method that sets up a 3D projection model and optimizes a dense 3D retina mesh to correct the distortion in the UWF image. The corrected UWF image can then be accurately aligned to the target image using 2D alignment methods. The experimental results show that our proposed method outperforms the state-of-the-art method by 30%.

Index Terms— Image registration, retina, distortion correction, 3D eyeball shape optimization, ultra-wide-field retina image

1. INTRODUCTION

Registration of multi-modal retina images is an important tool for ophthalmologists to diagnose retina diseases. By providing a synthetic view of the retina (e.g., a checkerboard view), it lets ophthalmologists observe the different appearances of the same anatomical structure in a single image. Registration between retina images with a conventional Field of View (FoV), e.g., Narrow Angle (NA) images, has been widely studied, and deep learning based methods dominate the literature [1, 2, 3, 4, 5, 6, 7]. Mahapatra et al. [3] use a GAN to predict the synthesized image after warping, but the training requires accurately aligned images as ground truth. Tian et al. [4] use a CNN to estimate the optical flow between two images, but such a method is restricted to small pixel displacements. Arikan et al. [1] use a segmentation network to extract vessel maps of the retina and detect keypoints on the two vessel maps. Then, RANSAC is used for outlier rejection and for estimating the transformation matrix. Nevertheless, RANSAC has limited performance and the estimated transformation matrix lacks precision. Zhang et al. [7] proposed an outlier rejection CNN to accurately select the best keypoint matchings. By combining it with subsequent optical flow estimation in a two-step registration pipeline, this method achieves state-of-the-art performance.

However, NA images can only provide limited information about the retina because of their small FoV. As a result, UWF retina images have become increasingly popular in recent years, as they have a much larger FoV and provide much more information than NA images in a single fast shot. Ding et al. [2] proposed a pipeline that uses 2D warping for UWF image registration. Nevertheless, a UWF image has severe distortion in its peripheral regions (Fig. 1) [8] due to the 3D structure of the eyeball, which cannot be handled by 2D warping and degrades the registration quality. Previous studies have proposed methods to correct the distortion in the UWF image. [9, 10] use a 3D eyeball model to project an NA image onto a UWF image and correct the distortion in the UWF image using the optical flow between the UWF image and the projected NA image. This approach uses an ideal sphere as the eyeball model. However, the spherical model is not very accurate because the actual eyeball structure varies and can be more complex [11].

Fig. 1. An example of a UWF retina image showing peripheral distortion (outside the green mask) caused by the 3D eyeball structure, compared with a typical NA image FoV (green mask) with less distortion (left), and the 3D projection model we propose (right).

To improve the distortion correction, we need a more accurate estimate of the eyeball shape. There have been several works on estimating the 3D eyeball structure. Cansizoglu et al. [12] estimate the eyeball shape based on several geometric assumptions and optimize the parameters of these geometric models via bundle adjustment using multiple NA images taken from different camera poses. Hernandez-Matas et al. [13, 14] make spherical or ellipsoidal assumptions about the eyeball and jointly optimize the camera pose with the eyeball model parameters via particle swarm optimization. Chanwimaluang et al. [15] also adopt an ellipsoid model and utilize structure-from-motion techniques with a series of NA images to estimate the curvature of the retina surface. Nevertheless, such ideal geometric models can barely handle the irregular shape of the retina. We will show in our experiments that, even with a well-recognized geometric model of the retina (a spheroid) [11] and parametric optimization, the improvement in distortion correction and in the final alignment result is still quite small.

Therefore, instead of estimating the parameters of some geometric model, we propose a novel 3D distortion correction method with an optimization scheme that sets up a dense 3D mesh for the retina surface and jointly optimizes the 3D mesh and the camera pose in a self-supervised manner. Our proposed method considerably outperforms the state-of-the-art method. The corrected UWF images can be aligned with the NA images with very high precision using a pre-calculated 2D transformation matrix, and the alignment error can barely be observed by the human eye.

2. 3D CORRECTION AND 2D ALIGNMENT

2.1. Prior Work to Estimate 2D Transformation Matrix

We first use the 3D distortion correction algorithm [9, 10] and the 2D registration scheme [7] that we proposed in the past to estimate the 2D-to-2D polynomial transformation matrix M and the initial pose of the NA (correction) camera (Fig. 2). We first set up a 2D-to-2D registration pipeline: a segmentation network extracts the vessel maps of both the UWF and NA images. Then, the vessel maps are fed into a SuperPoint [16] network for keypoint detection and description. Next, coarse keypoint matchings are obtained by a bi-directional search based on minimum Euclidean distance. We then use an outlier rejection network [17] to pick the best matchings and use weighted least squares to estimate a polynomial transformation matrix. The transformation matrix is then used to create an optical flow map, which is used by a Spatial Transformer Network [18] to warp the UWF image and align it with the NA image. Next, in the 3D distortion correction algorithm, a 3D projection model (using an ideal spherical model) is set up, and a searching-based algorithm finds the NA camera pose that corrects the UWF image to achieve the best registration result under the 2D registration pipeline. The searched NA camera pose and the 2D polynomial transformation matrix for each image pair are cached and will be used by the method we propose in this paper.
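As a rough illustration of the 2D fitting step, the sketch below estimates a polynomial transformation from matched keypoints by weighted least squares, with the per-match weights standing in for the outlier-rejection network's confidence scores. The second-order basis and all names (poly_features, fit_weighted_poly) are our own assumptions for illustration, not the exact formulation used in [7, 17].

```python
import torch

def poly_features(pts):
    """Second-order polynomial basis [1, x, y, x^2, xy, y^2] for 2D points of shape (N, 2)."""
    x, y = pts[:, 0], pts[:, 1]
    return torch.stack([torch.ones_like(x), x, y, x * x, x * y, y * y], dim=1)

def fit_weighted_poly(src, dst, w):
    """Weighted least-squares fit of a polynomial transform mapping src -> dst.
    src, dst: (N, 2) matched keypoints; w: (N,) match weights from outlier rejection."""
    sw = w.sqrt()[:, None]                       # scale rows by sqrt(w) for weighted LS
    M = torch.linalg.lstsq(poly_features(src) * sw, dst * sw).solution
    return M                                     # (6, 2) coefficient matrix

# Usage: warped_pts = poly_features(src) @ fit_weighted_poly(src, dst, w)
```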

Fig. 2. Prior pipeline for estimating the 2D transformation matrix and the initial NA camera pose.

2.2. 3D-to-2D Scene Projection

We first set up an initial spherical eyeball model in the world coordinate system. As shown in Fig. 1, the eyeball is centered at (0, 0, 0) and has radius 1. The UWF camera is set at the cornea center (0, 0, −1) and the UWF image plane is set at z = f − 1, where f is the focal length of the UWF camera. In this work, we set f = 2, i.e., we assume that the UWF image plane is a tangent plane of the sphere whose center touches the sphere at (0, 0, 1). The pose of the NA (correction) camera is initialized with the searched parameters from Section 2.1. We then define an initial 3D mesh with vertices V = {v ∈ ℝ³ : ‖v‖₂ = 1}. The vertices of the 3D mesh are arranged in the shape [W/s + 1, W/s + 1, 3], where W is the 2D image pixel width and s is the sampling step of the mesh grid. Every three adjacent vertices form a triangular face, and the grouping of vertices varies from quadrant to quadrant, as shown in Fig. 3. A 2D mesh on the UWF image is defined in the same way, with shape [W/s + 1, W/s + 1]. Each vertex [X, Y] on the 2D mesh corresponds to a vertex v on the 3D mesh through the stereographic projection:

v = \frac{1}{f^2 + X^2 + Y^2}\left(2fX,\ 2fY,\ f^2 - X^2 - Y^2\right)    (1)
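A minimal sketch of Eq. (1): lifting the regular 2D grid on the UWF image onto the unit sphere. Only the stereographic formula itself comes from Eq. (1); the mapping of pixel indices to tangent-plane coordinates (the linspace scaling below) is an assumption for illustration.

```python
import torch

def build_sphere_mesh_vertices(W=2000, s=8, f=2.0):
    """Lift a (W/s + 1) x (W/s + 1) grid on the UWF image onto the unit sphere via Eq. (1)."""
    n = W // s + 1
    # Assumed normalization: map pixel indices to tangent-plane coordinates in [-f, f].
    u = torch.linspace(-1.0, 1.0, n) * f
    Y, X = torch.meshgrid(u, u, indexing="ij")            # 2D mesh on the UWF image plane
    denom = f ** 2 + X ** 2 + Y ** 2
    V = torch.stack([2 * f * X, 2 * f * Y, f ** 2 - X ** 2 - Y ** 2], dim=-1) / denom[..., None]
    return V                                              # (n, n, 3) vertices with ||v||_2 = 1
```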

Fig. 3. An example of the mesh defined on a square UWF image (left) and a UWF image rendered on an optimized 3D mesh (right).

The UWF image is projected to 3D and rendered on the 3D mesh via its faces (Fig. 3). Then, the 3D scene is re-projected according to the NA camera pose, which is defined by its position x_na ∈ ℝ³ and orientation θ_na ∈ ℝ³. We also include an intrinsic parameter α for the NA camera to describe the range of projection. Finally, the re-projected UWF image can be written as:

\tilde{I}_{uwf} = \mathrm{project}\left(I_{uwf},\ V + \Delta V,\ x_{na},\ \theta_{na},\ \alpha\right)    (2)

where ΔV denotes the movements of the 3D vertices produced by the optimization, and project(·) is the projection function, which is implemented with PyTorch3D and supports back-propagation.
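The differentiable project(·) of Eq. (2) could, for example, be assembled from standard PyTorch3D components as sketched below. The shader, lighting, and the conversion from (x_na, θ_na) to PyTorch3D's camera convention are assumptions made for illustration; the paper only states that the projection is implemented with PyTorch3D and is differentiable.

```python
import torch
from pytorch3d.structures import Meshes
from pytorch3d.renderer import (FoVPerspectiveCameras, RasterizationSettings,
                                MeshRasterizer, MeshRenderer, SoftPhongShader,
                                PointLights, TexturesVertex)
from pytorch3d.transforms import euler_angles_to_matrix

def project(uwf_vertex_colors, V, dV, x_na, theta_na, alpha, faces,
            image_size=768, device="cuda"):
    """Differentiable re-projection of the textured retina mesh, in the spirit of Eq. (2).
    uwf_vertex_colors: (N, 3) RGB sampled from the UWF image at each vertex;
    V, dV: (N, 3) initial vertices and learnable offsets; faces: (F, 3) long tensor;
    x_na, theta_na: NA camera position / Euler angles; alpha: field of view in degrees."""
    verts = (V + dV).to(device)
    mesh = Meshes(verts=[verts], faces=[faces.to(device)],
                  textures=TexturesVertex(verts_features=uwf_vertex_colors[None].to(device)))

    # PyTorch3D uses a world-to-view rotation R and translation T (row-vector convention),
    # so the camera position x_na is converted with T = -x_na @ R.
    R = euler_angles_to_matrix(theta_na[None], convention="XYZ")        # (1, 3, 3)
    T = -(x_na[None, None, :] @ R)[:, 0, :]                             # (1, 3)
    cameras = FoVPerspectiveCameras(R=R, T=T, fov=alpha, device=device)

    # Ambient-only lighting so the rendered colors stay close to the UWF texture.
    lights = PointLights(ambient_color=((1.0, 1.0, 1.0),), diffuse_color=((0.0, 0.0, 0.0),),
                         specular_color=((0.0, 0.0, 0.0),), device=device)
    renderer = MeshRenderer(
        rasterizer=MeshRasterizer(cameras=cameras,
                                  raster_settings=RasterizationSettings(image_size=image_size)),
        shader=SoftPhongShader(cameras=cameras, lights=lights, device=device))
    return renderer(mesh)[0, ..., :3]             # (H, W, 3) re-projected UWF image
```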

2.3. Optimization Scheme

As shown in Fig. 4, the 3D-to-2D scene projection is first used to obtain a re-projected UWF image. Then, we use the 2D polynomial transformation matrix cached in Section 2.1 to warp the re-projected image and align it with the NA image. The warped UWF image is cropped to keep only the registration region (since the UWF image has a much larger FoV than the NA image), and we compute an MSE loss:

\mathcal{L}_{mse} = \operatorname{mean}_{i}\left\| I_{na} \cdot I_{msk} - \hat{I}_{uwf} \cdot I_{msk} \right\|_2^2    (3)

where i indexes the pixels in the images, I_msk is an image mask that defines the effective registration area, and \hat{I}_{uwf} is the corrected and warped UWF image. The loss is used to jointly optimize the 3D vertices and the NA camera parameters. The new vertices and NA camera parameters are then used by the 3D-to-2D projection in the next iteration. We repeat this process until the improvement in alignment quality over the previous iteration falls below a threshold.
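A minimal sketch of the masked MSE of Eq. (3), assuming the images are PyTorch tensors and the mask is binary; the mean is taken over all pixels, matching the formula as written, and the function name is hypothetical.

```python
import torch

def masked_mse(I_na, I_warp, I_msk):
    """Eq. (3): mean squared difference restricted to the registration region.
    I_na, I_warp: (H, W, C) NA image and corrected-and-warped UWF image;
    I_msk: (H, W) binary mask of the effective registration area."""
    m = I_msk[..., None]                          # broadcast the mask over channels
    return ((I_na * m - I_warp * m) ** 2).mean()  # mean over pixels, as in Eq. (3)
```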

Fig. 4. Proposed optimization scheme.

2.4. Optimization with Soft Constraint

We first adopt a “hard” constraint that only allows the vertices to move along the UWF projection ray direction n_u. The intuition is that if we re-project with the UWF camera, the re-projected image should be identical to the original image. In our experiments, we found it difficult to achieve good alignment results with the hard constraint. One possible explanation is that the projection ray is also subject to the distortion and refraction caused by the lens and the other eye tissues between the retina and the cornea. Therefore, we relax the hard constraint and allow the vertices to move in arbitrary directions. We also adopt a smoothing loss to suppress abrupt changes on the optimized mesh so that it better imitates an actual retina surface:

\mathcal{L}_{sm} = \operatorname{mean}_{i,j,k}\left(\Delta V_{i,j} - \Delta V_{i+1,j}\right)^2 + \operatorname{mean}_{i,j,k}\left(\Delta V_{i,j} - \Delta V_{i,j+1}\right)^2    (4)

where i, j are the vertex indices in the x and y directions, and k ∈ {1, 2, 3} indexes the three coordinates of each vertex. In addition, vertex movements whose directions deviate too much from the projection ray direction are also undesirable, since they cause the optimized mesh to lose physical meaning. Therefore, we introduce a direction loss to penalize the inconsistency between the moving direction Δv and the UWF projection ray direction n_u:

\mathcal{L}_{dir} = \operatorname{mean}_{\Delta v \in \Delta V}\left\| \Delta v - \left\langle \Delta v,\ \frac{n_u}{\|n_u\|} \right\rangle \frac{n_u}{\|n_u\|} \right\|_2^2    (5)

Finally, the total soft constraint loss is defined as:

\mathcal{L}_{soft} = \mathcal{L}_{mse} + \lambda_{sm}\,\mathcal{L}_{sm} + \lambda_{dir}\,\mathcal{L}_{dir}    (6)
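The three terms of the soft constraint, Eqs. (4)–(6), can be written compactly as below. The sketch assumes the ray directions n_u are stored per vertex with unit length; all function names are hypothetical.

```python
import torch

def smoothness_loss(dV):
    """Eq. (4): penalize differences between neighbouring vertex offsets.
    dV: (n, n, 3) offsets Delta V arranged on the mesh grid."""
    dx = (dV[1:, :, :] - dV[:-1, :, :]) ** 2       # neighbours along the first grid axis
    dy = (dV[:, 1:, :] - dV[:, :-1, :]) ** 2       # neighbours along the second grid axis
    return dx.mean() + dy.mean()

def direction_loss(dV, n_u):
    """Eq. (5): penalize the component of each offset orthogonal to the UWF ray.
    n_u: (n, n, 3) per-vertex projection ray directions, assumed unit length."""
    proj = (dV * n_u).sum(-1, keepdim=True) * n_u  # projection of Delta v onto the ray
    return ((dV - proj) ** 2).sum(-1).mean()

def soft_loss(mse, dV, n_u, lam_sm=1e4, lam_dir=1e3):
    """Eq. (6): total soft-constraint loss."""
    return mse + lam_sm * smoothness_loss(dV) + lam_dir * direction_loss(dV, n_u)
```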

2.5. Implementation Details

A 3D mesh of size [251, 251, 3] is set up with 2D UWF image width W = 2000 and sampling step s = 8. For optimization, we use the Adam optimizer with a learning rate of 1e−2 for the vertices and 1e−4 for the NA camera parameters. The weights of the loss terms in the soft constraint are λ_sm = 1e4 and λ_dir = 1e3. During optimization, we set a break threshold of 1e−3 on the Dice score improvement and a maximum of 100 iterations. For implementation details of the prior pipeline in Section 2.1, please refer to our previous publications [9, 10].
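Putting the pieces together, a sketch of the optimization loop under the hyper-parameters above might look as follows. project, warp_2d, vessel_map, and dice_score are placeholders for the components described in Sections 2.1–2.4 (masked_mse and soft_loss are the functions sketched after Eqs. (3) and (6)); only the learning rates, loss weights, break threshold, and iteration cap come from the paper.

```python
import torch

def optimize_mesh(I_uwf, I_na, I_msk, V, faces, n_u, M, x_na0, theta_na0, alpha0,
                  project, warp_2d, vessel_map, dice_score,
                  max_iter=100, break_thr=1e-3):
    """Joint optimization of the 3D mesh and NA camera parameters (illustrative sketch).
    project, warp_2d, vessel_map and dice_score are injected callables standing in for
    the components described in Sections 2.1-2.4."""
    dV = torch.zeros_like(V, requires_grad=True)              # vertex offsets Delta V
    x_na = x_na0.clone().requires_grad_(True)                 # NA camera position
    theta_na = theta_na0.clone().requires_grad_(True)         # NA camera orientation
    alpha = alpha0.clone().requires_grad_(True)               # NA camera intrinsic

    optim = torch.optim.Adam([{"params": [dV], "lr": 1e-2},
                              {"params": [x_na, theta_na, alpha], "lr": 1e-4}])
    prev_dice = 0.0
    for _ in range(max_iter):                                 # at most 100 iterations
        I_proj = project(I_uwf, V, dV, x_na, theta_na, alpha, faces)   # Eq. (2)
        I_warp = warp_2d(I_proj, M)                           # cached 2D polynomial warp
        loss = soft_loss(masked_mse(I_na, I_warp, I_msk), dV, n_u)     # Eqs. (3)-(6)

        optim.zero_grad()
        loss.backward()
        optim.step()

        with torch.no_grad():                                 # monitor alignment quality
            dice = dice_score(vessel_map(I_na), vessel_map(I_warp))
            if dice - prev_dice < break_thr:                  # break threshold on Dice gain
                break
            prev_dice = dice
    return (V + dV).detach(), x_na.detach(), theta_na.detach(), alpha.detach()
```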

3. EXPERIMENT

3.1. Dataset and Evaluation Method

We experiment on the UWc dataset of 505 pairs of UWF and NA images collected by the Jacobs Retina Center at UC San Diego. Each UWF image has size 2000 × 2000, while each NA Color Fundus image has size 768 × 768. We divide the dataset into two groups: training (of the prior pipeline) on group 1 and testing on group 2 is denoted UWc1-2, and the opposite is denoted UWc2-1. A Dice score is computed on the aligned vessel maps to numerically evaluate the final registration quality.
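For reference, a minimal Dice computation on a pair of aligned vessel maps (a sketch; binarizing at 0.5 is an assumed threshold):

```python
import torch

def dice_score(vessel_a, vessel_b, eps=1e-8):
    """Dice coefficient between two aligned vessel maps of shape (H, W)."""
    a = (vessel_a > 0.5).float()                  # binarize the vessel probability maps
    b = (vessel_b > 0.5).float()
    inter = (a * b).sum()
    return (2.0 * inter / (a.sum() + b.sum() + eps)).item()
```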

3.2. Results and Analysis

An example containing the peripheral area of a UWF image is shown in Fig. 5. The proposed 3D mesh optimization method with the soft constraint successfully eliminates the alignment errors (pink and mint circles) produced by the previous methods and achieves a large improvement in the Dice score (Table 1).

Fig. 5. First row: checkerboard view; second row: vessel overlay. (i) 2D-RP+DC-Sphere, Dice = 0.5485; (ii) 3D Mesh Optim-hard, Dice = 0.5514; (iii) 3D Mesh Optim-soft-SS=8, Dice = 0.7525; (iv) 3D Mesh Optim-soft-SS=2, Dice = 0.7859.

Table 1.

Average Dice values (standard deviations in parentheses). 2D-RP and DC refer to the prior 2D Registration Pipeline and 3D Distortion Correction algorithm described in Section 2.1.

Methods for UWF Registration UWc1-2(255) UWc2-1(250)
Before Alignment 0.1649 0.1611
L.Ding et al. [2] 0.4162 0.3503
2D-RP [7] 0.4882 0.4380
2D-RP+DC-Sphere [10] 0.5674(0.1062) 0.5176(0.1298)
2D-RP+DC-Spheroid 0.5692(0.0941) 0.5267(0.1153)
3D Mesh Optim - hard 0.5840(0.1077) 0.5345(0.1328)
3D Mesh Optim - soft 0.7148(0.1204) 0.6612(0.1563)

3.3. Ablation Study on Mesh Size

We also find that increasing the resolution of the 3D mesh further improves the alignment quality (Table 2), at the cost of more processing time. In our experiments, a sampling step of 8 (a 251 × 251 mesh) achieves the best balance between registration quality and time cost, as further increasing the mesh resolution does not produce significant visual benefits.

Table 2.

Comparison between different 3D mesh sizes; SS refers to the sampling step s defined in Section 2.2.

SS UWc1-2(255) UWc2-1(250)
16 0.6796(0.1232) 0.6231(0.1560)
8 0.7148(0.1204) 0.6612(0.1563)
4 0.7245(0.1209) 0.6718(0.1577)
2 0.7371(0.1213) 0.6843(0.1584)

4. CONCLUSION

We propose a novel 3D distortion correction method that adopts a 3D mesh to represent the retina surface and jointly optimizes the camera pose with the 3D mesh to accurately correct the distortion in ultra-wide-field retinal images. Experiments show that the corrected ultra-wide-field image can be aligned with NA retinal images with very high precision, and the alignment error is barely observable to the human eye. The registration quality substantially outperforms the state-of-the-art method.

REFERENCES

[1] Arikan M, Sadeghipour A, Gerendas B, Told R, and Schmidt-Erfurth U, "Deep learning based multi-modal registration for retinal imaging," in Interpretability of Machine Intelligence in Medical Image Computing and Multimodal Learning for Clinical Decision Support, 2019, pp. 75–82.
[2] Ding L, Kuriyan A, Ramchandran R, Wykoff C, and Sharma G, "Weakly supervised vessel detection in ultra-widefield fundus photography via iterative multi-modal registration and learning," IEEE Transactions on Medical Imaging, vol. 40, no. 10, pp. 2748–2758, 2020.
[3] Mahapatra D, Antony B, Sedai S, and Garnavi R, "Deformable medical image registration using generative adversarial networks," in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI), 2018, pp. 1449–1453.
[4] Tian Y, Hu Y, Ma Y, Hao H, Mou L, Yang J, Zhao Y, and Liu J, "Multi-scale U-Net with edge guidance for multimodal retinal image deformable registration," in 2020 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2020, pp. 1360–1363.
[5] Wang Y, Zhang J, An C, Cavichini M, Jhingan M, Amador-Patarroyo M, Long C, Bartsch D, Freeman W, and Nguyen T, "A segmentation based robust deep learning framework for multi-modal retinal image registration," in 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 1369–1373.
[6] Zhang J, An C, Dai J, Amador M, Bartsch D, Borooah S, Freeman W, and Nguyen T, "Joint vessel segmentation and deformable registration on multi-modal retinal images based on style transfer," in 2019 IEEE International Conference on Image Processing (ICIP), 2019, pp. 839–843.
[7] Zhang J, Wang Y, Dai J, Cavichini M, Bartsch D-UG, Freeman WR, Nguyen TQ, and An C, "Two-step registration on multi-modal retinal images via deep neural networks," IEEE Transactions on Image Processing, vol. 31, pp. 823–838, 2021.
[8] Nicholson L, Vazquez-Alfageme C, Clemo M, Luo Y, Hykin P, Bainbridge J, and Sivaprasad S, "Quantifying retinal area in ultra-widefield imaging using a 3-dimensional printed eye model," Ophthalmology Retina, vol. 2, no. 1, pp. 65–71, 2018.
[9] Zhang J, Wang Y, Bartsch D, Freeman W, Nguyen T, and An C, "Perspective distortion correction for multi-modal registration between ultra-widefield and narrow-angle retinal images," in 2021 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2021, pp. 4086–4091.
[10] Zhang J, Wang Y, Kalaw FGP, Cavichini M, Bartsch DG, Freeman W, Nguyen T, and An C, "Multimodal global registration between ultra-widefield and narrow angle retinal images via distortion correction network," in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (submitted), 2023.
[11] Verkicharla P, Mathur A, Mallen E, Pope J, and Atchison D, "Eye shape and retinal shape, and their relation to peripheral refraction," Ophthalmic and Physiological Optics, vol. 32, no. 3, pp. 184–199, 2012.
[12] Cansizoglu E, Taguchi Y, Cramer J, Chiang MF, and Erdogmus D, "Analysis of shape assumptions in 3D reconstruction of retina from multiple fundus images," in 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), 2015, pp. 1502–1505.
[13] Hernandez-Matas C, Zabulis X, Triantafyllou A, Anyfanti P, and Argyros A, "Retinal image registration under the assumption of a spherical eye," Computerized Medical Imaging and Graphics, vol. 55, pp. 95–105, 2017.
[14] Hernandez-Matas C, Zabulis X, and Argyros A, "REMPE: Registration of retinal images through eye modelling and pose estimation," IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 12, pp. 3362–3373, 2020.
[15] Chanwimaluang T and Fan G, "Constrained optimization for retinal curvature estimation using an affine camera," in 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007, pp. 1–8.
[16] DeTone D, Malisiewicz T, and Rabinovich A, "SuperPoint: Self-supervised interest point detection and description," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 224–236.
[17] Yi K, Trulls E, Ono Y, Lepetit V, Salzmann M, and Fua P, "Learning to find good correspondences," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 2666–2674.
[18] Jaderberg M, Simonyan K, and Zisserman A, "Spatial transformer networks," in Advances in Neural Information Processing Systems (NIPS), 2015, pp. 2017–2025.
