1. Introduction
Quickly and accurately acquiring a 3D point cloud of an object's surface is important in numerous fields, such as quality control, robotic assembly, medical treatment, virtual reality, and reverse engineering [1,2,3]. With the advantages of being non-contact, high speed, and high accuracy, fringe projection profilometry (FPP) has become one of the most promising 3D imaging techniques. In a conventional FPP system, a set of 8-bit sinusoidal patterns is projected onto the object surface. As an 8-bit gray pattern has a limited projection rate (within 120 Hz), the measurement speed of the FPP system is thus restricted [4].
By applying 1-bit binary patterns, which have much higher projection rates (up to 20 kHz), the binary defocusing technique can greatly improve 3D imaging speed [5,6,7]. The squared binary defocusing method (SBM) is the simplest binarization strategy; it utilizes binary patterns with the shape of a square wave to create sinusoidal fringes [8]. Conventional binary defocusing techniques require a proper defocusing degree to achieve ideal sinusoidal fringes; otherwise, significant measurement errors may arise. They are thus sensitive to the defocusing degree and have a small depth of field (DoF). Various advanced binarization strategies have been proposed to enhance the DoF, such as sinusoidal pulse width modulation (SPWM) [9], optimal pulse width modulation (OPWM) [10], and the dithering method [11]. As these methods still require a proper defocusing degree for generating sinusoidal fringes, their enhancements of the DoF are rather limited.
Many methods have been introduced to minimize the measurement errors caused by improper defocusing degrees. After projecting the 8-bit gray patterns and binary patterns onto a white board, respectively, Xu et al. obtained the phase error distribution over a large depth range and then built a mathematical model to eliminate the phase error at arbitrary depths [12]. Hu et al. utilized depth-discrete Fourier series fitting to reduce the complexity of the phase error model [13]. In Zhu's model, more influencing factors were taken into account (including defocusing level, intensity noise, and fringe frequency), and the optimal fringe frequency of the binary error-diffusion fringe pattern can be selected [5]. Yu et al. achieved accurate 3D reconstruction over a large DoF by directly transforming the captured patterns into the desired phase with deep learning models [14]. Although these methods may work well for error compensation, collecting accurate phase errors over a large DoF is a tedious process, and expensive equipment is also required.
In this paper, a time-domain Gaussian fitting method is proposed to suppress the sensitivity to the defocusing degree. Different from the phase-shifting algorithm, projector coordinates are obtained by projecting Gaussian fringes and determining the peak positions of time-domain Gaussian curves. The neural network technique is applied to rapidly compute these peak positions. Finally, by generating Gaussian fringes with defocused binary patterns, the time-domain Gaussian fitting method can be combined with the binary defocusing technique. A high projection rate can then be applied in FPP with much lower sensitivity to the defocusing degree, which helps to achieve fast three-dimensional profilometry with a large DoF.
The centerline extraction technique is also adopted in line structured light [15] and multi-line shift profilometry [16], and many algorithms have been proposed, such as the Steger algorithm [17] and the skeleton extraction method [18]. However, these algorithms work with the spatial distribution of Gaussian fringes, which is modulated by the object's surface and may deform accordingly. This causes difficulty in obtaining accurate measurements of complex surfaces. Comparatively, as the time-domain Gaussian curve is extracted from an individual image pixel, its shape will not deform with a complex surface. This is beneficial for acquiring accurate measuring results of complex surfaces.
Compared with the traditional strategy, which imitates sinusoidal stripes with a proper defocusing degree, Gaussian stripes can be easily generated with a simple binary pattern. Different from extracting phase information with sinusoidal stripes, the peak positions of Gaussian stripes are the key information for 3D scanning. Although a varied defocusing degree may change the blur radius of the Gaussian stripes, their peak positions remain fixed. The varied defocusing degree thus has much less impact on the proposed method. Although the neural network technique can be used to reduce the computing time, the process of extracting peak positions of time-domain Gaussian curves is indeed more complex than that of calculating the phase with sinusoidal stripes.
The rest of this paper is organized as follows. The principle of the time-domain Gaussian fitting method is explained in Section 2. The neural network-based rapid calculation approach is described in detail in Section 3. Sensitivities to the defocusing degree and to complex surfaces are analyzed in Section 4. The performance of the proposed method is verified in Section 5, and its characteristics are summarized in Section 6.
3. Rapid Calculation Method
Since the calculation process of the Levenberg–Marquardt algorithm involves iterative optimization, it yields accurate peak positions but suffers from low computational efficiency. To address this issue, a neural network-based approach is proposed to rapidly extract peak positions of time-domain Gaussian curves. The basic principle of this approach is shown in Figure 2.
The proposed neural network consists of an input layer, a hidden layer, and an output layer. The intensity sequence Ii(x, y) (i = 1, 2, ···, n) is taken as the input of the neural network. The number of neurons in the input layer is n, and the output of this layer is (α1, α2, ···, αn). The hidden layer contains q neurons and yields the result (β1, β2, ···, βq). The output layer finally exports the peak position vp of the time-domain Gaussian curve. The weight matrix from the input layer to the hidden layer is Wh, and Wo represents the weight matrix from the hidden layer to the output layer.
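For concreteness, the forward pass of this network can be sketched as follows (a minimal NumPy sketch; the bias terms and the scaling of the output are our assumptions, as the text specifies only the layer sizes, the weight matrices Wh and Wo, and the Tansig activation):

```python
import numpy as np

def tansig(z):
    # Tansig activation (hyperbolic tangent sigmoid)
    return np.tanh(z)

def forward(intensity, Wh, bh, Wo, bo):
    """Map an n-sample intensity sequence I_i(x, y) to a (normalized)
    peak position of the time-domain Gaussian curve.

    Wh : (q, n) weights, input layer -> hidden layer
    Wo : (1, q) weights, hidden layer -> output layer
    bh, bo : bias vectors (assumed; not specified in the text)
    """
    alpha = np.asarray(intensity, dtype=float)  # (alpha_1, ..., alpha_n)
    beta = tansig(Wh @ alpha + bh)              # (beta_1, ..., beta_q)
    # Tansig at the output bounds the result to (-1, 1); the peak
    # position is assumed to be scaled into this range during training.
    return tansig(Wo @ beta + bo)[0]
```

With n = 4 and q = 6 (the sizes used later in the experiments), a single evaluation involves only two small matrix products, which is why the network is so much faster than iterative fitting.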
Actually, most time-domain Gaussian curves are the sampling results of two adjacent Gaussian fringes. This causes a cyclic shift in the time-domain Gaussian curves, as shown in Figure 3. For this reason, while the time-domain Gaussian curve shifts continuously, the values of the peak positions have mutations in the edge region. This discontinuous correspondence would make it difficult to compute accurate peak positions with the neural network.
Therefore, before being taken as the input data of the neural network, the time-domain Gaussian curve should be preprocessed with an additional circular shift (see Figure 3). The shifting distance ds(x, y) can be approximately estimated by subtracting the position of the maximum value of the time-domain Gaussian curve, vmax(x, y), from the middle position vmid:
ds(x, y) = vmid − vmax(x, y).
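This preprocessing step amounts to a circular shift by ds(x, y) samples; a minimal NumPy sketch (function and variable names are ours):

```python
import numpy as np

def center_curve(curve):
    """Circularly shift a time-domain Gaussian curve so its peak
    lies near the middle sample.

    Returns the shifted curve and the shifting distance
    d_s = v_mid - v_max (middle position minus argmax position).
    """
    curve = np.asarray(curve, dtype=float)
    v_mid = curve.size // 2            # middle position of the sequence
    v_max = int(np.argmax(curve))      # position of the maximum sample
    d_s = v_mid - v_max                # estimated shifting distance
    return np.roll(curve, d_s), d_s
```

For example, a curve sampled as [0.9, 0.1, 0.2, 0.8] (peak wrapped around the edge) becomes [0.2, 0.8, 0.9, 0.1] with d_s = 2, moving the maximum to the middle; the network's output is later converted back by adding d_s.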
With the additional circular shift, the peak position of the time-domain Gaussian curve is moved to the middle area (see Figure 3), so the discontinuous correspondence in the edge region can be avoided. The practical process of computing peak positions with the neural network is shown in Figure 4. Since the neural network merely yields the peak position v′p of the circularly shifted Gaussian curve, the actual peak position vp can be obtained by adding back the shifting distance: vp(x, y) = v′p(x, y) + ds(x, y).
In order to determine the parameters of the neural network, the training data can be obtained using the Levenberg–Marquardt algorithm. When applying this algorithm, the initial values may significantly influence the computing efficiency. It is recommended that the minimum value vmin, the maximum value vmax, and the middle position vmid of the circularly shifted time-domain Gaussian curve be applied as the initial values of λ, η, and vp in Equation (1).
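This initialization can be sketched as follows, assuming Equation (1) has the common offset-plus-Gaussian form I(v) = λ + η·exp(−(v − vp)²/(2σ²)); the width parameter σ and its starting guess are our additions, since the text lists initial values only for λ, η, and vp:

```python
import numpy as np
from scipy.optimize import curve_fit  # method="lm" is Levenberg-Marquardt

def gauss(v, lam, eta, v_p, sigma):
    # Assumed form of Equation (1): offset lam, amplitude eta,
    # peak position v_p, blur radius sigma
    return lam + eta * np.exp(-(v - v_p) ** 2 / (2.0 * sigma ** 2))

def fit_peak(curve):
    """Fit a circularly shifted time-domain Gaussian curve and
    return the peak position v_p."""
    curve = np.asarray(curve, dtype=float)
    v = np.arange(curve.size, dtype=float)
    # Recommended initial values: lam0 = v_min, eta0 = v_max, v_p0 = v_mid
    p0 = [curve.min(), curve.max(), curve.size / 2.0, 1.0]
    popt, _ = curve_fit(gauss, v, curve, p0=p0, method="lm")
    return popt[2]
```

On noise-free samples this recovers the peak position to high accuracy; the iterative optimization is what makes it slow compared with the neural network.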
5. Experiments
Experiments have been carried out to verify the performance of the proposed method. A homemade FPP system, which consists of a DLP projector (LightCrafter 4500, Wintech, Beijing, China) and a CCD camera (MER-050-560U3M, Daheng, Beijing, China) with an 8 mm lens (Computar M0814-MP2, CBC Corporation, Tokyo, Japan), is applied to implement the experiments. The captured images are processed using MATLAB (2012a). Two plaster statues (with a height of about 150 mm) and several planar targets are taken as the experimental subjects. The complementary gray-code unwrapping method [20] is applied to achieve the absolute projector column coordinates, and the calibrated third-order polynomial model [21] is then used to convert the absolute projector column coordinates into height values.
In the first experiment, the performance of the proposed method is tested with the minimum number of shifting steps (n = 4) and the minimum distance of a shifting step (one column in the projector plane, dv = 1). The distance between two adjacent lines is four columns in the projector plane. The projector coordinates are computed with the Levenberg–Marquardt algorithm and Equation (1). During the experiment, four multi-line patterns are sequentially projected onto a plaster statue, and fringe images are captured simultaneously (see Figure 8a–d). It can be seen from the 3D reconstruction result (Figure 8e) that the proposed method can acquire a dense and smooth point cloud of a complex surface, which proves that this method is suitable for measuring complex surfaces.
Although accurate projector coordinates can be achieved using the Levenberg–Marquardt algorithm, its calculation efficiency is low. In this experiment, 587 s are required to compute the projector coordinates. Such low efficiency may prevent timely use of rapidly acquired 3D point cloud data.
By contrast, a neural network can rapidly yield projector coordinates. In our work, the numbers of neurons in the input layer and hidden layer are four (n = 4) and six (q = 6), respectively. The Tansig activation function is applied in the input layer, hidden layer, and output layer. The plaster statues are placed at different depths and are sequentially illuminated by four multi-line patterns with a larger distance of a shifting step (two columns in the projector plane, dv = 2). The distance between two adjacent lines is eight columns in the projector plane. The input part (time-domain Gaussian curves) of the training data is extracted from the simultaneously captured fringe images (see Figure 9a). The results of the Levenberg–Marquardt algorithm, computed with eight multi-line patterns (eight shifting steps, with a shifting-step distance of one column in the projector plane), are set as the output part of the training data, as shown in Figure 9b.
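To make the training procedure concrete, the following sketch trains a 4–6–1 Tansig network on synthetic curve/peak pairs (plain full-batch gradient descent on a mean-squared-error loss; the real training data, the optimizer, and the target scaling are not specified in the text, so these choices are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 4, 6  # input and hidden layer sizes used in the experiments

# Synthetic training pairs: 4-sample Gaussian curves -> peak positions
# (stand-in for captured curves labeled by Levenberg-Marquardt results).
m = 200
v = np.arange(n, dtype=float)
v_p = rng.uniform(1.0, 2.0, size=m)              # peaks near the middle (after circular shift)
X = np.exp(-(v[None, :] - v_p[:, None]) ** 2 / (2 * 0.8 ** 2))
y = v_p / (n - 1) * 2.0 - 1.0                    # scale targets into (-1, 1) for the Tansig output

# Initialize a 4-6-1 network with small random weights.
Wh = rng.normal(scale=0.5, size=(q, n)); bh = np.zeros(q)
Wo = rng.normal(scale=0.5, size=(1, q)); bo = np.zeros(1)

for _ in range(20000):
    beta = np.tanh(X @ Wh.T + bh)                # hidden layer, (m, q)
    out = np.tanh(beta @ Wo.T + bo)[:, 0]        # Tansig output layer, (m,)
    d_z = (out - y) * (1.0 - out ** 2) / m       # dL/d(pre-activation), MSE loss
    d_beta = (d_z[:, None] @ Wo) * (1.0 - beta ** 2)
    Wo -= 0.1 * d_z[None, :] @ beta
    bo -= 0.1 * d_z.sum(keepdims=True)
    Wh -= 0.1 * d_beta.T @ X
    bh -= 0.1 * d_beta.sum(axis=0)

# Recover peak positions in projector columns from the normalized output.
pred = (np.tanh(np.tanh(X @ Wh.T + bh) @ Wo.T + bo)[:, 0] + 1.0) / 2.0 * (n - 1)
```

In practice, the inputs would be the circularly shifted measured sequences and the targets the Levenberg–Marquardt peak positions; once trained, each evaluation costs only two small matrix products, which is what enables the millisecond-level computing times reported in this experiment.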
Figure 9d,e,g demonstrate that, when the training data are preprocessed without the circular shift, the trained neural network tends to smooth the mutation of peak positions in the edge region and thus yields inaccurate results. Comparatively, these inaccurate peak positions in the edge region can be effectively avoided by adding the circular shift to the preprocessing procedure (see Figure 9c).
As shown in Figure 9h, when the step distance becomes larger (two columns in the projector plane), a periodic error can also be found in the result of the Levenberg–Marquardt algorithm (with four shifting steps). In comparison, the periodic error is greatly reduced in the result of the trained neural network (Figure 9f). Most importantly, by using the neural network technique, the computing time is decreased significantly (from 587 s to 11 ms), which may meet the requirements of real-time measurement or detection.
Finally, the sensitivity to the defocusing degree is tested with several planar targets evenly placed from 0 mm to 750 mm (at intervals of about 150 mm), as shown in Figure 10. For comparison, sinusoidal patterns and imitated sinusoidal patterns, generated using the SBM technique and the dithering technique, respectively, are applied in this experiment. The same number of shifting steps (four) is applied, and the same fringe interval (eight columns in the projector plane) is used to generate the sinusoidal pattern, the imitated sinusoidal pattern (SBM), and the multi-line pattern of the proposed method. A bigger fringe interval (16 columns in the projector plane) is used in the dithering technique. Due to the large depth range between the planar targets (750 mm), the blur radius of the Gaussian fringes also varies remarkably, from σ1 = 1.61 to σ6 = 1.02 (see Figure 10d).
The 3D reconstruction result of the phase-shifting algorithm with 16 shifting steps is taken as the reference to calculate the 3D reconstruction errors of the different methods. The mean absolute errors (the average absolute values of the 3D reconstruction errors) are computed for comparison. With the varied defocusing degrees, the 3D reconstruction error of the phase-shifting algorithm with a sinusoidal pattern stays at a low level (Figure 11a,e). In comparison, the periodic errors in the reconstructed results of the SBM technique and the dithering technique increase rapidly (Figure 11b,c,e). It should be noted that the much larger error of the dithering technique simply means that a greater defocusing degree is required by this technique. Comparatively, the proposed method shows much lower sensitivity to the varied defocusing degree (as shown in Figure 11d,e), and the periodic error can be suppressed without sacrificing acquisition speed. It is obvious that, with the same number of shifting steps, the proposed method has a much larger DoF (about 750 mm) than the SBM technique (300 mm). The mean absolute errors are summarized in Table 1.
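The comparison metric itself is straightforward; a minimal sketch (function name is ours):

```python
import numpy as np

def mean_absolute_error(result, reference):
    """Average absolute value of the 3D reconstruction errors,
    computed against the 16-step phase-shifting reference."""
    result = np.asarray(result, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(result - reference)))
```

For example, mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]) evaluates to 0.5.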
In this experiment, thirteen projector patterns (four multi-line patterns and nine gray-code patterns) are projected to achieve a 3D reconstruction result with the proposed method. The corresponding images are captured at a framerate of 400 Hz, so the acquisition time of the 3D scan is 32.5 ms. By calculating the projector coordinates with the neural network model and converting them into 3D coordinates with the polynomial reconstruction model, the computing time of the proposed method is squeezed into 35 ms (including 9 ms for computing projector coordinates, 11 ms for coordinate unwrapping, and 15 ms for calculating 3D coordinates). This is much shorter than that of the 3D scanning technique using the phase-shifting algorithm (358 ms).
With respect to the measurement accuracy of the 3D scanning technique, the reflectivity of the object surface is also an important influencing factor. In fact, non-uniform reflectivity will lead to obvious errors in the computed peak positions. Its generation mechanism and a compensation method need to be further studied.