1. Introduction and Motivation
The ageing infrastructure of sea and inland ports requires new technologies and methods in the preparation and implementation of life cycle management processes. The traditional processes are usually time- and labour-intensive, and should be replaced by new automated, smart and innovative measurement and analysis processes to ensure transparency, resource efficiency and reliability for a more dependable lifetime prediction.
Port infrastructure, such as quay walls for loading and unloading ships, bridges, locks and flood gates, is mostly made of concrete, bricks, steel and, in the case of very old structures, wood. These structures are subject to severe degradation due to harsh environmental conditions and human activities throughout their lifetime. Seaport structures in particular are profoundly affected by saltwater, which damages concrete structures, sheet pile walls and wooden structures. It is crucial to detect any damage and categorize its importance to ensure the safety and stability of the infrastructure. Identifying structural damage in time allows early maintenance and avoids expensive repairs and the collapse of the infrastructure.
Nowadays, the monitoring of port infrastructural buildings is divided into the parts above and below water. The structural testing of port infrastructure above water is carried out by manual and visual inspections. The recording and documentation of the condition of damage below water involve considerably more effort; the infrastructure is tested sample-wise every 50 to 100 m; the divers slide down the structure and try to sense the wall with their hands. The results depend directly on human sensory tests. Therefore, damage inspections below water with divers are highly variable in quality and quantity. Damage classification and the assessment of damage development are not reproducible due to the subjective perception. In addition, there is usually no comprehensive inspection below water; thus, only a small percentage of the structure can be inspected by divers. One way to deal with this problem is by utilising sensors that detect the shape of the object. Such sensors provide point clouds and include laser scanners for surfaces above water and echo-sounders for those below water. The focus in this paper is on the general process of damage detection in point clouds. We use two datasets to validate the overall procedure. The first is a simulated dataset of a sheet pile wall below water. The second is a real dataset of a concrete quay wall from the northern German city harbour of Lübeck, measured with a laser scanner above water. Due to the point spacing of 20 mm, it is not possible to detect small damages, such as cracks, especially in the area below water. Therefore, the main focus is on the detection of spalling damages larger than 20 mm.
It is essential when monitoring harbour structures to assure a transparent, efficient and quality-controlled process. This can be achieved by a comprehensive visual inspection at short time intervals during the whole life cycle of the structure. However, a quality-controlled visual inspection is nearly impossible in regions such as the Ems, Weser and Elbe due to the high level of sedimentation. In this research, a fully automated, quality-controlled and reproducible three-dimensional (3D) sensing and damage detection of port infrastructures, above and below water, is proposed. Based on the results obtained, the port operator has more reliable information to efficiently plan maintenance and construction work. This approach will reduce expenses significantly through lower downtimes of the port facilities and well-planned construction work. In modern data processing, damage detection is usually based on pattern recognition methods (see [
1] for more information). This is a reliable approach to detect damage and make a well-founded assessment of the current state of the structure. Both accurate and high-resolution 3D data of the above- and below-water parts of the building are required for the acquisition of the building geometry and condition.
Various publications deal with comprehensive sensing methods for the structural health monitoring of concrete or other materials above water or in clear offshore regions. A static underwater multibeam scanner for 3D reconstruction of subaquatic environments was introduced by [
2]. Robert et al. [
3] used a multibeam echo-sounder and underwater stereo cameras to create a 3D point cloud of vertical marine structures. Hadavandsiri et al. [
4] introduced a new approach for the automatic, preliminary detection of damage in concrete structures with terrestrial ground scanners and a systematic threshold. An automatic classification for underwater geomorphological bed forms was presented by Ref. [
5], which achieved an overall accuracy of 94%. A long-term monitoring approach for zigzag-shaped retaining structures is proposed by Ref. [
6]. Aldosari et al. [
7] used an ultra-high-accuracy wheel-based mobile LiDAR mapping system for monitoring mechanically stabilized earth walls. O’Byrne et al. [
8] detected disturbances by the texture segmentation of colour images. Gatys et al. [
9] showed that neural networks trained on natural images learn to represent textures in such a way that they can synthesize realistic textures and even entire scenes. Neural networks, as feature extractors, are, thus, preferred over hand-crafted features [
10,
11,
12]. A novel sensor data-driven fault diagnosis method is proposed based on convolutional neural networks (CNNs) by [
13]. However, the limitation of such a transfer of features remains an open research question, especially when the input domain has the same topological structure but different statistical behaviour. The detection of non-normal instances within datasets is often called anomaly detection.
The definition of outliers for outlier detection, first given by [14], varies widely nowadays. Anomalies are no longer understood just as incorrect readings, but are often associated with particularly interesting events or suspicious datasets. The original definition was, therefore, extended by [15].
Two widely used methods in anomaly detection are transfer learning and local outlier factors (LOF). Transfer learning adopts pretrained neural networks based on a different domain [
16]. This results in advantages such as faster creation, better model quality, and less use of resources (training data). Breunig et al. [
17] describe a method called LOF, which judges a sub-element on how isolated it is regarding the local neighbourhood.
Nowadays, anomaly detection algorithms are often used in many application domains. García-Teodoro et al. [
18] describe a method using anomaly detection algorithms to identify network-based intrusion. In this context, anomaly detection is also often called behavioural analysis, and these systems typically use simple but fast algorithms. Other possible scenarios are fraud detection [
19], medical applications (such as patient monitoring during electrocardiography [
20]), data leakage prevention [
21] and other more specialised applications, such as movement detection in surveillance cameras [
22] or document authentication in forensic applications [
23].
In this work, we aim to detect structural damages in infrastructures based on point clouds. We use anomaly detection algorithms due to the large imbalance between damaged and undamaged areas and the small amount of training data for the damaged areas. The novelty detection approach we use can distinguish defective from non-defective features in a simulated data environment. The procedure of transferring features from natural images to point clouds and then performing novelty detection is completely new in the context of structural health monitoring systems. It is now possible for the first time to detect damages in an automated manner. This opens the door for further research into the use of pretrained neural networks for range sensor data. The approach developed is, thus, applicable in all areas of damage detection for infrastructure objects.
2. Methodology
We first need to preprocess the data, because unstructured, large 3D point clouds are unsuitable for most anomaly detection algorithms. Therefore, we transfer features learnt from natural images to height maps from a range sensor. A height map or height field (also called a digital elevation model (DEM) [
24]) in computer graphics is a raster image that is mainly used as a discrete global grid in secondary height modelling. Each pixel records values, such as surface elevation data. In contrast to natural images, the characteristics of height maps depend on the scan resolution and on the scanned object itself, which makes transferability difficult. A way to overcome this drawback is to train neural networks on height maps from scratch [
25].
In our system, three different sensor types are merged into one kinematic multi-sensor system (k-MSS) for the mapping task: a high-resolution hydro-acoustic underwater multibeam echo-sounder, an above-water profile laser scanner and five high dynamic range cameras. In addition to the IMU-GNSS-based georeferencing method known from various applications, hybrid referencing with automatically tracking total stations is used for positioning (
Figure 1). Although the individual sensors record in a grid pattern, the resulting point cloud is not grid-shaped due to the movements of the carrier platform.
The choice of damage types depends on the application and the relevant task within the life cycle management. In this study, we focus on geometrical damages and, for the time being, we only use point cloud data and no images from the cameras. The point clouds should have the smallest possible distance between the points, but still large enough so that there is no correlation between the points due to overlapping laser footprints. The head of the laser scanner rotates at 100 Hz, the platform moves as slowly as possible, and the position and orientation are obtained from the GNSS/IMU system. Furthermore, mapping and data collection are not addressed in the following, since the research contribution lies in the damage detection area; the focus is on damage detection (see [
1] for more details on mapping and data collection).
The method starts with a point cloud of typical structures (see
Section 3 and
Section 5 for details). Firstly, we transform the point cloud into a height field, which is described in
Section 2.1. Secondly, in
Section 2.2, we extract features with a CNN. The third step is the defect detection using two different approaches: transfer learning and LOF (
Section 2.3).
Both methods yield outlier scores, which can be thresholded to achieve a binary classification. In contrast to other common outlier detection methods, these do not make any assumptions about the distribution of the outliers. They are, thus, well-suited for port infrastructural monitoring where each damage is expected to be unique.
2.1. Height Field Generation
Input variables for the machine learning approach are equally sized and rasterised distances between the point cloud and the original damage-free structure. In an optimal scenario, one can use a computer-aided design (CAD) or building information model (BIM) and determine deviations between the model and the point cloud. Unfortunately, no models are available for most existing infrastructural objects. There are two possibilities to overcome this challenge: the manual or (semi-)automatic generation of a CAD model or the use of an approximated local surface, for example, using a moving-window approach (e.g., [
26]).
In the case of the simulated dataset, we use a mathematical model of a sheet-pile wall to create the simulated dataset and the corresponding CAD model. The distances from each point to the corresponding plane in the model are determined according to [27] with Equation (1) and are then rasterised into a two-dimensional (2D) height field with an equal 2 cm raster size,

$$D = n_x x + n_y y + n_z z - d, \tag{1}$$

where $\mathbf{n} = (n_x, n_y, n_z)^\top$ is the normal vector of the plane with the entries $n_x$, $n_y$ and $n_z$, $d$ is the distance of the plane to the origin, and $x$, $y$ and $z$ are the co-ordinates of the point.
There is no existing CAD model of the quay wall for the real dataset, therefore, we had to create the model ourselves. For this purpose, regular shapes are fitted into the point cloud and the distance from the points to the geometry is determined. A simple plane according to [
27] is used as reference geometry in this work. Firstly, the point clouds are rotated into a consistent orientation using principal component analysis. Regular square sections are then cut from the point cloud. These sections overlap by 50% each in the X and Y directions. Cutting into smaller sections makes it possible to fit the geometries well to the point cloud. After cutting, a plane is estimated in each of the sections. We only used the points of the quay wall and the damaged areas for the plane estimation. The distance to the plane is set manually to small values for points that are located on additional objects, such as ladders or fenders. This makes deviations due to damage more clearly visible in the grey value differences.
The raster size depends on the resolution of the point cloud and must be adapted to the respective dataset. Empty cells, which occur due to data gaps or inappropriate point distribution, are interpolated according to [28] with natural neighbour interpolation to avoid interference in the feature extraction step (cf. Equation (2)),

$$G(x) = \sum_{i=1}^{n} \frac{A(x_i)}{A(x)} \, f(x_i), \tag{2}$$

where $G(x)$ is the estimate at $x$, $f(x_i)$ the known data at $x_i$, $A(x)$ is the volume of the new cell centred in $x$, and $A(x_i)$ is the volume of the intersection between the new cell centred in $x$ and the old cell centred in $x_i$.
The median value of the distances is used in overpopulated cells, which occur due to inappropriate point distribution. The whole process is implemented in MATLAB and Python and summarised in a flowchart form in
Figure 2.
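The rasterisation and gap-filling step can be sketched in Python as follows. The array names and the use of SciPy's linear griddata interpolation as a stand-in for the natural neighbour interpolation of Equation (2) are assumptions for illustration only:

```python
# Sketch of the rasterisation: `u`, `v` are in-plane co-ordinates of the points,
# `dist` the plane distances from Equation (1). Overpopulated cells take the
# median distance; empty cells are filled by interpolation from their
# neighbours (linear here, natural neighbour in the actual processing chain).
import numpy as np
from scipy.interpolate import griddata

def rasterise(u, v, dist, cell=0.02):
    iu = np.floor((u - u.min()) / cell).astype(int)
    iv = np.floor((v - v.min()) / cell).astype(int)
    grid = np.full((iu.max() + 1, iv.max() + 1), np.nan)
    for key in set(zip(iu, iv)):                      # median per (over)populated cell
        mask = (iu == key[0]) & (iv == key[1])
        grid[key] = np.median(dist[mask])
    filled = np.argwhere(~np.isnan(grid))
    empty = np.argwhere(np.isnan(grid))
    if len(empty):                                    # interpolate the empty cells
        grid[tuple(empty.T)] = griddata(filled, grid[tuple(filled.T)], empty, method='linear')
    return grid
```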
The height field of the infrastructure object obtained is interpreted as a scalar function defined on a 2D grid, denoted by $F$. Afterwards, patches are extracted from the grid and rearranged into data vectors. The latter are organised as a matrix $X$ with shape $N \times p$, where $N$ is the number of patches and $p$ the number of pixels per patch.
Figure 3 shows an example of such a height field in grey scale, where a lighter grey value represents a greater deviation from the nominal CAD model.
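A short sketch of the patch extraction (the patch size of 64 pixels is an arbitrary example; the 50% overlap corresponds to the criterion used later in Section 2.2):

```python
# Extract square patches from the height field F with 50% overlap and stack
# them as rows of the data matrix X with shape (N, p); a sketch only.
import numpy as np

def extract_patches(F, size=64, overlap=0.5):
    step = int(size * (1 - overlap))
    patches = [F[r:r + size, c:c + size].ravel()
               for r in range(0, F.shape[0] - size + 1, step)
               for c in range(0, F.shape[1] - size + 1, step)]
    return np.stack(patches)  # N patches of p = size * size pixels each
```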
2.2. Feature Extraction
A deep learning network requires a large amount of high-quality annotated data. However, as damage to port structures is relatively rare, it takes a very long time and a lot of measurements until a sufficient amount of annotated data is available. To overcome this problem, we chose a truncated version of the VGG19 network as the basic backbone for feature extraction and transferred its pretrained parameters on ImageNet to our dataset of port structures. The VGG19 neural network is a standard CNN, pretrained on natural images [
29,
30]. The network consists of 19 layers and is trained in a classification scenario. It is well-known for achieving superhuman performance on the extensive scale image database ImageNet [
31]. The latter consists of more than a million labelled natural images of everyday scenes (such as cats, people, bicycles), which is very different from the dataset used in this paper. Therefore, we do not use the original network, but a variant that we modified. We only keep the first convolutional layers of the network, up to and including layer pool_4, to prevent overfitting. The reason for this is that deeper layers in the network tend to learn higher-order features, such as objects and faces, whereas lower layers learn lower-order features, such as edges and structures. A comprehensive visualisation can be found in [
32]. We focused in this work on the detection of geometric damage such as spalling, which can be described well with lower order features, therefore, we obtained the best results with the network truncated after layer pool_4.
In contrast to the scalar function of the height field, the VGG19 network requires a three-channel input (RGB colour). Therefore, the signal is broadcast over three channels. We may encounter large height fields depending on the length of the wall scanned. We split the height fields into smaller tiles to compensate for hardware limitations. Dividing a large scan into smaller tiles not only increases the computational efficiency but also creates the possibility of achieving more than one label for the whole area. As a result, defects can be located more efficiently based on the smaller size of the tiles. If a defect is located at the border of a tile, affecting more than one tile, a criterion of 50% overlap in both directions is defined for a more reliable defect detection. Every vector is propagated through the network, and the intermediate activation of the jth layer is stored.
Afterwards, the Gramian matrix of each activation is computed (see [9] for details). We only keep the diagonal of the Gramian matrix, which relates to the energy per feature, for computational efficiency and because we are not interested in synthesising new data. This leads to the feature vector $g_j \in \mathbb{R}^{k}$, where $k$ is the number of feature maps in the $j$th layer of the network. Note that this procedure always leads to a dimensionality $k$ independent of the input size $p$. Again, we organise all feature vectors as rows in a matrix, resulting in a feature matrix $G$ with shape $N \times k$.
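The feature extraction can be sketched with PyTorch/torchvision as follows. Truncation after the fourth pooling layer corresponds to index 27 of torchvision's vgg19().features module; the omission of the usual ImageNet normalisation and the tensor names are simplifications for illustration:

```python
# Sketch: pretrained VGG19 truncated after pool_4, height-field tiles broadcast
# to three channels, and the diagonal of the Gramian matrix (energy per feature
# map) kept as the feature vector of length k.
import torch
import torchvision

backbone = torchvision.models.vgg19(
    weights=torchvision.models.VGG19_Weights.IMAGENET1K_V1
).features[:28].eval()                               # layers up to and including pool_4

def gram_diagonal_features(tiles):
    """tiles: float tensor of shape (N, H, W) holding height-field tiles."""
    x = tiles.unsqueeze(1).repeat(1, 3, 1, 1)        # broadcast the signal over 3 channels
    with torch.no_grad():
        act = backbone(x)                            # activations, shape (N, k, h, w)
    a = act.flatten(2)                               # (N, k, h*w)
    return (a ** 2).sum(dim=2)                       # Gram diagonal, shape (N, k)
```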
2.3. Defect Detection
The last step of the damage detection process transforms the features that were extracted from the height fields into a single prediction label. Two different but interchangeable methods were used: transfer learning and LOF. Their performance is evaluated and compared in
Section 4.
2.3.1. Transfer Learning
We use a three-layer feed-forward neural network to transform the extracted features into a single output label.
Firstly, it consists of two fully connected layers of the same size as the extracted features. These layers use the widely used ReLU (rectified linear unit) activation function [
33] to allow for non-linear modelling. Furthermore, a dropout rate of 20% was chosen to help prevent overfitting during the training of the neural network. Secondly, there is a layer with a single neuron that is fully connected to the previous layer. This network is appended to the feature extraction network from
Section 2.2. The value of the single output neuron is then used for threshold-based classification.
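A minimal PyTorch sketch of this discriminator head (the feature dimensionality k comes from Section 2.2; training code is omitted):

```python
# Two fully connected layers of the feature size with ReLU and 20% dropout,
# followed by a single output neuron whose value is thresholded; a sketch of
# the head appended to the feature extraction network.
import torch.nn as nn

def make_discriminator_head(k):
    return nn.Sequential(
        nn.Linear(k, k), nn.ReLU(), nn.Dropout(0.2),
        nn.Linear(k, k), nn.ReLU(), nn.Dropout(0.2),
        nn.Linear(k, 1),
    )
```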
2.3.2. Local Outlier Factor
This second type of discriminator uses a standard approach called a LOF [
17]. It is capable of detecting outliers in data (outliers are data points that do not fit in with the rest of the data). In order to achieve outlier detection, the LOF method constructs a reachability graph in feature space to estimate the density of the neighbourhood. It then computes an outlier score for each data point from this density. There are two different ways of using this discriminator:
The first way is to feed an untrained LOF discriminator with new data and let it compute outlier scores for each data point. Using these scores, outliers can be found in new data without any prior training;
The second way uses training on clean data (only showing the normal state without any defects) to create the reachability graph. Afterwards, outlier scores can be computed for new data by comparing it with the reachability graph of the trained normal case.
Either way, the outlier score is then used for threshold-based classification.
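Both usage modes can be sketched with scikit-learn's LocalOutlierFactor. The feature matrices and the threshold below are placeholders, not the values used in this study:

```python
# Sketch of the two LOF modes; G_clean and G_new stand in for feature matrices
# from Section 2.2 (random placeholders here).
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

G_clean = np.random.rand(500, 512)                 # defect-free tiles (placeholder)
G_new = np.random.rand(100, 512)                   # tiles to be inspected (placeholder)

# Way 1: no prior training -- outlier scores computed directly on the new data.
lof = LocalOutlierFactor(n_neighbors=20)           # novelty=False (default)
lof.fit(G_new)
scores_untrained = lof.negative_outlier_factor_    # lower (more negative) = more anomalous

# Way 2: reachability graph built from clean data only, then applied to new data.
lof_trained = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof_trained.fit(G_clean)
scores_trained = lof_trained.score_samples(G_new)  # lower = more anomalous

labels_abnormal = scores_trained < -1.5            # example threshold, chosen per dataset
```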
4. Evaluation
We have evaluated our algorithms on our own synthetic datasets, which we created as explained in
Section 3.1. The damages in this dataset are ellipsoidal in shape and the three axes of the ellipsoid are randomly sized between 5 and 50 cm.
4.1. Tile Score Merging Method
As discussed in
Section 3.3, the outlier score computed for each individual tile needs to be merged into a non-overlapping grid. Two different functions were used. Their performance was compared on a dataset with known labels.
Figure 7 shows how the false positive and false negative rates relate to each other for varying threshold values using the LOF approach. It can be seen that, for any false negative rate, the false positive rate of the mean function is lower than that of the min function. Lower false positive rates are of interest for this application, as a higher value may mean more manual work to check a larger number of candidates for damage that are, in fact, in a good state. Thus, the mean function is found to be superior.
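The merging step can be sketched as follows; how the covered cells per tile are indexed is an assumption for illustration:

```python
# Merge the outlier scores of overlapping tiles into a non-overlapping grid;
# each cell receives the mean (or min) of the scores of all tiles covering it.
import numpy as np

def merge_tile_scores(tile_scores, tile_cells, grid_shape, reduce=np.mean):
    """tile_scores: (N,) scores; tile_cells: per tile, the (row, col) cells it covers."""
    per_cell = {}
    for score, cells in zip(tile_scores, tile_cells):
        for rc in cells:
            per_cell.setdefault(rc, []).append(score)
    grid = np.full(grid_shape, np.nan)
    for rc, vals in per_cell.items():
        grid[rc] = reduce(vals)
    return grid
```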
4.2. Evaluation Method
The size of a defect in our training data is typically larger than the size of one grid tile. In other words, a defect usually spreads over several grid tiles. It is important for our application that we can find all the defects, but we do not need to know the exact extent of each defect. Because of this, it is sufficient if only one (or more) of the tiles that a defect covers is classified as defective. Thus, for evaluating the usefulness of our algorithms regarding their intended application, we consider a defect as detected even if not all the tiles that it covers were classified as defective. Additionally, we ignore the border region of a labelled defect.
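Expressed as a small sketch (the helper and its arguments are hypothetical), the counting rule reads:

```python
# A labelled defect counts as detected if at least one of the grid cells it
# covers, excluding the ignored border region, is classified as abnormal.
def defect_detected(defect_cells, abnormal_grid, border_mask):
    """defect_cells: (row, col) cells of one labelled defect;
    abnormal_grid: boolean grid of classified cells; border_mask: cells to ignore."""
    return any(abnormal_grid[rc] for rc in defect_cells if not border_mask[rc])
```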
4.3. Threshold Selection
After merging the tile values, a grid of outlier values remains. The threshold required to discretise these values into the Boolean labels normal and abnormal is not known a priori. It can be chosen freely, for example to achieve a required maximum false negative rate.
Figure 8 shows how the commonly used classification evaluation metrics are dependent on the threshold for the LOF approach. Outliers are assigned smaller numbers. Thus, with a larger threshold, more tiles will be classified abnormal. The curve for the recall shows that more actual defects can be detected. On the other hand, raising the threshold will decrease the precision, which means that many of the tiles that are classified defective are, in fact, in a good state. The graph for the accuracy is also worth noting. The dataset contains far more normal than abnormal examples, therefore, the accuracy can be close to 100% even when the recall is not that good. This metric is, thus, not suited to evaluating the performance of anomaly detection.
Figure 9 shows the same metrics for the transfer learning approach. The graphs are generally lower, which shows that our transfer learning discriminator is inferior to our LOF discriminator.
Abnormal tiles have lower (more negative) scores than normal tiles. Thus, a larger threshold leads to an increase in the recall (more defective examples are found). However, at the same time, more false positive results are generated, which decreases the precision.
The primary goal with our application is to find most defects, i.e., a large recall. However, at the same time, the precision must be high enough to limit the manual checking work that is required afterwards. Labelling everything as defective is useless. The threshold has to be selected to fulfil both conditions.
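A short sketch of such a threshold sweep with scikit-learn metrics (the score convention, lower = more anomalous, follows the LOF scores above; variable names are illustrative):

```python
# Sweep candidate thresholds over the merged outlier scores and report
# precision, recall and F1 against the ground-truth labels.
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

def sweep_thresholds(scores, y_true, thresholds):
    rows = []
    for t in thresholds:
        y_pred = scores < t                          # True = classified abnormal
        rows.append((t,
                     precision_score(y_true, y_pred, zero_division=0),
                     recall_score(y_true, y_pred, zero_division=0),
                     f1_score(y_true, y_pred, zero_division=0)))
    return np.array(rows)
```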
We were able to achieve much better results using the LOF approach, as can be seen when comparing the graphs of
Figure 8 and
Figure 9. Thus, we will focus only on that from now on. If we chose a desired recall of 95%, we would achieve the following average results from our data as shown in the confusion matrix in
Table 1 and evaluation metrics in
Table 2:
Figure 10 shows a few examples from our dataset with overlaid classification results. Correctly classified damage is coloured green, false positives (classified as damage where there is none) are yellow, and false negatives (damage that was not detected) are red. The magenta colouring shows the border areas that were excluded from our analysis.
As can be seen from the examples, most damage that was not detected is close to the borders of the scan strips. They are covered by fewer tiles and, thus, have a lesser chance of being detected.
The poorer results of the transfer learning approach are possibly due to the trained three-layer network being too small to detect all the randomly sized damages.
5. Application to Real Data
Since the method appears fundamentally suitable for identifying potential damage areas when using simulated data, the next step is to analyse its application to the real dataset. The real data represents a quay wall above water. It was surveyed with a terrestrial laser scanner of type Z + F Imager 5016 in Lübeck city port, Germany, from 13 sensor positions. We fused the point clouds from the 13 sensor positions to achieve a small spacing between the points. The point spacing within the point clouds varies due to the different scanning positions but is around 1 cm. The noise reaches 1–2 mm.
Figure 11 shows a photo of the quay wall and the corresponding point cloud. There is obviously spalling in the upper part of the quay wall between the fenders. Two examples of damaged areas are shown in
Figure 12. Both show spalling of the concrete. They are up to 1.5 m wide and up to 50 cm high.
The point clouds were manually divided into three categories to generate ground truth: quay wall (blue), concrete spalling (green), and additional objects (red). This classification and corresponding depth and label image are shown in the top row of
Figure 13.
Since the methodology performs much better with LOF, as can be seen in the comparison of
Figure 8 and
Figure 9, we use only LOF for the real data. The average result can be seen in
Figure 14. When comparing the average results from the simulated dataset (
Figure 8) with the real dataset (
Figure 14), it can be seen that the curves for accuracy, precision, recall, and F1 score show similar behaviour with a lower overall accuracy.
Again, a threshold is chosen, where precision and recall are essentially equal, which is a good compromise between true and false positives in an economic sense. The corresponding confusion matrix for the threshold of −1.55 selected can be seen in
Table 3. As can be seen from the table, there is again a strong imbalance between the two classes. The number of false positives and false negatives is essentially the same.
The evaluation metrics for the threshold of −1.55 selected are shown in
Table 4. Accuracy reaches 90.5%. Precision and recall, at 72.2% and 72.6%, are essentially at the same level. This results in an average F1 score of 72.4%. This still indicates a good classification, but is somewhat worse than for the simulated data.
The classification result for two exemplary images is shown in
Figure 15. Green, red and yellow indicate true positives, false negatives and false positives, respectively. Here, the original height field is shown with a higher contrast in the middle to make grey value differences in the height field more visible to the human eye. It can be seen in the top row of
Figure 15 that all damages are detected and classified correctly. There are no false positives or false negatives. Only the two small damages at the bottom edge are not recognised, but this is because the edge areas are cut off during classification. The example in the bottom row shows a weaker classification result. Two damaged areas are not detected and, furthermore, two areas are falsely detected as damage.
Therefore, the result is worse than for the simulated data (cf.
Table 2). Nevertheless, the method seems to give good results when applied to real data.
6. Discussion
We assume several reasons for the different results. The first and probably the most important point is that there are other disturbing objects in the data, such as ladders, fenders, plants and ropes. These objects also lead to larger distances in the height fields, which are currently not separable from the larger distances caused by real damage. The impact, particularly of plants, can be reduced by measuring in seasons with little vegetation, such as winter. The second point is that we do not clean or filter the data at the beginning. Only a rough manual cutting into the area of interest is carried out. The dataset still contains outliers and sensor artefacts that lead to false measurements. These artefacts are particularly strong where the structure comes into contact with the water. Therefore, the optimal time for the measurement is when the water level is as low as possible. In addition, the noise in the real data may not be normally distributed and may still contain systematic components, unlike the simulated data. The threshold value for the separation into damaged and undamaged zones is chosen in the present study in such a way that a good trade-off between detected damage and actually correct classifications is achieved. The threshold may differ for another dataset and has to be chosen anew. In a very sensitive but non-economic approach, one would choose a different threshold value that gives a higher recall.
The method presented is currently limited to geometrical damages, such as spalling and large cracking. The reason is that only 3D point clouds are used, and no colour information, which would be necessary to also detect small cracks and sintering. Below water, the method cannot detect damages smaller than the decimetre range, because the point spacing and noise level of a multibeam echo-sounder are much higher than those of a laser-scanned point cloud above water. Damages above water can be detected from the centimetre range due to the higher accuracy of the laser scanner and the smaller point spacing. Nevertheless, the results still contribute to automated damage detection and a digitally guided building inspection process.
The presented method gives results similar to other studies. PointCNN gives a mean intersection over union score of 74.68% for bridge inspections with point cloud classification [
37]. A combination of images and point clouds based on Otsu’s algorithm for automatic concrete crack-detection achieves an average F1 score of 86.7% [
38].
7. Conclusions and Outlook
The point clouds are converted into depth images and processed in a pretrained CNN with two extensions. Regarding the classification, firstly, an NN is attached to the CNN and, secondly, the LOF is calculated. Building inspection can be digitalised and taken to a completely new level with the method presented. We achieve a significantly higher completeness of the infrastructure inspections with the k-MSS used compared to the manual method with divers. We obtain a quality-controlled and reproducible mapping of the infrastructure by using laser scanners and hydrographic measurements. Suspected damage can be reliably detected and verified through the area-based measurement of the component surfaces above and below water. A comparison of different measurement epochs, as they have to be carried out every six years within the framework of the building inspection, is thus also possible for structures below water, so that the damage development and the service life of these economically important structures can be better observed and evaluated in the future.
The procedure of transferring the features from natural images to point clouds and then performing novelty detection is completely new in the context of structural health monitoring systems. It is now possible for the first time to detect damage automatically.
The methodology presented is intended to automatically create a suspicion plan with suspected damage regions from point clouds. To be able to apply the methodology in practice, as many damaged regions as possible must be found. Furthermore, only the damaged regions should be recognised as such. This means that the precision and the recall should both be as high as possible. The methodology was first tested on simulated data and then applied to real data.
The analysis of the simulated data resulted in a very good classification with an F1 score of 96.3%. Concerning the requirements mentioned above, the method is suitable for creating suspicion plans of damage regions on quay walls. The result is slightly less effective for the real data. The F1 score is 72.4%. When looking at examples of damage in the data that has not been detected by our algorithm, it can be seen that most of them are at the border of the scanned depth map.
The results could be further improved by handling the edges separately, as those typically show a significantly different distribution compared to the rest of the scan. So far, we have only been able to test our algorithm on simulated data and one real dataset of a concrete quay wall. However, we are working towards acquiring more real world data with different materials and building types. The proposed strategy is also applicable to other infrastructure objects, such as bridges, high-rise buildings, and tunnels.