Article

Classifying the Shapes of Buildings by Combining Distance Field Enhancement and a Convolution Neural Network

1 School of Resource and Environmental Sciences, Wuhan University, 129 Luoyu Road, Wuhan 430079, China
2 Key Laboratory of Smart Earth, Xi’an 100029, China
3 Technology Innovation Center for Spatio-Temporal Information and Equipment of Intelligent City, Ministry of Natural Resources, Chongqing 401120, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2024, 13(11), 411; https://doi.org/10.3390/ijgi13110411
Submission received: 24 August 2024 / Revised: 26 October 2024 / Accepted: 5 November 2024 / Published: 14 November 2024

Abstract

The shape classification of building objects is crucial in fields such as map generalization and spatial queries. Recently, convolutional neural networks (CNNs) have been used to capture high-level features and classify building shape patterns based on raster representations. However, this raster-based deep learning method binarizes the areas into building and non-building zones and does not account for the distance information between these areas, potentially leading to the loss of shape feature information. To address this limitation, this study introduces a building shape classification method that incorporates distance field enhancement with a CNN. In this approach, the distance from various pixels to the building boundary is fused into the image data through distance field enhancement computation. The CNN model, specifically InceptionV3, is then employed to learn and classify building shapes using these enhanced images. The experimental results indicate that the accuracy of building shape classification improved by more than 2.5% following distance field enhancement. Notably, the classification accuracies for F-shaped and T-shaped buildings increased significantly by 4.34% and 11.76%, respectively. Moreover, the proposed method demonstrated a strong performance in classifying other building datasets, suggesting its substantial potential for enhancing shape classification in various applications.

1. Introduction

Shapes are crucial for defining spatial objects and representing spatial phenomena [1]. They determine the arrangement of each geospatial entity, which, in turn, illustrates the development and interaction of geographical phenomena [2]. As typical man-made features, buildings often display distinct shape patterns that partially reflect their function and style [3]. These shape patterns not only offer valuable insights into architectural design and urban planning, but also play a critical role in geospatial processes, such as mapping, navigation, and environmental monitoring [4]. Consequently, the classification of building shapes has become a key area of research within geographic information science (GIS) [5,6], facilitating applications such as building change detection [7] and multiscale building matching [8].
Historically, building shape classification involved the manual computation of features, focusing on morphological characteristics to distinguish between different building shapes. The commonly used indicators include compactness [9,10], concavity [11], rectangularity [12], and the shape index. These methods are intuitive and utilize human expertise to identify the relevant features and patterns. However, they depend heavily on predefined morphological features, which can introduce bias and limit scalability due to the need for subjective feature definition and manual intervention.
With advancements in computer science, the focus has shifted toward employing traditional machine learning algorithms to improve the accuracy of building shape classification. Techniques such as random forest (RF) classifiers have been used to address complex nonlinear relationships in shape classification [13], whereas support vector machines (SVMs) have been applied for the detection and recognition of building shapes [14]. Other methods, including decision trees, K-means [15], and K-nearest neighbors, have also been utilized [16]. These methods represent a significant transition from manual, feature-based approaches to more automated and scalable classification techniques. Nonetheless, they still rely substantially on feature engineering, which involves complex data processing and may not capture all the relevant features of a building shape. Consequently, this reliance on predefined features can lead to missed opportunities for recognizing unique architectural forms and variations, ultimately affecting classification accuracy.
To address these limitations, other researchers have increasingly adopted deep learning techniques. Deep learning, a subset of machine learning, utilizes multi-layered neural networks to model and interpret complex patterns and relationships in data [17]. This approach has significantly advanced fields such as natural language processing [18], image segmentation [19], and speech recognition [20]. CNNs [21] are known for their ability to learn the spatial hierarchies of features automatically and adaptively, and thus have been extensively used in recent years for classifying building shapes. For instance, Alidoost et al. [22] proposed a CNN-based approach that automatically detects buildings and classifies roof shapes from a single aerial image. Castagno et al. [23] fused LiDAR data with satellite image data, extracted their features using a CNN, and used decision trees for subsequent building shape classification. Partovi et al. [24] utilized different pre-trained CNN models to extract deep features, and then employed SVM to classify building roof shapes. Jiao et al. [25] converted vector building data into raster images, and then utilized AlexNet to classify building shapes based on manually labeled images, effectively achieving the classification of different building shapes.
In contrast to CNN-based methods, other studies have directly processed vector-based building data using graph convolutional networks (GCNs) [26]. These methods leverage the polygonal vertices and edges of buildings to represent the structural data of buildings. For example, the graph convolutional autoencoder (GCAE) model [6] was used to encode various features of buildings, enabling differentiation between different shapes. The Seq2seq autoencoder model [27] extracted features from a building boundary structure, facilitating shape similarity measurement and retrieval matching. Additionally, several researchers have employed GCNs for shape classification. Liu et al. [28] developed a deep point neural network for building shape recognition to perform convolution operations directly on building points. Yu et al. [29] constructed a computation graph from building contour polygons and utilized a fully connected neural network to classify the building shapes.
The two deep learning-based methods for classifying building shapes each offer distinct advantages. However, the computational efficiency of GCN-based methods is not as good as that of CNN-based methods when dealing with large-scale datasets, mainly because GCNs must aggregate information from neighboring nodes at each node, which can substantially increase computational costs on large-scale graph structures [30]. In contrast, CNN-based methods offer a more efficient approach for shape classification, as they can extract features directly from images, reducing the additional computational overhead. Furthermore, CNN-based classification methods [22,23,24,25] benefit from their compatibility with a diverse array of deep learning frameworks. They also have the capability to learn from multiple perspectives at various resolutions, which enhances the efficiency of both training and implementation.
In the application of CNNs for building shape classification, as in [25], the input datasets typically consist of binarized raster images. These images effectively outline building boundaries by creating clear demarcation zones between the background and target pixels, a crucial feature for the feature extraction of CNNs [31]. Furthermore, the contiguous areas of target pixels in these images represent the building’s shape and implicitly include vital structural features, such as skeleton lines [32]. However, the uniformity in the values of contiguous target pixels can limit the effectiveness of CNN models in utilizing these features, which may reduce the network’s sensitivity to subtle differences in building shapes, making it challenging to differentiate between similar shapes. As a result, the CNN may not fully capture important structural characteristics, which can lead to decreased classification accuracy.
To optimize the use of building shape information, the distance field method has been adopted to capture the internal structural features of buildings effectively. This method encodes the distance between the internal target pixels and the boundary as a distance value [33,34,35], enabling the precise representation of a building’s internal features. The distance field method has been widely applied in various domains, such as semantic segmentation [36] and image segmentation [37]. For example, Hu et al. [38] first proposed a distance field-based convolutional neural network (DF-CNN), which combined a feature map with a distance field branch to improve overall edge detection performance, as demonstrated by experiments on multiple datasets. Huang et al. [39] embedded the distance field map into a two-stage network and used the obtained distance map as a learning weight map to enhance left atrium MRI segmentation performance. Wang et al. [40] proposed Deep Distance Transform (DDT) for tubular structure segmentation in CT scans. By using the distance transform as an auxiliary task, the method enhanced the model’s ability to segment complex structures, such as blood vessels, with improved accuracy validated across several medical datasets. In the context of building shape classification, the distance field can enhance the ability of the CNN to capture the structural features within a building image, thereby improving the classification accuracy.
Therefore, this study introduces the distance field method to enrich the shape information of buildings. By calculating the distance of each pixel to the boundary of the building, a more expressive feature image matrix is generated. This approach captures the complex geometry inside the building, while retaining the edge features of the shape, providing a more detailed description of the shape. Subsequently employing the CNN method, we utilize the distance field enhancement images as the input, which further improves classification accuracy. The remainder of this paper is structured as follows: The CNN-based building classification method incorporating the distance field is introduced in Section 2. The experiments conducted and their results are presented in Section 3. Lastly, the conclusions of this study are summarized in Section 4.

2. Methodology

The building shape classification method described in this study is depicted in Figure 1 and involves four primary steps: vector building rasterization, feature enhancement using the distance field, the fusion of the distance field with the building boundary, and building shape classification. Specifically, vector building rasterization transforms each vector building polygon into a raster image to facilitate the extraction of boundary features. The purpose of feature enhancement using the distance field is to assign distance values to the target pixels in raster building images, thereby highlighting the internal structure of a building. By fusing the distance field with the building boundary, the information within the building images is enriched, providing deeper structural features for subsequent model learning and classification. Ultimately, the CNN model, InceptionV3, employs the distance field-enhanced building images as input data for training and classifying building shapes.

2.1. Vector Building Rasterization

The rasterization process of the vector building data is illustrated in Figure 2. For each building polygon, a square area centered on the geometric center of the polygon and with side length R is established as the sampling range. To ensure complete representation of the building in the resultant raster image, R is determined based on the side length of the minimum bounding square of the building. Owing to the varied sizes of buildings in the dataset, this study adopts the median of all minimum bounding square areas as the benchmark. Buildings of differing sizes are scaled to align their minimum bounding square areas with this benchmark. Subsequently, raster images with a resolution of m × m are generated based on the minimum bounding squares. The choice of m must balance the clarity of the raster image and the input data requirements of the subsequent classification model [41].
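As a concrete illustration, the following minimal sketch rasterizes a single polygon into an m × m binary image. It is a hypothetical implementation of our own: the function name, the use of the bounding-square center as an approximation of the geometric center, and the per-building normalization (rather than the median-area benchmark described above) are simplifications introduced here for clarity.

    import numpy as np
    from matplotlib.path import Path

    def rasterize_building(polygon_xy, m=256):
        # Rasterize one building polygon into an m x m binary image sampled
        # over a square of side R centered on the polygon (assumption:
        # bounding-square center stands in for the geometric center).
        poly = np.asarray(polygon_xy, dtype=float)
        lo, hi = poly.min(axis=0), poly.max(axis=0)
        center = (lo + hi) / 2
        R = (hi - lo).max()                 # minimum bounding square side length
        xs = np.linspace(center[0] - R / 2, center[0] + R / 2, m)
        ys = np.linspace(center[1] - R / 2, center[1] + R / 2, m)
        gx, gy = np.meshgrid(xs, ys)
        pts = np.column_stack([gx.ravel(), gy.ravel()])
        inside = Path(poly).contains_points(pts)   # point-in-polygon test
        return inside.reshape(m, m).astype(np.uint8)   # 1 = building pixel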

2.2. Feature Enhancement Using Distance Field

Following rasterization, a binary image that encapsulates the boundary features of the building is produced. In this image, features are primarily focused in the critical regions between the background and target pixels, corresponding to the locations of the building boundaries.
Subsequently, an internal distance field for buildings is created based on binary images that include building boundary features to enhance and enrich the internal structural features of buildings.
Typically, a distance field is defined by assigning a value to each pixel within a finite region of the field space, where the value represents the distance from the pixel to the nearest spatial feature (region boundary). This can be expressed using the following formula:
$\mathrm{Field}_{dis}(I, J) = \min_{i} \, \mathrm{distance}\big((I, J), \mathrm{Feature}_i\big)$

where $\mathrm{Field}_{dis}(I, J)$ represents the distance field value at the pixel with row and column coordinates $(I, J)$ in the image, and $\mathrm{distance}((I, J), \mathrm{Feature}_i)$ represents the distance from the pixel at coordinates $(I, J)$ to the $i$-th spatial feature (region boundary) $\mathrm{Feature}_i$ in the image.
In the distance field within the interior of a building, each pixel location records the shortest distance to the nearest building boundary. Various distance metrics, such as Euclidean and Manhattan distances, are applicable in distance field computations. As illustrated in Figure 3a,b, the Euclidean distance field depicted in Figure 3b more accurately reflects real-world distance measurements compared to the Manhattan distance shown in Figure 3a. This precision may have a positive impact on the task of building shape classification. For instance, the distance field features highlighted in the red box in Figure 3b, as opposed to those in the corresponding location in Figure 3a, are more effective at capturing the turning points of the building’s morphology. This sensitivity to detail could be crucial for enhancing the performance of classification algorithms.
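For reference, both kinds of field can be computed directly with SciPy’s distance transforms; the short sketch below is our own illustration on a toy mask and is not part of the paper’s template-based pipeline, which follows next.

    import numpy as np
    from scipy.ndimage import distance_transform_edt, distance_transform_cdt

    # Toy binary building mask: 1 = interior target pixel, 0 = background.
    mask = np.zeros((9, 9), dtype=np.uint8)
    mask[2:7, 2:7] = 1

    # For every nonzero pixel: distance to the nearest zero (background) pixel.
    euclidean_field = distance_transform_edt(mask)                     # cf. Figure 3b
    manhattan_field = distance_transform_cdt(mask, metric='taxicab')   # cf. Figure 3a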
To enhance the efficiency of computing the distance field, this study integrates the use of distance templates, specifically an upper and a lower template. The basic process is depicted in Figure 4. Initially, a two-dimensional binary image of identical size is created from the raster image obtained in the preceding step, serving as the source field for distance transformation. This source field, a matrix of distance values, assigns a zero value to positions on the building exterior, indicating the initial distance to the image features. Conversely, interior target pixels are assigned an infinite value, representing their initial distances to the features.
Distance transformation scanning is conducted on the distance matrix using the upper and lower templates, as illustrated in Figure 4 with a 3 × 3 example template. The parameters of the pixels within the template represent their relative Euclidean distances from the template’s center. The upper template scans each pixel in a left-to-right, bottom-to-top sequence, starting from the bottom-left corner. For each template unit, a candidate distance value is computed as the sum of the unit’s relative distance to the template’s center and the stored distance at the corresponding position in the distance matrix. This process iterates through all the template units, selects the minimum candidate value, and updates the distance matrix with this new value. Let $A[i][j]$ denote the distance of the pixel with row index $i$ and column index $j$. The distance of this pixel during the sweep of the upper template is updated as follows:

$A[i][j] = \min\big\{\, A[i][j] + 0,\; A[i-1][j] + 1,\; A[i-1][j-1] + \sqrt{2},\; A[i][j-1] + 1,\; A[i+1][j-1] + \sqrt{2} \,\big\}$
Similarly, this study employs a lower template to scan each pixel in the reverse order. When the distance values of all the pixels stabilize, indicating no further changes, the distance field is considered to have converged. At this point, the distance of each pixel approximately represents the shortest Euclidean distance to the building boundary. In computing the distance field, the use of both the upper and lower templates ensures the accuracy of the distance calculations in various directions and effectively reduces the computational complexity.
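A compact sketch of this procedure is given below. It is our reconstruction: the template offsets follow Equation (2), the lower template is its mirror image, and sweeps alternate until no value changes, as described above; scan orientation is expressed in conventional row-major indexing.

    import numpy as np

    def template_distance_field(mask):
        # Distance field via alternating upper/lower 3 x 3 template sweeps:
        # exterior pixels start at 0, interior target pixels at infinity.
        h, w = mask.shape
        A = np.where(mask > 0, np.inf, 0.0)
        r2 = 2 ** 0.5   # relative Euclidean distance of a diagonal neighbor
        upper = [(-1, 0, 1.0), (-1, -1, r2), (0, -1, 1.0), (1, -1, r2)]
        lower = [(-di, -dj, d) for di, dj, d in upper]   # mirrored template
        changed = True
        while changed:                       # repeat until values stabilize
            changed = False
            for template, order in ((upper, 1), (lower, -1)):
                for i in range(h)[::order]:
                    for j in range(w)[::order]:
                        for di, dj, d in template:
                            ni, nj = i + di, j + dj
                            if 0 <= ni < h and 0 <= nj < w and A[ni, nj] + d < A[i, j]:
                                A[i, j] = A[ni, nj] + d   # keep minimum candidate
                                changed = True
        return A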
After applying the distance field method, pixels within each raster image of a building were assigned values corresponding to their distances from the boundary of the building. Figure 5a presents a distance field image of a building. To clarify the results of the distance field computation, Figure 5b employs a color palette (black, red, yellow, green, blue, and purple) to depict pixels according to their increasing distance values. This assignment of distinct values to target pixels based on their proximity to the boundary provides a richer representation of the morphological structure of the building. Such enriched data facilitate the more effective feature extraction and classification of building shapes in subsequent analyses.

2.3. Fusion of Distance Field with Building Boundary

Upon generating the binary image of the building boundary and the distance field image that reflects the building’s internal structural features, we enhanced the contrast by multiplying all the pixel values in the distance field image by a coefficient k. This adjustment aims to accentuate the differences between pixels at varying distances, thereby effectively merging the boundary features with the internal structural features into a single grayscale image. This fusion highlights the benefits of distance field enhancement.
Figure 6 illustrates the enhanced distance field images with the coefficient k set at 4, 5, and 6, respectively. These enhancements preserve the boundary features and amplify the internal spatial characteristics of the buildings, leveraging the distance attributes of the boundaries.
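A minimal sketch of this fusion step follows. It is our own illustration: the exact merge of boundary and interior values is not fully specified in the text, so treating boundary pixels as maximum intensity and clipping to 8-bit range are assumptions.

    import numpy as np

    def fuse_distance_field(boundary_img, dist_field, k=5):
        # Scale interior distance values by the coefficient k to accentuate
        # differences between pixels at varying distances, then merge with the
        # binary boundary image into a single 8-bit grayscale image.
        enhanced = dist_field * float(k)
        fused = np.where(boundary_img > 0, 255.0, enhanced)  # boundary kept bright
        return np.clip(fused, 0, 255).astype(np.uint8)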

2.4. Building Shape Classification

After fusing the distance field with the building boundary, we utilize enhanced distance field images as the input to conduct shape training and classification using a CNN model. In this study, we employ InceptionV3 for image classification, as illustrated in Figure 7. This model enhances the Inception module, originally based on GoogLeNet [42], and requires a minimum image resolution of 224 × 224 pixels. It extracts features through multiple convolution and pooling layers. Inception modules with convolutional kernels of varying sizes are used to capture receptive field information at different image scales. The features are then pooled and passed through fully connected layers to map them to a lower-dimensional feature vector. A Softmax activation function finally maps this vector to a predefined set of classes, recording the probabilities of the input image belonging to each shape class.
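As an illustration of this inference pipeline, the sketch below builds a 10-class InceptionV3 and maps one image to shape-class probabilities. It uses the torchvision implementation as a stand-in, since the paper does not name its framework; the 299 × 299 input size is torchvision’s convention, not a value from the paper.

    import torch
    from torchvision.models import inception_v3

    model = inception_v3(weights=None, num_classes=10, aux_logits=True)
    model.eval()

    # One enhanced distance field image, replicated to 3 channels (assumed).
    image = torch.rand(1, 3, 299, 299)
    with torch.no_grad():
        logits = model(image)                  # eval mode: main classifier output
        probs = torch.softmax(logits, dim=1)   # probability per shape class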
InceptionV3 incorporates significant adjustments inspired by the concept of factorizing convolutions [42]. These adjustments include splitting the convolution into smaller parts, reducing computational complexity, while maintaining the ability of the network to capture complex patterns. This refined structure effectively captures both local and global features, enhancing the expressive power of the network and improving its performance in complex image recognition tasks.
Unlike other CNNs, InceptionV3 employs label-smoothing regularization (LSR) to enhance the classification performance and generalization ability of the network. The core idea of LSR is to adjust the labels during training to values between 0 and 1, thus reducing the model’s overfitting of the training data [42]. The function of LSR is described as follows:
$H(q', p) = -\sum_{l=1}^{N} q'(l) \log p(l) = (1 - \epsilon) \, H(q, p) + \epsilon \, H(u, p)$

where $H(q', p)$ represents the cross-entropy loss function used to measure the difference between the probability distribution $p(l)$ predicted by the model and the smoothed label distribution $q'(l)$; $l$ represents the index of the class; $N$ denotes the number of classes; $q(l)$ indicates the original one-hot ground-truth distribution; $q'(l) = (1 - \epsilon)\, q(l) + \epsilon \, u(l)$ is its smoothed version; $u(l)$ denotes the probability of the $l$-th class in the uniform distribution; and $\epsilon$ indicates the smoothing parameter used to control the degree of smoothing. Through LSR, InceptionV3 can effectively prevent the overfitting of the model.
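A worked example of the smoothing step (with an assumed ε = 0.1, which the paper does not report): for N = 10 classes, the one-hot label is mixed with the uniform distribution u(l) = 1/N.

    import numpy as np

    N, epsilon, true_class = 10, 0.1, 3
    one_hot = np.eye(N)[true_class]                       # original label q(l)
    q_smoothed = (1 - epsilon) * one_hot + epsilon / N    # smoothed label q'(l)
    print(q_smoothed)   # true class: 0.91; every other class: 0.01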

3. Experiments

3.1. Experimental Data and Parameter Settings

The experimental dataset utilized in this study, as described in [6], comprises 10 different building polygon shapes, with 500 instances of each type. We chose these 10 building shape types because they align with our research objectives and represent our dataset well. Additionally, the common shapes of English letters in buildings provide a simplified, yet effective way to demonstrate our algorithm’s performance [43]. This dataset is widely recognized for its application in building shape analysis. During vector building rasterization, the resolution was carefully selected to satisfy the clarity requirements of the InceptionV3 model. A resolution of 256 × 256 pixels was selected, ensuring that all the buildings were displayed clearly and completely. Several examples of the rasterized building images of varying shape are listed in Table 1.
For the computation of distance fields, template sizes are typically selected to be (2n + 1) × (2n + 1) to center the template on the target pixel, where n denotes a natural number. The use of larger distance templates yields distance fields that more closely approximate the real Euclidean distance, while significantly reducing computational efficiency [44]. Therefore, three representative template sizes, 7 × 7, 13 × 13, and 15 × 15, were selected for distance field processing in this study for experimentation. Through these experiments, we found that the 13 × 13 template provided the best balance between distance measurement accuracy and computational efficiency. Consequently, we selected the 13 × 13 template for use in our subsequent experiments.
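For reference, the relative Euclidean distances of a (2n + 1) × (2n + 1) template can be generated as follows (our own sketch; n = 1 reproduces the 3 × 3 template of Figure 4, and n = 6 the 13 × 13 template used here).

    import numpy as np

    def euclidean_template(n):
        # Relative Euclidean distance of every template cell to the center.
        offsets = np.arange(-n, n + 1)
        di, dj = np.meshgrid(offsets, offsets, indexing='ij')
        return np.sqrt(di ** 2 + dj ** 2)

    print(euclidean_template(1))   # 3 x 3 template; diagonal cells are sqrt(2)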
The dataset was randomly divided into training, validation, and testing sets in a 6:2:2 ratio. Training was performed on a computer running Windows 10, equipped with a GTX 1050 GPU and 2 GB of memory. Given the established performance of the InceptionV3 network, adjustments were made to the learning rate of the model during training to circumvent local optima. After several trials, the learning rate was set at 0.01, the batch size at 16, and the maximum number of iterations at 20,000 to ensure model convergence. A gradient descent optimizer was employed to update the parameters of each layer [42].
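The reported configuration translates to roughly the following training step (a sketch under the same torchvision assumption as above; the smoothing parameter ε = 0.1 and the auxiliary-loss weight of 0.4 are common defaults, not values reported in this study).

    import torch
    import torch.nn as nn
    from torchvision.models import inception_v3

    model = inception_v3(weights=None, num_classes=10, aux_logits=True)
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)        # LSR, Section 2.4
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)    # gradient descent

    def train_step(images, labels):
        # images: one batch of 16 enhanced images; labels: shape-class indices.
        model.train()
        optimizer.zero_grad()
        out = model(images)   # training mode returns (logits, aux_logits)
        loss = criterion(out.logits, labels) + 0.4 * criterion(out.aux_logits, labels)
        loss.backward()
        optimizer.step()
        return loss.item()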

3.2. Results and Analysis

Three experiments were conducted using raster building images and distance field enhancement images with varying coefficients k. The average accuracy from these experiments was taken as the overall accuracy for each set. The results demonstrated that the overall shape classification accuracy for the raster building images was 96.00%. With varying enhancement coefficients k, the accuracy of the enhanced distance field images exhibited slight variations, as depicted in Figure 8. The findings suggested that excessively small or large values of k adversely affected the shape classification of the distance field images due to inappropriate pixel distance values. The optimal classification occurred when k was set to five, achieving the highest overall accuracy of 98.80%. Hereafter, unless stated otherwise, k is maintained at five.
Figure 9 presents the changes in training accuracy over epochs for the InceptionV3 model trained using both raster building images and distance field enhancement images. The enhancement of the distance field, which enriches the spatial information of the building images, resulted in a slower convergence rate during training. Nonetheless, the final model attained higher classification accuracy compared to that of the model trained solely on raster building images.
The classification accuracy for different building shapes in the test sets is outlined in Table 2. The results demonstrate improvements in classification accuracy for various building shapes with distance field enhancement, with the total classification accuracy increasing by 2.8%. Notably, the classification accuracy for the F-shaped and T-shaped buildings increased by 4.34% and 11.76%, respectively. These improvements suggest that distance field enhancement effectively extracts implicit building shape information from raster images, thereby enhancing the performance of the classification model for building shapes.
Several examples of the classification result probabilities for building shapes using both raster images and distance field enhancement images are presented in Table 3. The classifications derived from the raster images often result in misclassifications for certain building shapes. Specifically, the Y- and T-shaped buildings are susceptible to misclassification owing to their similar folding degrees, while the U-, L-, and Z-shaped buildings are challenging to classify accurately owing to their similar compactness. To mitigate these issues, distance field enhancement accentuates the internal differences among buildings of various shapes, thereby enriching local features for more accurate classification across the different shape categories.

3.3. Comparative Experiments

3.3.1. Comparison of Different Template Sizes

The previous experiments have demonstrated that distance field enhancement positively influences the accuracy of building shape classification. This section delves into the impact of using different template sizes on classification results through distance field enhancement. The template sizes tested were 7 × 7, 13 × 13, and 15 × 15. The InceptionV3 model was employed to train the system using the enhanced images. The classification outcomes for the test set are documented in Table 4, revealing that the shape classification accuracies for the distance field enhancement images with three template sizes were 97.90%, 98.80%, and 98.40%, respectively.
These results indicate that template size moderately affects classification accuracy, although the variations are not markedly significant. Notably, the 7 × 7 template size improved the distinction of the E-, H-, and U-shaped buildings more than the other sizes did. However, it displayed reduced accuracy for the F-, T-, Y-, and Z-shaped buildings. Conversely, the 15 × 15 template size enhanced the classification accuracy for the Y-shaped buildings, but reduced it for the F- and I-shaped buildings.
When the template size is set to 13 × 13 pixels, the training accuracy improves by 1 percentage point compared to that of the 7 × 7 template, and the recognition accuracy across various types becomes more balanced. However, further enlargement of the template does not yield additional improvements in classification precision. Therefore, from the perspective of balancing computational accuracy and efficiency, a 13 × 13 distance template is more suitable for the experiments in this study. Additionally, the experimental results indicate that building shapes with simpler structures, such as L-shaped, T-shaped, and Y-shaped buildings, rely more on the curvature features within their morphology for classification. For these types of buildings, the use of larger distance templates to enhance the accuracy of distance field measurements also leads to improved recognition precision. In contrast, buildings with more distinct structural features, such as E-shaped, H-shaped, and I-shaped structures, exhibit less sensitivity to the accuracy of distance field measurements. These variations suggest that smaller templates may not capture sufficient detail of the target shape, whereas larger templates could introduce noise from the distant features. The optimal template size depends on the specific architectural characteristics of the building.

3.3.2. Comparison of Distance Field Enhancement in Different Regions

Distance field enhancement involves various regions, such as internal distance field enhancement (hereinafter referred to as internal enhancement), external distance field enhancement (referred to as external enhancement), and internal–external distance field enhancement (referred to as internal–external enhancement), as depicted in Figure 10. In this section, we employ the latter two methods to enhance the raster images of buildings, followed by using the InceptionV3 to classify the shapes of the enhanced images.
The classification results of building shapes following the application of distance field enhancement across different regions are presented in Table 5. Notably, the accuracy of building shape classification after external enhancement was 97.8%, which represents a 1% decrease compared to that of internal enhancement. Conversely, the accuracy after internal–external enhancement was 94.4%, marking a 4.4% decrease relative to that of internal enhancement. Specifically, external enhancement slightly improved the classification accuracy of the Y-shaped buildings, but diminished the accuracy for the H- and U-shaped buildings. Internal–external enhancement demonstrated high classification accuracy for the E- and F-shaped buildings, but was less effective for the L-, T-, U-, and Z-shaped buildings, with a notable 13.4% decrease in the classification accuracy of the T-shaped buildings compared to the other two methods.
Overall, despite achieving comparable shape classification accuracies, the three distance enhancement methods had different impacts on the classification of specific shape types.

3.4. Tests on Microsoft Building Footprint Dataset

To evaluate the effectiveness of our approach in classifying building shapes in real-world datasets, we used the Microsoft Building Footprint dataset. This dataset is designed for the detection and analysis of building footprints, providing a solid foundation for users to identify building shapes. It collects building boundary information from various real-world sources, such as remote sensing images, and ensures the accuracy of boundary information through semantic segmentation and polygon processing, which help to distinguish shapes. As illustrated in Figure 11, we selected building data from an area in Alaska, USA, for our experiment. A total of 2000 buildings were labeled and classified into the same 10 shape types we identified earlier. Figure 12 shows that the dataset includes 10 different building shapes, each represented by 200 samples.
Using the labeled building data from the dataset, we generated both the raster building images and the distance field enhancement images. To verify the model’s transferability, the accuracies of classifying these buildings using the previously trained raster image model and the previously trained distance field image model, respectively, are listed in Table 6, indicating that our method demonstrates strong adaptability and effectively manages building shape classification tasks across different scales.
To validate the accuracy of our method on real-world data sources, we compared it with the building shape classification method based on AlexNet from [25]. In that study, the vector building data sourced from OpenStreetMap (OSM) were first converted into raster images, and AlexNet was trained on manually labeled building images to develop a CNN model for shape classification. Since that study only reported classification results for L-shaped, O-shaped, and T-shaped buildings, we selected these three types for comparison. As shown in Table 7, the comparison results demonstrate that our method achieves higher accuracy and outperforms the AlexNet-based approach in building shape classification.

4. Conclusions

The shape feature extraction and classification of spatial objects are crucial tasks in spatial cognition and represent key challenges in the field of GIS. In this study, we enhanced the existing deep learning classification methods by incorporating the distance field method, which enriches the distance description information for all the pixels within a region, thereby facilitating building classification tasks. This approach provides more detailed structural information, which is beneficial for deep learning-based shape feature extraction. The experimental results indicate that the shape classification accuracy of building images has improved after applying distance field enhancement. Specifically, the classification accuracy for certain building types, such as those with F shapes and T shapes, increased by 4.34% and 11.76%, respectively.
Additionally, our method demonstrated strong transferability to other real-world building datasets, effectively adapting to variations in data quality across different buildings. This finding suggests that our method offers a novel and viable approach to enhancing the performance of deep learning-based pattern classification methods for map objects. Furthermore, we compared our method with an existing building classification method that utilizes the AlexNet network to classify the shapes of raster building images. In the same building shape classification task, our method demonstrated higher accuracy, indicating its superiority in building shape classification.
However, our study also has some limitations. First, the computational cost of generating distance fields is relatively high, especially for complex datasets. Second, although our method improves classification accuracy for certain building shapes, its effectiveness may be weaker for irregular or ambiguous building shapes, which require further investigation. In future work, we plan to refine the parameter selection process for distance field enhancement and improve the classification model. By conducting systematic experiments to refine the algorithm further, we aim to develop more precise methods for extracting features from building images to increase shape classification accuracy. Furthermore, we intend to extend the application of our method to other feature objects, including road infrastructure and water networks. We will also explore the interpretability of the model to better understand its internal mechanisms.

Author Contributions

Conceptualization, Min Yang and Hai Hu; methodology, Min Yang, Hai Hu and Xinyan Zou; software, Xinyan Zou and Siyu Li; formal analysis, Min Yang, Hai Hu and Xinyan Zou; data curation, Xinyan Zou and Siyu Li; writing—original draft preparation, Xinyan Zou; writing—review and editing, Min Yang, Hai Hu and Xinyan Zou; supervision, Min Yang and Hai Hu; funding acquisition, Min Yang and Hai Hu. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Laboratory of Smart Earth under grant number [KF2023ZD04-01] and supported by The National Key R&D Program of China (2022YFC3005704) and the Open Project of Technology Innovation Center for Spatio-temporal Information, Equipment of Intelligent City (STIEIC-KF202309).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yan, X.; Yang, M. A Comparative Study of Various Deep Learning Approaches to Shape Encoding of Planar Geospatial Objects. ISPRS Int. J. Geo-Inf. 2022, 11, 527. [Google Scholar] [CrossRef]
  2. Klettner, S. Why Shape Matters—On the Inherent Qualities of Geometric Shapes for Cartographic Representations. Int. J. Geo-Inf. 2019, 8, 217. [Google Scholar] [CrossRef]
  3. Wurm, M.; Schmitt, A.; Taubenböck, H. Building Types’ Classification Using Shape-Based Features and Linear Discriminant Functions. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 1901–1912. [Google Scholar] [CrossRef]
  4. Shi, Y.; Xie, X.; Fung, J.C.-H.; Ng, E. Identifying Critical Building Morphological Design Factors of Street-Level Air Pollution Dispersion in High-Density Built Environment Using Mobile Monitoring. Build. Environ. 2018, 128, 248–259. [Google Scholar] [CrossRef]
  5. Yang, M.; Kong, B.; Dang, R.; Yan, X. Classifying Urban Functional Regions by Integrating Buildings and Points-of-Interest Using a Stacking Ensemble Method. Int. J. Appl. Earth Obs. Geoinf. 2022, 108, 102753. [Google Scholar] [CrossRef]
  6. Yan, X.; Ai, T.; Yang, M.; Tong, X. Graph Convolutional Autoencoder Model for the Shape Coding and Cognition of Buildings in Maps. Int. J. Geogr. Inf. Sci. 2021, 35, 490–512. [Google Scholar] [CrossRef]
  7. Zhou, X.; Chen, Z.; Zhang, X.; Ai, T. Change Detection for Building Footprints with Different Levels of Detail Using Combined Shape and Pattern Analysis. ISPRS Int. J. Geo-Inf. 2018, 7, 406. [Google Scholar] [CrossRef]
  8. Zhang, X.; Ai, T.; Stoter, J.; Zhao, X. Data Matching of Building Polygons at Multiple Map Scales Improved by Contextual Information and Relaxation. ISPRS J. Photogramm. Remote Sens. 2014, 92, 147–163. [Google Scholar] [CrossRef]
  9. Li, W.; Goodchild, M.F.; Church, R. An Efficient Measure of Compactness for Two-Dimensional Shapes and Its Application in Regionalization Problems. Int. J. Geogr. Inf. Sci. 2013, 27, 1227–1250. [Google Scholar] [CrossRef]
  10. Maceachren, A.M. Compactness of Geographic Shape: Comparison and Evaluation of Measures. Geogr. Ann. Ser. B Hum. Geogr. 1985, 67, 53–67. [Google Scholar] [CrossRef]
  11. Basaraner, M.; Cetinkaya, S. Performance of Shape Indices and Classification Schemes for Characterising Perceptual Shape Complexity of Building Footprints in GIS. Int. J. Geogr. Inf. Sci. 2017, 31, 1952–1977. [Google Scholar]
  12. Du, S.; Zhang, F.; Zhang, X. Semantic Classification of Urban Buildings Combining VHR Image and GIS Data: An Improved Random Forest Approach. ISPRS J. Photogramm. Remote Sens. 2015, 105, 107–119. [Google Scholar] [CrossRef]
  13. Zhang, J.; Fan, B.; Li, H.; Liu, Y.; Wei, R.; Liu, S. Research on the Shape Classification Method of Rural Homesteads Based on Parcel Scale—Taking Yangdun Village as an Example. Remote Sens. 2023, 15, 4763. [Google Scholar] [CrossRef]
  14. Inglada, J. Automatic Recognition of Man-Made Objects in High Resolution Optical Remote Sensing Images by SVM Classification of Geometric Image Features. ISPRS J. Photogramm. Remote Sens. 2007, 62, 236–248. [Google Scholar] [CrossRef]
  15. Hong, S.; Lee, Y. Typification of Irregular Shaped Land Parcels Using Machine Learning. J. Archit. Inst. Korea 2022, 38, 189–198. [Google Scholar] [CrossRef]
  16. Novitasari, M.; Yaddarabullah; Permana, S.D.H.; Krishnasari, E.D. Classification of House Buildings Based on Land Size Using the K-Nearest Neighbor Algorithm. In Proceedings of the 2nd International Conference of Science and Information Technology in Smart Administration, Balikpapan, Indonesia, 20–21 October 2021. [Google Scholar]
  17. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  18. Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; Kuksa, P. Natural Language Processing (Almost) from Scratch. J. Mach. Learn. Res. 2011, 12, 2493–2537. [Google Scholar]
  19. Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image Segmentation Using Deep Learning: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 3523–3542. [Google Scholar] [CrossRef]
  20. Deng, L.; Platt, J. Ensemble deep learning for speech recognition. In Proceedings of the Fifteenth Annual Conference of the International Speech Communication Association, Singapore, 14–18 September 2014. [Google Scholar]
  21. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  22. Alidoost, F.; Arefi, H. A CNN-Based Approach for Automatic Building Detection and Recognition of Roof Types Using a Single Aerial Image. PFG—J. Photogramm. Remote Sens. Geoinf. Sci. 2018, 86, 235–248. [Google Scholar] [CrossRef]
  23. Castagno, J.; Atkins, E. Roof Shape Classification from LiDAR and Satellite Image Data Fusion Using Supervised Learning. Sensors 2018, 18, 3960. [Google Scholar] [CrossRef]
  24. Partovi, T.; Fraundorfer, F.; Azimi, S.; Marmanis, D.; Reinartz, P. Roof Type Selection Based on Patch-Based Classification Using Deep Learning for High Resolution Satellite Imagery. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, 42, 653–657. [Google Scholar] [CrossRef]
  25. Jiao, Y.; Liu, P.; Liu, A.; Liu, S. Map building shape classification method based on AlexNet. J. Geo-Inf. Sci. 2022, 24, 2333–2341. [Google Scholar]
  26. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the 5th International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
  27. Yan, X.; Ai, T.; Yang, M. A simplification of residential feature by the shape cognition and template matching method. Acta Geod. Cartogr. Sin. 2021, 50, 757–765. [Google Scholar]
  28. Liu, C.; Hu, Y.; Li, Z.; Xu, J.; Han, Z.; Guo, J. TriangleConv: A Deep Point Convolutional Network for Recognizing Building Shapes in Map Space. ISPRS Int. J. Geo-Inf. 2021, 10, 687. [Google Scholar] [CrossRef]
  29. Yu, Y.; He, K.; Wu, F.; Xu, J. Graph convolution neural network method for shape classification of areal settlements. Acta Geod. Cartogr. Sin. 2022, 51, 2390–2402. [Google Scholar]
  30. Monti, F.; Boscaini, D.; Masci, J.; Rodola, E.; Svoboda, J.; Bronstein, M.M. Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  31. Reda, K.; Kedzierski, M. Detection, Classification and Boundary Regularization of Buildings in Satellite Imagery Using Faster Edge Region Convolutional Neural Networks. Remote Sens. 2020, 12, 2240. [Google Scholar] [CrossRef]
  32. Chen, G.; Qian, H. Extracting Skeleton Lines from Building Footprints by Integration of Vector and Raster Data. ISPRS Int. J. Geo-Inf. 2022, 11, 480. [Google Scholar] [CrossRef]
  33. Rosenfeld, A.; Pfaltz, J.L. Sequential Operations in Digital Picture Processing. J. ACM 1966, 13, 471–494. [Google Scholar] [CrossRef]
  34. Rosenfeld, A.; Pfaltz, J.L. Distance Functions on Digital Pictures. Pattern Recognit. 1968, 1, 33–61. [Google Scholar] [CrossRef]
  35. Hu, H.; Liu, X.; Hu, P. Voronoi Diagram Generation on the Ellipsoidal Earth. Comput. Geosci. 2014, 73, 81–87. [Google Scholar] [CrossRef]
  36. Audebert, N.; Boulch, A.; Le Saux, B.; Lefèvre, S. Distance Transform Regression for Spatially-Aware Deep Semantic Segmentation. Comput. Vis. Image Underst. 2019, 189, 102809. [Google Scholar] [CrossRef]
  37. Papandreou, G.; Maragos, P. Multigrid Geometric Active Contour Models. IEEE Trans. Image Process. 2007, 16, 229–240. [Google Scholar] [CrossRef] [PubMed]
  38. Hu, D.; Yang, H.; Hou, X. Distance Field-Based Convolutional Neural Network for Edge Detection. Comput. Intell. Neurosci. 2022, 2022, 1712258. [Google Scholar] [CrossRef] [PubMed]
  39. Huang, X.; Lin, Z.; Jiao, Y.; Chan, M.-T.; Huang, S.; Wang, L. Two-Stage Segmentation Framework Based on Distance Transformation. Sensors 2022, 22, 250. [Google Scholar] [CrossRef] [PubMed]
  40. Wang, Y.; Wei, X.; Liu, F.; Chen, J.; Zhou, Y.; Shen, W.; Fishman, E.K.; Yuille, A.L. Deep Distance Transform for Tubular Structure Segmentation in CT Scans. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 14–19 June 2020. [Google Scholar]
  41. Yang, M.; Cheng, L.; Cao, M.; Yan, X. A Stacking Ensemble Learning Method to Classify the Patterns of Complex Road Junctions. ISPRS Int. J. Geo-Inf. 2022, 11, 523. [Google Scholar] [CrossRef]
  42. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  43. Yan, X.; Ai, T.; Zhang, X. Template Matching and Simplification Method for Building Features Based on Shape Cognition. ISPRS Int. J. Geo-Inf. 2017, 6, 250. [Google Scholar] [CrossRef]
  44. Yang, C.; Hu, H.; Hu, P.; Cao, F. Solution of Euclidean Shortest Path Problem Space with Obstacles. Geomat. Inf. Sci. Wuhan Univ. 2012, 37, 1495–1499. [Google Scholar]
Figure 1. Overall framework of proposed building shape classification method based on distance field and CNN.
Figure 2. An illustration of the building rasterization process.
Figure 3. Distance field images based on different types of distance.
Figure 4. Illustration of distance field computation using example template.
Figure 5. Raster building image after distance field computation.
Figure 6. Distance field enhancement images after fusion of distance field and building boundary.
Figure 7. Simplified illustration of InceptionV3 architecture.
Figure 8. Building shape classification accuracies of distance field enhancement under different k values.
Figure 9. The training accuracies of the model for two building images, respectively.
Figure 10. Different regions of distance field enhancement building images.
Figure 11. A partial view of the experimental area in Alaska, USA.
Figure 12. Large-scale building image samples.
Table 1. Examples of 10 types of building shape in training data. (The table shows four example raster images for each of the shape types E, F, H, I, L, O, T, U, Y, and Z; example images omitted here.)
Table 2. Classification accuracies of buildings with different shape types.

Image Type                          Accuracy (%)
                                    E      F      H      I      L      O      T      U      Y      Z      Overall
Raster building image               97.68  92.52  96.63  99.71  98.99  100    88.24  97.70  93.43  99.04  96.00
Distance field enhancement image    98.69  96.86  99.07  100    99.69  100    100    99.30  93.46  100    98.80
Table 3. Classification prediction probabilities of building images. Prediction probabilities (%) over the classes E/F/H/I/L/O/T/U/Y/Z for one example of each shape type (example images omitted); left: raster building image, right: distance field enhancement image.

Example    Raster building image (E/F/H/I/L/O/T/U/Y/Z)         Distance field enhancement image (E/F/H/I/L/O/T/U/Y/Z)
E-shape    8.43/0.55/0.01/0/0.10/0.02/90.85/0/0/0.03           99.59/0.24/0/0/0/0/0/0.17/0/0
F-shape    0.02/4.43/0/0.50/0/0/0/0/0/95.05                    0.14/98.88/0/0.85/0/0/0/0/0/0.13
H-shape    96.92/0.08/2.68/0/0/0.01/0/0.31/0/0                 0/0/100/0/0/0/0/0/0/0
I-shape    0/0/0/46.28/0.46/0/53.26/0/0/0                      0/0.11/0/99.81/0/0/0.08/0/0/0
L-shape    0/1.67/0/49.57/48.67/0/0/0/0/0                      0/0.03/0/0.12/97.32/0/1.55/0/0.98/0
O-shape    0/0/0/0/0/100/0/0/0/0                               0/0/0/0/0/100/0/0/0/0
T-shape    0/99.16/0.01/0.01/0/0/0.53/0/0.01/0.28              0/1.97/0.02/0.01/0.05/0/86.22/0/10.46/1.27
U-shape    1.08/2.42/0/0.97/81.94/0.10/0.06/12.56/0.01/0.85    0/0/0/0/0/0/0/100/0/0
Y-shape    0/0/0/0/0.01/0/99.88/0/0.10/0.01                    0/0/0/0/0/0/0/0/100/0
Z-shape    0/0.06/0/0.06/78.91/0.04/0.14/6.26/0.01/14.52       0/0/0/0/0.02/0/0/0/0/99.97
Table 4. Comparison of building shape classification accuracies under templates of different size.

Template Size   Accuracy (%)
                E      F      H      I      L      O      T      U      Y      Z      Overall
7 × 7           100    95.41  100    100    97.98  100    97.09  100    91.92  97.20  97.90
13 × 13         98.69  96.86  99.07  100    99.69  100    100    99.30  93.46  100    98.80
15 × 15         99.05  95.00  99.11  99.15  98.91  100    97.83  99.07  95.24  100    98.40
Table 5. Classified accuracies based on distance field enhancement in different regions.

Image Type                           Accuracy (%)
                                     E      F      H      I      L      O      T      U      Y      Z      Overall
Raster image                         97.68  92.52  96.63  99.71  98.99  100    88.24  97.70  93.43  99.04  96.00
Internal enhancement image           98.69  96.86  99.07  100    99.69  100    100    99.30  93.46  100    98.80
External enhancement image           98.11  95.05  95.56  100    99.00  100    100    97.14  93.52  100    97.80
Internal–external enhancement image  100    98.02  99.12  100    94.78  100    86.60  96.04  92.05  96.80  94.40
Table 6. Large-scale buildings classification accuracy.

Image Type                          Accuracy (%)
                                    E      F      H      I      L      O      T      U      Y      Z      Overall
Raster building image               84.00  84.00  86.50  100    86.50  98.50  82.00  89.50  85.00  97.00  89.10
Distance field enhancement image    91.00  93.00  92.00  100    93.50  99.00  91.50  92.00  89.50  98.50  94.00
Table 7. Comparison of building shape classification accuracy between our method and AlexNet-based method.

Shape Classification Method            Accuracy (%)
                                       L      O      T
AlexNet [25]                           92.58  89.56  87.50
InceptionV3 based on distance field    93.50  99.00  91.50