PhD Defence Yaping Lin | Deep learning for semantic segmentation of airborne laser scanning point clouds

Deep learning for semantic segmentation of airborne laser scanning point clouds

The PhD defence of Yaping Lin will take place (partly) online and can be followed by a live stream.

Yaping Lin is a PhD student in the Department of Earth Observation Science. The supervisor is prof.dr.ir. M.G. Vosselman and the co-supervisor is Dr.habil. Y. Yang from the Faculty of Geo-Information Science and Earth Observation.

Airborne laser scanning (ALS) data are an essential source for generating digital terrain models (DTMs), 3D city models, landscape models and high-precision maps. Semantic segmentation, which assigns a semantic label to every point of an ALS point cloud, is important when generating 3D products that contain multiple categories and require detailed object geometry. Motivated by the top performance of deep learning algorithms on scene understanding tasks, this Ph.D. thesis investigates the semantic segmentation of ALS point clouds based on deep learning algorithms. We first explore how to learn representative features from ALS point clouds (Chapter 2). Then we focus on how to reduce the manual labelling effort needed to train a deep learning model for semantic segmentation. We investigate active learning (Chapter 3) to select and annotate informative points, and weak supervision (Chapter 4) to annotate only weak labels for the pointwise prediction task.

To allow deep learning networks to learn representative features from ALS data and to involve different levels of neighbouring information when extracting pointwise geometrical features, we first designed a local feature extractor and then explored contextual information at both the object and global levels. The proposed LGEnet uses both 2D and 3D convolutions as local encoders. The combination of 2D and 3D convolutions enables the network to learn more discriminative features for elongated objects distributed on horizontal planes. Furthermore, contextual information is explored at the object level through the proposed segment-based Edge Conditioned Convolution (SegECC), where graphs are constructed among segments. A spatial-channel attention module is then placed at the end of the network. The spatial attention models the global interdependencies between points by calculating the pairwise correlations between all points within an input sphere. The channel attention estimates the similarities between channels, aiming to enhance the learning of class-specific discriminative features.
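The spatial and channel attention described above can be illustrated with a minimal sketch. It assumes a common dual-attention formulation (softmax over pairwise dot products) and is not the LGEnet implementation; the function names `spatial_attention` and `channel_attention`, and the absence of learned projections or residual connections, are simplifications made here for illustration.

```python
import math

def softmax(row):
    """Numerically stable softmax of a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(matrix):
    return [list(r) for r in zip(*matrix)]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def spatial_attention(feats):
    """feats: N points x C channels.
    Each point is re-expressed as a weighted sum of all points in the input
    sphere, with weights from pairwise feature correlations (assumed form)."""
    weights = [softmax(r) for r in matmul(feats, transpose(feats))]  # N x N
    return matmul(weights, feats), weights

def channel_attention(feats):
    """feats: N points x C channels.
    Each channel is re-expressed as a weighted sum of all channels, with
    weights from channel-to-channel similarities (assumed form)."""
    weights = [softmax(r) for r in matmul(transpose(feats), feats)]  # C x C
    return matmul(feats, transpose(weights)), weights

# Hypothetical toy input: three points with two feature channels each.
toy_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spatial_out, spatial_w = spatial_attention(toy_feats)
channel_out, channel_w = channel_attention(toy_feats)
```

Note that the spatial weight matrix grows quadratically with the number of points, which is one reason such attention is computed per input sphere rather than over a whole point cloud.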

To reduce the annotation effort required for training deep learning models, we propose an active and incremental learning framework for semantic segmentation of ALS point clouds. In this framework, we iteratively select point cloud tiles from the unlabelled pool and then incrementally enrich the model knowledge. For the selection criteria, we implement two data-dependent uncertainty metrics (point entropy and segment entropy) and one model-dependent metric (mutual information). Our proposed segment entropy estimates the semantic heterogeneity within geometrically homogeneous units, and it achieves the best performance on the ALS datasets. Instead of training from scratch in each iteration, we fine-tune the previous network on the enlarged labelled dataset, which significantly reduces the training time.
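The three selection metrics can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: segment entropy is taken here to be the entropy of the predicted-label histogram inside each segment, and mutual information is computed in the usual way from several stochastic forward passes (for example with dropout enabled). All function names are hypothetical.

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def point_entropy(point_probs):
    """Entropy of the predicted class probabilities, one value per point."""
    return [entropy(p) for p in point_probs]

def segment_entropy(point_probs, segment_ids):
    """Assumed definition: entropy of the predicted labels within each
    geometric segment. A segment whose points all receive the same label
    scores 0; a segment with mixed labels scores high."""
    labels_per_segment = {}
    for probs, seg in zip(point_probs, segment_ids):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        labels_per_segment.setdefault(seg, []).append(predicted)
    result = {}
    for seg, labels in labels_per_segment.items():
        counts = Counter(labels)
        result[seg] = entropy([c / len(labels) for c in counts.values()])
    return result

def mutual_information(mc_probs):
    """mc_probs: class probabilities for ONE point from several stochastic
    forward passes. Returns entropy of the mean prediction minus the mean
    entropy of the individual predictions."""
    n = len(mc_probs)
    mean = [sum(col) / n for col in zip(*mc_probs)]
    return entropy(mean) - sum(entropy(p) for p in mc_probs) / n

def select_tiles(tile_scores, budget):
    """Pick the `budget` tiles with the highest uncertainty score."""
    ranked = sorted(tile_scores, key=tile_scores.get, reverse=True)
    return ranked[:budget]
```

In an active learning loop, a per-tile score (for instance the mean of one of these metrics over the tile) would be passed to `select_tiles`, the chosen tiles would be annotated, and the previous network would be fine-tuned on the enlarged labelled set.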

We also investigate how to reduce the annotation effort for deep learning training by using weak subcloud labels instead of pointwise ground truth for ALS datasets. The first step is to train a classification network with weak subcloud labels, after which the trained classification network produces pointwise pseudo labels on the training data. The second step is to use these pseudo labels to train a segmentation network, which then produces predictions on the testing data. The performance of the classification network is boosted by an overlap region loss and an elevation attention. The overlap region loss provides more localization cues to the classification network, and the elevation attention allows the classification network to learn more representative features from ALS data. For the training of the segmentation network, we use a supervised contrastive loss that uncovers the underlying correlations of class-specific features. This loss allows the segmentation network to effectively learn more representative class-specific features from inaccurate pointwise pseudo labels. The benefit is greatest when the network dynamically updates the labels of those points whose pseudo labels from the classification network have low confidence.
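The two ideas in the second step can be sketched in a few lines. The loss below follows the widely used supervised contrastive formulation (normalised embeddings, temperature-scaled similarities, same-label points as positives); whether the thesis uses exactly this form is an assumption. The label update rule, the `threshold` parameter and both function names are hypothetical and only illustrate the idea of replacing low-confidence pseudo labels.

```python
import math

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over all anchors that have at least
    one other point with the same (pseudo) label."""
    normed = []
    for e in embeddings:
        norm = math.sqrt(sum(v * v for v in e)) or 1.0
        normed.append([v / norm for v in e])
    n = len(normed)
    losses = []
    for i in range(n):
        # Temperature-scaled cosine similarity to every other point.
        sims = {j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
                for j in range(n) if j != i}
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue
        m = max(sims.values())
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        losses.append(-sum(sims[j] - log_denominator for j in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0

def update_pseudo_labels(pseudo_labels, confidences, predictions, threshold=0.5):
    """Hypothetical update rule: keep confident pseudo labels from the
    classification network, replace the rest with the segmentation
    network's current prediction."""
    return [pred if conf < threshold else label
            for label, conf, pred in zip(pseudo_labels, confidences, predictions)]

# Toy check: embeddings that cluster by label give a lower loss.
toy_embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
consistent = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
inconsistent = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

The loss is low when points sharing a label lie close together in feature space and high when same-label points are scattered, which is what pulls class-specific features together even when some pseudo labels are wrong.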

In conclusion, this thesis investigates deep learning algorithms for semantic segmentation of ALS point clouds. One network is proposed to extract representative features from ALS data, and two methods are proposed to reduce the annotation effort for training semantic segmentation networks. In the future, the use of self-attention mechanisms as feature extractors could be explored further. In addition, unsupervised pre-training and domain adaptation are possible ways to reduce the annotation effort required for training semantic segmentation networks.