Effortless 2D-Guided, 3D Gaussian Segmentation: Gaussian Clustering and Filtering

5 Jun 2024


(1) Kun Lan, University of Science and Technology of China;

(2) Haoran Li, University of Science and Technology of China;

(3) Haolin Shi, University of Science and Technology of China;

(4) Wenjun Wu, University of Science and Technology of China;

(5) Yong Liao, University of Science and Technology of China;

(6) Lin Wang, AI Thrust, HKUST(GZ);

(7) Pengyuan Zhou, University of Science and Technology of China.

Abstract and 1. Introduction

2. Related Work

3. Method and 3.1. Point-Based rendering and Semantic Information Learning

3.2. Gaussian Clustering and 3.3. Gaussian Filtering

4. Experiment

4.1. Setups, 4.2. Result and 4.3. Ablations

5. Conclusion and References

3.2. Gaussian Clustering

During experiments, we observed that employing 2D segmentation maps as the sole guide for learning 3D semantic information may lead to inaccuracies in the semantic information of some 3D Gaussians. These inaccuracies manifest either as 3D Gaussians approximating an initial state of uniform distribution across all categories or as exhibiting similar probabilities in a limited number of categories. To address this issue, and considering that objects are continuously distributed in space, we posit that each 3D Gaussian should typically be classified within the same category as other 3D Gaussians located within a certain proximity.

To remedy the inaccuracies in semantic information, we draw on the KNN clustering algorithm. For a 3D scene with pre-learned semantic information, we first retrieve the object code, denoted as o, of each 3D Gaussian used to represent the scene. These codes are then passed through a softmax to obtain the probability distribution of each 3D Gaussian across the categories. 3D Gaussians whose maximum probability satisfies max(softmax(o)) < β are selected. Finally, we feed the object codes of these selected 3D Gaussians, along with their center coordinates, into KNN for clustering. For a query 3D Gaussian, we calculate its distance from the surrounding 3D Gaussians and select the k closest ones; the object code of the query Gaussian is then set to the mean of these k 3D Gaussians’ object codes.
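The refinement step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `refine_object_codes`, the default values of `beta` and `k`, the brute-force neighbor search, the choice to measure distance on center coordinates only, and the choice to draw neighbors from all Gaussians (rather than only the selected ones) are all assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def refine_object_codes(centers, codes, beta=0.6, k=3):
    """Return a copy of `codes` in which every low-confidence Gaussian
    (max(softmax(o)) < beta) takes the mean object code of its k nearest
    neighbors. Hypothetical helper; beta and k are illustrative defaults."""
    refined = [list(code) for code in codes]
    for i, (center, code) in enumerate(zip(centers, codes)):
        if max(softmax(code)) >= beta:
            continue  # confident Gaussian: keep its object code unchanged
        # Brute-force search: sort all other Gaussians by Euclidean distance
        # between centers and keep the k closest (assumption: centers only).
        neighbors = sorted(
            (j for j in range(len(centers)) if j != i),
            key=lambda j: math.dist(center, centers[j]),
        )[:k]
        if not neighbors:
            continue  # a single Gaussian has no neighbors to average
        # Average the neighbors' original (unrefined) codes per dimension.
        refined[i] = [
            sum(codes[j][d] for j in neighbors) / len(neighbors)
            for d in range(len(code))
        ]
    return refined


# Toy scene: three confident Gaussians of category 0, one ambiguous
# Gaussian sitting among them, and one distant Gaussian of category 1.
centers = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (0.05, 0.05, 0), (5, 5, 5)]
codes = [[4, 0], [4, 0], [4, 0], [0.1, 0], [0, 4]]
refined = refine_object_codes(centers, codes, beta=0.6, k=3)
print(refined[3])  # the ambiguous Gaussian adopts its neighbors' mean code
```

The brute-force search is quadratic in the number of Gaussians and is only meant to make the logic readable; a real scene would likely need a spatial index or a batched nearest-neighbor routine.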

Fig. 2. The qualitative results of our method. The first and second columns are the original image and the foreground object obtained from the interactive segmentation model, respectively. The third and fourth columns are renderings of the 3D segmentation effect obtained by our method from the test viewpoint and a new viewpoint. The last column is the result of converting the different categories belonging to the 3D Gaussian into RGB values.

3.3. Gaussian Filtering

During experiments, we also found that after 3D semantic information learning and Gaussian clustering, some 3D Gaussians not belonging to the object intended for segmentation were incorrectly segmented out. We observed that these erroneously segmented 3D Gaussians are spatially distant from the rest of the segmented 3D Gaussians, as shown in Fig. 4(a). Therefore, we employ a statistical filtering algorithm similar to that used in point cloud segmentation to solve this problem. For each segmented Gaussian, we calculate its average distance D from the neighboring 3D Gaussians. Then, we compute the mean µ and variance σ of these average distances. Finally, we remove from the current segmentation results those 3D Gaussians whose average distance satisfies D > µ + σ.
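The filtering rule D > µ + σ can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function name `filter_segmented`, the neighborhood size `k`, and the brute-force distance computation are assumptions. The sketch also takes σ to be the population standard deviation of the average distances, so that µ + σ stays in distance units; the text above calls σ the variance, so this interpretation is an assumption as well.

```python
import math
import statistics


def filter_segmented(centers, k=2):
    """Return the indices of segmented Gaussians that survive statistical
    outlier filtering. Hypothetical helper; k is an illustrative default."""
    n = len(centers)
    if n <= 1:
        return list(range(n))  # nothing to compare against
    avg_dists = []
    for i, center in enumerate(centers):
        # Distances from this Gaussian's center to every other center.
        dists = sorted(
            math.dist(center, centers[j]) for j in range(n) if j != i
        )
        nearest = dists[:k]
        # D: average distance to the k nearest neighbors.
        avg_dists.append(sum(nearest) / len(nearest))
    mu = statistics.fmean(avg_dists)
    # Assumption: sigma is the standard deviation, not the variance.
    sigma = statistics.pstdev(avg_dists)
    # Keep Gaussians with D <= mu + sigma; the rest are treated as outliers.
    return [i for i, d in enumerate(avg_dists) if d <= mu + sigma]


# Toy example: five Gaussians forming a compact object plus one stray
# Gaussian far away that was wrongly included in the segmentation.
centers = [
    (0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0.5, 0.5, 0),
    (10, 10, 10),
]
print(filter_segmented(centers, k=2))  # → [0, 1, 2, 3, 4]
```

In this toy example the stray Gaussian has an average neighbor distance of roughly 16.5, well above µ + σ (about 9.3), so it is dropped while the compact group is kept.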

Fig. 3. Comparison results. (a) is the GT Mask we used to guide the segmentation of 3D Gaussians, (b) is the result of ISRF (from the original paper), and (c) is our result.

This paper is available on arXiv under the CC 4.0 license.