PCL 的欧式距离聚类
(感谢前辈) 转自:https://zhuanlan.zhihu.com/p/75117664
聚类代码如下:
from paper_1_v0.my_ransac import my_ransac_v5
import numpy as npimg_id = 1 # 这里读入你的kitti 雷达数据即可
path = r'D:\KITTI\Object\training\velodyne\%06d.bin' % img_id ## Path ## need to be changed
points = np.fromfile(path, dtype=np.float32).reshape(-1, 4)index_ground, index_obj, best_model, alpha = my_ransac_v5(points) # 去除地面,用pcl 的代码也可以,不做也可以print('points shape', points[index_obj, :3].shape)s = time.time()
cloud = pcl.PointCloud(points[index_obj, :3])tree = cloud.make_kdtree()ec = cloud.make_EuclideanClusterExtraction()
# ec.set_ClusterTolerance(0.02)
ec.set_ClusterTolerance(0.62) # 三个参数设置
ec.set_MinClusterSize(100)
ec.set_MaxClusterSize(15000)
ec.set_SearchMethod(tree)
cluster_indices = ec.Extract()print('time:', time.time() - s)print('cluster_indices : ' + str( len(cluster_indices)) + " .")cluster_points = points[index_obj,:].copy()
cluster_points[:,3] = 1000 # 基础颜色for j, indices in enumerate(cluster_indices):print('indices = ' + str(len(indices)))cluster_points[indices,3] = (j+10) * 4000 # 设置每个类的颜色color_cloud = pcl.PointCloud_PointXYZRGB(cluster_points)
visual = pcl.pcl_visualization.CloudViewing()
visual.ShowColorCloud(color_cloud, b'cloud')flag = True
while flag:flag != visual.WasStopped()
能设置的3个参数含义是什么?
tolerance 是设置 kdtree 的近邻搜索的搜索半径,从实验结果来看,tolerance 越大,检测到的范围也就越大; 同时如果搜索半径取一个非常小的值,那么一个实际的对象就会被分割为多个聚类;如果将值设置得太高,那么多个对象就会被分割为一个聚类;
setMinClusterSize()来限制一个聚类最少需要的点数目;
setMaXClusterSize()来限制最多需要的点数目;这两个不要解释;
这个方法底层又是怎么聚类的?
欧几里德算法具体的实现方法大致是:
1 找到空间中某点p10,有kdTree找到离他最近的n个点,判断这n个点到p的距离。将距离小于阈值r的点p12,p13,p14…放在类Q里
2 在 Q(p10) 里找到一点p12,重复1
3 在 Q(p10,p12) 找到一点,重复1,找到p22,p23,p24…全部放进Q里
4 当 Q 再也不能有新点加入了,则完成搜索了;
看 PCL的代码,过程很简单,也能看出这个过程:
pcl::extractEuclideanClusters (const PointCloud<PointT> &cloud,const typename search::Search<PointT>::Ptr &tree,float tolerance, std::vector<PointIndices> &clusters,unsigned int min_pts_per_cluster,unsigned int max_pts_per_cluster)
{if (tree->getInputCloud ()->points.size () != cloud.points.size ()) // 点数量检查{PCL_ERROR ("[pcl::extractEuclideanClusters] Tree built for a different point cloud dataset (%lu) than the input cloud (%lu)!\n", tree->getInputCloud ()->points.size (), cloud.points.size ());return;}// Check if the tree is sorted -- if it is we don't need to check the first elementint nn_start_idx = tree->getSortedResults () ? 1 : 0;// Create a bool vector of processed point indices, and initialize it to falsestd::vector<bool> processed (cloud.points.size (), false);std::vector<int> nn_indices;std::vector<float> nn_distances; // 定义需要的变量// Process all points in the indices vectorfor (int i = 0; i < static_cast<int> (cloud.points.size ()); ++i) //遍历点云中的每一个点{if (processed[i]) //如果该点已经处理则跳过continue;std::vector<int> seed_queue; //定义一个种子队列int sq_idx = 0;seed_queue.push_back (i); //加入一个种子processed[i] = true;while (sq_idx < static_cast<int> (seed_queue.size ())) //遍历每一个种子{// Search for sq_idx kdtree 树的近邻搜索if (!tree->radiusSearch (seed_queue[sq_idx], tolerance, nn_indices, nn_distances)) {sq_idx++;continue; //没找到近邻点就继续}for (size_t j = nn_start_idx; j < nn_indices.size (); ++j) // can't assume sorted (default isn't!){if (nn_indices[j] == -1 || processed[nn_indices[j]]) // Has this point been processed before ?continue; // 种子点的近邻点中如果已经处理就跳出此次循环继续// Perform a simple Euclidean clusteringseed_queue.push_back (nn_indices[j]); //将此种子点的临近点作为新的种子点。入队操作processed[nn_indices[j]] = true; // 该点已经处理,打标签}sq_idx++;}// If this queue is satisfactory, add to the clusters 最大点数和最小点数的类过滤if (seed_queue.size () >= min_pts_per_cluster && seed_queue.size () <= max_pts_per_cluster){pcl::PointIndices r;r.indices.resize (seed_queue.size ());for (size_t j = 0; j < seed_queue.size (); ++j)r.indices[j] = seed_queue[j];// These two lines should not be needed: (can anyone confirm?) -FFstd::sort (r.indices.begin (), r.indices.end ());r.indices.erase (std::unique (r.indices.begin (), r.indices.end ()), r.indices.end ());r.header = cloud.header;clusters.push_back (r); // We could avoid a copy by working directly in the vector}}
}
为什么tolerance 比较小的时候,有很多比较远的点都分为了一类,直接没有进入聚类呢?
因为被点数限制过滤掉了,如果把最小点数设置为10,会发现全部都出来了;
看下不同参数时候的聚类结果:
前面的描述分别是 Tolerance 参数, 聚类花费的时间(约36800个点,去除地面后),聚类的结果类别个数;
Tolerance 0.05 104ms 17Cluster
Tolerance 0.1 223ms 19Cluster
Tolerance 0.2 25cluster 190ms
Tolerance 0.3 35cluster 245ms
Tolerance 0.4 317ms 44cluster
Tolerance 0.5 464ms 51cluster
Tolerance 0.6 615ms 51Cluster
Tolerance 0.65 508ms 48Cluster
Tolerance 0.82 508ms 47Cluster
最小点数设置为 10:
小结下,搜索距离半径设置为0.5-0.8之间得到的结果是比较好的,物体分割的比较好,又没有太粘连在一起;
而最小点数的限制则需要根据点云数据的疏密程度决定,它极大的决定了最后结果的类别数量;