3D perception for robotics

Hello everyone! Today I would like to go with you through the steps of filtering a camera image, clustering for segmentation and doing object recognition, all that for the famous PR2 robot. We will use the Point Cloud Library (PCL) to help us with that along with Gazebo, RViz and Python.

In the following sections, I will introduce each mention aspect in detail.

The PR2 robot

The PR2 is an open and robust robot platform designed from the ground up for software developers and researchers. By eliminating the need to first build a hardware system and then re-implement code, the PR2 allows software experts to immediately create new functionality on the robot.

The PR2 Robot is fully integrated with ROS, providing the power of all the ROS developer tools and out-of-the-box functionality for everything from full system calibration to manipulation.

To allow development without a physical PR2, we provide a simulator, with tutorials to get you started. The ROS interfaces to the simulator and to the real robot are the same, so code written for one will generally run on the other.


Calibration, filtering and segmentation

The first step was to create a Point Cloud from the RGBD camera mounted on the PR2 robot. The data was received from the /pr2/world/points topic in ROS, which contained noise (the data wasn’t clear). I had to calibrate and filter the data, to get a satisfying result.

Here are the techniques I used...

Statistical Outlier Filtering

The Statistical Outlier Filter performs a statistical analysis in the neighborhood of each point, and remove those points which do not meet a certain criteria. By assuming a Gaussian distribution, all points whose mean distances are outside of an interval defined by the global distances mean+standard deviation are considered to be outliers and removed from the point cloud.

def statistical_outlier_filtering(data, k=20, thresh=0.5):
    sof = data.make_statistical_outlier_filter()
    return sof.filter()

Voxel Grid Downsampling

A voxel grid filter allows you to downsample the data by taking a spatial average of the points in the cloud confined by each voxel. You can adjust the sampling size by setting the voxel size along each dimension. The set of points which lie within the bounds of a voxel are assigned to that voxel and statistically combined into one output point. The resulting Point Cloud is smaller.

def voxel_grid_downsampling(data, leaf_size=0.01):
    vox = data.make_voxel_grid_filter()
    vox.set_leaf_size(leaf_size, leaf_size, leaf_size)
    return vox.filter()

Pass Through Filtering

The Pass Through Filter works much like a cropping tool, which allows you to crop any given 3D point cloud by specifying an axis with cut-off values along that axis. The region you allow to pass through, is often referred to as region of interest.

def passthrough_filter(data, axis, min, max):
    passthrough = data.make_passthrough_filter()
    passthrough_z.set_filter_limits(min, max)
    return passthrough.filter()

In the project, the passthrough filter was used twice. First in the z axis, and next in the x axis (to take care of the place boxes).

pcl_data = passthrough_filter(pcl_data, 'z', 0.6, 1.1)
pcl_data = passthrough_filter(pcl_data, 'x', 0.3, 1.0) # Remove boxes

RANSAC algorithm

The RANSAC algorithm assumes that all of the data in a dataset is composed of both inliers and outliers, where inliers can be defined by a particular model with a specific set of parameters, while outliers do not fit that model and hence can be discarded. Like in the example below, we can extract the outliners that are not good fits for the model.

In our case inliners are the objects placed on the table, while outliners include the table and everything else.

def ransac_segmentation(data, max_distance=0.01):
    ransac = data.make_segmenter()
    return ransac.segment()

# Getting the objects and table example
inliners, coefficients = ransac_segmentation(data)
objects = data.extract(inliners, negative=True) # Please notice the negative value
table = data.extract(inliners, negative=False) # And here... :)

Clustering for segmentation

Once we get the Point Clouds associated with objects, we can begin the clustering and segmentation process. We need this, to be able to recognize objects placed on the table.

The k-d tree data structure is used in the Euclidian Clustering algorithm to decrease the computational burden of searching for neighboring points. While other efficient algorithms/data structures for nearest neighbor search exist, PCL’s Euclidian Clustering algorithm only supports k-d trees.

def euclidean_clustering(white_cloud, tolerance=0.01, min=30, max=5000):
    tree = white_cloud.make_kdtree()
    ec = white_cloud.make_EuclideanClusterExtraction()
    return ec.Extract()

I found out that using the above values presented good results for clustering even the smallest objects (for example glue in the 3rd world scene).

Once we have that going, we can visualize the cluster in a way, that is readable to the human eye:

def get_cluster_cloud(cluster_indices, white_cloud):
    cluster_color = get_color_list(len(cluster_indices))
    color_cluster_point_list = []

    for j, indices in enumerate(cluster_indices):
        for i, indice in enumerate(indices):
    cluster_cloud = pcl.PointCloud_PointXYZRGB()
    return cluster_cloud

Object recognition

Before doing any actual object recognition, I had to create a labeled dataset first. I have used the capture_features.py script with 100 iterations for each object to get a good feature list. The result was saved into training_set.sav.

Features in Training Set: 800
Invalid Features in Training set: 0
Scores: [ 0.96875  0.9875   0.975    0.98125  0.96875]
Accuracy: 0.98 (+/- 0.01)
accuracy score: 0.97625

Next I have run the train_svm.py script. SVMs work by applying an iterative method to a training dataset, where each item in the training set is characterized by a feature vector and a label. In the image above, each point is characterized by just two features, A and B. The color of each point corresponds to its label, or which class of object it represents in the dataset. Below you can see the Confusion Matrix with and without normalization.

Sending the yaml file

To communicate with the robot, we will generate yaml files, that contain the following:

Here is the code needed for creating the yaml file responsible for robot movement:

# These variables will be used by all objects
test_scene_num = Int32()
test_scene_num.data = SCENE_NUMBER
output = []

# Get object list and dropbox parameters from config files
object_list_param = rospy.get_param('/object_list')
dropbox_list_param = rospy.get_param('/dropbox')

# Loop through all the objects we detected previously
for object in object_list:

    # Create response parameters
    object_name = String()
    arm = String()
    pick_pose = Pose()
    place_pose = Pose()

    # Get object name
    object_name.data = str(object.label)

    # Get the centroid of the object cloud using numpy
    cloud = ros_to_pcl(object.cloud).to_array()
    x, y, z = np.mean(cloud, axis=0)[:3]
    pick_pose.position.x = np.asscalar(x)
    pick_pose.position.y = np.asscalar(y)
    pick_pose.position.z = np.asscalar(z)

    # Get to what group the item belongs
    target_group = None
    for param in object_list_param:
        if param['name'] == object_name.data:
            target_group = param['group']

    # Check for box information
    for param in dropbox_list_param:
        if param['group'] == target_group:
            arm.data = param['name']
            x, y, z = param['position']
            place_pose.position.x = np.float(x)
            place_pose.position.y = np.float(y)
            place_pose.position.z = np.float(z)

    # Create yaml for object using helper method
    dict = make_yaml_dict(test_scene_num, arm, object_name, pick_pose, place_pose)

    # Add yaml to output

    # Send data to the PR2 robot
        pick_place_routine = rospy.ServiceProxy('pick_place_routine', PickPlace)
        # Pass the data to the response
        resp = pick_place_routine(

        print ("Response: ", resp.success)
    except rospy.ServiceException, e:
        print "Service call failed: %s" % e


The presented methods allowed me to successfully recognize the objects in 3 test worlds. I had to change the calibration values for filters from the exercises, because they weren’t giving me satisfying results - mostly because of the noise present in the PR2 /pr2/world/points topic.

It amazed me, that I had to do so many feature runs (100 iterations), before I got good results that allowed recognition of items. With previous try, which was 50 iterations, some items weren’t recognized at all.

Also Test World 1 required a higher value for the Euclidean Clustering minimum cluster size - 100 instead of 30 - compared to Test World 2 and 3.

What can be improved in future work:

Tell me what you think!