Ground Truth Report : Downtown San Francisco

The RoadMentor ground truth report is a detailed analysis of data collected on a drive through downtown San Francisco. The dataset includes base map tiles, ROSBAG files, camera images, LiDAR scans, GPS and IMU data, and the fused vehicle trajectory. It was collected with an Ouster OS-2 128-beam LiDAR, three 1080p Sony IMX 291 cameras, a U-Blox 7 ZedF9P GNSS sensor, a Vector VN-100 IMU, and an NVIDIA DRIVE ORIN platform, all mounted on a Mercedes GL450 data collection vehicle.

The report compares the raw GPS data (the input signal) with the post-processed ground truth data (the corrected signal) to quantify telemetry drift and localization error. The ground truth data reflects the vehicle’s actual direction of travel and is created by mapping and ground-truthing the trip data frame by frame.

The analysis showed that the raw GPS position was off by 5 to 40 meters laterally, -20 to 90 meters longitudinally, and 2 to 60 meters vertically. Heading (yaw) error ranged from -10 to 190 degrees, while roll and pitch error was approximately 1 degree.
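To make the lateral, longitudinal, and vertical terms concrete, here is a minimal sketch of how a single GPS fix could be compared with its ground truth pose. It is not taken from the RoadMentor pipeline. The function name `decompose_error`, the shared local east-north-up frame in meters, and the yaw convention (counter-clockwise from east) are all assumptions made for illustration.

```python
import math

def decompose_error(gps_enu, truth_enu, truth_yaw_deg):
    """Split the GPS-minus-truth offset into vehicle-frame components.

    gps_enu, truth_enu: (east, north, up) in meters, in the same local
        frame (assumed; the report does not state the frame used).
    truth_yaw_deg: ground truth heading in degrees, measured
        counter-clockwise from east (assumed convention).
    Returns (longitudinal, lateral, vertical) in meters, where lateral
    is positive to the left of the direction of travel.
    """
    d_east = gps_enu[0] - truth_enu[0]
    d_north = gps_enu[1] - truth_enu[1]
    d_up = gps_enu[2] - truth_enu[2]

    yaw = math.radians(truth_yaw_deg)
    # Unit vectors of the vehicle's forward and left axes in the east-north plane.
    forward = (math.cos(yaw), math.sin(yaw))
    left = (-math.sin(yaw), math.cos(yaw))

    # Project the horizontal offset onto each axis.
    longitudinal = d_east * forward[0] + d_north * forward[1]
    lateral = d_east * left[0] + d_north * left[1]
    return longitudinal, lateral, d_up


# Hypothetical example: vehicle heading north, GPS fix displaced from truth.
lon_err, lat_err, vert_err = decompose_error(
    gps_enu=(13.0, 24.0, 7.0), truth_enu=(10.0, 20.0, 5.0), truth_yaw_deg=90.0
)
```

Running this per frame over a whole drive and taking the minimum and maximum of each component would produce ranges of the kind quoted above.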

The 6 DoF SBET (smoothed best estimate of trajectory) ground truth data is produced by a localization pipeline that combines LiDAR and camera data to localize the vehicle with high accuracy. The process begins when disengagement data is uploaded to the RoadMentor web portal, which triggers the localization and ground-truthing pipeline. The data is then normalized for compatibility and to accommodate changes in system calibration and configuration settings.

Next, the image data from each camera is stitched together to create an RGB-D dataset, which is then compared with the LiDAR data. Stitching yields a virtual sensor whose pose is the centroid of the 4×4 transformation matrices of the individual cameras; the LiDAR has its own 4×4 transformation matrix relative to the origin of the vehicle frame. The resulting 3D structure is projected into a top-down, bird’s-eye view, which is then combined with structure from motion and global pose corrections to estimate the 6 DoF pose through sensor fusion.
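One way to read "centroid of the 4×4 transformation matrices" is sketched below. The report does not say how the centroid is computed, so this is an assumption: translations are averaged directly, and rotations are averaged with a chordal mean (the summed rotation matrices are projected back onto a valid rotation via SVD). The names `yaw_transform` and `virtual_sensor_pose` and the example camera mounting positions are hypothetical.

```python
import numpy as np

def yaw_transform(yaw_deg, translation):
    """Build a 4x4 camera-to-vehicle transform from a yaw angle and a translation."""
    c, s = np.cos(np.radians(yaw_deg)), np.sin(np.radians(yaw_deg))
    transform = np.eye(4)
    transform[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    transform[:3, 3] = translation
    return transform

def virtual_sensor_pose(camera_extrinsics):
    """Average several 4x4 transforms into one virtual-sensor pose.

    Translation: arithmetic mean of the camera origins.
    Rotation: chordal mean, i.e. the rotation closest (in the Frobenius
    sense) to the sum of the input rotations. This is ill-defined if the
    cameras point in nearly opposite directions.
    """
    mats = np.asarray(camera_extrinsics, dtype=float)  # shape (N, 4, 4)
    t_mean = mats[:, :3, 3].mean(axis=0)

    r_sum = mats[:, :3, :3].sum(axis=0)
    u, _, vt = np.linalg.svd(r_sum)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(u @ vt))
    r_mean = u @ np.diag([1.0, 1.0, d]) @ vt

    pose = np.eye(4)
    pose[:3, :3] = r_mean
    pose[:3, 3] = t_mean
    return pose


# Hypothetical rig: three cameras fanned out at -30, 0 and +30 degrees of yaw.
cameras = [
    yaw_transform(-30.0, [1.5, -0.5, 1.2]),
    yaw_transform(0.0, [1.8, 0.0, 1.2]),
    yaw_transform(30.0, [1.5, 0.5, 1.2]),
]
virtual_pose = virtual_sensor_pose(cameras)
```

With this symmetric rig the virtual sensor ends up facing straight ahead at the mean camera position. The LiDAR's own 4×4 matrix can then be related to it with an ordinary matrix product, for example `np.linalg.inv(virtual_pose) @ lidar_pose`, where `lidar_pose` is likewise a hypothetical name.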

This methodology provides a highly accurate localization of the vehicle, with lateral and longitudinal errors of less than 20 cm. It is an important step in ensuring the safety and reliability of autonomous vehicles, as accurate localization is crucial for the successful navigation of these systems.

You can download the full report here:
