Characterizing Visual Localization and Mapping Datasets

Overview

Benchmarking mapping and motion estimation algorithms is established practice in robotics and computer vision. As datasets grow more diverse in their trajectories, models, and scenes, selecting datasets for a given benchmarking purpose becomes a challenge. This paper addresses this concern by developing novel metrics that evaluate trajectories and environments without relying on any SLAM or motion estimation algorithm. The metrics, which have so far been missing in the research community, can be applied to the plethora of existing datasets. The proposed metric is general and can be used for other purposes. To demonstrate its effectiveness in robotic applications, it has also been applied to run-time SLAM adaptation. Additionally, to improve robotics SLAM benchmarking, the paper presents a new dataset for visual localization and mapping algorithms. A broad range of real-world trajectories is combined with very high-quality scenes and a rendering framework to create a set of synthetic datasets, with ground-truth trajectories and dense maps, that are representative of key SLAM applications such as virtual reality (VR), micro aerial vehicle (MAV) flight, and ground robotics.

Interior Scenes



In this work, we render ground-truth data on two high-resolution scenes provided by kujiale.com. Scene Diamond (left; 3,539,795 vertices and 6,951,971 faces) represents a wide interior space under low-light conditions, whilst Deer (right; 6,863,205 vertices and 13,617,380 faces) shows a crowded living room with multiple small objects that occlude one another.



Real-world Trajectories

Here we show several of the ground-truth camera trajectories used in our work:

Rendered RGB-D GT: Deer Scene



Ground Robot
Number of Images: 1600.
Size: 587M/1.4G
Frame Rate: 20Hz
Total Time: 80 secs
RGB-D PNGs
RGB-D PNGs with noise
GT Poses ViSim format: TrajectoryGT

Walk Head
Number of Images: 1308.
Size: 603M/1.3G
Frame Rate: 20Hz
Total Time: 65 secs
RGB-D PNGs
RGB-D PNGs with noise
GT Poses ViSim format: TrajectoryGT

Walk
Number of Images: 1281.
Size: 586M/1.3G
Frame Rate: 20Hz
Total Time: 60 secs
RGB-D PNGs
RGB-D PNGs with noise
GT Poses ViSim format: TrajectoryGT

Running
Number of Images: 569.
Size: 259M/582M
Frame Rate: 30Hz
Total Time: 17 secs
RGB-D PNGs
RGB-D PNGs with noise
GT Poses ViSim format: TrajectoryGT

VR Slow
Number of Images: 2083.
Size: 1.0G/2.1G
Frame Rate: 30Hz
Total Time: 62.5 secs
RGB-D PNGs
RGB-D PNGs with noise
GT Poses ViSim format: TrajectoryGT

VR Fast
Number of Images: 1228.
Size: 581M/1.2G
Frame Rate: 20Hz
Total Time: 60 secs
RGB-D PNGs
RGB-D PNGs with noise
GT Poses ViSim format: TrajectoryGT

MAV Slow
Number of Images: 2001.
Size: 1.2G/2.0G
Frame Rate: 20Hz
Total Time: 100 secs
RGB-D PNGs
RGB-D PNGs with noise
GT Poses ViSim format: TrajectoryGT

MAV Fast
Number of Images: 2050.
Size: 1.0G/2.0G
Frame Rate: 20Hz
Total Time: 100 secs
RGB-D PNGs
RGB-D PNGs with noise
GT Poses ViSim format: TrajectoryGT

Rendered RGB-D GT: Diamond Scene



Ground Robot
Number of Images: 1600.
Size: 673M/1.4G
Frame Rate: 20Hz
Total Time: 80 secs
RGB-D PNGs
RGB-D PNGs with noise
GT Poses ViSim format: TrajectoryGT

Walk Head
Number of Images: 1308.
Size: 722M/1.2G
Frame Rate: 20Hz
Total Time: 65 secs
RGB-D PNGs
RGB-D PNGs with noise
GT Poses ViSim format: TrajectoryGT

Walk
Number of Images: 1281.
Size: 664M/1.2G
Frame Rate: 20Hz
Total Time: 60 secs
RGB-D PNGs
RGB-D PNGs with noise
GT Poses ViSim format: TrajectoryGT

Running
Number of Images: 569.
Size: 309M/548M
Frame Rate: 30Hz
Total Time: 17 secs
RGB-D PNGs
RGB-D PNGs with noise
GT Poses ViSim format: TrajectoryGT

VR Slow
Number of Images: 2083.
Size: 1.0G/2.0G
Frame Rate: 30Hz
Total Time: 62.5 secs
RGB-D PNGs
RGB-D PNGs with noise
GT Poses ViSim format: TrajectoryGT

VR Fast
Number of Images: 1228.
Size: 718M/1.2G
Frame Rate: 20Hz
Total Time: 60 secs
RGB-D PNGs
RGB-D PNGs with noise
GT Poses ViSim format: TrajectoryGT

MAV Slow
Number of Images: 2001.
Size: 1.1G/1.9G
Frame Rate: 20Hz
Total Time: 100 secs
RGB-D PNGs
RGB-D PNGs with noise
GT Poses ViSim format: TrajectoryGT

MAV Fast
Number of Images: 2050.
Size: 1.0G/2.0G
Frame Rate: 20Hz
Total Time: 100 secs
RGB-D PNGs
RGB-D PNGs with noise
GT Poses ViSim format: TrajectoryGT
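
As a quick sanity check on the listings above, the nominal duration of a sequence is its image count divided by its frame rate. The following is a minimal sketch of that arithmetic; the function name and the small lookup table are illustrative and not part of the dataset tooling, and the listed Total Time values need not match the nominal figure exactly.

```python
def nominal_duration_s(num_images, frame_rate_hz):
    """Return the nominal sequence length in seconds (images / frame rate)."""
    if frame_rate_hz <= 0:
        raise ValueError("frame rate must be positive")
    return num_images / frame_rate_hz


# (number of images, frame rate in Hz, listed total time in seconds),
# copied from the per-trajectory listings on this page.
SEQUENCES = {
    "Ground Robot": (1600, 20, 80.0),
    "Walk Head": (1308, 20, 65.0),
    "MAV Slow": (2001, 20, 100.0),
}

if __name__ == "__main__":
    for name, (n_images, fps, listed_s) in SEQUENCES.items():
        nominal = nominal_duration_s(n_images, fps)
        print(f"{name}: nominal {nominal:.2f} s, listed {listed_s} s")
```

The same function can be applied to any other trajectory in the listings by adding its image count and frame rate to the table.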

Evaluation

No.  Model    Trajectory    ATE [m]  RPE [m]  Tracked [%]
1    Deer     Ground Robot  0.01452  0.01104  59
2    Deer     Walk-Head     0.2074   0.05496  99
3    Deer     Walk          0.08640  0.1454   100
4    Deer     Run           0.04650  0.0672   64
5    Deer     VR slow       0.6152   0.06583  98
6    Deer     VR fast       0.9051   0.3117   40
7    Deer     MAV slow      0.0102   0.0929   100
8    Deer     MAV fast      0.9525   0.0681   96
9    Diamond  Ground Robot  0.2112   0.0181   100
10   Diamond  Walk-Head     0.0187   0.0482   100
11   Diamond  Walk          0.0479   0.0530   100
12   Diamond  Run           0.1408   0.0750   100
13   Diamond  VR slow       0.0192   0.0667   100
14   Diamond  VR fast       1.354    0.3113   57
15   Diamond  MAV slow      0.0086   0.0758   100
16   Diamond  MAV fast      0.0090   0.1162   100

Table 1. Absolute trajectory error (ATE), relative pose error (RPE), and the percentage of frames tracked by ORB-SLAM2 in RGB-D mode, for all trajectories in the two models, Deer and Diamond.
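
This page does not state how the ATE and RPE values in Table 1 were computed. For readers unfamiliar with the two measures, here is a minimal sketch of their common root-mean-square definitions, under loudly stated assumptions: the estimated and ground-truth trajectories are already time-associated frame by frame and expressed in the same coordinate frame, and the RPE variant is simplified to translation only (the full definition uses relative SE(3) poses). The function names are hypothetical.

```python
import math


def ate_rmse(gt_positions, est_positions):
    """RMSE of per-frame position differences, in the units of the input.

    Assumes both lists hold matched (x, y, z) positions in a common frame.
    """
    if not gt_positions or len(gt_positions) != len(est_positions):
        raise ValueError("trajectories must be non-empty and equally long")
    squared = [
        sum((g - e) ** 2 for g, e in zip(gp, ep))
        for gp, ep in zip(gt_positions, est_positions)
    ]
    return math.sqrt(sum(squared) / len(squared))


def rpe_translation_rmse(gt_positions, est_positions, delta=1):
    """RMSE of the error in displacement over `delta` frames.

    Simplification: compares displacement vectors only and ignores rotation.
    """
    if len(gt_positions) != len(est_positions):
        raise ValueError("trajectories must be equally long")
    if delta < 1 or len(gt_positions) <= delta:
        raise ValueError("need more frames than delta, and delta >= 1")
    squared = []
    for i in range(len(gt_positions) - delta):
        # Difference between ground-truth and estimated displacement.
        diff = [
            (gt_positions[i + delta][k] - gt_positions[i][k])
            - (est_positions[i + delta][k] - est_positions[i][k])
            for k in range(3)
        ]
        squared.append(sum(d * d for d in diff))
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    est = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.0, 0.0)]
    print(ate_rmse(gt, est), rpe_translation_rmse(gt, est))
```

ATE summarizes global consistency of the whole trajectory, whereas RPE summarizes local drift over a fixed frame interval, which is why a sequence can score well on one and poorly on the other.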

The two sample trajectories shown above are MAV-Slow and MAV-Fast, from two different scenes.

License

The scenes, images, video, and related ground-truth data presented on this page are intended for research use ONLY.

Acknowledgements

This research is supported by the Engineering and Physical Sciences Research Council (EPSRC), grant references EP/K008730/1 and EP/N018494/1. We also thank Dr. Rui Tang from kujiale.com for providing the Diamond and Deer scenes.

Citation


Characterizing Visual Localization and Mapping Datasets
Sajad Saeedi, Eduardo D C Carvalho, Wenbin Li, Dimos Tzoumanikas,
Stefan Leutenegger, Paul H J Kelly, Andrew J Davison

International Conference on Robotics and Automation, ICRA 2019
@inproceedings { Characterizing19,
      author = { Sajad Saeedi and Eduardo D C Carvalho and Wenbin Li and Dimos Tzoumanikas and
                 Stefan Leutenegger and Paul H J Kelly and Andrew J Davison },
   booktitle = { International Conference on Robotics and Automation (ICRA) },
       title = { Characterizing Visual Localization and Mapping Datasets },
       pages = { 6699--6705 },
        year = { 2019 },
organization = { IEEE }
}