Pering Laboratory

Characterizing Visual Localization and Mapping Datasets

Sajad Saeedi, Eduardo D C Carvalho, Wenbin Li, Dimos Tzoumanikas,
Stefan Leutenegger, Paul H. J. Kelly, Andrew J. Davison

Imperial College London

Overview

Benchmarking mapping and motion estimation algorithms is established practice in robotics and computer vision. As the diversity of datasets increases, in terms of trajectories, models and scenes, it becomes a challenge to select datasets for a given benchmarking purpose. This paper addresses this concern by developing novel metrics to evaluate trajectories and environments without relying on any SLAM or motion estimation algorithm. The metrics, which so far have been missing in the research community, can be applied to the plethora of datasets that exist. The proposed metric is general and can be used for other purposes. To demonstrate its effectiveness in robotic applications, the metric has also been applied to run-time SLAM adaptation. Additionally, to improve SLAM benchmarking in robotics, the paper presents a new dataset for visual localization and mapping algorithms. A broad range of real-world trajectories is combined with very high-quality scenes and a rendering framework to create a set of synthetic datasets, with ground-truth trajectories and dense maps, that are representative of key SLAM applications such as virtual reality (VR), micro aerial vehicle (MAV) flight, and ground robotics.

Interior Scenes

In this work, we render ground-truth data on two high-resolution scenes provided by kujiale.com. Scene Diamond (left, 3,539,795 vertices and 6,951,971 faces) represents a wide interior space under low-light conditions, whilst Deer (right, 6,863,205 vertices and 13,617,380 faces) shows a crowded living room with many small objects that occlude one another.

Real-world Trajectories

Here we demonstrate several ground truth camera trajectories used in our work:

Rendered RGB-D GT: Deer Scene

Trajectory	Number of Images	Size	Frame Rate [Hz]	Total Time [s]
Ground Robot	1600	587M/1.4G	20	80
Walk Head	1308	603M/1.3G	20	65
Walk	1281	586M/1.3G	20	60
Running	569	259M/582M	30	17
VR Slow	2083	1.0G/2.1G	30	62.5
VR Fast	1228	581M/1.2G	20	60
MAV Slow	2001	1.2G/2.0G	20	100
MAV Fast	2050	1.0G/2.0G	20	100

Available for each trajectory: RGB-D PNGs, RGB-D PNGs with noise, GT Poses, ViSim format: TrajectoryGT.

Rendered RGB-D GT: Diamond Scene

Trajectory	Number of Images	Size	Frame Rate [Hz]	Total Time [s]
Ground Robot	1600	673M/1.4G	20	80
Walk Head	1308	722M/1.2G	20	65
Walk	1281	664M/1.2G	20	60
Running	569	309M/548M	30	17
VR Slow	2083	1.0G/2.0G	30	62.5
VR Fast	1228	718M/1.2G	20	60
MAV Slow	2001	1.1G/1.9G	20	100
MAV Fast	2050	1.0G/2.0G	20	100

Available for each trajectory: RGB-D PNGs, RGB-D PNGs with noise, GT Poses, ViSim format: TrajectoryGT.

Evaluation

No.	Model	Trajectory	ATE [m]	RPE [m]	Tracked [%]
1	Deer	Ground Robot	0.01452	0.01104	59
2	Deer	Walk-Head	0.2074	0.05496	99
3	Deer	Walk	0.08640	0.1454	100
4	Deer	Run	0.04650	0.0672	64
5	Deer	VR slow	0.6152	0.06583	98
6	Deer	VR fast	0.9051	0.3117	40
7	Deer	MAV slow	0.0102	0.0929	100
8	Deer	MAV fast	0.9525	0.0681	96
9	Diamond	Ground Robot	0.2112	0.0181	100
10	Diamond	Walk-Head	0.0187	0.0482	100
11	Diamond	Walk	0.0479	0.0530	100
12	Diamond	Run	0.1408	0.0750	100
13	Diamond	VR slow	0.0192	0.0667	100
14	Diamond	VR fast	1.354	0.3113	57
15	Diamond	MAV slow	0.0086	0.0758	100
16	Diamond	MAV fast	0.0090	0.1162	100

Table 1. Absolute trajectory error (ATE), relative pose error (RPE), and the percentage of frames tracked by ORB-SLAM2 in RGB-D mode, for all trajectories in the two models, Deer and Diamond.
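
For readers unfamiliar with the two error columns, the following is a minimal sketch of how ATE and a translation-only RPE are commonly computed as root-mean-square errors. It is not the evaluation script used to produce Table 1. It assumes the estimated and ground-truth trajectories are already time-associated and expressed in the same coordinate frame, and it ignores rotation; the function names `ate_rmse` and `rpe_trans_rmse` and the toy trajectories are hypothetical and chosen only for illustration.

```python
import math


def ate_rmse(est, gt):
    """RMSE of the position error between paired estimated and ground-truth positions.

    Assumes both trajectories are already aligned and time-associated.
    """
    if len(est) != len(gt) or not est:
        raise ValueError("trajectories must be non-empty and of equal length")
    sq_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(est, gt)
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def rpe_trans_rmse(est, gt, delta=1):
    """RMSE of the difference in displacement over `delta` frames.

    Translation-only simplification: a full RPE compares relative
    rigid-body transforms, including rotation.
    """
    n = len(est)
    if len(gt) != n or n <= delta:
        raise ValueError("need equal-length trajectories longer than delta")
    sq_errors = []
    for i in range(n - delta):
        d_est = [b - a for a, b in zip(est[i], est[i + delta])]
        d_gt = [b - a for a, b in zip(gt[i], gt[i + delta])]
        sq_errors.append(sum((x - y) ** 2 for x, y in zip(d_est, d_gt)))
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Toy data (hypothetical): a straight-line ground truth and a slightly noisy estimate.
gt_positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est_positions = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.0, 0.1)]

ate = ate_rmse(est_positions, gt_positions)
rpe = rpe_trans_rmse(est_positions, gt_positions, delta=1)
print(f"ATE RMSE: {ate:.4f} m, RPE RMSE: {rpe:.4f} m")
```

ATE measures global consistency, so a single large drift raises it for the rest of the sequence, whereas RPE measures local drift between nearby frames. This is one reason the two columns in Table 1 can rank the sequences differently.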

The two sample trajectories shown above are MAV-Slow and MAV-Fast, taken from two different scenes.

License

The scenes, images, video and related ground truth data presented within this page are intended for research use ONLY.

Acknowledgements

This research is supported by the Engineering and Physical Sciences Research Council (EPSRC), grant references EP/K008730/1 and EP/N018494/1. We also thank Dr. Rui Tang from kujiale.com for providing us with the scenes Diamond and Deer.

Citation

Characterizing Visual Localization and Mapping Datasets
Sajad Saeedi, Eduardo D C Carvalho, Wenbin Li, Dimos Tzoumanikas,
Stefan Leutenegger, Paul H J Kelly, Andrew J Davison
International Conference on Robotics and Automation, ICRA 2019

@inproceedings { Characterizing19,
      author = { Sajad Saeedi and Eduardo D C Carvalho and Wenbin Li and Dimos Tzoumanikas and
                 Stefan Leutenegger and Paul H J Kelly and Andrew J Davison },
   booktitle = { International Conference on Robotics and Automation (ICRA) },
       title = { Characterizing Visual Localization and Mapping Datasets },
       pages = { 6699--6705 },
        year = { 2019 },
organization = { IEEE }
}