The Journal of Supercomputing (2021) 78:8247–8267

https://doi.org/10.1007/s11227-021-04198-1

PanoVILD: a challenging panoramic vision, inertial and LiDAR dataset for simultaneous localization and mapping

Zeeshan Javed1 · Gon-Woo Kim1

Accepted: 23 October 2021 / Published online: 7 January 2022


© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature
2021, corrected publication 2022

Abstract
This paper presents a challenging panoramic vision and LiDAR dataset collected by an autonomous vehicle on the Chungbuk National University campus to facilitate robotics research. The vehicle is equipped with a Point Grey Ladybug3 camera, a 3D LiDAR, a global positioning system (GPS) and an inertial measurement unit (IMU). The data were collected while driving in an outdoor environment covering various scenes, such as a parking lot, a semi-off-road path and campus roads with traffic. The data from all sensors mounted on the vehicle are time-registered and synchronized. The dataset includes point clouds from the 3D LiDAR, images, GPS and IMU measurements. The vision data contain multiple fisheye images from the individual cameras of the Ladybug3, covering a 360° field of view at high resolution, together with accurately stitched spherical panoramic images. The availability of multiple fisheye and accurate panoramic images supports the development and validation of novel multi-fisheye, panoramic and 3D LiDAR based simultaneous localization and mapping (SLAM) systems. The dataset targets various applications such as odometry, SLAM, loop closure detection, and deep learning based algorithms with vision, inertial and LiDAR data, as well as the fusion of visual, inertial and 3D information. For testing and evaluation of such algorithms, high-accuracy RTK GPS measurements are provided as ground truth.

Keywords  Panoramic cameras · Omnidirectional vision · Visual-inertial odometry · Visual-LiDAR odometry · Autonomous car

* Gon‑Woo Kim
[email protected]
Zeeshan Javed
[email protected]
1 Department of Intelligent System and Robotics, Chungbuk National University, Cheongju, South Korea


1 Introduction

In recent years, wide field-of-view and 360-degree imaging systems have become very popular in the fields of robotics and computer vision. The ability to capture 360-degree coverage is extremely useful for mobile robot navigation in unknown, complex environments. To take full advantage of the wide field of view, numerous algorithms have been published in the literature for different applications. Recently, spherical panoramic imagery has become increasingly popular for object detection and tracking. Yu et al. [1] proposed grid-based spherical convolutional neural networks (G-SCNNs) to classify objects in spherical images, and various CNN-based object detection algorithms for panoramic images, such as Wang et al. [2], have been presented in the literature. Arguably the most important application of large field-of-view imagery, however, is visual odometry and SLAM. Panoramic SLAM by Ji et al. [3], multi-camera SLAM by Yang et al. [4], OmniSLAM by Won et al. [5], CubemapSLAM by Wang et al. [6], and MultiCol-SLAM by Urban and Hinz [7] are some feature-based SLAM systems in the literature. Some direct SLAM methods, such as Caruso et al. [8] and Liu et al. [9], have been extended to handle large field-of-view cameras. Furthermore, multi-camera systems have been used for visual odometry by Seok and Lim [10] and Jaramillo et al. [11], for extensions of direct sparse odometry by Matsuki et al. [12], and for visual-inertial odometry by Ramezani et al. [13] and Seok and Lim [14], in both indoor and outdoor environments.
All of the methods discussed above use either panoramic images or large field-of-view cameras to accomplish a particular application, most notably visual odometry or SLAM. However, no complete dataset exists that provides panoramic images together with a complementary sensor setup. Therefore, this paper presents a vision and LiDAR dataset, called PanoVILD. It includes 360-degree camera images, accurately stitched panoramic images, laser scans, GPS and inertial measurements. The increasing number of 360° camera applications in simultaneous localization and mapping motivated us to collect large-scale visual, laser and inertial data of real-world urban environments, which may be useful for navigation purposes. The dataset includes a total of nine sequences collected on a campus with an urban character. The sequences range from small scale to large scale, and from feature-rich areas to featureless empty parking lots. An important aspect of the data is the accurate registration of the 3D LiDAR data with the respective individual omnidirectional camera images and panoramic images. The data can be used to test and evaluate omnidirectional visual odometry and SLAM, omnidirectional Visual-LiDAR odometry and SLAM, and omnidirectional visual-inertial SLAM. We hope that this dataset will be useful in robotics and vision applications and will open new research opportunities in using panoramic images and LiDAR data together. The dataset is hosted on and can be downloaded from the dataset website www.irl-cbnu.com/datasets.

1.1 Related work

Several other datasets that include perspective images, either stereo or monocular, and 2D/3D LiDAR are already available to the robotics community. A detailed overview of some important datasets is presented in Table 1. The Ford Campus vision and LiDAR dataset by Pandey et al. [15] provides two sequences captured by perception and navigation sensors. Similarly, to support research on long-term place recognition and SLAM under seasonal and lighting changes, the North Campus Long-Term (NCLT) dataset by Carlevaris-Bianco et al. [16] has been published. It uses a variety of sensors, consisting of a Point Grey Ladybug3 camera, a Velodyne 3D LiDAR, a planar LiDAR, real-time kinematic GPS, an IMU, an optic gyroscope and a consumer-grade GPS. PanoraMIS, an ultra-wide field-of-view vision dataset by Benseddik et al. [17], provides data from panoramic cameras (catadioptric, twin-fisheye) with accurate ground truth; it has been recorded using wheeled, aerial and industrial robotic platforms in both indoor and outdoor environments. Another omnidirectional vision dataset is proposed by Koschorrek et al. [18]; the sensor suite mounted on the car consists of a Point Grey Ladybug3 omnidirectional camera, an IMU, a GPS receiver and a velocity sensor. In contrast to car-mounted setups, the New College dataset by Smith et al. [19] is recorded by a wheeled robot; it consists of panoramic images recorded by a Ladybug2, laser data, stereo images and GPS measurements. The most popular and well-known dataset is the KITTI suite by Geiger et al. [20], which consists of measurements from vision, LiDAR and an inertial measurement unit, and provides ground truth for optical flow, road detection, visual odometry, and object detection and tracking (both 2D and 3D). The MIT Stata Center dataset by Fallon et al. [21] provides a long-term, multi-session dataset with stereo vision and laser, collected in a large-scale indoor environment over the course of a year. The Nordland dataset by the Norwegian Broadcasting Corporation [22] contains a four-season dataset recorded from a train using only monocular vision for place recognition. The RawSeeds project by Ceriani et al. [23] provides a dataset obtained from trinocular, perspective and catadioptric cameras, 2D laser, RTK GPS, an IMU and sonar sensors in both indoor and outdoor environments. Another dataset, CMU, targeting long-term place recognition and localization, is proposed by Badino et al. [24]; its data are captured by a monocular camera over the course of a year.
LaFiDa by Urban and Jutzi [25] provides vision data from a trinocular fisheye camera and laser data from a 2D Hokuyo laser scanner. The panoramic dataset of Li et al. [26] is published for spherical object detection. Zhang et al. [27] provide synthetic images for three camera models (perspective, fisheye and catadioptric) for both indoor and outdoor environments. The TUM RGB-D dataset by Sturm et al. [28] and the TUM stereo dataset by Schubert et al. [29] are among the most popular and widely used datasets for indoor visual odometry and visual SLAM. The TUM RGB-D sequences are captured by an RGB-D sensor with ground truth provided by a motion capture system, while the TUM stereo dataset contains imagery recorded by a stereo sensor with fisheye lenses; it also provides accurate ground truth poses obtained from an IMU and motion capture. The Oxford RobotCar dataset by Maddern et al. [30] is also widely used for long-term localization and mapping for autonomous vehicles. It was acquired to cover variation in seasons and lighting with vision sensors such as a trinocular stereo camera and three monocular Grasshopper2 cameras, 2D/3D LiDARs, an IMU and GPS.

Table 1  Overview of the related datasets

| Dataset | Vision | Panorama | Platform/environment | LiDAR | Other sensors/GT | Details | Target application |
|---|---|---|---|---|---|---|---|
| Ford Campus (2011) | Omnidirectional, multi-camera | No | Vehicle/outdoor | 3D/2D LiDAR | IMU, GPS/INS | 2 sequences (10 km), 70 GB | SLAM, visual odometry, etc. |
| NCLT vision and LiDAR dataset (2016) | Omnidirectional, multi-camera | No | Wheeled robot/outdoor | 3D/2D LiDAR | IMU, GPS, gyroscope/SLAM | 147.4 km, 39.4 h, 27 sequences | Long-term SLAM, place recognition |
| KITTI (2013) | Stereo camera | No | Vehicle/outdoor | 3D LiDAR | IMU, GPS/INS | 22 sequences | Object detection, tracking, visual odometry, etc. |
| MIT Stata dataset (2013) | Stereo camera | No | Wheeled robot/indoor | 2D LiDAR | None/SLAM | 38 h and 42 km | Long-term SLAM, place recognition and planning |
| New College (2009) | Omnidirectional, stereo | Yes | Wheeled robot/indoor | 2D LiDAR | IMU, GPS/INS | 30 GB of data, 2.2 km | Localization, SLAM, place recognition |
| PanoraMIS (2020) | Catadioptric, twin-fisheye | No | Wheeled, aerial, industrial/indoor, outdoor | No | IMU, GPS/odometry | 8 sequences, 2 panoramic, 4.5 GB | SLAM, heading and altitude estimation, odometry |
| AMUSE (2013) | Monocular, omnidirectional | Yes | Vehicle/outdoor | No | IMU, GPS, height sensors/INS | 7 sequences, 24.4 km | Visual odometry, SLAM |
| Synthetic (2016) | Perspective, fisheye, catadioptric | No | None | No | None/synthetic | 2 sequences | Visual SLAM, odometry |
| TUM RGB-D (2012) and stereo (2018) | RGB-D, stereo | No | Sensor rig/indoor, outdoor | No | IMU, MoCap/motion capture | 28, 39 (RGB-D) sequences, 20 km | RGB-D/stereo/inertial odometry and SLAM |
| Oxford RobotCar (2017) | Fisheye, trinocular stereo | No | Vehicle/outdoor | 3D LiDAR | IMU, GPS/INS | Almost 1000 km | Visual, LiDAR SLAM and odometry, etc. |
| Nordland (2013) | Monocular camera | No | Train/outdoor | No | None/none | 3000 km sessions | Place recognition |
| RawSeeds (2009) | Trinocular, perspective, catadioptric | No | Wheeled robot/indoor | 2D LiDAR | IMU, GPS/INS | Approximately 1.9 km, 2 datasets | Visual-inertial odometry and SLAM, etc. |
| CMU (2011) | Monocular camera | No | Vehicle/outdoor | No | No/none | 8.5 km, 12 months | Long-term localization and place recognition |
| LaFiDa (2017) | Trinocular fisheye | No | Helmet/indoor, outdoor | 2D LiDAR | MoCap/motion capture | 1–4 min, 6 sequences | Multi-camera visual SLAM |
| Pano-RSOD (2019) | Omnidirectional | Yes | Vehicle/outdoor | No | No/none | – | Object detection |
| PanoVILD (our dataset) | Omnidirectional | Yes | Vehicle/outdoor | 3D LiDAR | IMU, GPS/GNSS | 8 sequences, 200 GB, 10 km | Panoramic, multi-camera, inertial, LiDAR odometry and SLAM |

The majority of these datasets provide monocular imagery for localization and mapping. However, monocular cameras are not sufficient for testing and evaluating SLAM algorithms in complex environments. Single-camera SLAM algorithms usually fail in complex scenes, such as environments with sparse feature points, dynamic sunlight, occlusion of stable scene structure and seasonal changes. Therefore, we propose a dataset that contains 360° scene images from a multi-camera rig as well as panoramic images. Among panoramic datasets, the Ford vision and laser dataset, PanoraMIS and the dataset proposed by Koschorrek et al. [18] are the most closely related to ours. PanoraMIS and the dataset of Koschorrek et al. [18] do not contain 3D LiDAR data alongside their panoramic images, and PanoraMIS in particular contains only a few sequences with panoramic images, which may not be sufficient to test and evaluate novel algorithms. The Ford vision and laser dataset, in comparison, does not contain accurate panoramic images alongside the Ladybug camera images. Furthermore, its images are provided at half resolution with a large vertical field of view and a smaller horizontal view, which may impose difficulties for visual localization and motion estimation.

1.2 Contribution

This paper presents PanoVILD, a dataset collected with an omnidirectional camera, 3D LiDAR, IMU and GPS mounted on a vehicle. The dataset is time-registered and synchronized based on ROS timestamps. The data are captured in an outdoor environment and comprise 8 sequences, from small sequences of a few meters up to large sequences of 2 km. The dataset contains challenging scenes such as off-road ground with few features, dynamic sunlight, an urban environment and non-flat roads. High-accuracy RTK GPS measurements are provided as ground truth to evaluate SLAM algorithms. The major contributions and unique features of the dataset are as follows:

• The dataset provides multi-fisheye and stitched spherical panoramic images at very high resolution. For the multi-fisheye images, the dataset provides the calibration parameters that accurately transform each individual fisheye image to the corresponding panoramic image coordinates.
• More importantly, each individual camera and the panoramic image are accurately calibrated with the 3D LiDAR, which can benefit the development of multi-camera/panoramic Visual-LiDAR SLAM.
• The data have been collected with the requirements and challenges of visual SLAM and of visual and Visual-LiDAR odometry in mind. Therefore, we provide several small, large and off-road sequences with loop closures.

The PanoVILD dataset includes GPS and IMU measurements along with timestamps, and both sensor streams are synchronized with the Ladybug3 images. The GPS measurements paired with the camera can be used to test and evaluate the algorithms. The remainder of this paper is organized as follows. The details of the sensor integration and the experimental platform are explained in Sect. 2. Section 3 provides the method and details of sensor calibration, specifically camera and LiDAR calibration. Section 4 discusses baseline SLAM experiments. Finally, the details of data collection and organization are presented in Sect. 5, and Sect. 5.7 draws the conclusion.

2 Sensors

The proposed framework is implemented and tested on a Hyundai i30 (Hyundai Motor Company, Seoul, South Korea), as shown in Fig. 1. The platform is equipped with a Ladybug3 omnidirectional camera mounted at the top center of the platform, while the LiDAR is placed on top of the Ladybug3 camera, as shown in Fig. 1a. The NovAtel GPS is on the left side of the platform with a dual-antenna setup. The detailed integration information for the perception and navigation sensors is as follows:

1. Ouster OS1-64 LiDAR [Ouster, Inc., San Francisco] is a compact, high-performance LiDAR with an effective 32° (+16.5° to −16.5°) vertical field of view (FOV). The rotation rate about the vertical axis is up to 20 Hz, providing a full 360-degree azimuthal field of view. The LiDAR has a vertical resolution of 64 channels and can operate at a horizontal resolution of 512, 1024 or 2048. The maximum range of the OS1-64 is 120 m, and it can capture about 1 million points per second. On the proposed platform, the LiDAR is operated at a horizontal resolution of 1024 and its rotation rate is configured at 10 Hz.

Fig. 1  Sensor platform. a Top view of the sensor positions on a CAD image. b Hyundai i30 research platform. c Close-up view of the platform. d Side view of the platform with the sensor coordinate systems. e Coordinate systems of the Ladybug3 and the individual fisheye cameras. f The corresponding unwrapped and stitched panoramic image


2. Ladybug3 [Point Grey] is a high-resolution spherical digital camera system with 360-degree coverage and a high-speed interface. The Ladybug3 has six 2-megapixel cameras, with five cameras in a circular rig and one camera positioned at the top. This allows the system to cover more than 80% of the full sphere, and all the cameras are pre-calibrated to enable high-quality spherical image stitching. The Ladybug3 can capture data at multiple resolutions as well as different frame rates, and it also provides hardware JPEG compression to support high frame rates. In our setup, the dataset is collected at full resolution (1600 × 1200) at 5 fps in uncompressed (raw) format.
3. RTK NovAtel GPS [NovAtel Inc., PwrPak7D] is a compact, robust, high-precision, fully integrated global positioning system. It has a multi-frequency, dual-antenna input that allows the receiver to harness the power of NovAtel CORRECT with RTK and ALIGN functionality, which makes it very suitable for ground- and marine-based systems. The maximum data rate of the GNSS is up to 100 Hz. On the proposed platform, GPS measurements are recorded at 100 Hz to provide the ground truth trajectories of the dataset, which are used to calculate the absolute trajectory error (ATE) for the evaluation of visual odometry and SLAM algorithms.
4. Inertial measurement unit [Ouster, Inc., San Francisco]: the Ouster OS1-64 LiDAR integrates a low-power 6-axis inertial measurement unit. The OS1-64 IMU contains a 3-axis gyroscope and a 3-axis accelerometer. It provides 3D orientation as well as the acceleration and angular velocity of the vehicle at 100 Hz. We recorded the accelerometer and gyroscope measurements at 100 Hz in our experiments.

3 Sensors calibration

On the proposed platform, all of the sensors are fixed and related to each other by static rigid-body transformations. These transformations are calculated between the sensors to guarantee accurate registration. First, the Ladybug camera is calibrated to find high-accuracy intrinsic and extrinsic parameters, as shown in Fig. 2, which guarantee an accurate transformation from the fisheye images to the panoramic image coordinates. Similarly, the LiDAR and camera are calibrated to find the transformation between both sensors for accurate projection. The sensors are fixed to the vehicle; therefore, the transformations between all of the sensors are static. The 6-DOF pose of a frame is represented using the coordinate frame convention of [31], Xab = {x, y, z, roll, pitch, yaw}. The transformations between the IMU and the camera and between the IMU and the LiDAR are manually estimated. Similarly, the GPS is manually calibrated with the other sensors such as the cameras. The transformation files are provided with the dataset. The detailed calibration procedures for the camera and for the camera to LiDAR are described in the next sections.
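As a small illustration of this convention, the sketch below (Python; the function name, Euler order and the numeric values are our own assumptions, not values shipped with the dataset) turns an Xab = {x, y, z, roll, pitch, yaw} pose into a 4 × 4 homogeneous matrix so that static sensor-to-sensor transforms can be chained.

```python
import numpy as np

def pose_to_matrix(x, y, z, roll, pitch, yaw):
    """4x4 homogeneous transform from the X_ab = {x, y, z, roll, pitch, yaw}
    convention; a Z-Y-X (yaw-pitch-roll) Euler order is assumed here."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    R = np.array([
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp,     cp * sr,                cp * cr],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = [x, y, z]
    return T

# Chain two (purely illustrative) static transforms, e.g. camera->IMU and
# IMU->LiDAR, to obtain camera->LiDAR. The numbers are placeholders, not
# the calibration values provided with the dataset.
T_imu_cam = pose_to_matrix(0.0, 0.0, 0.10, 0.0, 0.0, 0.0)
T_lidar_imu = pose_to_matrix(0.0, 0.0, 0.05, 0.0, 0.0, np.pi)
T_lidar_cam = T_lidar_imu @ T_imu_cam
```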


Fig. 2  Ladybug images. a Distorted images from the Ladybug3 fisheye cameras covering the horizontal 360° scene. b The corresponding rectified and undistorted images based on the calibration method used

Fig. 3  LiDAR camera calibration. a Input rectified images from the five cameras. b Projection of the point cloud onto the images

3.1 Ladybug calibration

The Ladybug camera has a total of six cameras, with five attached in a circular rig and one at the top of the system. Both the intrinsic and extrinsic calibration, including lens distortion, are provided by the manufacturer. The manufacturer also provides the transformation from distorted to undistorted images as a dense pixel-wise mapping. Figure 2 shows distorted and undistorted Ladybug images from the five fisheye lenses at full resolution. The pinhole camera intrinsic parameters can be defined as:


$$K = \begin{bmatrix} f_x & \alpha_x & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \tag{1}$$

This matrix is obtained after rectification and undistortion of the fisheye images, and the corresponding intrinsic parameters for each camera of the Ladybug are shown in Table 2. In addition to the manufacturer-provided files, the Ladybug is calibrated as a multi-camera rig for the generation of highly accurate panoramic images. The provided calibration transforms each fisheye pixel to its corresponding panoramic coordinates. In a multi-camera rig, several fisheye cameras are placed with a slight offset between each fisheye camera and the panoramic center.
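Since the dense distorted-to-undistorted mapping mentioned above is supplied per camera (the dataset ships a MATLAB rectification script; the NumPy file names and export format used here are purely our assumption), rectifying a raw fisheye frame reduces to a single remap call, sketched below in Python.

```python
import cv2
import numpy as np

# Hypothetical export of the manufacturer's dense pixel-wise mapping for camera 0:
# map_x[v, u] / map_y[v, u] give, for each rectified pixel, the source pixel in the
# distorted fisheye image. File names are placeholders, not actual dataset files.
map_x = np.load("cam0_map_x.npy").astype(np.float32)
map_y = np.load("cam0_map_y.npy").astype(np.float32)

raw = cv2.imread("Images/Img0/000000.png")          # raw fisheye frame
rectified = cv2.remap(raw, map_x, map_y, cv2.INTER_LINEAR)
cv2.imwrite("rectified_000000.png", rectified)
```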

3.2 Camera LiDAR calibration

The camera and LiDAR are fixed on the vehicle, and the static transformation between the LiDAR and the Ladybug3 is computed by mutual information maximization as proposed in [32]. In this approach, the reflectivity values of the 3D LiDAR and the gray-scale intensity values of the camera are used to calibrate the two sensor modalities: the correlation between the laser reflectivity and the camera intensity is maximal at the correct rigid-body transformation. In the proposed work, each fisheye camera of the Ladybug3 is calibrated with the LiDAR separately by mutual information maximization. The quality of the calibration for each individual image from each camera is demonstrated in Fig. 3, and a close-up view of the images is shown in Fig. 4.
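The core of the mutual-information criterion of [32] can be sketched as follows: project the LiDAR points into an image with a candidate extrinsic and score the joint histogram of reflectivity versus gray-scale intensity. This is a minimal illustration under our own naming and simplifications, not the authors' implementation; the extrinsic is then found by maximizing this score over the 6-DOF transform, e.g. with a derivative-free search.

```python
import numpy as np

def mutual_information(reflectivity, intensity, bins=64):
    """MI between LiDAR reflectivity and image intensity sampled at projected points."""
    joint, _, _ = np.histogram2d(reflectivity, intensity, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)      # marginal over intensity
    py = pxy.sum(axis=0, keepdims=True)      # marginal over reflectivity
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def score_extrinsic(points, refl, gray, K, T_cam_lidar):
    """Project LiDAR points with a candidate camera-from-LiDAR transform and
    return the MI between their reflectivity and the image intensity they hit."""
    p_h = np.hstack([points, np.ones((len(points), 1))])
    p_cam = (T_cam_lidar @ p_h.T).T[:, :3]
    front = p_cam[:, 2] > 0                  # keep points in front of the camera
    uv = (K @ p_cam[front].T).T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)
    h, w = gray.shape
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return mutual_information(refl[front][ok], gray[uv[ok, 1], uv[ok, 0]])
```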

Fig. 4  LiDAR camera calibration. a Original image with zoomed view. b The corresponding projection of the point cloud


3.3 Panorama generation

Along with the individual fisheye images, this dataset also provides high-accuracy spherical panorama images generated from the calibration of the multi-camera model, as detailed in [3] and by the manufacturer. The process of panorama generation is as follows. First, every pixel u of each individual fisheye camera is projected onto a spherical point x with radius r (r = 20 m in the case of the Ladybug3) according to

$$\mathbf{x} = \alpha_s \left( R_i\, K(\mathbf{u}) + T_i \right), \tag{2}$$

where i = 0, …, 4, the function K(u) (intrinsic parameters) projects the fisheye image point u to rectified coordinates, $\alpha_s$ is the scale factor, and $R_i$, $T_i$ are the rotation and translation of each camera, respectively. The polar coordinates $\theta_h \in (-\pi, \pi)$ and $\theta_v \in (-\pi/2, \pi/2)$ of each spherical point satisfy

$$x = r \cos(\theta_v) \cos(\theta_h), \qquad y = r \cos(\theta_v) \sin(\theta_h), \qquad z = r \sin(\theta_v). \tag{3}$$

$$\theta_h = \frac{\pi (W_s - 2x)}{W_s}, \qquad \theta_v = \frac{\pi (H_s - 2y)}{H_s}. \tag{4}$$

Once the polar coordinates are available, the panoramic image with known width $W_s$ and height $H_s$ can be unwrapped using Eq. (4). Figure 1f shows the corresponding panoramic image.
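The unwrapping step of Eqs. (3)–(4) can be sketched as below (Python; our own sketch, not the released stitching code). Note that Eq. (4) as printed gives $\theta_v$ a $(-\pi, \pi)$ span; here we divide by $2H_s$ instead so that $\theta_v$ covers the stated $(-\pi/2, \pi/2)$ latitude range — that factor of 2 is our assumption.

```python
import numpy as np

def panorama_to_sphere(Ws, Hs, r=20.0):
    """Back-project every panorama pixel (x, y) to a 3D point on a sphere of
    radius r, following Eqs. (3)-(4). r = 20 m is the Ladybug3 stitching radius."""
    xs, ys = np.meshgrid(np.arange(Ws), np.arange(Hs))
    theta_h = np.pi * (Ws - 2.0 * xs) / Ws            # longitude in (-pi, pi]
    theta_v = np.pi * (Hs - 2.0 * ys) / (2.0 * Hs)    # latitude, assumed (-pi/2, pi/2]
    X = r * np.cos(theta_v) * np.cos(theta_h)
    Y = r * np.cos(theta_v) * np.sin(theta_h)
    Z = r * np.sin(theta_v)
    return np.stack([X, Y, Z], axis=-1)               # shape (Hs, Ws, 3)

# Each spherical point would then be projected into the fisheye camera that sees
# it (via R_i, T_i and the intrinsics of Eq. (1)) to sample the panorama colour.
sphere_points = panorama_to_sphere(Ws=2048, Hs=1024)
```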

3.4 Time synchronization

In our experiments, the Robot Operating System (ROS) is used to record the measurements of each sensor. All sensors, including the Ladybug3, 3D LiDAR, GPS and IMU, are time-stamped as their data arrive at the host system. The Ladybug3 uses a FireWire bus with a transfer rate of 800 Mb/s to send data to the host computer. Due to this limited transfer rate, there is a transmission offset between the camera and the LiDAR. This approximate transmission offset is calculated as (size of the transferred image / transfer rate), similar to [15], and subtracted from the camera timestamp to reduce timing jitter. Moreover, the camera measurements are taken as the reference timestamps, and the LiDAR is synchronized with the cameras based on the closest measurements. Similarly, the GPS timestamps are synchronized with the camera measurements based on the closest timestamps. The provided GPS ground truth is synchronized with the camera measurements as shown in Fig. 5.
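A minimal sketch of this synchronization (Python; array names, the raw-image size and the matching strategy are our own choices): correct the camera receive times by the transmission offset, then pair every camera frame with the LiDAR/GPS sample whose timestamp is closest.

```python
import numpy as np

def corrected_camera_stamps(cam_stamps, image_bytes, transfer_bytes_per_s=800e6 / 8):
    """Subtract the approximate FireWire transmission offset
    (image size / transfer rate), as in [15], from host-receive timestamps."""
    return np.asarray(cam_stamps, dtype=float) - image_bytes / transfer_bytes_per_s

def nearest_match(ref_stamps, other_stamps):
    """For each reference stamp, index of the closest entry in the (sorted) other stream."""
    ref = np.asarray(ref_stamps, dtype=float)
    other = np.asarray(other_stamps, dtype=float)
    idx = np.clip(np.searchsorted(other, ref), 1, len(other) - 1)
    pick_left = np.abs(ref - other[idx - 1]) <= np.abs(other[idx] - ref)
    return np.where(pick_left, idx - 1, idx)
```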


Fig. 5  Sensor synchronization based on the closest timestamps

4 Baseline SLAM experiment and ground truth

4.1 Ground truth

The motivation for the proposed dataset comes from the recent interest in visual SLAM algorithms based on a large field of view with 360° coverage and the lack of proper panoramic datasets with 3D LiDAR information. We therefore rely on the RTK GPS measurements of each sequence as ground truth. The GPS provides 2D position and yaw information that is used to evaluate pose estimation accuracy. The localization and orientation accuracy can be calculated from the aligned GPS and vSLAM trajectories. The trajectories can be aligned using the method of Horn et al. [33] on a subset of the GNSS points, which provides a 7-DoF (orientation, translation and scale) transform. The remaining GPS points can then be used to calculate the root mean square error (RMSE) of the absolute trajectory error (ATE), Sturm et al. [28]:

$$\mathrm{ATE}(P_i, Q_i) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left\lVert \operatorname{trans}(e_i) \right\rVert^2}, \tag{5}$$

where $P_i$ and $Q_i$ are the ground truth and estimated trajectories, respectively, $e_i = P_i^{-1} S Q_i$ is the absolute trajectory error at timestamp $i$, $S$ denotes the aligning rigid-body transformation, and $\operatorname{trans}(e_i)$ refers to the translation component of the pose error. For some sequences of the proposed dataset, the quality of the GPS is high; the trajectory plot of one sample sequence is shown in Fig. 6. However, some parts of the sequences are covered by tall trees and large buildings, which cause slight instability in GPS satellite tracking, as shown in Fig. 7.
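A compact way to compute Eq. (5) is sketched below (Python); it uses Umeyama's closed-form similarity alignment as a common stand-in for the 7-DoF Horn alignment described above, with trajectories given as N × 3 arrays of positions. Function names and this particular alignment choice are ours, not part of the dataset tooling.

```python
import numpy as np

def align_umeyama(gt, est):
    """7-DoF (scale, rotation, translation) alignment of est onto gt,
    Umeyama's closed-form solution (a stand-in for the Horn alignment)."""
    mu_g, mu_e = gt.mean(0), est.mean(0)
    G, E = gt - mu_g, est - mu_e
    U, D, Vt = np.linalg.svd(G.T @ E / len(gt))
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / (E ** 2).sum() * len(gt)
    t = mu_g - s * R @ mu_e
    return s, R, t

def ate_rmse(gt, est):
    """Root-mean-square absolute trajectory error (translation part), Eq. (5)."""
    s, R, t = align_umeyama(gt, est)
    est_aligned = (s * (R @ est.T)).T + t
    return float(np.sqrt(np.mean(np.sum((gt - est_aligned) ** 2, axis=1))))
```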

4.2 Dataset experimental evaluation

As the main target applications of this dataset are visual and Visual-LiDAR odometry and visual SLAM, we configured panoramic SLAM [3] for our sensor setup and generated trajectories for the recorded sequences to validate the dataset. Panoramic SLAM and ORB-SLAM [34] are used because many open-source algorithms do not support this type of camera model, and it is hard to integrate other approaches; similarly, no combined omnidirectional visual-LiDAR approach exists in the literature that could be tested. Panoramic SLAM is a feature-based simultaneous localization and mapping system for panoramic images. It is based on the geometry of both the panoramic and fisheye calibration models, and its optimization is based on a bundle adjustment designed especially for a multi-camera rig.

Fig. 6  The RTK GPS measurements for sequence 3. a Plot of the latitude and longitude of the GPS measurements. b The GPS measurements overlaid on aerial imagery; the image is generated with GPS Visualizer

The other modules of the system are similar to those of typical visual SLAM systems, such as initialization, tracking, keyframe selection, local mapping, loop closure detection and global optimization. For each sequence, the number of features per fisheye image is set to 3000, while 10,000 features are extracted from each panoramic image. The fisheye and panoramic images of sequence 1 are provided to the SLAM system at 5 fps, and the resulting trajectory generated by SLAM is shown in Fig. 9. The figure shows the robot trajectory for sequence 1 of the proposed dataset. Sequence 1 is recorded in an outdoor parking environment to cover a featureless parking scene. Panoramic SLAM is able to recover the whole trajectory without re-localization.

To evaluate the dataset both quantitatively and qualitatively, a series of experiments is performed on two sample sequences. Figure 10 shows the qualitative results of ORB-SLAM for both sequences. ORB-SLAM is unable to recover the whole trajectory for either sequence; it often fails in featureless regions. However, panoramic SLAM is able to recover the complete trajectory for both sequences, as shown in Fig. 11. The output trajectories are plotted against the GPS-provided ground truth for both methods and sequences. To measure the results quantitatively, the absolute trajectory error (translation) is calculated between the estimated trajectory and the ground truth. The detailed results are shown in Table 2. In both cases, the large field of view offers clear advantages over a single-camera setup such as ORB-SLAM.

Fig. 7  Sequence 5. a Earth map image showing the region where fewer satellites were tracked due to the urban nature of the environment (A–B). b The part of sequence 5 covered by a large building. c The part of the trajectory that is covered by tall, dense trees

The richer structural information from the scene helps to improve the accuracy of SLAM and increases its capability for long-term navigation. Panoramic SLAM can successfully track all the sequences with reasonable accuracy, while ORB-SLAM often fails under challenging conditions such as sharp turns. Based on these experiments, it can be inferred that the dataset can be used to develop novel multi-camera SLAM systems. However, this dataset focuses on the urban environment and, in particular, on featureless regions (due to the complexity of such environments for SLAM), so only a limited number of dynamic objects are present. Similarly, the recorded sequences do not contain weather or seasonal changes.


Fig. 8  Details of the recording environment. a, c Four images representing different environments: (1) and (2) show the parking region and the urban part of the environment, respectively, while (3) shows the main road region with pedestrians and (4) shows the semi-off-road region. b The overall area of the university where the dataset was recorded. d Sequence 5 trajectory overlaid on a Google Earth map

Fig. 9  Sequence 1 a The robot trajectory obtained from [3] for sequence 1


Fig. 10  ORBSLAM-mono output trajectories. a The output trajectory of sequence 1. b The output trajec-
tory of part of sequence 2

Fig. 11  Output Trajectory of PanoSLAM with ground truth. a PanoSLAM on sequence 1 b PanoSLAM
on part of sequence 2

Table 2  ATE translation in meters

| Sequence | Method   | RMSE   | Min    | S.D.   | Max    |
|----------|----------|--------|--------|--------|--------|
| Seq 1    | ORB-SLAM | 2.2734 | 0.0912 | 1.8584 | 4.5832 |
| Seq 1    | PanoSLAM | 0.4458 | 0.0037 | 0.2339 | 0.9676 |
| Seq 2    | ORB-SLAM | 6.0184 | 0.0829 | 4.5511 | 8.8717 |
| Seq 2    | PanoSLAM | 4.3946 | 0.0012 | 1.5285 | 0.8747 |

5 Data collection and organization

The data were recorded around Chungbuk National University, South Korea. The process of dataset collection can be outlined as follows:

• The sensors (camera, LiDAR, IMU and GPS) are installed on top of the vehicle, each at a fixed position. The LiDAR and IMU are mounted on top of the camera to simplify the task of calibration and transformation. A dual-antenna RTK GPS setup is used to record the ground truth.
• Once the sensor installation is finished, the transformations between the GPS and the other sensors are obtained manually, while the camera and LiDAR are calibrated with the help of open-source software.
• ROS-based recording software is developed for the whole sensor setup. The images are recorded in PNG format, while the LiDAR data are recorded in .pcd format.

The overall recording area, corresponding sample images and the trajectory of sequence 5 are shown in Fig. 8. There are a total of 8 sequences covering up to 10 km, driven on campus at 20–30 km/h by the autonomous vehicle. The dataset was collected to cover various small- and large-scale loops, which can be useful for vSLAM. All the data from each sensor are organized into a directory structure as shown in Fig. 12. The main directory contains sub-folders with the measurements of each sensor, such as GPS, IMU, camera and LiDAR. The details of each subfolder, the file formats and the data types are explained below.

Fig. 12  Directory layout of the dataset. a The main directory contains six sub-folders with all measurements

5.1 Images

All of the RGB images captured by the Ladybug3, whether the individual fisheye images of each camera or the panoramic images, are placed in this folder. The images are stored in the lossless PNG file format. Due to the large size of the dataset, rectified fisheye images are not included; a MATLAB script is provided to undistort and rectify the fisheye images. The main Images folder contains the subfolders [Img0, Img1, Img2, Img3, Img4] and a [timestamp.txt] file. Each of the [Img0–4] folders contains raw PNG files, while [timestamp.txt] contains the timestamp of each image of each camera.
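Assuming the layout just described (the exact timestamp file format — one stamp per line, in image order — is our assumption), the per-camera frames and their stamps can be loaded with a few lines of Python:

```python
import glob
import numpy as np

def load_camera_stream(seq_root, cam_id):
    """Return the sorted PNG paths of one fisheye camera and the shared timestamps.
    Assumes Images/Img<cam_id>/*.png and Images/timestamp.txt (one stamp per line)."""
    frames = sorted(glob.glob(f"{seq_root}/Images/Img{cam_id}/*.png"))
    stamps = np.loadtxt(f"{seq_root}/Images/timestamp.txt")
    return frames, stamps

frames0, stamps0 = load_camera_stream("sequence1", cam_id=0)
print(len(frames0), "frames, first stamp:", stamps0[0])
```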

5.2 GPS

This folder contains the GPS data; it is included for every sequence. The folder has three entries: the raw GPS measurements stored in the [gps.txt] file, the bag file of the GPS measurements, and a yaw-only orientation file named [gps_quat]. The text file gps.txt contains all GPS data, including [timestamp, latitude, longitude, altitude, number of satellites, covariance, etc.], where the latitude, longitude and altitude coordinates are stored in degrees.
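A short parsing sketch (Python), assuming gps.txt is whitespace-separated with the fields in the order listed above; the exact column layout and field count are assumptions, not a documented specification:

```python
import numpy as np

# Assumed column order: timestamp, latitude, longitude, altitude,
# number of satellites, followed by covariance entries.
gps = np.loadtxt("sequence1/GPS/gps.txt")

timestamps = gps[:, 0]
lat, lon, alt = gps[:, 1], gps[:, 2], gps[:, 3]
num_sats = gps[:, 4].astype(int)

# Simple quality check: keep only fixes that tracked enough satellites.
good = num_sats >= 6
print(f"{good.sum()} / {len(gps)} fixes with at least 6 satellites")
```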

5.3 IMU

The Ouster OS1-64 IMU unit, with a 3-axis gyroscope, 3-axis accelerometer and 3-axis compass, is used to record the measurements. This folder contains only the measurements from the inertial measurement unit (IMU), stored in the [imu.txt] file. The IMU measurements are organized as [Timestamp, rotX, rotY, rotZ, velX, velY, velZ, accX, accY, accZ], where [velX, velY, velZ] are the angular velocities of the robot about the x-, y- and z-axes, and [accX, accY, accZ] are the linear accelerations along the x-, y- and z-axes.

5.4 3D LiDAR

The Ouster OS1-64 installed on the vehicle provides 3D information along with reflectivity. The 3D LiDAR information is recorded in point cloud data (PCD) format. The folder contains three entries: point cloud data, timestamps and synchronized timestamp information. The [Timestamp.txt] file contains the absolute time of each LiDAR measurement assigned by ROS at receive time, while the timestamps synchronized with the camera are included in [Timestampsynch.txt]. The [3D-Point cloud] folder holds the 3D LiDAR point clouds in .pcd format.
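Loading a scan and pairing it with the synchronized stamps can be sketched as below; Open3D is one common way to read .pcd files, and the file-name pattern inside [3D-Point cloud] is our assumption.

```python
import glob
import numpy as np
import open3d as o3d  # pip install open3d

scan_files = sorted(glob.glob("sequence1/LiDAR/3D-Point cloud/*.pcd"))
stamps = np.loadtxt("sequence1/LiDAR/Timestamp.txt")
stamps_sync = np.loadtxt("sequence1/LiDAR/Timestampsynch.txt")

# Read the first scan as an (N, 3) array of XYZ points.
cloud = o3d.io.read_point_cloud(scan_files[0])
points = np.asarray(cloud.points)
print(points.shape, "points; LiDAR time", stamps[0], "camera-synced time", stamps_sync[0])
```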

5.5 Spherical panoramic images

This folder contains the stitched spherical panoramic images in PNG file format. The panoramic images are numbered in the same way as the individual fisheye images, and their timestamps are identical to the camera timestamps.

5.6 Calibration parameters

The calibration parameters for the dataset, visualization and rectification scripts, and the masks for each camera are included in this folder. The calibration folder contains the camera calibration and camera-to-LiDAR transformation files in subfolders. The camera calibration folder includes [IntPara.txt], [Extpara.txt] and [IntInvPara.txt], as well as the intrinsic matrix text file [intMat.txt] for each of the cameras, while the camera-to-LiDAR folder contains [rt.txt] with the transformations between the camera and the LiDAR (both LiDAR to each individual camera and LiDAR to the camera head). Furthermore, the IMU-to-LiDAR and IMU-to-body-frame (Ladybug/panoramic camera head) transformation files are also included in this folder.

5.7 Conclusion

We have presented a time-registered vision and LiDAR dataset of unstructured urban environments. The dataset is recorded in an outdoor environment and consists of 8 sequences. The vehicle is equipped with an omnidirectional camera, 3D LiDAR, IMU and RTK GPS. The dataset contains multi-fisheye and stitched spherical panoramic images at high resolution, and each individual camera and the panoramic image are calibrated with the 3D LiDAR. The dataset contains small- and large-scale loops in order to evaluate visual SLAM and place recognition. This dataset may be highly useful to the robotics community, especially for the autonomous navigation of ground vehicles, and can serve as a benchmark for various state-of-the-art computer vision algorithms such as multi-fisheye visual SLAM, panoramic SLAM, Visual-LiDAR SLAM, iterative closest point (ICP), and 3D object detection and recognition.

6 Supplementary information

The supplementary material for this submission is available for download at www.irl-cbnu.com/datasets. The material includes video recordings of the dataset and MATLAB and Python scripts for data extraction, visualization and processing.

Acknowledgements This research was financially supported in part by the Ministry of Trade, Industry
and Energy (MOTIE) and Korea Institute for Advancement of Technology (KIAT) through the Interna-
tional Cooperative R&D program. (Project No. P0004631), and in part by the MSIT (Ministry of Sci-
ence and ICT), Korea, under the Grand Information Technology Research Center support program (IITP-
2021-2020-0-01462) supervised by the IITP (Institute for Information & communications Technology
Planning & Evaluation).

References
1. Yu DA (2019) Grid based spherical CNN for object detection from panoramic images. Sensors
19:2622
2. Wang D, He Y, Liu Y, Li D, Wu S, Qin Y, Xu Z (2019) 3D object detection algorithm for panoramic
images with multi-scale convolutional neural network. IEEE Access 7:171461–171470
3. Ji S, Qin Z, Shan J, Lu M (2020) Panoramic SLAM from a multiple fisheye camera rig. ISPRS J
Photogram Remote Sens 159:169–183


4. Yang Y, Tang D, Wang D, Song W, Wang J, Fu M (2020) Multi-camera visual SLAM for off-road
navigation. Robot Auton Syst 128:103505
5. Won C, Seok H, Cui Z, Pollefeys M, Lim J (2020) “Omnislam: omnidirectional localization and
dense mapping for wide-baseline multi-camera systems. In: 2020 IEEE International Conference on
Robotics and Automation (ICRA), pp 559–566
6. Wang Y, Cai S, Li SJ, Liu Y, Guo Y, Li T (2018) Cubemapslam: a piecewise-pinhole monocular
fisheye slam system. In: Asian Conference on Computer Vision, pp 34–49
7. Urban S, Hinz S (2016) Multicol-slam-a modular real-time multi-camera slam system. arXiv pre-
print arXiv:​1610.​07336
8. Caruso D, Engel J, Cremers D (2015) Large-scale direct slam for omnidirectional cameras. In 2015
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp 141–148
9. Liu P, Heng L, Sattler T, Geiger A, Pollefeys M (2017) Direct visual odometry for a Fisheye-Stereo
camera. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp
1746–1752
10. Seok H, Lim J (2019) Rovo: robust omnidirectional visual odometry for wide-baseline wide-
FOV camera systems. In: 2019 International Conference on Robotics and Automation (ICRA), pp
6344–6350
11. Jaramillo C, Yang L, Munoz JP, Taguchi Y, Xiao J (2019) Visual odometry with a single-camera
stereo omnidirectional system. Mach Vis Appl 30:1145–1155
12. Matsuki H, von Stumberg L, Usenko V, Stückler J, Cremers D (2018) Omnidirectional DSO: direct
sparse odometry with fisheye cameras. IEEE Robot Autom Lett 3:3693–3700
13. Ramezani M, Khoshelham K, Fraser C (2018) Pose estimation by omnidirectional visual-inertial
odometry. Robot Auton Syst 105:26–37
14. Seok H, Lim J (2020) ROVINS: robust omnidirectional visual inertial navigation system. IEEE
Robot Autom Lett 5:6225–6232
15. Pandey G, McBride JR, Eustice RM (2011) Ford campus vision and lidar data set. Int J Robot Res
30:1543–1552
16. Carlevaris-Bianco N, Ushani AK, Eustice RM (2016) University of Michigan North Campus long-
term vision and lidar dataset. Int J Robot Res 35:1023–1035
17. Benseddik HE, Morbidi F, Caron G, Felsberg M, Nielsen L, Mester R (2020) PanoraMIS: an
ultra-wide field of view image dataset for vision-based robot-motion estimation. Int J Robot Res
39:1037–1051
18. Koschorrek P, Piccini T, Oberg P, Felsberg M, Nielsen L, Mester R (2013) A multi-sensor traffic scene dataset with omnidirectional video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp 727–734
19. Smith M, Baldwin I, Churchill W, Paul R, Newman P (2009) The new college vision and laser data
set. Int J Robot Res 28:595–599
20. Geiger A, Lenz P, Stiller C, Urtasun R (2013) Vision meets robotics: the kitti dataset. Int J Robot
Res 32:1231–1237
21. Fallon M, Johannsson H, Kaess M, Leonard JJ (2013) The mit stata center dataset. Int J Robot Res
32:1695–1699
22. Nordlandsbanen: minute by minute, season by season. Norwegian Broadcasting Corporation (2013)
23. Ceriani S, Fontana G, Giusti A, Marzorati D, Matteucci M, Migliore D, Rizzi D, Domenico GS,
Taddei P (2009) Rawseeds ground truth collection systems for indoor self-localization and mapping.
Autono Robots 27:353–371
24. Badino H, Huber D, Kanade T (2011) The CMU visual localization data set
25. Urban S, Jutzi B (2017) LaFiDa-a laserscanner multi-fisheye camera dataset. J Imaging 3:5
26. Li Y, Tong G, Gao H, Wang Y, Zhang L, Chen H (2019) Pano-RSOD: a dataset and benchmark for
panoramic road scene object detection. Electronics 8:329
27. Zhang Z, Rebecq H, Forster C, Scaramuzza D (2016) Benefit of large field-of-view cameras for
visual odometry. In: 2016 IEEE international Conference on Robotics and Automation (ICRA), pp
801–808
28. Sturm J, Engelhard N, Endres F, Burgard W, Cremers D (2012) A benchmark for the evaluation
of RGB-D SLAM systems. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and
Systems, pp 573–580
29. Schubert D, Goll T, Demmel N, Usenko V, Stückler J, Cremers D (2018) The TUM VI benchmark
for evaluating visual-inertial odometry. In: 2018 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS), pp 1680–1687


30. Maddern W, Pascoe G, Linegar C, Newman P (2017) 1 year, 1000 km: the Oxford RobotCar data-
set. Int J Robot Res 36:3–15
31. Cheeseman P, Smith R, Self M (1987) A stochastic map for uncertain spatial relationships. In: 4th
International symposium on robotic research, pp 467–474
32. Pandey G, McBride JR, Savarese S, Eustice RM (2015) Automatic extrinsic calibration of vision
and lidar by maximizing mutual information. J Field Robot 32:696–722
33. Horn BKP, Hilden HM, Negahdaripour S (1988) Closed-form solution of absolute orientation using orthonormal matrices. JOSA A 5:1127–1135
34. Mur-Artal R, Montiel JMM, Tardos JD (2015) ORB-SLAM: a versatile and accurate monocular
SLAM system. IEEE Trans Robot 31:1147–1163

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.
