NAVER LABS' indoor dataset is the result of scanning COEX, one of the largest shopping malls in Korea, twice at an interval of about two months (Jun. 2018 and Sep. 2018). The dataset consists of 17.5K geo-localized images with 578 points of interest (POIs), captured by a device called Pumpkin that has two LiDARs and multiple cameras. We currently provide only the images taken by Pumpkin’s left and right side cameras, which are designed to capture storefront images for POI recognition and change detection tasks. In the near future, we will release images taken by the other camera types as well, so that the dataset can also be used for VSLAM and visual localization research.
Scanning device: Pumpkin
Pumpkin is equipped with the following main sensors:
- 6 x Sony RX0 (2 with Wide Angle Lens: Samyang Fisheye Lens), 2400x1600, 2Hz, Anti-Distortion Shutter — 1/32000 super-high-speed shutter, ZEISS Tessar T* Lens, 84° FoV (Samyang Fisheye Lens: 106° HFoV, 70° VFoV)
- 1 x Velodyne Puck 16-channel LiDAR, 360° HFoV, 30° VFoV, 4 planes, 10 Hz, 100 m range, 2.0° vertical resolution, 0.1~0.4° horizontal resolution
This dataset consists of images and their poses. Each image is named with its serial number and timestamp, as '[serial #]_[timestamp].jpg'. The poses at which the images were acquired are stored in a separate file, 'sensor_trajectory.hdf', which records a 7-degrees-of-freedom (DoF) pose for every image. Each 7-DoF state consists of 'x, y, z' for position followed by 'qw, qx, qy, qz' for orientation, in that order. The two tabs in the file, pose and stamp, are paired: the pose for the n-th stamp is the n-th entry in the pose tab.
If you are more familiar with '.json' than '.hdf', you can download the file that converts it.
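The naming and pairing conventions described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names (`Pose`, `parse_image_name`, `pose_for_stamp`, `rotation_matrix`) are hypothetical, the stamps are assumed to be comparable to the timestamp string in the file name, and the quaternion is assumed to be a unit quaternion. Reading 'sensor_trajectory.hdf' itself is left out because its internal layout beyond the pose and stamp tabs is not described here.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Pose(NamedTuple):
    """7-DoF pose in the documented order: position, then quaternion."""
    x: float
    y: float
    z: float
    qw: float
    qx: float
    qy: float
    qz: float


def parse_image_name(name: str) -> Tuple[str, str]:
    """Split '[serial #]_[timestamp].jpg' into (serial, timestamp) strings."""
    stem = name.rsplit(".", 1)[0]             # drop the '.jpg' extension
    serial, timestamp = stem.rsplit("_", 1)   # split on the last underscore
    return serial, timestamp


def pose_for_stamp(stamps: Sequence[str],
                   poses: Sequence[Sequence[float]],
                   timestamp: str) -> Pose:
    """Return the pose paired with a stamp: the n-th pose belongs to the n-th stamp."""
    n = list(stamps).index(timestamp)  # raises ValueError if the stamp is missing
    return Pose(*poses[n])


def rotation_matrix(p: Pose) -> List[List[float]]:
    """Convert the unit quaternion (qw, qx, qy, qz) into a 3x3 rotation matrix."""
    w, x, y, z = p.qw, p.qx, p.qy, p.qz
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]
```

For example, with made-up values, `parse_image_name("12345_1000.jpg")` yields the serial `"12345"` and the timestamp `"1000"`, and the timestamp can then be passed to `pose_for_stamp` together with the contents of the stamp and pose tabs.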
How to generate data
- Data acquisition
All of the images in this dataset were acquired by Pumpkin. To collect as much data as possible, we acquired images periodically while moving continuously, rather than in stop-and-go motion. As mentioned above, the RX0 has an anti-distortion shutter, so we assumed that there is no distortion caused by movement. All of the data, including point clouds and images, were recorded on the same timeline, using the UNIX timestamp of the main processor.
- Estimating image pose
To accurately estimate the pose at which each image was acquired, LiDAR-based SLAM was performed. However, because the LiDAR and the cameras did not acquire data at the same time (i.e., they were asynchronous), the pose of Pumpkin at the moment each image was acquired was obtained by linear interpolation based on timestamps. The pose of each image was then calculated from the relationship between the base of Pumpkin and each camera, and tagged to the image.
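The two steps above, interpolating the base pose at the image timestamp and then applying the base-to-camera relationship, can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the function names are hypothetical, the position is interpolated linearly as described, and the orientation is interpolated with spherical linear interpolation (slerp), which is an assumption because the text does not say how the quaternions were interpolated. Quaternions are in (qw, qx, qy, qz) order.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (qw, qx, qy, qz)
Vec3 = Tuple[float, float, float]


def slerp(q0: Quat, q1: Quat, t: float) -> Quat:
    """Spherical linear interpolation between two unit quaternions."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                        # take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:                     # nearly parallel: plain lerp is stable
        q = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        q = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def interpolate_pose(t0: float, p0: Vec3, q0: Quat,
                     t1: float, p1: Vec3, q1: Quat,
                     t_img: float) -> Tuple[Vec3, Quat]:
    """Base pose at image time t_img, between SLAM poses at t0 and t1."""
    if t1 == t0:
        raise ValueError("the two SLAM poses must have different timestamps")
    alpha = (t_img - t0) / (t1 - t0)
    pos = tuple(a + alpha * (b - a) for a, b in zip(p0, p1))
    return pos, slerp(q0, q1, alpha)


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate vector v by unit quaternion q, computed as q * (0, v) * conj(q)."""
    w, x, y, z = q
    p = quat_mul(quat_mul(q, (0.0, *v)), (w, -x, -y, -z))
    return p[1:]


def camera_pose(base_pos: Vec3, base_q: Quat,
                cam_offset: Vec3, cam_q: Quat) -> Tuple[Vec3, Quat]:
    """Compose the base pose with a fixed base-to-camera transform (assumed known)."""
    offset_world = rotate(base_q, cam_offset)
    pos = tuple(b + o for b, o in zip(base_pos, offset_world))
    return pos, quat_mul(base_q, cam_q)
```

In use, `t0` and `t1` would be the timestamps of the two SLAM poses that bracket the image timestamp, and `cam_offset` and `cam_q` would come from the calibration between the base of Pumpkin and the camera in question, which is not given in this description.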
To publish the dataset, we blurred faces in the images using our object detection model. The model was trained on data from Naver Street View, which includes face annotations. We ran the model on our images to localize the faces and applied a median filter to blur them. The remaining faces that the model failed to localize were handled manually.
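As a rough illustration of the blurring step, the sketch below applies a median filter inside one detected bounding box of a grayscale image stored as a list of rows. It is not the code used for the dataset: the function name `median_blur_region`, the box format (top, left, bottom, right, with bottom and right exclusive), and the `radius` parameter are all assumptions, and the face detector itself is not shown.

```python
from typing import List, Tuple


def median_blur_region(image: List[List[int]],
                       box: Tuple[int, int, int, int],
                       radius: int = 1) -> List[List[int]]:
    """Return a copy of image with a median filter applied inside box only.

    box is (top, left, bottom, right); bottom and right are exclusive.
    Each pixel in the box is replaced by the median of its
    (2 * radius + 1)-wide neighborhood, clipped at the image borders.
    """
    height, width = len(image), len(image[0])
    top, left, bottom, right = box
    out = [row[:] for row in image]  # pixels outside the box stay untouched
    for r in range(max(top, 0), min(bottom, height)):
        for c in range(max(left, 0), min(right, width)):
            window = sorted(
                image[rr][cc]
                for rr in range(max(r - radius, 0), min(r + radius + 1, height))
                for cc in range(max(c - radius, 0), min(c + radius + 1, width))
            )
            out[r][c] = window[len(window) // 2]  # middle element of the sorted window
    return out
```

A larger `radius` removes more detail, so for an actual face region it would need to be chosen relative to the size of the box.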