SEMANTIC SCENE UNDERSTANDING FOR THE AUTONOMOUS PLATFORM
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020
XXIV ISPRS Congress (2020 edition)
B. Vishnyakov *, Y. Blokhinov, I. Sgibnev, V. Sheverdin, A. Sorokin, A. Nikanorov, P. Masalov, K. Kazakhmedov, S. Brianskiy,
E. Andrienko, Y. Vizilter
FGUP «State Research Institute of Aviation Systems», Russia, 125319, Moscow, Viktorenko street, 7 - (vishnyakov, yuri.blokhinov,
sgibnev, sheverdin, ans, avnikanorov, masalov, kkirill, sbrianskiy, viz)@gosniias.ru
KEY WORDS: multi-sensor platform, autonomous vehicle, SLAM, CNN, dynamic scene analysis, semantic segmentation, off-road,
autonomous driving, camera calibration, LiDAR calibration.
ABSTRACT:
In this paper we describe a new multi-sensor platform for data collection and algorithm testing. We propose several methods for
solving the semantic scene understanding problem for autonomous ground vehicles. We describe our approaches to automatic camera
and LiDAR calibration; three-dimensional scene reconstruction and odometry calculation; semantic segmentation that provides
obstacle recognition and underlying surface classification; object detection; and point cloud segmentation. We also describe our virtual
simulation complex based on Unreal Engine, which can be used for both data collection and algorithm testing. We collected a large
database of field and virtual data: more than 1,000,000 real images with corresponding LiDAR data and more than 3,500,000 simulated
images with corresponding LiDAR data. All proposed methods were implemented and tested on our autonomous platform; accuracy
estimates were obtained on the collected database.
1. INTRODUCTION
The autonomous car market is currently growing at an exponential
rate, and many companies are developing their own concepts of
driverless vehicles. A self-driving car, also called an autonomous
vehicle, is a vehicle that uses a combination of sensors, cameras,
radars and artificial intelligence to travel between destinations
without any human effort.
The scientific community publishes a huge number of papers on
object detection, scene segmentation and 3D reconstruction
using cameras, LiDARs and radars. Combining these algorithms
allows us to develop high-level algorithms for
autonomous driving. However, most driving algorithms
rely on a vector map of the roads, so autonomous driving
in off-road conditions and in the countryside is still a challenging
problem. The solution requires robust algorithms for semantic
segmentation, three-dimensional scene reconstruction and object
detection. All these algorithms work much better in cities than
in the countryside.
In this paper we describe our multi-sensor off-road platform for
data collection and algorithm testing. We propose a new, fully
automatic technique for mutual calibration of machine vision
cameras and LiDARs and discuss algorithms for real-time semantic
3D scene reconstruction.
2. AUTONOMOUS PLATFORM
The autonomous platform is a relatively large vehicle with
dimensions close to those of a real car (1.8 m wide, 4.4 m long).
2.1 Sensors
The core of the autonomous platform is a computer vision
hardware complex, which consists of ten short focus (5 mm lens)
and two long focus (25 mm lens) Prosilica GT2050C machine
vision cameras, four Goldeye G-032 SWIR TEC1 SWIR cameras
and four Velodyne VLP-16 LiDARs. This system allows us to
collect video and three-dimensional data and try out algorithms
for three-dimensional reconstruction, semantic segmentation and
obstacle classification. The vision system is mounted on a metal
frame support that allows one to change the distance between the
cameras and quickly install or remove other sensors if necessary.
In addition, two AXIS M5525 PTZ cameras for object detection
are placed on the platform. This computer vision subsystem is
designed to collect data and try out algorithms for object
detection and recognition, semantic segmentation and
three-dimensional scene reconstruction.
The location of the sensors on the platform is shown in Figure 1.
Figure 1. Ten short focus cameras (purple), two long focus
cameras (green), four LiDARs (grey circles), two PTZ cameras
(orange), four SWIR cameras (light red)
The hardware processing part consists of seven computing units:
Vecow RCS-9430FHR-RTX2080-256 industrial computers based
on an Intel Core i7-7700 processor, an Nvidia GeForce RTX 2080
graphics card and a special four-channel gigabit network card
with Power over Ethernet (PoE) support, the PE-1004. All
machine vision cameras are connected to the PE-1004 boards,
since each camera generates a data stream of approximately 1
Gbit per second. Other devices are connected to a gigabit switch
and share the same local area network. A Delphi ESR-2.5
radar, a GPS receiver and an Xsens inertial system can also be mounted on
the vehicle.
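The figure of roughly 1 Gbit per second per camera can be checked with a short back-of-the-envelope calculation. The sketch below is illustrative only: the frame size (2048 x 2048), pixel depth (8 bits) and frame rate (28 fps) are assumed values that are not stated in the text, and the function names are hypothetical.

```python
import math


def stream_gbit_per_s(width, height, bits_per_pixel, fps):
    """Raw (uncompressed) video data rate in Gbit/s, ignoring protocol overhead."""
    return width * height * bits_per_pixel * fps / 1e9


def ports_needed(n_cameras, stream_gbps, link_gbps=1.0):
    """Number of network ports needed if a stream cannot be split across ports."""
    cameras_per_port = int(link_gbps // stream_gbps)
    if cameras_per_port == 0:
        raise ValueError("a single stream exceeds the link capacity")
    return math.ceil(n_cameras / cameras_per_port)


# Assumed camera parameters (not taken from the paper).
rate = stream_gbit_per_s(width=2048, height=2048, bits_per_pixel=8, fps=28)
print(f"per-camera stream: {rate:.2f} Gbit/s")

# Twelve machine vision cameras (ten short focus, two long focus), as in Section 2.1.
print("gigabit ports needed:", ports_needed(12, rate))
```

Under these assumptions a single camera nearly saturates one gigabit port, which is consistent with connecting each machine vision camera to a dedicated channel of a multi-port card rather than to the shared switch.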
* Corresponding author
This contribution has been peer-reviewed.
https://doi.org/10.5194/isprs-archives-XLIII-B2-2020-637-2020 | © Authors 2020. CC BY 4.0 License.
We use an APC Smart-UPS SRT 1000VA / 900W
uninterruptible power supply with five 1500VA batteries, which
allows the computing unit and the sensors of the vision
system to run continuously for up to 8 hours.
2.2 Software
We developed special software based on the ROS2 platform running on
Ubuntu 18, which allows us to synchronously record data
from all sensors in the system, including cameras, LiDARs,
radar, GPS receiver and inertial system, to a specialized storage
format called rosbag. Data streams from cameras and LiDARs are
synchronized at the hardware level with synchronization cables and
the PTP/PPS protocols.
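When recorded data are read back, camera frames and LiDAR sweeps usually still have to be associated with each other by timestamp. The following is a minimal sketch of nearest-timestamp pairing; it is not the authors' implementation, and the function name `pair_by_timestamp`, the tolerance value and the sample timestamps are all hypothetical.

```python
from bisect import bisect_left


def pair_by_timestamp(camera_stamps, lidar_stamps, max_offset):
    """Pair each camera timestamp with the nearest LiDAR timestamp.

    Both lists must be sorted in ascending order (seconds).
    Returns (camera_index, lidar_index) tuples; camera frames with no
    LiDAR timestamp within max_offset seconds are skipped.
    """
    pairs = []
    for i, t in enumerate(camera_stamps):
        j = bisect_left(lidar_stamps, t)
        # The nearest neighbour is either just before or at the insertion point.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(lidar_stamps)]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(lidar_stamps[k] - t))
        if abs(lidar_stamps[best] - t) <= max_offset:
            pairs.append((i, best))
    return pairs


# Hypothetical timestamps in seconds.
camera = [0.00, 0.10, 0.20, 0.35]
lidar = [0.01, 0.11, 0.19, 0.50]
print(pair_by_timestamp(camera, lidar, max_offset=0.02))
```

With hardware-level synchronization the offsets between matched messages should be small, so a tight tolerance can be used and unmatched frames simply indicate dropped data.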
An optional remote Wi-Fi connection from the operator to the
computing unit is also provided for the purpose of
monitoring data collection processes or testing computer vision
algorithms.
3. VIRTUAL SIMULATION
Many scientific labs and engineering groups use virtual
simulation as the most affordable way to generate extra data for
training neural networks. We also use virtual simulation to obtain
image and LiDAR data under different conditions.
We chose Unreal Engine, a game engine developed and
supported by Epic Games, as the basic simulation tool. A game
engine, rather than a professional simulation tool such as Vega Prime, was
chosen because game engines currently provide the
most realistic scene visualization. Since 1998, when the first
version of Unreal Engine was released, various versions of
the engine have been used in more than a hundred games and a
thousand other projects, including scientific projects and
virtual simulation tools.
Figure 3. Second virtual scene sample
3.2 LiDAR modelling
When modeling the VLP-16 LiDAR, for a single measurement 16 rays are
emitted at different polar angles from the point where the LiDAR
is mounted, a (...truncated)