Visual Inertial System (VIS) Technology
What is it? How was it developed? What’s coming?
Some years ago, during our technology screening process at the Hexagon Technology Centre, we identified the development need for Visual Simultaneous Localization and Mapping (SLAM) technology for the Hexagon group. It was clear from the beginning that this technology could be applied to determine the position of devices in 3D-space and would be highly beneficial for future innovations. We started with the first implementation of a generic Visual SLAM algorithm and investigated the potential of this technology for various applications relevant to our business areas.
What is Visual SLAM?
In principle, Visual SLAM is nothing more than a repetitive application of resection and forward-intersection with an optional bundle adjustment at the very end. These basic algorithms have been well known in photogrammetry for more than a century. However, the developments in feature tracking and feature matching driven by computer vision and robotics – which made the manual selection of tie points obsolete – enabled the implementation of an automated workflow.
Navigation through 3D-space based on Visual SLAM is quite intuitive. It works much like how we humans navigate through the world. When we walk towards an object, e.g. a building, the object gets larger in our field of view. When we walk backwards, it gets smaller.
In Visual SLAM, feature tracking detects point features – so-called landmarks – in the image stream and tracks their positions from one image frame to the next. When the camera moves towards a building, the detected features, e.g. the corners of the building, its door or its windows, move from the image centre outwards, since the building is getting larger in the field of view. When the camera rotates from left to right, the features in the image move from right to left. Hence, from the movement of the feature points between the frames of the image stream, Visual SLAM can deduce the direction of motion of the camera in 3D-space.
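The "building gets larger" effect can be illustrated with a few lines of Python. This is a minimal sketch with made-up numbers, not code from the actual system: a pinhole camera projects a point at lateral offset (X, Y) and depth Z to the normalised image coordinates (X/Z, Y/Z), so as the camera advances and Z shrinks, the projection moves away from the image centre.

```python
# Hypothetical building corner at lateral offset (X, Y) from the optical axis.
X, Y = 1.0, 0.5

# Depth Z decreases as the camera moves towards the building.
projections = []
for Z in (10.0, 8.0, 6.0):
    projections.append((X / Z, Y / Z))  # pinhole projection, normalised coords

# Radial distance of the feature from the image centre, frame by frame.
radii = [(u**2 + v**2) ** 0.5 for u, v in projections]
# The radii grow monotonically: the feature drifts outwards.
```

Tracking this outward drift over many features is what lets the algorithm conclude that the camera is moving forward rather than, say, panning.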
In a continuous process, the SLAM algorithm computes the 3D-coordinates of the tracked features from two or more positions (mapping) and uses these coordinates to determine the following position (localisation). The generated map of landmarks evolves as the operator moves along the track to the next scanner setup and acts as a reference for the whole positioning algorithm. Consequently, this map, which is built up and maintained during the process, is essential to keep the drift error small.
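The mapping step – computing a landmark's 3D-coordinates from its observations in two camera positions – can be sketched with linear (DLT) triangulation. This is an illustrative implementation under simplifying assumptions (normalised image coordinates, known camera poses, noise-free observations), not the algorithm used in the product; all names and numbers are made up.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """DLT triangulation of one landmark observed in two frames.
    P1, P2: 3x4 camera projection matrices; x1, x2: 2D observations."""
    # Each observation contributes two linear constraints A @ X_h = 0
    # on the homogeneous landmark position X_h.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]                 # null-space vector = homogeneous solution
    return X_h[:3] / X_h[3]      # homogeneous -> Euclidean coordinates

# Two cameras with identity intrinsics; the second is shifted 1 m along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0])                      # assumed landmark
x1 = X_true[:2] / X_true[2]                             # observation, frame 1
x2 = (X_true[:2] + np.array([-1.0, 0.0])) / X_true[2]   # observation, frame 2

X_est = triangulate(P1, P2, x1, x2)
```

Once enough landmarks have coordinates, the next camera position can be solved by resection against this map, which is exactly the localisation half of the loop.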
There are alternative approaches based on inertial measurements only. An Inertial Measurement Unit (IMU) provides measurements of accelerations and angular rates that are integrated to derive velocity in a first step and, finally, the position and orientation of the device. Since these measurements are affected by measurement errors, the integration leads to a significant drift of the derived quantities. Because this approach lacks a map as an overall reference, the resulting drift is significantly higher than in Visual SLAM. Although an optimal result cannot be achieved from inertial measurements alone, the data from the IMU can be fused with the image measurements to support Visual SLAM.
How does Visual SLAM redefine the registration process?
The typical registration of laser scans, i.e. the combination of the individual scans into one combined point cloud, is performed after the complete data acquisition, in the office, using post-processing software. Back in the office is usually the first time an operator can inspect the result of the scanning project and check the completeness of the data acquisition. If the operator identifies missing areas, they may have to return to the site and perform additional scans. When the project location is far from the office, this can mean a drive of quite some time and is, therefore, something customers want to avoid at all costs.
In-field pre-registration of several laser scans was the new feature that Juergen Mayer, business director for terrestrial laser scanning at that time, had in mind, which would eliminate this costly rework completely. The intention was that, after each scan, the data acquired in the field would be automatically registered with the previously acquired scan data. The resulting combined point cloud would then be visualised on a mobile tablet computer. This would allow the customer to immediately see what data has been captured and what data might be missing, to optimally plan the next scanner setup and, above all, to perform a completeness check while still on site.
During our feasibility study, we could finally show that the concept would work in practice as well. Based on these findings, together with our colleagues from the business unit, the development of the product started and culminated in the announcement of the Leica RTC360 at HxGN LIVE 2018.
Could Visual SLAM be the basis for this new field registration feature?
In theory, it was obvious that Visual SLAM could determine the motion between two scanning setups. Knowing the translation and rotation, the point cloud resulting from the current scan can automatically be aligned with the point cloud from the previous scan. The goal was then to prove that the concept would also work in practice.
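The alignment step itself is a rigid-body transform: given the rotation R and translation t estimated between the two setups, every point of the new scan is mapped into the previous scan's coordinate frame. The sketch below shows this with made-up values; it is an illustration of the geometry, not the product's registration code.

```python
import numpy as np

def align_scan(points, R, t):
    """Map a scan's points into the previous setup's frame, given the
    rotation R and translation t between the two scanner setups."""
    return points @ R.T + t

# Assumed example: the second setup is rotated 90 degrees about the
# vertical axis and moved 2 m along x relative to the first setup.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([2.0, 0.0, 0.0])

scan2 = np.array([[1.0, 0.0, 0.0]])   # a point in the second scanner's frame
aligned = align_scan(scan2, R, t)     # the same point in the first frame
```

With R and t supplied by Visual SLAM, this transform is all that is needed to pre-register each new scan against its predecessor in the field.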
The Visual Inertial System (VIS) technology should not add any constraints to the laser scanning workflow as it was carried out at that time. This means that the operator should not be burdened with additional rules, such as having to carry the laser scanner in a specific manner. This was one of the main requirements when the feasibility study started. Even if one side of the RTC360 is obstructed, e.g. by the body of the operator, the VIS technology should still work. Fundamentally, this is the main reason why there are five VIS cameras built into the RTC360. Moreover, the processing for the automatic pre-registration should be carried out in real time, so that the result is presented to the operator immediately after the next scan is performed. Given this, our Visual SLAM algorithm had to prove that it could be integrated seamlessly into the workflow of terrestrial laser scanning.
In the meantime, quite a few other applications for Visual SLAM have been investigated at the Hexagon Technology Centre. Although the basic principle is the same, each application needs some fine-tuning and adaptation to its specific setting and workflow in order to achieve the best results. A highlight in this context is the development of the Leica BLK2GO handheld imaging laser scanner, where Visual SLAM is combined with LiDAR SLAM, taking the algorithm to the next level. This is just one example out of many where this technology has proven its applicability for positioning workflows. Some of those have already been released as products, and many more can be expected in the future, enriching the entire product portfolio of Hexagon Geosystems.
Head of Image & Point Cloud Processing
Hexagon Technology Centre