Authors:
(1) Luyuan Peng, Acoustic Research Laboratory, National University of Singapore;
(2) Hari Vishnu, Acoustic Research Laboratory, National University of Singapore;
(3) Mandar Chitre, Acoustic Research Laboratory, National University of Singapore;
(4) Yuen Min Too, Acoustic Research Laboratory, National University of Singapore;
(5) Bharath Kalyan, Acoustic Research Laboratory, National University of Singapore;
(6) Rajat Mishra, Acoustic Research Laboratory, National University of Singapore.
Visual localization is a promising solution to the problem of localizing a vehicle in a known underwater environment during inspection. Inspection missions often involve operations around marine structures, making acoustic navigation with beacons difficult due to shadowing and multipath [1]. As the vehicle has to operate close to structures, inertial navigation systems, which accumulate errors over time, may not provide sufficient positioning accuracy [1]. In comparison, visual localization using cameras may offer a cost-effective, consistent, and accurate alternative in such missions. Previous work has shown that machine learning-based regression methods based on PoseNet [3] can effectively regress a 6-degree-of-freedom (DOF) pose from a single 224×224 RGB image, with approximately 6 cm position accuracy and 1.7° orientation accuracy when tested on simulated underwater datasets [2]. It was also shown that using a deeper neural network as the feature extractor may improve the model's localization accuracy [2]. This work further investigates the effectiveness of such models on underwater datasets and explores techniques to improve localization performance further. Our three main contributions are:
1) We explore the use of long short-term memory (LSTM) units [4] in the pose regression model to exploit the spatial correlation of image features and to achieve a more structured dimensionality reduction [5]; a sketch of this architecture follows the list below.
2) We test the proposed models on underwater datasets collected from a 1.6 m × 1 m × 1 m water-filled tank using a remotely operated vehicle (ROV). The tank offers an environment in which we can control lighting and turbidity. The models achieve good accuracy on these datasets, with performance comparable to that obtained on the simulator dataset.
3) The base dataset consists of images taken from one camera of a stereo pair mounted on the vehicle. We further explore the performance improvement obtained by augmenting the data with images from the second camera (see the second sketch after this list). Fig. 2 shows example underwater scenes from the tank dataset.
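For concreteness, the sketch below illustrates one way an LSTM-augmented pose regressor could be structured, in the spirit of [5]: LSTMs sweep the CNN feature map along its rows and columns so the final hidden states summarize spatially correlated features before regression. The ResNet-34 backbone, layer sizes, and names here are illustrative assumptions, not the exact architecture used in this work.

```python
# Minimal sketch (PyTorch) of a PoseNet-style regressor with LSTMs over the
# CNN feature map. All layer sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

class LSTMPoseNet(nn.Module):
    def __init__(self, hidden_size=128):
        super().__init__()
        # Backbone choice is an assumption; any CNN feature extractor works.
        backbone = models.resnet34(weights=None)
        # Keep convolutional layers only -> feature map (B, 512, 7, 7)
        # for a 224x224 input.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        # Two LSTMs sweep the feature map row-wise and column-wise to
        # exploit spatial correlation and reduce dimensionality in a
        # structured way, as motivated by [5].
        self.lstm_rows = nn.LSTM(512, hidden_size, batch_first=True)
        self.lstm_cols = nn.LSTM(512, hidden_size, batch_first=True)
        # Regress 3-D position and a 4-D quaternion (normalize at inference).
        self.fc_pos = nn.Linear(2 * hidden_size, 3)
        self.fc_quat = nn.Linear(2 * hidden_size, 4)

    def forward(self, x):
        f = self.features(x)                               # (B, 512, H, W)
        b, c, h, w = f.shape
        rows = f.permute(0, 2, 3, 1).reshape(b, h * w, c)  # row-major scan
        cols = f.permute(0, 3, 2, 1).reshape(b, h * w, c)  # column-major scan
        _, (hr, _) = self.lstm_rows(rows)                  # final hidden states
        _, (hc, _) = self.lstm_cols(cols)
        z = torch.cat([hr[-1], hc[-1]], dim=1)             # (B, 2*hidden_size)
        return self.fc_pos(z), self.fc_quat(z)

model = LSTMPoseNet()
pos, quat = model(torch.randn(2, 3, 224, 224))  # pos: (2, 3), quat: (2, 4)
```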
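Likewise, a brief sketch of how second-camera images could be labeled during augmentation: each right-camera image needs a pose expressed for that camera, which can be obtained by composing the first camera's world pose with the known extrinsic transform of the stereo rig. The baseline value and function names below are hypothetical placeholders; the paper does not specify this procedure, and the actual extrinsics come from the rig's calibration.

```python
# Sketch of pose-label generation for second-camera images: compose the
# first camera's world pose with the stereo extrinsics. Values are hypothetical.
import numpy as np

def pose_to_matrix(R, t):
    """Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical extrinsics: second camera offset by a 10 cm baseline along x,
# with no relative rotation (a typical rectified stereo pair).
T_cam1_to_cam2 = pose_to_matrix(np.eye(3), np.array([0.10, 0.0, 0.0]))

def second_camera_pose(T_world_cam1):
    """Pose label for the second camera, given the first camera's world pose."""
    return T_world_cam1 @ T_cam1_to_cam2

# Usage: each second-camera image inherits the first camera's label shifted
# by the baseline, doubling the training set without extra ground truth.
T_world_cam1 = pose_to_matrix(np.eye(3), np.array([0.5, 0.2, -0.3]))
T_world_cam2 = second_camera_pose(T_world_cam1)
```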