American Society of Mechanical Engineers

Session: 05-04-01 CCUS and Underwater Development/ Utilization I

Submission Number: 177239

Underwater Object Detection for Autonomous Observation and Sampling Using General-Purpose Object Detection Techniques

Autonomous underwater vehicles (AUVs) are in demand for observing and sampling underwater objects and organisms. The targets vary with the purpose of the mission and include rocks, ores, and organisms. The first step in observing and sampling a target is detecting it in underwater images and other data captured by the vehicle. Various deep learning-based methods have been developed for this task in recent years, and their application to underwater images is currently being studied. Many of these studies rely on supervised learning, which requires training or fine-tuning a deep learning model to fit the target. However, because underwater images and annotation data are limited, supervised learning struggles to perform well. Additionally, supervised learning cannot identify unseen objects and organisms, which are crucial for scientific research. In contrast, salient object detection methods and open-vocabulary object detection methods, which are not specialized for specific targets, have been proposed in recent years. Salient object detection segments the most prominent object in an image; open-vocabulary object detection identifies and locates objects that are not limited to a predefined set of categories, leveraging pretrained vision-language models. This study examines object detection methods from these two categories for underwater images.

This study evaluated the performance of various object detection methods across several open underwater image datasets. The evaluation focused on detecting aquatic animals because they are the main targets included in publicly available datasets. As the salient object detection methods output segmentation masks, the outputs were converted to bounding boxes to match the evaluation data. For open-vocabulary object detection, instead of inputting specific targets like “fish” or “crab”, we input a broad category such as “aquatic animal.” A broad term allows these methods to detect various organisms, including unseen ones, that are not predefined.
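
The mask-to-box conversion mentioned above can be illustrated with a minimal, dependency-free sketch. The abstract does not state how the conversion was performed, so the approach here (one tight axis-aligned box around all pixels above a threshold), the function name `mask_to_bbox`, and the default threshold of 0.5 are assumptions made for illustration only.

```python
from typing import Optional, Sequence, Tuple

# (x_min, y_min, x_max, y_max) as inclusive pixel indices
Box = Tuple[int, int, int, int]


def mask_to_bbox(mask: Sequence[Sequence[float]], threshold: float = 0.5) -> Optional[Box]:
    """Return the tight axis-aligned box around all pixels above `threshold`.

    `mask` is a 2D saliency map given as rows of values in [0, 1].
    Returns None when no pixel exceeds the threshold (no salient object).
    The threshold value is an assumption, not taken from the study.
    """
    xs = []
    ys = []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical 4x5 saliency map with a small salient region.
example_mask = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.0, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
print(mask_to_bbox(example_mask))  # (1, 1, 2, 2)
```

A single box around all foreground pixels fits the case of one prominent object per image; images with several separate salient regions would need a connected-component step before boxing, which this sketch omits.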

The results showed that these methods achieved relatively high performance even though they are not specialized for underwater images or aquatic animals. Additionally, fine-tuning on underwater images further improved performance.

Finally, we discussed implementation on our vehicle. We selected and refined a method that effectively balances inference speed and accuracy on edge devices. Performance evaluation in our vehicle environment showed that object detection achieved adequate accuracy in real time.

The general-purpose object detection methods discussed in this study, which do not target specific objects or organisms, are expected to enable diverse autonomous observation and sampling by AUVs.

Presenting Author: Tatsuya Kaneko, Japan Agency for Marine-Earth Science and Technology

Presenting Author Biography: Tatsuya Kaneko is a researcher at the Japan Agency for Marine-Earth Science and Technology. His research interests include hybrid modeling, drill string dynamics, and the autonomy of underwater vehicles.

Authors:

Tatsuya Kaneko, Japan Agency for Marine-Earth Science and Technology
Hakan Bilen, The University of Edinburgh
Tomoya Inoue, Japan Agency for Marine-Earth Science and Technology
Hitoshi Kakami, Japan Agency for Marine-Earth Science and Technology
Kazuya Iwashita, Japan Agency for Marine-Earth Science and Technology

Submission Type

Technical Paper Publication