Smart Video Capsule Endoscopy: Raw Image-Based Localization for Enhanced GI Tract Investigation

by Oliver Bause*, Julia Werner*, Paul Palomero Bernardo, and Oliver Bringmann (*Equal Contribution)
In 2025 32nd International Conference on Neural Information Processing (ICONIP) - Best Paper Runner-Up Award, 2025.

Abstract

For many real-world applications involving low-power sensor edge devices, deep neural networks used for image classification might not be suitable. This is due to their typically large model size and operation count, which often exceed the capabilities of such resource-limited devices. Furthermore, camera sensors usually capture images with a Bayer color filter applied, which are subsequently converted to RGB images that are commonly used for neural network training. However, on resource-constrained devices, such conversions demand their share of energy and should ideally be skipped. This work addresses the need for hardware-suitable AI targeting sensor edge devices using the example of Video Capsule Endoscopy, an important medical procedure for the investigation of the small intestine, which is strongly limited by its battery lifetime. Accurate organ classification is performed with a final accuracy of 93.06%, evaluated directly on Bayer images, using a CNN with only 63,000 parameters and time-series analysis in the form of Viterbi decoding. Finally, the process of capturing images with a camera and raw image processing is demonstrated with a customized PULPissimo System-on-Chip with a RISC-V core and an ultra-low-power hardware accelerator, providing an energy-efficient AI-based image classification approach requiring just 5.31 μJ per image. As a result, it is possible to save an average of 89.9% of energy before entering the small intestine compared to classic video capsules.
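
To illustrate how Viterbi decoding can smooth per-frame organ predictions over time, the following is a minimal sketch and not the authors' implementation. It assumes three hypothetical organ states (0 = stomach, 1 = small intestine, 2 = colon), a hand-picked left-to-right transition matrix, and made-up per-frame CNN softmax outputs that are used directly as emission scores. None of these values come from the paper.

```python
import math

def viterbi(frame_probs, transition, prior):
    """Return the most likely state sequence for per-frame class probabilities.

    frame_probs[t][s]: classifier probability of state s at frame t (assumed input)
    transition[r][s]:  probability of moving from state r to state s
    prior[s]:          probability of starting in state s
    """
    n_states = len(prior)
    eps = 1e-12  # floor that keeps log() defined for zero probabilities

    def log(p):
        return math.log(max(p, eps))

    # score[s] = best log-probability of any path that ends in state s
    score = [log(prior[s]) + log(frame_probs[0][s]) for s in range(n_states)]
    backptr = []
    for probs in frame_probs[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor state for landing in s at this frame
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log(transition[r][s]))
            new_score.append(score[best_prev]
                             + log(transition[best_prev][s])
                             + log(probs[s]))
            ptr.append(best_prev)
        score = new_score
        backptr.append(ptr)

    # Backtrack from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical model: the capsule can only stay in an organ or move forward.
transition = [[0.9, 0.1, 0.0],
              [0.0, 0.9, 0.1],
              [0.0, 0.0, 1.0]]
prior = [1.0, 0.0, 0.0]  # assumed to start in the stomach

# Made-up softmax outputs; frame 2 is a spurious "small intestine" prediction.
frame_probs = [[0.8, 0.1, 0.1],
               [0.7, 0.2, 0.1],
               [0.3, 0.6, 0.1],
               [0.7, 0.2, 0.1],
               [0.2, 0.7, 0.1],
               [0.1, 0.8, 0.1]]

framewise = [max(range(3), key=lambda s: p[s]) for p in frame_probs]
path = viterbi(frame_probs, transition, prior)
print(framewise)  # [0, 0, 1, 0, 1, 1]
print(path)       # [0, 0, 0, 0, 1, 1]
```

In this toy input, the frame-wise argmax jumps into the small intestine and back again, which is anatomically implausible. The decoded path removes the spurious jump and places the transition at frame 4, which is the kind of correction that makes a small classifier usable for deciding when the capsule has reached the small intestine.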