Frappe: fast fiducial detection on low cost hardware (pdf)

Article PDF cannot be displayed. You can download it here:

https://link.springer.com/content/pdf/10.1007/s11554-023-01373-w.pdf

Frappe: fast fiducial detection on low cost hardware

Journal of Real-Time Image Processing (2023) 20:119 https://doi.org/10.1007/s11554-023-01373-w RESEARCH Frappe: fast fiducial detection on low cost hardware Simon Jones1 · Sabine Hauert1 Received: 26 July 2023 / Accepted: 27 September 2023 © The Author(s) 2023 Abstract Square fiducial markers are widely used in robotics to easily obtain pose and other information about the world from camera images. Processing the images to extract the markers is usually performed centrally with standard libraries but the code is typically aimed at PC-level hardware. Platforms with constrained processing power have difficulty handling multiple camera streams at real-time refresh rates. We introduce the Frappe (Fiducial Recognition Accelerated with Parallel Processing Elements) algorithm for detecting and decoding the popular ArUco tags. Designed to be implemented on the low cost hardware of the Raspberry Pi Zero, we show tag detection and decoding on images of 640 × 480 resolution exceeding 60 Hz, five times faster than the standard ArUco library, while maintaining similar detection performance and using much less energy. Using Frappe, we demonstrate improved real-world performance on a visual navigation task with our DOTS robot. Keywords Image processing · Fiducial tags · Robot vision · Embedded processing · GPU acceleration 1 Introduction Scaling up robot numbers in real-world environments requires both lowering the cost of robots, and improving their ability to perceive and interact with the world. One approach uses cheap vision hardware and augments the environment with markers. Square fiducial markers consisting of a grid with a binary pattern are widely used in robotics vision systems as a way of providing pose and navigation information from a camera image feed without the complexity and processing cost of full image comprehension techniques such as Visual SLAM. The popular ArUco library is widely used, but the processing cost is still significant in resource constrained robot systems, limiting the resolution and update rate that is possible, hindering the performance of real-time robot navigation. The Raspberry Pi series of educational Single Board Computers (SBCs) has enabled many projects needing a small, cheap computer running Linux. Well supported, they have a camera interface supporting several models of * Simon Jones Sabine Hauert 1 Department of Engineering Mathematics, University of Bristol, Bristol, UK camera. A Raspberry Pi Zero and OV5241 camera module can be purchased for around £16, providing 1080p60 streaming video. What is not widely utilised is the surprisingly capable Graphics Processing Unit (GPU) that all Pi models have, with around 24 GFLOPs processing power. We design an image processing algorithm, called Frappe, Fiducial Recognition Accelerated with Parallel Processing Elements, to use the Raspberry Pi (RPi) Zero GPU for as much processing as possible. As proof-of-concept, we implement Frappe on our swarm of DOTS [1] robots designed for intralogistics applications. By re-engineering the visual navigation system of the DOTS, enabling higher detection frame-rates and resolutions than were previously possible, we enhance performance at a visual navigation task. We make available an implementation of the algorithm and a complete Docker-based development environment1. This brings together the required specialised toolchains and provides a virtual environment for compiling GPU applications targeting the Raspberry Pi Zero. We provide this framework for others to make use of this underutilised processing power for visual processing and other edge processing applications. This paper is organised as follows; Section 2 covers background and related material, Sect. 3 details the algorithm and its implementation, Sect. 4 compares the performance 1 https://bitbucket.org/simonj23/frappe/src/master/. 13 Vol.:(0123456789) 119 Page 2 of 13 Fig. 1 Example of an ArUco fiducial marker from the ARUCO_MIP_36h12 dictionary, showing the full 8x8 region, with the outer cells always black, and the inner 6x6 36 bit data payload Journal of Real-Time Image Processing (2023) 20:119 8x8 fiducial region Fig. 2 Raspberry Pi Zero with attached camera, costing around £16 and capable of streaming up to 1080p60 video 6x6 data payload of Frappe and ArUco on Raspberry Pi Zero hardware, before using Frappe in a larger robot system for enhanced performance, and Sect. 5 concludes the paper. 2 Background Fiducial markers or tags are visually distinct objects placed in the environment to convey information or position or both. In robotics, what is often desired is to extract pose and position from a camera feed, in this case the fiducial must convey both accurate position and unique identity. The most common form is a monochrome square region with an internal bit pattern, an early system was ARToolKit [2], widely used examples include AprilTag [3, 4], ARTag [5], and ArUco, with [6] showing generation of dictionaries with near-optimal intermarker distance, and [7] accelerating detection. Circular forms are also common, such as InterSense [8], STag [9], and CCTag [10]. CCTag is also designed to be resistant to occlusion and motion blur. For widely used square tags such as ArUco, AprilTag, and ARTag, there has been work on blur resistant decoders with conventional [11] and machine learning approaches [12, 13]. See [14] for a recent review and examination of the comparative detection performance and resilience of some different tag systems. Although not directly comparable with our results, they show detection rates of 95% for ArUco in their test data. They don’t directly report processing time, but do say that 640x480 detection at 20 Hz on a Raspberry Pi 3 was possible for ARTag and ArUco, but AprilTag was too computationally intensive. Regarding the speed of various detectors, [4] report AprilTag2 at 78 ms for a 640x480 image on an Intel Xeon E5-2640, [7] report ArUco at 0.9 ms for 640x480 on an Intel Core i7-4700HQ. This work specifically addresses accelerating ArUco tag detection on low cost hardware, due to our existing systems and software using this tag. Figure 1 shows an example ArUco fiducial from the standard dictionary ARUCO_MIP_36h12, generated as described in [6]. It 13 Fig. 3 DOTS robot, fast moving and low cost with 360◦ vision, enabling research into swarm intralogistics shows the 8x8 region of a marker, consisting of an outer perimeter of always black cells, with an inner 6x6 region containing the data payload. Each of the 250 unique symbols in the dictionary have a minimum Hamming distance of 12 from all other symbols, meaning that up to 6 erroneous bits out of the 36 can be corrected (Fig. 2). Our DOTS swarm robots [1], shown in Fig. 3, are designed to enable research into swarm intralogistics. They are low cost, capable of fast agile movement, able to carry loads, and have a ROS2-based control system running on RockPi 4 SBC. 250 mm in diameter, they are (...truncated)