Occlusion handling in spatio-temporal object-based image sequence matching

Jun 2024

Dynamic photogrammetry is an established method for acquiring 3D information of deforming objects or dynamic scenes in various close-range applications. Occlusions caused by object deformations, obstacles or camera movements have a crucial impact on the results. Temporal occlusions are highly application-specific and sometimes difficult to predict, resulting in a significant reduction of reconstruction quality or in aborted image sequence processing. Previous approaches usually model such occlusions as semantic information and account for them using image masks. However, generating these image masks requires complex methods and extensive training data. Because the complexity and movements of dynamic scenes are unpredictable, generating training data is challenging in many applications. Therefore, this paper proposes an alternative modelling approach, which can be part of a spatio-temporal matching process. By exploiting the characteristically high redundancy of this process, occlusions can be detected using robust estimation methods and considered in the optimisation. Consequently, neither prior information about the occlusions nor further processing steps are necessary. We evaluate our approach with synthetic data and real data from an industrial application, assessing accuracy and the ability to detect occlusions simultaneously. The evaluation shows that the impact of occlusions can be eliminated and that the quality of the results is comparable to conventional methods.

ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume X-2-2024
ISPRS TC II Mid-term Symposium “The Role of Photogrammetry for a Sustainable World”, 11–14 June 2024, Las Vegas, Nevada, USA

Simon Nietiedt 1, Petra Helmholz 2, Thomas Luhmann 1

1 Institute for Applied Photogrammetry and Geoinformatics, Jade University of Applied Sciences, Oldenburg, Germany
2 Spatial Sciences, School for Earth and Planetary Sciences, Curtin University, Australia

KEY WORDS: Close-range photogrammetry, dynamic, occlusion, image matching, robust optimisation.

1. INTRODUCTION

Dynamic photogrammetry is an established method for the precise reconstruction of dynamic scenes in many applications. It is increasingly used, especially in high-speed processes where area-based information is required (Luhmann et al., 2023). In general, the photogrammetric processing of image sequences to create 3D trajectories in close-range applications consists of several steps. Once the data has been acquired, image pre-processing and system calibration are required. Then, spatial and temporal matching procedures are carried out. The matching steps are usually the most computationally expensive and can be performed simultaneously and independently of each other. However, both spatial and temporal matching are based on the extracted image features or image intensities, resulting in high spatio-temporal redundancy. This redundancy can be used as a motion model to support the spatial and temporal matching process and reduce ambiguities.

In general, the dynamic characteristics lead to additional requirements for the measurement technology and algorithms that go beyond those for reconstructing static scenes. For example, the synchronisation of the cameras, the handling of possible motion blur, and the storage of large amounts of data must be considered. In specific applications where repeatability cannot be guaranteed, temporary occlusions can pose a significant challenge. For example, in materials testing, the specimen is destroyed during the test (Hampel and Maas, 2009; Guccione et al., 2020). The presence of obstacles can lead to a loss of image data and affect the quality of the results. A similar situation applies to car safety tests, where the dummy's movement and the footwell's deformation are of interest (Raguse et al., 2004).
However, temporary occlusions caused by movements of the object or the camera itself, as well as occlusions caused by damaged components, can reduce the quality of the image sequences or even lead to aborted processing. Consequently, this can result in higher costs or reduce the usage of photogrammetric methods. Reliable occlusion detection is therefore an essential task, and several approaches have been developed. In general, solution strategies consist of purely image-based approaches or a combination of image-based and object-based methods. A purely image-based approach is to detect occlusions through semantic segmentation, which can be generated by various machine learning approaches (Wu and Nevatia, 2009; Saleh et al., 2021). Afterwards, corrupted pixels can be excluded from further processing steps. However, application-specific training data is required to generate accurate results. Alternative approaches use the optical flow, which provides information about movements. If the optical flow is computed forwards and backwards (in both sequence directions), occlusions can be classified using an energy function (Alvarez et al., 2007). Rúa et al. (2016) use parametric motion models to check the plausibility of the flow and detect occlusions based on it. The occlusion detection can be improved if a CAD model of the obstacle (Bethmann et al., 2009) or the deforming object (Malti et al., 2011) is available. For this purpose, a Kalman filter is used to predict the CAD model. The prediction is then transformed into the image domain, where the corresponding positions are excluded from the processing. Despite the wide range of applications, these approaches have one thing in common: the state of a pixel (occluded or not) is represented by an image mask. The image mask is used in the respective matching process to determine whether the corresponding pixel is considered, which can increase the complexity and computation time.
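The idea of evaluating the optical flow in both sequence directions can be illustrated with a simple forward-backward consistency check. This is a minimal sketch and not the energy-based classification of Alvarez et al. (2007): it only flags pixels whose forward and backward displacements do not cancel. The function name, the nested-list flow layout, the nearest-neighbour lookup and the tolerance `tol` are assumptions made for illustration.

```python
def forward_backward_occlusion(flow_fw, flow_bw, tol=1.0):
    """Return a mask (True = likely occluded) for the pixels of frame t.

    flow_fw[y][x] = (dx, dy): displacement from frame t to frame t+1.
    flow_bw[y][x] = (dx, dy): displacement from frame t+1 back to frame t.
    """
    height, width = len(flow_fw), len(flow_fw[0])
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow_fw[y][x]
            # Nearest-neighbour position of this pixel in frame t+1.
            xt, yt = int(round(x + dx)), int(round(y + dy))
            if not (0 <= xt < width and 0 <= yt < height):
                mask[y][x] = True  # leaves the image: treated as not visible
                continue
            bx, by = flow_bw[yt][xt]
            # For a visible point, forward and backward flow should cancel.
            err = ((dx + bx) ** 2 + (dy + by) ** 2) ** 0.5
            mask[y][x] = err > tol
    return mask


# One image row that moves one pixel to the right. The backward flow at
# x = 2 is inconsistent, as it would be if an obstacle covered that position.
flow_fw = [[(1.0, 0.0)] * 4]
flow_bw = [[(-1.0, 0.0), (-1.0, 0.0), (0.0, 0.0), (-1.0, 0.0)]]
mask = forward_backward_occlusion(flow_fw, flow_bw, tol=0.5)
print(mask)  # [[False, True, False, True]]
```

The resulting mask plays the role of the image mask described above: a pixel marked True would be excluded from the subsequent matching step.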
If the reconstruction of the object surface is combined with the tracking, a spatio-temporal matching method is formed, enabling better utilisation of the spatio-temporal redundancy. In the spatio-temporal approach published by Ngo et al. (2015), all observations are used in the numerical optimisation. A relevancy score, calculated by normalised cross-correlation, weights the observations to control the impact of corrupted observations. However, the method assumes that the static object surface is known. Instead, Lin et al. (2022) use an RGBD sensor to derive spatio-temporal information from the sensor data. Here, the object motion is modelled using a graph whose spatio-temporal consistency is optimised using a Long Short-Term Memory (LSTM) model. The network is trained in a supervised manner. Therefore, (...truncated)

This contribution has been peer-reviewed. The double-blind peer-review was conducted on the basis of the full paper. https://doi.org/10.5194/isprs-annals-X-2-2024-163-2024 | © Author(s) 2024. CC BY 4.0 License.
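The weighting of observations by a correlation-based relevancy score can be sketched as follows. The exact score used by Ngo et al. (2015) is not given in the text above, so the mapping used here (normalised cross-correlation clipped to the range 0 to 1) is an assumption, as are the function names and the toy grey values. The sketch only shows how a poorly correlated, possibly occluded observation loses its influence on a weighted estimate.

```python
import math


def ncc(a, b):
    """Normalised cross-correlation of two equally sized grey-value lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0  # flat patch: correlation undefined, treated as uninformative
    return sum(p * q for p, q in zip(da, db)) / denom


def relevancy_weight(template, patch):
    """Assumed mapping: clip the NCC to [0, 1] and use it as a weight."""
    return max(0.0, ncc(template, patch))


def weighted_mean(values, weights):
    """Weighted average; observations with weight 0 have no influence."""
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all observations were down-weighted to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


template = [10, 20, 30, 40]
patch_visible = [12, 22, 32, 42]    # same texture, slightly brighter
patch_occluded = [40, 30, 20, 10]   # unrelated texture, e.g. an obstacle

weights = [relevancy_weight(template, patch_visible),
           relevancy_weight(template, patch_occluded)]
estimate = weighted_mean([5.0, 9.0], weights)  # dominated by the visible patch
```

In this toy case the visible patch receives a weight close to 1 and the occluded patch a weight of 0, so the second observation does not contribute to the estimate. No image mask is needed, since the weights are derived directly from the observations.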


This is a preview of a remote PDF: https://isprs-annals.copernicus.org/articles/X-2-2024/163/2024/isprs-annals-X-2-2024-163-2024.pdf
Article home page: https://doaj.org/article/d5f6c0eb27044a3385de5c4f780dc627

S. Nietiedt, P. Helmholz, T. Luhmann. Occlusion handling in spatio-temporal object-based image sequence matching, 2024, pp. 163-170, Issue X-2-2024, DOI: 10.5194/isprs-annals-X-2-2024-163-2024