Occlusion handling in spatio-temporal object-based image sequence matching
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume X-2-2024
ISPRS TC II Mid-term Symposium “The Role of Photogrammetry for a Sustainable World”, 11–14 June 2024, Las Vegas, Nevada, USA
Simon Nietiedt 1, Petra Helmholz 2, Thomas Luhmann 1
1 Institute for Applied Photogrammetry and Geoinformatics, Jade University of Applied Sciences, Oldenburg, Germany
2 Spatial Sciences, School for Earth and Planetary Sciences, Curtin University, Australia
KEY WORDS: Close-range photogrammetry, dynamic, occlusion, image matching, robust optimisation.
ABSTRACT:
Dynamic photogrammetry is an established method for acquiring 3D information of deforming objects or dynamic scenes in various
close-range applications. Occlusions caused by object deformations, obstacles or camera movements have a crucial impact. Temporal
occlusions are highly application-specific and sometimes difficult to predict, resulting in a significant reduction of reconstruction
quality or even in aborted image sequence processing. Previous approaches usually model such occlusions as semantic information and
consider them using image masks. However, generating these image masks requires complex methods and extensive training data. Due
to the unpredictability of the complexity and movements of dynamic scenes, generating training data is challenging in many
applications. Therefore, this paper proposes an alternative modelling approach that can be integrated into a spatio-temporal matching
process. Owing to the characteristically high redundancy, occlusions can be detected using robust estimation methods and taken into account in
the optimisation. Consequently, neither prior information about the occlusions nor additional processing steps are required. We evaluate our approach
with synthetic data and real data from an industrial application regarding both accuracy and the ability to detect occlusions simultaneously. The
evaluation of the proposed approach shows that the impact of occlusions can be eliminated and that the quality of the results is comparable
to that of conventional methods.
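The central idea of the abstract, detecting occlusions as gross errors in a highly redundant adjustment, can be illustrated with a generic robust estimator. The following Python sketch is not the authors' implementation: it assumes a linear observation model y = A x + e, Huber weights with the common tuning constant k = 1.345, a scale estimate from the median absolute deviation (MAD) and a 3σ threshold for flagging. The function names `huber_weights` and `robust_adjustment` are hypothetical. Observations 5 to 7 are shifted to mimic measurements corrupted by an obstacle.

```python
import numpy as np

def huber_weights(z, k=1.345):
    """Huber weights for standardised residuals z: 1 if |z| <= k, else k/|z|."""
    abs_z = np.abs(z)
    w = np.ones_like(abs_z)
    large = abs_z > k
    w[large] = k / abs_z[large]
    return w

def robust_adjustment(A, y, k=1.345, max_iter=100, tol=1e-10):
    """Estimate x in y = A x + e by iteratively reweighted least squares (IRLS).

    Returns the estimate, the residuals and the robust scale estimate.
    """
    w = np.ones(len(y))
    x = np.zeros(A.shape[1])
    for _ in range(max_iter):
        sw = np.sqrt(w)
        # Weighted least squares: scale the rows of A and y by sqrt(weight)
        x_new, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        r = y - A @ x_new
        # Robust scale from the median absolute deviation (MAD)
        sigma = max(1.4826 * np.median(np.abs(r - np.median(r))), 1e-12)
        w = huber_weights(r / sigma, k)
        converged = np.max(np.abs(x_new - x)) < tol
        x = x_new
        if converged:
            break
    return x, r, sigma

# Synthetic example: a linear trend observed at 20 epochs
t = np.arange(20.0)
A = np.column_stack([np.ones_like(t), t])
y = 2.0 + 0.5 * t + 0.01 * np.sin(t)   # small deterministic noise
y[[5, 6, 7]] += 5.0                    # gross errors mimicking an occlusion

x_hat, residuals, sigma = robust_adjustment(A, y)
# Observations with large standardised residuals are treated as occluded
occluded = np.flatnonzero(np.abs(residuals) / sigma > 3.0)
```

Because the weights of the corrupted observations shrink with each iteration, they barely influence the estimate, and their large standardised residuals identify them without any image mask. Other weight functions, for example redescending ones, could be substituted in the same loop.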
1. INTRODUCTION
Dynamic photogrammetry is an established method for the
precise reconstruction of dynamic scenes in many applications.
Especially in high-speed processes where area-based information
is required, photogrammetry is increasingly used (Luhmann et
al., 2023). In general, the photogrammetric processing of image
sequences to create 3D trajectories in close-range applications
consists of several steps. Once the data has been acquired, image
pre-processing and system calibration are required. Then, spatial
and temporal matching procedures are carried out. The matching
steps are usually the most computationally expensive and can be
performed simultaneously and independently of each other.
However, spatial and temporal matching are both based on the
extracted image features or image intensities, resulting in high
spatio-temporal redundancy. This redundancy can be exploited as a
motion model to support the spatial and temporal matching
process and reduce ambiguities. In general, the dynamic
characteristics lead to additional requirements for the
measurement technology and algorithms that surpass the
requirements for reconstructing static scenes. For example, the
synchronisation of the cameras, handling possible motion blur,
and storing large amounts of data must be considered. In specific
applications where repeatability cannot be guaranteed, temporary
occlusions can pose a significant challenge. For example, in
materials testing, the specimen is destroyed during the test (Hampel
and Maas, 2009; Guccione et al., 2020). The presence of
obstacles can lead to a loss of image data and affect the quality
of the results. A similar situation applies to car safety tests, where
the movement of the dummy and the deformation of the footwell are
of interest (Raguse et al., 2004). In such scenarios, temporary occlusions
caused by movements of the object or the camera itself, as well as
occlusions caused by damaged components, can reduce the
quality of the image sequences or even lead to aborted
processing. Consequently, this can result in higher costs or
reduce the use of photogrammetric methods.
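Using spatio-temporal redundancy as a motion model to reduce matching ambiguities can be illustrated with a simple constant-velocity prediction that restricts the search for temporal correspondences. This is a minimal sketch under stated assumptions: 2D image coordinates, a constant-velocity model and a fixed gating radius in pixels. The helpers `predict_constant_velocity` and `select_candidate` are hypothetical and not part of any cited method.

```python
import numpy as np

def predict_constant_velocity(p_prev, p_curr):
    """Predict the position at epoch t+1 from epochs t-1 and t (constant velocity)."""
    return 2.0 * np.asarray(p_curr, dtype=float) - np.asarray(p_prev, dtype=float)

def select_candidate(prediction, candidates, gate):
    """Pick the candidate closest to the prediction.

    Returns its index, or None if no candidate lies within the gate radius,
    e.g. because the point is not visible in the new epoch.
    """
    candidates = np.asarray(candidates, dtype=float)
    if candidates.size == 0:
        return None
    distances = np.linalg.norm(candidates - prediction, axis=1)
    best = int(np.argmin(distances))
    return best if distances[best] <= gate else None

# A feature tracked over two epochs (image coordinates in pixels)
prediction = predict_constant_velocity([100.0, 50.0], [104.0, 52.0])  # [108, 54]
# Three visually similar candidates found by the matcher in epoch t+1
candidates = [[130.0, 40.0], [108.5, 53.6], [95.0, 70.0]]
match = select_candidate(prediction, candidates, gate=2.0)
```

The prediction resolves the ambiguity between visually similar candidates; if no candidate falls inside the gate, the point is left unmatched rather than being assigned to a wrong feature.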
Reliable occlusion detection is therefore an essential task, and several
approaches have been developed. In general, solution strategies
can consist of purely image-based approaches or a combination
of image and object-based methods. A purely image-based
approach is to detect occlusions through semantic segmentation,
which can be generated by various Machine Learning approaches
(Wu and Nevatia, 2009; Saleh et al., 2021). Afterwards,
corrupted pixels can be excluded from further processing steps.
However, application-specific training data is required to
generate accurate results. Alternative approaches use optical
flow, which provides information about motion. If the optical
flow is computed in both sequence directions (forward and backward),
occlusions can be classified using an energy function (Alvarez et
al., 2007). Rúa et al. (2016) use parametric motion models to
check the plausibility of the flow and detect occlusions based on
it. The occlusion detection can be improved if a CAD model of
the obstacle (Bethmann et al., 2009) or the deforming object
(Malti et al., 2011) is available. In these approaches, a Kalman filter
is used to predict the state of the CAD model. The prediction is then
projected into the image domain, where the corresponding
positions are excluded from processing.
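The bidirectional optical flow idea described above can be illustrated with a forward-backward consistency check: a pixel is flagged when its forward displacement is not undone by the backward flow at its target position. Alvarez et al. (2007) use an energy function instead; this simpler test, the nearest-neighbour sampling, the threshold value and the helper name `fb_occlusion_mask` are assumptions of this sketch. The flow fields are synthetic: a uniform shift of one pixel to the right, with a hypothetical static obstacle in column 3 of the second frame.

```python
import numpy as np

def fb_occlusion_mask(flow_fw, flow_bw, threshold=1.0):
    """Flag pixels that fail a forward-backward optical flow consistency check.

    flow_fw, flow_bw: arrays of shape (H, W, 2) with (dx, dy) per pixel.
    The backward flow is sampled with nearest-neighbour lookup at the
    forward target; targets outside the image are flagged as well.
    """
    h, w = flow_fw.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.rint(xs + flow_fw[..., 0]).astype(int)
    yt = np.rint(ys + flow_fw[..., 1]).astype(int)
    inside = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
    # Clip only to allow indexing; outside pixels are flagged via `inside`
    back = flow_bw[np.clip(yt, 0, h - 1), np.clip(xt, 0, w - 1)]
    round_trip_error = np.linalg.norm(flow_fw + back, axis=2)
    return ~inside | (round_trip_error > threshold)

# Synthetic 4 x 6 flow fields: everything moves one pixel to the right ...
flow_fw = np.zeros((4, 6, 2))
flow_fw[..., 0] = 1.0
flow_bw = np.zeros((4, 6, 2))
flow_bw[..., 0] = -1.0
# ... except column 3 of the second frame, covered by a static obstacle
flow_bw[:, 3, 0] = 0.0

mask = fb_occlusion_mask(flow_fw, flow_bw, threshold=0.5)
```

Column 2 is flagged because its target is covered by the obstacle, and column 5 because its target leaves the image; all other pixels pass the round trip.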
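The CAD-model-based masking described above can be sketched as a Kalman prediction step followed by a projection into the image. This is a strongly simplified illustration, not the method of Bethmann et al. (2009) or Malti et al. (2011). It assumes a translation-only constant-velocity state, a calibrated pinhole camera with known orientation, an obstacle model reduced to the eight corners of a cube, and a bounding-box mask instead of a rendered silhouette. All function names and numerical values are hypothetical.

```python
import numpy as np

def kalman_predict(x, P, dt, q):
    """Prediction step of a constant-velocity Kalman filter.

    State x = [X, Y, Z, VX, VY, VZ]; q scales a simple diagonal process noise.
    """
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)
    Q = q * np.eye(6)
    return F @ x, F @ P @ F.T + Q

def project(points, K, R, t):
    """Project Nx3 object points into pixel coordinates with a pinhole camera."""
    cam = points @ R.T + t          # object -> camera coordinates
    uvw = cam @ K.T                 # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]

def bbox_mask(shape, uv):
    """Mark the bounding box of the projected model points as occluded."""
    h, w = shape
    mask = np.zeros(shape, dtype=bool)
    u0, v0 = np.floor(uv.min(axis=0)).astype(int)
    u1, v1 = np.ceil(uv.max(axis=0)).astype(int)
    mask[max(v0, 0):min(v1 + 1, h), max(u0, 0):min(u1 + 1, w)] = True
    return mask

# Obstacle state at epoch t: 5 m in front of the camera, moving 0.1 m per epoch in X
x = np.array([0.05, 0.0, 5.0, 0.1, 0.0, 0.0])
P = np.eye(6)
x_pred, P_pred = kalman_predict(x, P, dt=1.0, q=0.01)

# Hypothetical obstacle model: the 8 corners of a cube (half size 0.1 m)
s = 0.1
offsets = np.array([[a, b, c] for a in (-s, s) for b in (-s, s) for c in (-s, s)])
corners = x_pred[:3] + offsets

# Hypothetical camera: focal length 100 px, principal point (50, 50), no rotation
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
uv = project(corners, K, np.eye(3), np.zeros(3))
mask = bbox_mask((100, 100), uv)
```

Pixels inside `mask` would then be excluded from matching in the predicted epoch. A fuller implementation could also use the predicted covariance `P_pred` to enlarge the mask according to the prediction uncertainty.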
Despite the wide range of applications, these approaches have
one thing in common: the state of a pixel (occluded or not) is
represented by an image mask. The image mask can be used in
the respective matching process to determine whether the
corresponding pixel is considered, which can increase the
complexity and computation time. If the reconstruction of the
object surface is combined with the tracking, a spatio-temporal
matching method is formed, enabling better utilisation of the
spatio-temporal redundancy. In the spatio-temporal approach
published by Ngo et al. (2015), all observations are used in the
numerical optimisation. A relevancy score computed by
normalised cross-correlation weights the observations and thereby
limits the impact of corrupted ones. However, the method
assumes that the static object surface is known. In contrast, Lin et al.
(2022) derive spatio-temporal information from the data of an
RGB-D sensor. Here, the object motion is
modelled using a graph whose spatio-temporal consistency is
optimised using a Long Short-Term Memory (LSTM) model.
The network is trained in a supervised manner. Therefore,
This contribution has been peer-reviewed. The double-blind peer-review was conducted on the basis of the full paper.
https://doi.org/10.5194/isprs-annals-X-2-2024-163-2024 | © Author(s) 2024. CC BY 4.0 License.