Journal of Imaging, Vol. 11, No. 11, October 2025, Pages 377-408

Title: Learning Domain-Invariant Representations for Event-Based Motion Segmentation: An Unsupervised Domain Adaptation Approach

Authors: Mohammed Jeryo, Ahad Harati

Citation: BibTeX | EndNote

Abstract

Event cameras provide microsecond temporal resolution, high dynamic range, and low latency by asynchronously capturing per-pixel luminance changes, thereby introducing a novel sensing paradigm. These advantages make them well-suited for high-speed applications such as autonomous vehicles operating in dynamic environments. Nevertheless, the sparsity of event data and the absence of dense annotations are significant obstacles to supervised learning for motion segmentation from event streams. Domain adaptation is also challenging due to the considerable domain shift from intensity images. To address these challenges, we propose a two-phase cross-modality adaptation framework that transfers motion segmentation knowledge from labeled RGB-flow data to unlabeled event streams. In the source domain, a dual-branch encoder extracts modality-specific motion and appearance features from RGB and optical flow. In the target domain, reconstruction networks convert event voxel grids into pseudo-image and pseudo-flow modalities, which are then re-encoded by frozen RGB-trained encoders. Multi-level consistency losses are applied to features, predictions, and outputs to enforce domain alignment. Our design enables the model to acquire domain-invariant, semantically rich features with shallow architectures, reducing training costs and enabling real-time inference through a lightweight prediction path. The proposed architecture, together with a hybrid loss function, effectively bridges the domain and modality gap. We evaluate our method on two challenging benchmarks: EVIMO2, which incorporates real-world dynamics, high-speed motion, illumination variation, and multiple independently moving objects; and MOD++, which features complex object dynamics, collisions, and dense 1 kHz supervision in synthetic scenes. The proposed UDA framework achieves 83.1% and 79.4% accuracy on EVIMO2 and MOD++, respectively, outperforming existing state-of-the-art approaches such as EV-Transfer and SHOT by up to 3.6%. It is also lighter and faster, and delivers higher mIoU and F1 scores.
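To make the adaptation step concrete, the following is a minimal sketch of the target-domain pass described above. All module names, tensor shapes, and loss weightings are illustrative assumptions rather than the authors' implementation: stand-in reconstruction networks map event voxel grids to pseudo-image and pseudo-flow tensors, frozen RGB-trained encoders re-encode them, and consistency terms align the branches at the feature, prediction, and output levels.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical shallow encoder; stands in for the frozen RGB-trained
# appearance/motion branches described in the abstract.
class ShallowEncoder(nn.Module):
    def __init__(self, in_ch: int, feat_ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

def consistency_losses(feat_a, feat_b, logits_a, logits_b):
    """Multi-level consistency: align features, predictions, and outputs."""
    # Feature level: L2 distance between encoder activations.
    l_feat = F.mse_loss(feat_a, feat_b)
    # Prediction level: symmetric KL between per-pixel class distributions.
    p_a = F.log_softmax(logits_a, dim=1)
    p_b = F.log_softmax(logits_b, dim=1)
    l_pred = 0.5 * (F.kl_div(p_a, p_b.exp(), reduction="batchmean")
                    + F.kl_div(p_b, p_a.exp(), reduction="batchmean"))
    # Output level: agreement of segmentation masks via a soft Dice term.
    m_a, m_b = p_a.exp(), p_b.exp()
    inter = (m_a * m_b).sum(dim=(2, 3))
    l_out = 1.0 - (2 * inter / (m_a.sum(dim=(2, 3))
                                + m_b.sum(dim=(2, 3)) + 1e-6)).mean()
    return l_feat + l_pred + l_out  # illustrative equal weighting

# Target-domain pass: voxel grid -> pseudo-modalities -> frozen encoders.
B, bins, H, W, n_cls = 2, 5, 64, 64, 2   # assumed voxel-grid shape
voxel = torch.randn(B, bins, H, W)

recon_img = nn.Conv2d(bins, 3, 3, padding=1)   # stand-in reconstruction nets
recon_flow = nn.Conv2d(bins, 2, 3, padding=1)

enc_img = ShallowEncoder(3)   # frozen: trained on RGB in the source phase
enc_flow = ShallowEncoder(2)  # frozen: trained on optical flow
for p in list(enc_img.parameters()) + list(enc_flow.parameters()):
    p.requires_grad = False

head = nn.Conv2d(64, n_cls, 1)  # lightweight prediction path

pseudo_img, pseudo_flow = recon_img(voxel), recon_flow(voxel)
f_img, f_flow = enc_img(pseudo_img), enc_flow(pseudo_flow)
loss = consistency_losses(f_img, f_flow, head(f_img), head(f_flow))
loss.backward()  # gradients reach only the reconstruction nets and the head
print(float(loss))

Because the encoder parameters are frozen, gradients propagate through them only to update the reconstruction networks and the prediction head, which matches the abstract's claim of low training cost and a lightweight inference path.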

Keywords

motion segmentation; event camera; unsupervised domain adaptation; cross-modality learning; real-time inference

@article{paperid:1104966,
author = {Jeryo, Mohammed and Harati, Ahad},
title = {Learning Domain-Invariant Representations for Event-Based Motion Segmentation: An Unsupervised Domain Adaptation Approach},
journal = {Journal of Imaging},
year = {2025},
volume = {11},
number = {11},
month = {October},
issn = {2313-433X},
pages = {377--408},
numpages = {31},
keywords = {motion segmentation; event camera; unsupervised domain adaptation; cross-modality learning; real-time inference},
}


%0 Journal Article
%T Learning Domain-Invariant Representations for Event-Based Motion Segmentation: An Unsupervised Domain Adaptation Approach
%A Jeryo, Mohammed
%A Harati, Ahad
%J Journal of Imaging
%@ 2313-433X
%V 11
%N 11
%P 377-408
%D 2025
