Challenge 6: Device Identification Under Temporal Drift in RF Fingerprinting
Abstract
While many Radio Frequency Fingerprint Identification (RFFI) systems achieve high identification accuracy when training and test data are collected at the same time, their performance degrades significantly when evaluated on data collected weeks later. This degradation is primarily caused by temporal variations in the transmitter hardware characteristics, which arise from thermal effects during the hardware warm-up phase and from power cycling the devices between data collections [1, 2].
This challenge addresses the long-term stability of Radio Frequency (RF) fingerprinting in real Internet of Things (IoT) deployments. Participants must train their model exclusively on the short controlled baseline phase (Phase 1) and then evaluate its performance on the long-term phase (Phase 2) spanning several weeks. The goal is to explore effective strategies for temporally robust physical-layer authentication under realistic drift conditions. The challenge uses the RFFI-Temporal dataset [3].
Dataset
The RFFI-Temporal dataset is available via Zenodo (https://zenodo.org/records/18952487). It is designed to study the long-term stability of Radio Frequency Fingerprint Identification. It provides packet-aligned complex baseband In-phase and Quadrature (IQ) recordings from 30 TI CC13xx IoT devices captured over approximately nine weeks using three software-defined radio receivers. Only data from receiver R02 is used in this challenge. The dataset includes two collection phases: a controlled baseline phase (Phase 1) with uniform transmission intervals over approximately 45 hours, and a long-term evaluation phase (Phase 2) with device-specific intervals ranging from 15 seconds to 24 hours over approximately nine weeks. To support transition-based analysis, each packet is stored with pre- and post-packet margins that preserve transmitter startup and shutdown transients. Rich per-packet metadata, including internal temperature, battery level, and Real-Time Clock (RTC) timestamps, is available for additional analysis. The experimental testbed used for the longitudinal data collection is shown in Figure 1.
Challenge Description
No additional software is required beyond standard deep learning libraries.
In this challenge, participants must train their RF fingerprinting model exclusively on Phase 1 data from receiver R02. The trained model is then evaluated on the entirety of Phase 2 data from receiver R02. No data from Phase 2 may be used at any stage of training or model selection.
Data Split: The following fixed chronological split must be applied to Phase 1 data from receiver R02, independently per transmitter:
- Training: First 70% of Phase 1 packets per transmitter, labeled data for training the RF fingerprinting model.
- Validation: Next 15% of Phase 1 packets per transmitter, labeled data for model selection.
- Test: Last 15% of Phase 1 packets per transmitter, held-out labeled data for evaluating in-phase performance before long-term evaluation.
- Final Evaluation: All Phase 2 packets from receiver R02, used to compute the primary ranking metric. No Phase 2 data may be used during training or model selection.
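The fixed 70/15/15 chronological split above can be sketched as follows. This is a minimal illustration, not an official loader: the record layout `(tx_id, timestamp, packet_ref)` and the function name `chronological_split` are hypothetical, and the use of floor rounding at the split boundaries is an assumption, since the challenge text does not specify how fractional packet counts are handled.

```python
from collections import defaultdict


def chronological_split(packets, train_pct=70, val_pct=15):
    """Split Phase 1 packets per transmitter, oldest packets first.

    `packets` is an iterable of (tx_id, timestamp, packet_ref) tuples.
    This record layout is a hypothetical stand-in for the real dataset index.
    """
    by_tx = defaultdict(list)
    for tx_id, timestamp, ref in packets:
        by_tx[tx_id].append((timestamp, ref))

    splits = {"train": [], "val": [], "test": []}
    for tx_id, items in by_tx.items():
        # Sort each transmitter's packets by time so no split sees the future.
        items.sort(key=lambda item: item[0])
        n = len(items)
        # Integer arithmetic avoids floating-point surprises at the boundaries
        # (floor rounding is an assumption, not stated in the challenge text).
        train_end = n * train_pct // 100
        val_end = n * (train_pct + val_pct) // 100
        for name, chunk in (
            ("train", items[:train_end]),
            ("val", items[train_end:val_end]),
            ("test", items[val_end:]),
        ):
            splits[name].extend((tx_id, ts, ref) for ts, ref in chunk)
    return splits
```

Because the split is applied independently per transmitter, every device contributes to all three subsets, and within each device every training packet precedes every validation packet, which in turn precedes every test packet.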
Feature Extraction Rules:
- Only the preamble, sync word, and transient portions (pre-packet margins) may be used for feature extraction and model training.
- To avoid data leakage, the payload portion of the packet must not be used.
- Per-packet metadata (temperature, battery level, RTC timestamp) may be used for additional analysis but must not be included as input features for the main classifier.
- Constraint: Chronological ordering must be strictly preserved across all splits. No data from Phase 2 may be used at any stage before final evaluation.
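One way to enforce the feature extraction rules above is to truncate every packet before the payload begins. The sketch below assumes each packet is stored as a sequence of complex samples laid out as pre-packet margin, preamble, sync word, payload, post-packet margin, and that the three leading lengths (in samples) are known. The function name `allowed_samples` and all length parameters are hypothetical; the actual values depend on the dataset's sample rate and packet format and are not given here.

```python
def allowed_samples(iq, pre_margin_len, preamble_len, sync_len):
    """Return only the samples permitted for feature extraction.

    Assumed packet layout (an assumption, not confirmed by the dataset docs):
    [pre-packet margin][preamble][sync word][payload][post-packet margin]
    All lengths are in samples and are hypothetical parameters.
    """
    payload_start = pre_margin_len + preamble_len + sync_len
    if payload_start > len(iq):
        raise ValueError("packet is shorter than the declared header region")
    # Everything from the payload onward is discarded, including the
    # post-packet margin, which the rules do not list as permitted.
    return iq[:payload_start]
```

For example, with a hypothetical 10-sample margin, 32-sample preamble, and 16-sample sync word, `allowed_samples(iq, 10, 32, 16)` keeps the first 58 samples of the packet.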
Input and Output
Input: Raw IQ packets from Phase 2 of receiver R02, with transmitter identity and sequence number information provided per packet.
Output: The macro-averaged F1-score across all 30 transmitters on Phase 2, as defined in the Evaluation Metric section. Participants must also submit a description of their methodology, including data preprocessing steps, model architecture, training procedure, and any drift compensation strategy used.
Evaluation Metric
The primary ranking metric is the macro-averaged F1-score across all 30 transmitters on Phase 2, defined as:
$$F1_{\text{macro}} = \frac{1}{C} \sum_{c=1}^{C} \frac{2 P_c R_c}{P_c + R_c}$$
where $C = 30$ is the number of transmitter classes, and $P_c$ and $R_c$ are the precision and recall for class $c$, respectively. A higher $F1_{\text{macro}}$ indicates better long-term identification performance under temporal drift. Participants must also report the Phase 1 test split $F1_{\text{macro}}$ as a secondary metric to allow comparison between in-phase and cross-phase performance.
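The macro-averaged F1-score described above can be computed from per-class counts of true positives, false positives, and false negatives, using the identity 2PR/(P+R) = 2TP/(2TP+FP+FN). The following is a minimal sketch: the function name `macro_f1` is hypothetical, class labels are assumed to be integers from 0 to C-1, and a class with no true or predicted samples is assigned a per-class score of 0. That last convention is an assumption, since the challenge text does not say how such classes are scored.

```python
def macro_f1(y_true, y_pred, num_classes=30):
    """Unweighted mean of per-class F1 scores.

    Labels are assumed to be integers in [0, num_classes).
    """
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the packet belongs to t
            fn[t] += 1  # class t missed one of its own packets

    total = 0.0
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Assumed convention: a class with no support and no predictions
        # contributes 0 rather than being skipped.
        total += (2 * tp[c] / denom) if denom else 0.0
    return total / num_classes
```

Because every class carries equal weight, a transmitter with few Phase 2 packets (for example, one with a 24-hour transmission interval) influences the score as much as a transmitter that sends every 15 seconds.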
Bibliography
[1] S. AlHazbi, S. Sciancalepore, and G. Oligeri, “The day-after-tomorrow: On the performance of radio fingerprinting over time,” in Proc. Annual Computer Security Applications Conference (ACSAC ’23). ACM, December 2023.
[2] A. Elmaghbub and B. Hamdaoui, “No blind spots: On the resiliency of device fingerprints to hardware warm-up through sequential transfer learning,” in Proc. 17th ACM Conference on Security and Privacy in Wireless and Mobile Networks (WiSec ’24). ACM, May 2024.
[3] C. Ayyıldız, F. E. Yıldız, V. C. Yıldırım, and D. Çakmak, “RFFI-Temporal: A long-term RF fingerprinting dataset for temporal drift analysis,” Zenodo, 2026, version 1.0.0. https://doi.org/10.5281/zenodo.18952487
To participate: submit your solution using the submission form.