
SHERPA: Explainable Robust Algorithms for Privacy-Preserved Federated Learning in Future Networks to Defend Against Data Poisoning Attacks

This paper presents SHERPA, an explainability-driven defense framework designed to protect Federated Learning (FL) systems from data poisoning attacks. FL allows distributed devices to collaboratively train a global model without sharing raw data, but this also opens the door for malicious clients to inject poisoned updates that manipulate or degrade the global model. SHERPA addresses this challenge by shifting the focus from model weights to model behavior. Instead of comparing parameter differences, which can fail when attackers subtly alter updates, the framework uses SHAP feature attributions to examine how each client's model prioritizes input features across classes. Clients whose attribution patterns deviate significantly from benign behavior can be flagged as suspicious.
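To make the attribution-profile idea concrete, here is a minimal sketch in numpy. It is not the paper's implementation: the client models, weights, and probe data are hypothetical, and instead of calling the `shap` library it uses the exact closed form for linear models (for f(x) = w·x + b, the SHAP value of feature i on input x is w_i·(x_i − E[x_i])). Each client's "profile" is the mean absolute SHAP value per feature, so a poisoned model that prioritizes different features produces a visibly different profile.

```python
import numpy as np

def shap_linear(w, X, background_mean):
    # Exact SHAP values for a linear model: phi_i = w_i * (x_i - E[x_i]).
    return w * (X - background_mean)

def attribution_profile(w, X_probe):
    # Per-client profile: mean |SHAP| per feature over a shared probe set.
    phi = shap_linear(w, X_probe, X_probe.mean(axis=0))
    return np.abs(phi).mean(axis=0)

rng = np.random.default_rng(0)
X_probe = rng.normal(size=(200, 4))  # hypothetical shared probe inputs

# Hypothetical clients: two benign models agree on which features matter;
# a poisoned model inverts the importance pattern.
benign_w = np.array([1.0, 0.9, 0.1, 0.0])
clients = {
    "client_0": benign_w + rng.normal(scale=0.05, size=4),
    "client_1": benign_w + rng.normal(scale=0.05, size=4),
    "client_2": np.array([0.0, 0.1, 0.9, 1.0]),  # poisoned: flipped priorities
}
profiles = {name: attribution_profile(w, X_probe) for name, w in clients.items()}
```

In this toy setup the two benign profiles land close together while the poisoned client's profile sits far away, which is the separation the clustering stage then exploits.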

To group and analyze these attribution profiles, SHERPA applies the HDBSCAN clustering algorithm, which naturally separates benign clients from poisoned ones by identifying inconsistent or anomalous explanation patterns. This approach enables SHERPA not only to detect malicious updates but also to provide interpretable evidence for why a client is considered harmful, an important requirement for trustworthy AI. By combining explainability with robust clustering, the SHERPA framework offers a privacy-preserving and model-agnostic method for strengthening FL systems against a wide range of poisoning strategies. The work highlights the growing importance of transparency and interpretability in securing distributed learning environments, especially in future 6G networks where AI will be deeply embedded across the architecture.
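The clustering step can be sketched as follows. HDBSCAN itself lives in the `hdbscan` package (and, since scikit-learn 1.3, in `sklearn.cluster.HDBSCAN`); to keep this sketch dependency-light, a simple median-distance heuristic stands in for HDBSCAN's density-based noise labeling: clients whose attribution profile lies far from the coordinate-wise median profile are flagged. All client names, profile values, and the `factor` threshold are hypothetical.

```python
import numpy as np

def flag_suspicious(profiles, factor=3.0):
    # Lightweight stand-in for HDBSCAN's noise label: flag clients whose
    # attribution profile is unusually far from the coordinate-wise median.
    names = list(profiles)
    P = np.stack([profiles[n] for n in names])
    median_profile = np.median(P, axis=0)
    dists = np.linalg.norm(P - median_profile, axis=1)
    cutoff = factor * np.median(dists)
    return [n for n, d in zip(names, dists) if d > cutoff]

# Hypothetical per-client profiles (mean |SHAP| per feature): four benign
# clients prioritize the first two features, one poisoned client inverts that.
profiles = {
    "client_0": np.array([0.80, 0.70, 0.10, 0.02]),
    "client_1": np.array([0.78, 0.72, 0.09, 0.01]),
    "client_2": np.array([0.81, 0.69, 0.11, 0.03]),
    "client_3": np.array([0.79, 0.71, 0.10, 0.00]),
    "client_4": np.array([0.02, 0.10, 0.70, 0.80]),  # poisoned
}
print(flag_suspicious(profiles))  # → ['client_4']
```

Because the flag is computed from explanation profiles rather than raw weights, the output doubles as interpretable evidence: the flagged client can be shown to prioritize different input features than the benign majority.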
