Integrating Homomorphic Encryption and Synthetic Data in FL for Privacy and Learning Quality

Alternating Federated Learning (Alt-FL) is a novel privacy-preserving AI framework that interleaves training on real client data, protected by homomorphic encryption, with training on synthetic data transmitted in plaintext. The method achieves 13.4% higher model accuracy than Selective HE baselines while reducing homomorphic encryption costs by up to 48%. The synthetic data provides a regularization effect that improves generalization and helps defend against data leakage attacks such as Deep Leakage from Gradients (DLG).

New 'Alternating Federated Learning' Method Boosts AI Accuracy While Slashing Privacy Costs

A novel framework for federated learning (FL) has been introduced, promising to resolve a core tension in privacy-preserving AI: balancing high model accuracy against the prohibitive computational cost of strong encryption. The proposed method, named Alternating Federated Learning (Alt-FL), interleaves training on real client data with training on synthetic data, dramatically cutting the use of expensive homomorphic encryption (HE) while defending against data leakage attacks.

Federated learning is a foundational technique for training machine learning models across decentralized devices without centralizing raw, sensitive data. While it offers a baseline of privacy, advanced threats like the DLG (Deep Leakage from Gradients) attack can potentially reconstruct private data from shared model updates. Homomorphic encryption, which allows computations on encrypted data, is a gold-standard defense but is notoriously resource-intensive, often making it impractical for large-scale deployment.
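To see why plaintext model updates are dangerous, consider a deliberately simple case. This is not the full DLG attack (which iteratively optimizes dummy inputs to match observed gradients); it is a minimal illustration, under assumed toy values, that for a one-sample linear model with squared loss the shared gradient algebraically reveals the private input:

```python
import numpy as np

def gradients(w, b, x, y):
    """Gradients of the squared loss (w.x + b - y)^2 for one sample."""
    r = w @ x + b - y          # residual
    return 2 * r * x, 2 * r    # grad w.r.t. w, grad w.r.t. b

# Toy model parameters and a private training sample (illustrative values).
w = np.array([0.5, -0.3, 0.8])
b = 0.1
x_private = np.array([1.0, 2.0, 3.0])
y = 4.0

grad_w, grad_b = gradients(w, b, x_private, y)

# An eavesdropper who sees the plaintext gradients recovers the input:
# grad_w = 2r * x and grad_b = 2r, so grad_w / grad_b = x exactly.
x_recovered = grad_w / grad_b
```

Deeper networks require the optimization-based DLG procedure rather than this closed-form division, but the underlying leakage channel is the same, which is what motivates encrypting the updates from rounds trained on real data.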

How Alt-FL Works: A Strategic Interleaving of Real and Synthetic Data

The Alt-FL framework introduces a cyclical training regimen. It alternates between what the researchers term authentic rounds and synthetic rounds. During an authentic round, clients train locally on their genuine, private datasets. Crucially, the model updates from these rounds are protected using homomorphic encryption before being sent to the central server. In the subsequent synthetic round, clients instead train on locally generated synthetic data, and the resulting model updates are transmitted in plaintext, incurring no encryption overhead.
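The round schedule above can be sketched as follows. This is a minimal single-client simulation, assuming a strict 1:1 alternation and placeholder helpers (`he_encrypt`, `train_local`) that stand in for a real HE library and real local training; none of these names come from the paper:

```python
def he_encrypt(update):
    # Placeholder for homomorphic encryption of a model update
    # (a real deployment would use e.g. a CKKS or Paillier library).
    return {"payload": update, "encrypted": True}

def train_local(model, data):
    # Placeholder local training step: returns a mock model update.
    return [w + 0.01 for w in model]

def alt_fl_round(round_idx, model, real_data, synthetic_data):
    """One Alt-FL communication round on a single client."""
    if round_idx % 2 == 0:
        # Authentic round: train on private data, encrypt the update.
        update = train_local(model, real_data)
        return he_encrypt(update)
    else:
        # Synthetic round: train on locally generated synthetic data;
        # the update is sent in plaintext, with no encryption overhead.
        update = train_local(model, synthetic_data)
        return {"payload": update, "encrypted": False}
```

The server-side aggregation (and the exact authentic-to-synthetic ratio) would follow the paper's protocol; the point of the sketch is simply that the encryption decision is made per round, not per update.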

This interleaving strategy is the key to its efficiency. By only applying HE to a fraction of the communication rounds (the authentic ones), the total encryption and decryption costs are slashed. Simultaneously, the use of synthetic data in alternating rounds acts as a dataset enhancement technique, providing a regularization effect that improves the final model's generalization and accuracy.
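The cost argument is straightforward to make concrete. Under the simplifying assumption that HE overhead is a fixed cost per encrypted round, the savings are just the fraction of rounds that run synthetically; the specific numbers below (52 authentic rounds out of 100) are illustrative and chosen only to mirror the reported "up to 48%" figure:

```python
def he_cost_savings(total_rounds, authentic_rounds, he_cost_per_round):
    """Fractional HE cost reduction vs. encrypting every round."""
    baseline = total_rounds * he_cost_per_round      # HE in all rounds
    alt_fl = authentic_rounds * he_cost_per_round    # HE only in authentic rounds
    return 1 - alt_fl / baseline

# Illustrative: 52 of 100 rounds authentic -> 48% of HE cost avoided.
savings = he_cost_savings(total_rounds=100, authentic_rounds=52,
                          he_cost_per_round=1.0)
```

In practice HE cost is not perfectly uniform per round (ciphertext sizes and key operations vary), so this linear model is an approximation, but it captures why shifting rounds to plaintext synthetic training translates directly into reduced encryption overhead.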

Demonstrated Gains in Accuracy and Efficiency

In empirical evaluations, Alt-FL delivered compelling results on both performance fronts. Compared to a baseline method known as Selective HE, Alt-FL achieved a 13.4% higher model accuracy. This significant boost is attributed to the diversifying effect of the synthetic data during training. On the critical metric of computational efficiency, Alt-FL reduced HE-related costs by up to 48%, making robust privacy protection far more feasible for resource-constrained edge devices.

Furthermore, the system was evaluated against inference attacks that attempt to reconstruct client data from shared updates. Alt-FL demonstrated robust privacy protection against the DLG attack, confirming that applying HE during the authentic rounds, where real data is at stake, is sufficient to thwart such exploits.

Why This Matters for the Future of Private AI

The development of Alt-FL addresses a major bottleneck in applied machine learning, particularly for sectors like healthcare, finance, and mobile computing where data sensitivity and device limitations intersect.

  • Practical Privacy at Scale: It makes strong, cryptographically-backed privacy (via HE) computationally sustainable for real-world federated learning systems.
  • Performance Enhancement: It turns a privacy constraint into a performance benefit, using synthetic data not just for privacy but to actively improve model accuracy.
  • Defense Against Advanced Threats: It provides a verified defense against model inversion and data leakage attacks, moving beyond the assumed privacy of standard FL.
  • Path to Adoption: By drastically cutting encryption costs, this work lowers the barrier for industries to adopt truly private collaborative AI solutions.

This research, detailed in the preprint "Alternating Federated Learning," represents a sophisticated step toward reconciling the often-competing demands of accuracy, privacy, and efficiency in decentralized machine learning.
