FedRAD: Heterogeneous Federated Learning via Relational Adaptive Distillation
Abstract
1. Introduction
1. To address the drift and forgetting that arise from local training in federated learning (FL) deployments on IoT devices, this work adds an auxiliary constraint to the local training loss based on relational knowledge distillation, enabling the local model to learn and retain global knowledge from a higher-dimensional, relational view and thus avoid forgetting (a minimal sketch of such a relational loss follows this list).
2. To further improve model stability and robustness, this work proposes a robust local learning method named entropy-wise adaptive weights (EWAW), in which the penalty weights between the distillation losses are set adaptively from the prediction entropy of the global model on each client's local data; dynamically adjusting the loss weights in this way further improves performance in each scenario.
3. Through comprehensive experiments on CIFAR10 and CIFAR100, we validate that FedRAD achieves higher accuracy and faster convergence than FedAvg, FedProx, and FedMD under different levels of data heterogeneity and different fractions of active clients.
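As a concrete illustration of the relational constraint in the first contribution, the snippet below is a minimal PyTorch-style sketch of a distance-wise relational distillation loss in the spirit of Park et al.'s relational knowledge distillation. The helper names (`pairwise_distances`, `_normalize`, `rkd_distance_loss`) and the Huber-loss formulation are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def pairwise_distances(e: torch.Tensor) -> torch.Tensor:
    """Euclidean distances between every pair of embeddings in a batch (B x D -> B x B)."""
    prod = e @ e.t()
    sq = prod.diag()
    d2 = (sq.unsqueeze(0) + sq.unsqueeze(1) - 2.0 * prod).clamp(min=0.0)
    return (d2 + 1e-12).sqrt()  # small eps keeps gradients finite on the diagonal

def _normalize(d: torch.Tensor) -> torch.Tensor:
    """Divide by the mean off-diagonal distance so teacher and student scales match."""
    n = d.size(0)
    off_diag = d[~torch.eye(n, dtype=torch.bool, device=d.device)]
    return d / (off_diag.mean() + 1e-8)

def rkd_distance_loss(student_emb: torch.Tensor, teacher_emb: torch.Tensor) -> torch.Tensor:
    """Distance-wise relational KD: match the normalized pairwise-distance structure
    of the local (student) embeddings to that of the global (teacher) embeddings."""
    with torch.no_grad():
        t_d = _normalize(pairwise_distances(teacher_emb))
    s_d = _normalize(pairwise_distances(student_emb))
    return F.smooth_l1_loss(s_d, t_d)
```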
2. Related Work
2.1. Federated Learning
2.2. Knowledge Distillation in FL
3. Methodology
3.1. Inconsistency in Federated Learning
3.2. Preliminary
3.3. Codistillation in Local Training
3.4. Relational Distillation
3.5. Adaptive Weighting Coefficient
Algorithm 1: FedRAD

Input: T: communication rounds; N: number of clients; C: fraction of active clients in each round; the clients' local datasets; the predictions of the global and local models; the learning rate; E: local epochs
Output: global model parameters w

1. Initialize model parameters w
2. Server update:
3. for each round do
4.     select a random subset (fraction C) of the N clients
5.     for each selected client in parallel do
6.         Client update (lines 10–18)
7.     end for
8.     Server aggregation
9. end for
10. Client update:
11. for each local epoch do
12.     for each batch do
13.         compute the codistillation loss ▷ Equation (2)
14.         compute the relational distillation loss ▷ Equation (3)
15.         combine the losses with the adaptive weight
16.         update the local model by gradient descent
17.     end for
18. end for
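To make the client update concrete, the following is a minimal PyTorch-style sketch of one local training step that combines cross-entropy, a logit-level codistillation term, and the relational term from the earlier sketch, with the distillation weight adapted to the prediction entropy of the global model, in the spirit of EWAW. The exact loss formulation, temperature, weighting rule, and the assumption that the models return `(logits, embeddings)` are illustrative and may differ from Equations (2) and (3) in the paper.

```python
import math
import torch
import torch.nn.functional as F

def local_training_step(local_model, global_model, x, y, optimizer,
                        temperature: float = 2.0, num_classes: int = 10):
    """One FedRAD-style local step (sketch): CE + codistillation + relational term,
    weighted by the normalized prediction entropy of the global model."""
    local_model.train()
    global_model.eval()

    s_logits, s_emb = local_model(x)              # assumed to return (logits, embeddings)
    with torch.no_grad():
        t_logits, t_emb = global_model(x)

    # Entropy-wise adaptive weight: a confident global model (low entropy) is
    # trusted more for distillation; an uncertain one shifts weight back to CE.
    t_prob = F.softmax(t_logits, dim=1)
    entropy = -(t_prob * t_prob.clamp(min=1e-8).log()).sum(dim=1).mean()
    alpha = 1.0 - entropy / math.log(num_classes)  # normalized to [0, 1]

    ce = F.cross_entropy(s_logits, y)
    kd = F.kl_div(F.log_softmax(s_logits / temperature, dim=1),
                  F.softmax(t_logits / temperature, dim=1),
                  reduction="batchmean") * temperature ** 2
    rkd = rkd_distance_loss(s_emb, t_emb)          # from the earlier sketch

    loss = (1.0 - alpha) * ce + alpha * (kd + rkd)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```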
4. Experiment
4.1. Implementation Details
4.1.1. Datasets and Data Allocation
4.1.2. Baselines
4.1.3. Models
4.1.4. Hyperparameters
4.2. Comparative Analysis of Accuracy and Efficiency
4.2.1. Test Accuracy
4.2.2. Communication Rounds
4.3. Hyperparameters and Ablation Experiments
4.3.1. Data Heterogeneity
4.3.2. Hyperparameters
4.3.3. Partial Client Participant
4.3.4. Ablation Study in FedRAD
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, Fort Lauderdale, FL, USA, 20–22 April 2017; Volume 54, pp. 1273–1282. [Google Scholar]
- Zheng, Z.; Zhou, Y.; Sun, Y.; Wang, Z.; Liu, B.; Li, K. Applications of Federated Learning in Smart Cities: Recent Advances, Taxonomy, and Open Challenges. Connect. Sci. 2022, 34, 1–28. [Google Scholar] [CrossRef]
- Liu, Q.; Chen, C.; Qin, J.; Dou, Q.; Heng, P.-A. FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, Virtual, 19–25 June 2021; pp. 1013–1023. [Google Scholar]
- Vaid, A.; Jaladanki, S.; Xu, J.; Teng, S.; Kumar, A.; Lee, S.; Somani, S.; Paranjpe, I.; Freitas, J.; Wanyan, B.; et al. Federated Learning of Electronic Health Records to Improve Mortality Prediction in Hospitalized Patients with COVID-19: Machine Learning Approach. JMIR Med. Inform. 2021, 9, e24207. [Google Scholar] [CrossRef] [PubMed]
- Zhao, L.; Huang, J. A Distribution Information Sharing Federated Learning Approach for Medical Image Data. Complex Intell. Syst. 2023, 2023, 1–12. [Google Scholar] [CrossRef]
- Byrd, D.; Polychroniadou, A. Differentially Private Secure Multi-Party Computation for Federated Learning in Financial Applications. In Proceedings of the ICAIF ’20: The First ACM International Conference on AI in Finance, New York, NY, USA, 15–16 October 2020; pp. 16:1–16:9. [Google Scholar]
- Ring, M.B. Child: A First Step Towards Continual Learning. In Learning to Learn; Thrun, S., Pratt, L.Y., Eds.; Springer: Berlin/Heidelberg, Germany, 1998; pp. 261–292. [Google Scholar]
- Zhao, Y.; Li, M.; Lai, L.; Suda, N.; Civin, D.; Chandra, V. Federated Learning with Non-IID Data. arXiv 2018, arXiv:1806.00582. [Google Scholar] [CrossRef]
- Li, X.; Huang, K.; Yang, W.; Wang, S.; Zhang, Z. On the Convergence of FedAvg on Non-IID Data. In Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
- Karimireddy, S.P.; Kale, S.; Mohri, M.; Reddi, S.J.; Stich, S.U.; Suresh, A.T. SCAFFOLD: Stochastic Controlled Averaging for Federated Learning. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, Virtual Event, 13–18 July 2020; Volume 119, pp. 5132–5143. [Google Scholar]
- Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated Optimization in Heterogeneous Networks. In Proceedings of the Machine Learning and Systems 2020, MLSys 2020, Austin, TX, USA, 2–4 March 2020. [Google Scholar]
- Hinton, G.E.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
- Lin, T.; Kong, L.; Stich, S.U.; Jaggi, M. Ensemble Distillation for Robust Model Fusion in Federated Learning. In Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, Virtual, 6–12 December 2020. [Google Scholar]
- Cheng, S.; Wu, J.; Xiao, Y.; Liu, Y. FedGEMS: Federated Learning of Larger Server Models via Selective Knowledge Fusion. arXiv 2021, arXiv:2110.11027. [Google Scholar]
- Park, W.; Kim, D.; Lu, Y.; Cho, M. Relational Knowledge Distillation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, 16–20 June 2019; pp. 3967–3976. [Google Scholar]
- Matthews, P. A Short History of Structural Linguistics; Cambridge University Press: Cambridge, UK, 2001; ISBN 978-0-521-62568-5. [Google Scholar]
- Anil, R.; Pereyra, G.; Passos, A.; Ormándi, R.; Dahl, G.E.; Hinton, G.E. Large Scale Distributed Neural Network Training through Online Distillation. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Yao, A.C.-C. Protocols for Secure Computations (Extended Abstract). In Proceedings of the 23rd Annual Symposium on Foundations of Computer Science, Chicago, IL, USA, 3–5 November 1982; pp. 160–164. [Google Scholar]
- Yao, A.C.-C. How to Generate and Exchange Secrets (Extended Abstract). In Proceedings of the 27th Annual Symposium on Foundations of Computer Science, Toronto, ON, Canada, 27–29 October 1986; pp. 162–167. [Google Scholar]
- Sattler, F.; Müller, K.-R.; Samek, W. Clustered Federated Learning: Model-Agnostic Distributed Multitask Optimization Under Privacy Constraints. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 3710–3722. [Google Scholar] [CrossRef] [PubMed]
- Ghosh, A.; Chung, J.; Yin, D.; Ramchandran, K. An Efficient Framework for Clustered Federated Learning. In Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 19586–19597. [Google Scholar]
- Hanzely, F.; Richtárik, P. Federated Learning of a Mixture of Global and Local Models. arXiv 2020, arXiv:2002.05516. [Google Scholar]
- Dinh, C.T.; Tran, N.H.; Nguyen, T.D. Personalized Federated Learning with Moreau Envelopes. In Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, Virtual, 6–12 December 2020. [Google Scholar]
- Chen, H.-Y.; Chao, W.-L. FedBE: Making Bayesian Model Ensemble Applicable to Federated Learning. In Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, 3–7 May 2021. [Google Scholar]
- Wang, H.; Yurochkin, M.; Sun, Y.; Papailiopoulos, D.S.; Khazaeni, Y. Federated Learning with Matched Averaging. In Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
- Wang, W.; Wei, F.; Dong, L.; Bao, H.; Yang, N.; Zhou, M. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. In Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, Virtual, 6–12 December 2020. [Google Scholar]
- Sun, S.; Cheng, Y.; Gan, Z.; Liu, J. Patient Knowledge Distillation for BERT Model Compression. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, 3–7 November 2019; Inui, K., Jiang, J., Ng, V., Wan, X., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 4322–4331. [Google Scholar]
- Zhang, Y.; Xiang, T.; Hospedales, T.M.; Lu, H. Deep Mutual Learning. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4320–4328. [Google Scholar]
- Li, D.; Wang, J. FedMD: Heterogenous Federated Learning via Model Distillation. arXiv 2019, arXiv:1910.03581. [Google Scholar]
- Jeong, E.; Oh, S.; Kim, H.; Park, J.; Bennis, M.; Kim, S.-L. Communication-Efficient On-Device Machine Learning: Federated Distillation and Augmentation under Non-IID Private Data. arXiv 2018, arXiv:1811.11479. [Google Scholar]
- He, C.; Annavaram, M.; Avestimehr, S. Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge. In Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, Virtual, 6–12 December 2020. [Google Scholar]
- Li, X.; Chen, B.; Lu, W. FedDKD: Federated Learning with Decentralized Knowledge Distillation. Appl. Intell. 2023, 53, 18547–18563. [Google Scholar] [CrossRef]
- Zhang, L.; Shen, L.; Ding, L.; Tao, D.; Duan, L.-Y. Fine-Tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, 18–24 June 2022; pp. 10164–10173. [Google Scholar]
- Zhu, Z.; Hong, J.; Zhou, J. Data-Free Knowledge Distillation for Heterogeneous Federated Learning. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Virtual Event, 18–24 July 2021; Volume 139, pp. 12878–12889. [Google Scholar]
- Zhang, L.; Wu, D.; Yuan, X. FedZKT: Zero-Shot Knowledge Transfer towards Resource-Constrained Federated Learning with Heterogeneous On-Device Models. In Proceedings of the 42nd IEEE International Conference on Distributed Computing Systems, ICDCS 2022, Bologna, Italy, 10–13 July 2022; pp. 928–938. [Google Scholar]
- Chen, H.; Wang, C.; Vikalo, H. The Best of Both Worlds: Accurate Global and Personalized Models through Federated Learning with Data-Free Hyper-Knowledge Distillation. In Proceedings of the Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Pereyra, G.; Tucker, G.; Chorowski, J.; Kaiser, L.; Hinton, G.E. Regularizing Neural Networks by Penalizing Confident Output Distributions. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Sattler, F.; Korjakow, T.; Rischke, R.; Samek, W. FEDAUX: Leveraging Unlabeled Auxiliary Data in Federated Learning. IEEE Trans. Neural Netw. Learn. Syst. 2021, 1–13. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
| Method | CIFAR10 accuracy (%) | | | CIFAR100 accuracy (%) | | |
|---|---|---|---|---|---|---|
| FedAvg | 86.54 | 45.13 | 61.14 | 74.60 | 28.34 | 53.71 |
| FedProx | 85.69 | 49.33 | 64.81 | 75.14 | 32.78 | 53.77 |
| FedMD | 84.76 | 44.15 | 63.35 | 71.27 | 28.92 | 52.32 |
| FedDistill+ | 86.34 | 49.21 | 65.46 | 75.11 | 32.55 | 54.49 |
| FedGen | 86.17 | 49.41 | 64.76 | 74.66 | 32.91 | 54.56 |
| FedRAD | 86.28 | 52.23 | 67.37 | 74.74 | 34.88 | 56.12 |
| Method | CIFAR10 communication rounds | | CIFAR100 communication rounds | |
|---|---|---|---|---|
| FedAvg | 35 | 55 | 25 | 57 |
| FedProx | 26 | 36 | 21 | 43 |
| FedMD | 32 | 46 | 27 | 56 |
| FedDistill+ | 25 | 35 | 22 | 43 |
| FedGen | 26 | 38 | 20 | 41 |
| FedRAD | 17 | 24 | 11 | 26 |
| Method | Accuracy (%) | | | | | | |
|---|---|---|---|---|---|---|---|
| FedAvg | 61.16 | 67.41 | 70.33 | 73.11 | 73.42 | 74.04 | 74.24 |
| FedProx | 64.81 | 71.21 | 72.85 | 74.07 | 75.52 | 75.28 | 75.36 |
| FedMD | 63.35 | 68.32 | 69.81 | 71.78 | 71.66 | 72.37 | 72.98 |
| FedDistill+ | 65.46 | 70.07 | 72.41 | 73.95 | 75.73 | 75.71 | 75.97 |
| FedGen | 64.16 | 69.81 | 72.58 | 74.01 | 75.15 | 75.03 | 75.16 |
| FedRAD | 67.37 | 72.65 | 75.07 | 76.18 | 77.25 | 77.18 | 77.21 |
| Method | Accuracy (%) | | | |
|---|---|---|---|---|
| FedAvg | 55.14 | 56.62 | 57.86 | 58.11 |
| FedProx | 56.29 | 57.35 | 57.93 | 58.36 |
| FedMD | 52.70 | 54.62 | 56.49 | 56.91 |
| FedDistill+ | 56.71 | 57.24 | 58.68 | 59.05 |
| FedGen | 56.05 | 57.12 | 57.23 | 59.09 |
| FedRAD | 57.88 | 59.14 | 59.81 | 60.07 |
| | Method | Accuracy (%) |
|---|---|---|
| Baseline | FedRAD | 67.37 |
| Module | | 64.45 |
| | no EWAW | 66.09 |
| | no EDA | 66.74 |
| | no EWAW + EDA | 65.71 |