Slide Overview
My master's thesis work, accepted at UbiSec 2024.
Poison Egg: Scrambling Federated Learning with Delayed Backdoor Attack
Masayoshi Tsutsui, Tatsuya Kaneko, Shinya Takamaeda-Yamazaki (The University of Tokyo)
Background
Federated Learning (FL)
A collaborative ML system for privacy protection. The following steps are repeated as one round:
1. Each client loads the global model from the server.
2. Each client trains for several steps and sends its weight update ΔW_i to the server.
3. The server updates the global model with the average of the ΔW_i.
4. Another group of clients is selected as participants for the next round.
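A minimal sketch of one such round in the FedAvg style, assuming PyTorch models and client objects with a local_train method; all names here are illustrative, not from the paper.

```python
import copy

def fedavg_round(global_model, clients, local_steps):
    """One FL round: each selected client loads the global model, trains locally,
    and the server averages the resulting weight updates ΔW_i into the global model."""
    global_state = copy.deepcopy(global_model.state_dict())
    updates = []
    for client in clients:                                    # step 1: clients load the global model
        local_model = copy.deepcopy(global_model)
        client.local_train(local_model, steps=local_steps)    # step 2: local training
        delta = {k: local_model.state_dict()[k] - global_state[k]
                 for k in global_state}                       # weight update ΔW_i
        updates.append(delta)
    new_state = {k: global_state[k] + sum(u[k] for u in updates) / len(updates)
                 for k in global_state}                       # step 3: average of ΔW_i
    global_model.load_state_dict(new_state)                   # step 4 (client re-selection) happens outside
```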
Risk of Model Poisoning by Malicious Clients
Many clients are involved in training.
➢ Malicious clients can break the model.
It is important to identify all possible attacks so that they can be prevented.
✓ We propose a novel attack method on FL.
Backdoor Attack in Deep Learning
Pollute the model so that it shows abnormal behavior only when a specific input (trigger) is given (e.g., a cat image with the trigger is classified as "Bird" while clean cat images are still classified as "Cat").
Attacking procedure:
1. Collect two types of training data:
   a. trigger data with a wrong label
   b. clean data with the correct label
2. Train the model with a mixture of (a) and (b) until it fits both.
✓ Applicable to FL
Reference: BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain (Gu et al., 2017)
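A minimal sketch of step 1, assuming a PyTorch image dataset and a simple patch trigger; the add_trigger helper, the patch size and position, and the target label are illustrative assumptions, not the trigger used in the paper.

```python
from torch.utils.data import Dataset

def add_trigger(image):
    """Stamp a small bright patch (the trigger) into the corner of a CHW image tensor.
    The 5x5 patch and its position are assumptions for illustration."""
    poisoned = image.clone()
    poisoned[:, -5:, -5:] = 1.0
    return poisoned

class MixedBackdoorDataset(Dataset):
    """Mixture of (a) trigger samples relabeled to the attacker's target class and
    (b) clean samples with their correct labels."""
    def __init__(self, clean_dataset, poison_indices, target_label):
        self.clean = clean_dataset
        self.poison_indices = set(poison_indices)
        self.target_label = target_label

    def __len__(self):
        return len(self.clean)

    def __getitem__(self, idx):
        image, label = self.clean[idx]
        if idx in self.poison_indices:
            return add_trigger(image), self.target_label   # (a) trigger data, wrong label
        return image, label                                 # (b) clean data, correct label
```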
Backdoor Attack in Federated Learning
Attacker's objective
➢ Make the next-round model W_{t+1} close to the polluted model W_b.
Premise: the training process is almost converged.
➢ Model updates are small, i.e., Σ_i ΔW_i ≈ 0.
🤔 If the attacker submits a large update ΔW_a (≫ ΔW_i) ...
➢ Σ_i ΔW_i ≈ ΔW_a
Attacking procedure
➢ Send (W_b − W_t) · N as the model update (N is the number of clients).
What happens on the server
➢ W_{t+1} = W_t + (1/N) Σ_i ΔW_i
          ≈ W_t + (W_b − W_t) · N / N   (∵ Σ_i ΔW_i ≈ ΔW_a)
          = W_b
Reference: How to Backdoor Federated Learning (Bagdasaryan et al., 2018)
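A minimal sketch of the attacker's side of this model replacement, following the procedure above; the function and argument names are illustrative.

```python
def model_replacement_update(polluted_state, global_state, num_clients):
    """Craft the attacker's update ΔW_a = (W_b - W_t) * N so that, when the benign
    updates roughly cancel out, the averaged next-round model W_{t+1} becomes ≈ W_b."""
    return {k: (polluted_state[k] - global_state[k]) * num_clients
            for k in global_state}

# The attacker submits this dict in place of an honest ΔW_i;
# the server's averaging step then moves W_{t+1} ≈ W_b.
```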
Defense Methods against FL Backdoor Attacks
Defense ①: Anomaly Detection on ΔW
The server performs anomaly detection on each client's ΔW.
➢ Suspicious ΔW is rejected or down-weighted.
   Example metrics: L2 norm, cosine distance, etc.
✅ Regarded as prevention before the attack.
However, it has one risk:
☹ Clients with rare data tend to be rejected.
This is a critical issue for practical usage.
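A minimal sketch of one possible instance of this defense, an L2-norm filter on client updates; the threshold and function names are illustrative assumptions, not the exact rule from any cited work.

```python
import math

def update_l2_norm(update):
    """L2 norm of a client update ΔW given as a dict of tensors."""
    return math.sqrt(sum(float((v.float() ** 2).sum()) for v in update.values()))

def filter_suspicious_updates(updates, norm_threshold):
    """Keep only the updates whose L2 norm does not exceed the threshold;
    the server then averages the remaining ΔW as usual."""
    return [u for u in updates if update_l2_norm(u) <= norm_threshold]
```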
Defense ②: Anomaly Detection on the Global Model
Base idea
➢ An attack changes the global model accuracy to some extent.
Validation clients measure the global model accuracy every round.
➢ If an anomaly is detected, the model state is rolled back by 1 round.
✅ Regarded as a cure after the attack. It overcomes the risk in Defense ①:
☺ All clients can be involved as long as they do not break the model.
Despite the extra time consumption, it is more practical than ①.
Is a 1-Round Rollback Enough as a Cure?
"Roll back the model state by 1 round"
➢ This assumes that the anomaly occurs right after the attack.
🤔 What if there is an attack that delays the occurrence of the anomaly?
[Figure: FL rounds with client updates ΔW, comparing the 1st round, the 2nd round, and the rolled-back 2nd' round]
Proposal: Delayed Backdoor Attack
Making Use of Backdoor Revival
Our goal
➢ Create a delay between the attack and the rise of the misprediction rate.
We made use of a feature of backdoors observed in previous research:
the misprediction rate drops after the attack and then revives automatically.
[Figure: misclassification rate over rounds for a regular backdoor attack, showing the drop right after the attack and the later revival]
✓ Utilized this feature to realize the delayed backdoor attack.
The Model State before Revival
Procedure of the FL backdoor attack:
1. Train a polluted model W_b.
2. Replace the global model W_{t+1} with W_b.
The misprediction rate drops one round after the attack (regular backdoor attack behavior).
🤔 What is the model state when the drop occurs (W_d)?
➢ It is an egg of the backdoor, about to raise the misprediction rate again.
🤔 What if we replace W_{t+1} with W_d instead of W_b?
➢ The misprediction rate is low at the attack round and rises spontaneously at the next round (ideal delayed backdoor attack).
✓ This is realizable if the attacker can obtain W_d before the attack.
[Figures: misclassification rate over rounds for the regular backdoor attack and for the ideal delayed backdoor attack]
Proposal: Delayed Backdoor Attack Procedure
1. Train a polluted model W_b with a mixed-up dataset.
2. Keep training the model with a clean dataset until the misprediction rate decreases enough, yielding W_b'.
   ➢ This imitates the effect caused by benign clients when the misprediction rate drops.
3. Send (W_b' − W_t) · N instead of (W_b − W_t) · N. (A sketch of this procedure is shown below.)
[Figure: misclassification rate over rounds for the regular backdoor attack, marking where W_b and W_b' are taken]
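A minimal sketch of the three steps, assuming PyTorch data loaders for the poisoned, clean, and trigger data; the drop threshold, learning rate, and all helper names are illustrative assumptions, not the paper's exact recipe.

```python
import copy
import torch
import torch.nn.functional as F

def craft_delayed_backdoor_update(model, poisoned_loader, clean_loader, trigger_loader,
                                  global_state, num_clients, target_label,
                                  drop_threshold=0.05, lr=0.05):
    """Delayed backdoor sketch: fit the backdoor (W_b), then keep training on clean
    data until the trigger misprediction rate drops (W_b'), and send the scaled update."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)

    def train_one_epoch(loader):
        model.train()
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()

    def trigger_misprediction_rate():
        model.eval()
        hits, total = 0, 0
        with torch.no_grad():
            for x, _ in trigger_loader:
                hits += (model(x).argmax(dim=1) == target_label).sum().item()
                total += x.size(0)
        return hits / total

    # Step 1: train the polluted model W_b on the mixed (poisoned + clean) data.
    train_one_epoch(poisoned_loader)
    # Step 2: clean training until the trigger misprediction rate drops enough,
    #         imitating the benign rounds that follow a regular backdoor attack (W_b').
    while trigger_misprediction_rate() > drop_threshold:
        train_one_epoch(clean_loader)
    w_b_prime = copy.deepcopy(model.state_dict())
    # Step 3: send (W_b' - W_t) * N instead of (W_b - W_t) * N.
    return {k: (w_b_prime[k] - global_state[k]) * num_clients for k in global_state}
```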
Experiments
Experimental Settings
FL settings
• Image classification on CIFAR-10 with ResNet-18
• 10 out of 100 clients participate every round
• The attack is performed after 1000 rounds of training
• Optimizer: SGD (lr 0.1, momentum 0.9, weight decay 0.0005)
• Batch size 64, 2 local epochs
Attacker settings
• Trigger: a car with a yellow-and-black striped wall
  ➢ misclassified as a ship
• Optimizer: SGD (lr 0.05, momentum 0.9, weight decay 0.0005)
Defense settings
• 10 validators measure per-class accuracy
• The model is rolled back if any per-class accuracy decrease exceeds a threshold
• One validator holds 100 trigger images
Delayed Backdoor vs. Regular Backdoor
[Figure: misclassification rate over rounds for the delayed backdoor attack (Ours) and the regular backdoor attack, with the attack round marked]
Misprediction rate on the 100 trigger images held by one validator.
• The defense method is not applied yet.
✓ Backdoor activation is certainly delayed.
Regular Backdoor vs. Defense
[Figure: results with the defense applied, using an accuracy threshold of 20%; the attack round is marked]
The defense failed to reject the delayed backdoor attack.
➢ The FL process was scrambled.
How to Defend against the Delayed Backdoor
We were able to delay the backdoor by 1 round in our experiment.
➢ Could the defense succeed by rolling back 2 or more rounds?
🤔 Is a delay of 2 or more rounds possible?
➢ It depends on whether such a long backdoor revival exists.
• Previous research observed a revival longer than 10 rounds.
✓ There is no way to prevent it absolutely.
[Figure: misclassification rate over rounds for the delayed backdoor attack (Ours) and the regular backdoor attack]
Conclusion
• Pointed out a vulnerability in the premise underlying defenses against FL backdoors, namely that a model anomaly occurs right after the attack.
  ➢ Under this premise, the anomaly can be removed by rolling back the model by 1 round.
• Proposed the delayed backdoor attack, which delays the anomaly occurrence relative to the attack.
  ➢ The anomaly recurs no matter how many times the model is rolled back.
• Demonstrated that FL backdoor defense needs a new perspective for complete security.
Index
Background
• Federated Learning (FL)
• Backdoor Attack
• Backdoor Attack in FL
• Defense Methods against FL Backdoor Attacks
Proposal
• Delayed Backdoor Attack
Experiment
• Evaluation Result
Privacy Concerns in Deep Learning
Recent deep learning models have become large.
➢ Demand for massive training data.
Privacy regulations have become stricter.
➢ Collecting training data is getting difficult. ☹
✓ Growing demand for deep learning that does not collect training data.
Defense ①: Anomaly Detection on ΔW
The server performs anomaly detection on each client's ΔW.
➢ Suspicious ΔW is rejected or down-weighted.
   Example metrics: L2 norm, cosine distance, etc.
✅ Regarded as prevention before the attack.
However, it has 2 risks:
☹ Clients with rare data tend to be rejected.
☹ Unencrypted ΔW is required for detection.
  ➢ A malicious server can extract clients' information from ΔW.
     cf. Deep Leakage from Gradients (Zhu et al.)
This provides insufficient security for practical usage.
Defense ②: Anomaly Detection on the Global Model
Base idea
➢ An attack changes the global model accuracy to some extent.
Validation clients measure the global model accuracy every round.
➢ If an anomaly is detected, the model state is rolled back by 1 round.
✅ Regarded as a cure after the attack. It overcomes the 2 risks in Defense ①:
☺ All clients can be involved as long as they do not break the model.
☺ ΔW can be encrypted as long as the global model is not.
  ➢ Realizable with a secure aggregation method (cf. VerifyNet).
Despite the extra time consumption, it shows high robustness.
Counterpart Defense Method
• 10 validators measure per-class accuracy.
  [Figure: per-class accuracy bar charts (e.g., goat, bird, dog, cat) measured by validators]
• If any class shows a certain decrease (the accuracy threshold) from the previous round, the validator reports it to the server.
• The server then rolls back the model by 1 round.
• One validator holds 100 correctly labeled trigger images.
A sketch of this validator-and-rollback logic is shown below.
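A minimal sketch of the rollback defense, assuming per-class accuracies are given as dicts keyed by class; the 20% threshold follows the experimental settings, and the function names are illustrative.

```python
def validator_detects_anomaly(per_class_acc, prev_per_class_acc, threshold=0.20):
    """A validator flags the round if any class accuracy drops by more than the
    threshold compared with the previous round."""
    return any(prev_per_class_acc[c] - per_class_acc[c] > threshold
               for c in per_class_acc)

def server_round_with_rollback(global_state, prev_global_state, validator_reports):
    """Roll back the global model by one round if any validator reported an anomaly."""
    return prev_global_state if any(validator_reports) else global_state
```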