TextBugger: Generating Adversarial Text Against Real-world Applications
Jinfeng Li, Shouling Ji, Tianyu Du, Bo Li, Ting Wang
Lehigh University, Zhejiang University, Berkeley. INFORSEC 2020 (2020/8/21)

Machine Learning for Natural Language Processing
Machine learning is used for multiple NLP tasks, including sentiment analysis, information extraction, information retrieval, machine translation, and question answering.

Machine Learning as a Service (MLaaS) for NLP
Many cloud platforms expose NLP models as services: Microsoft Azure, Google Cloud Platform, Amazon AWS, IBM Watson, Facebook fastText, ParallelDots, Google Perspective, TheySay, Aylien, and Mashape.

Breaking Things Is Easy
Recent works have revealed the vulnerabilities of DNNs in the image and speech domains:
- DNNs for image classification are vulnerable to adversarial images (Goodfellow et al., ICLR '15).
- Automatic speech recognition systems can be broken by adversarial audio in the physical world, e.g., audio transcribed as "open the door" (Yuan et al., USENIX Security '18).
This raises two questions:
- Do adversarial examples also exist in the text domain?
- Is MLaaS for NLP also vulnerable to adversarial examples?

Preliminaries

Adversarial Text
What is adversarial text? Text carefully generated by adding small perturbations to a legitimate input so that the classifier's prediction changes.
Task: sentiment analysis. Classifier: Amazon AWS. Original label: 100% Negative. Adversarial label: 89% Positive (perturbed words shown as original → bug):
"I watched this movie recently, mainly because I am a huge fan of Jodie Foster's. I saw this movie was made right … I thought the movie was terrible → terrib1e and I'm still left wondering how she was ever persuaded to make this movie. The Script is really weak."

What are the challenges in generating adversarial texts?
- The discrete nature of text makes it hard to optimize.
- Small perturbations in text are usually clearly perceptible.
- Replacing a single word may drastically alter the semantics of the sentence.
Related Works for Generating Adversarial Texts
Gradient-based methods:
- Modifying an input text repeatedly until it is misclassified (Papernot et al., MILCOM '16).
- Changing one token to another via gradient-based optimization (Ebrahimi et al., NAACL '18).
- Perturbing important words, determined by embedding gradients, with hand-crafted synonyms (Samanta et al., arXiv '17).
Out-of-vocabulary words:
- Breaking machine learning systems with random character manipulations (Belinkov et al., ICLR '18).
- Attacking black-box models by applying random character perturbations (Gao et al., SPW '18).
- Changing the toxicity score of texts by adding spaces or dots between characters (Hosseini et al., arXiv '17).
Replacement with semantically/syntactically similar words:
- Replacing words only with semantically similar ones (Alzantot et al., arXiv '18).
- Replacing tokens with random words of the same POS tag, with probability proportional to embedding similarity (Ribeiro et al., ACL '18).
Other methods:
- Attacking reading-comprehension systems by adding distracting sentences to the input document (Jia et al., EMNLP '17).
- Generating adversarial sequences with Generative Adversarial Networks (GANs) (Zhao et al., ICLR '18).

Limitations
These works are limited in practice due to at least one of the following reasons:
- Limited to short texts.
- Significantly affect the original meaning.
- Need hand-crafted synonyms and typos.
- Require manual intervention to polish the added sentences.
- Not computationally efficient.

TextBugger

Framework of TextBugger
Given a text classifier, either an online platform or an offline model, TextBugger runs in one of two modes: a white-box attack that uses gradient information from word embeddings, and a black-box attack that queries the model and uses the returned confidence values as feedback to craft the adversarial text.

Threat Model
- White-box: complete knowledge of the targeted model.
- Black-box: no knowledge of the model architecture, parameters, or training data; only capable of querying the targeted model and observing the predicted label or confidence scores (e.g., a sentiment-analysis API or an abusive-content classifier).
Step 1: Finding Important Words (White-box)
Find important words via gradient information:

  C_{x_i} = J_F(i, y) = ∂F_y(x) / ∂x_i

Notation: x is the input text; x_i is the i-th word in x; F_j(x) is the confidence value of the j-th class; C_{x_i} is the importance of word x_i; N is the total number of words in x; K is the total number of classes.

Step 1: Finding Important Words (Black-box)
- Find important sentences: C_sentence(s_i) = F_y(s_i), i.e., each sentence's confidence for the predicted class y.
- Sort the sentences according to C_sentence into S_ordered, filtering out sentences whose predicted label differs from y.
- Find the important words in each sentence of S_ordered by measuring the confidence drop when a word is removed:

  C_{w_j} = F_y(w_1, ..., w_j, ..., w_m) - F_y(w_1, ..., w_{j-1}, w_{j+1}, ..., w_m)

Example sentence: "It is so laddish and juvenile, only teenage boys could possibly find it funny."
Notation: s_i is the i-th sentence in the input text x; F_y(s_i) is s_i's confidence value for the predicted class y; S_ordered is the set of important sentences; C_sentence(s_i) is the importance of sentence s_i; C_{w_j} is the importance of the j-th word in s_i.
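In the black-box setting, word importance reduces to confidence deltas obtained by querying the model. A minimal sketch of the word-scoring step, assuming a hypothetical `predict_confidence` callable that queries the target model and returns the confidence of the originally predicted class (the lexicon-based toy model below is an illustration, not the paper's setup):

```python
def word_importance(words, predict_confidence):
    """Importance of word j = confidence drop when it is removed:
    C_wj = F_y(w1..wj..wm) - F_y(w1..w_{j-1}, w_{j+1}..wm)."""
    base = predict_confidence(" ".join(words))
    return [base - predict_confidence(" ".join(words[:j] + words[j + 1:]))
            for j in range(len(words))]

# Toy stand-in for the target model: confidence that a snippet is
# negative, driven by a small sentiment lexicon (illustration only).
NEGATIVE = {"terrible", "weak", "awful"}

def toy_confidence(text):
    ws = text.lower().split()
    return 0.5 + 0.5 * sum(w in NEGATIVE for w in ws) / max(len(ws), 1)

words = "the script is really weak".split()
scores = word_importance(words, toy_confidence)
print(words[scores.index(max(scores))])  # -> weak
```

Deleting "weak" is the only ablation that lowers the toy model's negative confidence, so it receives the highest importance score; against a real API, each score costs one extra query.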
Step 2: Bug Generation
Character-level perturbations (exploiting the out-of-vocabulary phenomenon):
- Insert: insert a space into the word.
- Delete: delete a random character of the word.
- Swap: swap two random adjacent letters in the word.
- Substitute-C (Sub-C): replace characters with visually similar characters (e.g., "l" → "1", "o" → "0") or with adjacent characters on the keyboard.
Word-level perturbation (nearest-neighbor search in the embedding space):
- Substitute-W (Sub-W): replace a word with one of its top-k nearest neighbors in a context-aware word vector space.

Original   Insert    Delete  Swap     Sub-C    Sub-W
foolish    f oolish  folish  fooilsh  foOlish  silly
awfully    awfull y  awfuly  awfluly  awfu1ly  terribly
cliches    clich es  clichs  clcihes  c1iches  cliche
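The five bug generators and the confidence-driven bug selection can be sketched together as follows (a minimal sketch: the Sub-W lookup is stubbed with a hand-made `synonyms` map rather than a real embedding space, and `toy_confidence` is a hypothetical stand-in for the target model):

```python
import random

def gen_bugs(word, synonyms):
    """Generate candidate bugs: Insert, Delete, Swap, Sub-C, Sub-W."""
    bugs, n = [], len(word)
    if n > 2:
        mid = random.randrange(1, n)
        bugs.append(word[:mid] + " " + word[mid:])            # Insert a space
        i = random.randrange(1, n - 1)
        bugs.append(word[:i] + word[i + 1:])                  # Delete a character
    if n > 3:
        j = random.randrange(1, n - 2)
        bugs.append(word[:j] + word[j + 1] + word[j] + word[j + 2:])  # Swap adjacent
    visual = {"o": "0", "l": "1", "i": "1", "a": "@", "e": "3"}
    for i, c in enumerate(word):                              # Sub-C: similar-looking char
        if c in visual:
            bugs.append(word[:i] + visual[c] + word[i + 1:])
            break
    bugs.extend(synonyms.get(word, []))                       # Sub-W: neighbors (stubbed)
    return bugs

def best_bug(words, idx, predict_confidence, synonyms):
    """candidate(k) = x with word w replaced by bug b_k;
    score(k) = F_y(x) - F_y(candidate(k)); keep the top scorer."""
    base = predict_confidence(" ".join(words))
    scored = [(base - predict_confidence(" ".join(words[:idx] + [b] + words[idx + 1:])), b)
              for b in gen_bugs(words[idx], synonyms)]
    return max(scored)

random.seed(0)  # deterministic bug positions for the example
NEGATIVE = {"terrible", "weak"}

def toy_confidence(text):
    ws = text.lower().split()
    return 0.5 + 0.5 * sum(w in NEGATIVE for w in ws) / max(len(ws), 1)

drop, bug = best_bug("the movie was terrible".split(), 3, toy_confidence,
                     {"terrible": ["bad"]})
print(drop, bug)
```

Every bug pushes "terrible" out of the toy model's vocabulary, so all candidates score the same positive drop here; against a real model the scores differ and the greedy choice matters.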
Step 3: Replacing Important Words with Generated Bugs
Optimal bug selection: choose the optimal bug according to the change of the confidence value:

  candidate(k) = x with word w replaced by bug b_k
  score(k) = F_y(x) - F_y(candidate(k))

Important word replacement: replace the important word with the highest-scoring bug. Repeat until "convergence":
- the semantic similarity drops below the threshold, or
- the new text is misclassified by the classifier.

Attack Evaluation

Case Study
Case studies: sentiment analysis and toxic content detection.

Attack Evaluation: Sentiment Analysis
Datasets:
- IMDB: 50,000 positive and negative movie reviews.
- Rotten Tomatoes Movie Reviews (MR): 5,331 positive and 5,331 negative snippets.
Targeted models:
- White-box models: LR, CNN, LSTM.
- Real-world online platforms: Microsoft Azure, Google Cloud Platform, Amazon AWS, IBM Watson, Facebook fastText, TheySay, Aylien, ParallelDots.
Baseline algorithms:
- White-box: Random, FGSM+NNS (Nearest Neighbor Search), DeepFool+NNS.
- Black-box: DeepWordBug.

Evaluation Metrics
- Edit distance: the minimum number of single-character edits needed to transform one text into the other.
- Jaccard similarity coefficient: J(A, B) = |A ∩ B| / |A ∪ B|.
- Euclidean distance: d(p, q) = sqrt((p_1 - q_1)^2 + (p_2 - q_2)^2 + ... + (p_n - q_n)^2).
- Semantic similarity: cosine similarity, S(p, q) = p · q / (‖p‖ ‖q‖).

Important Words Selected by TextBugger
[Word cloud of the words most often selected as important in movie reviews, e.g. bad, worst, awful, terrible, horrible, stupid, waste, poor, nothing, supposed, budget, director.]
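The four utility metrics above can be sketched as follows (a minimal sketch: the paper computes semantic similarity on sentence embeddings, while here cosine similarity is applied to plain bag-of-words count vectors as a stand-in):

```python
from collections import Counter
import math

def edit_distance(a, b):
    """Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def jaccard(a_tokens, b_tokens):
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b)

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def cosine(p, q):
    dot = sum(pi * qi for pi, qi in zip(p, q))
    return dot / (math.sqrt(sum(x * x for x in p)) *
                  math.sqrt(sum(x * x for x in q)))

def bow_vectors(a_tokens, b_tokens):
    """Bag-of-words count vectors over the joint vocabulary
    (a stand-in for real sentence embeddings)."""
    vocab = sorted(set(a_tokens) | set(b_tokens))
    ca, cb = Counter(a_tokens), Counter(b_tokens)
    return [ca[w] for w in vocab], [cb[w] for w in vocab]

orig = "the script is really weak".split()
adv = "the script is really wea k".split()
p, q = bow_vectors(orig, adv)
print(edit_distance(" ".join(orig), " ".join(adv)),   # 1: one inserted space
      round(jaccard(orig, adv), 2),                   # 0.57
      round(cosine(p, q), 2))                         # 0.73
```

A single Insert bug costs edit distance 1 yet already moves the token-set and vector metrics noticeably, which is why the attack also checks semantic similarity before accepting a perturbation.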
Generated Adversarial Texts: Successful Attack Examples

Task: sentiment analysis. Classifier: CNN. Original label: 99.8% Negative. Adversarial label: 81.0% Positive.
Text: "I love these awful 80s summer camp movies. The best part about Party Camp is the fact that it literally → literaly has no plot. The cliches → clichs here are limitless: the nerds vs. the jocks, the secret camera in the girls' locker room, the hikers happening upon a nudist colony, the contest at the conclusion, the secretly horny camp administrators, and the embarrassingly → embarrassingty foolish sexual innuendo littered throughout. This movie will make you laugh, but never intentionally. I repeat, never."

Task: sentiment analysis. Classifier: Amazon AWS. Original label: 100% Negative. Adversarial label: 89% Positive.
Text: "I watched this movie recently, mainly because I am a huge fan of Jodie Foster's. I saw this movie was made right … I thought the movie was terrible → terrib1e and I'm still left wondering how she was ever persuaded to make this movie. The Script is really weak."

Attack Performance: Effectiveness and Efficiency
White-box Attack
TABLE II. Results of the white-box attacks on the IMDB and MR datasets (SR = attack success rate; PW = ratio of perturbed words):

Model  Dataset  Accuracy  Random SR/PW  FGSM+NNS SR/PW  DeepFool+NNS SR/PW  TextBugger SR/PW
LR     MR       73.7%     2.1% / 10%    32.4% / 4.3%    35.2% / 4.9%        92.7% / 6.1%
LR     IMDB     82.1%     2.7% / 10%    41.1% / 8.7%    30.0% / 5.8%        95.2% / 4.9%
CNN    MR       78.1%     1.5% / 10%    25.7% / 7.5%    28.5% / 5.4%        85.1% / 9.8%
CNN    IMDB     89.4%     1.3% / 10%    36.2% / 10.6%   23.9% / 2.7%        90.5% / 4.2%
LSTM   MR       80.1%     1.8% / 10%    25.0% / 6.6%    24.4% / 11.3%       80.2% / 10.2%
LSTM   IMDB     90.7%     0.8% / 10%    31.5% / 9.0%    26.3% / 3.6%        86.7% / 6.9%

Remarks:
- Choosing important words to modify is necessary: the Random baseline rarely succeeds.
- Effective: TextBugger has a high attack success rate on all models and performs better than the baselines.
- Evasive: TextBugger perturbs only a few words to fool the models.

Black-box Attack
TABLE III. Results of the black-box attack on IMDB, comparing DeepWordBug and TextBugger on original accuracy, success rate, ratio of perturbed words, and time (s) against ten platforms: Google Cloud NLP, IBM Watson, Microsoft Azure, Amazon AWS, Facebook fastText, ParallelDots, TheySay, Aylien Sentiment, TextProcessing, and Mashape Sentiment. [Per-cell values not recoverable.]
Remarks:
- Effective: TextBugger has a higher attack success rate against all online platforms than DeepWordBug.
- Evasive: TextBugger perturbs fewer words than DeepWordBug.
- Efficient: TextBugger spends less time than DeepWordBug.

Attack Performance: Change of Confidence
[Figure: distributions of sentiment scores for original vs. perturbed IMDB texts on Google, Watson, AWS, Azure, and fastText]
Remarks:
- TextBugger greatly changes the confidence value of the classification results.
- IBM Watson is more sensitive to the adversarial texts generated by TextBugger.

Utility Analysis: White-box Attack
[Figure: CDFs of edit distance, Jaccard coefficient, Euclidean distance, and semantic similarity for LR, CNN, and LSTM on IMDB]
Remarks: the generated adversarial texts preserve good word-level and vector-level utility.

Utility Analysis: Black-box Attack
[Figure: CDFs of the same four metrics for TextBugger vs. DeepWordBug on IMDB]
Remarks: TextBugger generates higher-quality adversarial texts than DeepWordBug.

The Impact of Document Length on Attack Performance
[Figure: attack success rate, sentiment score, and time vs. document length (25 to 200 words) for Google Cloud NLP, Microsoft Azure, and IBM Watson]
Remarks:
- Length has little impact on the success rate, but may decrease the change of the negative class's confidence value.
- The time required for generating one adversarial text increases slightly
as the length grows.

The Impact of Document Length on the Utility of Generated Adversarial Texts
[Figure: number of perturbed words and semantic similarity vs. document length (25 to 200 words) for Google Cloud NLP, Microsoft Azure, and IBM Watson]
Remarks:
- A longer document leads to more perturbed words.
- The increasing number of perturbed words does not decrease the semantic similarity of the adversarial texts.

Bug Distribution
[Figure: proportion of Insert, Delete, Swap, Sub-C, and Sub-W bugs used against AWS, fastText, Google, Watson, and Azure]
Remarks:
- Azure and AWS are sensitive to the Insert bug.
- Watson and fastText are sensitive to Sub-C.
- Delete and Sub-W are used less than the others.

Further Analysis: Transferability and User Study

Transferability
TABLE VII. Transferability on the IMDB and MR datasets: adversarial texts generated against each white-box model (LR, CNN, LSTM) are tested against the other white-box models and the black-box APIs (IBM, Azure, Google, fastText, AWS). Attacking a model with its own adversarial texts succeeds on roughly 80% to 95% of inputs, while transferred attacks typically succeed on about 15% to 30%. [Full per-cell values not recoverable.]
Remarks:
- Transferability also exists in adversarial texts, across models and online platforms.
- Transferability can be used to attack online platforms even when they have call limits.

Vulnerability Report
[Screenshots of vulnerability reports to the affected platforms, e.g., AWS and IBM Cloud]
Summary
We proposed TextBugger, a framework for generating adversarial texts effectively and efficiently:
- Effective: it outperforms state-of-the-art attacks in terms of attack success rate under both white-box and black-box settings.
- Evasive: it preserves the utility of benign text.
- Efficient: it generates adversarial text with computational complexity sub-linear in the text length.
We evaluated TextBugger on 15 real-world online applications:
- Datasets: IMDB, MR, and Kaggle.
- Applications: sentiment analysis and toxic content detection.
- Utility-preserving: TextBugger has little impact on human understanding.
We further discussed two potential defense strategies against such attacks.

Q&A