Merge pull request #508 from QData/example_bug_fix

qiyanjun · web-flow · commit f64df48bebb4 · 2021-08-02T22:29:30.000-04:00
Bug fixes for the code in the examples folder
diff --git a/.gitignore b/.gitignore
@@ -45,4 +45,5 @@ checkpoints/
 # vim
 *.swp
 
-.vscode
+.vscode
+*.csv
diff --git a/README.md b/README.md
@@ -71,7 +71,11 @@ or a specific command using, for example,
 textattack attack --help
 ```
 
-The [`examples/`](examples/) folder includes scripts showing common TextAttack usage for training models, running attacks, and augmenting a CSV file. The [documentation website](https://textattack.readthedocs.io/en/latest) contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint..
+The [`examples/`](examples/) folder includes scripts showing common TextAttack usage for training models, running attacks, and augmenting a CSV file. 
+
+
+The [documentation website](https://textattack.readthedocs.io/en/latest) contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint.
+
 
 ### Running Attacks: `textattack attack --help`
 
@@ -88,7 +92,7 @@ textattack attack --recipe textfooler --model bert-base-uncased-mr --num-example
 
 *DeepWordBug on DistilBERT trained on the Quora Question Pairs paraphrase identification dataset*: 
 ```bash
-textattack attack --model distilbert-base-uncased-qqp --recipe deepwordbug --num-examples 100
+textattack attack --model distilbert-base-uncased-cola --recipe deepwordbug --num-examples 100
 ```
 
 *Beam search with beam width 4 and word embedding transformation and untargeted goal function on an LSTM*:
@@ -323,7 +327,9 @@ For example, given the following as `examples.csv`:
 "it's a mystery how the movie could be released in this condition .", 0
 ```
 
-The command `textattack augment --csv examples.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original`
+The command 
+```textattack augment --input-csv examples.csv --output-csv output.csv  --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original
+```
 will augment the `text` column by altering 10% of each example's words, generating twice as many augmentations as original inputs, and exclude the original inputs from the
 output CSV. (All of this will be saved to `augment.csv` by default.)
 
@@ -453,7 +459,7 @@ create a short file that loads them as variables `model` and `tokenizer`.  The `
 be able to transform string inputs to lists or tensors of IDs using a method called `encode()`. The
 model must take inputs via the `__call__` method.
 
-##### Model from a file
+##### Custom Model from a file
 To experiment with a model you've trained, you could create the following file
 and name it `my_model.py`:
 
@@ -488,14 +494,12 @@ which maintains both a list of tokens and the original text, with punctuation. W
 
 
 
-#### Dataset via Data Frames (*coming soon*)
+#### Dataset loading via other mechanisms: see [here](https://textattack.readthedocs.io/en/latest/api/datasets.html)
 
 
 
 ### Attacks and how to design a new attack 
 
-The `attack_one` method in an `Attack` takes as input an `AttackedText`, and outputs either a `SuccessfulAttackResult` if it succeeds or a `FailedAttackResult` if it fails. 
-
 
 We formulate an attack as consisting of four components: a **goal function** which determines if the attack has succeeded, **constraints** defining which perturbations are valid, a **transformation** that generates potential modifications given an input, and a **search method** which traverses through the search space of possible perturbations. The attack attempts to perturb an input text such that the model output fulfills the goal function (i.e., indicating whether the attack is successful) and the perturbation adheres to the set of constraints (e.g., grammar constraint, semantic similarity constraint). A search method is used to find a sequence of transformations that produce a successful adversarial example.
 
diff --git a/README_ZH.md b/README_ZH.md
@@ -82,7 +82,7 @@ textattack attack --recipe textfooler --model bert-base-uncased-mr --num-example
 *对 Quora 问句对数据集上训练的 DistilBERT 模型进行 DeepWordBug 攻击*: 
 
 ```bash
-textattack attack --model distilbert-base-uncased-qqp --recipe deepwordbug --num-examples 100
+textattack attack --model distilbert-base-uncased-cola --recipe deepwordbug --num-examples 100
 ```
 
 *对 MR 数据集上训练的 LSTM 模型：设置束搜索宽度为 4，使用词嵌入转换进行无目标攻击*:
@@ -315,7 +315,7 @@ TextAttack 的组件中，有很多易用的数据增强工具。`textattack.Aug
 "it's a mystery how the movie could be released in this condition .", 0
 ```
 
-使用命令 `textattack augment --csv examples.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original`
+使用命令 `textattack augment --input-csv examples.csv --output-csv output.csv  --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original`
 会增强 `text` 列，约束对样本中 10% 的词进行修改，生成输入数据两倍的样本，同时结果文件中不保存 csv 文件的原始输入。(默认所有结果将会保存在 `augment.csv` 文件中)
 
 数据增强后，下面是 `augment.csv` 文件的内容:
@@ -454,8 +454,6 @@ dataset = [('Today was....', 1), ('This movie is...', 0), ...]
 
 ### 何为攻击 & 如何设计新的攻击 
 
-`Attack` 中的 `attack_one` 方法以 `AttackedText` 对象作为输入，若攻击成功，返回 `SuccessfulAttackResult`，若攻击失败，返回 `FailedAttackResult`。
-
 
 我们将攻击划分并定义为四个组成部分：**目标函数** 定义怎样的攻击是一次成功的攻击，**约束条件** 定义怎样的扰动是可行的，**变换规则** 对输入文本生成一系列可行的扰动结果，**搜索方法** 在搜索空间中遍历所有可行的扰动结果。每一次攻击都尝试对输入的文本添加扰动，使其通过目标函数（即判断攻击是否成功），并且扰动要符合约束（如语法约束，语义相似性约束）。最后用搜索方法在所有可行的变换结果中，挑选出优质的对抗样本。
 
diff --git a/docs/0_get_started/command_line_usage.md b/docs/0_get_started/command_line_usage.md
@@ -40,7 +40,7 @@ For example, given the following as `examples.csv`:
 
 The command: 
 ```
-textattack augment --csv examples.csv --input-column text --recipe eda --pct-words-to-swap .1 \
+textattack augment --input-csv examples.csv --output-csv output.csv  --input-column text --recipe eda --pct-words-to-swap .1 \
 --transformations-per-example 2 --exclude-original
 ``` 
 will augment the `text` column with 10% of words edited per augmentation, twice as many augmentations as original inputs, and exclude the original inputs from the
diff --git a/docs/1start/attacks4Components.md b/docs/1start/attacks4Components.md
@@ -12,8 +12,16 @@ This modular design enables us to easily assemble attacks from the literature wh
 ![two-categorized-attacks](/_static/imgs/intro/01-categorized-attacks.png)
 
 
-
-
+- You can create a new attack (in one line of code!) by composing members of the four components we proposed, for instance:
+
+```bash 
+# Shows how to build an attack from components and use it on a pre-trained model on the Yelp dataset.
+textattack attack --attack-n --model bert-base-uncased-yelp --num-examples 8 \
+    --goal-function untargeted-classification \
+    --transformation word-swap-wordnet \
+    --constraints edit-distance^12 max-words-perturbed^max_percent=0.75 repeat stopword \
+    --search greedy
+```
 
 ### Goal Functions
 
diff --git a/docs/1start/multilingual-visualization.md b/docs/1start/multilingual-visualization.md
@@ -3,14 +3,66 @@ TextAttack Extended Functions (Multilingual)
 
 
 
+## TextAttack Supports Multiple Model Types Besides HuggingFace Models and Our TextAttack Models
+
+- Example attacking TensorFlow models @ [https://textattack.readthedocs.io/en/latest/2notebook/Example_0_tensorflow.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_0_tensorflow.html)
+- Example attacking scikit-learn models @ [https://textattack.readthedocs.io/en/latest/2notebook/Example_1_sklearn.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_1_sklearn.html)
+- Example attacking AllenNLP models @ [https://textattack.readthedocs.io/en/latest/2notebook/Example_2_allennlp.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_2_allennlp.html)
+- Example attacking Keras models @ [https://textattack.readthedocs.io/en/latest/2notebook/Example_3_Keras.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_3_Keras.html)
+
+
 ## Multilingual Supports
 
-- see example code: [https://github.com/QData/TextAttack/blob/master/examples/attack/attack_camembert.py](https://github.com/QData/TextAttack/blob/master/examples/attack/attack_camembert.py) for using our framework to attack French-BERT. 
 
-- see tutorial notebook: [https://textattack.readthedocs.io/en/latest/2notebook/Example_4_CamemBERT.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_4_CamemBERT.html) for using our framework to attack French-BERT. 
+- See the tutorial notebook for using our framework to attack French-BERT: [https://textattack.readthedocs.io/en/latest/2notebook/Example_4_CamemBERT.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_4_CamemBERT.html)
+
+- See the example code for using our framework to attack French-BERT: [https://github.com/QData/TextAttack/blob/master/examples/attack/attack_camembert.py](https://github.com/QData/TextAttack/blob/master/examples/attack/attack_camembert.py)
+
+
+
+## User-defined custom inputs and models
+
+
+### Custom Dataset: Dataset from a file
+
+Loading a dataset from a file is very similar to loading a model from a file. A 'dataset' is any iterable of `(input, output)` pairs.
+The following example would load a sentiment classification dataset from file `my_dataset.py`:
+
+```python
+dataset = [('Today was....', 1), ('This movie is...', 0), ...]
+```
+
+You can then run attacks on samples from this dataset by adding the argument `--dataset-from-file my_dataset.py`.
+
+
+### Custom Model: Model from a file
+To experiment with a model you've trained, you could create the following file
+and name it `my_model.py`:
+
+```python
+model = load_your_model_with_custom_code() # replace this line with your model loading code
+tokenizer = load_your_tokenizer_with_custom_code() # replace this line with your tokenizer loading code
+```
+
+Then, run an attack with the argument `--model-from-file my_model.py`. The model and tokenizer will be loaded automatically.
+
+
+
+## User-defined custom attack components
+
+The [documentation website](https://textattack.readthedocs.io/en/latest) contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint.
+
+- Custom transformation example @ [https://textattack.readthedocs.io/en/latest/2notebook/1_Introduction_and_Transformations.html](https://textattack.readthedocs.io/en/latest/2notebook/1_Introduction_and_Transformations.html)
+
+- Custom constraint example @ [https://textattack.readthedocs.io/en/latest/2notebook/2_Constraints.html#A-custom-constraint](https://textattack.readthedocs.io/en/latest/2notebook/2_Constraints.html#A-custom-constraint)
+
+
+
+
+## Visualizing TextAttack-generated examples
 
 
+- You can visualize the generated adversarial examples alongside the examples they were generated from, following the visualization methods shown here: [https://textattack.readthedocs.io/en/latest/2notebook/2_Constraints.html](https://textattack.readthedocs.io/en/latest/2notebook/2_Constraints.html)
 
-## We have built a new WebDemo For Visulizing TextAttack generated Examples; 
+- If you have a web app, we have also built a new WebDemo, [TextAttack-WebDemo Github](https://github.com/QData/TextAttack-WebDemo), for visualizing adversarial examples generated by TextAttack.
 
-- [TextAttack-WebDemo Github](https://github.com/QData/TextAttack-WebDemo)
diff --git a/docs/3recipes/attack_recipes_cmd.md b/docs/3recipes/attack_recipes_cmd.md
@@ -36,7 +36,7 @@ textattack attack --recipe textfooler --model bert-base-uncased-mr --num-example
 
 *DeepWordBug on DistilBERT trained on the Quora Question Pairs paraphrase identification dataset*: 
 ```bash
-textattack attack --model distilbert-base-uncased-qqp --recipe deepwordbug --num-examples 100
+textattack attack --model distilbert-base-uncased-cola --recipe deepwordbug --num-examples 100
 ```
 
 *Beam search with beam width 4 and word embedding transformation and untargeted goal function on an LSTM*:
diff --git a/docs/3recipes/augmenter_recipes_cmd.md b/docs/3recipes/augmenter_recipes_cmd.md
@@ -38,7 +38,10 @@ and the number of augmentations per input example. It outputs a CSV in the same
 "it's a mystery how the movie could be released in this condition .", 0
 ```
 
-The command `textattack augment --csv examples.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original`
+The command
+```
+textattack augment --input-csv examples.csv --output-csv output.csv  --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original
+```
 will augment the `text` column by altering 10% of each example's words, generating twice as many augmentations as original inputs, and exclude the original inputs from the
 output CSV. (All of this will be saved to `augment.csv` by default.)
 
diff --git a/examples/attack/attack_camembert.py b/examples/attack/attack_camembert.py
@@ -4,6 +4,7 @@
 import numpy as np
 from transformers import AutoTokenizer, TFAutoModelForSequenceClassification, pipeline
 
+from textattack import Attacker
 from textattack.attack_recipes import PWWSRen2019
 from textattack.datasets import HuggingFaceDataset
 from textattack.models.wrappers import ModelWrapper
@@ -20,11 +21,11 @@ class HuggingFaceSentimentAnalysisPipelineWrapper(ModelWrapper):
         [[0.218262017, 0.7817379832267761]
     """
 
-    def __init__(self, pipeline):
-        self.pipeline = pipeline
+    def __init__(self, model):
+        self.model = model
 
     def __call__(self, text_inputs):
-        raw_outputs = self.pipeline(text_inputs)
+        raw_outputs = self.model(text_inputs)
         outputs = []
         for output in raw_outputs:
             score = output["score"]
@@ -55,7 +56,6 @@ def __call__(self, text_inputs):
 recipe.transformation.language = "fra"
 
 dataset = HuggingFaceDataset("allocine", split="test")
-for idx, result in enumerate(recipe.attack_dataset(dataset)):
-    print(("-" * 20), f"Result {idx+1}", ("-" * 20))
-    print(result.__str__(color_method="ansi"))
-    print()
+
+attacker = Attacker(recipe, dataset)
+results = attacker.attack_dataset()
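
Note on the wrapper in this file: the hunk shows only the start of the `__call__` loop, so the following is a minimal standalone sketch of how a wrapper like this might turn pipeline-style outputs into `[negative, positive]` score pairs. The helper name `scores_from_pipeline`, the `"POSITIVE"` label string, and the stub data are assumptions for illustration and are not taken from the diff.

```python
from typing import Dict, List, Sequence, Union

PipelineOutput = Dict[str, Union[str, float]]


def scores_from_pipeline(
    raw_outputs: Sequence[PipelineOutput],
    positive_label: str = "POSITIVE",  # assumed label name, not shown in the diff
) -> List[List[float]]:
    """Convert [{"label": ..., "score": ...}] items into [negative, positive] pairs."""
    outputs = []
    for output in raw_outputs:
        score = float(output["score"])
        if output["label"] == positive_label:
            # The reported score belongs to the positive class.
            outputs.append([1.0 - score, score])
        else:
            # The reported score belongs to the negative class.
            outputs.append([score, 1.0 - score])
    return outputs


if __name__ == "__main__":
    # Stub data standing in for the output of a sentiment-analysis pipeline.
    stub = [
        {"label": "POSITIVE", "score": 0.75},
        {"label": "NEGATIVE", "score": 0.75},
    ]
    print(scores_from_pipeline(stub))
```

Each row sums to 1, which is the shape a classification goal function would typically expect from a two-class model.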
diff --git a/examples/attack/attack_from_components.sh b/examples/attack/attack_from_components.sh
@@ -3,5 +3,5 @@
 # model on the Yelp dataset.
 textattack attack --attack-n --goal-function untargeted-classification \
     --model bert-base-uncased-yelp --num-examples 8 --transformation word-swap-wordnet \
-    --constraints edit-distance^12 max-words-perturbed:max_percent=0.75 repeat stopword \
+    --constraints edit-distance^12 max-words-perturbed^max_percent=0.75 repeat stopword \
     --search greedy
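
The `:` to `^` change above suggests that `^` separates a component name from its arguments on the command line. As a rough illustration only (the helper `split_component_spec` is hypothetical and is not TextAttack's own parser), a spec of that shape could be split like this:

```python
from typing import Tuple

ARG_SEPARATOR = "^"  # assumed separator, inferred from the corrected command above


def split_component_spec(spec: str) -> Tuple[str, str]:
    """Split 'name^args' into (name, args); args is '' when no separator is present."""
    name, _, raw_args = spec.partition(ARG_SEPARATOR)
    return name, raw_args


if __name__ == "__main__":
    for spec in ["edit-distance^12", "max-words-perturbed^max_percent=0.75", "repeat"]:
        print(split_component_spec(spec))
```

Under this sketch, the old spelling `max-words-perturbed:max_percent=0.75` would come back as a single unrecognized name with no arguments, which is consistent with the command needing the fix.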
diff --git a/examples/augmentation/augment.csv b/examples/augmentation/augment.csv
@@ -1,11 +1,11 @@
 text,label
-"the rock is destined to be the 21st century's novel conan and that he's go to make a splash yet greater than arnold schwarzenegger , jean- claud van damme or steven segal.",1
-"the rock is destined to be the 21st century's novo conan and that he's going to make a splash yet greater than arnold schwarzenegger , jean- claud van damme or stephens segal.",1
-the gorgeously elaborate continuation of 'the lord of the rings' triad is so massive that a column of words cannot adequately describe co-writer/director pete jackson's expanded vision of j . r . r . tolkien's middle-earth .,1
-the gorgeously elaborate continuation of 'the lordy of the rings' trilogy is so huge that a column of words cannot adequately describe co-writer/superintendent peter jackson's enlargements vision of j . r . r . tolkien's middle-earth .,1
-take care of my cat offers a cheerfully different slice of asian cinema .,1
-take care of my cat offers a refreshingly different slice of asian cinemas .,1
-a technically well-made suspenser . . . but its abrupt fall in iq points as it races to the finish line demonstrating simply too discouraging to let slide .,0 
-a technologically well-made suspenser . . . but its abrupt dip in iq points as it races to the finish line proves simply too discouraging to let slide .,0 
-it's a mystery how the cinematography could be released in this condition .,0
-it's a mystery how the movies could be released in this condition .,0
+"the rock is destined to be the new conan and that he's going to make a splash even greater than arnold , jean- claud van damme or steven segal.",1
+"the rock is destined to be the 21st century's new conan and that he's going to caravan make a splash even greater than arnold schwarzenegger , jean- claud van damme or steven segal.",1
+the gorgeously rarify continuation of 'the lord of the rings' trilogy is so huge that a column of give-and-take cannot adequately describe co-writer/director shaft jackson's expanded vision of j . r . r . tolkien's middle-earth .,1
+the gorgeously elaborate of 'the of the rings' trilogy is so huge that a column of words cannot adequately describe co-writer/director peter jackson's expanded of j . r . r . tolkien's middle-earth .,1
+take care different my cat offers a refreshingly of slice of asian cinema .,1
+take care of my cat offers a different slice of asian cinema .,1
+a technically well-made suspenser . . . but its abrupt drop in iq points as it races to the finish IT line proves simply too discouraging to let slide .,0 
+a technically well-made suspenser . . . but its abrupt drop in iq points as it races to the finish demarcation proves plainly too discouraging to let slide .,0 
+it's pic a mystery how the movie could be released in this condition .,0
+it's a mystery how the movie could in released be this condition .,0
diff --git a/examples/augmentation/augment.sh b/examples/augmentation/augment.sh
@@ -1,2 +1,2 @@
 #!/bin/bash
-textattack augment --csv examples.csv --input-column text --recipe eda --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original --overwrite
+textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe eda --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original --overwrite
diff --git a/examples/augmentation/example.csv b/examples/augmentation/example.csv
diff --git a/examples/train/train_lstm_imdb_sentiment_classification.sh b/examples/train/train_lstm_imdb_sentiment_classification.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+# Trains TextAttack's LSTM model on the IMDB dataset for 50 epochs. This is a basic
+# demonstration of our training script and `datasets` integration.
+textattack train --model-name-or-path lstm --dataset imdb  --epochs 50 --learning-rate 1e-5
diff --git a/examples/train/train_lstm_rotten_tomatoes_sentiment_classification.sh b/examples/train/train_lstm_rotten_tomatoes_sentiment_classification.sh
@@ -1,4 +1,4 @@
 #!/bin/bash
 # Trains `bert-base-cased` on the STS-B task for 3 epochs. This is a basic
 # demonstration of our training script and `datasets` integration.
-textattack train --model-name-or-path lstm --dataset rotten_romatoes  --epochs 50 --learning-rate 1e-5
+textattack train --model-name-or-path lstm  --dataset rotten_tomatoes  --epochs 50 --learning-rate 1e-5
diff --git a/textattack/training_args.py b/textattack/training_args.py
@@ -360,6 +360,7 @@ def _add_parser_args(cls, parser):
         # Arguments that are needed if we want to create a model to train.
         parser.add_argument(
             "--model-name-or-path",
+            "--model",
             type=str,
             required=True,
             help='Name or path of the model we want to create. "lstm" and "cnn" will create TextAttack\'s LSTM and CNN models while'
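
Note on the `--model` alias added in this hunk: `argparse` accepts several option strings in one `add_argument` call and derives the attribute name from the first long option. The standalone sketch below uses a throwaway parser (the `prog` name and help text are placeholders, not TextAttack code) to show that both spellings populate the same attribute.

```python
import argparse

# Throwaway parser for illustration; it is not TextAttack's training parser.
parser = argparse.ArgumentParser(prog="train-sketch")
parser.add_argument(
    "--model-name-or-path",
    "--model",  # alias: second option string for the same argument
    type=str,
    required=True,
    help="Name or path of the model to create.",
)

# Both spellings are parsed into the attribute derived from the first long option.
args_long = parser.parse_args(["--model-name-or-path", "lstm"])
args_alias = parser.parse_args(["--model", "lstm"])

if __name__ == "__main__":
    print(args_long.model_name_or_path, args_alias.model_name_or_path)
```

Because the destination stays `model_name_or_path`, code that reads the parsed arguments should not need to change when users switch to the shorter flag.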
