Code for NLP4IF 2021 workshop paper.
Step 1: Download Data.
- Link to generated headlines data (attack).
- Link to generated headlines data (defense).
- For the real headlines, use the Million Headlines dataset, or contact me for the final dataset (due to copyright concerns, I cannot share that data publicly).
Step 2: Pretrain two LMs using the HuggingFace library: one on the "attack" data and another on the "defense" data. A sample bash script is provided in gpt2_pretraining.sh.
Step 3: Generate headlines for attack and defense (if you haven't downloaded the ones from Step 1).
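As a rough illustration of Step 3, the sketch below samples headlines from a pretrained GPT-2 checkpoint with the HuggingFace `pipeline` API. It is not the script used for the paper: the function names, the `model_dir` argument, and the sampling settings (`top_k`, `max_new_tokens`, `seed`) are all assumptions chosen for the example.

```python
def clean_headlines(raw_texts):
    """Keep the first non-empty line of each sample and drop exact duplicates."""
    seen, cleaned = set(), []
    for text in raw_texts:
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        if not lines:
            continue  # the sample contained only whitespace
        headline = lines[0]
        if headline not in seen:
            seen.add(headline)
            cleaned.append(headline)
    return cleaned


def generate_headlines(model_dir, n=5, max_new_tokens=20, seed=0):
    """Sample up to n headlines from the LM stored in model_dir (hypothetical path)."""
    # Imported lazily so clean_headlines works without transformers installed.
    from transformers import pipeline, set_seed

    set_seed(seed)
    generator = pipeline("text-generation", model=model_dir)
    # Prompt with the beginning-of-sequence token ("<|endoftext|>" for GPT-2).
    prompt = generator.tokenizer.bos_token
    outputs = generator(
        prompt,
        do_sample=True,           # assumed sampling settings, not the paper's
        top_k=50,
        max_new_tokens=max_new_tokens,
        num_return_sequences=n,
        return_full_text=False,   # return only the generated continuation
    )
    return clean_headlines(o["generated_text"] for o in outputs)
```

Calling `generate_headlines` once with the "attack" checkpoint directory and once with the "defense" one would yield the two generated sets; fewer than `n` headlines may be returned because duplicates and empty samples are dropped.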
Step 4: Merge the real/generated headlines for attack and defense.
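The merge in Step 4 can be sketched as follows. This is a minimal example under stated assumptions, not the repository's own code: the label convention (0 = real, 1 = generated), the `headline`/`label` column names, and the function names are hypothetical, so adapt them to whatever format your classifier scripts expect.

```python
import csv
import random


def merge_headlines(real, generated, seed=0):
    """Label real headlines 0 and generated ones 1 (assumed convention), then shuffle."""
    rows = [(h, 0) for h in real] + [(h, 1) for h in generated]
    random.Random(seed).shuffle(rows)  # fixed seed keeps the order reproducible
    return rows


def write_csv(rows, path):
    """Write (headline, label) pairs to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["headline", "label"])  # assumed column names
        writer.writerows(rows)
```

Run it once for the attack data and once for the defense data so that each classifier dataset mixes real headlines with the corresponding generated ones.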
Step 5: Fine-tune your classifiers using the example scripts provided here, i.e. *_classify.py.