GitHub - antmarakis/generated_headline_detection: Code for "Identifying Automatically Generated Headlines using Transformers"

Identifying Automatically Generated Headlines using Transformers

Code for the NLP4IF 2021 workshop paper.

How to replicate results?

Step 1: Download Data.

Link to generated headlines data (attack).
Link to generated headlines data (defense).
For the real headlines, go to Million Headlines, or contact me for the final dataset (due to copyright concerns, I cannot share the data publicly).

Step 2: Pretrain two LMs using the HuggingFace library: one on the "attack" data and another on the "defense" data. A sample bash script is provided in gpt2_pretraining.sh.

Step 3: Generate headlines for attack and defense (if you haven't downloaded the ones from Step 1).

Step 4: Merge the real/generated headlines for attack and defense.

Step 5: Fine-tune your classifiers using the example scripts provided here (i.e., *_classify.py).
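
The merge in Step 4 can be sketched as follows. This is a minimal illustration, not the code used for the paper: the label convention (0 for real, 1 for generated), the tab-separated output with a `headline`/`label` header, the function names `merge_headlines` and `write_tsv`, and the example headlines are all assumptions made for this sketch. Adjust them to whatever input format your classifier script expects.

```python
import csv
import random


def merge_headlines(real, generated, seed=0):
    """Label real headlines 0 and generated ones 1, then shuffle them together.

    The 0/1 label convention is an assumption of this sketch.
    """
    rows = [(h.strip(), 0) for h in real if h.strip()]
    rows += [(h.strip(), 1) for h in generated if h.strip()]
    # A seeded shuffle keeps the merged file reproducible across runs.
    random.Random(seed).shuffle(rows)
    return rows


def write_tsv(rows, path):
    """Write (headline, label) pairs to a tab-separated file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["headline", "label"])
        writer.writerows(rows)


# Made-up placeholder headlines, for illustration only.
real = ["Council approves new budget", "Rain expected across the coast"]
generated = ["Local man wins award for garden", "Police seek witnesses after crash"]

merged = merge_headlines(real, generated)
```

Run the merge once for the attack data and once for the defense data, writing each result to its own file with `write_tsv(merged, path)`.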

Repository files:

README.md
bert_classifiy.py
generate_headlines.py
gpt2_pretraining.sh
lr_classify.py
lstm_classify.py
survey_generated.txt
