CLiMB initially includes four vision-and-language tasks:
- Visual Question Answering (VQAv2)
- Natural Language for Visual Reasoning (NLVR2)
- SNLI-Visual Entailment (SNLI-VE)
- Visual Commonsense Reasoning (VCR)
Data files for these four tasks can be downloaded from their respective websites. The data files are organized as follows:
```
data
├── flickr30k
│   └── flickr30k_images/
├── ms-coco
│   └── images/
├── nlvr2
│   ├── data
│   │   ├── balanced
│   │   ├── dev.json
│   │   ├── filter_data.py
│   │   ├── test1.json
│   │   ├── train.json
│   │   └── unbalanced
│   └── images
│       ├── dev/
│       ├── test1/
│       └── train/
├── snli-ve
│   ├── snli_ve_dev.jsonl
│   ├── snli_ve_test.jsonl
│   └── snli_ve_train.jsonl
├── vcr
│   ├── annotation
│   │   ├── test.jsonl
│   │   ├── train.jsonl
│   │   └── val.jsonl
│   ├── drawn_images/
│   └── vcr1images/
└── vqav2
    ├── ans2label.pkl
    ├── v2_mscoco_train2014_annotations.json
    ├── v2_mscoco_val2014_annotations.json
    ├── v2_OpenEnded_mscoco_train2014_questions.json
    └── v2_OpenEnded_mscoco_val2014_questions.json
```
Items ending with `/` are directories, typically containing a large number of images.
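After downloading, a short script can confirm that the expected files and folders are in place. The snippet below is a minimal sketch, assuming the layout above with a data root named `data/`; the path list is representative, not exhaustive.

```python
from pathlib import Path

# Root of the data directory described above (adjust as needed).
DATA_DIR = Path("data")

# A few representative entries from the layout above; extend as needed.
EXPECTED = [
    "flickr30k/flickr30k_images",
    "ms-coco/images",
    "nlvr2/data/train.json",
    "nlvr2/images/train",
    "snli-ve/snli_ve_train.jsonl",
    "vcr/annotation/train.jsonl",
    "vqav2/ans2label.pkl",
]

missing = [p for p in EXPECTED if not (DATA_DIR / p).exists()]
if missing:
    print("Missing entries:")
    for p in missing:
        print(f"  {DATA_DIR / p}")
else:
    print("All expected data files and directories found.")
```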
For NLVR2:
- The link for downloading images can be requested using this form.
- Download the three zip files (`train_img.zip`, `dev_img.zip`, `test_img.zip`) using `wget` into `nlvr2/images/`.
- Run `bash src/utils/preproc_nlvr2_images.sh`, with the `IMAGES_DIR` variable set to the full path for `nlvr2/images/`.
- The files in `nlvr2/data/` can be downloaded from the NLVR2 GitHub repo (a quick sanity check of these files is sketched below).
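As a rough sanity check after these steps, the NLVR2 annotation files can be parsed one example per line. This is a minimal sketch, assuming (as in the NLVR2 release) that each `.json` file is line-delimited JSON and that the `data/` root from the tree above is used:

```python
import json
from pathlib import Path

# Assumed location of the NLVR2 annotation files (see the tree above).
NLVR2_DATA = Path("data/nlvr2/data")

for split in ["train", "dev", "test1"]:
    with open(NLVR2_DATA / f"{split}.json") as f:
        # Each line should be one JSON-encoded example.
        examples = [json.loads(line) for line in f if line.strip()]
    print(f"{split}: {len(examples)} examples, fields: {sorted(examples[0])}")
```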
The `drawn_images` folder for the VCR task can be generated from the original `vcr1images`, using the scripts available here.
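The VCR annotations are JSON Lines files (one example per line); a minimal sketch for counting examples per split, again assuming the `data/` root from the tree above:

```python
import json
from pathlib import Path

# Assumed location of the VCR annotations (see the tree above).
VCR_ANNOT = Path("data/vcr/annotation")

for split in ["train", "val", "test"]:
    with open(VCR_ANNOT / f"{split}.jsonl") as f:
        n_examples = sum(1 for line in f if line.strip())
    print(f"{split}: {n_examples} examples")
```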
CLiMB initially includes five language-only tasks:
We provide the script `utils/download_lang_mc.sh` for downloading the multiple-choice tasks from the official websites linked above.
Note: we hold out our own dev set from the training set for hyperparameter tuning and use the original dev set as the test set, since labels for the original test set are not available.
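Such a held-out split can be produced deterministically from the training set. The sketch below is illustrative only; the fraction and seed are assumptions, not the values used in CLiMB:

```python
import random

def split_train_dev(examples, dev_fraction=0.1, seed=42):
    """Hold out a dev set from the training examples.

    The fraction and seed here are illustrative; CLiMB's actual
    split parameters may differ.
    """
    rng = random.Random(seed)
    indices = list(range(len(examples)))
    rng.shuffle(indices)
    n_dev = int(len(examples) * dev_fraction)
    dev = [examples[i] for i in indices[:n_dev]]
    train = [examples[i] for i in indices[n_dev:]]
    return train, dev
```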
CLiMB initially includes four vision-only tasks:
Data files for these four tasks can be downloaded from their respective websites. The data files are organized as follows:
```
YOUR_DATA_DIR
├── ILSVRC2012/
│   ├── train/
│   │   ├── n01440764/
│   │   ├── n01443537/
│   │   └── ...
│   ├── val/
│   │   └── ILSVRC2012_val_*.JPEG
│   └── LOC_val_solution.csv
├── iNat2019/
│   ├── train_val2019/
│   │   ├── Amphibians/
│   │   ├── Birds/
│   │   └── ...
│   ├── train2019.json
│   └── val2019.json
├── Places365/
│   ├── train/
│   │   ├── airfield/
│   │   ├── airplane_cabin/
│   │   └── ...
│   └── val/
│       ├── airfield/
│       ├── airplane_cabin/
│       └── ...
└── ms-coco/
    ├── images/
    └── detections/
        └── annotations/
            ├── instances_train2017.json
            └── instances_val2017.json
```
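The detection files follow the standard COCO annotation format (top-level `images`, `annotations`, and `categories` keys). A minimal sketch for inspecting them, assuming the layout above:

```python
import json
from pathlib import Path

# Assumed data root matching the tree above.
DATA_DIR = Path("YOUR_DATA_DIR")
ann_file = DATA_DIR / "ms-coco/detections/annotations/instances_val2017.json"

with open(ann_file) as f:
    coco = json.load(f)

print(f"images:      {len(coco['images'])}")
print(f"annotations: {len(coco['annotations'])}")
print(f"categories:  {len(coco['categories'])}")
```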
Note: we hold out our own dev set from the training set for hyperparameter tuning and use the original dev set as the test set, since labels for the original test set are not available.