
Prepare Dataset

Training data

Please prepare the CC12M dataset. All the images should be stored in a single folder, and a metafile (a csv or tsv file) that maps each image_id to its corresponding caption is required, for example:

image_id, caption
00001.jpg, a boy is running on the beach
00002.jpg, The bride was wearing a chic lace dress.
... 
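Before training, it can help to sanity-check the metafile against the image folder. The snippet below is a minimal sketch, assuming pandas is installed, the columns are named image_id and caption, and the placeholder paths are replaced with your own:

    import os
    import pandas as pd

    root_dir = '/path/to/your/cc12m_images/'   # placeholder paths
    meta_file = '/path/to/your/cc12m.csv'

    # skipinitialspace strips the blank after each comma in the example above
    df = pd.read_csv(meta_file, skipinitialspace=True)
    assert {'image_id', 'caption'} <= set(df.columns)

    # list entries whose image file is missing on disk
    missing = [f for f in df['image_id'] if not os.path.exists(os.path.join(root_dir, f))]
    print(f'{len(df)} captions, {len(missing)} missing images')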
  • IMPORTANT UPDATE: We provide a script for filtering a CC12M subset and constructing cross-image pairs from scratch:
  1. Use multi-processing (e.g. 32 processes) to filter the CC12M dataset with the top-K most frequently appearing entities. Feel free to modify the entity list in data_process_cc12m.py.

    cd datasets
    python data_process_cc12m.py --mode filter --srcdir /path/to/your/cc12m.csv --processor 32
    

    This will generate 32 sub-files in the subset/ directory.
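    The real filtering logic lives in data_process_cc12m.py; the snippet below is only a rough sketch of the idea (keep captions that mention one of the chosen entities, split across worker processes). The ENTITIES set and output names are hypothetical:

    import os
    from multiprocessing import Pool

    import pandas as pd

    # Hypothetical entity list -- the real one is defined in data_process_cc12m.py.
    ENTITIES = {'man', 'woman', 'dog', 'cat', 'car', 'tree', 'beach', 'bride'}
    NUM_PROC = 32

    def filter_chunk(chunk):
        # Keep a row only if its caption mentions at least one entity.
        keep = chunk['caption'].astype(str).str.lower().str.split().apply(
            lambda words: bool(set(words) & ENTITIES))
        return chunk[keep]

    if __name__ == '__main__':
        df = pd.read_csv('/path/to/your/cc12m.csv', skipinitialspace=True)
        os.makedirs('subset', exist_ok=True)
        with Pool(NUM_PROC) as pool:
            parts = pool.map(filter_chunk, [df.iloc[i::NUM_PROC] for i in range(NUM_PROC)])
        for i, part in enumerate(parts):
            part.to_csv(f'subset/cc12m_subset_{i}.csv', index=False)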

  2. Next, merge these sub-files into a single metafile (and optionally delete the sub-files by passing --remove_subfiles=True).

    python data_process_cc12m.py --mode merge --dstdir /path/to/your/cc12m/subsets/ --remove_subfiles True
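    Conceptually, the merge step just concatenates the per-process csv files into one metafile. A minimal pandas sketch with placeholder paths (not the script itself):

    import glob
    import pandas as pd

    subfiles = sorted(glob.glob('/path/to/your/cc12m/subsets/*.csv'))
    merged = pd.concat((pd.read_csv(f) for f in subfiles), ignore_index=True)
    merged.to_csv('/path/to/your/cc12m_filtered_subset.csv', index=False)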
    
  3. Construct cross-image pairs based on the filtered data.

    python data_process_cc12m.py --mode makepair --metafile /path/to/your/cc12m_filtered_subset.csv
    

    The generated metafile is automatically saved to /path/to/your/cc12m_filtered_subset_pair.csv. This metafile can be used for training the model.
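    The exact pairing criterion is implemented in data_process_cc12m.py; the sketch below only illustrates one plausible scheme (pair images whose captions mention the same entity). The ENTITIES set and the pair_id/entity column names are hypothetical:

    import random
    from collections import defaultdict

    import pandas as pd

    df = pd.read_csv('/path/to/your/cc12m_filtered_subset.csv', skipinitialspace=True)

    # Hypothetical grouping: bucket images by the first matched entity in their caption.
    ENTITIES = {'man', 'woman', 'dog', 'cat', 'car', 'tree', 'beach', 'bride'}
    buckets = defaultdict(list)
    for _, row in df.iterrows():
        for word in str(row['caption']).lower().split():
            if word in ENTITIES:
                buckets[word].append(row['image_id'])
                break

    # Pair every image with another random image from the same bucket.
    pairs = []
    for entity, ids in buckets.items():
        for img in ids:
            others = [i for i in ids if i != img]
            if others:
                pairs.append((img, random.choice(others), entity))

    pd.DataFrame(pairs, columns=['image_id', 'pair_id', 'entity']).to_csv(
        '/path/to/your/cc12m_filtered_subset_pair.csv', index=False)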

  4. Modify the root path and metafile path in configs/ovsegmentor/ovsegmentor_pretrain_vit_bert_stage1.yml

    data:
      train:
        root_dir: '/path/to/your/cc12m_images/'
        meta_file: '/path/to/your/cc12m_filtered_subset_pair.csv'
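    A quick way to catch path typos before launching training is to load the yaml and check both entries. A minimal sketch, assuming PyYAML and the key layout shown above:

    import os
    import yaml

    cfg_file = 'configs/ovsegmentor/ovsegmentor_pretrain_vit_bert_stage1.yml'
    with open(cfg_file) as f:
        cfg = yaml.safe_load(f)

    train = cfg['data']['train']
    for key in ('root_dir', 'meta_file'):
        print(key, train[key], 'OK' if os.path.exists(train[key]) else 'MISSING')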
    

One may also try different image-caption datasets (e.g. YFCC, RedCaps) by providing the images and the corresponding meta-file.

Evaluation

  1. Follow the official websites to prepare PASCAL VOC, PASCAL Context, COCO, and ADE20K.
  2. For the COCO dataset, convert it to the semantic segmentation format following GroupViT:

    python convert_dataset/convert_coco_object.py /path/to/your/coco/ -o /path/to/output/coco/
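    convert_coco_object.py comes from GroupViT; the sketch below is not that script, it only illustrates the underlying idea of rasterizing COCO instance annotations into per-image semantic masks with pycocotools (paths and the output layout are placeholders):

    import os

    import numpy as np
    from PIL import Image
    from pycocotools.coco import COCO

    ann_file = '/path/to/your/coco/annotations/instances_val2017.json'
    out_dir = '/path/to/output/coco/masks_val2017'
    os.makedirs(out_dir, exist_ok=True)

    coco = COCO(ann_file)
    for img_id in coco.getImgIds():
        info = coco.loadImgs(img_id)[0]
        mask = np.zeros((info['height'], info['width']), dtype=np.uint8)
        for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id, iscrowd=None)):
            # Paint each instance with its COCO category id (0 stays background).
            mask[coco.annToMask(ann) > 0] = ann['category_id']
        Image.fromarray(mask).save(
            os.path.join(out_dir, info['file_name'].replace('.jpg', '.png')))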
    
  3. Change the image dirs (data_root) in segmentation/configs/_base_/datasets/*.py, as listed below; a small check sketch follows the list.

    PASCAL VOC:
        data_root = '/path/to/your/VOCdevkit/VOC2012'

    PASCAL Context:
        data_root = '/path/to/your/pascal_context/VOCdevkit/VOC2010/'

    COCO Object:
        data_root = '/path/to/your/coco/'

    COCO Stuff:
        data_root = '/path/to/your/coco/'

    ADE20K:
        data_root = '/path/to/your/ADEChallengeData2016/'
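    To verify the edits, a small sketch that scans those configs for data_root entries and checks that the paths exist (it assumes the mmsegmentation-style configs keep data_root as a quoted string literal):

    import glob
    import os
    import re

    for cfg in glob.glob('segmentation/configs/_base_/datasets/*.py'):
        for root in re.findall(r"data_root\s*=\s*'([^']+)'", open(cfg).read()):
            print(cfg, root, 'OK' if os.path.isdir(root) else 'MISSING')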
    
  4. To enable zero-shot classification evaluation, please prepare the validation set of ImageNet. The metafile of the validation set is already provided. Modify the image path in configs/ovsegmentor/ovsegmentor_pretrain_vit_bert_stage1.yml:

    data:
      val:
        root_dir: '/path/to/your/imagenet_val_images/'