train(default): adjust training configuration and model settings

- Change the data batch size and shuffle buffer size
- Update the training and validation dataset configuration
- Adjust the AMP optimization level
- Freeze part of the model's layers for fine-tuning
- Update the distributed training script arguments
Yijun Fu, 1 month ago
commit b244f314bf
4 changed files with 34 additions and 8 deletions

  1. configs/default.yml (+12 -6)
  2. main_group_vit.py (+17 -0)
  3. models/builder.py (+2 -0)
  4. tools/dist_launch.sh (+3 -2)

configs/default.yml (+12 -6)

@@ -1,10 +1,10 @@
 data:
-  batch_size: 32
+  batch_size: 256
   pin_memory: true
   num_workers: 6
   # Thomas said it should be at least about 5-10x your batch size; beyond that,
   # the differences become academic.
-  shuffle_buffer: 1250
+  shuffle_buffer: 10000
   seed: ${train.seed}
   dataset:
     meta:
@@ -33,18 +33,24 @@ data:
         path: local_data/imagenet_shards
         prefix: imagenet-val-{000000..000049}.tar
         length: 50000
-      cuhkpedes:
+      cuhkpedes_train:
         type: img_txt_pair
         path: local_data/cuhkpedes_shards
         prefix: cuhkpedes-train-{000000..000255}.tar
         length: 34054
+      cuhkpedes_val:
+        type: img_txt_pair
+        path: local_data/cuhkpedes_shards
+        prefix: cuhkpedes-val-{000000..000023}.tar
+        length: 3078
     train:
       # - gcc3m
       - gcc12m
       - yfcc14m
-      - cuhkpedes
+      - cuhkpedes_train
     val:
-      - imagenet
+      # - imagenet
+      - cuhkpedes_val
 
   img_aug:
     deit_aug: true
@@ -71,7 +77,7 @@ train:
   min_lr: 4e-5
   clip_grad: 5.0
   accumulation_steps: 0
-  amp_opt_level: O1
+  amp_opt_level: O2
   seed: 0
 
   lr_scheduler:

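With batch_size raised from 32 to 256, the 5-10x guidance quoted in the file comment works out to a buffer of roughly 1,280-2,560 samples, so the new shuffle_buffer of 10000 leaves comfortable headroom. The amp_opt_level change from O1 to O2 moves NVIDIA apex from patched mixed-precision ops to a nearly-pure-FP16 model with FP32 master weights. As a minimal sketch of how one of these shard sets is typically consumed, assuming the tars follow the webdataset convention implied by the brace-expanded prefixes (the repo's own data pipeline may differ, and the "jpg"/"txt" keys are assumptions about how the shards were written):

    # Hedged sketch: stream the cuhkpedes img_txt_pair shards with webdataset.
    import webdataset as wds

    shards = "local_data/cuhkpedes_shards/cuhkpedes-train-{000000..000255}.tar"
    dataset = (
        wds.WebDataset(shards)
        .shuffle(10000)          # shuffle_buffer from the config above
        .decode("pil")           # decode image bytes into PIL images
        .to_tuple("jpg", "txt")  # yield (image, caption) pairs
    )
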
main_group_vit.py (+17 -0)

@@ -107,6 +107,23 @@ def train(cfg):
 
     logger.info(f'Creating model:{cfg.model.type}/{cfg.model_name}')
     model = build_model(cfg.model)
+
+    # load_checkpoint(cfg, model, None, None)
+
+    # Freeze all layers first
+    for param in model.parameters():
+        param.requires_grad = False
+
+    # To fine-tune only specific layers, unfreeze them selectively.
+    # For example, unfreeze the img_projector layers:
+    for param in model.img_projector.parameters():
+        param.requires_grad = True
+
+    # Likewise, unfreeze the text_projector layers so both
+    # projection heads stay trainable:
+    for param in model.text_projector.parameters():
+        param.requires_grad = True
+
     model.cuda()
     logger.info(str(model))
 
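A practical consequence of the freeze/unfreeze logic above is that the optimizer is best given only the parameters that remain trainable; frozen parameters never receive gradients, so passing them along only clutters the parameter groups. A minimal sketch, assuming the model built above and placeholder AdamW hyperparameters rather than values from this repo's config:

    import torch

    # Collect only the unfrozen parameters (the two projector heads above)
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=4e-5, weight_decay=0.05)

    # Sanity-check that the freeze/unfreeze split did what was intended
    n_total = sum(p.numel() for p in model.parameters())
    n_train = sum(p.numel() for p in trainable)
    print(f"trainable params: {n_train:,} / {n_total:,}")
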

models/builder.py (+2 -0)

@@ -17,5 +17,7 @@ MODELS = Registry('model')
 def build_model(config):
 
     model = MODELS.build(OmegaConf.to_container(config, resolve=True))
+    
+    print(model)
 
     return model

tools/dist_launch.sh (+3 -2)

@@ -13,9 +13,10 @@
 
 SCRIPT=$1
 CONFIG=$2
-GPUS=$3
+RESUME=$3
+GPUS=$4
 PORT=${PORT:-29500}
 
 PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
 python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
-    $SCRIPT --cfg $CONFIG ${@:4}
+    $SCRIPT --cfg $CONFIG --resume $RESUME ${@:5}
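
With RESUME now the third positional argument, pass-through arguments shift from ${@:4} to ${@:5}, so an invocation looks like this (the checkpoint path here is hypothetical):

    bash tools/dist_launch.sh main_group_vit.py configs/default.yml output/checkpoint.pth 8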