
yolo-world-l fine-tuning on COCO cannot be successfully reproduced #76

Open
Hudaodao99 opened this issue Feb 26, 2024 · 7 comments
Labels: bug (Something isn't working), Working on it now!


Hudaodao99 commented Feb 26, 2024

Hello! I set up the environment to reproduce the yolo_world_l fine-tuning results on COCO, but the results I actually get fall between those of the S and M models.
I followed the fine-tuning documentation exactly:

  1. Since the page for the yolo_world_l code with the efficient neck returns a 404 error, this reproduction uses yolo_world_l without the efficient neck.
  2. The paper only says that the fine-tuned L model was pretrained on O365, GoldG, and CC3M. For a faithful reproduction, I changed the checkpoint loaded in the original document from load_from='pretrained_models/yolo_world_l_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_train_pretrained-0e566235.pth' to load_from='pretrained_models/yolo_world_l_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_cc3mlite_train_pretrained-7a5eea3b.pth' (see the config sketch at the end of this comment).
  3. Hyperparameters: 8× A800 GPUs, batch size 16 per GPU.
  4. Training process and results:
    02/23 21:54:41 - mmengine - INFO - Epoch(train) [80][800/925] base_lr: 2.0000e-04 lr: 6.9500e-06 eta: 0:01:05 time: 0.5054 data_time: 0.0021 memory: 18624 grad_norm: 886.4797 loss: 350.3436 loss_cls: 97.6750 loss_bbox: 112.2129 loss_dfl: 140.4556
    02/23 21:55:06 - mmengine - INFO - Epoch(train) [80][850/925] base_lr: 2.0000e-04 lr: 6.9500e-06 eta: 0:00:39 time: 0.4935 data_time: 0.0021 memory: 18717 grad_norm: 887.4158 loss: 365.0416 loss_cls: 104.4143 loss_bbox: 117.6048 loss_dfl: 143.0225
    Corrupt JPEG data: premature end of data segment
    02/23 21:55:31 - mmengine - INFO - Epoch(train) [80][900/925] base_lr: 2.0000e-04 lr: 6.9500e-06 eta: 0:00:13 time: 0.5056 data_time: 0.0021 memory: 18637 grad_norm: 881.9796 loss: 357.7269 loss_cls: 102.4538 loss_bbox: 112.8057 loss_dfl: 142.4673
    02/23 21:57:06 - mmengine - INFO - bbox_mAP_copypaste: 0.486 0.656 0.530 0.309 0.534 0.640
    02/23 21:57:06 - mmengine - INFO - Epoch(val) [80][625/625] coco/bbox_mAP: 0.4860 coco/bbox_mAP_50: 0.6560 coco/bbox_mAP_75: 0.5300 coco/bbox_mAP_s: 0.3090 coco/bbox_mAP_m: 0.5340 coco/bbox_mAP_l: 0.6400 data_time: 0.0003 time: 0.0170

Do you have any advice or guidance on reproducing the fine-tuning results? Also, looking forward to improvements to the fine-tuning docs and updated weights!
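For context, the change in point 2 amounts to a one-line override in an MMYOLO-style Python config. Below is a minimal sketch, not the repo's actual file: the base config name and the launch command are assumptions following MMYOLO conventions; only the load_from path is the one quoted above.

```python
# Minimal sketch of the fine-tuning override (MMYOLO-style Python config).
# The base config name below is hypothetical; only load_from is from this thread.
_base_ = './yolo_world_l_dual_vlpan_2e-4_80e_8gpus_finetune_coco.py'

# Initialize from the O365 + GoldG + CC3M-Lite pretrained checkpoint instead of
# the O365 + GoldG checkpoint referenced in the original fine-tuning docs.
load_from = ('pretrained_models/yolo_world_l_clip_base_dual_vlpan_2e-3adamw_'
             '32xb16_100e_o365_goldg_cc3mlite_train_pretrained-7a5eea3b.pth')

# 8x A800 with 16 images per GPU (effective batch size 128); with MMYOLO-style
# tooling such a run would be launched as:
#   bash tools/dist_train.sh <this_config>.py 8 --amp
train_batch_size_per_gpu = 16
```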


HGao-cv commented Feb 28, 2024

> @Hudaodao99: Hello! I set up the environment to reproduce the yolo_world_l fine-tuning results on COCO, but the results I actually get fall between those of the S and M models. […]

Hello, did you use mask-refine when fine-tuning on COCO? Without mask-refine, I can't get normal results. Have you encountered similar problems?

wondervictor (Collaborator)

@Hudaodao99 @HGao-cv I'll re-check the fine-tuning configs on my side.

@Sally-lxy

@Hudaodao99 Hi, I ran into the same problem. Fine-tuning on COCO with the same config, I can't reach the accuracy reported in the paper; the mAP50 of the two fine-tuning variants is 4.2 and 3.7 points lower than the paper's.

Hudaodao99 (Author) commented Feb 29, 2024

> @Sally-lxy: Hi, I ran into the same problem; fine-tuning on COCO with the same config doesn't reach the paper's accuracy. […]

Hi, I just tried fine-tuning with the config configs/finetune_coco/yolo_world_l_efficient_neck_2e-4_80e_8gpus_mask-refine_finetune_coco.py and it reaches the paper's numbers. What puzzles me is that the checkpoint loaded by that file was pretrained without CC3M (a result the paper doesn't mention), yet it still reaches mAP = 53.3.
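(Aside: a quick way to confirm which pretrained checkpoint a fine-tune config initializes from is to read its load_from field via MMEngine's config API. A small sketch, assuming the repo's package is importable in your environment; the config path is the one mentioned above:)

```python
# Sketch: print the checkpoint a fine-tune config initializes from, to check
# whether it points at the CC3M-pretrained weights or not.
from mmengine.config import Config

cfg = Config.fromfile(
    'configs/finetune_coco/'
    'yolo_world_l_efficient_neck_2e-4_80e_8gpus_mask-refine_finetune_coco.py')
print(cfg.load_from)  # path of the checkpoint loaded before fine-tuning
```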

Hudaodao99 (Author)

> @HGao-cv: did you use mask-refine when fine-tuning on COCO? Without mask-refine, I can't get normal results. […]

Yes, with mask-refine during COCO fine-tuning I get the same results as reported on GitHub; without it, mAP = 0.486, which is lower.
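(For readers comparing the two setups: mask-refine is a data-augmentation option in MMYOLO-style configs that loads instance masks and re-derives boxes from the warped masks after geometric augmentation. The sketch below shows the difference as it appears in MMYOLO's YOLOv8 mask-refine configs; the exact keys in YOLO-World's configs may differ:)

```python
# Sketch of the mask-refine difference in an MMYOLO-style training pipeline.
# Keys follow MMYOLO's YOLOv8 mask-refine configs; treat them as assumptions
# for YOLO-World.

# Without mask-refine: only boxes are loaded and transformed.
pre_transform = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
]

# With mask-refine: instance masks are loaded as well, and boxes are
# re-computed from the masks so they stay tight after Mosaic/affine.
pre_transform_mask_refine = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True,
         mask2bbox=True),
]

# The affine transform then refines boxes using the warped masks.
affine_with_mask_refine = dict(
    type='YOLOv5RandomAffine',
    scaling_ratio_range=(0.1, 2.0),
    max_aspect_ratio=100,
    use_mask_refine=True,
)
```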

wondervictor (Collaborator)

@Sally-lxy @Hudaodao99 From what I can see so far, the problem should be a data-augmentation issue introduced by mask-refine. We previously overlooked the w/o mask-refine experiments; I'll test the w/o mask-refine performance as soon as possible and re-check the full set of open-source configs. Please bear with us.

wondervictor added the labels bug (Something isn't working) and Working on it now! on Mar 19, 2024
wondervictor (Collaborator)

Hi all (@Sally-lxy, @Hudaodao99), we have investigated the errors with fine-tuning without mask-refine and made a preliminary fix. With mask-refine, YOLO-World performs significantly better than the paper version. Without mask-refine, YOLO-World still obtains competitive performance, e.g., YOLO-World-L obtains 52.8 AP on COCO.

You can find more details in configs/finetune_coco, especially for the version without mask-refine.
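To recap the numbers reported in this thread for YOLO-World-L fine-tuned on COCO (box AP):

  - w/ mask-refine: 53.3 (reproduced by @Hudaodao99 with the configs/finetune_coco config)
  - w/o mask-refine, before the fix: 48.6 (the training log above)
  - w/o mask-refine, after the fix: 52.8 (the comment above)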
