Classification Loss: CE vs BCE #3
BCE computes sigmoid predictions independently for each class, while CE introduces inter-class competition. With BCE, an instance is allowed to be both class A and class B at the same time, which is better for multi-label tasks (e.g. the OpenImages dataset). But for single-label instances (e.g. COCO), BCE can produce high-score false positives and hurt AP.
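To make the distinction concrete, here is a minimal PyTorch sketch (PyTorch is what this repo uses); the logit values are illustrative only:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.5, -1.0]])  # one instance, three classes

# BCE view: an independent sigmoid per class, so several classes can
# score high at once (multi-label behaviour).
bce_probs = torch.sigmoid(logits)
print(bce_probs)  # ~[[0.88, 0.82, 0.27]] -- classes 0 and 1 both "on"

# CE view: softmax normalises across classes, forcing them to compete
# for probability mass (single-label behaviour).
ce_probs = F.softmax(logits, dim=1)
print(ce_probs)  # ~[[0.60, 0.37, 0.03]] -- rows sum to 1

# Corresponding losses against a single-label target of class 0:
ce_loss = F.cross_entropy(logits, torch.tensor([0]))
bce_loss = F.binary_cross_entropy_with_logits(
    logits, torch.tensor([[1.0, 0.0, 0.0]]))
```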
While this is true in theory, the YOLOv3 paper also clearly states that BCE is a big part of the model's general success (on COCO and PASCAL VOC).
Hello, I'm working on getting BCE loss to work in a multi-label task. The majority of my classes follow a hierarchical one-vs-all classification, but a few leaves of my hierarchical tree can hold multiple states. I'm experimenting with using BCE for the entire tree as in the original Darknet paper, but I have yet to get any good results: my loss decreases significantly, but in the end my classification predictions are completely wrong (calling a car a street sign ^_-).
@dtmoodie hello, using BCE for hierarchical multi-label classification can be challenging, especially when some leaves of the tree can hold multiple states, and it can produce exactly the kind of misclassified predictions you describe. One approach to consider is adapting the loss function or the model architecture to better reflect the hierarchical structure, for example by keeping inter-class competition where classes are mutually exclusive; a rough sketch of that idea follows below. Experimenting with different loss functions or model configurations tailored to hierarchical multi-label classification may also yield improved results. For further guidance, feel free to consult the Ultralytics Docs. Keep up the great work, and best of luck with your experimentation!
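A hypothetical sketch of such a mixed loss, assuming the hierarchy can be split into groups of mutually exclusive siblings plus a handful of multi-state leaves; the function name and grouping scheme below are illustrative assumptions, not part of this repo:

```python
import torch
import torch.nn.functional as F

def hierarchical_loss(logits, ce_targets, bce_targets, ce_groups, bce_idx):
    """logits: (N, C) raw class scores.
    ce_groups: list of index tensors, one per mutually exclusive sibling set.
    ce_targets: list of (N,) target indices *within* each group.
    bce_idx: indices of multi-state leaves.
    bce_targets: (N, len(bce_idx)) float targets in {0, 1}.
    """
    loss = 0.0
    for group, tgt in zip(ce_groups, ce_targets):
        # softmax competition restricted to one set of exclusive siblings
        loss = loss + F.cross_entropy(logits[:, group], tgt)
    # independent sigmoids only for the leaves that may co-occur
    loss = loss + F.binary_cross_entropy_with_logits(
        logits[:, bce_idx], bce_targets)
    return loss
```

The intent is to keep the competitive (softmax) behaviour wherever labels are truly exclusive, and pay the independence cost of BCE only where multiple states are genuinely possible.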
When developing the training code I found that replacing Binary Cross Entropy (BCE) loss with Cross Entropy (CE) loss significantly improves Precision, Recall and mAP. All show roughly 2x improvements using CE, even though the YOLOv3 paper specifies BCE for these loss terms in Darknet.
The two loss terms are on lines 162 and 163 of models.py. If anyone has any insight into this phenomenon I'd be very interested to hear it; for now you can swap the two back and forth. Note that SGD does not converge with either BCE or CE, so that issue appears to be independent of this one.