Machine learning models, despite their widespread use in everyday applications, often suffer from unreliable performance due to distribution shifts between training and inference. Distribution shifts are ubiquitous, occurring in both low-level features and high-level semantics, and are exacerbated by the non-uniformity of real-world data, particularly long-tailed distributions in which some classes appear far more frequently than others. This imbalance leads to non-uniform model performance across classes, posing risks for applications that require precise information. Humans, in contrast, adapt readily to such challenges. Inspired by this, we focus on addressing the distribution shifts in vision tasks caused by long-tailed distributions, aiming to make machine learning classifiers adapt as robustly as humans do.
In this thesis, we aim to redefine long-tail recognition more broadly and concentrate on crafting a classifier that mirrors human adaptability to distribution shifts, a capability that modern classifiers lack yet is essential for constructing reliable AI systems. Expanding beyond the traditional framework, we extend long-tail recognition to encompass combinatorial label spaces. Furthermore, we explore a hierarchical label space within a single long-tailed distribution, offering adaptable control for user-defined systems based on the model's competence level or the user's desired label space. By delving into the core of the long-tail concept, we demonstrate that substantial performance gains are attainable through appropriate data sampling techniques, even with simple architectures. We also identify hierarchical consistency as a key factor in building a model aligned with human cognition.