MOODv2: Masked Image Modeling for Out-of-Distribution Detection
Abstract
The crux of effective out-of-distribution (OOD) detection lies in acquiring a robust in-distribution (ID) representation, distinct from OOD samples. While previous methods predominantly leaned on recognition-based techniques for this purpose, they often resulted in shortcut learning, lacking comprehensive representations. In our study, we conducted a comprehensive analysis, exploring distinct pretraining tasks and employing various OOD score functions. The results highlight that the feature representations pre-trained through reconstruction yield a notable enhancement and narrow the performance gap among various score functions. This suggests that even simple score functions can rival complex ones when leveraging reconstruction-based pretext tasks. Reconstruction-based pretext tasks adapt well to various score functions. As such, it holds promising potential for further expansion. Our OOD detection framework, MOODv2, employs the masked image modeling pretext task. Without bells and whistles, MOODv2 impressively enhances 14.30% AUROC to 95.68% on ImageNet and achieves 99.98% on CIFAR-10.
Index Terms:
Computer Vision, Out-of-Distribution Detection, Outlier Detection, Masked Image Modeling1 Introduction
Areliable visual recognition system not only provides correct predictions on known context (also known as in-distribution data) but also detects unknown out-of-distribution (OOD) samples and rejects (or transfers) them to human intervention for safe handling. This motivates the applications of outlier detectors before feeding input to the downstream networks, which is the main task of OOD detection, also referred to as novelty or anomaly detection. OOD detection is the task of identifying whether a test sample is drawn far from the in-distribution (ID) data or not. It is at the cornerstone of various safety-critical applications, including medical diagnosis [1], fraud detection [2], autonomous driving [3], etc. A representative in-distribution feature space representation is crucial for out-of-distribution detection. A well-crafted feature representation significantly enhances the performance via most mainstream OOD detection score functions. Our research is dedicated to refining feature representations tailored for OOD detection, with the aim of advancing the entire field.
Existing methods perform contrastive learning [4, 5] or pretrain classification on a large dataset [6, 7, 8, 9] to detect OOD samples. The former methods classify images according to the pseudo labels while the latter classifies images based on ground truth, whose core tasks are both to fulfill the classification target. However, research on backdoor attack [10, 11] shows that when learning is represented by classifying data, networks tend to take a shortcut to classify images. In a typical backdoor attack scene [11], the attacker adds secret triggers on original training images with the visibly correct label. During the course of testing, the victim model classifies images with secret triggers into the wrong category. Research in this area demonstrates that networks only learn specific distinguishable patterns of different categories because it is a shortcut to fulfill the classification requirement. Nonetheless, learning these patterns is ineffective for OOD detection. Thus, learning representations by classifying ID data for OOD detection may not be satisfying. For example, when patterns similar to some ID categories appear in OOD samples, the network could easily interpret these OOD samples as the ID data and classify them into the wrong ID categories, as shown in Fig. 2.
To remedy this issue, we introduce the reconstruction-based pretext task. Different from contrastive learning in existing OOD detection approaches [4, 5], our method forces the network to achieve the training purpose of reconstructing the image and thus makes it learn pixel-level feature representation. Specifically, we adopt the masked image modeling (MIM) [12] as our self-supervised pretext task, which has been demonstrated to have great potential in both natural language processing [13] and computer vision [12, 14]. In the MIM task, a proportion of image patches are randomly masked. The network learns information from the remaining patches to speculate the masked patches and restore tokens of the original image. The reconstruction process enables the model to learn from the prior effective ID feature representation rather than just learning different patterns among categories in the classification process. In our work, we observed that the pre-trained models effectively reconstruct ID images, whereas they exhibit distinct domain differences when it comes to the OOD domain (Fig. 4). This visual discrepancy clearly underscores the existing domain gap in model features between ID and OOD data, offering valuable insights for OOD detection.
To validate the effectiveness of our ID feature representation, we conduct experiments to test its performance with various mainstream OOD detection score functions. We employed OOD score functions encompassing probability-based [15, 16], logits-based [17, 16], features-based [7, 18, 19], and hybrid methods utilizing both logits and features [7]. In the context of a comparative analysis spanning classic classification [20], contrastive learning [21, 22], and masked image modeling pretext tasks [12, 23], our findings underscore the dominant role of reconstruction-based strategies in the field of OOD detection, as illustrated in Fig. 1.
Furthermore, we conduct a comprehensive analysis of the experimental results and observe that our approach not only significantly improves the overall results but also substantially reduces the disparities among score functions. This observation underscores that even simple score functions can perform on par with more complex ones when a representative ID feature representation is utilized. These findings further emphasize the critical importance of effective feature representation in OOD detection. More details are in Sec. 3.2. Ultimately, MOODv2 demonstrates remarkable enhancements, achieving a substantial 14.30% increase, reaching 95.68% AUROC on ImageNet. On CIFAR-10, our results significantly improved to an impressive 99.98%, marking a notable 0.35% enhancement compared to the previous state-of-the-art.
2 Related Works
2.1 Out-of-distribution Detection
Many scoring functions have been developed by researchers to distinguish between in-distribution and out-of-distribution examples. These functions are designed to exploit properties that are typically exhibited by ID examples but violated by OOD examples, and vice versa. These scores are primarily derived from three sources:
- 1.
- 2.
- 3.
After a thorough analysis of the performance and their correlations with various score functions and pretext tasks, our work follows the hybrid methods combining logit and feature [7] and includes the reconstruction-based methods as a pretext task. We will explain the implementation details later in this paper.
2.2 Self-Supervised Pretext Task
In the ever-evolving landscape of computer vision and deep learning, a multitude of strategies and techniques have been devised to enhance the capacity of models to understand and process visual data:.
-
1.
Classification task: Vision models are pre-trained via classical classification task [20].
-
2.
Contrastive Learning tasks: MOCOv3 [21] and DINOv2 [22] are advanced contrastive learning methods used for self-supervised representation learning. These methods focus on learning representations by contrasting positive pairs (e.g., different augmentations of the same image) with negative pairs (e.g., augmentations from different images). MOCOv3 extends the MOCO framework [25] with a momentum encoder and dynamic queues for improved performance. DINOv2 introduces a clustered teacher network and an asymmetric loss to learn efficient representations.
-
3.
Masked Image Modeling Tasks: Data-Efficient Image Transformer (BEiT series [12, 23]) are self-supervised learning tasks that involve masked image modeling. In these tasks, a portion of an image is randomly masked, and the model’s objective is to predict the masked pixels, effectively filling in the blanks.
These methods and tasks represent cutting-edge approaches in the field of computer vision and deep learning. They have led to substantial improvements in the ability of models to learn useful visual representations from unlabeled data, enabling better performance on various downstream vision tasks.
Multiple existing methods take advantage of self-supervised tasks to guide the learning of representation for OOD detection. Previous work [4, 5] presents contrastive learning models as feature extractors. However, existing approaches of classifying transformed images according to contrastive learning possess similar limitations – that is, the model tends to learn the specific patterns of categories [10, 26], which are beneficial for classification but do not help understand the intrinsic ID representation. In our work, we address this issue by performing the masked image modeling task for OOD detection.
2.3 Training Strategy
Numerous approaches have been developed to address OOD-awareness in training loss [27]. These methods often involve the introduction of regularization terms aimed at encouraging a clearer separation between ID and OOD features [28, 29]. In some cases, networks are augmented with confidence estimation branches, utilizing misclassified in-distribution examples as proxies for out-of-distribution ones [27]. MOS [29] adapts the loss function by incorporating a predefined group structure, enabling the minimum group-wise “else” class probability to serve as an indicator of OOD classification. An alternative approach [28] focuses on compelling ID samples to embed into a union of 1-dimensional subspaces during training, and it evaluates the minimum angular distance between the feature and class-wise subspaces.
In contrast to these approaches, our method belongs to the lightweight training-free methods [7, 30], which doesn’t necessitate retraining the model. Therefore, it not only offers a more straightforward application but also preserves the accuracy of in-distribution classification.
2.4 MOODv1
Our previous version MOODv1 [30] has introduced masked image modeling pretraining strategy into the OOD detection (MOOD) and achieved promising results. However, there are still concerns:
Firstly, previous studies [30, 4, 5] have typically necessitated fine-tuning a model on each in-distribution dataset. The expense of training becomes notably high when dealing with a substantial number of ID datasets to be assessed, such as in one-class OOD detection [4, 30]. However, through experimental validation, we have discovered that a well-prepared masked image modeling model doesn’t require additional fine-tuning to achieve outstanding detection performance, conserving substantial fine-tuning resource consumption when dealing with a plethora of ID datasets that require evaluation.
Secondly, as the field has seen the emergence of more advanced OOD score functions [16, 7, 18, 15, 17, 19] and pretraining techniques [23, 22, 21, 12, 20], it raises the question of whether masked image modeling continues to maintain its leading role. In MOODv2, we integrate the latest advancements in pretraining methods and conduct experiments with an array of state-of-the-art OOD score functions. This broader spectrum of pretraining methods and score functions allows for a more comprehensive assessment of the MOODv2’s performance, better aligning MOODv2 with the increasingly intricate challenges of OOD detection.
Lastly, it is well known that if the network has seen similar samples in training, regardless of pre-training or fine-tuning, the OOD performance will be more or less trivial [31]. Previous works [6, 30] rely on pre-training on ImageNet-21K, so that the benchmark OOD dataset such as CIFAR [39], Places [40], etc., is unlikely to be untouched by the ImageNet-21K [35] dataset. In this work, MOODv2 introduces the latest unnatural datasets as OOD, which rules out the possibility of overlap between the OOD test set and the training set [31, 34].
In summary, MOODv2 incorporates improved score functions, advanced pretraining techniques, a wider range of unnatural OOD datasets, and a streamlined general framework. The performance improvement of MOODv2 compared to MOODv1 is depicted in Fig. 3. On ImageNet, MOODv2 exhibits a noteworthy 2.17% improvement in AUROC compared to MOODv1. Furthermore, on CIFAR-10, MOODv2, without finetuning on the ID dataset, achieves an exceptional AUROC score of up to 99.98%.
3 Methods
In this section, we initiate our exploration of reconstruction tasks for OOD detection by presenting the underlying motivation in Sec. 3.1. Following that, in Sec. 3.2, we delve into a comprehensive analysis of the essential attributes that play a pivotal role in OOD detection.
3.1 Motivation: seeking for effective ID representation
Most previous OOD methods learn the ID representation through classification [15, 6] or contrastive learning [4, 5] on ID samples, which take advantage of either the ground truth or pseudo labels to supervise the classification networks. On the other hand, work of [10, 11] shows that classification networks only learn different patterns among training categories because it is a shortcut to fulfill classification. It is indicated that the network actually does not learn the effective in-distribution representation. In comparison, the reconstruction-based pretext task forces the network to learn the pixel-level image representation of the ID images during training to reconstruct the image instead of the patterns for classification. In this way, the network can learn a more representative feature of the ID dataset.
To verify this, we reconstruct ID and OOD data and compute the Euclidean distance between the original and reconstructed images. A greater distance indicates a larger deviation of the reconstructed image from the original image. We collect recovery distances for ID and OOD data. Examples of the reconstruction are depicted in Fig. 4. In the first row, for ID images, pre-trained models reconstruct the images effectively. Instead, for unnatural OOD images in the following rows, clear domain discrepancies emerge. For instance, in the case of textured images, the models still apply lighting and shadows reminiscent of natural images. In the case of sketch images, the models render the images smoother and brighter. This discrepancy visually highlights the domain gap in model features between ID and OOD data, which can be leveraged for OOD detection.
3.2 Reconstruction Tasks for OOD Detection
Methods | Models | Texture [32] | iNaturalist [33] | ImageNet-O [34] | OpenImage-O [31] | Average | |||||
---|---|---|---|---|---|---|---|---|---|---|---|
AUROC | FPR95 | AUROC | FPR95 | AUROC | FPR95 | AUROC | FPR95 | AUROC | FPR95 | ||
MSP[15] | ViT[20] | 71.31 | 71.31 | 90.70 | 90.70 | 60.77 | 60.77 | 84.29 | 84.29 | 76.77 | 76.77 |
MoCov3[21] | 66.85 | 66.85 | 90.68 | 90.68 | 64.80 | 64.80 | 85.42 | 85.42 | 76.94 | 76.94 | |
DINOv2[22] | 47.49 | 47.49 | 62.13 | 62.13 | 44.87 | 44.87 | 52.83 | 52.83 | 51.83 | 51.83 | |
BEiTv2[23] | 85.61 | 85.61 | 96.05 | 96.05 | 81.15 | 81.15 | 92.52 | 92.52 | 88.83 | 88.83 | |
BEiT[12] | 85.05 | 85.05 | 95.50 | 95.50 | 83.17 | 83.17 | 92.28 | 92.28 | 89.00 | 89.00 | |
Energy[17] | ViT[20] | 54.11 | 54.11 | 76.61 | 76.61 | 61.63 | 61.63 | 71.06 | 71.06 | 65.85 | 65.85 |
MoCov3[21] | 48.79 | 48.79 | 76.80 | 76.80 | 64.56 | 64.56 | 72.13 | 72.13 | 65.57 | 65.57 | |
DINOv2[22] | 73.89 | 73.89 | 80.34 | 80.34 | 49.98 | 49.98 | 56.64 | 56.64 | 65.21 | 65.21 | |
BEiTv2[23] | 85.32 | 85.32 | 96.95 | 96.95 | 85.27 | 85.27 | 94.14 | 94.14 | 90.42 | 90.42 | |
BEiT[12] | 83.04 | 83.04 | 96.48 | 96.48 | 86.36 | 86.36 | 93.50 | 93.50 | 89.85 | 89.85 | |
MaxLogit[16] | ViT[20] | 67.22 | 67.22 | 89.88 | 89.88 | 61.68 | 61.68 | 82.73 | 82.73 | 75.37 | 75.37 |
MoCov3[21] | 62.36 | 62.36 | 90.38 | 90.38 | 65.65 | 65.65 | 84.19 | 84.19 | 75.64 | 75.64 | |
DINOv2[22] | 54.70 | 54.70 | 69.98 | 69.98 | 45.60 | 45.60 | 54.52 | 54.52 | 56.20 | 56.20 | |
BEiTv2[23] | 85.94 | 85.94 | 96.90 | 96.90 | 83.97 | 83.97 | 93.82 | 93.82 | 90.16 | 90.16 | |
BEiT[12] | 84.17 | 84.17 | 96.48 | 96.48 | 85.34 | 85.34 | 93.31 | 93.31 | 89.83 | 89.83 | |
KL-Matching[16] | ViT[20] | 82.59 | 82.59 | 87.63 | 87.63 | 66.55 | 66.55 | 84.34 | 84.34 | 80.28 | 80.28 |
MoCov3[21] | 82.35 | 82.35 | 86.24 | 86.24 | 67.80 | 67.80 | 82.73 | 82.73 | 79.78 | 79.78 | |
DINOv2[22] | 80.51 | 80.51 | 56.93 | 56.93 | 69.77 | 69.77 | 62.63 | 62.63 | 67.46 | 67.46 | |
BEiTv2[23] | 87.14 | 87.14 | 95.13 | 95.13 | 82.87 | 82.87 | 92.10 | 92.10 | 89.31 | 89.31 | |
BEiT[12] | 87.87 | 87.87 | 94.82 | 94.82 | 84.56 | 84.56 | 92.48 | 92.48 | 89.93 | 89.93 | |
Residual[7] | ViT[20] | 82.39 | 82.39 | 73.72 | 73.72 | 68.44 | 68.44 | 74.88 | 74.88 | 74.86 | 74.86 |
MoCov3[21] | 75.25 | 75.25 | 73.80 | 73.80 | 57.69 | 57.69 | 67.82 | 67.82 | 68.64 | 68.64 | |
DINOv2[22] | 66.50 | 66.50 | 61.90 | 61.90 | 58.94 | 58.94 | 56.84 | 56.84 | 61.04 | 61.04 | |
BEiTv2[23] | 94.99 | 94.99 | 99.01 | 99.01 | 87.23 | 87.23 | 95.43 | 95.43 | 94.17 | 94.17 | |
BEiT[12] | 94.16 | 94.16 | 99.50 | 99.50 | 89.35 | 89.35 | 96.52 | 96.52 | 94.88 | 94.88 | |
React[18] | ViT[20] | 62.09 | 62.09 | 91.20 | 91.20 | 63.66 | 63.66 | 80.43 | 80.43 | 74.34 | 74.34 |
MoCov3[21] | 51.47 | 51.47 | 79.30 | 79.30 | 65.33 | 65.33 | 74.35 | 74.35 | 67.61 | 67.61 | |
DINOv2[22] | 76.73 | 76.73 | 74.25 | 74.25 | 56.26 | 56.26 | 63.17 | 63.17 | 67.60 | 67.60 | |
BEiTv2[23] | 86.10 | 86.10 | 98.09 | 98.09 | 85.69 | 85.69 | 94.96 | 94.96 | 91.21 | 91.21 | |
BEiT[12] | 84.32 | 84.32 | 96.99 | 96.99 | 87.04 | 87.04 | 94.21 | 94.21 | 90.64 | 90.64 | |
Mahalanobis[19] | ViT[20] | 84.93 | 84.93 | 84.90 | 84.90 | 71.53 | 71.53 | 84.16 | 84.16 | 81.38 | 81.38 |
MoCov3[21] | 84.29 | 84.29 | 86.95 | 86.95 | 70.33 | 70.33 | 83.54 | 83.54 | 81.28 | 81.28 | |
DINOv2[22] | 68.58 | 68.58 | 63.14 | 63.14 | 58.86 | 58.86 | 57.57 | 57.57 | 62.04 | 62.04 | |
BEiTv2[23] | 93.01 | 93.01 | 98.78 | 98.78 | 86.78 | 86.78 | 95.46 | 95.46 | 93.51 | 93.51 | |
BEiT[12] | 93.03 | 93.03 | 99.18 | 99.18 | 88.84 | 88.84 | 96.51 | 96.51 | 94.39 | 94.39 | |
ViM[7] | ViT[20] | 83.51 | 83.51 | 77.75 | 77.75 | 71.04 | 71.04 | 78.31 | 78.31 | 77.65 | 77.65 |
MoCov3[21] | 76.28 | 76.28 | 78.18 | 78.18 | 61.35 | 61.35 | 72.46 | 72.46 | 72.07 | 72.07 | |
DINOv2[22] | 66.90 | 66.90 | 62.53 | 62.53 | 58.93 | 58.93 | 56.93 | 56.93 | 61.32 | 61.32 | |
BEiTv2[23] | 95.35 | 95.35 | 99.31 | 99.31 | 90.06 | 90.06 | 96.96 | 96.96 | 95.42 | 95.42 | |
BEiT[12] | 94.25 | 94.25 | 99.59 | 99.59 | 91.47 | 91.47 | 97.41 | 97.41 | 95.68 | 95.68 | |
Average | ViT[20] | 73.52 | 73.52 | 84.05 | 84.05 | 65.66 | 65.66 | 80.02 | 80.02 | 75.81 | 75.81 |
MoCov3[21] | 68.45 | 68.45 | 82.79 | 82.79 | 64.69 | 64.69 | 77.83 | 77.83 | 73.44 | 73.44 | |
DINOv2[22] | 66.91 | 66.91 | 66.40 | 66.40 | 55.40 | 55.40 | 57.64 | 57.64 | 61.59 | 61.59 | |
BEiTv2[23] | 89.18 | 89.18 | 97.53 | 97.53 | 85.38 | 85.38 | 94.42 | 94.42 | 91.63 | 91.63 | |
BEiT[12] | 88.24 | 88.24 | 97.32 | 97.32 | 87.02 | 87.02 | 94.53 | 94.53 | 91.77 | 91.77 | |
Best | ViT[20] | 84.93 | 84.93 | 91.20 | 91.20 | 71.53 | 71.53 | 84.34 | 84.34 | 81.38 | 81.38 |
MoCov3[21] | 84.29 | 84.29 | 90.68 | 90.68 | 70.33 | 70.33 | 85.42 | 85.42 | 81.28 | 81.28 | |
DINOv2[22] | 80.51 | 80.51 | 80.34 | 80.34 | 69.77 | 69.77 | 63.17 | 63.17 | 67.60 | 67.60 | |
BEiTv2[23] | 95.35 | 95.35 | 99.31 | 99.31 | 90.06 | 90.06 | 96.96 | 96.96 | 95.42 | 95.42 | |
BEiT[12] | 94.25 | 94.25 | 99.59 | 99.59 | 91.47 | 91.47 | 97.41 | 97.41 | 95.68 | 95.68 |
In this section, we offer a comprehensive analysis of these key elements in the context of OOD detection. We employ ImageNet [35] as the in-distribution dataset and evaluate pre-task texts on challenging unnatural out-of-distribution datasets, including OpenImage-O [31], Texture [32], iNaturalist [33], and ImageNet-O [34]. Extensive validations with various pretraining methods and OOD score functions, including MSP [15], Energy [17], ODIN [41], MaxLogit [16], KL Matching [16], Residual [7], ReAct [18], Mahalanobis [19] and ViM [7].
Results are shown in Tab. II. The results indicate that the masked image modeling pretext task surpasses classification and contrastive learning pretext tasks when employing all included score functions. The average AUROC across these score functions exhibits an improvement of 15.96% compared to the competition. Models when using the best-performing score function saw a 14.30% increase in performance. This remarkable achievement can be attributed to the representative ID feature space representation, thereby aiding in distinguishing between ID and OOD data. This discovery is highly significant as it enhances performance across mainstream OOD detection score functions, thus advancing the entire field. We also employ CIFAR-10 [39] as the ID dataset and provide results in the appendix. Our approach attains an impressive AUROC of 99.99% while concurrently reducing the FPR95 to a mere 0.03%.
To enhance the comprehensibility of our experimental findings, we conduct a thorough statistical analysis and illustrate them in visual representations. The outcomes are depicted in Fig. 5. Our approach not only leads to an overall enhancement in results but also notably minimizes the variations among different methods. For instance, the ViT, MoCov3, and DINOv2 models using logit-based methods exhibited standard deviations of 4.76%, 5.04%, and 4.51%, respectively, while BEiT and BEiTv2 displayed significantly lower standard deviations, reaching as low as 0.13% and 0.01%. This observation underscores that even uncomplicated score functions can perform equivalently to more intricate ones when an effective ID feature representation is applied.
In Tab. I, we underscore the optimal methods for each model. On CIFAR-10, all models achieved their best results when employing the feat and logit combination approach, achieving almost 100% accuracy. This suggests a highly effective grasp of CIFAR-10’s feature space. Conversely, with the larger ImageNet dataset, we observed variations in outcomes. Notably, the masked image modeling pretext-pretrained model achieved the best results when using the feat and logit combination method, while other models excelled in probability-based and feature-based methods. Additionally, our masked image modeling pretext demonstrated significantly superior performance compared to other pretraining methods, underscoring the limitations of classification-based pretraining strategies and their inadequacy in harnessing advanced score functions effectively. These discoveries reinforce the pivotal role of proficient feature representation in OOD detection. Furthermore, for more detailed information, we provide illustrations of the distribution curves of OOD scores for both ID and OOD datasets in the appendix.
3.3 Masked Image Modeling for Out-of-Distribution v2
To sum up, in this section, we observed that pre-trained models adeptly reconstruct ID images, yet manifest distinctive domain differences in the OOD scenario (Fig. 4). This visual incongruity starkly highlights the prevailing domain gap in model features between ID and OOD data. Additionally, a thorough analysis of experimental outcomes reveals that the pre-task of masked image modeling not only significantly enhances overall results but also markedly diminishes disparities among score functions. These findings emphasize the crucial significance of effective feature representation in OOD detection, highlighting the enhancement of features through masked image modeling tasks.
Finally, we propose our Masked Image Modeling for Out-of-Distribution Detection v2 (MOODv2). The algorithm of is shown in Algorithm 1, mainly including the following stages.
-
1.
Pre-train the vision encoder with masked image modeling on the pretrain dataset.
-
2.
Apply fine-tuning the backbone on the in-distribution dataset.
-
3.
Extract features from the trained image encoder and calculate the OOD score distance score function for OOD detection.
In terms of the OOD score function, we adopt ViM[7] that combines features and logits, leveraging insights from the masked image modeling pre-trained model, which has demonstrated superior performance. Mathematically, the score is
(1) |
where is the -th logit of feature in the training set ; is a per-model constant; is the -th column to the last column of the eigenvector matrix of and is the principal dimension; is the number of classes.
ID data | Methods | Texture [32] | iNaturalist [33] | ImageNet-O [34] | OpenImage-O [31] | Average | |||||
---|---|---|---|---|---|---|---|---|---|---|---|
AUROC | FPR95 | AUROC | FPR95 | AUROC | FPR95 | AUROC | FPR95 | AUROC | FPR95 | ||
CIFAR-10 | MSP[15] | 45.67 | 95.17 | 71.07 | 81.76 | 32.52 | 98.85 | 59.74 | 91.45 | 52.25 | 91.81 |
Energy[17] | 31.16 | 97.89 | 48.95 | 97.92 | 37.22 | 97.85 | 45.29 | 96.36 | 40.65 | 97.50 | |
MaxLogit[16] | 41.21 | 95.95 | 67.83 | 86.04 | 32.58 | 98.80 | 56.64 | 92.94 | 49.56 | 93.43 | |
KL-Matching[16] | 98.00 | 10.64 | 94.23 | 35.86 | 92.99 | 32.40 | 94.68 | 27.92 | 94.97 | 26.71 | |
Residual[7] | 99.91 | 0.21 | 99.68 | 0.45 | 99.36 | 2.85 | 99.42 | 2.46 | 99.59 | 1.49 | |
React[18] | 35.97 | 96.26 | 69.01 | 87.91 | 36.65 | 97.75 | 54.14 | 93.11 | 48.94 | 93.76 | |
Mahalanobis[19] | 99.77 | 0.60 | 99.39 | 1.11 | 98.93 | 4.90 | 99.14 | 3.26 | 99.31 | 2.47 | |
ViM[7] | 99.91 | 0.23 | 99.72 | 0.38 | 99.38 | 2.65 | 99.49 | 2.31 | 99.63 | 1.39 | |
MOODv1[30] | 99.95 | 0.06 | 99.99 | 0.02 | 99.61 | 1.90 | 99.82 | 0.77 | 99.84 | 0.69 | |
MOODv2 (ours) | 99.98 | 0.06 | 100.00 | 0.00 | 99.94 | 0.20 | 99.99 | 0.01 | 99.98 | 0.07 | |
ImageNet | MSP[15] | 71.31 | 77.07 | 90.70 | 43.72 | 60.77 | 90.60 | 84.29 | 61.79 | 76.77 | 68.30 |
Energy[17] | 54.11 | 86.28 | 76.61 | 72.70 | 61.63 | 81.00 | 71.06 | 73.99 | 65.85 | 78.49 | |
MaxLogit[16] | 67.22 | 77.98 | 89.88 | 45.57 | 61.68 | 88.60 | 82.73 | 62.52 | 75.37 | 68.67 | |
KL-Matching[16] | 82.59 | 67.27 | 87.63 | 69.71 | 66.55 | 88.15 | 84.34 | 74.23 | 80.28 | 74.84 | |
Residual[7] | 82.39 | 64.61 | 73.72 | 86.00 | 68.44 | 87.45 | 74.88 | 77.98 | 74.86 | 79.01 | |
React[18] | 62.09 | 80.47 | 91.20 | 38.74 | 63.66 | 81.00 | 80.43 | 60.41 | 74.34 | 65.15 | |
Mahalanobis[19] | 84.93 | 66.05 | 84.90 | 81.60 | 71.53 | 88.85 | 84.16 | 74.72 | 81.38 | 77.80 | |
ViM[7] | 83.51 | 62.71 | 77.75 | 81.72 | 71.04 | 86.60 | 78.31 | 74.55 | 77.65 | 76.40 | |
MOODv1[30] | 93.01 | 30.91 | 98.78 | 5.89 | 86.78 | 63.15 | 95.46 | 26.46 | 93.51 | 31.60 | |
MOODv2 (ours) | 94.25 | 24.69 | 99.59 | 1.83 | 91.47 | 40.80 | 97.41 | 13.55 | 95.68 | 20.22 |
Methods | ID class | Average | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Plane | Car | Bird | Cat | Deer | Dog | Frog | Horse | Ship | Truck | ||
KL-Matching[16] | 95.35 | 92.04 | 95.18 | 91.26 | 88.11 | 94.66 | 94.99 | 86.52 | 93.61 | 89.37 | 92.11 |
Residual[7] | 97.62 | 95.88 | 97.06 | 96.30 | 89.18 | 94.33 | 96.73 | 91.46 | 94.89 | 92.36 | 94.58 |
Mahalanobis[19] | 97.52 | 96.07 | 96.77 | 96.41 | 89.60 | 94.79 | 96.41 | 91.48 | 94.80 | 92.58 | 94.64 |
ViM[7] | 97.61 | 96.36 | 97.19 | 96.50 | 88.78 | 94.21 | 96.70 | 91.60 | 94.97 | 92.35 | 94.63 |
MOODv1[30] | 98.63 | 99.33 | 94.31 | 93.22 | 98.11 | 96.50 | 99.25 | 98.96 | 98.76 | 97.82 | 97.83 |
MOODv2 (ours) | 99.14 | 99.03 | 99.51 | 98.37 | 97.12 | 97.20 | 98.53 | 98.07 | 98.35 | 96.68 | 98.20 |
Methods | ID class | Average | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Plane | Car | Bird | Cat | Deer | Dog | Frog | Horse | Ship | Truck | ||
KL-Matching[16] | 23.60 | 32.60 | 22.32 | 42.92 | 46.26 | 24.30 | 24.97 | 46.74 | 25.32 | 40.53 | 32.96 |
Residual[7] | 12.06 | 25.58 | 16.71 | 21.17 | 48.33 | 22.12 | 17.42 | 36.72 | 17.30 | 30.76 | 24.82 |
Mahalanobis[19] | 12.59 | 25.72 | 18.92 | 21.48 | 48.44 | 20.59 | 19.20 | 38.02 | 17.47 | 30.93 | 25.34 |
ViM[7] | 12.43 | 24.83 | 15.77 | 20.13 | 48.68 | 21.77 | 17.63 | 36.63 | 17.60 | 30.78 | 24.63 |
MOODv1[30] | 7.59 | 5.04 | 2.47 | 7.49 | 15.63 | 10.96 | 11.37 | 13.09 | 10.06 | 19.62 | 10.33 |
MOODv2 (ours) | 4.82 | 4.50 | 1.79 | 8.80 | 15.59 | 11.00 | 8.46 | 12.43 | 8.60 | 18.96 | 9.49 |
4 Experiments
In this section, we conduct a thorough comparison of our algorithm with the latest OOD detection methods. We employ the ViT-B/16 model, pre-trained on ImageNet-21K and fine-tuned on ImageNet-1K at a resolution of .
ID/OOD Datasets. We select CIFAR-10 [39] and ImageNet-1K [35] as the ID datasets. Following established procedures [7], for estimating the principal space of ImageNet, we randomly sample images from the training set. Our experiments include the following OOD datasets:
-
1.
OpenImage-O is a newly collected large-scale OOD dataset [31].
-
2.
Texture [36] comprises natural textural images, with four overlapping categories (bubbly, honeycombed, cobwebbed, spiraled) removed since they coincide with ImageNet.
- 3.
-
4.
ImageNet-O [38] contains images that are adversarially filtered to challenge OOD detectors.
Evaluation Metrics. We report two commonly used evaluation metrics AUROC and FPR95. The AUROC is a threshold-free metric, indicating the area under the receiver operating characteristic curve, with a higher value denoting better detection performance. FPR95, or FPR at TPR95, stands for the false positive rate when the true positive rate is 95%, and a smaller FPR95 is preferable. Both metrics are expressed as percentages.
Baseline Methods. Following previous works [7], we compare MOODv2 with the baseline algorithms that do not require fine-tuning including MSP [15], Energy [17], ODIN [41], MaxLogit [16], KL Matching [16], Residual, ReAct [18], and Mahalanobis [19].
4.1 One-Class OOD Detection
We start with the one-class OOD detection. For a given multi-class dataset of classes, we conduct one-class OOD tasks, where each task regards one of the classes as in-distribution and the remaining classes as out-of-distribution. We run our experiments on CIFAR-10 [39]. Table IV summarizes the average results across OOD classes of each ID class and the detailed class-wize performance is in the appendix.
It’s worth noting that all methods were pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k, which may have had some influence on the results to varying degrees. Nevertheless, we ensure consistent training strategies for all methods to ensure a fair comparison. Experimental results have demonstrated that MOODv2 achieves significant improvements across all ID classes even without fine-tuning the ID dataset. Notably, we achieved a remarkable 3.56% increase in the AUROC, reaching 98.20%, while simultaneously reducing the FPR95 by 15.14% to achieve an impressive 9.49%.
4.2 Multi-Class OOD Detection
For multi-class OOD Detection, we assume that ID samples are from a multi-class dataset, either CIFAR-10 [39] or ImageNet [35]. They are tested on external datasets as out-of-distribution, including OpenImage-O [31], Texture [36], iNaturalist [37] and ImageNet-O [38].
Results are shown in Tab. III. MOODv2 delivers outstanding results on CIFAR-10, achieving an impressive AUROC of 99.98% (0.35% enhancement) and the FPR95 reaches an astonishingly low rate of 0.07%, marking a substantial 95% reduction compared to the prior SOTA (1.39%). On ImageNet, MOODv2 also exhibited significant improvements, showcasing a remarkable 14.30% increase in AUROC, resulting in 95.68%. Additionally, the FPR95 saw a substantial reduction of 44.93%, reaching 20.22%.
In Fig. 7, we illustrate the distribution curves of OOD scores for ID and OOD datasets using various mainstream methods. A smaller overlap between ID and OOD data indicates superior OOD detection performance, while a larger overlap signifies weaker detection results. The ID curve (in red) for MOODv2 features a distinct peak at a higher position, resulting in minimal overlap with other OOD data, indicating a notable OOD detection capability. This success can be attributed to the high-quality ID feature representation.
5 Conclusion
In our work, we focus on the critical aspect of effective out-of-distribution (OOD) detection, which involves acquiring a robust in-distribution (ID) representation that distinguishes it from OOD samples. We conduct comprehensive experiments with distinct pretraining tasks and employ various OOD score functions. The findings indicate that feature representations pre-trained through reconstruction significantly enhance performance and reduce the performance gap among different score functions. This implies that even simple score functions can perform as well as complex ones when utilizing reconstruction-based pretext tasks. These findings hold promise for further development in OOD detection. Ultimately, we introduce the MOODv2 OOD detection framework, employing the masked image modeling pretext task, which achieves a remarkable 14.30% increase in AUROC, reaching 95.68% on ImageNet, and substantially improving CIFAR-10 to 99.98%.
References
- [1] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, “Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission,” in Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, 2015, pp. 1721–1730.
- [2] C. Phua, V. Lee, K. Smith, and R. Gayler, “A comprehensive survey of data mining-based fraud detection research,” arXiv preprint arXiv:1009.6119, 2010.
- [3] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, C. Xiao, A. Prakash, T. Kohno, and D. Song, “Robust physical-world attacks on deep learning visual classification,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 1625–1634.
- [4] J. Tack, S. Mo, J. Jeong, and J. Shin, “Csi: Novelty detection via contrastive learning on distributionally shifted instances,” Advances in neural information processing systems, vol. 33, pp. 11 839–11 852, 2020.
- [5] V. Sehwag, M. Chiang, and P. Mittal, “Ssd: A unified framework for self-supervised outlier detection,” arXiv preprint arXiv:2103.12051, 2021.
- [6] S. Fort, J. Ren, and B. Lakshminarayanan, “Exploring the limits of out-of-distribution detection,” Advances in Neural Information Processing Systems, vol. 34, pp. 7068–7081, 2021.
- [7] H. Wang, Z. Li, L. Feng, and W. Zhang, “Vim: Out-of-distribution with virtual-logit matching,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
- [8] J. Yang, K. Zhou, Y. Li, and Z. Liu, “Generalized out-of-distribution detection: A survey,” arXiv preprint arXiv:2110.11334, 2021.
- [9] M. B. Sariyildiz, K. Alahari, D. Larlus, and Y. Kalantidis, “Fake it till you make it: Learning transferable representations from synthetic imagenet clones,” 2023.
- [10] A. Saha, A. Subramanya, and H. Pirsiavash, “Hidden trigger backdoor attacks,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 07, pp. 11 957–11 965, Apr. 2020. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/6871
- [11] ——, “Hidden trigger backdoor attacks,” in Proceedings of the AAAI conference on artificial intelligence, vol. 34, no. 07, 2020, pp. 11 957–11 965.
- [12] H. Bao, L. Dong, and F. Wei, “Beit: Bert pre-training of image transformers,” arXiv preprint arXiv:2106.08254, 2021.
- [13] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
- [14] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, “Masked autoencoders are scalable vision learners,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 16 000–16 009.
- [15] D. Hendrycks and K. Gimpel, “A baseline for detecting misclassified and out-of-distribution examples in neural networks,” arXiv preprint arXiv:1610.02136, 2016.
- [16] D. Hendrycks, S. Basart, M. Mazeika, A. Zou, J. Kwon, M. Mostajabi, J. Steinhardt, and D. Song, “Scaling out-of-distribution detection for real-world settings,” arXiv preprint arXiv:1911.11132, 2019.
- [17] W. Liu, X. Wang, J. Owens, and Y. Li, “Energy-based out-of-distribution detection,” Advances in neural information processing systems, vol. 33, pp. 21 464–21 475, 2020.
- [18] Y. Sun, C. Guo, and Y. Li, “React: Out-of-distribution detection with rectified activations,” Advances in Neural Information Processing Systems, vol. 34, pp. 144–157, 2021.
- [19] K. Lee, K. Lee, H. Lee, and J. Shin, “A simple unified framework for detecting out-of-distribution samples and adversarial attacks,” Advances in neural information processing systems, vol. 31, 2018.
- [20] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
- [21] X. Chen, S. Xie, and K. He, “An empirical study of training self-supervised vision transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2021, pp. 9640–9649.
- [22] M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, M. Assran, N. Ballas, W. Galuba, R. Howes, P.-Y. Huang, S.-W. Li, I. Misra, M. Rabbat, V. Sharma, G. Synnaeve, H. Xu, H. Jegou, J. Mairal, P. Labatut, A. Joulin, and P. Bojanowski, “Dinov2: Learning robust visual features without supervision,” 2023.
- [23] Z. Peng, L. Dong, H. Bao, Q. Ye, and F. Wei, “Beit v2: Masked image modeling with vector-quantized visual tokenizers,” 2022.
- [24] I. Ndiour, N. Ahuja, and O. Tickoo, “Out-of-distribution detection with subspace techniques and probabilistic modeling of features,” arXiv preprint arXiv:2012.04250, 2020.
- [25] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum contrast for unsupervised visual representation learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 9729–9738.
- [26] Y. Li, Y. Jiang, Z. Li, and S.-T. Xia, “Backdoor learning: A survey,” IEEE Transactions on Neural Networks and Learning Systems, 2022.
- [27] T. DeVries and G. W. Taylor, “Learning confidence for out-of-distribution detection in neural networks,” arXiv preprint arXiv:1802.04865, 2018.
- [28] A. Zaeemzadeh, N. Bisagno, Z. Sambugaro, N. Conci, N. Rahnavard, and M. Shah, “Out-of-distribution detection using union of 1-dimensional subspaces,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 9452–9461.
- [29] R. Huang and Y. Li, “MOS: Towards scaling out-of-distribution detection for large semantic space,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8710–8719.
- [30] J. Li, P. Chen, Z. He, S. Yu, S. Liu, and J. Jia, “Rethinking out-of-distribution (ood) detection: Masked image modeling is all you need,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 11 578–11 589.
- [31] I. Krasin, T. Duerig, N. Alldrin, V. Ferrari, S. Abu-El-Haija, A. Kuznetsova, H. Rom, J. Uijlings, S. Popov, A. Veit, S. Belongie, V. Gomes, A. Gupta, C. Sun, G. Chechik, D. Cai, Z. Feng, D. Narayanan, and K. Murphy, “Openimages: A public dataset for large-scale multi-label and multi-class image classification.” Dataset available from https://github.com/openimages, 2017.
- [32] M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, and A. Vedaldi, “Describing textures in the wild,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 3606–3613.
- [33] G. Van Horn, O. Mac Aodha, Y. Song, Y. Cui, C. Sun, A. Shepard, H. Adam, P. Perona, and S. Belongie, “The inaturalist species classification and detection dataset,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8769–8778.
- [34] D. Hendrycks, K. Zhao, S. Basart, J. Steinhardt, and D. Song, “Natural adversarial examples,” CVPR, 2021.
- [35] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. S. Bernstein, A. C. Berg, and F. Li, “Imagenet large scale visual recognition challenge,” Int. J. Comput. Vis., 2015.
- [36] M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, and A. Vedaldi, “Describing textures in the wild,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3606–3613.
- [37] G. Van Horn, O. Mac Aodha, Y. Song, Y. Cui, C. Sun, A. Shepard, H. Adam, P. Perona, and S. Belongie, “The iNaturalist species classification and detection dataset,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8769–8778.
- [38] D. Hendrycks, K. Zhao, S. Basart, J. Steinhardt, and D. Song, “Natural adversarial examples,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 15 262–15 271.
- [39] A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” 2009.
- [40] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, “Places: A 10 million image database for scene recognition,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 6, pp. 1452–1464, 2017.
- [41] S. Liang, Y. Li, and R. Srikant, “Enhancing the reliability of out-of-distribution image detection in neural networks,” arXiv preprint arXiv:1706.02690, 2017.
Jingyao Li received the B.Eng. degree from Xi’an Jiaotong University. She is currently a Ph.D. student at Department of Computer Science and Engineering of the Chinese University of Hong Kong (CUHK), under the supervision of Prof. Jiaya Jia. She serves as a reviewer for CVPR, ECCV, ICCV and etc. Her research interests include computer vision and large language models. |
Pengguang Chen received the B.Eng. degree in Computer Science from Nanjing University and the Ph.D. degree from the Chinese University of Hong Kong (CUHK), under the supervision of Prof. Jiaya Jia. He is currently a researcher in SmartMore. He serves as a reviewer for CVPR, ICCV, ECCV, TPAMI. His research interests include neural architecture search, self-supervised learning, knowledge distillation and semantic segmentation. |
Shaozuo Yu is a Ph.D. student at Department of Computer Science and Engineering of the Chinese University of Hong Kong. He served as a program chair of the workshop and challenge on “Out-of-Distribution Generalization in Computer Vision” at ECCV’22. He served as a reviewer for CVPR, Neurips, and ICML. His research interests include multimodality, generative models, and robust vision. |
Shu Liu now serves as Co-Founder and Technical Head in SmartMore. He received the BS degree from Huazhong University of Science and Technology and the PhD degree from the Chinese University of Hong Kong. He was the winner of 2017 COCO Instance Segmentation Competition and received the Outstanding Reviewer of ICCV in 2019. He continuously served as a reviewer for TPAMI, CVPR, ICCV, NIPS, ICLR and etc. His research interests lie in deep learning and computer vision. He is a member of IEEE. |
Jiaya Jia received the Ph.D. degree in Computer Science from Hong Kong University of Science and Technology in 2004 and is currently a full professor in Department of Computer Science and Engineering at the Chinese University of Hong Kong (CUHK). He assumes the position of Associate Editor-in-Chief of IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) and is in the editorial board of International Journal of Computer Vision (IJCV). He continuously served as area chairs for ICCV, CVPR, AAAI, ECCV, and several other conferences for the organization. He was on program committees of major conferences in graphics and computational imaging, including ICCP, SIGGRAPH, and SIGGRAPH Asia. He is a Fellow of the IEEE. |
This supplementary material includes visualization of distribution curves, multi-class and one-class OOD detection results on CIFAR-10, etc., which are not included in the main paper due to page limitations.
Appendix A Distribution curves
For more comprehensive insights, we offer visual representations of distribution curves for OOD scores on both ID and OOD datasets in Fig. 7. A narrower overlap between ID and OOD data signifies superior OOD detection performance, whereas a wider overlap indicates weaker detection results. The ID curve, depicted in red, for the fine-tuned BEiT series [12] models, exhibits a distinctive peak at a higher position. This leads to minimal overlap with other OOD data, highlighting a remarkable OOD detection capability. This accomplishment can be attributed to the high-quality ID feature representation derived from masked image modeling.
Appendix B Details of Results on CIFAR-10
B.1 Multi-class OOD Detection
We employ CIFAR-10 [39] as the in-distribution dataset and evaluate pre-task texts on multiple challenging unnatural out-of-distribution datasets, including OpenImage-O [31], Texture [32], iNaturalist [33], and ImageNet-O [34]. Extensive validations with various pretraining methods and OOD score functions including MSP [15], Energy [17], ODIN [41], MaxLogit [16], KL Matching [16], Residual [7], ReAct [18], Mahalanobis [19] and ViM [7]. Results are in Tab. V. Our approach attains an impressive AUROC of 99.99% while concurrently reducing the FPR95 to a mere 0.03%.
B.2 One-class OOD Detection
We perform one-class OOD detection. In the context of a multi-class dataset with classes, we conduct one-class OOD tasks. Each task treats one of the classes as in-distribution and the remaining classes as out-of-distribution. Our experiments are conducted on CIFAR-10 [39] and provide the detailed class-wise performance of mainstream methods including KL-Marching (Tab. VI), Residual (Tab. VII), Mahalanobis (Tab. VIII) and ViM (Tab. IX).
It’s important to note that all methods were pre-trained on ImageNet-21k and subsequently fine-tuned on ImageNet-1k, which might have influenced the results to varying degrees. However, we ensure consistent training strategies for all methods to maintain a fair comparison. The experimental results demonstrate that MOODv2 achieves significant improvements across all ID classes, even without fine-tuning the ID dataset. Notably, we achieved a remarkable 3.56% increase in the state-of-the-art AUROC, reaching 98.20%, while simultaneously reducing FPR95 by 15.14%, achieving an impressive 9.49%.
Methods | Models | Texture | iNaturalist | ImageNet-O | OpenImage-O | Average | |||||
---|---|---|---|---|---|---|---|---|---|---|---|
AUROC | FPR95 | AUROC | FPR95 | AUROC | FPR95 | AUROC | FPR95 | AUROC | FPR95 | ||
MSP[15] | ViT[20] | 45.67 | 95.17 | 71.07 | 81.76 | 32.52 | 98.85 | 59.74 | 91.45 | 52.25 | 91.81 |
MoCov3[21] | 37.11 | 97.64 | 64.60 | 89.69 | 31.52 | 98.45 | 55.90 | 93.53 | 47.28 | 94.83 | |
DINOv2[22] | 70.37 | 58.33 | 87.10 | 37.77 | 70.61 | 63.25 | 78.27 | 51.58 | 76.59 | 52.73 | |
BEiTv2[23] | 57.67 | 88.31 | 82.53 | 55.54 | 52.06 | 89.55 | 72.72 | 70.71 | 66.24 | 76.03 | |
BEiT[12] | 51.64 | 91.09 | 73.85 | 74.02 | 47.82 | 90.75 | 65.40 | 80.81 | 59.68 | 84.17 | |
Energy[17] | ViT[20] | 31.16 | 97.89 | 48.95 | 97.92 | 37.22 | 97.85 | 45.29 | 96.36 | 40.65 | 97.50 |
MoCov3[21] | 24.97 | 98.93 | 44.74 | 98.29 | 38.06 | 95.45 | 43.90 | 95.49 | 37.92 | 97.04 | |
DINOv2[22] | 86.73 | 28.16 | 91.43 | 20.27 | 68.97 | 62.75 | 73.66 | 53.40 | 80.20 | 41.15 | |
BEiTv2[23] | 63.35 | 82.64 | 88.52 | 38.21 | 66.24 | 72.30 | 81.69 | 51.80 | 74.95 | 61.24 | |
BEiT[12] | 52.98 | 88.53 | 81.84 | 54.55 | 59.99 | 77.20 | 73.61 | 65.08 | 67.10 | 71.34 | |
MaxLogit[16] | ViT[20] | 41.21 | 95.95 | 67.83 | 86.04 | 32.58 | 98.80 | 56.64 | 92.94 | 49.56 | 93.43 |
MoCov3[21] | 32.94 | 98.22 | 61.79 | 92.21 | 31.65 | 98.45 | 53.32 | 94.26 | 44.92 | 95.78 | |
DINOv2[22] | 76.80 | 45.06 | 91.96 | 22.66 | 72.49 | 57.95 | 80.36 | 44.75 | 80.40 | 42.61 | |
BEiTv2[23] | 60.51 | 85.85 | 86.05 | 47.39 | 59.14 | 83.90 | 77.47 | 62.79 | 70.79 | 69.98 | |
BEiT[12] | 51.94 | 89.90 | 77.92 | 66.72 | 52.95 | 87.10 | 69.16 | 75.24 | 62.99 | 79.74 | |
KL-Matching[16] | ViT[20] | 98.00 | 10.64 | 94.23 | 35.86 | 92.99 | 32.40 | 94.68 | 27.92 | 94.97 | 26.71 |
MoCov3[21] | 97.61 | 13.97 | 94.65 | 35.51 | 92.05 | 38.25 | 94.22 | 33.48 | 94.64 | 30.30 | |
DINOv2[22] | 98.05 | 8.74 | 98.95 | 5.32 | 97.29 | 12.35 | 96.99 | 13.55 | 97.82 | 9.99 | |
BEiTv2[23] | 97.41 | 14.98 | 91.78 | 50.90 | 93.21 | 35.90 | 92.28 | 43.17 | 93.67 | 36.24 | |
BEiT[12] | 97.83 | 12.71 | 95.14 | 32.28 | 93.52 | 35.00 | 94.84 | 31.13 | 95.33 | 27.78 | |
Residual[7] | ViT[20] | 99.91 | 0.21 | 99.68 | 0.45 | 99.36 | 2.85 | 99.42 | 2.46 | 99.59 | 1.49 |
MoCov3[21] | 99.90 | 0.25 | 99.87 | 0.09 | 99.22 | 3.85 | 99.59 | 1.31 | 99.65 | 1.38 | |
DINOv2[22] | 99.98 | 0.04 | 100.00 | 0.01 | 99.99 | 0.05 | 99.97 | 0.18 | 99.98 | 0.07 | |
BEiTv2[23] | 99.98 | 0.04 | 100.00 | 0.00 | 99.79 | 0.90 | 99.92 | 0.27 | 99.92 | 0.30 | |
BEiT[12] | 99.99 | 0.02 | 100.00 | 0.00 | 99.96 | 0.10 | 99.99 | 0.01 | 99.99 | 0.03 | |
React[18] | ViT[20] | 35.97 | 96.26 | 69.01 | 87.91 | 36.65 | 97.75 | 54.14 | 93.11 | 48.94 | 93.76 |
MoCov3[21] | 25.74 | 98.90 | 46.11 | 98.29 | 37.60 | 95.55 | 44.63 | 95.46 | 38.52 | 97.05 | |
DINOv2[22] | 68.00 | 61.94 | 60.58 | 82.56 | 40.71 | 90.10 | 47.60 | 88.88 | 54.22 | 80.87 | |
BEiTv2[23] | 62.81 | 82.71 | 91.59 | 28.05 | 65.37 | 72.95 | 82.43 | 49.49 | 75.55 | 58.30 | |
BEiT[12] | 53.27 | 87.91 | 82.09 | 53.87 | 59.64 | 77.50 | 73.73 | 64.84 | 67.18 | 71.03 | |
Mahalanobis[19] | ViT[20] | 99.77 | 0.60 | 99.39 | 1.11 | 98.93 | 4.90 | 99.14 | 3.26 | 99.31 | 2.47 |
MoCov3[21] | 99.78 | 0.78 | 99.71 | 0.45 | 98.61 | 7.65 | 99.31 | 2.48 | 99.35 | 2.84 | |
DINOv2[22] | 99.98 | 0.06 | 100.00 | 0.00 | 99.99 | 0.00 | 99.97 | 0.16 | 99.99 | 0.05 | |
BEiTv2[23] | 99.95 | 0.06 | 99.99 | 0.02 | 99.61 | 1.90 | 99.82 | 0.77 | 99.84 | 0.69 | |
BEiT[12] | 99.99 | 0.00 | 100.00 | 0.00 | 99.96 | 0.05 | 99.98 | 0.05 | 99.98 | 0.03 | |
ViM[7] | ViT[20] | 99.91 | 0.23 | 99.72 | 0.38 | 99.38 | 2.65 | 99.49 | 2.31 | 99.63 | 1.39 |
MoCov3[21] | 99.93 | 0.16 | 99.92 | 0.03 | 99.40 | 2.75 | 99.69 | 1.03 | 99.73 | 0.99 | |
DINOv2[22] | 99.98 | 0.04 | 100.00 | 0.01 | 99.99 | 0.05 | 99.97 | 0.18 | 99.98 | 0.07 | |
BEiTv2[23] | 99.95 | 0.14 | 100.00 | 0.01 | 99.61 | 1.60 | 99.93 | 0.28 | 99.87 | 0.51 | |
BEiT[12] | 99.98 | 0.06 | 100.00 | 0.00 | 99.94 | 0.20 | 99.99 | 0.01 | 99.98 | 0.07 | |
Best | ViT[20] | 99.91 | 0.21 | 99.72 | 0.38 | 99.38 | 2.65 | 99.49 | 2.31 | 99.63 | 1.39 |
MoCov3[21] | 99.93 | 0.16 | 99.92 | 0.03 | 99.40 | 2.75 | 99.69 | 1.03 | 99.73 | 0.99 | |
DINOv2[22] | 99.98 | 0.04 | 100.00 | 0.00 | 99.99 | 0.00 | 99.97 | 0.16 | 99.99 | 0.05 | |
BEiTv2[23] | 99.98 | 0.04 | 100.00 | 0.00 | 99.79 | 0.90 | 99.93 | 0.27 | 99.92 | 0.30 | |
BEiT[12] | 99.99 | 0.00 | 100.00 | 0.00 | 99.96 | 0.05 | 99.99 | 0.01 | 99.99 | 0.03 |
Models | ID class | OOD class | Average | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |||
ViT[20] | 0 | - | 81.29 | 90.44 | 90.62 | 86.42 | 95.45 | 93.28 | 91.15 | 67.66 | 66.06 | 84.71 |
1 | 98.54 | - | 99.58 | 99.33 | 99.57 | 99.69 | 99.82 | 96.16 | 96.00 | 85.23 | 97.10 | |
2 | 90.41 | 97.50 | - | 87.18 | 76.75 | 95.02 | 90.45 | 81.47 | 96.76 | 98.16 | 90.41 | |
3 | 94.14 | 94.49 | 91.32 | - | 77.12 | 78.07 | 88.59 | 82.25 | 98.25 | 85.24 | 87.72 | |
4 | 96.15 | 98.55 | 92.46 | 89.63 | - | 95.80 | 94.34 | 66.06 | 98.45 | 96.45 | 91.99 | |
5 | 98.38 | 98.19 | 96.45 | 78.99 | 86.87 | - | 97.45 | 68.19 | 99.48 | 95.96 | 91.11 | |
6 | 96.68 | 97.44 | 92.05 | 88.20 | 83.14 | 95.27 | - | 96.14 | 95.03 | 97.92 | 93.54 | |
7 | 96.32 | 97.72 | 97.36 | 92.15 | 85.78 | 94.45 | 98.03 | - | 94.20 | 98.45 | 94.94 | |
8 | 89.88 | 71.86 | 97.40 | 95.86 | 98.82 | 98.45 | 93.19 | 98.04 | - | 80.89 | 91.60 | |
9 | 97.62 | 91.34 | 99.60 | 99.38 | 98.48 | 99.75 | 99.78 | 99.23 | 96.62 | - | 97.98 | |
MoCov3[21] | 0 | - | 87.10 | 87.79 | 90.33 | 89.95 | 94.07 | 91.11 | 72.18 | 69.31 | 69.45 | 83.48 |
1 | 93.05 | - | 98.12 | 97.41 | 98.20 | 98.26 | 99.03 | 94.47 | 93.38 | 75.58 | 94.17 | |
2 | 82.96 | 95.82 | - | 83.13 | 78.83 | 93.45 | 82.30 | 73.48 | 94.46 | 96.90 | 86.81 | |
3 | 91.08 | 93.10 | 90.56 | - | 74.01 | 74.04 | 84.32 | 79.70 | 93.94 | 93.13 | 85.99 | |
4 | 93.82 | 96.02 | 86.55 | 88.85 | - | 92.48 | 92.56 | 66.43 | 96.26 | 96.97 | 89.99 | |
5 | 94.32 | 95.45 | 91.58 | 74.17 | 88.34 | - | 94.31 | 69.63 | 97.36 | 98.02 | 89.24 | |
6 | 95.09 | 96.76 | 88.10 | 87.53 | 80.44 | 90.97 | - | 94.57 | 92.31 | 97.94 | 91.52 | |
7 | 91.62 | 96.60 | 92.38 | 91.56 | 80.46 | 92.20 | 96.91 | - | 92.62 | 95.95 | 92.25 | |
8 | 80.42 | 88.78 | 96.30 | 93.92 | 96.62 | 95.98 | 95.00 | 95.97 | - | 79.94 | 91.44 | |
9 | 92.33 | 79.04 | 97.95 | 97.19 | 98.14 | 98.28 | 98.73 | 97.74 | 88.23 | - | 94.18 | |
DINOv2[22] | 0 | - | 60.94 | 65.14 | 64.79 | 71.95 | 62.40 | 76.86 | 61.40 | 39.99 | 51.08 | 61.62 |
1 | 73.33 | - | 59.97 | 48.33 | 56.38 | 44.85 | 57.27 | 41.11 | 60.09 | 45.81 | 54.13 | |
2 | 69.07 | 55.66 | - | 49.75 | 44.54 | 46.53 | 51.42 | 43.47 | 57.03 | 53.23 | 52.30 | |
3 | 79.75 | 63.89 | 61.86 | - | 56.82 | 47.72 | 59.42 | 47.58 | 70.69 | 61.79 | 61.06 | |
4 | 81.90 | 68.04 | 59.55 | 59.30 | - | 57.66 | 54.34 | 53.14 | 73.53 | 68.04 | 63.95 | |
5 | 81.63 | 65.90 | 63.37 | 52.89 | 58.20 | - | 61.38 | 48.90 | 73.32 | 64.01 | 63.29 | |
6 | 86.28 | 71.81 | 62.40 | 57.83 | 51.69 | 56.97 | - | 54.87 | 81.90 | 74.10 | 66.43 | |
7 | 84.48 | 68.52 | 64.60 | 58.30 | 57.49 | 54.55 | 58.26 | - | 77.39 | 66.77 | 65.60 | |
8 | 64.17 | 71.12 | 75.01 | 73.31 | 79.28 | 71.02 | 84.92 | 71.20 | - | 61.16 | 72.35 | |
9 | 75.23 | 59.90 | 68.79 | 58.47 | 68.79 | 54.25 | 72.32 | 50.25 | 62.55 | - | 63.39 | |
BEiTv2[23] | 0 | - | 94.83 | 97.42 | 99.31 | 97.01 | 99.79 | 99.34 | 99.44 | 89.14 | 91.32 | 96.40 |
1 | 99.01 | - | 99.91 | 99.83 | 99.72 | 99.90 | 99.98 | 99.74 | 92.98 | 94.11 | 98.35 | |
2 | 85.99 | 99.23 | - | 98.03 | 81.97 | 99.25 | 93.81 | 93.23 | 96.98 | 99.43 | 94.21 | |
3 | 98.39 | 98.20 | 97.60 | - | 90.40 | 85.67 | 74.37 | 95.00 | 99.49 | 98.30 | 93.05 | |
4 | 99.57 | 99.62 | 98.82 | 88.31 | - | 99.31 | 99.03 | 77.26 | 99.32 | 99.88 | 95.68 | |
5 | 99.57 | 99.31 | 99.40 | 80.54 | 97.51 | - | 99.14 | 53.39 | 99.72 | 98.76 | 91.93 | |
6 | 99.46 | 99.75 | 97.00 | 96.55 | 97.07 | 99.04 | - | 99.35 | 99.71 | 99.91 | 98.65 | |
7 | 99.12 | 98.87 | 99.76 | 98.94 | 92.57 | 99.02 | 99.92 | - | 98.73 | 95.91 | 98.09 | |
8 | 88.49 | 93.17 | 99.83 | 99.74 | 99.82 | 99.94 | 99.94 | 99.86 | - | 96.15 | 97.44 | |
9 | 99.19 | 96.36 | 99.96 | 99.75 | 99.75 | 99.97 | 99.97 | 99.80 | 97.88 | - | 99.18 | |
BEiT[12] | 0 | - | 95.65 | 95.16 | 98.75 | 94.10 | 99.60 | 98.60 | 98.70 | 83.00 | 85.22 | 94.31 |
1 | 97.46 | - | 99.85 | 99.63 | 99.47 | 99.91 | 99.96 | 99.55 | 95.42 | 90.32 | 97.95 | |
2 | 96.98 | 98.44 | - | 89.21 | 77.98 | 98.19 | 97.53 | 72.68 | 97.65 | 99.17 | 91.98 | |
3 | 97.62 | 95.80 | 96.95 | - | 80.24 | 86.56 | 72.85 | 86.98 | 99.38 | 94.34 | 90.08 | |
4 | 99.40 | 98.45 | 98.57 | 98.12 | - | 98.92 | 96.00 | 70.81 | 98.42 | 93.32 | 94.67 | |
5 | 99.15 | 99.03 | 98.66 | 82.32 | 82.56 | - | 98.78 | 47.63 | 99.81 | 99.62 | 89.73 | |
6 | 99.32 | 99.56 | 96.90 | 96.87 | 93.96 | 99.06 | - | 99.56 | 99.78 | 99.89 | 98.32 | |
7 | 98.76 | 98.51 | 99.55 | 99.00 | 93.66 | 99.01 | 99.84 | - | 98.99 | 96.76 | 98.23 | |
8 | 92.28 | 92.62 | 99.44 | 99.40 | 99.72 | 99.80 | 99.81 | 99.44 | - | 92.60 | 97.24 | |
9 | 99.07 | 93.71 | 99.93 | 99.77 | 99.80 | 99.97 | 99.98 | 99.62 | 97.69 | - | 98.84 |
Models | ID class | OOD class | Average | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |||
ViT[20] | 0 | - | 88.70 | 96.73 | 99.02 | 96.98 | 99.82 | 98.77 | 98.21 | 66.32 | 79.11 | 91.52 |
1 | 98.03 | - | 99.97 | 99.93 | 99.92 | 99.99 | 99.98 | 99.87 | 95.48 | 69.60 | 95.86 | |
2 | 96.51 | 99.92 | - | 94.67 | 75.60 | 98.06 | 89.95 | 90.72 | 99.08 | 99.83 | 93.82 | |
3 | 98.54 | 99.70 | 94.27 | - | 75.83 | 66.18 | 89.15 | 81.41 | 99.56 | 98.99 | 89.29 | |
4 | 99.25 | 99.85 | 93.29 | 95.01 | - | 95.08 | 95.51 | 70.54 | 99.40 | 99.54 | 94.16 | |
5 | 99.74 | 99.97 | 98.38 | 87.74 | 89.16 | - | 98.40 | 85.77 | 99.92 | 99.85 | 95.44 | |
6 | 99.44 | 99.96 | 93.81 | 95.33 | 88.84 | 98.00 | - | 97.18 | 99.79 | 99.90 | 96.92 | |
7 | 98.85 | 99.64 | 97.53 | 95.31 | 76.90 | 91.92 | 99.25 | - | 98.41 | 98.70 | 95.17 | |
8 | 90.11 | 89.74 | 99.59 | 99.72 | 99.56 | 99.96 | 99.60 | 99.65 | - | 85.70 | 95.96 | |
9 | 98.11 | 85.44 | 99.96 | 99.93 | 99.80 | 99.99 | 99.98 | 99.75 | 96.08 | - | 97.67 | |
MoCov3[21] | 0 | - | 96.11 | 97.08 | 99.13 | 98.40 | 99.75 | 98.56 | 99.00 | 73.05 | 87.16 | 94.25 |
1 | 96.02 | - | 99.79 | 99.76 | 99.76 | 99.90 | 99.82 | 99.71 | 92.94 | 68.77 | 95.16 | |
2 | 93.25 | 99.90 | - | 94.04 | 80.49 | 97.94 | 88.20 | 91.06 | 98.09 | 99.60 | 93.62 | |
3 | 97.15 | 99.60 | 93.31 | - | 79.89 | 71.78 | 87.59 | 86.66 | 98.02 | 98.72 | 90.30 | |
4 | 98.28 | 99.91 | 92.84 | 93.95 | - | 96.47 | 94.59 | 73.27 | 98.35 | 99.53 | 94.13 | |
5 | 99.12 | 99.90 | 97.62 | 84.00 | 89.42 | - | 98.45 | 89.13 | 99.38 | 99.45 | 95.16 | |
6 | 99.38 | 99.97 | 95.00 | 95.30 | 93.13 | 98.72 | - | 98.83 | 99.66 | 99.90 | 97.77 | |
7 | 97.23 | 99.70 | 96.70 | 95.15 | 76.84 | 93.22 | 98.94 | - | 97.06 | 98.53 | 94.82 | |
8 | 87.65 | 95.39 | 99.26 | 99.45 | 99.47 | 99.76 | 99.27 | 99.57 | - | 89.66 | 96.61 | |
9 | 96.24 | 86.59 | 99.84 | 99.83 | 99.84 | 99.91 | 99.88 | 99.77 | 92.68 | - | 97.18 | |
DINOv2[22] | 0 | - | 72.83 | 64.15 | 68.05 | 65.47 | 68.53 | 68.80 | 70.10 | 54.59 | 69.72 | 66.92 |
1 | 66.18 | - | 66.00 | 56.84 | 62.76 | 58.68 | 60.05 | 57.11 | 54.97 | 52.30 | 59.43 | |
2 | 66.58 | 77.06 | - | 57.10 | 45.49 | 56.78 | 49.57 | 62.65 | 67.99 | 76.34 | 62.17 | |
3 | 74.83 | 77.23 | 59.35 | - | 52.76 | 50.60 | 51.10 | 63.03 | 71.63 | 76.36 | 64.10 | |
4 | 82.00 | 86.75 | 63.82 | 67.27 | - | 66.91 | 57.16 | 71.45 | 80.04 | 85.45 | 73.43 | |
5 | 79.01 | 80.38 | 61.21 | 54.33 | 54.49 | - | 54.53 | 64.44 | 75.72 | 80.20 | 67.15 | |
6 | 85.87 | 86.34 | 66.60 | 66.63 | 56.40 | 66.99 | - | 75.17 | 84.70 | 87.00 | 75.08 | |
7 | 76.76 | 76.51 | 59.87 | 56.22 | 49.64 | 53.88 | 52.70 | - | 73.05 | 72.37 | 63.44 | |
8 | 62.96 | 74.15 | 73.59 | 72.77 | 74.25 | 72.95 | 77.20 | 76.44 | - | 71.13 | 72.83 | |
9 | 68.96 | 59.80 | 70.76 | 61.83 | 68.02 | 62.95 | 66.87 | 58.87 | 57.35 | - | 63.94 | |
BEiTv2[23] | 0 | - | 98.30 | 99.41 | 99.91 | 99.60 | 99.98 | 99.86 | 99.68 | 91.03 | 93.81 | 97.95 |
1 | 99.66 | - | 100.00 | 99.99 | 99.99 | 99.99 | 100.00 | 99.99 | 98.49 | 78.72 | 97.43 | |
2 | 97.60 | 99.96 | - | 99.17 | 93.13 | 99.48 | 97.72 | 98.58 | 99.58 | 99.92 | 98.35 | |
3 | 99.74 | 99.87 | 98.56 | - | 94.77 | 80.80 | 94.73 | 97.37 | 99.89 | 99.90 | 96.18 | |
4 | 99.83 | 99.90 | 98.68 | 99.06 | - | 99.14 | 99.16 | 87.86 | 99.83 | 99.88 | 98.15 | |
5 | 99.94 | 99.97 | 99.62 | 91.48 | 98.24 | - | 99.70 | 96.71 | 99.95 | 99.81 | 98.38 | |
6 | 99.89 | 99.99 | 99.39 | 99.25 | 99.14 | 99.68 | - | 99.83 | 99.94 | 99.98 | 99.68 | |
7 | 99.67 | 99.82 | 99.76 | 99.75 | 97.38 | 99.45 | 99.98 | - | 99.67 | 99.66 | 99.46 | |
8 | 97.13 | 98.60 | 99.92 | 99.98 | 99.95 | 99.97 | 99.98 | 99.97 | - | 95.87 | 99.04 | |
9 | 99.36 | 95.53 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 | 99.99 | 98.54 | - | 99.27 | |
BEiT[12] | 0 | - | 98.74 | 99.39 | 99.91 | 99.59 | 99.96 | 99.83 | 99.77 | 91.23 | 93.39 | 97.98 |
1 | 99.32 | - | 100.00 | 99.98 | 99.98 | 100.00 | 100.00 | 99.97 | 99.02 | 88.68 | 98.55 | |
2 | 99.25 | 99.97 | - | 98.32 | 90.25 | 99.21 | 97.47 | 98.08 | 99.79 | 99.88 | 98.02 | |
3 | 99.78 | 99.85 | 99.09 | - | 94.53 | 83.32 | 94.51 | 97.40 | 99.94 | 99.60 | 96.45 | |
4 | 99.88 | 99.90 | 98.82 | 98.63 | - | 98.89 | 98.95 | 93.34 | 99.88 | 99.78 | 98.67 | |
5 | 99.98 | 99.88 | 99.77 | 90.61 | 97.43 | - | 99.60 | 93.86 | 99.99 | 99.83 | 97.88 | |
6 | 99.95 | 100.00 | 99.28 | 99.13 | 98.97 | 99.67 | - | 99.97 | 99.99 | 99.99 | 99.66 | |
7 | 99.55 | 99.45 | 99.76 | 99.41 | 97.82 | 99.04 | 99.94 | - | 99.49 | 99.13 | 99.29 | |
8 | 95.97 | 97.99 | 99.95 | 99.95 | 99.90 | 99.98 | 99.98 | 99.96 | - | 96.03 | 98.86 | |
9 | 99.38 | 94.73 | 100.00 | 99.98 | 99.98 | 100.00 | 100.00 | 99.85 | 98.85 | - | 99.20 |
Models | ID class | OOD class | Average | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |||
ViT[20] | 0 | - | 89.17 | 95.53 | 98.62 | 95.81 | 99.70 | 97.89 | 97.15 | 65.74 | 78.81 | 90.94 |
1 | 98.20 | - | 99.94 | 99.91 | 99.86 | 99.98 | 99.96 | 99.79 | 96.03 | 71.41 | 96.12 | |
2 | 96.19 | 99.84 | - | 94.76 | 77.37 | 98.03 | 89.80 | 90.19 | 98.81 | 99.68 | 93.85 | |
3 | 98.08 | 99.40 | 93.07 | - | 75.92 | 68.55 | 87.75 | 81.75 | 99.21 | 98.35 | 89.12 | |
4 | 99.19 | 99.81 | 93.59 | 95.16 | - | 96.03 | 95.52 | 70.51 | 99.35 | 99.54 | 94.30 | |
5 | 99.65 | 99.91 | 98.30 | 88.92 | 91.23 | - | 98.31 | 87.68 | 99.86 | 99.72 | 95.95 | |
6 | 99.30 | 99.93 | 93.91 | 95.23 | 89.50 | 98.04 | - | 96.95 | 99.64 | 99.82 | 96.93 | |
7 | 98.62 | 99.50 | 97.22 | 95.53 | 77.46 | 92.86 | 99.03 | - | 98.20 | 98.59 | 95.22 | |
8 | 90.26 | 90.45 | 99.48 | 99.67 | 99.47 | 99.93 | 99.48 | 99.57 | - | 87.26 | 96.18 | |
9 | 98.16 | 86.56 | 99.93 | 99.91 | 99.76 | 99.99 | 99.97 | 99.70 | 96.39 | - | 97.82 | |
MoCov3[21] | 0 | - | 95.82 | 95.69 | 98.75 | 97.49 | 99.58 | 97.83 | 98.04 | 71.17 | 85.75 | 93.35 |
1 | 95.59 | - | 99.66 | 99.69 | 99.67 | 99.85 | 99.74 | 99.50 | 92.72 | 68.79 | 95.02 | |
2 | 92.17 | 99.77 | - | 93.66 | 80.97 | 97.58 | 87.61 | 90.67 | 97.16 | 99.21 | 93.20 | |
3 | 95.87 | 99.33 | 91.17 | - | 78.08 | 71.77 | 86.00 | 84.32 | 96.86 | 98.04 | 89.05 | |
4 | 97.97 | 99.79 | 92.35 | 94.00 | - | 96.40 | 94.38 | 73.76 | 98.11 | 99.26 | 94.00 | |
5 | 98.66 | 99.78 | 97.00 | 84.31 | 90.24 | - | 97.96 | 89.13 | 98.93 | 99.14 | 95.02 | |
6 | 99.13 | 99.94 | 94.33 | 95.19 | 92.56 | 98.56 | - | 98.38 | 99.45 | 99.82 | 97.48 | |
7 | 96.61 | 99.55 | 96.01 | 94.83 | 76.00 | 92.97 | 98.57 | - | 96.47 | 98.15 | 94.35 | |
8 | 87.08 | 95.47 | 98.86 | 99.27 | 99.20 | 99.62 | 98.97 | 99.20 | - | 89.75 | 96.38 | |
9 | 95.98 | 86.94 | 99.74 | 99.77 | 99.76 | 99.87 | 99.82 | 99.63 | 92.63 | - | 97.13 | |
DINOv2[22] | 0 | - | 73.26 | 64.39 | 68.07 | 65.83 | 68.69 | 69.50 | 70.14 | 54.72 | 70.07 | 67.19 |
1 | 66.64 | - | 66.59 | 57.46 | 63.19 | 58.81 | 60.71 | 57.10 | 55.24 | 52.26 | 59.78 | |
2 | 67.18 | 77.06 | - | 57.44 | 45.86 | 57.11 | 49.82 | 63.01 | 68.56 | 76.62 | 62.52 | |
3 | 75.38 | 77.23 | 59.51 | - | 52.41 | 50.47 | 50.53 | 63.05 | 72.42 | 76.62 | 64.18 | |
4 | 82.15 | 86.71 | 63.94 | 67.52 | - | 67.12 | 57.42 | 71.56 | 80.55 | 86.07 | 73.67 | |
5 | 77.37 | 79.22 | 60.55 | 53.76 | 54.45 | - | 54.10 | 63.90 | 75.10 | 78.91 | 66.37 | |
6 | 86.06 | 86.59 | 66.51 | 66.79 | 56.39 | 67.45 | - | 75.36 | 84.95 | 87.54 | 75.29 | |
7 | 77.61 | 76.62 | 59.83 | 56.58 | 49.58 | 54.03 | 53.04 | - | 74.18 | 72.88 | 63.82 | |
8 | 63.06 | 74.53 | 74.59 | 73.16 | 75.12 | 73.67 | 78.29 | 76.87 | - | 71.34 | 73.40 | |
9 | 69.38 | 60.16 | 70.57 | 62.54 | 67.91 | 63.32 | 67.09 | 58.82 | 58.01 | - | 64.20 | |
BEiTv2[23] | 0 | - | 98.36 | 99.38 | 99.94 | 99.70 | 99.99 | 99.90 | 99.70 | 91.92 | 94.07 | 98.11 |
1 | 99.69 | - | 100.00 | 99.99 | 99.99 | 100.00 | 100.00 | 99.99 | 98.96 | 80.71 | 97.70 | |
2 | 97.78 | 99.94 | - | 99.37 | 94.03 | 99.54 | 98.04 | 98.32 | 99.54 | 99.88 | 98.49 | |
3 | 99.72 | 99.89 | 98.56 | - | 94.80 | 81.26 | 95.70 | 96.85 | 99.85 | 99.87 | 96.28 | |
4 | 99.79 | 99.92 | 98.69 | 99.20 | - | 99.23 | 99.34 | 88.58 | 99.84 | 99.88 | 98.27 | |
5 | 99.94 | 99.98 | 99.57 | 92.56 | 98.39 | - | 99.73 | 97.18 | 99.92 | 99.85 | 98.57 | |
6 | 99.87 | 100.00 | 99.44 | 99.37 | 99.25 | 99.71 | - | 99.80 | 99.92 | 99.96 | 99.70 | |
7 | 99.68 | 99.80 | 99.74 | 99.77 | 97.64 | 99.50 | 99.96 | - | 99.69 | 99.62 | 99.49 | |
8 | 97.51 | 98.74 | 99.90 | 99.97 | 99.92 | 99.95 | 99.95 | 99.91 | - | 96.76 | 99.18 | |
9 | 99.41 | 95.56 | 100.00 | 99.99 | 99.99 | 100.00 | 100.00 | 99.98 | 98.60 | - | 99.28 | |
BEiT[12] | 0 | - | 98.65 | 99.41 | 99.91 | 99.59 | 99.96 | 99.86 | 99.72 | 91.50 | 94.10 | 98.08 |
1 | 99.45 | - | 100.00 | 99.98 | 99.98 | 100.00 | 100.00 | 99.96 | 99.14 | 89.24 | 98.64 | |
2 | 99.25 | 99.95 | - | 98.42 | 90.61 | 99.25 | 97.69 | 98.32 | 99.78 | 99.87 | 98.13 | |
3 | 99.77 | 99.80 | 99.09 | - | 95.07 | 84.11 | 94.70 | 97.80 | 99.94 | 99.63 | 96.66 | |
4 | 99.89 | 99.89 | 98.88 | 98.85 | - | 99.06 | 99.09 | 93.79 | 99.90 | 99.77 | 98.79 | |
5 | 99.98 | 99.83 | 99.75 | 91.54 | 97.74 | - | 99.61 | 95.89 | 99.99 | 99.79 | 98.23 | |
6 | 99.95 | 100.00 | 99.33 | 99.19 | 99.06 | 99.67 | - | 99.96 | 99.98 | 99.98 | 99.68 | |
7 | 99.61 | 99.48 | 99.78 | 99.46 | 97.85 | 99.14 | 99.94 | - | 99.53 | 99.18 | 99.33 | |
8 | 96.51 | 98.01 | 99.95 | 99.96 | 99.92 | 99.98 | 99.98 | 99.95 | - | 96.05 | 98.92 | |
9 | 99.47 | 94.84 | 100.00 | 99.99 | 99.98 | 100.00 | 100.00 | 99.90 | 98.97 | - | 99.24 |
Models | ID class | OOD class | Average | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |||
ViT[20] | 0 | - | 90.64 | 97.03 | 99.20 | 97.19 | 99.86 | 98.92 | 98.45 | 67.16 | 80.51 | 92.11 |
1 | 97.97 | - | 99.98 | 99.93 | 99.93 | 99.99 | 99.99 | 99.89 | 95.22 | 67.49 | 95.60 | |
2 | 96.31 | 99.93 | - | 94.88 | 74.69 | 98.10 | 89.81 | 90.86 | 99.10 | 99.85 | 93.72 | |
3 | 98.61 | 99.76 | 94.39 | - | 74.04 | 64.74 | 88.69 | 81.14 | 99.59 | 99.07 | 88.89 | |
4 | 99.25 | 99.86 | 93.62 | 95.27 | - | 95.28 | 95.60 | 71.21 | 99.41 | 99.54 | 94.34 | |
5 | 99.75 | 99.98 | 98.47 | 88.47 | 89.05 | - | 98.42 | 86.11 | 99.93 | 99.87 | 95.56 | |
6 | 99.46 | 99.97 | 94.05 | 95.60 | 88.80 | 98.08 | - | 97.32 | 99.80 | 99.91 | 97.00 | |
7 | 98.90 | 99.69 | 97.61 | 95.49 | 75.97 | 91.93 | 99.27 | - | 98.47 | 98.79 | 95.12 | |
8 | 90.10 | 90.70 | 99.63 | 99.77 | 99.59 | 99.97 | 99.63 | 99.68 | - | 86.14 | 96.14 | |
9 | 98.12 | 86.67 | 99.96 | 99.93 | 99.79 | 99.99 | 99.99 | 99.75 | 96.03 | - | 97.80 | |
MoCov3[21] | 0 | - | 97.10 | 97.66 | 99.36 | 98.74 | 99.83 | 98.88 | 99.27 | 74.06 | 88.58 | 94.83 |
1 | 96.25 | - | 99.82 | 99.79 | 99.79 | 99.91 | 99.84 | 99.74 | 93.06 | 67.99 | 95.13 | |
2 | 93.80 | 99.93 | - | 94.63 | 80.47 | 98.24 | 88.49 | 91.63 | 98.37 | 99.69 | 93.92 | |
3 | 97.60 | 99.68 | 94.10 | - | 79.97 | 72.24 | 87.98 | 87.63 | 98.32 | 98.96 | 90.72 | |
4 | 98.49 | 99.93 | 93.56 | 94.53 | - | 96.84 | 94.90 | 74.27 | 98.52 | 99.61 | 94.52 | |
5 | 99.29 | 99.93 | 97.94 | 85.08 | 89.84 | - | 98.61 | 89.87 | 99.49 | 99.57 | 95.51 | |
6 | 99.46 | 99.97 | 95.39 | 95.71 | 93.39 | 98.87 | - | 98.97 | 99.69 | 99.90 | 97.93 | |
7 | 97.57 | 99.77 | 97.08 | 95.65 | 76.65 | 93.74 | 99.06 | - | 97.39 | 98.74 | 95.07 | |
8 | 88.58 | 96.14 | 99.39 | 99.55 | 99.55 | 99.81 | 99.39 | 99.64 | - | 90.57 | 96.96 | |
9 | 96.55 | 87.81 | 99.87 | 99.86 | 99.86 | 99.93 | 99.90 | 99.79 | 92.96 | - | 97.39 | |
DINOv2[22] | 0 | - | 72.84 | 64.24 | 68.20 | 65.56 | 68.69 | 68.90 | 70.20 | 54.60 | 69.76 | 67.00 |
1 | 66.30 | - | 66.15 | 57.07 | 62.92 | 58.93 | 60.22 | 57.30 | 55.07 | 52.37 | 59.59 | |
2 | 66.61 | 77.05 | - | 57.20 | 45.49 | 56.89 | 49.57 | 62.70 | 68.01 | 76.37 | 62.21 | |
3 | 74.84 | 77.16 | 59.30 | - | 52.69 | 50.62 | 51.02 | 63.03 | 71.61 | 76.33 | 64.06 | |
4 | 82.04 | 86.74 | 63.85 | 67.37 | - | 67.00 | 57.18 | 71.49 | 80.07 | 85.48 | 73.47 | |
5 | 79.00 | 80.30 | 61.17 | 54.32 | 54.41 | - | 54.44 | 64.39 | 75.69 | 80.16 | 67.10 | |
6 | 85.90 | 86.33 | 66.63 | 66.71 | 56.41 | 67.08 | - | 75.22 | 84.73 | 87.03 | 75.12 | |
7 | 76.80 | 76.47 | 59.90 | 56.30 | 49.63 | 53.98 | 52.69 | - | 73.07 | 72.36 | 63.47 | |
8 | 63.00 | 74.18 | 73.68 | 72.92 | 74.35 | 73.11 | 77.30 | 76.54 | - | 71.19 | 72.92 | |
9 | 69.06 | 59.79 | 70.88 | 62.04 | 68.16 | 63.16 | 67.02 | 59.00 | 57.41 | - | 64.06 | |
BEiTv2[23] | 0 | - | 97.86 | 99.24 | 99.90 | 99.48 | 99.97 | 99.73 | 99.61 | 87.59 | 89.06 | 96.94 |
1 | 99.49 | - | 99.99 | 99.99 | 99.98 | 99.98 | 100.00 | 99.99 | 97.62 | 66.23 | 95.92 | |
2 | 95.66 | 99.95 | - | 98.84 | 90.50 | 98.98 | 95.46 | 97.71 | 99.07 | 99.77 | 97.33 | |
3 | 99.44 | 99.79 | 98.00 | - | 93.19 | 72.65 | 90.51 | 96.39 | 99.65 | 99.32 | 94.33 | |
4 | 99.72 | 99.87 | 98.37 | 98.87 | - | 98.76 | 98.78 | 84.32 | 99.72 | 99.75 | 97.57 | |
5 | 99.86 | 99.95 | 99.47 | 91.21 | 97.83 | - | 99.52 | 95.97 | 99.88 | 99.49 | 98.13 | |
6 | 99.85 | 99.99 | 99.26 | 99.14 | 98.93 | 99.60 | - | 99.81 | 99.91 | 99.92 | 99.60 | |
7 | 99.55 | 99.78 | 99.68 | 99.72 | 96.47 | 99.17 | 99.96 | - | 99.45 | 99.34 | 99.24 | |
8 | 96.02 | 98.31 | 99.90 | 99.97 | 99.94 | 99.96 | 99.96 | 99.97 | - | 93.04 | 98.56 | |
9 | 99.17 | 95.57 | 100.00 | 99.99 | 99.99 | 99.99 | 100.00 | 99.99 | 98.21 | - | 99.21 | |
BEiT[12] | 0 | - | 98.76 | 99.35 | 99.90 | 99.48 | 99.95 | 99.76 | 99.78 | 88.74 | 91.93 | 97.52 |
1 | 99.15 | - | 100.00 | 99.98 | 99.97 | 100.00 | 100.00 | 99.97 | 98.78 | 84.81 | 98.07 | |
2 | 99.12 | 99.98 | - | 97.89 | 88.04 | 98.88 | 96.49 | 97.93 | 99.73 | 99.85 | 97.55 | |
3 | 99.77 | 99.83 | 98.95 | - | 93.35 | 78.82 | 92.42 | 97.56 | 99.94 | 99.38 | 95.56 | |
4 | 99.87 | 99.90 | 98.65 | 98.48 | - | 98.66 | 98.75 | 93.22 | 99.85 | 99.73 | 98.57 | |
5 | 99.97 | 99.88 | 99.73 | 90.68 | 97.10 | - | 99.49 | 94.34 | 99.99 | 99.76 | 97.88 | |
6 | 99.96 | 100.00 | 99.27 | 99.08 | 98.83 | 99.64 | - | 99.98 | 99.99 | 99.99 | 99.64 | |
7 | 99.53 | 99.46 | 99.75 | 99.39 | 97.50 | 98.89 | 99.92 | - | 99.46 | 99.03 | 99.22 | |
8 | 95.53 | 98.13 | 99.94 | 99.95 | 99.87 | 99.97 | 99.98 | 99.96 | - | 95.62 | 98.77 | |
9 | 99.32 | 95.30 | 100.00 | 99.98 | 99.97 | 100.00 | 100.00 | 99.86 | 98.67 | - | 99.23 |