1. Introduction
Recent advancements in Convolutional Neural Networks (CNNs) have enabled revolutionary growth in several fields of machine vision and artificial intelligence [1]. Their characteristic ability to generalize well in difficult visual tasks has been a key component in the diffusion of this technology into numerous applications based on the analysis of 2D/3D data. One of the first research questions raised [2] among the deep learning community was how the “knowledge” stored inside a neural network can be efficiently transferred to another model. From a very early stage, it was obvious [3] that such a capability could provide a path for the adoption of deep learning in several fields beyond the typical tasks of computer vision. By enabling a neural model to harness the information stored in another trained network, the latter effectively acts as an extra source of information [4]. This can facilitate training with less data, improve the accuracy of the trained models, enable smaller and more efficient models better suited to the limitations of edge computing, etc.
For a very large number of important applications, the necessity of acquiring a large body of training data in order to benefit from learning features tailored to the task at hand renders the end-to-end learning of deep models prohibitive. Data acquisition and annotation in several fields, such as biometric recognition, forensics, and biomedical imaging, is notoriously difficult due to various restrictions and limitations [5,6,7] (e.g., the cost of specialized personnel, privacy issues, etc.). Therefore, research and development in such fields can greatly benefit [8] from techniques that enable powerful algorithms such as CNNs to learn efficiently from limited datasets, or equivalently increase the performance of the current techniques for training under such restrictions.
Transfer learning has been an active field for several decades, preceding the development of deep learning, and has produced several methods for transferring knowledge from a source domain/task to a target domain/task [9]. Many methods and approaches, especially those oriented to knowledge transfer between deep neural networks with the same topology, exhibit significant overlap with the research field of Domain Adaptation [10]. In this intersection, transfer learning can be formulated as the quest for an appropriate transformation of the representations learned over the source domain, so that they match the distribution and characteristics of the target domain/task. Although various approaches have been proposed for deep transfer learning [11], the most widely used technique is that of directly transferring (copying) the coefficients from a model trained on a source task to a target network of equivalent architecture, intended for a different (target) task. The latter model typically undergoes a “fine-tuning” process, where only the last layers are updated aggressively, while the transferred layers are only allowed to make very small modifications to their coefficients. This strategy has a dual objective: (1) to initialize most of the target model’s parameters to a more relevant initial state that can already produce meaningful representations of the visual information, and (2) to indirectly act as a regularization mechanism, forcing the optimization to move in a subspace of solutions largely dictated by the coefficients transferred from the first model. Despite its simplicity, this is often a very successful strategy that produces models with better generalization than regular training (from random initial conditions), especially when dealing with limited data. Some important limitations naturally occur, though, since the quality of the produced solution is related to the similarity between the dataset/task used to pre-train part of the model and the target dataset/task [2,12]. Most importantly, this method does not enable knowledge transfer between different model architectures, and is therefore not appropriate for various applications, such as training smaller models that could benefit from larger “expert” models trained on the same or similar tasks.
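As a concrete illustration of this widely used strategy, the following PyTorch sketch copies the coefficients of a pre-trained network into a target model of the same architecture and fine-tunes with a much smaller learning rate on the transferred layers. The architecture, layer names, and learning rates are illustrative assumptions, not a prescription from this work.

```python
# Minimal sketch of weight transfer + discriminative fine-tuning (assumed setup).
import torch
import torchvision.models as models

source = models.resnet18(pretrained=True)   # model trained on the source task
target = models.resnet18(num_classes=10)    # same architecture, new target task

# Copy all coefficients except the task-specific output layer ("fc" in ResNet).
state = {k: v for k, v in source.state_dict().items() if not k.startswith("fc.")}
target.load_state_dict(state, strict=False)

# Fine-tune: update the new head aggressively, the transferred layers slowly.
optimizer = torch.optim.SGD([
    {"params": target.fc.parameters(), "lr": 1e-2},          # new output layer
    {"params": [p for n, p in target.named_parameters()
                if not n.startswith("fc.")], "lr": 1e-4},     # transferred layers
], momentum=0.9)
```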
A solution towards this direction was proposed by Hinton et al. with the method of Knowledge Distillation [
4], an approach that allows individual Neural Networks to gain knowledge from multiple sources, such as external data, large models trained in the same task and even model ensembles. This approach is based on the relaxation of the classification task, by softening the target response of the model’s output through temperature scaling. The authors argue that besides targeting just to a large response for the correct output node, the trained model can gain more insights into the underlying information structure of the task, by aiming to replicate the softened response of an expert model. An expert model is considered a trained (large) model or ensemble that exhibits good performance on the target task. The goal of this process is to train a smaller model that is able to generalize better, using less data compared to a regular training. The authors demonstrated that their approach acts as an efficient regularization mechanism, and it has since been considered as a very successful method for transferring knowledge between models.
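For reference, the distillation objective described above can be sketched as follows. The temperature T and mixing weight alpha are hypothetical values; the T² rescaling of the soft term follows the description in [4].

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Soft-target objective of Hinton et al.: KL divergence between
    temperature-softened output distributions, mixed with the hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                      # restore gradient magnitude after scaling by T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```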
Similar to Knowledge Distillation, most methods for transfer learning [9] and Domain Adaptation [10] aim to manipulate the global image representation produced by the trained CNN in the final layer(s) of the model’s architecture, prior to any task-specific output layer. In this work, partly motivated by recent works demonstrating that local descriptors can be used to construct effective regularization functions that manipulate the style [13] and texture [14] of images in generative tasks, we investigate ways to utilize geometric regularization of local features from intermediate layers of the trained CNN as a mechanism for knowledge transfer between models. We explore various ways to construct computationally efficient regularization functions with geometric context. Drawing inspiration from the manifold-to-manifold comparison literature, we formulate lightweight regularization terms that incentivize various sections of a “student” CNN to gradually learn how to generate local features with geometry similar to those of another, more knowledgeable, model (the “instructor”), which is pre-trained on the same or a different task. The investigated functions act directly on the local features produced in the intermediate representations of the trained CNN, by imposing restrictions on the neighboring relations of the feature vectors. In this way, the regularization mechanism aims to manipulate the local manifolds of the activations in various layers within the model. An overview of the proposed regularization scheme is shown in Figure 1.
We investigate the efficiency of different criteria for the definition of local neighborhoods and propose a technique that proves very efficient in transferring knowledge from the instructor to the student model. An important aspect of this approach is that the only requirement on the architectures of student and instructor is to have matching spatial dimensions in the layers chosen for regularization. To the authors’ knowledge, this is the first method for knowledge transfer between CNNs that utilizes geometric regularization of the local activations. The proposed method is independent of the target task of the training process, as well as of the features’ dimensionality and the models’ depth and architecture. Finally, it is demonstrated that it can be used complementarily to distillation or similar methods, thus enhancing the efficiency of knowledge transfer in various applications. Preliminary results from a partial investigation of a subset of the presented methods were recently presented in a conference paper of ours [15]. The current work provides a significantly extended description of the proposed regularization scheme, formulating and evaluating additional geometric criteria that offer valuable insights into the important parameters of the regularization process. It also includes a significantly extended experimental section, evaluating different scenarios of knowledge transfer, such as knowledge transfer from expert models, transfer between experts, and transfer from external data. Additionally, we provide experimental evidence regarding the effects of different formulations and geometric criteria on the regularization problem, offering some guidelines for incorporating geometric regularization of local activations into training tasks.
The rest of the paper is organized as follows: In Section 2, we provide a brief overview of the related literature, highlighting methods for regularization of global manifolds and their innate weaknesses in handling data limitations efficiently. We also provide an overview of approaches to formulating manifold-to-manifold distance measures and draw links to other regularization schemes that exploit local features for generative tasks. The detailed formulation of the investigated functions is provided in Section 3. Experimental results for knowledge transfer in different settings and applications are provided in Section 4. Conclusions and future directions are discussed in Section 5.
3. Proposed Method
In this work, we aim to design a mechanism that incentivizes a “student” CNN to create local features that resemble, in overall geometry, those of an “instructor” model at various levels across the models’ architectures. The hypothesis is that the knowledge of the instructor model is materialized across all its layers through the specific succession of learned encodings. Thus, a reasonable and direct path for the student model to harvest this knowledge is to learn how to mimic the geometry of those encodings across its architecture. This approach can be thought of as complementary to Knowledge Distillation and similar methods, which only aim to mimic the global encodings at the final layers of the instructor model.
To create a regularization mechanism for the spatial activations, $X_S$, at the output of a layer of a student CNN, we have to formulate an appropriate differentiable loss function that estimates a dissimilarity between $X_S$ and a set of corresponding activations, $X_I$, from an instructor CNN. This function will be used as an additional term in the overall loss function of the learning optimization problem. In the general case:

$$\mathcal{L} = \mathcal{L}_{task} + \lambda \, \mathcal{L}_{reg}\left(X_S, X_I\right)$$

where $\mathcal{L}_{task}$ is the loss of the target task and $\lambda$ controls the strength of the regularization.
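A minimal PyTorch sketch of attaching such a term to the training loss is given below. The layer names, the weight lam, and the placeholder reg_fn (standing for any of the losses formulated in this section) are illustrative assumptions.

```python
import torch

def capture(module, store, key):
    """Register a forward hook that stores a layer's output as local features."""
    def hook(mod, inp, out):
        store[key] = out.flatten(2).transpose(1, 2)   # (B, C, H, W) -> (B, H*W, C)
    return module.register_forward_hook(hook)

def training_step(student, instructor, reg_fn, x, y, task_loss_fn, lam=0.1):
    acts_s, acts_i = {}, {}
    h1 = capture(student.layer2, acts_s, "mid")       # chosen student layer
    h2 = capture(instructor.layer2, acts_i, "mid")    # matching spatial grid
    logits = student(x)
    with torch.no_grad():
        instructor(x)                                 # instructor stays frozen
    loss = task_loss_fn(logits, y) + lam * reg_fn(acts_s["mid"], acts_i["mid"])
    h1.remove(); h2.remove()
    return loss
```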
In order to provide greater flexibility in the architectures of the student and instructor CNNs, we assume that the dimensionalities of the local features, $d_S$ and $d_I$, can differ between the two models. This assumption automatically disqualifies typical functions for geometrical alignment, such as the one utilized in [14]. Such techniques try to force each of the student’s feature vectors to lie near its neighbors among the instructor’s features; since the two vector spaces can have different dimensionality, this approach is not applicable. On the other hand, in a knowledge-transfer setting similar to Figure 1, the local features in two sets with matching spatial dimensions have an implicit one-to-one correspondence, since they stem from the same input image. A convenient way to exploit this while allowing different feature dimensionality is to formulate a loss function that tries to enforce similar affinity patterns between corresponding vectors in the instructor’s and student’s feature sets. In this setting, the affinity pattern of a feature vector is defined by a function of the distance between this vector and all, or a subset of, the other vectors of the same set. In this way, the regularization is imposed on the affinities within the student’s vector space, thus allowing the instructor’s activations to live in a space of different dimensionality. This type of regularization differs from the more typical approach of geometrical alignment of vectors, in the sense that the loss function’s objective is not to place each of the student’s features in a particular region of the feature space dictated by the instructor model, but rather to force all regularized activations of the student to mimic the neighboring relations exhibited by the corresponding activations of the instructor. A general form of the regularization loss in this context is:

$$\mathcal{L}_{reg} = \sum_{i=1}^{N} \sum_{j=1}^{N} \left( f_S\left(x_i, x_j\right) - f_I\left(y_i, y_j\right) \right)^2$$

where $f_S$ and $f_I$ are functions that measure pairwise similarities or distances inside the $d_S$- and $d_I$-dimensional vector spaces at the student’s and instructor’s sides, respectively.
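A hedged sketch of this general form follows, instantiated with plain squared Euclidean distances as the pairwise functions; the paper’s exact normalized distance of Equation (3) is defined in Section 3.1 and may differ.

```python
import torch

def pairwise_sq_dists(x):
    """x: (N, d) or (B, N, d) local features. Returns squared Euclidean
    pairwise distances of shape (N, N) or (B, N, N)."""
    return torch.cdist(x, x, p=2.0) ** 2

def affinity_pattern_loss(student_feats, instructor_feats):
    """Penalize differences between the two models' internal affinity patterns.
    Feature dimensionalities may differ: only the (N, N) pairwise structures
    are compared, relying on the one-to-one spatial correspondence."""
    d_s = pairwise_sq_dists(student_feats)
    d_i = pairwise_sq_dists(instructor_feats).detach()  # instructor is frozen
    return torch.mean((d_s - d_i) ** 2)
```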
In this work, we investigate two approaches for such a function, designed to be computationally lightweight and differentiable. First, we aim at the direct comparison of the neighboring patterns between corresponding activation features of the student and instructor models. Second, we study a more relaxed criterion that offers some additional degrees of freedom to the student model’s activations. This criterion compares only the ratio of the sum of distances from each feature to its neighbors over the sum of distances to all features of the activation map.
3.2. Affinity Contrast Loss
The second approach to geometric regularization that we investigate in this work is based on a criterion presented in [41] for comparing the manifold structures generated by local descriptors on various 1D and 2D signals. Results in that work indicated that how well the local neighborhoods derived from one of the two compared manifolds reflect the relationships of the corresponding vectors in the other manifold is directly related to the similarity of the underlying signals. Furthermore, by simply measuring the contrast (ratio) between the similarity to the neighbors and to the rest of the vectors in each feature set, one can derive an efficient measure of how well the definition of neighborhoods and the underlying manifold structures match. Hence, this measure can be used as a dissimilarity measure between the respective signals that generated the local feature sets.
In the knowledge-transfer context considered in the current work, a similar function can measure the dissimilarity between sets of activation features. Hence, if the input signals to the instructor and the student models are identical, such a function can act as a manifold-to-manifold distance metric between the local activation manifolds at corresponding layers of the two models. In this scheme, the regularization function aims at imposing on the student model’s activations a geometry similar to that of the corresponding instructor’s activation manifold for each input signal, similar to the functionality of the NP loss defined in Equation (4).
Following the same reasoning described in Section 3.1, we opted for the normalized squared Euclidean distance defined in Equation (3) as the pairwise comparison measure between vectors. Additionally, the neighborhoods are again computed only on the instructor’s side, as in the definition provided in Equation (5). To construct the loss function, we first define a measure of Local Affinity Contrast (LAC) for a set of $N$ feature vectors with neighborhood mask $M$ and normalized pairwise distances $D$, as follows:

$$LAC\left(D, M\right) = \frac{1}{N} \sum_{i=1}^{N} \frac{\sum_{j} M_{ij} D_{ij}}{\sum_{j \neq i} D_{ij}}$$

where $D_{ij}$ is provided by Equation (3).
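The LAC measure translates almost directly into code. The following sketch assumes D is the (N, N) matrix of normalized pairwise distances and M a binary neighborhood mask with a zero diagonal, and averages the per-feature contrast over the set.

```python
import torch

def local_affinity_contrast(D, M, eps=1e-8):
    """Local Affinity Contrast for N feature vectors.
    D: (N, N) normalized pairwise distances; M: (N, N) binary neighborhood mask.
    For each vector, take the ratio of summed distances to its neighbors over
    summed distances to all other vectors, then average over the set."""
    neigh = (M * D).sum(dim=1)
    total = D.sum(dim=1) - D.diagonal()   # exclude self-distance (zero anyway)
    return ((neigh + eps) / (total + eps)).mean()
```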
The main criterion for defining the neighborhoods that we investigate here is again inspired by [41]. In that work, the Minimal Spanning Tree (MST) connecting the nodes representing the feature vectors is used as a minimalistic backbone on which neighborhoods are defined via a radius of geodesic distance. The rationale behind this choice is that the MST is a graph less prone to topological short-circuits than, e.g., the k-NN graph. By considering the neighbors of each node based on their geodesic proximity to this node’s position on the MST, the neighborhoods are less likely to contain members that are distant from a geometric perspective but adjacent in the Euclidean sense. Following a similar scheme, the neighborhood of each feature can be defined by computing the MST on the activation features of the instructor model and using the following definition of the neighborhood mask:

$$M_{ij} = \begin{cases} 1, & g_{ij} \leq r \text{ and } i \neq j \\ 0, & \text{otherwise} \end{cases}$$

where $g_{ij}$ is the geodesic distance between the $i$th and $j$th nodes on the MST computed on the instructor’s activation features, and $r$ is the neighborhood radius. In the Experimental Section, we compare this criterion to the more straightforward approach of applying the k-NN rule to the activation features of the instructor model, constructing a neighborhood mask that indicates the k nearest neighbors of each feature in the feature space.
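A sketch of the MST-based mask using SciPy is shown below. The radius r is a hyperparameter, and geodesic distance is taken here as weighted path length along the MST; both are assumptions about the exact configuration, which the excerpt above does not fix.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path

def mst_neighborhood_mask(D, r):
    """D: (N, N) pairwise distances of the instructor's activation features.
    Returns a binary mask marking pairs within geodesic radius r on the MST."""
    mst = minimum_spanning_tree(D)                        # sparse (N, N) tree
    geo = shortest_path(mst, method="D", directed=False)  # geodesic distances
    mask = (geo <= r).astype(np.float32)
    np.fill_diagonal(mask, 0.0)               # a node is not its own neighbor
    return mask
```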
Using either of the above definitions of neighborhood, we can define the Affinity Contrast (AC) loss for the student model’s activations as follows:

$$\mathcal{L}_{AC} = \left( LAC\left(D_S, M\right) - LAC\left(D_I, M\right) \right)^2$$

where $D_S$ and $D_I$ are the normalized pairwise distances of the student’s and instructor’s activation features, respectively, and $M$ is the neighborhood mask computed on the instructor’s side.
Again, since the instructor network is not updated during training, both the MST- and the k-NN-based neighborhoods can be computed only once for each training sample. Thus, the overall computational overhead of the proposed regularization scheme is kept very small, originating mostly from the pairwise distance computations between the local features.
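Putting the pieces together, a minimal sketch of the AC loss, reusing the local_affinity_contrast function above, could look as follows; the squared-difference form mirrors the reconstructed equation and is an assumption.

```python
def affinity_contrast_loss(D_student, D_instructor, M):
    """AC loss: match the student's Local Affinity Contrast to the
    instructor's, under the instructor-defined neighborhoods M."""
    lac_s = local_affinity_contrast(D_student, M)
    lac_i = local_affinity_contrast(D_instructor, M)  # constant w.r.t. student
    return (lac_s - lac_i) ** 2
```

Since M and the instructor-side contrast depend only on the frozen instructor, both can be cached per training sample, in line with the small overhead noted above.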
Note that either of the above criteria for defining the neighboring relations and constructing the neighborhood masks can be used in Equation (5) in order to regularize the activations with the NP loss, thus enabling different combinations of regularization functions and neighborhoods to be implemented.
5. Conclusions
We have presented a method for knowledge transfer based on the geometric regularization of local activations in the intermediate layers of Convolutional Neural Networks. According to the proposed scheme, the student model is incentivized to produce local features that follow the geometrical properties of those stemming from the instructor model, at corresponding spatial scales. In order to eliminate the necessity of matching features’ dimensionality between the instructor and student—taking advantage of the explicit one-to-one correspondence between the local features at matching spatial grids—we opted for encoding the geometric properties in terms of affinity patterns exclusively within each feature set. Thus, the objective of the regularization is transformed so as to enforce specific similarities between the local features, mimicking the corresponding similarities between the features in the instructor model for the same input data.
We formulated and assessed two variants of the regularization loss that exhibit different qualities. The Neighboring Pattern (NP) loss aims to directly penalize any deviation of the distance patterns from the target patterns. The Affinity Contrast (AC) loss compares the ratio of the sum of distances from each feature vector to its neighbors over the sum of distances to all the other features. Thus, it provides some additional degrees of freedom to the student model for penalty-free alteration of the learned representations, while still retaining some important characteristics of the target geometry. We investigated the behavior of both functions and highlighted the importance of the definition of neighborhoods by comparing the regularization efficiency of an MST-based criterion and the simple k-NN rule.
Experimental evaluation revealed very promising results regarding the benefits of geometric regularization under the presented scheme. In all experiments, the regularized models consistently exhibited an accuracy improvement over models trained regularly under the same conditions and initializations. The AC loss consistently delivered greater performance improvements than the NP loss, indicating that the more relaxed objective can have advantages in the investigated context. Additionally, the experiments showed that the MST-based criterion for defining the neighbors of each local feature can be beneficial compared to the simple k-NN rule, especially in more challenging classification tasks.
Geometric regularization, especially via the AC loss, was tested under various experimental settings: (a) knowledge transfer from an expert model to a smaller student, (b) knowledge transfer from external data via an instructor with a different architecture, and (c) knowledge transfer between experts for accuracy improvement. Especially in the latter case, the regularized model achieved better performance than both the reference and instructor models in the most challenging of the tested tasks. The comparison to the established technique of Knowledge Distillation revealed similar levels of performance improvement but, most importantly, provided positive evidence for combining local and global feature-based regularization techniques in the same learning problem.
Regularized training was measured to be 1.6× slower than regular training for the Simple CNN and 2.1× slower for the NiN model, with negligible variation between the different regularization functions. The training time, however, is heavily affected by the configuration of the training hardware, the particularities of the utilized deep learning framework, and the specific implementation of the training routine. For example, the high GPU memory utilization of the Caffe framework used here imposes restrictions on the batch size, making the read time and bandwidth of the SSD the predominant sources of delay. Preliminary experiments with different setups indicate, however, that with an appropriate combination of hardware configuration and software implementation, the overhead of the regularization can be reduced below 40%, even for deeper models with up to 5 regularized layers.
Despite the positive evidence, there is considerable room for improving the regularization objectives by investigating different formulations and geometric criteria, and for thoroughly investigating the efficacy of the presented techniques in different tasks (e.g., detection, segmentation, etc.). Furthermore, recent advances in self-supervised learning [52] have revealed great potential for regularization methods to be used in new tasks, beyond typical knowledge transfer. In the future, we intend to investigate different formulations of geometric similarity in local activations and to apply these techniques to larger and more diverse visual tasks. We are also working to assess the effectiveness of the presented techniques in a self-supervised setting, either as standalone loss functions or combined with objectives formulated around the geometry and statistics of the global image features.
Author Contributions
Conceptualization, methodology, I.T.; software, investigation and data curation, I.T. and F.F.; writing—original draft preparation, I.T.; writing—review and editing, F.F. and G.E.; visualization, I.T.; supervision, G.E.; project administration, G.E.; funding acquisition, I.T. and F.F. All authors have read and agreed to the published version of the manuscript.
Funding
This research is co-financed by Greece and the European Union (European Social Fund—ESF) through the Operational Programme “Human Resources Development, Education and Lifelong Learning 2014–2020” in the context of the project “New knowledge-transfer and regularization techniques for training Convolutional Neural Networks with limited data” (MIS 5047164).
Data Availability Statement
The datasets utilized in this study are third-party data. Restrictions may apply to the availability of these data. Data were obtained from the official repository of each respective dataset and are available at https://www.cs.toronto.edu/~kriz/cifar.html (CIFAR10/100) and http://ufldl.stanford.edu/housenumbers (SVHN) (accessed on 6 August 2021). Any additional data generated by this study are contained within the article.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Hassaballah, M.; Awad, A.I. Deep Learning in Computer Vision: Principles and Applications; CRC Press: Boca Raton, FL, USA, 2020.
- Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How Transferable Are Features in Deep Neural Networks? In Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 3320–3328.
- Bellinger, C.; Drummond, C.; Japkowicz, N. Manifold-Based Synthetic Oversampling with Manifold Conformance Estimation. Mach. Learn. 2018, 107, 605–637.
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531.
- Mehraj, H.; Mir, A. A Survey of Biometric Recognition Using Deep Learning. EAI Endorsed Trans. Energy Web 2020, 8, e6.
- Albert, B.A. Deep Learning from Limited Training Data: Novel Segmentation and Ensemble Algorithms Applied to Automatic Melanoma Diagnosis. IEEE Access 2020, 8, 31254–31269.
- Esteva, A.; Chou, K.; Yeung, S.; Naik, N.; Madani, A.; Mottaghi, A.; Liu, Y.; Topol, E.; Dean, J.; Socher, R. Deep Learning-Enabled Medical Computer Vision. NPJ Digit. Med. 2021, 4, 1–9.
- Sundararajan, K.; Woodard, D.L. Deep Learning for Biometrics: A Survey. ACM Comput. Surv. 2018, 51, 65:1–65:34.
- Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A Comprehensive Survey on Transfer Learning. Proc. IEEE 2021, 109, 43–76.
- Wang, M.; Deng, W. Deep Visual Domain Adaptation: A Survey. Neurocomputing 2018, 312, 135–153.
- Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A Survey on Deep Transfer Learning. In Proceedings of the Artificial Neural Networks and Machine Learning, Rhodes, Greece, 4–7 October 2018; pp. 270–279.
- Zamir, A.R.; Sax, A.; Shen, W.; Guibas, L.J.; Malik, J.; Savarese, S. Taskonomy: Disentangling Task Transfer Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3712–3722.
- Gatys, L.A.; Ecker, A.S.; Bethge, M. A Neural Algorithm of Artistic Style. arXiv 2015, arXiv:1508.06576.
- Mechrez, R.; Talmi, I.; Zelnik-Manor, L. The Contextual Loss for Image Transformation with Non-Aligned Data. arXiv 2018, arXiv:1803.02077.
- Theodorakopoulos, I.; Fotopoulou, F.; Economou, G. Local Manifold Regularization for Knowledge Transfer in Convolutional Neural Networks. In Proceedings of the 2020 11th International Conference on Information, Intelligence, Systems and Applications, Piraeus, Greece, 15–17 July 2020; pp. 1–8.
- Ma, X.; Liu, W. Recent Advances of Manifold Regularization. In Manifolds II-Theory and Applications; IntechOpen: London, UK, 2018; ISBN 978-1-83880-310-0.
- Reed, S.; Sohn, K.; Zhang, Y.; Lee, H. Learning to Disentangle Factors of Variation with Manifold Interaction. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 21 June 2014; pp. 1431–1439.
- Lee, T.; Choi, M.; Yoon, S. Manifold Regularized Deep Neural Networks Using Adversarial Examples. arXiv 2016, arXiv:1511.06381.
- Verma, V.; Lamb, A.; Beckham, C.; Najafi, A.; Courville, A.; Mitliagkas, I.; Bengio, Y. Manifold Mixup: Learning Better Representations by Interpolating Hidden States. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019.
- Lee, J.A.; Verleysen, M. Nonlinear Dimensionality Reduction; Springer Science & Business Media: New York, NY, USA, 2007.
- Tenenbaum, J.B.; de Silva, V.; Langford, J.C. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science 2000, 290, 2319–2323.
- Hadsell, R.; Chopra, S.; LeCun, Y. Dimensionality Reduction by Learning an Invariant Mapping. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, NY, USA, 17–22 June 2006; Volume 2, pp. 1735–1742.
- Dai, D.; Li, W.; Kroeger, T.; Van Gool, L. Ensemble Manifold Segmentation for Model Distillation and Semi-Supervised Learning. arXiv 2018, arXiv:1804.02201.
- Zhu, W.; Qiu, Q.; Huang, J.; Calderbank, A.; Sapiro, G.; Daubechies, I. LDMNet: Low Dimensional Manifold Regularized Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Yang, S.; Li, L.; Wang, S.; Zhang, W.; Huang, Q. A Graph Regularized Deep Neural Network for Unsupervised Image Representation Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1203–1211.
- Jin, C.; Rinard, M. Manifold Regularization for Locally Stable Deep Neural Networks. arXiv 2020, arXiv:2003.04286.
- Von Luxburg, U. Statistical Learning with Similarity and Dissimilarity Functions. Ph.D. Thesis, Technische Universität Berlin, Berlin, Germany, 2004.
- Goshtasby, A.A. Similarity and Dissimilarity Measures. In Image Registration: Principles, Tools and Methods; Advances in Computer Vision and Pattern Recognition; Springer: London, UK, 2012; pp. 7–66; ISBN 978-1-4471-2458-0.
- Gower, J.C.; Warrens, M.J. Similarity, Dissimilarity, and Distance, Measures of. Wiley StatsRef Stat. Ref. Online 2017.
- Costa, Y.M.G.; Bertolini, D.; Britto, A.S.; Cavalcanti, G.D.C.; Oliveira, L.E.S. The Dissimilarity Approach: A Review. Artif. Intell. Rev. 2020, 53, 2783–2808.
- Arandjelovic, O.; Shakhnarovich, G.; Fisher, J.; Cipolla, R.; Darrell, T. Face Recognition with Image Sets Using Manifold Density Divergence. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–26 June 2005; Volume 1, pp. 581–588.
- Friedman, J.H.; Rafsky, L.C. Multivariate Generalizations of the Wald–Wolfowitz and Smirnov Two-Sample Tests. Ann. Stat. 1979, 7, 697–717.
- Kastaniotis, D.; Theodorakopoulos, I.; Theoharatos, C.; Economou, G.; Fotopoulos, S. A Framework for Gait-Based Recognition Using Kinect. Pattern Recognit. Lett. 2015, 68, 327–335.
- Theodorakopoulos, I.; Economou, G.; Fotopoulos, S. Collaborative Sparse Representation in Dissimilarity Space for Classification of Visual Information. In Proceedings of the Advances in Visual Computing, Rethymnon, Crete, Greece, 29–31 July 2013; pp. 496–506.
- Bjorck, A.; Golub, G. Numerical Methods for Computing Angles Between Linear Subspaces. Math. Comput. 1973, 27, 123.
- Kim, T.-K.; Arandjelović, O.; Cipolla, R. Boosted Manifold Principal Angles for Image Set-Based Recognition. Pattern Recognit. 2007, 40, 2475–2484.
- Wang, R.; Shan, S.; Chen, X.; Gao, W. Manifold-Manifold Distance with Application to Face Recognition Based on Image Set. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
- Vasconcelos, N.; Lippman, A. A Multiresolution Manifold Distance for Invariant Image Similarity. IEEE Trans. Multimed. 2005, 7, 127–142.
- Hamm, J.; Lee, D.D. Grassmann Discriminant Analysis: A Unifying View on Subspace-Based Learning. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 July 2008; Association for Computing Machinery: New York, NY, USA; pp. 376–383.
- Lu, J.; Tan, Y.-P.; Wang, G. Discriminative Multimanifold Analysis for Face Recognition from a Single Training Sample per Person. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 39–51.
- Theodorakopoulos, I.; Economou, G.; Fotopoulos, S.; Theoharatos, C. Local Manifold Distance Based on Neighborhood Graph Reordering. Pattern Recognit. 2016, 53, 195–211.
- Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual Losses for Real-Time Style Transfer and Super-Resolution. arXiv 2016, arXiv:1603.08155.
- Jing, Y.; Yang, Y.; Feng, Z.; Ye, J.; Yu, Y.; Song, M. Neural Style Transfer: A Review. IEEE Trans. Vis. Comput. Graph. 2020, 26, 3365–3385.
- Yang, W.; Zhang, X.; Tian, Y.; Wang, W.; Xue, J.-H.; Liao, Q. Deep Learning for Single Image Super-Resolution: A Brief Review. IEEE Trans. Multimed. 2019, 21, 3106–3121.
- Chauhan, K.; Patel, H.; Dave, R.; Bhatia, J.; Kumhar, M. Advances in Single Image Super-Resolution: A Deep Learning Perspective. In Proceedings of the First International Conference on Computing, Communications, and Cyber-Security, Chandigarh, India, 12–13 October 2019; pp. 443–455.
- Lin, M.; Chen, Q.; Yan, S. Network in Network. arXiv 2014, arXiv:1312.4400.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images; University of Toronto: Toronto, ON, Canada, 2012.
- Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; Ng, A.Y. Reading Digits in Natural Images with Unsupervised Feature Learning. In Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning, Granada, Spain, 16–17 December 2011.
- Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; Darrell, T. Caffe: Convolutional Architecture for Fast Feature Embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014; Association for Computing Machinery: New York, NY, USA; pp. 675–678.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
- Zbontar, J.; Jing, L.; Misra, I.; LeCun, Y.; Deny, S. Barlow Twins: Self-Supervised Learning via Redundancy Reduction. arXiv 2021, arXiv:2103.03230.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).