Improved Fine-Grained Representation Learning with Data Transformation
- Publication Type: Thesis
- Issue Date: 2021
This item is open access.
Fine-grained recognition is a challenging problem in computer vision and artificial intelligence. It aims to identify the subordinate categories of given images, but suffers from small inter-class variance and large intra-class variance, along with multiple object scales and complex backgrounds, leading to a more complex problem space. Recently, deep neural networks have greatly advanced fine-grained recognition. However, existing methods still suffer from several issues, including limited data, poor model interpretability, and unsatisfactory performance. In this thesis, we propose several data-transformation models to address these challenges.
First, we develop a unified framework (MGN-CNN) based on a mixture of experts that promotes diversity among the experts by combining a gradually-enhanced learning strategy with a Kullback-Leibler divergence based constraint. The strategy learns new experts on the dataset using prior knowledge from earlier experts and adds them to the model sequentially, while the constraint forces the experts to produce diverse prediction distributions. Together, these drive the experts to learn the task from different aspects, making each specialized in a different subspace of the problem.
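As a rough illustration of how such a divergence constraint might look, the sketch below penalizes low pairwise Kullback-Leibler divergence between expert prediction distributions, so minimizing the loss pushes the experts apart. The function name, the all-pairs scheme, and the `lambda_div` weighting in the usage comment are illustrative assumptions, not the thesis implementation.

```python
import torch
import torch.nn.functional as F

def diversity_loss(expert_logits, eps=1e-8):
    """Encourage experts to produce diverse prediction distributions
    by penalizing low pairwise KL divergence between them.

    expert_logits: list of tensors, each of shape (batch, num_classes).
    """
    probs = [F.softmax(logits, dim=1) for logits in expert_logits]
    loss, num_pairs = 0.0, 0
    for i in range(len(probs)):
        for j in range(len(probs)):
            if i == j:
                continue
            # KL(p_i || p_j), negated: minimizing the loss maximizes
            # the divergence between the two experts' predictions.
            kl = (probs[i] * (torch.log(probs[i] + eps)
                              - torch.log(probs[j] + eps))).sum(dim=1).mean()
            loss = loss - kl
            num_pairs += 1
    return loss / max(num_pairs, 1)

# Usage (hypothetical weighting):
# total_loss = ce_loss + lambda_div * diversity_loss(logits_per_expert)
```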
Second, we propose Intra-class Part Swapping (InPS), which produces new data by performing attention-guided content swapping on input pairs from the same class. Compared with previous approaches, InPS avoids introducing noisy labels and preserves a plausible holistic object structure in the generated images. We demonstrate that InPS outperforms the most recent augmentation approaches in both fine-grained recognition and weakly supervised object localization.
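The minimal sketch below conveys the idea of swapping attended content between same-class images. For simplicity it pastes a fixed rectangular region centered on the attention peak; the actual part-selection mechanism in InPS is assumed, not reproduced here.

```python
import torch

def intra_class_part_swap(img_a, img_b, attn_b, crop_frac=0.3):
    """Simplified sketch: paste the most-attended rectangular region
    of img_b into img_a. Both images come from the same class, so the
    label of img_a is kept unchanged and no label noise is introduced.

    img_a, img_b: tensors of shape (C, H, W) from the same class.
    attn_b: attention map of shape (H, W) for img_b.
    """
    _, H, W = img_a.shape
    h, w = int(H * crop_frac), int(W * crop_frac)
    # Locate the attention peak in img_b (flattened index -> row, col).
    peak = int(torch.argmax(attn_b))
    cy, cx = peak // W, peak % W
    # Clamp the crop window so it stays inside the image.
    y0 = min(max(cy - h // 2, 0), H - h)
    x0 = min(max(cx - w // 2, 0), W - w)
    out = img_a.clone()
    out[:, y0:y0 + h, x0:x0 + w] = img_b[:, y0:y0 + h, x0:x0 + w]
    return out  # trained with the original label of img_a
```

Because both inputs share a label, the augmented sample keeps the original label, which is how intra-class swapping sidesteps the label-noise issue of mixing across classes.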
Finally, we explore fine-grained zero-shot learning and introduce a novel structure-aware feature generation scheme, termed SA-GAN, which explicitly accounts for the topological structure of the data when learning both the latent space and the generative networks. This topology-preserving mechanism enables our method to significantly enhance generalization to unseen classes and consequently improve classification performance.
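One way such a topology-preserving constraint could be expressed is as a regularizer that encourages the pairwise distance structure of generated features to mirror that of the class semantic embeddings they are conditioned on. The sketch below is a plausible reading under that assumption, not the thesis's actual formulation.

```python
import torch

def structure_preserving_loss(semantic, generated):
    """Illustrative topology-preserving regularizer: match pairwise
    distances among generated features to pairwise distances among
    the class semantic embeddings they were conditioned on.

    semantic:  (N, d_s) class embeddings (e.g., attribute vectors).
    generated: (N, d_f) features produced by the generator.
    """
    d_sem = torch.cdist(semantic, semantic, p=2)
    d_gen = torch.cdist(generated, generated, p=2)
    # Normalize each distance matrix so the two spaces are comparable.
    d_sem = d_sem / (d_sem.mean() + 1e-8)
    d_gen = d_gen / (d_gen.mean() + 1e-8)
    return ((d_sem - d_gen) ** 2).mean()
```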