An Indoor Scene Recognition-Based 3D Registration Mechanism for Real-Time AR-GIS Visualization in Mobile Applications
Abstract
1. Introduction
- (1)
- We combine spatial layout estimation with camera-pose tracking technology to achieve geometrically consistent and plausible AR visualization, overcoming previous rendering limitations (see the pipeline sketch after this list).
- (2)
- We present a novel automatic method for fusing virtual content into real indoor scenes that does not rely on conventional depth detection or 3D modeling processes. No professional data-acquisition equipment is needed, and the approach is more resilient to spatial alterations and more faithfully represents changing indoor scenes.
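To make the overall workflow concrete, the following is a minimal, hypothetical sketch of how the offline training stage and the online fusion stage outlined below (Sections 3 and 4.2) fit together. It is not the authors' implementation; every function name and data structure is a placeholder.

```python
# A hypothetical outline (not the authors' implementation) of the two-stage pipeline.
# Every name and data structure below is a placeholder.

def offline_stage(reference_image):
    """Offline training (Section 4.2.1): prepare reference features and the room layout."""
    features = {"keypoints": [], "descriptors": []}   # natural features of the reference image (Sec. 3.2)
    rough_layout = {"edge_map": None}                 # rough layout prediction (Sec. 3.3.1)
    box_layout = {"corners_px": [(0, 0), (640, 0), (640, 480), (0, 480)]}  # best-ranked box layout (Sec. 3.3.2)
    return features, box_layout

def online_stage(frame, features, box_layout, virtual_object):
    """Online fusion (Section 4.2.2): per-frame camera pose plus layout-aware rendering."""
    camera_pose = {"R": [[1, 0, 0], [0, 1, 0], [0, 0, 1]], "t": [0.0, 0.0, 0.0]}  # from feature matching
    anchor = box_layout["corners_px"][0]   # place the object relative to the estimated layout (Sec. 3.4)
    return {"pose": camera_pose, "object": virtual_object, "anchor": anchor}

# Example flow: one offline pass, then per-frame fusion.
features, box_layout = offline_stage("reference.jpg")
composited = online_stage("frame_0001.jpg", features, box_layout, virtual_object="chair.obj")
```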
2. Related Work
2.1. AR Tracking Technology
2.2. 3D Indoor Scene Understanding
3. Methods
3.1. Overview
3.2. Camera-Pose Tracking Based on Natural Features
3.3. Spatial Layout Estimation
3.3.1. Learning to Predict Rough Layout
3.3.2. Ranking Box Layouts
3.4. Fusion of Real Indoor Scene and Virtual 3D Object
- (a)
- Feature extraction
- (b)
- Feature matching
- (c)
- Spatial layout coordinate-point transformation. The geometric relationships between the real world, the virtual objects, and the camera are defined in the same generic units, whereas the spatial layout is expressed in pixel coordinates. To bridge this gap, we transform the spatial-layout coordinates into world coordinates according to the pixel dimensions and dots per inch (DPI) of the reference image (a minimal conversion sketch follows this list).
- (d)
- Position and orient object
- (e)
- Render 3D objects in video frame
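A minimal sketch of the pixel-to-world conversion in step (c), assuming the reference image's DPI is known and a metric world frame centred on the reference image. This is illustrative only, not the authors' code; all names (`pixels_to_world`, `layout_px`, etc.) are hypothetical.

```python
# Convert spatial-layout corner points from pixel coordinates of the reference
# image to world coordinates (metres), using the image's pixel size and DPI.

INCH_TO_METRE = 0.0254

def pixels_to_world(layout_px, image_size_px, dpi):
    """Map layout corner points (u, v) in pixels to (x, y) in metres.

    layout_px     : list of (u, v) pixel coordinates of layout corners
    image_size_px : (width, height) of the reference image in pixels
    dpi           : dots per inch of the reference image
    """
    w_px, h_px = image_size_px
    metres_per_px = INCH_TO_METRE / dpi          # physical size of one pixel
    world_pts = []
    for u, v in layout_px:
        # Shift the origin to the image centre and flip the v-axis so that
        # +y points "up" in the world frame used for rendering.
        x = (u - w_px / 2.0) * metres_per_px
        y = (h_px / 2.0 - v) * metres_per_px
        world_pts.append((x, y))
    return world_pts

# Example: two layout corners of a 1920x1080 reference image at 96 DPI.
corners_world = pixels_to_world([(0, 0), (1920, 1080)], (1920, 1080), dpi=96)
```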
4. Experiments
4.1. Experimental Setup
4.2. Experimental Process
4.2.1. Offline Training Stage
- (a)
- Image training
- (b)
- Estimating spatial layout
4.2.2. Online Fusion Stage
- (a)
- Camera pose computing
- (b)
- Virtual object rendering
4.3. Experimental Results and Discussion
5. Conclusions and Future Work
Acknowledgments
Author Contributions
Conflicts of Interest
| Tracking Technique | Advantage | Limitation |
|---|---|---|
| Sensor-based | No maintenance required and no range limit | Hard to apply in indoor environments |
| Feature-based | Flexible and without pre-building 3D model | No exact position |
| Marker-based | Simple to operate and easy to realize | Needs regular maintenance and suffers from limited range |
| Model-based | Provides exact position | Requires costly model processing |
| Test | Average Computation Time |
|---|---|
| Test 1 | 17 ms |
| Test 2 | 18 ms |
| Test 3 | 21 ms |
| Test 4 | 17 ms |