Video-Based Automatic Baby Motion Analysis for Early Neurological Disorder Diagnosis: State of the Art and Future Directions
Abstract
1. Introduction
2. Taxonomy
3. Data Acquisition, Collection and Labelling
3.1. Acquisition/Recording Tools
3.2. Publicly Available Datasets
- Estimating the child pose;
- Comparing normative behaviours to those of monitored children in order to suggest further investigations;
- Recognizing atypical behaviours in order to directly get an NDD diagnosis.
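
As a rough illustration of the second task in the list above (comparing a monitored child against normative behaviour), the sketch below scores one motion feature against a normative sample with a z-score. The function names, the single-feature setup, and the threshold of 2.0 are assumptions made for this example only; they are not taken from any of the surveyed systems.

```python
from statistics import mean, stdev


def deviation_score(normative_values, observed):
    """Return the z-score of `observed` relative to a normative sample.

    `normative_values` is a hypothetical list of one motion feature
    (e.g., a movement-frequency measure) collected from typically
    developing children; `observed` is the same feature for the
    monitored child.
    """
    mu = mean(normative_values)
    sigma = stdev(normative_values)  # sample standard deviation
    if sigma == 0:
        raise ValueError("normative sample has zero variance")
    return (observed - mu) / sigma


def needs_follow_up(normative_values, observed, threshold=2.0):
    """Flag the child for further investigation when |z| exceeds `threshold`.

    The default threshold is an illustrative assumption, not a clinical cut-off.
    """
    return abs(deviation_score(normative_values, observed)) > threshold
```

A flag from such a rule would only suggest further investigation; it is not a diagnosis.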
4. Methods and Systems for Movement Assessment
4.1. Newborns
4.2. Infants
4.3. Toddlers
5. Recent Advances in Human Motion Analysis
- Motion feature extraction;
- Human pose estimation;
- Extraction of significant motion segments/temporal action localization;
- Human image completion;
- Action recognition and action quality assessment;
- Human-object interaction prediction/understanding;
- Spatiotemporal video representation;
- Interpretability of involved AI models.
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Larsen, M.L.; Wiingreen, R.; Jensen, A.; Rackauskaite, G.; Laursen, B.; Hansen, B.M.; Hoei-Hansen, C.E.; Greisen, G. The effect of gestational age on major neurodevelopmental disorders in preterm infants. Pediatr. Res. 2021. [Google Scholar] [CrossRef]
- Hadders-Algra, M. Early Diagnostics and Early Intervention in Neurodevelopmental Disorders—Age-Dependent Challenges and Opportunities. J. Clin. Med. 2021, 10, 861. [Google Scholar] [CrossRef] [PubMed]
- Kundu, S.; Maurer, S.V.; Stevens, H.E. Future Horizons for Neurodevelopmental Disorders: Placental Mechanisms. Front. Pediatr. 2021, 9, 653230. [Google Scholar] [CrossRef] [PubMed]
- Einspieler, C.; Prechtl, H.F.; Ferrari, F.; Cioni, G.; Bos, A.F. The qualitative assessment of general movements in preterm, term and young infants—Review of the methodology. Early Hum. Dev. 1997, 50, 47–60. [Google Scholar] [CrossRef]
- Campbell, S.K.; Kolobe, T.H.; Osten, E.T.; Lenke, M.; Girolami, G.L. Construct validity of the test of infant motor performance. Phys. Ther. 1995, 75, 585–596. [Google Scholar] [CrossRef] [PubMed]
- Heineman, K.R.; Bos, A.F.; Hadders-Algra, M. The Infant Motor Profile: A standardized and qualitative method to assess motor behaviour in infancy. Dev. Med. Child Neurol. 2008, 50, 275–282. [Google Scholar] [CrossRef] [PubMed]
- Einspieler, C.; Prechtl, H.F. Prechtl’s assessment of general movements: A diagnostic tool for the functional assessment of the young nervous system. Ment. Retard. Dev. Disabil. Res. Rev. 2005, 11, 61–67. [Google Scholar] [CrossRef]
- Teitelbaum, P.; Teitelbaum, O.; Nye, J.; Fryman, J.; Maurer, R.G. Movement analysis in infancy may be useful for early diagnosis of autism. Proc. Natl. Acad. Sci. USA 1998, 95, 13982–13987. [Google Scholar] [CrossRef] [Green Version]
- Gurevitz, M.; Geva, R.; Varon, M.; Leitner, Y. Early markers in infants and toddlers for development of ADHD. J. Atten. Disord. 2014, 18, 14–22. [Google Scholar] [CrossRef]
- Jaspers, M.; de Winter, A.F.; Buitelaar, J.K.; Verhulst, F.C.; Reijneveld, S.A.; Hartman, C.A. Early childhood assessments of community pediatric professionals predict autism spectrum and attention deficit hyperactivity problems. J. Abnorm. Child Psychol. 2013, 41, 71–80. [Google Scholar] [CrossRef] [Green Version]
- Athanasiadou, A.; Buitelaar, J.; Brovedani, P.; Chorna, O.; Fulceri, F.; Guzzetta, A.; Scattoni, M.L. Early motor signs of attention-deficit hyperactivity disorder: A systematic review. Eur. Child Adolesc. Psychiatry 2020, 29, 903–916. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Balter, L.J.; Wiwe Lipsker, C.; Wicksell, R.K.; Lekander, M. Neuropsychiatric Symptoms in Pediatric Chronic Pain and Outcome of Acceptance and Commitment Therapy. Front. Psychol. 2021, 12, 836. [Google Scholar] [CrossRef] [PubMed]
- Micai, M.; Fulceri, F.; Caruso, A.; Guzzetta, A.; Gila, L.; Scattoni, M.L. Early behavioral markers for neurodevelopmental disorders in the first 3 years of life: An overview of systematic reviews. Neurosci. Biobehav. Rev. 2020, 116, 183–201. [Google Scholar] [CrossRef] [PubMed]
- Peyton, C.; Pascal, A.; Boswell, L.; DeRegnier, R.; Fjørtoft, T.; Støen, R.; Adde, L. Inter-observer reliability using the General Movement Assessment is influenced by rater experience. Early Hum. Dev. 2021, 161, 105436. [Google Scholar] [CrossRef]
- Irshad, M.T.; Nisar, M.A.; Gouverneur, P.; Rapp, M.; Grzegorzek, M. AI approaches towards Prechtl’s assessment of general movements: A systematic literature review. Sensors 2020, 20, 5321. [Google Scholar] [CrossRef]
- Wilson, R.B.; Vangala, S.; Elashoff, D.; Safari, T.; Smith, B.A. Using Wearable Sensor Technology to Measure Motion Complexity in Infants at High Familial Risk for Autism Spectrum Disorder. Sensors 2021, 21, 616. [Google Scholar] [CrossRef]
- Ghazi, M.A.; Ding, L.; Fagg, A.H.; Kolobe, T.H.; Miller, D.P. Vision-based motion capture system for tracking crawling motions of infants. In Proceedings of the 2017 IEEE International Conference on Mechatronics and Automation (ICMA), Takamatsu, Japan, 6–9 August 2017; pp. 1549–1555. [Google Scholar]
- Marcroft, C.; Khan, A.; Embleton, N.D.; Trenell, M.; Plötz, T. Movement recognition technology as a method of assessing spontaneous general movements in high risk infants. Front. Neurol. 2015, 5, 284. [Google Scholar] [CrossRef]
- Cabon, S.; Porée, F.; Simon, A.; Rosec, O.; Pladys, P.; Carrault, G. Video and audio processing in paediatrics: A review. Physiol. Meas. 2019, 40, 02TR02. [Google Scholar] [CrossRef]
- Cattani, L.; Alinovi, D.; Ferrari, G.; Raheli, R.; Pavlidis, E.; Spagnoli, C.; Pisani, F. Monitoring infants by automatic video processing: A unified approach to motion analysis. Comput. Biol. Med. 2017, 80, 158–165. [Google Scholar] [CrossRef]
- Sun, Y.; Kommers, D.; Wang, W.; Joshi, R.; Shan, C.; Tan, T.; Aarts, R.M.; van Pul, C.; Andriessen, P.; de With, P.H. Automatic and continuous discomfort detection for premature infants in a NICU using video-based motion analysis. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019; pp. 5995–5999. [Google Scholar]
- Jorge, J.; Villarroel, M.; Chaichulee, S.; Guazzi, A.; Davis, S.; Green, G.; McCormick, K.; Tarassenko, L. Non-contact monitoring of respiration in the neonatal intensive care unit. In Proceedings of the 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC, USA, 30 May–3 June 2017; pp. 286–293. [Google Scholar]
- Lorato, I.; Stuijk, S.; Meftah, M.; Kommers, D.; Andriessen, P.; van Pul, C.; de Haan, G. Towards Continuous Camera-Based Respiration Monitoring in Infants. Sensors 2021, 21, 2268. [Google Scholar] [CrossRef]
- Nagy, Á.; Földesy, P.; Jánoki, I.; Terbe, D.; Siket, M.; Szabó, M.; Varga, J.; Zarándy, Á. Continuous Camera-Based Premature-Infant Monitoring Algorithms for NICU. Appl. Sci. 2021, 11, 7215. [Google Scholar] [CrossRef]
- Chaurasia, S.K.; Reddy, S. State-of-the-art survey on activity recognition and classification using smartphones and wearable sensors. Multimed. Tools Appl. 2021, 81, 1–32. [Google Scholar] [CrossRef]
- Leo, M.; Farinella, G.M. Computer Vision for Assistive Healthcare; Academic Press: Cambridge, MA, USA, 2018. [Google Scholar]
- Bouchabou, D.; Nguyen, S.M.; Lohr, C.; LeDuc, B.; Kanellos, I. A survey of human activity recognition in smart homes based on IoT sensors algorithms: Taxonomies, challenges, and opportunities with deep learning. Sensors 2021, 21, 6037. [Google Scholar] [CrossRef] [PubMed]
- Silva, N.; Zhang, D.; Kulvicius, T.; Gail, A.; Barreiros, C.; Lindstaedt, S.; Kraft, M.; Bölte, S.; Poustka, L.; Nielsen-Saines, K.; et al. The future of General Movement Assessment: The role of computer vision and machine learning—A scoping review. Res. Dev. Disabil. 2021, 110, 103854. [Google Scholar] [CrossRef] [PubMed]
- Redd, C.B.; Karunanithi, M.; Boyd, R.N.; Barber, L.A. Technology-assisted quantification of movement to predict infants at high risk of motor disability: A systematic review. Res. Dev. Disabil. 2021, 118, 104071. [Google Scholar] [CrossRef]
- Raghuram, K.; Orlandi, S.; Church, P.; Chau, T.; Uleryk, E.; Pechlivanoglou, P.; Shah, V. Automated movement recognition to predict motor impairment in high-risk infants: A systematic review of diagnostic test accuracy and meta-analysis. Dev. Med. Child Neurol. 2021, 63, 637–648. [Google Scholar] [CrossRef]
- Rahman, M.; Usman, O.L.; Muniyandi, R.C.; Sahran, S.; Mohamed, S.; Razak, R.A. A Review of machine learning methods of feature selection and classification for autism spectrum disorder. Brain Sci. 2020, 10, 949. [Google Scholar] [CrossRef]
- Orlandi, S.; Guzzetta, A.; Bandini, A.; Belmonti, V.; Barbagallo, S.D.; Tealdi, G.; Mazzotti, S.; Scattoni, M.L.; Manfredi, C. AVIM—A contactless system for infant data acquisition and analysis: Software architecture and first results. Biomed. Signal Process. Control 2015, 20, 85–99. [Google Scholar] [CrossRef]
- Kanemaru, N.; Watanabe, H.; Kihara, H.; Nakano, H.; Takaya, R.; Nakamura, T.; Nakano, J.; Taga, G.; Konishi, Y. Specific characteristics of spontaneous movements in preterm infants at term age are associated with developmental delays at age 3 years. Dev. Med. Child Neurol. 2013, 55, 713–721. [Google Scholar] [CrossRef]
- Baccinelli, W.; Bulgheroni, M.; Simonetti, V.; Fulceri, F.; Caruso, A.; Gila, L.; Scattoni, M.L. Movidea: A Software Package for Automatic Video Analysis of Movements in Infants at Risk for Neurodevelopmental Disorders. Brain Sci. 2020, 10, 203. [Google Scholar] [CrossRef] [Green Version]
- Tomasi, C.; Kanade, T. Detection and Tracking of Point Features. Int. J. Comput. Vis. 1991, 9, 137–154. [Google Scholar] [CrossRef]
- Caruso, A.; Gila, L.; Fulceri, F.; Salvitti, T.; Micai, M.; Baccinelli, W.; Bulgheroni, M.; Scattoni, M.L. Early Motor Development Predicts Clinical Outcomes of Siblings at High-Risk for Autism: Insight from an Innovative Motion-Tracking Technology. Brain Sci. 2020, 10, 379. [Google Scholar] [CrossRef] [PubMed]
- Migliorelli, L.; Moccia, S.; Pietrini, R.; Carnielli, V.P.; Frontoni, E. The babyPose dataset. Data Brief 2020, 33, 106329. [Google Scholar] [CrossRef]
- Hesse, N.; Bodensteiner, C.; Arens, M.; Hofmann, U.G.; Weinberger, R.; Sebastian Schroeder, A. Computer vision for medical infant motion analysis: State of the art and rgb-d data set. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Hesse, N.; Pujades, S.; Black, M.J.; Arens, M.; Hofmann, U.G.; Schroeder, A.S. Learning and tracking the 3D body shape of freely moving infants from RGB-D sequences. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 2540–2551. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Schroeder, A.S.; Hesse, N.; Weinberger, R.; Tacke, U.; Gerstl, L.; Hilgendorff, A.; Heinen, F.; Arens, M.; Dijkstra, L.J.; Rocamora, S.P.; et al. General Movement Assessment from videos of computed 3D infant body models is equally effective compared to conventional RGB video rating. Early Hum. Dev. 2020, 144, 104967. [Google Scholar] [CrossRef]
- Huang, X.; Fu, N.; Liu, S.; Vyas, K.; Farnoosh, A.; Ostadabbas, S. Invariant representation learning for infant pose estimation with small data. arXiv 2020, arXiv:2010.06100. [Google Scholar]
- Chambers, C.; Seethapathi, N.; Saluja, R.; Loeb, H.; Pierce, S.R.; Bogen, D.K.; Prosser, L.; Johnson, M.J.; Kording, K.P. Computer vision to automatically assess infant neuromotor risk. IEEE Trans. Neural Syst. Rehabil. Eng. 2020, 28, 2431–2442. [Google Scholar] [CrossRef] [PubMed]
- Cao, Z.; Hidalgo, G.; Simon, T.; Wei, S.E.; Sheikh, Y. OpenPose: Realtime multi-person 2D pose estimation using Part Affinity Fields. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 172–186. [Google Scholar] [CrossRef] [Green Version]
- Rajagopalan, S.; Dhall, A.; Goecke, R. Self-stimulatory behaviours in the wild for autism diagnosis. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Sydney, Australia, 2–8 December 2013; pp. 755–761. [Google Scholar]
- Rehg, J.; Abowd, G.; Rozga, A.; Romero, M.; Clements, M.; Sclaroff, S.; Essa, I.; Ousley, O.; Li, Y.; Kim, C.; et al. Decoding children’s social behavior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 3414–3421. [Google Scholar]
- Tariq, Q.; Daniels, J.; Schwartz, J.N.; Washington, P.; Kalantarian, H.; Wall, D.P. Mobile detection of autism through machine learning on home video: A development and prospective validation study. PLoS Med. 2018, 15, e1002705. [Google Scholar] [CrossRef] [Green Version]
- Billing, A.E. DREAM: Development of Robot-Enhanced Therapy for Children with Autism Spectrum Disorders. EU-FP7 Grant 611391. 2019. Available online: https://github.com/dream2020/data (accessed on 20 January 2022).
- Rihawi, O.; Merad, D.; Damoiseaux, J.L. 3D-AD: 3D-autism dataset for repetitive behaviours with kinect sensor. In Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 29 August–1 September 2017; pp. 1–6. [Google Scholar]
- Prechtl, H.F. Qualitative changes of spontaneous movements in fetus and preterm infant are a marker of neurological dysfunction. Early Hum. Dev. 1990. [Google Scholar] [CrossRef]
- Beccaria, E.; Martino, M.; Briatore, E.; Podestà, B.; Pomero, G.; Micciolo, R.; Espa, G.; Calzolari, S. Poor repertoire General Movements predict some aspects of development outcome at 2 years in very preterm infants. Early Hum. Dev. 2012, 88, 393–396. [Google Scholar] [CrossRef]
- Einspieler, C.; Marschik, P.B.; Bos, A.F.; Ferrari, F.; Cioni, G.; Prechtl, H.F. Early markers for cerebral palsy: Insights from the assessment of general movements. Future Neurol. 2012, 7, 709–717. [Google Scholar] [CrossRef] [Green Version]
- Einspieler, C.; Marschik, P.B.; Pansy, J.; Scheuchenegger, A.; Krieber, M.; Yang, H.; Kornacka, M.K.; Rowinska, E.; Soloveichick, M.; Bos, A.F. The general movement optimality score: A detailed assessment of general movements during preterm and term age. Dev. Med. Child Neurol. 2016, 58, 361–368. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Hesse, N.; Stachowiak, G.; Breuer, T.; Arens, M. Estimating body pose of infants in depth images using random ferns. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Santiago, Chile, 7–13 December 2015; pp. 35–43. [Google Scholar]
- Hesse, N.; Schröder, A.S.; Müller-Felber, W.; Bodensteiner, C.; Arens, M.; Hofmann, U.G. Body pose estimation in depth images for infant motion analysis. In Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Jeju, Korea, 11–15 July 2017; pp. 1909–1912. [Google Scholar]
- Khan, M.H.; Schneider, M.; Farid, M.S.; Grzegorzek, M. Detection of infantile movement disorders in video data using deformable part-based model. Sensors 2018, 18, 3202. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Barry, M.J. Physical therapy interventions for patients with movement disorders due to cerebral palsy. J. Child Neurol. 1996, 11, S51–S60. [Google Scholar] [CrossRef] [PubMed]
- Marschik, P.B.; Pokorny, F.B.; Peharz, R.; Zhang, D.; O’Muircheartaigh, J.; Roeyers, H.; Bölte, S.; Spittle, A.J.; Urlesberger, B.; Schuller, B.; et al. A novel way to measure and predict development: A heuristic approach to facilitate the early detection of neurodevelopmental disorders. Curr. Neurol. Neurosci. Rep. 2017, 17, 43. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Doroniewicz, I.; Ledwoń, D.J.; Affanasowicz, A.; Kieszczyńska, K.; Latos, D.; Matyja, M.; Mitas, A.W.; Myśliwiec, A. Writhing movement detection in newborns on the second and third day of life using pose-based feature machine learning classification. Sensors 2020, 20, 5986. [Google Scholar] [CrossRef]
- Moccia, S.; Migliorelli, L.; Carnielli, V.; Frontoni, E. Preterm infants’ pose estimation with spatio-temporal features. IEEE Trans. Biomed. Eng. 2019, 67, 2370–2380. [Google Scholar] [CrossRef]
- McCay, K.D.; Ho, E.S.L.; Shum, H.P.H.; Fehringer, G.; Marcroft, C.; Embleton, N.D. Abnormal infant movements classification with deep learning on pose-based features. IEEE Access 2020, 8, 51582–51592. [Google Scholar] [CrossRef]
- Øberg, G.K.; Jacobsen, B.K.; Jørgensen, L. Predictive value of general movement assessment for cerebral palsy in routine clinical practice. Phys. Ther. 2015, 95, 1489–1495. [Google Scholar]
- Tsuji, T.; Nakashima, S.; Hayashi, H.; Soh, Z.; Furui, A.; Shibanoki, T.; Shima, K.; Shimatani, K. Markerless Measurement and evaluation of General Movements in infants. Sci. Rep. 2020, 10, 1422. [Google Scholar] [CrossRef] [PubMed]
- Reich, S.; Zhang, D.; Kulvicius, T.; Bölte, S.; Nielsen-Saines, K.; Pokorny, F.B.; Peharz, R.; Poustka, L.; Wörgötter, F.; Einspieler, C.; et al. Novel AI driven approach to classify infant motor functions. Sci. Rep. 2021, 11, 9888. [Google Scholar] [CrossRef] [PubMed]
- Ihlen, E.A.; Støen, R.; Boswell, L.; de Regnier, R.A.; Fjørtoft, T.; Gaebler-Spira, D.; Labori, C.; Loennecken, M.C.; Msall, M.E.; Möinichen, U.I.; et al. Machine learning of infant spontaneous movements for the early prediction of cerebral palsy: A multi-site cohort study. J. Clin. Med. 2020, 9, 5. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Leo, M.; Looney, D.; D’Orazio, T.; Mandic, D.P. Identification of defective areas in composite materials by bivariate EMD analysis of ultrasound. IEEE Trans. Instrum. Meas. 2011, 61, 221–232. [Google Scholar] [CrossRef]
- Schmidt, W.; Regan, M.; Fahey, M.; Paplinski, A. General movement assessment by machine learning: Why is it so difficult. J. Med. Artif. Intell. 2019, 2. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
- Adde, L.; Brown, A.; Van Den Broeck, C.; DeCoen, K.; Eriksen, B.H.; Fjørtoft, T.; Groos, D.; Ihlen, E.A.F.; Osland, S.; Pascal, A.; et al. In-Motion-App for remote General Movement Assessment: A multi-site observational study. BMJ Open 2021, 11, e042147. [Google Scholar] [CrossRef]
- Sakkos, D.; Mccay, K.D.; Marcroft, C.; Embleton, N.D.; Chattopadhyay, S.; Ho, E.S. Identification of Abnormal Movements in Infants: A Deep Neural Network for Body Part-Based Prediction of Cerebral Palsy. IEEE Access 2021, 9, 94281–94292. [Google Scholar] [CrossRef]
- Zamzmi, G.; Kasturi, R.; Goldgof, D.; Zhi, R.; Ashmeade, T.; Sun, Y. A review of automated pain assessment in infants: Features, classification tasks, and databases. IEEE Rev. Biomed. Eng. 2017, 11, 77–96. [Google Scholar] [CrossRef]
- Zamzmi, G.; Pai, C.Y.; Goldgof, D.; Kasturi, R.; Sun, Y.; Ashmeade, T. Automated pain assessment in neonates. In Scandinavian Conference on Image Analysis; Springer: Berlin/Heidelberg, Germany, 2017; pp. 350–361. [Google Scholar]
- Pacheco, C.; Mavroudi, E.; Kokkoni, E.; Tanner, H.G.; Vidal, R. A Detection-based Approach to Multiview Action Classification in Infants. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 6112–6119. [Google Scholar]
- Tariq, Q.; Fleming, S.L.; Schwartz, J.N.; Dunlap, K.; Corbin, C.; Washington, P.; Kalantarian, H.; Khan, N.Z.; Darmstadt, G.L.; Wall, D.P. Detecting developmental delay and autism through machine learning models using home videos of Bangladeshi children: Development and validation study. J. Med. Internet Res. 2019, 21, e13822. [Google Scholar] [CrossRef] [PubMed]
- Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; Volume 1, pp. 278–282. [Google Scholar]
- Vyas, K.; Ma, R.; Rezaei, B.; Liu, S.; Neubauer, M.; Ploetz, T.; Oberleitner, R.; Ostadabbas, S. Recognition of atypical behavior in autism diagnosis from video using pose estimation over time. In Proceedings of the 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP), Pittsburgh, PA, USA, 13–16 October 2019; pp. 1–6. [Google Scholar]
- Washington, P.; Kline, A.; Mutlu, O.C.; Leblanc, E.; Hou, C.; Stockham, N.; Paskov, K.; Chrisman, B.; Wall, D. Activity Recognition with Moving Cameras and Few Training Examples: Applications for Detection of Autism-Related Headbanging. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 8–13 May 2021; Association for Computing Machinery: New York, NY, USA; pp. 1–7.
- Negin, F.; Ozyer, B.; Agahian, S.; Kacdioglu, S.; Ozyer, G.T. Vision-assisted recognition of stereotype behaviors for early diagnosis of Autism Spectrum Disorders. Neurocomputing 2021, 446, 145–155. [Google Scholar] [CrossRef]
- Nabil, M.A.; Akram, A.; Fathalla, K.M. Applying machine learning on home videos for remote autism diagnosis: Further study and analysis. Health Inform. J. 2021, 27, 1460458221991882. [Google Scholar] [CrossRef]
- Gamra, M.B.; Akhloufi, M.A. A review of deep learning techniques for 2D and 3D human pose estimation. Image Vis. Comput. 2021, 114, 104282. [Google Scholar] [CrossRef]
- Kwon, H.; Kim, M.; Kwak, S.; Cho, M. Learning self-similarity in space and time as generalized motion for video action recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 13065–13075. [Google Scholar]
- Kwon, H.; Kim, M.; Kwak, S.; Cho, M. Motionsqueeze: Neural motion feature learning for video understanding. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 345–362. [Google Scholar]
- Geng, Z.; Sun, K.; Xiao, B.; Zhang, Z.; Wang, J. Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14676–14686. [Google Scholar]
- Li, K.; Wang, S.; Zhang, X.; Xu, Y.; Xu, W.; Tu, Z. Pose Recognition with Cascade Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1944–1953. [Google Scholar]
- Wang, J.; Jin, S.; Liu, W.; Liu, W.; Qian, C.; Luo, P. When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 11855–11864. [Google Scholar]
- Zhao, Z.; Liu, W.; Xu, Y.; Chen, X.; Luo, W.; Jin, L.; Zhu, B.; Liu, T.; Zhao, B.; Gao, S. Prior Based Human Completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7951–7961. [Google Scholar]
- Li, C.; Xie, C.; Zhang, B.; Han, J.; Zhen, X.; Chen, J. Memory attention networks for skeleton-based action recognition. IEEE Trans. Neural Netw. Learn. Syst. 2021, 1–15. [Google Scholar] [CrossRef] [PubMed]
- Chen, C.F.R.; Panda, R.; Ramakrishnan, K.; Feris, R.; Cohn, J.; Oliva, A.; Fan, Q. Deep analysis of cnn-based spatio-temporal representations for action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6165–6175. [Google Scholar]
- Sciortino, G.; Farinella, G.M.; Battiato, S.; Leo, M.; Distante, C. On the estimation of children’s poses. In Proceedings of the International Conference on Image Analysis and Processing, Catania, Italy, 11–15 September 2017; Springer: Berlin/Heidelberg, Germany; pp. 410–421. [Google Scholar]
- Wang, J.; Shao, Z.; Huang, X.; Lu, T.; Zhang, R.; Lv, X. Spatial–temporal pooling for action recognition in videos. Neurocomputing 2021, 451, 265–278. [Google Scholar] [CrossRef]
- Bilal, M.; Maqsood, M.; Yasmin, S.; Hasan, N.U.; Rho, S. A transfer learning-based efficient spatiotemporal human action recognition framework for long and overlapping action classes. J. Supercomput. 2021, 1–36. [Google Scholar] [CrossRef]
- Hong, C.; Yu, J.; Wan, J.; Tao, D.; Wang, M. Multimodal deep autoencoder for human pose recovery. IEEE Trans. Image Process. 2015, 24, 5659–5670. [Google Scholar] [CrossRef]
- Sherstinsky, A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys. D Nonlinear Phenom. 2020, 404, 132306. [Google Scholar] [CrossRef] [Green Version]
- Jaderberg, M.; Simonyan, K.; Zisserman, A.; Kavukcuoglu, K. Spatial transformer networks. Adv. Neural Inf. Process. Syst. 2015, 28, 2017–2025. [Google Scholar]
- Wang, X.; Zhang, S.; Qing, Z.; Shao, Y.; Zuo, Z.; Gao, C.; Sang, N. OadTR: Online Action Detection with Transformers. arXiv 2021, arXiv:2106.11149. [Google Scholar]
- Zhang, Y.; Li, X.; Liu, C.; Shuai, B.; Zhu, Y.; Brattoli, B.; Chen, H.; Marsic, I.; Tighe, J. Vidtr: Video transformer without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 13577–13587. [Google Scholar]
- Neimark, D.; Bar, O.; Zohar, M.; Asselmann, D. Video transformer network. arXiv 2021, arXiv:2102.00719. [Google Scholar]
- Fan, H.; Xiong, B.; Mangalam, K.; Li, Y.; Yan, Z.; Malik, J.; Feichtenhofer, C. Multiscale vision transformers. arXiv 2021, arXiv:2104.11227. [Google Scholar]
- Arnab, A.; Dehghani, M.; Heigold, G.; Sun, C.; Lučić, M.; Schmid, C. Vivit: A video vision transformer. arXiv 2021, arXiv:2103.15691. [Google Scholar]
- Shi, B.; Dai, Q.; Hoffman, J.; Saenko, K.; Darrell, T.; Xu, H. Temporal Action Detection with Multi-level Supervision. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 8022–8032. [Google Scholar]
- Bao, W.; Yu, Q.; Kong, Y. Evidential Deep Learning for Open Set Action Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 13349–13358. [Google Scholar]
- Chen, T.; Kornblith, S.; Swersky, K.; Norouzi, M.; Hinton, G. Big self-supervised models are strong semi-supervised learners. arXiv 2020, arXiv:2006.10029. [Google Scholar]
- Yu, X.; Rao, Y.; Zhao, W.; Lu, J.; Zhou, J. Group-aware Contrastive Regression for Action Quality Assessment. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 7919–7928. [Google Scholar]
- Li, D.; Qiu, Z.; Pan, Y.; Yao, T.; Li, H.; Mei, T. Representing Videos As Discriminative Sub-Graphs for Action Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 3310–3319. [Google Scholar]
- Hu, K.; Shao, J.; Liu, Y.; Raj, B.; Savvides, M.; Shen, Z. Contrast and Order Representations for Video Self-Supervised Learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 7939–7949. [Google Scholar]
- Huang, D.; Wu, W.; Hu, W.; Liu, X.; He, D.; Wu, Z.; Wu, X.; Tan, M.; Ding, E. ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency. arXiv 2021, arXiv:2106.02342. [Google Scholar]
- Wang, L.; Tong, Z.; Ji, B.; Wu, G. TDN: Temporal difference networks for efficient action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1895–1904. [Google Scholar]
- Kulal, S.; Mao, J.; Aiken, A.; Wu, J. Hierarchical Motion Understanding via Motion Programs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6568–6576. [Google Scholar]
- Lin, C.; Xu, C.; Luo, D.; Wang, Y.; Tai, Y.; Wang, C.; Li, J.; Huang, F.; Fu, Y. Learning Salient Boundary Feature for Anchor-free Temporal Action Localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 3320–3329. [Google Scholar]
- Li, J.; Todorovic, S. Anchor-Constrained Viterbi for Set-Supervised Action Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9806–9815. [Google Scholar]
- Luo, W.; Zhang, T.; Yang, W.; Liu, J.; Mei, T.; Wu, F.; Zhang, Y. Action Unit Memory Network for Weakly Supervised Temporal Action Localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9969–9979. [Google Scholar]
- Tirupattur, P.; Duarte, K.; Rawat, Y.S.; Shah, M. Modeling Multi-Label Action Dependencies for Temporal Action Localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1460–1470. [Google Scholar]
- Wang, S.; Yap, K.H.; Ding, H.; Wu, J.; Yuan, J.; Tan, Y.P. Discovering human interactions with large-vocabulary objects via query and multi-scale detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 13475–13484. [Google Scholar]
- Morais, R.; Le, V.; Venkatesh, S.; Tran, T. Learning Asynchronous and Sparse Human-Object Interaction in Videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 16041–16050. [Google Scholar]
- Bai, X.; Wang, X.; Liu, X.; Liu, Q.; Song, J.; Sebe, N.; Kim, B. Explainable Deep Learning for Efficient and Robust Pattern Recognition: A Survey of Recent Developments. Pattern Recognit. 2021, 120, 108102. [Google Scholar] [CrossRef]
- Li, J.; Todorovic, S. Action Shuffle Alternating Learning for Unsupervised Action Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12628–12636. [Google Scholar]
- Ozuysal, M.; Fua, P.; Lepetit, V. Fast keypoint recognition in ten lines of code. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–8. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
Database | Contents | Frame Size | Age Range | Info | Frames | Labels |
---|---|---|---|---|---|---|
BabyPose [37] | 16 Videos | 640 × 480 | N | Depth 8 bit/16 bit | 16,000 | 12 Body Landmarks |
MINI-RGBD [38] | 12 Videos | 640 × 480 | I | RGB/D | 12,000 | 25 Body Landmarks |
SyRIP [41] | Images | Misc | I | RGB | 2000 | 17 Body Landmarks |
Dataset [42] | 85 YouTube Video URLs | Misc | I | RGB | NA | 18 Body Landmarks |
SSBD [44] | 75 YouTube Video URLs | Misc | NA | RGB | U | Behaviors |
MMDB [45] | 160 Videos | Misc | T | Multimodal | U | ASD Diagnosis |
Tariq [46] | 162 Videos | Misc | T | RGB | U | Behaviors |
DREAM [47] | 3121 Videos | NA | T | Depth | NA | 3D Skeleton Gaze ADOS scores |
3D-AD [48] | 100 Videos | 512 × 424 | T | Depth | U | Behaviors |
Work | Setup | Input | CV/ML Task | Clinical Scope |
---|---|---|---|---|
[53,54] | Hospital | Depth | Pose estimation by Keypoints recognition | General |
[59] | NICU | Depth | Limb pose by 2 CNNs (detection + regression) | General |
[55] | Hospital | RGB | Deformable part models | General |
[57] | Hospital | Multimodal | Optical flow + audio features + logistic regression | Normal/Abnormal |
[58] | Hospital | RGB | Limb Motion Description by SVM, RF, LDA | WM vs. PR |
[60] | NA | Synthetic | Histograms + CNN | Normal/Abnormal |
Work | Setup | Input | Method | Classification Goal |
---|---|---|---|---|
[62] | Home/Hospital | RGB | Motion Feature + Gaussian mixture network | 4 types of movement: WMs/FMs/PR/CS |
[64] | Hospital | RGB | Motion + MEMD + HT + Decision Tree | CP risk |
[63] | Hospital | RGB | OpenPose+NN | FMs |
[72] | Treatment Center | RGB | Amount of Motion | Pain Level |
[66] | Home/Hospital | RGB | VGG9+LSTM | FMs |
[70] | Home/Hospital | RGB | OpenPose+LSTM | FMs |
[69] | Home | Smartphone | CIMA-Pose | CP risk |
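
Several rows in the table above feed body keypoints from a pose estimator to a classifier. As a minimal sketch of what a pose-based motion feature could look like, the function below computes the mean speed of each joint from a sequence of 2D keypoints. The input layout (a list of frames, each a list of `(x, y)` tuples), the function name, and the choice of mean speed as the feature are assumptions for illustration; the surveyed works define their own features.

```python
import math


def mean_joint_speeds(frames, fps):
    """Return the mean speed (pixels per second) of every joint.

    `frames` is a list of frames; each frame is a list of (x, y)
    keypoint coordinates in a fixed joint order, as a pose estimator
    might produce. `fps` is the video frame rate.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    n_joints = len(frames[0])
    totals = [0.0] * n_joints
    # Accumulate the Euclidean displacement of each joint between consecutive frames.
    for prev, curr in zip(frames, frames[1:]):
        for j in range(n_joints):
            (x0, y0), (x1, y1) = prev[j], curr[j]
            totals[j] += math.hypot(x1 - x0, y1 - y0)
    steps = len(frames) - 1
    # Mean displacement per frame step, converted to pixels per second.
    return [total * fps / steps for total in totals]
```

The resulting per-joint vector could then be passed to any standard classifier; missing or low-confidence keypoints would need handling before this step.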
Work | Setup | Input | Method | Goal |
---|---|---|---|---|
[74] | Domestic (Tariq dataset) | RGB | Random Forests | Typical/Atypical |
[73] | Rehabilitation Environment | Multiview RGB | Faster R-CNN + LSTM + learnable fusion coefficients | 4 daily actions |
[76] | Domestic | RGB | 2D Mask R-CNN + particle filter +CNN classifier | Atypical/Typical Trajectories |
[78] | Domestic | RGB (from YouTube) | YOLOv3 + HOF + K-means + MLP | 4 repetitive Actions |
[77] | Domestic (SSBD dataset) | RGB | CNN + LSTM | ASD/Typical |
[79] | Domestic (Tariq dataset) | RGB | Various regressors/classifiers | ASD Features Rating |
Work | Improved Task | Key Contribution |
---|---|---|
[81] | Motion Features Extraction | Spatiotemporal self-similarity |
[82] | Motion Features Extraction | MotionSqueeze module |
[83] | Pose Estimation (Key points Positioning) | Multi-branch regression |
[84] | Pose Estimation (Key points Positioning) | Cascade Transformers |
[85] | Pose Estimation (Key points Positioning) | Adversarial algorithms |
[86] | Human Completion | Topological Structure/Memory Bank |
[87] | Skeleton-Based Action Recognition | Memory Attention Networks |
[90] | Action Recognition | Temporal-Spatial pooling block |
[91] | Action Recognition | CNN+Autoencoder+LSTM |
[101] | Action Recognition | Contrastive Learning |
[100] | Action Recognition | Semi-supervised Action Detection |
[95,96,97,98,99] | Action Classification | Transformers |
[103] | Action Quality Assessment | Contrastive Regression |
[104] | Video Representation | Space-Time Graph |
[105,106] | Video Representation | Self-supervised learning |
[107] | Temporal Modeling | Two-level Motion Modeling |
[108] | Motion Segment Extraction | Hierarchical Framework |
[109] | Temporal Action Localization | E2E anchor free method |
[110] | Temporal Action Localization | Anchor-Constrained Viterbi |
[111] | Temporal Action Localization | Memory Network |
[112] | Temporal Action Localization | Multi-Label Action Dependency layer |
[113] | Human Object Interaction | Transformer/Cascade detector |
[114] | Human Object Interaction | Graph Networks |
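
To give a feel for the self-similarity idea listed in the table above, the sketch below builds a purely temporal self-similarity matrix from per-frame feature vectors using cosine similarity. This is a simplified, hypothetical illustration and not the spatiotemporal formulation of the cited work; the function names and the use of plain Python lists as feature vectors are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def temporal_self_similarity(features):
    """Return a T x T matrix whose entry (i, j) compares frame i with frame j.

    `features` is a list of T per-frame feature vectors. Repetitive
    movements show up as off-diagonal bands of high similarity.
    """
    return [[cosine(fi, fj) for fj in features] for fi in features]
```

For a sequence in which the first and third frames share the same features, entries (0, 2) and (2, 0) equal 1, which is the kind of periodic structure such a matrix exposes.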
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Leo, M.; Bernava, G.M.; Carcagnì, P.; Distante, C. Video-Based Automatic Baby Motion Analysis for Early Neurological Disorder Diagnosis: State of the Art and Future Directions. Sensors 2022, 22, 866. https://doi.org/10.3390/s22030866