Abstract
Implementing sequence learning on robotic platforms poses several challenges. Deciding when to terminate one action and initiate the next requires balancing the stability of sensory information against knowledge of which action is required next. The work presented here proposes a starting point for the successful execution and learning of dynamic sequences. Using the NAO humanoid platform, we propose a mathematical model based on dynamic field theory and reinforcement learning for acquiring and performing a sequence of elementary motor behaviors. We provide results comparing two reinforcement learning methods applied to sequence generation, in both simulation and on the physical robot.
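The dynamic-field backbone mentioned above rests on Amari-type neural field dynamics, in which a localized input can drive the field through an instability into a self-stabilized activation peak that represents the currently active behavior. The sketch below illustrates this mechanism only; all parameter values (resting level, kernel widths, input amplitude) are assumed for illustration and are not taken from the paper.

```python
import numpy as np

def sigmoid(u, beta=4.0):
    """Sigmoidal output nonlinearity f(u)."""
    return 1.0 / (1.0 + np.exp(-beta * u))

def simulate_field(steps=300, n=101, dt=1.0, tau=10.0, h=-5.0):
    """Euler-integrate a 1D Amari field:
        tau * du/dt = -u + h + S(x) + sum_x' w(x - x') f(u(x'))
    Parameters are illustrative, not those of the paper."""
    x = np.arange(n)
    u = np.full(n, h, dtype=float)  # field starts at resting level h
    dist = np.abs(x[:, None] - x[None, :])
    # interaction kernel: local excitation minus global inhibition
    w = 2.0 * np.exp(-dist**2 / (2 * 3.0**2)) - 0.5
    # localized sensory input centered at x = 50
    s = 8.0 * np.exp(-(x - 50.0)**2 / (2 * 3.0**2))
    for _ in range(steps):
        u += dt / tau * (-u + h + s + w @ sigmoid(u))
    return u

u = simulate_field()
# A supra-threshold peak forms at the stimulated site, while the rest of
# the field is suppressed below threshold by global inhibition.
```

The detection instability shown here (a peak forming once input exceeds the resting level) is what lets such an architecture decide autonomously when one elementary behavior ends and the next begins.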
© Boris Durán et al.
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.