Learning through interaction is a foundational principle of both human and animal intelligence. In a broad sense, intelligent agents can be formulated as goal-directed systems interacting with an uncertain environment. Despite the generality of this definition, a key challenge in grounding it computationally lies in how goals and purposes are specified and represented. This dissertation explores this question through the lens of various machine learning paradigms.
We begin with the classical reinforcement learning formulation, which treats all goals as reward maximization. We demonstrate that, under a novel physics-inspired reward function, the agent can learn to solve rearrangement tasks with an awareness of physical concepts such as mass and friction.
Next, we move to the imitation learning setting to lift the limitations of scalar rewards. Specifically, we study the role of different forms of prompts in communicating task specifications to a decision transformer. In most robotics applications, providing a demonstration as a prompt at deployment time is impractical. We show that using an essential task parameter together with a learnable prompt instead achieves comparable or even better generalization in a zero-shot manner.
Finally, we shift our focus from imparting human knowledge about goals to enabling the agent to discover goals purely from data. We show that a geometric abstraction of probabilistic world dynamics can be embedded into the representation space through asymmetric contrastive learning. The resulting embedding space can represent irreversible transitions and facilitates subgoal discovery and planning.