Jun 16, 2020 · This paper presents the first non-asymptotic result showing that a model-free algorithm can achieve a logarithmic cumulative regret for episodic tabular ...
We first write the total regret as an expected sum of the sub-optimality gaps that arise over the whole learning process, then use the estimation error of the Q-function ...
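The first step described above can be checked numerically on a small example. This is a minimal sketch, not the authors' code. It assumes a randomly generated episodic tabular MDP with deterministic per-step rewards, and it uses random deterministic policies as stand-ins for the learner's per-episode policies. All helper names (`make_toy_mdp`, `optimal_q`, `policy_value`, `expected_gap_sum`, `check_decomposition`) are hypothetical. With gap_h(s, a) = V*_h(s) - Q*_h(s, a), the standard identity V*_1(s_1) - V^{π_k}_1(s_1) = E_{π_k}[Σ_h gap_h(s_h, a_h)] means the total regret over K episodes equals an expected sum of gaps.

```python
import numpy as np


def make_toy_mdp(S, A, H, seed=0):
    """Hypothetical random episodic MDP: P[h, s, a] is a next-state distribution, R in [0, 1]."""
    rng = np.random.default_rng(seed)
    P = rng.dirichlet(np.ones(S), size=(H, S, A))  # shape (H, S, A, S)
    R = rng.uniform(0.0, 1.0, size=(H, S, A))
    return P, R


def optimal_q(P, R):
    """Backward induction for Q*_h and V*_h, with terminal value V*_{H} = 0."""
    H, S, A, _ = P.shape
    Q = np.zeros((H, S, A))
    V = np.zeros((H + 1, S))
    for h in reversed(range(H)):
        Q[h] = R[h] + P[h] @ V[h + 1]  # (S, A, S) @ (S,) -> (S, A)
        V[h] = Q[h].max(axis=1)
    return Q, V


def policy_value(P, R, pi):
    """Value V^pi_h of a deterministic, step-dependent policy pi[h, s]."""
    H, S, A, _ = P.shape
    idx = np.arange(S)
    V = np.zeros((H + 1, S))
    for h in reversed(range(H)):
        a = pi[h]
        V[h] = R[h, idx, a] + P[h, idx, a] @ V[h + 1]
    return V


def expected_gap_sum(P, pi, gaps, s1):
    """E_pi[sum_h gap_h(s_h, a_h)] starting from s1, via forward state distributions."""
    H, S, A, _ = P.shape
    idx = np.arange(S)
    d = np.zeros(S)
    d[s1] = 1.0  # distribution of s_h under pi
    total = 0.0
    for h in range(H):
        a = pi[h]
        total += d @ gaps[h, idx, a]
        d = d @ P[h, idx, a]  # propagate to s_{h+1}
    return total


def check_decomposition(K=10, S=4, A=3, H=5, s1=0):
    """Return (regret, gap-based regret) accumulated over K episodes."""
    P, R = make_toy_mdp(S, A, H)
    Q_star, V_star = optimal_q(P, R)
    gaps = V_star[:H, :, None] - Q_star  # gap_h(s, a) = V*_h(s) - Q*_h(s, a) >= 0
    rng = np.random.default_rng(1)
    regret, gap_regret = 0.0, 0.0
    for _ in range(K):
        pi = rng.integers(0, A, size=(H, S))  # stand-in for the learner's policy in episode k
        regret += V_star[0, s1] - policy_value(P, R, pi)[0, s1]
        gap_regret += expected_gap_sum(P, pi, gaps, s1)
    return regret, gap_regret
```

Calling `check_decomposition()` returns two numbers that agree up to floating-point error. The snippet indicates that the analysis then relates these gap terms to the estimation error of the Q-function. This sketch covers only the decomposition step.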
Q-learning with logarithmic regret. K. Yang, L. Yang, S. Du. International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 1576-1584, 2021.
Mar 3, 2022 · In this paper, we study the problem of regret minimization for episodic Reinforcement Learning (RL) both in the model-free and the model-based setting.