arxiv:2107.01825

Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation

Published on Jul 5, 2021

Abstract

Model-based deep reinforcement learning has achieved success in domains that demand high sample efficiency, such as Go and robotics. However, several issues remain: planning efficient exploration to learn more accurate dynamics models, evaluating the uncertainty of the learned models, and making more rational use of those models. To mitigate these issues, we present MEEE, a model-ensemble method that consists of optimistic exploration and weighted exploitation. During exploration, unlike prior methods that directly select the optimal action maximizing the expected cumulative return, our agent first generates a set of action candidates and then seeks out the optimal action, taking both the expected return and the novelty of future observations into account. During exploitation, different discounted weights are assigned to imagined transition tuples according to their model uncertainty, which prevents model prediction errors from propagating into agent training. Experiments on several challenging continuous control benchmark tasks demonstrate that our approach outperforms state-of-the-art model-free and model-based methods, especially in sample complexity.

AI-generated summary

MEEE, a model-ensemble method in deep reinforcement learning, uses optimistic exploration and weighted exploitation to improve sample efficiency and address the challenges of uncertainty evaluation and error propagation.
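
The two components described in the abstract can be sketched in a few lines of code. This is a minimal illustration under assumed details, not the authors' implementation: the dynamics ensemble, the uniform candidate sampling, the disagreement-based novelty bonus with coefficient beta, and the exponential uncertainty weighting with coefficient kappa are simplifications chosen for readability, and the names ensemble_predict, optimistic_action, and transition_weight are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

def ensemble_predict(models, state, action):
    # Query every dynamics model in the ensemble; returns an array of
    # predicted next states with shape (ensemble_size, state_dim).
    return np.stack([m(state, action) for m in models])

def optimistic_action(models, value_fn, state, action_dim, num_candidates=64, beta=1.0):
    # Optimistic exploration (sketch): sample a set of candidate actions,
    # then pick the one maximizing expected return plus an
    # ensemble-disagreement bonus, used here as a stand-in for the novelty
    # of the future observation.
    candidates = rng.uniform(-1.0, 1.0, size=(num_candidates, action_dim))
    scores = []
    for a in candidates:
        preds = ensemble_predict(models, state, a)
        expected_return = value_fn(preds.mean(axis=0))   # return estimate on the mean prediction
        novelty = preds.std(axis=0).mean()               # disagreement across ensemble members
        scores.append(expected_return + beta * novelty)
    return candidates[int(np.argmax(scores))]

def transition_weight(models, state, action, kappa=1.0):
    # Weighted exploitation (sketch): give an imagined transition a
    # discounted weight that shrinks as the ensemble's uncertainty grows,
    # limiting how far model prediction errors propagate into agent training.
    preds = ensemble_predict(models, state, action)
    uncertainty = preds.std(axis=0).mean()
    return float(np.exp(-kappa * uncertainty))

# Toy usage with random linear "dynamics models" and a quadratic value proxy.
state_dim, action_dim, ensemble_size = 3, 2, 5
models = [
    (lambda W: (lambda s, a: W @ np.concatenate([s, a])))(
        rng.normal(size=(state_dim, state_dim + action_dim))
    )
    for _ in range(ensemble_size)
]
value_fn = lambda s: -float(np.sum(s ** 2))
state = rng.normal(size=state_dim)
action = optimistic_action(models, value_fn, state, action_dim)
weight = transition_weight(models, state, action)
print(action, weight)

In a full agent the weight returned by transition_weight would scale the loss contribution of that imagined transition during policy or value updates; the exact weighting scheme and novelty measure in the paper may differ from this sketch.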
