arxiv:2402.12845

MORE-3S:Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces

Published on Feb 20

Upvote

Authors:

Tianyu Zheng ,

Ge Zhang ,

Stephen W. Huang ,

Abstract

Drawing upon the intuition that aligning different modalities to the same semantic embedding space would allow models to understand states and actions more easily, we propose a new perspective to the offline reinforcement learning (RL) challenge. More concretely, we transform it into a supervised learning task by integrating multimodal and pre-trained language models. Our approach incorporates state information derived from images and action-related data obtained from text, thereby bolstering RL training performance and promoting long-term strategic thinking. We emphasize the contextual understanding of language and demonstrate how decision-making in RL can benefit from aligning states' and actions' representation with languages' representation. Our method significantly outperforms current baselines as evidenced by evaluations conducted on Atari and OpenAI Gym environments. This contributes to advancing offline RL performance and efficiency while providing a novel perspective on offline RL.Our code and data are available at https://github.com/Zheng0428/MORE_.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2402.12845 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2402.12845 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2402.12845 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.