Teen-Different committed on
Commit
cf14949
·
verified ·
1 Parent(s): 716c31e

Upload 25 files

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ images/all_algos_testing.png filter=lfs diff=lfs merge=lfs -text
37
+ images/all_algos.png filter=lfs diff=lfs merge=lfs -text
38
+ Report.pdf filter=lfs diff=lfs merge=lfs -text
Enhanced Environemnt Simulation.gif ADDED
README.md CHANGED
@@ -1,3 +1,230 @@
1
- ---
2
- license: mit
3
- ---
1
+ # RxRovers: Roaming for Rapid Relief
2
+
3
+ <span style="color: blue;">Dynamic Obstacles and Path Optimization</span>
4
+
5
+ ![Trained Agents in Action](images/trained_agents.gif)
6
+
7
+ ## Project Overview
8
+
9
+ "RxRovers: Roaming for Rapid Relief" applies reinforcement learning (RL) to medical supply delivery inside hospitals. Autonomous agents, called RxRovers, learn to navigate hospital corridors efficiently while avoiding both static and dynamic obstacles, so that medicines are distributed on time. Faster and more reliable deliveries are intended to support patient care.
10
+
11
+ ## Team Members
12
+
13
+ - Charvi Kusuma [GitHub](https://github.com/kcharvi)
14
+ - Tarun Reddi [GitHub](https://github.com/REDDITARUN)
15
+
16
+ ### Real-World Application
17
+
18
+ The RxRovers project targets a real-world healthcare problem: delivering medical supplies efficiently within hospital environments. It studies this problem in simulation, using RL to improve the delivery process.
19
+
20
+ ### Complex Navigation Challenges
21
+
22
+ The project’s challenges mirror real-world navigation complexities in hospital settings:
23
+
24
+ - **Dynamic and Static Obstacles**: Accurate representation of unpredictable hospital environments.
25
+ - **Path Planning**: Optimization to ensure timely and efficient delivery of medical supplies.
26
+
27
+ ### Adaptable Environment
28
+
29
+ The project's environment can be modified to reflect various real-world hospital layouts, demonstrating its adaptability to different healthcare settings.
30
+
31
+ ## Objectives
32
+
33
+ 1. Develop RL agents capable of autonomously navigating hospital environments while delivering medical supplies.
34
+ 2. Optimize path planning strategies to ensure timely and efficient delivery of medicines, while avoiding obstacles such as equipment, humans, and environmental constraints.
35
+ 3. Enhance the visual representation and user experience of the simulated hospital environment to improve engagement and realism.
36
+ 4. Conduct comparative analysis of various reinforcement learning algorithms to identify the optimal approach for medical supply delivery optimization within hospital environments.
37
+
38
+ ## Environment
39
+
40
+ ### Simulated Hospital Layouts
41
+
42
+ The project creates a hospital environment that mirrors real-world scenarios. The environment includes features such as:
43
+
44
+ - **Hospital Corridors**: Multiple corridors, rooms, and operation desks.
45
+ - **Obstacles**: Dynamic obstacles (humans, other rovers) and static obstacles (rooms, walls).
46
+ - **RxRovers**: Autonomous agents programmed to navigate these environments, avoid obstacles, and reach designated destinations.
47
+
48
+ ### Initial Stage
49
+
50
+ - **Grid**: 9×9 grid on which the rovers navigate.
51
+ - **Setup**: Initialized action and observation spaces, set up grid size, starting points, destinations, and parameters such as rewards and penalties for actions and events.
52
+
53
+ ### Refined Version
54
+
55
+ - **Grid Size**: 15×15 grid representing the hospital environment.
56
+ - **Rovers**: Two agents represented by blue squares.
57
+ - **Targets**: Yellow target destinations for delivering medicine.
58
+ - **Actions**: Move down, up, left, right, or stay still (5 possible actions).
59
+ - **Operation Desks**: Dark green squares representing starting points.
60
+ - **Rooms**: Black squares indicating static obstacles.
61
+ - **Human**: Purple square representing a moving human obstacle.
62
+ - **Observation Space**: Positions of the two Rovers and the human (six grid coordinates). Rooms, operation desks, and the grid boundary are fixed and are enforced by the environment.
63
+ - **Rewards**:
64
+ - +30 for moving closer to targets.
65
+ - +100 for reaching destinations.
66
+ - -15 for collisions or moving out of grid bounds.
67
+ - -5 for waiting near the human obstacle.
68
+ - -20 for moving away from targets.
69
+ - **Termination**: Episode ends if both Rovers reach their targets or maximum time steps (20) are reached.
70
+
71
+ ![Environment Image](images/enhanced-environment.png)
72
+
73
+ ## Algorithms
74
+
75
+ ### Q-Learning (QL)
76
+
77
+ Implemented Q-Learning for a rover agent navigating a grid environment.
78
+
79
+ - **Settings**:
80
+ - State Representation: Positions of the two rovers and the human.
81
+ - Action Space: Move up, down, left, right, or stay in place.
82
+ - Rewards: Based on interactions (moving closer to target, collisions, reaching target).
83
+ - Done Flag: Episode terminates when both rovers reach targets or max time steps are reached.
84
+ - **Hyperparameters**:
85
+ - Alpha (Learning Rate): 1e-4
86
+ - Gamma (Discount Factor): 0.9
87
+ - Epsilon (Exploration Rate): 0.5
88
+ - Epsilon Decay: 0.995
89
+ - Epsilon Minimum: 0.01
90
+ - **Training Phase**:
91
+ - Total Rewards per Episode
92
+ - Epsilon Decay Curve
93
+ - **Evaluation Phase**:
94
+ - Evaluation Rewards per Episode
95
+
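The tabular update behind these settings can be sketched as follows. This is a minimal illustration rather than the notebook's code: the helper names (`choose_action`, `q_update`, `decay_epsilon`) and the dictionary-based Q-table are assumptions, and only the hyperparameter values come from the list above.

```python
import random
from collections import defaultdict

# Hyperparameter values taken from the README list above.
ALPHA, GAMMA = 1e-4, 0.9
EPS_START, EPS_DECAY, EPS_MIN = 0.5, 0.995, 0.01
N_ACTIONS = 5  # down, up, left, right, wait

# Hypothetical Q-table: maps a state tuple to one value per action.
q_table = defaultdict(lambda: [0.0] * N_ACTIONS)

def choose_action(state, epsilon, rng=random):
    """Epsilon-greedy selection over the tabular Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = q_table[state]
    return values.index(max(values))

def q_update(state, action, reward, next_state, done):
    """One-step Q-learning update; returns the new Q(state, action)."""
    best_next = 0.0 if done else max(q_table[next_state])
    td_target = reward + GAMMA * best_next
    q_table[state][action] += ALPHA * (td_target - q_table[state][action])
    return q_table[state][action]

def decay_epsilon(epsilon):
    """Multiplicative decay, floored at the minimum exploration rate."""
    return max(EPS_MIN, epsilon * EPS_DECAY)
```

With a learning rate of 1e-4, each update moves a Q-value only a small fraction of the way toward its target, so many episodes are needed before the table reflects the reward scale.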
96
+ ### Double Q-Learning (DQL)
97
+
98
+ Implemented Double Q-Learning, which maintains two Q-tables to mitigate the overestimation bias of standard Q-Learning.
99
+
100
+ - **Settings**: Same as Q-Learning.
101
+ - **Hyperparameters**: Same as Q-Learning.
102
+ - **Training Phase**:
103
+ - Total Rewards per Episode
104
+ - **Evaluation Phase**:
105
+ - Evaluation Rewards
106
+
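A rough sketch of the two-table update is shown below. The names `q_a`, `q_b`, and `double_q_update` are hypothetical, and the coin flip that picks which table to update follows the textbook formulation of Double Q-Learning, which may differ from the notebook.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, N_ACTIONS = 1e-4, 0.9, 5  # values reused from the Q-Learning section

# Two independent tables (hypothetical names).
q_a = defaultdict(lambda: [0.0] * N_ACTIONS)
q_b = defaultdict(lambda: [0.0] * N_ACTIONS)

def double_q_update(state, action, reward, next_state, done, rng=random):
    """Update one randomly chosen table, using the other one to evaluate."""
    if rng.random() < 0.5:
        update, other = q_a, q_b
    else:
        update, other = q_b, q_a
    if done:
        target = reward
    else:
        # Select the greedy next action with the table being updated ...
        values = update[next_state]
        best_action = values.index(max(values))
        # ... but take its value from the other table.
        target = reward + GAMMA * other[next_state][best_action]
    update[state][action] += ALPHA * (target - update[state][action])
```

Because the action is chosen by one table and scored by the other, a value that is too high in one table by chance is less likely to be propagated.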
107
+ ### Deep Q Network (DQN)
108
+
109
+ Utilized a neural network to approximate the Q-values.
110
+
111
+ - **Settings**:
112
+ - Neural Network Architecture: Two hidden layers with ReLU activation.
113
+ - Replay Memory: Stores past experiences for experience replay.
114
+ - Select Action Function: Epsilon-greedy strategy.
115
+ - Optimize Model Function: Gradient descent on the Q-network.
116
+ - **Hyperparameters**:
117
+ - Number of Episodes: 1000
118
+ - Target Network Update Frequency: 10 episodes
119
+ - Batch Size: 256
120
+ - Discount Factor: 0.9
121
+ - Learning Rate: 0.001
122
+ - Epsilon Initial Value: 1
123
+ - Epsilon Final Value: 0.05
124
+ - Epsilon Decay: 10000
125
+ - Maximum Timesteps per Episode: 30
126
+ - **Training Phase**:
127
+ - Total Rewards per Episode
128
+ - Epsilon Decay Curve
129
+ - **Evaluation Phase**:
130
+ - Evaluation Rewards
131
+
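The README lists an epsilon decay of 10000 without giving the schedule. Assuming the exponential form used in the PyTorch DQN tutorial listed in the References, the exploration rate could be computed as in this sketch; `epsilon_at` and `select_action` are illustrative names, not taken from the notebook.

```python
import math
import random

# Values from the hyperparameter list above.
EPS_START, EPS_END, EPS_DECAY = 1.0, 0.05, 10_000

def epsilon_at(steps_done):
    """Exponential decay from EPS_START toward EPS_END (assumed schedule)."""
    return EPS_END + (EPS_START - EPS_END) * math.exp(-steps_done / EPS_DECAY)

def select_action(q_values, steps_done, rng=random):
    """Epsilon-greedy choice over a list of Q-values for one state."""
    if rng.random() < epsilon_at(steps_done):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Under this assumption the decay constant is measured in environment steps, so after 10,000 steps the gap between the current and final epsilon has shrunk by a factor of e.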
132
+ ### Double Deep Q Network (DDQN)
133
+
134
+ Addresses overestimation bias by decoupling selection and evaluation of the action.
135
+
136
+ - **Settings**:
137
+ - Optimize Model Function: Gradient descent with gradient clipping.
138
+ - **Hyperparameters**:
139
+ - Batch Size: 256
140
+ - Gamma: 0.9
141
+ - Learning Rate: 0.001
142
+ - Epsilon Start: 1
143
+ - Epsilon End: 0.05
144
+ - Epsilon Decay: 10,000
145
+ - Number of Episodes: 1000
146
+ - Target Update: Every 10 episodes
147
+ - **Training Phase**:
148
+ - Total Rewards per Episode
149
+ - Epsilon Decay Curve
150
+ - **Evaluation Phase**:
151
+ - Evaluation Rewards
152
+
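The decoupling of action selection and evaluation can be illustrated with plain Python lists standing in for network outputs. `ddqn_targets` is a hypothetical helper; the notebook presumably computes the same quantity with PyTorch tensors.

```python
GAMMA = 0.9  # discount factor from the list above

def ddqn_targets(rewards, dones, q_online_next, q_target_next, gamma=GAMMA):
    """Compute Double DQN targets for a batch.

    q_online_next / q_target_next hold, per transition, the Q-values that the
    online and target networks assign to the next state.
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, q_online_next, q_target_next):
        if done:
            targets.append(r)  # no bootstrapping past a terminal state
            continue
        # Selection: the online network picks the greedy next action.
        best_action = max(range(len(q_on)), key=lambda a: q_on[a])
        # Evaluation: the target network supplies that action's value.
        targets.append(r + gamma * q_tg[best_action])
    return targets
```

Plain DQN would instead take the maximum of `q_tg` directly, letting the same network both choose and score the action.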
153
+ ### Proximal Policy Optimization (PPO)
154
+
155
+ A policy-gradient algorithm developed by OpenAI that keeps each policy update small by optimizing a clipped surrogate objective.
156
+
157
+ - **Settings**:
158
+ - Neural Network Architecture: Actor and critic heads.
159
+ - Training Loop: States, actions, rewards, values, and log-probs collected in replay buffer.
160
+ - Loss Functions: Policy and value losses.
161
+ - **Hyperparameters**:
162
+ - Total Timesteps: 50,000
163
+ - Gamma: 0.99
164
+ - Lambda: 0.95
165
+ - Epsilon: 0.2
166
+ - Epochs: 3
167
+ - Batch Size: 64
168
+ - Learning Rate: 0.001
169
+ - **Training Phase**:
170
+ - Total Rewards per Episode
171
+ - **Evaluation Phase**:
172
+ - Evaluation Rewards
173
+
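The gamma, lambda, and epsilon values above map naturally onto generalized advantage estimation (GAE) and the clipped policy loss. The sketch below assumes that interpretation; `gae` and `clipped_policy_loss` are hypothetical helpers written in plain Python rather than the notebook's code.

```python
import math

# Values from the hyperparameter list above.
GAMMA, LAM, CLIP_EPS = 0.99, 0.95, 0.2

def gae(rewards, values, last_value, dones, gamma=GAMMA, lam=LAM):
    """Generalized advantage estimates for one rollout, computed backwards."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages

def clipped_policy_loss(new_log_probs, old_log_probs, advantages, clip_eps=CLIP_EPS):
    """Negative clipped surrogate objective, averaged over the batch."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

With epsilon set to 0.2, a probability ratio outside the range 0.8 to 1.2 stops contributing extra gradient in the direction that would widen the gap between the old and new policy.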
174
+ ### Advantage Actor-Critic (A2C)
175
+
176
+ Combines elements of both policy-based methods (Actor) and value-based methods (Critic).
177
+
178
+ - **Settings**:
179
+ - Actor Network: Learns a policy π(a|s).
180
+ - Critic Network: Learns the value function V(s).
181
+ - Advantage: Measures how much better an action turned out than the critic's estimate V(s).
182
+ - Policy and Value Function Updates: Gradient descent.
183
+ - **Training Phase**:
184
+ - Total Rewards per Episode
185
+ - **Evaluation Phase**:
186
+ - Evaluation Rewards
187
+
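The README does not say which advantage estimator the notebook uses. A common one-step form, given here as an assumption rather than the authors' exact choice, is:

$$
A(s_t, a_t) = r_t + \gamma V(s_{t+1}) - V(s_t)
$$

Under that assumption the actor minimizes $-\log \pi(a_t \mid s_t)\, A(s_t, a_t)$ and the critic minimizes $A(s_t, a_t)^2$, where $\gamma$ is the discount factor.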
188
+ **Total Rewards Plot during Training:**
189
+
190
+ ![Plots](images/all_algos.png)
191
+
192
+ **Evaluation for 10 Timesteps:**
193
+
194
+ ![Evaluation Plots](images/all_algos_testing.png)
195
+
196
+ ## Comparison
197
+
198
+ Comparison of different reinforcement learning algorithms and their performance:
199
+
200
+ | Aspect | Q-Learning | Double Q-Learning | DQN | DDQN | PPO | A2C |
201
+ | ------------------------- | ----------------------- | ----------------------- | -------------------- | ------------------------------ | ----------------------------- | ----------------------------- |
202
+ | Algorithm Type | Model-free, Value-based | Model-free, Value-based | Value-based, Deep NN | Value-based, Deep NN | Actor-Critic, Policy Gradient | Actor-Critic, Policy Gradient |
203
+ | Exploration-Exploitation | Epsilon-greedy | Epsilon-greedy | Epsilon-greedy | Epsilon-greedy | Stochastic policy sampling | Stochastic policy sampling |
204
+ | Stability & Learning Rate | No target network | No target network | Target network | Double updates, target network | Adaptive, Gradient Clipping | Adaptive, Entropy Bonus |
205
+ | Convergence | Slow | Moderate | Fast | Faster | Fast | Moderate |
206
+ | Memory Requirements | Low | Low | High | High | Moderate | Moderate |
207
+ | Adaptability | Moderate | Moderate | Low to Moderate | Low to Moderate | High | Moderate |
208
+ | Generalization | Low to Moderate | Low to Moderate | Low | Low | High | Moderate |
209
+
210
+ ## Challenges Addressed
211
+
212
+ 1. **Navigating a Dynamic World**: Rovers learned to avoid the moving human obstacle, guided by the collision and proximity penalties in the reward function.
213
+ 2. **Custom Paths for Diverse Layouts**: Environment adaptable to various hospital settings.
214
+ 3. **Learning Optimal Path**: Rovers trained to take the shortest route to delivery rooms, avoiding obstacles.
215
+
216
+ ## Bonus: Real-World Application
217
+
218
+ 1. **Healthcare System Integration**: Enhances delivery system within hospital settings, benefiting patient care and outcomes.
219
+ 2. **Simulating Hospital Layouts**: Reflects real-world scenarios with hospital corridors, obstacles, and RxRovers.
220
+ 3. **Complex Navigation Challenges**: Optimizes path planning and avoids dynamic/static obstacles.
221
+ 4. **Adaptable Environment**: Can be modified to different healthcare settings.
222
+
223
+ ## References
224
+
225
+ 1. [Deep Reinforcement Learning DQN for Multi-Agent Environment](https://medium.com/yellowme/deep-reinforcement-learning-dqn-for-multi-agent-environment-5f4fae1a9ff5)
226
+ 2. [Warehouse Robot Path Planning](https://github.com/LyapunovJingci/Warehouse_Robot_Path_Planning) for path planning and optimization with obstacle avoidance.
227
+ 3. [Reinforcement Learning (DQN) Tutorial | PyTorch Tutorials](https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html)
228
+ 4. [Deep Q Learning (DQN) using PyTorch](https://medium.com/@vignesh.g1609/deep-q-learning-dqn-using-pytorch-a31f02a910ac)
229
+
230
+ ---
Report.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a28e62aa4cfc411d65a8b745759d810d14b0674110e044bfbd672f0f991f26ea
3
+ size 3280745
images/all_algos.png ADDED

Git LFS Details

  • SHA256: 53e0b0a16f06d882fb595c8f0c2b39e3b774ba814e0a6cc058642491fcff0bc0
  • Pointer size: 131 Bytes
  • Size of remote file: 275 kB
images/all_algos_testing.png ADDED

Git LFS Details

  • SHA256: ba14f095759a0cb693ec4d5a9a6c59b0b19d1a8d6671a80591d0d925cf7fc9b0
  • Pointer size: 131 Bytes
  • Size of remote file: 188 kB
images/desk.png ADDED
images/enhanced-environment.png ADDED
images/human.png ADDED
images/room.png ADDED
images/rover_desk_collision.png ADDED
images/rover_dest.png ADDED
images/rover_human_collision.png ADDED
images/rover_moving.png ADDED
images/rover_room_collision - Copy.png ADDED
images/rover_rover_collision.png ADDED
images/rover_start.png ADDED
images/target.png ADDED
images/trained_agents.gif ADDED
rx_rover_A2C.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
rx_rover_DDQN.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
rx_rover_DQN.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
rx_rover_Double_Q_Learning.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
rx_rover_Enhanced_Environment.ipynb ADDED
@@ -0,0 +1,278 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# Environment - Enhanced"
8
+ ]
9
+ },
10
+ {
11
+ "cell_type": "code",
12
+ "execution_count": 2,
13
+ "metadata": {},
14
+ "outputs": [],
15
+ "source": [
16
+ "import gym\n",
17
+ "from gym import spaces\n",
18
+ "import numpy as np\n",
19
+ "import matplotlib.pyplot as plt\n",
20
+ "\n",
21
+ "import os\n",
22
+ "import random\n",
23
+ "import imageio\n",
24
+ "from tqdm import tqdm \n",
25
+ "from itertools import count\n",
26
+ "from collections import namedtuple, deque\n",
27
+ "\n",
28
+ "import torch\n",
29
+ "import torch.nn as nn\n",
30
+ "import torch.optim as optim\n",
31
+ "import random\n",
32
+ "import matplotlib.image as mpimg"
33
+ ]
34
+ },
35
+ {
36
+ "cell_type": "code",
37
+ "execution_count": 3,
38
+ "metadata": {},
39
+ "outputs": [],
40
+ "source": [
41
+ "class RoverGridEnv(gym.Env):\n",
42
+ " metadata={'render.modes': ['human']} \n",
43
+ " def __init__(self, max_ts=20): \n",
44
+ " super(RoverGridEnv,self).__init__()\n",
45
+ " self.max_ts=max_ts # maximum timesteps per episode (default 20)\n",
46
+ " self.grid_size=(15,15) \n",
47
+ " self.action_space=spaces.Discrete(5) \n",
48
+ " self.observation_space=spaces.MultiDiscrete([15,15,15,15,15,15])\n",
49
+ " self.rover_positions=np.array([[6,4],[10,4]])\n",
50
+ " self.operation_desks=np.array([[6,3],[10,3]])\n",
51
+ " self.rooms=np.array([[4,7],[4,10],[4,13],[8,7],[8,10],[8,13],[12,7],[12,10],[12,13]])\n",
52
+ " self.human_position=np.array([8,9])\n",
53
+ " self.targets=np.array([[5,10],[9,13]])\n",
54
+ " self.actions=[(0,-1),(0,1),(-1,0),(1,0),(0,0)] # Down,Up,Left,Right,Wait\n",
55
+ " self.rover_done=[False,False] \n",
56
+ " self.reset()\n",
57
+ " \n",
58
+ " def seed(self,seed=None):\n",
59
+ " np.random.seed(seed)\n",
60
+ " random.seed(seed)\n",
61
+ " \n",
62
+ " def reset(self):\n",
63
+ " self.current_step=0\n",
64
+ " self.rover_positions=np.array([[6,4],[10,4]])\n",
65
+ " self.rover_done=[False,False]\n",
66
+ " self.human_position=np.array([7,8])\n",
67
+ " self.current_step=0\n",
68
+ " return self._get_obs()\n",
69
+ " \n",
70
+ " def _get_obs(self):\n",
71
+ " return np.concatenate((self.rover_positions.flatten(),self.human_position))\n",
72
+ " \n",
73
+ " def step(self,actions):\n",
74
+ " rewards=np.zeros(2)\n",
75
+ " done=[False,False]\n",
76
+ " info={'message': ''} \n",
77
+ " for i,action in enumerate(actions):\n",
78
+ " if self.rover_done[i]:\n",
79
+ " done[i]=True \n",
80
+ " continue\n",
81
+ " prev_distance=np.linalg.norm(self.targets[i]-self.rover_positions[i])\n",
82
+ " if self._is_human_adjacent(self.rover_positions[i]):\n",
83
+ " rewards[i] -= 5\n",
84
+ " else:\n",
85
+ " delta=np.array(self.actions[action])\n",
86
+ " new_position=self.rover_positions[i]+delta\n",
87
+ " if self._out_of_bounds(new_position):\n",
88
+ " rewards[i] -= 15\n",
89
+ " continue\n",
90
+ " if self._collision(new_position,i):\n",
91
+ " rewards[i] -= 15\n",
92
+ " continue\n",
93
+ " self.rover_positions[i]=new_position\n",
94
+ " new_distance=np.linalg.norm(self.targets[i]-new_position)\n",
95
+ " if new_distance < prev_distance:\n",
96
+ " rewards[i]+=30 \n",
97
+ " else:\n",
98
+ " rewards[i] -= 20 \n",
99
+ " if np.array_equal(new_position,self.targets[i]):\n",
100
+ " rewards[i]+=100\n",
101
+ " self.rover_done[i]=True \n",
102
+ " done[i]=True\n",
103
+ "\n",
104
+ " # move human randomly\n",
105
+ " self._move_human()\n",
106
+ " self.current_step+=1\n",
107
+ " all_done=all(done) or self.current_step >= self.max_ts\n",
108
+ " if all_done and not all(done): # if the maximum number of steps is reached but not all targets were reached\n",
109
+ " info['message']='Maximum number of timestamps reached'\n",
110
+ " return self._get_obs(),rewards,all_done,info\n",
111
+ "\n",
112
+ " def _is_human_adjacent(self,position):\n",
113
+ " for delta in [(1,1),(1,-1),(-1,1),(-1,-1)]: # only the four diagonal neighbours are checked\n",
114
+ " adjacent_position=position+np.array(delta)\n",
115
+ " if np.array_equal(adjacent_position,self.human_position):\n",
116
+ " return True\n",
117
+ " return False\n",
118
+ "\n",
119
+ " def _out_of_bounds(self,position):\n",
120
+ " return not (0 <= position[0] < self.grid_size[0] and 0 <= position[1] < self.grid_size[1])\n",
121
+ " \n",
122
+ " def _collision(self,new_position,rover_index):\n",
123
+ " if any(np.array_equal(new_position,pos) for pos in np.delete(self.rover_positions,rover_index,axis=0)):\n",
124
+ " return True # Collision with the other rover\n",
125
+ " if any(np.array_equal(new_position,pos) for pos in self.rooms):\n",
126
+ " return True # Collision with a room\n",
127
+ " if any(np.array_equal(new_position,pos) for pos in self.operation_desks):\n",
128
+ " return True # Collision with an operation desk\n",
129
+ " if np.array_equal(new_position,self.human_position):\n",
130
+ " return True # Collision with the human\n",
131
+ " return False\n",
132
+ " \n",
133
+ " def _move_human(self):\n",
134
+ " valid_moves=[move for move in self.actions if not self._out_of_bounds(self.human_position+np.array(move))]\n",
135
+ " self.human_position+=np.array(valid_moves[np.random.choice(len(valid_moves))])\n",
136
+ " \n",
137
+ " # def render(self,mode='human',save_path=None):\n",
138
+ " # fig,ax=plt.subplots(figsize=(7,7))\n",
139
+ " # ax.set_xlim(0,self.grid_size[0])\n",
140
+ " # ax.set_ylim(0,self.grid_size[1])\n",
141
+ " # ax.set_xticks(np.arange(0,15,1))\n",
142
+ " # ax.set_yticks(np.arange(0,15,1))\n",
143
+ " # ax.grid(which='both')\n",
144
+ "\n",
145
+ " # # draw elements\n",
146
+ " # for pos in self.rover_positions:\n",
147
+ " # ax.add_patch(Rectangle((pos[0]-0.5,pos[1]-0.5),1,1,color='blue'))\n",
148
+ " # for pos in self.operation_desks:\n",
149
+ " # ax.add_patch(Rectangle((pos[0]-0.5,pos[1]-0.5),1,1,color='darkgreen'))\n",
150
+ " # for pos in self.rooms:\n",
151
+ " # ax.add_patch(Rectangle((pos[0]-0.5,pos[1]-0.5),1,1,color='black'))\n",
152
+ " # ax.add_patch(Rectangle((self.human_position[0]-0.5,self.human_position[1]-0.5),1,1,color='purple'))\n",
153
+ " # for pos in self.targets:\n",
154
+ " # ax.add_patch(Rectangle((pos[0]-0.5,pos[1]-0.5),1,1,color='yellow',alpha=0.5))\n",
155
+ "\n",
156
+ " # if save_path is not None:\n",
157
+ " # plt.savefig(save_path)\n",
158
+ " # plt.close()\n",
159
+ " \n",
160
+ " # def close(self):\n",
161
+ " # plt.close()\n",
162
+ "\n",
163
+ " def render(self, mode='human', save_path=None):\n",
164
+ " fig, ax=plt.subplots(figsize=(7,7))\n",
165
+ " ax.set_xlim(0, self.grid_size[0])\n",
166
+ " ax.set_ylim(0, self.grid_size[1])\n",
167
+ "\n",
168
+ " rover_start_img_path='images/rover_moving.png' \n",
169
+ " rover_moving_img_path='images/rover_moving.png' \n",
170
+ " rover_dest_img_path='images/rover_dest.png' \n",
171
+ " rover_human_collision_path='images/rover_human_collision.png' \n",
172
+ " rover_room_collision_path='images/rover_room_collision.png' \n",
173
+ " rover_desk_collision_path='images/rover_desk_collision.png' \n",
174
+ " rover_rover_collision_path='images/rover_rover_collision.png' \n",
175
+ " desk_img_path='images/desk.png'\n",
176
+ " room_img_path='images/room.png' \n",
177
+ " human_img_path='images/human.png' \n",
178
+ " target_img_path='images/target.png' \n",
179
+ "\n",
180
+ "\n",
181
+ " for i, pos in enumerate(self.rover_positions):\n",
182
+ " \n",
183
+ " if self.rover_done[i]:\n",
184
+ " rover_img=mpimg.imread(rover_dest_img_path) \n",
185
+ " elif np.array_equal(pos, [[6,4],[10,4]][i]): # rover is still on its start cell (coordinates from reset())\n",
186
+ " rover_img=mpimg.imread(rover_start_img_path) \n",
187
+ " elif self._is_human_adjacent(pos):\n",
188
+ " rover_img=mpimg.imread(rover_human_collision_path) \n",
189
+ " elif self._collision(pos, i):\n",
190
+ " rover_img=mpimg.imread(rover_human_collision_path) # default: the human has stepped onto this cell\n",
191
+ " if any(np.array_equal(pos, room) for room in self.rooms):\n",
192
+ " rover_img=mpimg.imread(rover_room_collision_path)\n",
193
+ " elif any(np.array_equal(pos, desk) for desk in self.operation_desks):\n",
194
+ " rover_img=mpimg.imread(rover_desk_collision_path)\n",
195
+ " elif any(np.array_equal(pos, other) for j, other in enumerate(self.rover_positions) if j != i):\n",
196
+ " rover_img=mpimg.imread(rover_rover_collision_path)\n",
197
+ " else:\n",
198
+ " rover_img=mpimg.imread(rover_moving_img_path) \n",
199
+ " ax.imshow(rover_img, extent=(pos[0]-0.5, pos[0]+0.5, pos[1]-0.5, pos[1]+0.5))\n",
200
+ "\n",
201
+ " \n",
202
+ " for pos in self.operation_desks:\n",
203
+ " desk_img=mpimg.imread(desk_img_path)\n",
204
+ " ax.imshow(desk_img, extent=(pos[0]-0.5, pos[0]+0.5, pos[1]-0.5, pos[1]+0.5))\n",
205
+ "\n",
206
+ " for pos in self.rooms:\n",
207
+ " room_img=mpimg.imread(room_img_path)\n",
208
+ " ax.imshow(room_img, extent=(pos[0]-0.5, pos[0]+0.5, pos[1]-0.5, pos[1]+0.5))\n",
209
+ "\n",
210
+ " human_img=mpimg.imread(human_img_path)\n",
211
+ " ax.imshow(human_img, extent=(self.human_position[0]-0.5, self.human_position[0]+0.5, self.human_position[1]-0.5, self.human_position[1]+0.5))\n",
212
+ "\n",
213
+ " for pos in self.targets:\n",
214
+ " target_img=mpimg.imread(target_img_path)\n",
215
+ " ax.imshow(target_img, extent=(pos[0]-0.5, pos[0]+0.5, pos[1]-0.5, pos[1]+0.5))\n",
216
+ "\n",
217
+ " if save_path is not None:\n",
218
+ " plt.savefig(save_path)\n",
219
+ " plt.close()\n",
220
+ "\n",
221
+ " def close(self):\n",
222
+ " plt.close()\n",
223
+ "\n"
224
+ ]
225
+ },
226
+ {
227
+ "cell_type": "code",
228
+ "execution_count": 4,
229
+ "metadata": {},
230
+ "outputs": [
231
+ {
232
+ "name": "stdout",
233
+ "output_type": "stream",
234
+ "text": [
235
+ "Initial Setup\n"
236
+ ]
237
+ },
238
+ {
239
+ "data": {
240
+ "image/png": "",
241
+ "text/plain": [
242
+ "<Figure size 700x700 with 1 Axes>"
243
+ ]
244
+ },
245
+ "metadata": {},
246
+ "output_type": "display_data"
247
+ }
248
+ ],
249
+ "source": [
250
+ "env=RoverGridEnv()\n",
251
+ "print(\"Initial Setup\")\n",
252
+ "observation=env.reset()\n",
253
+ "env.render()"
254
+ ]
255
+ }
256
+ ],
257
+ "metadata": {
258
+ "kernelspec": {
259
+ "display_name": "Python 3",
260
+ "language": "python",
261
+ "name": "python3"
262
+ },
263
+ "language_info": {
264
+ "codemirror_mode": {
265
+ "name": "ipython",
266
+ "version": 3
267
+ },
268
+ "file_extension": ".py",
269
+ "mimetype": "text/x-python",
270
+ "name": "python",
271
+ "nbconvert_exporter": "python",
272
+ "pygments_lexer": "ipython3",
273
+ "version": "3.8.10"
274
+ }
275
+ },
276
+ "nbformat": 4,
277
+ "nbformat_minor": 2
278
+ }
rx_rover_PPO.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
rx_rover_Q_Learning.ipynb ADDED
The diff for this file is too large to render. See raw diff