- 15th Jul 2024
- 18:09
- Adan Salman
In this assignment, you will teach RL agents to pick up packages on a grid-world. The environment you will be using is called the Four-Rooms domain, an environment commonly used to test the efficacy of RL algorithms. You must implement Q-learning as your RL algorithm, with the goal of teaching your agent to pick up the package(s) in three scenarios of increasing complexity. General information about this assignment and the scenarios can be found below.
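For reference, the standard tabular Q-learning update, applied after every step, is

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

where $s$ is the state before taking action $a$, $r$ is the reward received, $s'$ is the resulting state, $\alpha$ is the learning rate, and $\gamma$ is the discount factor.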
When implementing your solutions, take note of the following:
- The environment is a 2D grid-world of size 13 x 13, but rows 0 and 12 and columns 0 and 12 are boundaries (not traversable), so the traversable area is effectively 11 x 11. The environment has one start state, randomly assigned when the environment is created, and a terminal state, determined when the number of packages left to collect reaches 0.
- You have been given two scripts to start with: FourRooms.py and ExecutionSkeleton.py. You are not allowed to modify FourRooms.py, but you can (and will) copy and modify ExecutionSkeleton.py for each of the scenarios you tackle. Documentation for FourRooms.py can be found in Appendix A.
- Your agents will only have a partial representation of the environment, which consists of their location (x, y) and the number of packages left, k. They are not allowed to know the locations of the package(s), the hallways (cells that connect two rooms together), or the boundaries of the environment; they must figure those out themselves. One possible state encoding is sketched below.
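As a minimal sketch of one such encoding (state_index is a hypothetical helper; it assumes the 13 x 13 grid and the x + 13*y cell indexing used in the solution further down, and the simple scenario only needs the (x, y) part):

def state_index(x, y, k, width=13, height=13):
    # Flatten the partial state (x, y, k) into a single Q-table row:
    # one block of width*height rows per value of k.
    return (k * height + y) * width + x

# Example: agent at (3, 5) with 1 package left -> row (1*13 + 5)*13 + 3 = 237
idx = state_index(3, 5, 1)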
Scenario 1: Simple Package Collection
In this first scenario, your agent must collect 1 package located somewhere in the environment. Your implementation should be done in a file called Scenario1.py and should be invoked as follows: python Scenario1.py. For this scenario, you should specify scenario = 'simple' when creating your FourRooms object and call fourRoomsObj.newEpoch() whenever your agent starts a new training epoch. When learning is complete, you should call fourRoomsObj.showPath(-1) to show the final path your agent took to get from its start state to the terminal state. showPath will display an image on screen, so you will need to set the savefig optional parameter to save the image to disk if you are on nightmare.
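A minimal sketch of that lifecycle (numEpochs and the savefig filename are illustrative, not part of the brief):

from FourRooms import FourRooms

numEpochs = 1000                    # illustrative training budget
fourRoomsObj = FourRooms('simple')  # scenario = 'simple'
for epoch in range(numEpochs):
    fourRoomsObj.newEpoch()         # reset before each training epoch
    # ... one epoch of Q-learning goes here ...
fourRoomsObj.showPath(-1)           # displays the final path on screen
# On a headless machine (e.g. nightmare), save to disk instead:
# fourRoomsObj.showPath(-1, savefig='scenario1.png')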
ML - Reinforcement Learning - Get Assignment Solution
Please note that this is a sample assignment solved by our Python Programmers. These solutions are intended for research and reference purposes only. If going through the report and code helps you learn any concepts, our Python Tutors would be very happy.
- Option 1 - To download the complete solution along with Code, Report and screenshots - Please visit our Programming Assignment Sample Solution page
- Option 2 - Reach out to our Python Tutors to get online tutoring related to this assignment and get your doubts cleared
- Option 3 - You can check the partial solution for this assignment in this blog below
Free Project Solution - ML - Reinforcement Learning Assignment Solution
import os
import random

import numpy as np

from FourRooms import FourRooms

# CONSTANTS
EPOCHS = 10000    # number of training epochs
E_GREEDY = 0.5    # initial exploration rate (epsilon), reset each epoch
DEC_RATE = 0.6    # per-step decay factor applied to epsilon
DISCOUNT = 0.9    # discount factor (gamma)

# One row per cell of the 13 x 13 grid (169 states), one column per action.
# Float dtype matters here: integer arrays would silently truncate Q updates.
Q = np.zeros((169, 4))  # Q-value table
R = np.zeros((169, 4))  # reward table, learned from experience
def qTableUpdate(FR, Q, prevPos, action, visited):
    # Q-learning update with a visit-count based learning rate:
    # alpha = 1 / (1 + 0.3 * n(s, a)) shrinks as (s, a) is tried more often.
    index = prevPos[0] + prevPos[1]*13
    l_rate = 1/(1 + 0.3*visited[index][action])
    newIndex = FR.getPosition()[0] + FR.getPosition()[1]*13
    Q[index][action] += l_rate*(R[index][action]
                                + DISCOUNT*max(Q[newIndex])
                                - Q[index][action])
def rTableUpdate(FR, R, prevPos, action):
    # Learn rewards from experience: -1 if the move was blocked
    # (position unchanged), +100 if it collected the last package.
    index = prevPos[0] + prevPos[1]*13
    if prevPos == FR.getPosition():
        R[index][action] = -1
    if FR.getPackagesRemaining() == 0:
        R[index][action] = 100
def trainingFunction(FRobj, Q, R, EPOCHS):
    for k in range(EPOCHS):
        FRobj.newEpoch()              # reset the environment for a fresh epoch
        prevPos = FRobj.getPosition()
        visited = np.zeros((169, 4))  # per-epoch visit counts per state-action
        E = E_GREEDY                  # epsilon is reset at the start of every epoch
        os.system("clear")            # clear the terminal (Unix) before the progress line
        print("Training... number of epochs (", k, "/", EPOCHS, ")")
        while not FRobj.isTerminal():
            index = prevPos[0] + prevPos[1]*13
            if random.random() < E:
                # Explore: random action among those not yet known to be blocked
                choices = [i for i in range(4) if R[index][i] >= 0]
            else:
                # Exploit: random tie-break among the highest-valued actions
                choices = [i for i in range(4) if Q[index][i] == max(Q[index])]
            action = random.choice(choices)
            visited[index][action] += 1
            FRobj.takeAction(action)
            rTableUpdate(FRobj, R, prevPos, action)
            qTableUpdate(FRobj, Q, prevPos, action, visited)
            prevPos = FRobj.getPosition()
            E *= DEC_RATE             # decay epsilon after every step
def main():
    fourRoomsObj = FourRooms('simple', False)

    # Demo walk carried over from ExecutionSkeleton.py: it illustrates the
    # takeAction() API. It does not affect the learned path, since
    # trainingFunction() calls newEpoch() before the first update.
    stepsSequence = [FourRooms.LEFT, FourRooms.LEFT, FourRooms.LEFT,
                     FourRooms.UP, FourRooms.UP, FourRooms.UP,
                     FourRooms.RIGHT, FourRooms.RIGHT, FourRooms.RIGHT,
                     FourRooms.DOWN, FourRooms.DOWN, FourRooms.DOWN]
    typesActions = ['UP', 'DOWN', 'LEFT', 'RIGHT']
    adjType = ['EMPTY', 'RED', 'GREEN', 'BLUE']
    print('Agent starts at: {0}'.format(fourRoomsObj.getPosition()))
    for action in stepsSequence:
        gridType, newPos, packagesRemaining, isTerminal = fourRoomsObj.takeAction(action)
        print("Agent took {0} action and moved to {1} of type {2}"
              .format(typesActions[action], newPos, adjType[gridType]))
        if isTerminal:
            break

    trainingFunction(fourRoomsObj, Q, R, EPOCHS)
    fourRoomsObj.showPath(-1)  # add the savefig parameter to save the image when headless


if __name__ == "__main__":
    main()
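Two design choices in this solution are worth highlighting. First, instead of a fixed learning rate, each update uses alpha = 1/(1 + 0.3*n(s, a)), where n(s, a) counts how often that state-action pair has been tried in the current epoch, so updates become more conservative as estimates stabilise. Second, the reward table R is not given to the agent: it starts at zero, records -1 whenever an action fails to change the agent's position (a wall or boundary), and records +100 when the last package is collected, which is consistent with the brief's requirement that agents discover hallways, boundaries, and packages for themselves.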
Get the best ML - Reinforcement Learning Assignment help and tutoring services from our experts now!
About The Author - Sneha Mishra
Sneha Mishra specializes in reinforcement learning (RL) and AI, focusing on teaching RL agents to navigate and collect packages in the Four-Rooms domain. With expertise in Q-learning algorithms, Sneha implements solutions across scenarios of increasing complexity, ensuring agents learn optimal strategies. Dedicated to advancing AI capabilities in dynamic environments.