
COMP704 – Beginning development

Today I began work on my AI: I attempted to implement the Snake add-on in my project and started setting up the Q-learning function.

Development

I’ve begun setting up the structure of the code so that I have a foundation to build on when adding functionality in future iterations. Parts like the epsilon-greedy policy and the return functions that the Q-learning equation will use were actually quite simple to implement. However, writing a function that returns an array of possible actions for the greedy policy to use has proven more difficult, and I worry that it may duplicate the existing function that accesses the Q-table but returns only a single value.
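To make the idea concrete, below is a rough sketch of the epsilon-greedy selection I am aiming for; the Q-table layout and helper names are placeholders rather than my final code.

    import random

    EPSILON = 0.1  # chance of exploring instead of exploiting

    def action_values(q_table, state, num_actions):
        # Returns the current Q-value of every action in this state,
        # defaulting to 0.0 for (state, action) pairs not yet in the table.
        return [q_table.get((state, action), 0.0) for action in range(num_actions)]

    def choose_action(q_table, state, num_actions):
        # Epsilon-greedy policy: usually pick the best-known action,
        # but occasionally pick a random one to keep exploring.
        if random.random() < EPSILON:
            return random.randrange(num_actions)
        values = action_values(q_table, state, num_actions)
        return values.index(max(values))

Written like this, the 'array of possible actions' helper is really just a batched version of the single-value lookup, which is probably where my duplication worry comes from.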

Each episode is run via a for-loop, and within that loop is another for-loop representing each step that the AI takes in the environment. Two loops are used rather than one because the environment should only reset to its default state at the start of each episode, not after every step the AI makes.
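As a sketch of that structure (using Cart Pole purely as a stand-in environment, and the older four-value step API that my version of Gym returns):

    import gym

    env = gym.make("CartPole-v1")   # stand-in environment, just to show the loop shape
    NUM_EPISODES = 10
    MAX_STEPS = 200

    for episode in range(NUM_EPISODES):
        state = env.reset()                      # reset once per episode, not per step
        for step in range(MAX_STEPS):
            action = env.action_space.sample()   # stand-in for the epsilon-greedy choice
            next_state, reward, done, info = env.step(action)
            # ...Q-table update will go here...
            state = next_state
            if done:                             # the episode ended early
                break
    env.close()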

Problems

While trying to set up the Snake environment in my AI script, I came across a problem: the window that was meant to render the Snake game wasn’t appearing, but an empty graph was. After looking at OpenAI Gym’s ‘Cart Pole’ example and comparing that project’s render function with the Snake render function, I realised that the developers had decided to display Snake on a graph, which is why empty graphs were appearing while the game was being played. Unfortunately this problem seemed to be specific to PyCharm (my development platform), possibly due to the size of the generated grid or the data being plotted on the graph, so I’ve had to abandon the idea of using Snake as my environment and have decided to use one of OpenAI Gym’s Atari environments instead. The other reason for this change in direction is that I now have only 4 weeks left and don’t wish to spend any more time experimenting with environments that might not work.

When setting up the Atari environment I had to install the entirety of OpenAI Gym to ensure that the correct environments would appear. Additionally, I had to install ale-py, otherwise Gym wouldn’t be able to emulate the Atari systems. Screenshots of this are shown below in fig 1 and 2.

Fig. 1: Oates. 2022. installing ale-py with pip. [picture]
Fig. 2: Oates. 2022. error showing that all aspects of OpenAI Gym are required. [picture]

Despite the errors, this solution seems to be working a lot better and shows more promise: OpenAI Gym is designed to be a platform for testing and training AI in retro games, making it very suitable for my project. In addition, since the library is built for this purpose, setting up environments will be much easier (Rana 2018).

What I’ve learned

From this experience I have decided that I need to be more careful when picking add-ons. This is especially the case if the description doesn’t mention anything about how the add-on works, as I will then need to dive into the code to understand how and why it works. On the Q-learning side of the project, I’ve learned that implementing the Q-learning equation can be quite simple, as many of the parameters are simple values that increase or decrease every step. In addition, while I had hoped to use a CSV file to store the Q-table, my research suggests that a dictionary can store the data just as well.
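As a sketch of that idea, the table can simply be a dictionary keyed on (state, action) pairs, with the standard Q-learning update applied after each step; the hyperparameter values here are placeholders rather than tuned choices.

    ALPHA = 0.1    # learning rate
    GAMMA = 0.99   # discount factor

    q_table = {}   # maps (state, action) -> estimated quality value

    def update_q(q_table, state, action, reward, next_state, num_actions):
        # Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
        old_value = q_table.get((state, action), 0.0)
        next_best = max(q_table.get((next_state, a), 0.0) for a in range(num_actions))
        q_table[(state, action)] = old_value + ALPHA * (reward + GAMMA * next_best - old_value)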

Further Enquiries

While it is a shame that I cannot use Snake as my AI’s environment, if I had the time I could try to get the Snake environment working with PyCharm. This could be a great opportunity in the future to see if I can get my AI to play Snake. In addition, if the developer allowed it, I could create a separate branch on the repo the add-on came from, so that future developers can use the environment in PyCharm as well. Of course, this would take time away from my work on the AI.

Reflection and what I aim to do next

Looking at the project, I can continue working on the Q-learning aspect of the AI as normal, since no problems have come up there. As for the environment itself, while it is annoying that the Snake (1976) environment is not working, especially since the README file did not mention anything about this, I will simply use another environment. Looking through OpenAI Gym’s environments, I will use Asterix (1983) as the AI’s new environment and propose what the AI’s new states will be, as well as other aspects of the environment. A picture of Asterix (1983) is shown below in fig 3.

Fig. 3: Unknown maker. ca. 2022. No title [photo]

Although the game is more complex than Snake (1976), I believe that the agent’s position will remain the AI’s state, but further research is needed to see if that’s possible. In addition, a different type of Q-learning may be required to handle the larger amounts of data that the environment may output. For example, ‘Deep Q-learning’ may be needed, which combines Q-learning with neural networks: instead of storing a quality value for every state-action pair in a table, a network is trained to estimate those values (Yannakakis and Togelius 2018). However, due to the time this would require, it might be too large in scope; as mentioned earlier, I only have 4 weeks to develop, polish and train the AI.
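Purely to illustrate the difference, a Q-network replaces the table with a small neural network that maps a state to one quality value per action. The sketch below uses PyTorch, which is not part of my project, and placeholder layer sizes:

    import torch.nn as nn

    NUM_STATE_FEATURES = 4   # e.g. the agent's x/y position plus a few extra features
    NUM_ACTIONS = 9          # one output per available action (9 for Asterix, as found below)

    # Instead of one table entry per (state, action) pair, the network estimates
    # all of a state's action values in a single forward pass.
    q_network = nn.Sequential(
        nn.Linear(NUM_STATE_FEATURES, 64),
        nn.ReLU(),
        nn.Linear(64, NUM_ACTIONS),
    )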

As for its actions, by using env.env.get_action_meanings(), I found that there are 9 actions that will reside in the table: noop, up, down, left, right, up-left, up-right, down-left and down-right. To elaborate, ‘env’ is the variable that contains the environment the AI uses. Pictures of the code I used are shown below in fig 4, 5 and 6. This solution was found while looking through an article on the basics of reinforcement learning in OpenAI Gym (Rana 2018).

Fig. 4: Oates. 2022. picture of stored environment. [picture]
Fig. 5: Oates. 2022. code used to print environment’s actions. [picture]
Fig. 6: Oates. 2022. results from using get_action_meanings. [picture]
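For reference, the check itself is only a couple of lines. This sketch assumes the ‘Asterix-v0’ environment ID (which may vary between Gym releases) and uses env.unwrapped, which reaches the same underlying Atari environment as env.env but is less dependent on the wrapper layout:

    import gym

    env = gym.make("Asterix-v0")
    print(env.unwrapped.get_action_meanings())
    # e.g. ['NOOP', 'UP', 'RIGHT', 'LEFT', 'DOWN', 'UPRIGHT', 'UPLEFT', 'DOWNRIGHT', 'DOWNLEFT']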

Bibliography

Asterix. 1983. Atari, Inc., Atari, Inc.

YANNAKAKIS, Georgios N. and Julian TOGELIUS. 2018. Artificial Intelligence and Games. Springer.

RANA, Ashish. 2018. ‘Introduction: Reinforcement Learning with OpenAI Gym’. Available at: https://towardsdatascience.com/reinforcement-learning-with-openai-d445c2c687d2. [Accessed 01/02/2022].

Snake. 1976. Gremlin Interactive, Gremlin Interactive.

Figure List

Figure 1: Max Oates. 2022. installing ale-py with pip.

Figure 2: Max Oates. 2022. error showing that all aspects of OpenAI Gym are required.

Figure 3: Unknown maker. ca. 2022. No title [photo]. Retroplace [online]. Available at: https://www.retroplace.com/en/games/85731–asterix

Figure 4: Max Oates. 2022. picture of stored environment.

Figure 5: Max Oates. 2022. code used to print environment’s actions.

Figure 6: Max Oates. 2022. results from using get_action_meanings.
