Reinforcement Learning 1

Consider the actions on this page: https://davidkmarzagao.github.io/RL-bandits-exercise/

Use a random algorithm to select actions.

How do you use the system? There are 5 actions marked a_1 \dots a_5. Clicking on an action samples one of its values. You may need a random number generator to decide which action to select, and the page provides a button for exactly that. Using it ensures that you get the “random” numbers you are supposed to get.
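As a rough illustration of what selecting an action at random could look like, the sketch below maps a random number to one of the five actions with equal probability. It assumes the number is a value u in [0, 1), which may not match the format the page's button produces; the names ACTIONS and pick_action are hypothetical and not part of the exercise page.

```python
# Hypothetical labels for the five actions on the exercise page.
ACTIONS = ["a_1", "a_2", "a_3", "a_4", "a_5"]


def pick_action(u):
    """Map a random number u in [0, 1) to one of the five actions.

    Assumption: u is uniform on [0, 1). Each action then covers an
    interval of width 1/5, so all actions are equally likely.
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("u must satisfy 0 <= u < 1")
    index = int(u * len(ACTIONS))  # 0, 1, 2, 3 or 4
    return ACTIONS[index]


# Example: feed in a number read off the page's generator.
print(pick_action(0.5))
```

The function takes the random number as input rather than generating its own, because the exercise relies on the sequence of numbers produced by the page.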

Made a mistake? Refresh the page to restart the sequence of random numbers and action values.

Your question is: what is the average of the first 8 rewards you obtain? Round this number to 1 decimal place; it is the password for the next page:

Next page: https://kohan.uk/rl-2
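The average asked for above is the sum of the first 8 rewards divided by 8, rounded to 1 decimal place. A minimal sketch of that arithmetic follows. The function name average_first_rewards is hypothetical, the reward values in the example are made up for illustration, and rounding halves upwards is an assumption, since the exercise does not say how ties should be rounded.

```python
from decimal import Decimal, ROUND_HALF_UP


def average_first_rewards(rewards, n=8):
    """Return the mean of the first n rewards, rounded to 1 decimal place.

    Assumption: ties are rounded half up (e.g. 0.25 becomes 0.3).
    Decimal is used so that the rounding is not affected by binary
    floating-point representation.
    """
    if len(rewards) < n:
        raise ValueError(f"need at least {n} rewards, got {len(rewards)}")
    total = sum((Decimal(str(r)) for r in rewards[:n]), Decimal(0))
    mean = total / Decimal(n)
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Made-up rewards, purely for illustration; use the values you sampled.
example_rewards = [1, 2, 3, 4, 5, 6, 7, 8]
print(average_first_rewards(example_rewards))
```

Only the first 8 entries are used, so rewards sampled after the eighth do not change the result.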