The project trains an agent (a bird) to score by passing between pipes in a Flappy-Bird-gym environment built on OpenAI Gym. The agent learns to cross the pipes using the Q-Learning algorithm.



Environment Description
State Space
- The environment's state is based on the location of the bird's centre.
- The observation is the horizontal and vertical distance from the bird's centre to the centre of the next pipe.
- State normalization is disabled, i.e. observations are returned as integers.
- The bird's rotation after a flap can also appear in the observation space; here the rotation angle and direction are fixed to 0°. A sketch of reading such an observation follows this list.
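As a point of reference, here is a minimal sketch of creating the environment and reading one observation, assuming the flappy-bird-gym package and its `FlappyBird-v0` id; the raw-pixel interpretation of the distances assumes normalization is off, as described above:

```python
import flappy_bird_gym  # assumed package: https://github.com/Talendar/flappy-bird-gym

env = flappy_bird_gym.make("FlappyBird-v0")
obs = env.reset()  # obs = (horizontal distance, vertical distance) to the next pipe's centre

# With normalization disabled, the distances arrive as raw pixel offsets; rounding
# them to integers keeps the set of states small enough for a tabular Q-table.
state = (int(obs[0]), int(obs[1]))
print(state)
```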
Action Space
- There are two discrete actions in this environment (see the snippet below):
- 0 = Do Nothing
- 1 = Flap (the flap direction is fixed to 0°, as noted above)
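For illustration, the action space can be inspected and sampled directly (a sketch, assuming the same flappy-bird-gym environment as above):

```python
import flappy_bird_gym

env = flappy_bird_gym.make("FlappyBird-v0")
env.reset()
print(env.action_space)             # Discrete(2): 0 = do nothing, 1 = flap
action = env.action_space.sample()  # uniformly random action, handy for exploration
obs, reward, done, info = env.step(action)
```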
Reward
- Reward is +1 for every step the agent (bird) survives.
- Reward is +5 for crossing each pipe.
- Reward is -10 if the bird crashes. (A sketch of a wrapper implementing this scheme follows the list.)
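The base environment's reward may differ from this scheme, so below is a hedged sketch of a gym.Wrapper that shapes rewards this way. It assumes the underlying env reports the number of pipes passed as `info["score"]` and that every termination is a crash; both are assumptions, not guarantees of the library:

```python
import gym
import flappy_bird_gym

class ShapedReward(gym.Wrapper):
    """Reward shaping: +1 per surviving step, +5 per pipe crossed, -10 on crash."""

    def reset(self, **kwargs):
        self._score = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        reward = 1.0                             # +1 for surviving the step
        if info.get("score", 0) > self._score:   # assumes info["score"] counts pipes passed
            reward += 5.0
            self._score = info["score"]
        if done:
            reward -= 10.0                       # treats every termination as a crash
        return obs, reward, done, info

env = ShapedReward(flappy_bird_gym.make("FlappyBird-v0"))
```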
Termination
- The episode terminates if the bird hits a pipe or collides with the ground.
- The episode also terminates if the maximum number of steps is reached (one way to enforce this is shown below).
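A step cap is not part of the crash logic itself; one hedged way to enforce it is Gym's TimeLimit wrapper (the limit of 10,000 steps here is illustrative, not the project's actual value):

```python
import flappy_bird_gym
from gym.wrappers import TimeLimit

# Cap episodes at 10_000 steps; done is then also set when the cap is hit.
env = TimeLimit(flappy_bird_gym.make("FlappyBird-v0"), max_episode_steps=10_000)
```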
Algorithm
- The Flappy Bird environment is solved with the Q-Learning algorithm.
- Q-learning is a model-free reinforcement learning algorithm that learns the value of taking an action in a particular state.
- It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations.
- For any finite Markov decision process, Q-learning finds an optimal policy, given sufficient exploration and a suitable learning rate. A minimal training loop is sketched below.
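A minimal tabular Q-learning loop for this setup might look as follows. This is a sketch, not the project's exact code: the hyperparameters (alpha, gamma, epsilon, episode count) and the `discretize` helper are illustrative assumptions. The comment inside the loop states the standard Q-learning update rule:

```python
import random
from collections import defaultdict

import flappy_bird_gym

env = flappy_bird_gym.make("FlappyBird-v0")
q = defaultdict(lambda: [0.0, 0.0])      # Q[state] -> [value of no-op, value of flap]
alpha, gamma, epsilon = 0.1, 0.95, 0.1   # illustrative hyperparameters

def discretize(obs):
    # Round the (horizontal, vertical) distances so states repeat across episodes.
    return (int(obs[0]), int(obs[1]))

for episode in range(10_000):
    state = discretize(env.reset())
    done = False
    while not done:
        # Epsilon-greedy exploration: mostly exploit, occasionally act at random.
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = q[state].index(max(q[state]))
        obs, reward, done, info = env.step(action)
        next_state = discretize(obs)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + (0.0 if done else gamma * max(q[next_state]))
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
```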

Result
