The project trains an agent (a bird) to score by passing between pipes in a Flappy-Bird-gym environment built on OpenAI Gym. The agent learns to cross the pipes using the Q-Learning algorithm.



Environment Description
State Space
- The environment's state is based on the location of the bird's centre.
- The observation is the horizontal and vertical distance from the bird's centre to the centre of the next pipe.
- State normalization is disabled, i.e. observations are returned as integers.
- The bird's rotation after a flap can also appear in the observation space; here the rotation angle and direction are fixed to 0°. A sketch of reading such an observation follows this list.
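As a point of reference, here is a minimal sketch of creating the environment and reading one observation, assuming the flappy-bird-gym package and its `FlappyBird-v0` id; the raw-pixel interpretation of the distances assumes normalization is off, as described above:

```python
import flappy_bird_gym  # assumed package: https://github.com/Talendar/flappy-bird-gym

env = flappy_bird_gym.make("FlappyBird-v0")
obs = env.reset()  # obs = (horizontal distance, vertical distance) to the next pipe's centre

# With normalization disabled, the distances arrive as raw pixel offsets; rounding
# them to integers keeps the set of states small enough for a tabular Q-table.
state = (int(obs[0]), int(obs[1]))
print(state)
```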
Action Space
- There are two discrete actions in this environment (see the snippet below):
- 0 = Do Nothing
- 1 = Flap (the flap direction is fixed to 0°, as noted above)
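For illustration, the action space can be inspected and sampled directly (a sketch, assuming the same flappy-bird-gym environment as above):

```python
import flappy_bird_gym

env = flappy_bird_gym.make("FlappyBird-v0")
env.reset()
print(env.action_space)             # Discrete(2): 0 = do nothing, 1 = flap
action = env.action_space.sample()  # uniformly random action, handy for exploration
obs, reward, done, info = env.step(action)
```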
Reward
- Reward is +1 for every step the agent (bird) survives.
- Reward is +5 for crossing each pipe.
- Reward is -10 if the bird crashes. (A sketch of a wrapper implementing this scheme follows the list.)
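The base environment's reward may differ from this scheme, so below is a hedged sketch of a gym.Wrapper that shapes rewards this way. It assumes the underlying env reports the number of pipes passed as `info["score"]` and that every termination is a crash; both are assumptions, not guarantees of the library:

```python
import gym
import flappy_bird_gym

class ShapedReward(gym.Wrapper):
    """Reward shaping: +1 per surviving step, +5 per pipe crossed, -10 on crash."""

    def reset(self, **kwargs):
        self._score = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        reward = 1.0                             # +1 for surviving the step
        if info.get("score", 0) > self._score:   # assumes info["score"] counts pipes passed
            reward += 5.0
            self._score = info["score"]
        if done:
            reward -= 10.0                       # treats every termination as a crash
        return obs, reward, done, info

env = ShapedReward(flappy_bird_gym.make("FlappyBird-v0"))
```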
Termination
- The episode terminates if the bird hits a pipe or collides with the ground.
- The episode also terminates if the maximum number of steps is reached (one way to enforce this is shown below).
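A step cap is not part of the crash logic itself; one hedged way to enforce it is Gym's TimeLimit wrapper (the limit of 10,000 steps here is illustrative, not the project's actual value):

```python
import flappy_bird_gym
from gym.wrappers import TimeLimit

# Cap episodes at 10_000 steps; done is then also set when the cap is hit.
env = TimeLimit(flappy_bird_gym.make("FlappyBird-v0"), max_episode_steps=10_000)
```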
Algorithm
- The Flappy Bird environment is solved with the Q-Learning algorithm.
- Q-learning is a model-free reinforcement learning algorithm that learns the value of taking an action in a particular state.
- It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations.
- For any finite Markov decision process, Q-learning finds an optimal policy, given sufficient exploration and a suitable learning rate. A minimal training loop is sketched below.
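A minimal tabular Q-learning loop for this setup might look as follows. This is a sketch, not the project's exact code: the hyperparameters (alpha, gamma, epsilon, episode count) and the `discretize` helper are illustrative assumptions. The comment inside the loop states the standard Q-learning update rule:

```python
import random
from collections import defaultdict

import flappy_bird_gym

env = flappy_bird_gym.make("FlappyBird-v0")
q = defaultdict(lambda: [0.0, 0.0])      # Q[state] -> [value of no-op, value of flap]
alpha, gamma, epsilon = 0.1, 0.95, 0.1   # illustrative hyperparameters

def discretize(obs):
    # Round the (horizontal, vertical) distances so states repeat across episodes.
    return (int(obs[0]), int(obs[1]))

for episode in range(10_000):
    state = discretize(env.reset())
    done = False
    while not done:
        # Epsilon-greedy exploration: mostly exploit, occasionally act at random.
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = q[state].index(max(q[state]))
        obs, reward, done, info = env.step(action)
        next_state = discretize(obs)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + (0.0 if done else gamma * max(q[next_state]))
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
```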

Result
