Flappy Bird using RL

This project trains an agent (a bird) to score by crossing pipes in the Flappy-Bird-gym environment, which is built for OpenAI Gym. The agent learns to cross pipes using the Q-learning algorithm.

Environment Description

State Space

  • The state is the location of the bird’s centre in the environment.
  • The observation is the horizontal and vertical distance from the centre of the next pipe to the centre of the bird.
  • State normalization is set to False, i.e. the states are returned as integers.
  • The observation space also includes the direction the agent is heading after a flap (here the angle of rotation and the direction are fixed to 0°).
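The observation described above can be sketched in a few lines. This is only an illustration: the function name `observe`, the argument names, and the sign convention (pipe minus bird) are assumptions, not taken from the environment's source.

```python
from typing import Tuple

Point = Tuple[int, int]  # (x, y) in integer pixel coordinates (assumed)


def observe(bird_center: Point, pipe_center: Point) -> Tuple[int, int]:
    """Return the (horizontal, vertical) distance from the next pipe's
    centre to the bird's centre.

    Hypothetical helper: the pipe-minus-bird sign convention is an assumption.
    """
    bird_x, bird_y = bird_center
    pipe_x, pipe_y = pipe_center
    return pipe_x - bird_x, pipe_y - bird_y
```

Because normalization is off, the returned pair stays integer valued, which makes it usable directly as a key in a tabular Q-function.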

Action Space

  • There are two actions in this environment:
    • 0 = Do nothing
    • 1 = Flap (in this case the direction is fixed to 0°)
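The two actions can be given readable names in code. A minimal sketch, where the `Action` enum is a hypothetical helper and not part of the environment:

```python
from enum import IntEnum


class Action(IntEnum):
    """Hypothetical names for the two discrete actions listed above."""
    DO_NOTHING = 0
    FLAP = 1
```

Since `IntEnum` members compare equal to plain integers, `Action.FLAP` can be used wherever the integer `1` is expected.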


Reward

  • The reward is +1 for every step the agent (bird) takes.
  • The reward is +5 for crossing each pipe.
  • The reward is -10 if the bird crashes.
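The reward scheme above could be written as a single function. This is a sketch under two assumptions that the list does not settle: the +5 pipe bonus is added on top of the +1 step reward, and a crash replaces the step reward instead of adding to it. The name `step_reward` is hypothetical.

```python
def step_reward(passed_pipe: bool, crashed: bool) -> int:
    """Reward for one environment step under the scheme listed above.

    Assumptions: a crash yields -10 on its own, and the pipe bonus
    stacks with the per-step reward.
    """
    if crashed:
        return -10
    reward = 1  # +1 for every step the bird takes
    if passed_pipe:
        reward += 5  # +5 for crossing a pipe
    return reward
```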


Episode Termination

  • The bird hits a pipe or collides with the ground.
  • The maximum number of steps is reached.
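These conditions can be combined into one check. A minimal sketch: `is_done` and its parameters are hypothetical names, and the step limit is passed in as `max_steps` because its value is not given here.

```python
def is_done(hit_pipe: bool, hit_ground: bool, steps: int, max_steps: int) -> bool:
    """Return True when any of the listed conditions holds.

    `max_steps` is a caller-supplied limit (its actual value is not
    stated in this document).
    """
    return hit_pipe or hit_ground or steps >= max_steps
```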


Algorithm

  • The Flappy Bird environment is solved with the Q-learning algorithm.
  • Q-learning is a model-free reinforcement learning algorithm that learns the value of an action in a particular state.
  • It does not require a model of the environment (hence “model-free”), and it can handle problems with stochastic transitions and rewards without requiring adaptations.
  • For any finite Markov decision process, Q-learning finds an optimal policy (one that maximizes the expected total reward), given infinite exploration time and a partly random policy.
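The tabular Q-learning update can be sketched as follows. This is an illustration and not the project's actual training code: the epsilon-greedy exploration, the learning rate `alpha`, the discount `gamma`, and all function names are assumptions. States are taken to be the integer observation pairs described earlier.

```python
import random
from collections import defaultdict
from typing import DefaultDict, List, Tuple

State = Tuple[int, int]  # (horizontal, vertical) distance to the next pipe
N_ACTIONS = 2            # 0 = do nothing, 1 = flap

QTable = DefaultDict[State, List[float]]


def make_q_table() -> QTable:
    """Q-table whose unseen states start with zero value for every action."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def choose_action(q: QTable, state: State, epsilon: float) -> int:
    """Epsilon-greedy choice (assumed exploration strategy)."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)  # explore
    values = q[state]
    return max(range(N_ACTIONS), key=lambda a: values[a])  # exploit


def q_update(q: QTable, state: State, action: int, reward: float,
             next_state: State, done: bool,
             alpha: float = 0.1, gamma: float = 0.99) -> None:
    """One step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    On a terminal transition the bootstrap term is dropped.
    """
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
```

A training loop would call `choose_action`, step the environment, and pass the resulting transition to `q_update` until the episode terminates.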


