Reinforcement Learning

Breakthroughs in deep reinforcement learning, such as AlphaZero and Proximal Policy Optimisation (PPO), have made it possible for AI engines to find optimal long-term strategies in complex environments. We use this technology to solve real-world challenges in finance, engineering and science.

The Challenge

You need to build an engine to choose optimal actions within a complex environment such as a stock market, but you aren't sure how to move beyond a basic rules-based system.

How can you build an AI that learns the optimal strategy for your challenge through repeated play and reinforcement learning?

The Solution

Our in-house deep reinforcement learning specialists design and build AI systems that learn progressively more sophisticated strategies through self-play.

We use cutting-edge deep reinforcement learning techniques such as DeepMind's AlphaZero and OpenAI's PPO to build AIs that estimate the value of each available action from the current state of the environment and choose the actions that maximise expected future reward.
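
To make "maximise future reward" more concrete, here is a minimal sketch in plain Python of two building blocks: the discounted return that an agent tries to maximise, and the clipped surrogate objective that PPO uses to limit how far a policy moves in one update. The function names, the discount factor of 0.99 and the clip range of 0.2 are illustrative assumptions, and this is a teaching example rather than a production implementation.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Discounted sum of future rewards at each time step of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of sampled actions.

    new_log_probs / old_log_probs: log-probability of each sampled action
    under the updated policy and under the policy that collected the data.
    advantages: how much better each action was than expected.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any incentive to push the ratio
        # outside the [1 - clip_eps, 1 + clip_eps] range.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

In a full training loop the objective would be maximised by gradient ascent on the policy network's parameters; the advantage values would typically come from comparing returns like those above with a learned value estimate.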

We pit the AIs against each other, with each successive generation finding new and more elaborate tactics to triumph over previous versions.
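
The generational loop can be sketched on a toy game. The example below is a simplified stand-in, assuming rock-paper-scissors and an exact best-response step in place of neural-network training: each new generation is built to beat a frozen copy of the previous one. All names here are hypothetical and chosen for illustration.

```python
ACTIONS = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """Return 1 for a win, 0 for a draw and -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_payoff(action, opponent_policy):
    """Average payoff of one action against a probabilistic opponent."""
    return sum(prob * payoff(action, opp) for opp, prob in opponent_policy.items())


def best_response(opponent_policy):
    """Deterministic policy that plays the best action against the opponent."""
    best = max(ACTIONS, key=lambda a: expected_payoff(a, opponent_policy))
    return {a: (1.0 if a == best else 0.0) for a in ACTIONS}


def train_generations(initial_policy, n_generations):
    """Build successive generations, each trained against the previous one."""
    generations = [initial_policy]
    for _ in range(n_generations):
        generations.append(best_response(generations[-1]))
    return generations


def match_value(policy_a, policy_b):
    """Expected payoff of policy_a when it plays against policy_b."""
    return sum(prob * expected_payoff(a, policy_b) for a, prob in policy_a.items())
```

Starting from a policy that always plays rock, train_generations produces paper, then scissors, then rock again. Every generation beats its predecessor, but this toy game simply cycles, so the sketch shows the shape of the loop only and says nothing about how strategies grow more elaborate in richer environments.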

Deliverables

01

A trained AI agent, written in Python and TensorFlow, that outputs the optimal action for a given environment state, along with statistics that explain the relative value of the possible actions.
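
As an indication of what such output could look like, the sketch below takes a set of per-action value estimates and returns the preferred action together with simple comparison statistics. It is plain Python with no TensorFlow dependency; the explain_action function, the statistic names and the example action labels are assumptions made for illustration, and in practice the values would come from the trained model.

```python
import math


def explain_action(action_values, temperature=1.0):
    """Pick the highest-valued action and report how the others compare.

    action_values: mapping from action name to estimated value.
    temperature: softmax temperature used for the relative preference.
    """
    best = max(action_values, key=action_values.get)
    best_value = action_values[best]
    # Subtract the maximum before exponentiating for numerical stability.
    weights = {
        action: math.exp((value - best_value) / temperature)
        for action, value in action_values.items()
    }
    total = sum(weights.values())
    stats = {
        action: {
            "value": value,
            "gap_to_best": best_value - value,
            "preference": weights[action] / total,
        }
        for action, value in action_values.items()
    }
    return best, stats


# Example with hypothetical value estimates for three trading actions.
best_action, report = explain_action({"buy": 1.0, "hold": 0.5, "sell": -0.2})
```

Here best_action is "buy", and report lists for each action its estimated value, its shortfall against the best action, and a softmax preference that sums to one across all actions.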