For this blog post, we took the algorithm from part I and trained it on Kraken BTC/EUR exchange data. The following features were used for learning the Q function (the AI part, the magical black box that looks at the data and tells you which action will yield the best reward):
One-day price movement
Difference between SMA15 and SMA60
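To make the two features concrete, here is a minimal sketch of how they could be computed from a list of daily closing prices. It assumes that "one-day price movement" means the absolute change from the previous close and that SMA15 and SMA60 are simple moving averages over 15 and 60 days; the function names `sma` and `build_features` are hypothetical and not taken from part I.

```python
from typing import List, Optional, Tuple


def sma(prices: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until `window` prices are available."""
    out: List[Optional[float]] = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            # Drop the price that just fell out of the window.
            running -= prices[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


def build_features(
    closes: List[float], fast: int = 15, slow: int = 60
) -> List[Optional[Tuple[float, float]]]:
    """Per-day (one-day movement, fast SMA minus slow SMA) pairs.

    Assumption: movement is the absolute change from the previous close.
    Days without enough history for the slow SMA are returned as None.
    """
    fast_sma = sma(closes, fast)
    slow_sma = sma(closes, slow)
    features: List[Optional[Tuple[float, float]]] = []
    for t in range(len(closes)):
        if t == 0 or fast_sma[t] is None or slow_sma[t] is None:
            features.append(None)
            continue
        move = closes[t] - closes[t - 1]
        spread = fast_sma[t] - slow_sma[t]
        features.append((move, spread))
    return features
```

With 60-day averaging, the first 59 days of any dataset produce no usable features, which is worth keeping in mind when the datasets get shorter.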
We trained a model for up to 150 iterations on Nvidia Tesla K80 GPUs on three different datasets, each roughly 200 days shorter than the previous one. On this setup, training took about 6-7 hours per dataset.
Well, let's see what training these models looks like!
Notice the huge fluctuations in how "decisive" our AI trader aims to be when the iteration count is low. The AI is still learning and starts out acting too optimistically, only to find out that it's much more valuable to predict long-term trends than to play the "guess tomorrow's price" game. Hence the charts become almost all "hold" towards the end of learning. However, it still does not possess the decisiveness and steady hand of a human, and decides to take an odd trade every once in a while in order to keep searching for a new "alpha" and correct course if need be. It also probably realizes that holding bitcoin was a good idea in hindsight, and that the penalty for holding is too small to justify trying anything else.
Interestingly enough, the model seems to converge to a solution (most likely it will eventually converge to a buy-and-hold strategy), but this solution is simply not useful in practice. How could we make the robot take more risk, predicting peaks and selling at the highest price? For that goal, there's nothing like more data from different market conditions.
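One way to check whether a learned policy is actually more useful than buy and hold is to compare their returns on the same price series. The sketch below is an illustration only: it assumes an all-in/all-out trader with no fees, acting at each day's closing price, and the function names and the "buy"/"sell"/"hold" action labels are hypothetical rather than taken from our training code.

```python
from typing import List


def buy_and_hold_return(closes: List[float]) -> float:
    """Fractional return from buying at the first close and holding to the last."""
    if len(closes) < 2:
        raise ValueError("need at least two prices")
    return closes[-1] / closes[0] - 1.0


def strategy_return(closes: List[float], actions: List[str]) -> float:
    """Fractional return of a long/flat strategy starting with 1 unit of cash.

    Assumptions: actions[t] is "buy", "sell" or "hold" and is executed at
    closes[t]; the whole balance is moved on each trade; fees are ignored.
    """
    cash, coins = 1.0, 0.0
    for price, action in zip(closes, actions):
        if action == "buy" and cash > 0:
            coins, cash = cash / price, 0.0
        elif action == "sell" and coins > 0:
            cash, coins = coins * price, 0.0
        # "hold" (or a redundant buy/sell) leaves the position unchanged.
    # Value any remaining coins at the final close.
    return cash + coins * closes[-1] - 1.0


closes = [100.0, 110.0, 90.0, 120.0]
actions = ["buy", "sell", "buy", "hold"]
print(buy_and_hold_return(closes), strategy_return(closes, actions))
```

A policy that only outputs "hold" after an initial "buy" would match the buy-and-hold number exactly, which is the situation the charts above appear to be drifting towards.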
Do you have an idea or a strategy that you think is worth trying out? Let us know in the comments and we'll investigate the best one in part III!