To preface this article, I want to say that I'm not biased. If you search online critically, you'll find countless other articles and experiences explaining how Reinforcement Learning simply does not work for real-world use cases. The only people saying otherwise are course creators and academics within the field.
And I WANTED reinforcement learning to work. When I first heard of it 5 years ago, I was promised that it would revolutionize the world. An algorithm that can optimize ANYTHING with just a clever reward seemed like it could be applied ubiquitously, from designing medicines to advanced robotics. When AlphaGo defeated Lee Sedol at Go, a famously complex game, in 2016, that was supposed to be the turning point where RL would start dominating.
Yet here we are, 8 years later, and none of this materialized. Reinforcement Learning has accomplished nothing in the real world. It dominates toy problems and video games, but that's it. The only notable advance of RL in the past 8 years is Reinforcement Learning with Human Feedback (RLHF), which is used to train Large Language Models like ChatGPT. And in my opinion, we won't be using it for very long. Other algorithms simply do it better.
What is Reinforcement Learning?
Reinforcement Learning is a subfield of Machine Learning. With traditional supervised learning, we have a bunch of input examples and labels. We train the model to predict the correct label for each input example. We do this with 8 supercomputers and millions of training examples, and we eventually get a model that can recognize images, generate text, and understand spoken language.
Reinforcement Learning, on the other hand, learns by a different approach. Typically with reinforcement learning, we don't have labeled examples. Instead, we craft a "reward function" that tells us whether or not the model is doing what we want it to do. A reward function essentially punishes the model when it's not doing what we want, and rewards it when it is.
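To make this concrete, here is a minimal sketch of what a reward function might look like for a stock-trading agent. All names here are illustrative, not from any real library; real reward functions are usually far more elaborate.

```python
# A toy reward function for a trading agent: the reward for a step is simply
# the change in portfolio value after taking an action. Real reward functions
# often also penalize risk, drawdown, or transaction costs.
def step_reward(value_before: float, value_after: float) -> float:
    # Positive when the action grew the portfolio (reward),
    # negative when it shrank it (punishment).
    return value_after - value_before

print(step_reward(100.0, 105.0))  # a profitable step earns a positive reward
print(step_reward(100.0, 98.0))   # a losing step is punished
```

The agent's job is then to choose actions that maximize the sum of these rewards over time.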
This formulation seems amazing. Getting millions of labeled examples is extremely time-consuming and impractical for most problems. Now, thanks to RL, all we need to do is craft a reward function, and we can generate solutions to complex problems. But the reality is, it doesn't work.
Interested in how AI applies to trading and investing? Check out my no-code automated investing platform NexusTrade! It's free and insanely powerful!
Mathematically Improve Your Trading Strategy: An In-Depth Guide
My (Terrible) Experience with Reinforcement Learning
I fell into the hype of reinforcement learning. It started with a course in Introduction to Artificial Intelligence at Cornell University. We talked briefly about RL and the types of problems it could solve, and I became very intrigued. I decided to take an online course from the University of Alberta to dive deeper into reinforcement learning. I received a certificate, which demonstrates my expertise and knowledge in the field.
After graduating from Cornell, I went to Carnegie Mellon to get my Masters in software engineering. I decided to take a notoriously difficult course, Intro to Deep Learning, because I was interested in the subject. It was here that I was able to apply reinforcement learning to a real-world problem.
In the course, we were given a final project. We had the freedom to implement any Deep Learning algorithm and write a paper on our experiences. I decided to apply my passion for finance with my interest in reinforcement learning and implement a Deep Reinforcement Learning algorithm for stock market prediction.
The project failed spectacularly. It was extremely hard to set up, and once I did, there was always something wrong. It was painful to debug problems because even when everything compiled and ran, you didn't know which part of the system wasn't working properly. It could be the actor network that learns the mapping of states to actions, the critic network that learns the "value" of these state-action pairs, the hyperparameters of the network, or just about anything else.
I'm not mad at RL because the project failed. If a group of graduate students could make a wildly profitable stock-trading bot in a semester, that would upend the stock market. No, I'm mad at RL because it sucks. I'll explain more in the next section.
For the source code for this project, check out the following repository. You can also read more technical details in the paper here. For more interesting insights on AI, subscribe to Aurora's Insights.
GitHub - austin-starks/Deep-RL-Stocks
Why does Reinforcement Learning suck?
Reinforcement Learning has a plethora of problems that make it unusable for real-world situations. To start, it is EXTREMELY complicated. While traditional reinforcement learning makes a little bit of sense, deep reinforcement learning makes absolutely none.
As a reminder, I went to an Ivy League school. Most of my friends and acquaintances would say I'm smart. And deep reinforcement learning makes me feel stupid. There's just so much terminology involved that unless you're getting your PhD in it, you can't possibly understand everything. There are "actor networks", "critic networks", "policies", "Q-values", "clipped surrogate objective functions", and other nonsensical terminology that requires a dictionary whenever you're trying to do anything practical.
Its complexity extends beyond difficult-to-understand terminology. Whenever you try to set up RL for any problem more complicated than CartPole, it doesn't work, and you have no idea why.
For example, when I did my project on using RL to predict the stock market, I tried several different architectures. I won't go into the technical details (check out this article if you want to hear that), but nothing I tried worked. In the literature, you can see that RL suffers from many problems, including being computationally expensive, having stability and convergence issues, and being sample inefficient, which is crazy considering it's using deep learning, something that is well-known to handle high-dimensional, large-scale problems. For my trading project specifically, the thing that affected the final results the most was the initialization seed for the neural network. That's pathetic.
Even a Failure is a Success — (Failing to) Create a Reinforcement Learning Stock Trading Agent
Why are transformers going to replace RL algorithms?
Transformers solve all of the problems with traditional RL algorithms. To start, the transformer is probably the easiest, most useful AI algorithm. After you understand the Attention Mechanism, you can start implementing transformers in Google Colab. And the best part is, it actually works.
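For a rough sense of that mechanism, here is a NumPy sketch of scaled dot-product attention, the core operation inside a transformer. This is a simplified single-head version for illustration, not a full implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Scores measure how strongly each query attends to each key.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # A softmax over the keys turns scores into attention weights
    # (subtracting the row max first for numerical stability).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # The output is an attention-weighted average of the values.
    return weights @ V
```

A full transformer stacks this operation with learned projections, multiple heads, and feed-forward layers, but the ten lines above are the part that does the "attending".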
We all know that transformers are useful for building models similar to ChatGPT. What most people don't realize is that they can also be used as a replacement for traditional deep RL algorithms.
One of my favorite papers to ever come out was the Decision Transformer. I loved this paper so much that I emailed its authors. The Decision Transformer is a new way to think about reinforcement learning. Instead of doing complicated optimizations using multiple neural networks, sensitive hyperparameters, and heuristics with no theoretical foundation, we instead use an architecture that's proven to work for many problems: the transformer.
The Architecture of the "Decision Transformer":
We basically want to reframe reinforcement learning as a sequence-modeling problem. We still have states, actions, and rewards like in traditional RL; we just formulate the problem differently. We take our states, actions, and rewards and lay them out in an auto-regressive sequence. This leads to a very natural and efficient framework: the transformer's ability to understand and predict sequences is leveraged to find good actions. By framing the problem this way, the Decision Transformer can read through the sequence of states, actions, and rewards and predict the best course of action, bypassing the complexities and instabilities often encountered in traditional reinforcement learning methods.
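Concretely, the token layout looks something like the sketch below. The names are made up for illustration; in the paper, the reward-like token is the "return-to-go" (the sum of future rewards the agent is asked to achieve), not the raw per-step reward.

```python
# Lay out a trajectory as one auto-regressive token sequence:
# (return-to-go, state, action) at each timestep, in time order.
# The transformer is then trained to predict each action token
# from everything that came before it.
def interleave_trajectory(returns_to_go, states, actions):
    sequence = []
    for g, s, a in zip(returns_to_go, states, actions):
        sequence.extend([("R", g), ("S", s), ("A", a)])
    return sequence

tokens = interleave_trajectory([3.0, 2.0, 1.0], ["s0", "s1", "s2"], ["a0", "a1", "a2"])
# tokens begins: ("R", 3.0), ("S", "s0"), ("A", "a0"), ("R", 2.0), ...
```

At inference time, you prompt the model with the return you want, and it generates the actions conditioned on achieving it.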
For transparency, this is an offline algorithm, which means it learns from previously collected trajectories rather than by interacting with the environment in real time. However, there is additional work being done to enable Decision Transformers to be used in an online manner. Even the offline version of the algorithm is far better than traditional reinforcement learning. The paper shows that this architecture is more robust, especially in situations with sparse or distracting rewards. Moreover, this architecture is extremely simple, requires only one network, and matches or surpasses state-of-the-art reinforcement learning baselines.
For more details on the Decision Transformer, either check out the original paper or this extremely helpful video by Yannic Kilcher.
Conclusion
Traditional reinforcement learning sucks. Unless the industry comes out with a new, stable, sample-efficient algorithm that doesn't require a PhD to understand, I will never change my mind. The Decision Transformer IS the new RL; it's just not popularized yet. I'm looking forward to the day when researchers pick it up and use it for a variety of tasks, including Reinforcement Learning with Human Feedback.
Thank you for reading!
🤝 Connect with me on LinkedIn
🐦 Follow me on Twitter
👨‍💻 Explore my projects on GitHub
📸 Catch me on Instagram
🎵 Dive into my TikTok