Reinforcement Learning Review
Reinforcement Learning (RL) is a fascinating class, but I have mixed feelings now that the course has concluded. On the plus side, RL feels like it just might be the next "big thing." The field is a compelling fusion of classical computer science, statistics, cognitive science and neuroscience, and mathematics. Having already taken the AI, ML4T, and ML courses, I found this class to be what I expected AI and machine learning to be like: an agent that explores an environment, acquiring feedback and evaluating its progress toward some goal at each step along the way. The agent-centric aspect of RL distinguishes it from other areas of AI and ML, and it is a novel way to think about problems in this space.
On the downside, RL is still very theoretical and has yet to gain traction in industry, as it is not applicable to most commercial use cases. The domains where RL has been successful have been confined mainly to robotics and gaming, although I have heard that large language models like ChatGPT use RL techniques in their training. RL is an active area of research, and I hope it will find broader appeal as successes in new domains and use cases accrue. It is definitely something to keep your eye on if you are in the Machine Learning or Interactive Intelligence specializations.
Three projects (45%), six homeworks (30%), and a final exam (25%) make up the final grade. There are course lectures, office hours, and a lot of readings. This class is demanding, and the projects require a lot of time. In terms of effort and workload, it is up there with ML and AI; I easily spent 40-60 hours on each project. Preparing for the final was chaotic, and I was burned out after P3. The instructors included a series of game theory lectures at the end. Including this material was a little puzzling, at least to me, since none of the homeworks or projects involved game theory. Conceptually, however, there is an overlap between game theory, à la the Nash equilibrium of "A Beautiful Mind" fame, and multi-agent RL.
I finished with a final grade of 74%, which was a B. I was below average on the first project and the exam, and at or above the median for everything else. With a little effort, you can get 100s on all the homeworks; the first one is the hardest, and they get progressively easier and less time-consuming. For the projects, learn how to write a good report: results matter less than showcasing your understanding. For the final, don't overthink it and know the big picture. The class average was around 46-48% (I scored a 42%), but it is easy to go down the wrong path in studying since there is so much material. I certainly did.
Lectures and Class Material
The lectures for this class could be better. I did not enjoy them and stopped watching them halfway through. Some modules were good, while others were just confusing. The focus was mainly on mathematical proofs, and the modules lacked a coherent narrative. I don't think they were helpful for the final; they went into too much detail.
I watched most of the David Silver lectures, which were terrific: coherent and well-paced, each one building on the previous in a coordinated fashion. The course lectures were less connected and felt arbitrarily stitched together in comparison. Often it was unclear why a proof or equation was being introduced or how it fit into the larger RL context, and I often felt "lost in the details." Combining both sets is likely the best approach, with the Silver lectures providing the overall roadmap and the course lectures filling in the mathematical details as needed. But who has the time to watch so many videos?
Many of the readings were excellent. After you bang your head against an academic paper for a while, it does eventually begin to make sense. I learned quite a bit from the papers and the book. I didn't read everything, and some readings were harder to grasp or seemed too fringe, but the course designers did an excellent job selecting the essential readings in the field. The DQN journal articles and Sutton's paper on TD learning are foundational. The Sutton and Barto text was also quite good, at least the early chapters (i.e., 1 through 7) that I read.
The TAs for the class were talented and generally impressive, particularly the lead instructors. They were very positive and supportive, but your mileage may vary with the office hours. The TAs preferred facilitating class discussions and letting things flow organically rather than running a more structured, guided office hour. Often, I felt they were reluctant to provide solid answers or advice, instead encouraging students to give it their best shot, take some liberties, and explore the topic independently.
Ed discussions were good: lots of class participation but not much direct involvement from the TAs. Again, they preferred to stay in the background and referee rather than actively participate, though they did help direct and clarify topics when necessary.
Overall, a great class and an important, emerging area for those interested in AI. It gets a bit theoretical at times, but the projects were fantastic and provided the hands-on RL experience I was looking for. If you want to prepare for the class, study a bit of dynamic programming and read up on the grid-world MDP problem and how value and policy iteration solve it (a minimal sketch follows below). Learning the basics of PyTorch will come in handy. Understand gradient functions (or read Sutton's Temporal Difference Learning paper). Also, learn how to spin up and configure a virtual machine with one of the major cloud providers; that may come in handy for P3 if you don't own a souped-up home machine. Finally, if you have a lot of spare time, watch David Silver's lectures, read his DQN paper, and read the first 5 or 6 chapters of Sutton and Barto.
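To make the grid-world suggestion concrete, here is a minimal value-iteration sketch in plain Python/NumPy. The grid size, goal location, reward, and discount factor are illustrative assumptions of mine, not anything taken from the course:

```python
import numpy as np

# Value iteration on a small deterministic grid world.
# The layout, reward, and gamma below are illustrative choices,
# not taken from any course assignment.

ROWS, COLS = 4, 4
GOAL = (0, 3)               # terminal state; entering it yields reward +1
GAMMA = 0.9                 # discount factor
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Deterministic transition: move if in bounds, else stay put."""
    r, c = state
    nr, nc = r + action[0], c + action[1]
    if 0 <= nr < ROWS and 0 <= nc < COLS:
        return (nr, nc)
    return state

def value_iteration(theta=1e-6):
    V = np.zeros((ROWS, COLS))
    while True:
        delta = 0.0
        for r in range(ROWS):
            for c in range(COLS):
                if (r, c) == GOAL:
                    continue  # terminal state keeps value 0
                # Bellman optimality backup: best one-step lookahead
                best = max(
                    (1.0 if step((r, c), a) == GOAL else 0.0)
                    + GAMMA * V[step((r, c), a)]
                    for a in ACTIONS
                )
                delta = max(delta, abs(best - V[r, c]))
                V[r, c] = best
        if delta < theta:
            return V

if __name__ == "__main__":
    print(np.round(value_iteration(), 3))  # values decay by gamma per step from the goal
```

Policy iteration uses the same Bellman backup, just split into alternating evaluation and improvement steps, so once this clicks, that will too.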
Good luck.