For some background, I took AI4R/RAIT, RL, AI, and ML before this course, so I came in with a reasonably strong ML/Python background. My full-time job also involves quite a bit of SQL. I mention this because your perceived difficulty in this course depends heavily on your prior experience with the technologies.
The course consists of five homework assignments, a research project, and a final exam, covering a number of technologies: Python/pandas, scikit-learn, Hadoop, Pig, Spark, Scala, and PyTorch.
Homework 1: If you've taken CS7641, this will be a breeze; I finished it in a day. It's basically A1-lite: a brief dive into supervised learning using pandas and scikit-learn. If you haven't taken 7641, you're being thrown a lot to deal with right away, and if your Python is weak, you'll struggle. I did not use the provided Docker environment for this, just PyCharm, and I felt that was easier. This one uses Gradescope, so it's really easy to check your work.
Homework 2: Hadoop, lots of Hadooping. In many ways it's implementing HW1, but with Hadoop/Hive. Hadoop is not a particularly pleasant framework to work in, but it's useful. To get a head start, do the sunlabs.
Homework 3: Scala phenotyping and GMM/K-Means. Scala sucks; there's no real way around it. You really should start this one early, and if you can take a Scala prep class ("Big Data Analysis with Scala and Spark" on Coursera is good), you'll be thankful for it afterward.
Homework 4: More Scala! Lots of graph theory, but... Scala sucks. More or less the same tips as HW3, but also the provided unit tests ARE NOT ENOUGH. You should write your own to verify your results.
Homework 5: More pandas, but now with PyTorch. I really enjoyed this one. You cover MLPs, CNNs, and RNNs, and do some prediction with them. You don't need a GPU, but it helps. One tip: when the assignment is finally graded, your models are tested against hidden score thresholds in Gradescope, and those are much higher than what you can see in Gradescope before the due date. Keep training your model until it's getting pretty good percentages, not just passing the visible checks.
Final Project: Get yourself a good team and start early. Also, if you take on something that requires training a model, hope someone has a strong GPU or set up Google Colab. Fortunately I had a 3080, but I was still running tests that took 10 hours on it. Follow the requirements given and you'll be fine.
Final Exam: This kind of sucked. It was not based on the homeworks at all, only on the lectures, so it really covers material you have not done any work on, and at times it felt like a vocab quiz. If they want to cover that material on the exam, either the homeworks should reflect it or the exam should be based on the homeworks. I think they wanted to make the lectures not totally ignorable, but... I think they failed in that regard: now the lectures are just something you need to watch and memorize for the exam.
Things I really liked:
- Covering a number of technologies I'd never used before (Hadoop, Scala, PyTorch); it was nice to have a chance to try them
- The Final Project was a lot of fun: a chance to try new things and work on a problem without strict guidance, with lots of room to explore
Things that could be improved:
- For the most part, the lectures are not really that useful for completing the homeworks/project. The only reason you really need to watch them is for the final and the occasional problem on the homework.
- Scala could probably be replaced now. I think PySpark is generally preferred to Scala these days, and if the course used it instead, it would be more useful and would allow for Gradescope grading, which would make the assignments much easier to grade.
- The final needs to be more relevant to the homeworks, either by adding lecture content to the homeworks or by changing the exam to be based on the homeworks. It sucks that the lectures are not that useful to the class, but... the solution then is to make new lectures that are useful.
Tips to do well:
- As much as you can, go through the sunlabs in advance. They're all public and help out a ton with the homeworks. Figure out the Docker env if you haven't used Docker before.
- Start all the homeworks early; some of them take a long time to figure out.
- If you can, have a strong GPU for HW5/final project. Not a requirement, but it helps.
- For the final, the best I can say is rewatch the lectures and take notes. Everything is fair game for the final.
- Your participation is based on Piazza's statistics (go to View Statistics at the top of Piazza to see yours). Add Piazza to your list of daily links to check, view all the posts, and post on things, and you'll get full credit. If you have a question, someone else probably does too.
I can't speak to which technologies are most useful in healthcare or big data tech, but it felt like this course could use a refresh to bring it up to the technologies used more frequently today. Regardless, you will learn plenty in this class, and I would say it's a very worthwhile course for anyone who wants to actually implement ML on something more practical than random problems.