CS-6291 Embedded System Optimization
(see also: CS-8803 - Embedded Software)
Course
I enjoyed the course because I like the material, and I knew what to expect, but some may find the material formidable.
I have completed several OMSCS courses, and can review this course in the context of the others. I have software development experience, so I was prepared to take this course. The course introduced new concepts and reviewed compiler concepts.
Embedded systems are pervasive and their role can only expand. The course compares CISC and RISC, and contrasts the chip complexity of general-purpose CPUs with that of embedded processors. RISC, and especially ARM, is the primary focus of most lectures. Issues such as pipelining, register count, datapath, memory and register banks, instruction latency, and code size are studied. The volume of material covered is vast.
The course is presented as three distinct parts. Part one, "Embedded Systems", places specific focus on VLIW, VEX, and increased parallelism. Topics include Computer Architecture, VLIW, ISA, Datapaths, Registers and Memory, and Branches. Part two, "Compilers for Embedded Systems", presents an overview of compilers, then examines Control Flow Analysis and Liveness Analysis. This part deals heavily with graph theory, and introduces register allocation alongside liveness analysis (a fairly complex topic).
Part three, "Compiler Optimizations", is a deeper dive into Register Allocation, Optimizations for Code Size, Differential Register Allocation, Storage Assignment, and Parallelized Load & Store. These optimizations cooperate to improve performance, reduce code size, or decrease power consumption. The innovations covered expand the number of addressable registers and improve data throughput and instruction parallelism through banked memory and registers, with specifics for DSPs and Network Processors (NPs).
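To give a flavor of the liveness analysis mentioned above, here is a minimal sketch of the standard iterative dataflow formulation over a control flow graph. The block names, the `use`/`defs` sets, and the tiny three-block loop are made up for illustration and are not taken from the course materials.

```python
def liveness(blocks, succ):
    """Iterate live_in/live_out to a fixed point.

    blocks: {name: (use, defs)} where `use` holds variables read before
            any write in the block and `defs` holds variables written.
    succ:   {name: [successor block names]}
    """
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b, (use, defs) in blocks.items():
            # live_out is the union of live_in over all successors
            out = set()
            for s in succ.get(b, []):
                out |= live_in[s]
            # live_in = use U (live_out - defs)
            inn = use | (out - defs)
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out


# Hypothetical CFG:
#   entry: a = 0; n = 10
#   loop:  b = a + 1; a = b * 2; if a < n goto loop
#   exit:  return a
blocks = {
    "entry": (set(), {"a", "n"}),
    "loop": ({"a", "n"}, {"a", "b"}),
    "exit": ({"a"}, set()),
}
succ = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}

live_in, live_out = liveness(blocks, succ)
```

Variables that are live at the same point interfere with each other, which is the link from liveness to the graph-based register allocation covered later in the course.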
Lectures & Readings
The lectures are worth watching, and Dr. Pande does a good job explaining the material. They are dense, though, and you will spend substantial time each week keeping up with them. Lecture content is back-loaded: once you complete the first project, the majority of the lectures (and readings) still lie ahead. Plan to accelerate your viewing and reading midway through the course.
The papers (there are a large number of them), the VLIW and VEX documentation, and the textbook readings add up to an enormous volume of material. Be prepared to read a lot. The volume of readings was overwhelming, nearly abusive.
Projects
There are two substantial projects. Both are challenging and require that the student demonstrate understanding of the relevant embedded concepts. The first project took about 50-60 hours (average 60); the second took about 30-40 (average 40). Students were given 4-6 weeks for each project.
The first project was a group project (3 students) to 'schedule' instructions on a VLIW (parallel) machine. The project required parsing assembly code for a substantial subset of the VEX instruction set, then rescheduling the instructions while preserving semantic behavior, so as to reduce critical path length and thus improve runtime performance. The project required C++, parsing, and graph processing.
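The scheduling idea can be sketched on a toy dependency graph. This is not the project's solution (which was in C++ and worked on real VEX assembly); it is a small illustration of a critical-path calculation and a greedy list scheduler. The operation names, latencies, and issue width below are hypothetical, and the code assumes the dependency graph is acyclic.

```python
from functools import lru_cache


def critical_path(latency, deps):
    """Length of the longest latency-weighted path through the DAG."""
    @lru_cache(maxsize=None)
    def finish(op):
        start = max((finish(p) for p in deps.get(op, ())), default=0)
        return start + latency[op]
    return max(finish(op) for op in latency)


def list_schedule(latency, deps, width):
    """Greedy scheduling: each cycle, issue up to `width` ready ops.

    An op is ready once the results of all its predecessors are available.
    Ties are broken by program order (the order of keys in `latency`).
    """
    done_at = {}                 # op -> cycle at which its result is available
    schedule = []                # schedule[cycle] = ops issued in that cycle
    remaining = list(latency)
    cycle = 0
    while remaining:
        ready = [op for op in remaining
                 if all(p in done_at and done_at[p] <= cycle
                        for p in deps.get(op, ()))]
        issued = ready[:width]
        for op in issued:
            done_at[op] = cycle + latency[op]
            remaining.remove(op)
        schedule.append(issued)  # may be empty: a stall cycle
        cycle += 1
    return schedule


# Hypothetical operations: two loads feed an add, then a multiply and a store;
# `inc` is independent of everything else.
latency = {"ld1": 3, "ld2": 3, "add": 1, "mul": 2, "st": 1, "inc": 1}
deps = {"add": ("ld1", "ld2"), "mul": ("add",), "st": ("mul",)}

cp = critical_path(latency, deps)
wide = list_schedule(latency, deps, width=2)
narrow = list_schedule(latency, deps, width=1)
```

With two issue slots the two loads start in the same cycle, so the schedule is bounded by the dependency chain; with a single slot the second load delays everything behind it. A real scheduler would also model functional-unit and register constraints, which this sketch ignores.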
The second project was an individual project in which students determined how to improve performance or code size for programs built to run on a Raspberry Pi. It required the student to profile code, identify hot functions, choose heuristics, and compile functions for either the ARM or the THUMB instruction set based on those heuristics and measurements.
Students needed a Raspberry Pi 3+, and had to cross-compile for ARM, transfer programs to the Pi, run experiments, and produce a substantial report.
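One very simple heuristic of the kind described above is to keep hot functions in ARM mode (usually favoring speed) and compile cold functions as THUMB (usually favoring code size). The sketch below is an assumption about how such a rule could look, not the heuristic used in the course; the function names, sample counts, and the 10% threshold are invented. The resulting choice could then drive per-file compiler options such as GCC's `-marm` and `-mthumb`.

```python
def choose_isa(profile, hot_fraction=0.10):
    """Map each function to "arm" or "thumb" from profile sample counts.

    profile:      {function name: samples (or time) attributed to it}
    hot_fraction: hypothetical threshold; functions at or above this share
                  of total samples are treated as hot and kept in ARM mode.
    """
    total = sum(profile.values())
    choice = {}
    for fn, samples in profile.items():
        share = samples / total if total else 0.0
        choice[fn] = "arm" if share >= hot_fraction else "thumb"
    return choice


# Hypothetical profile, e.g. sample counts gathered from a profiler run.
profile = {"fft": 700, "parse": 250, "log": 30, "init": 20}
isa = choose_isa(profile)
```

Any such threshold has to be validated by measuring runtime and binary size on the device itself, which is where most of the experimental effort goes.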
Homeworks
Lest you find time to sleep, there are four homework assignments to consume your weekends. Each homework consists of multiple parts and demands extensive work and precise attention to detail. The homeworks compel you to dive deep into the algorithms presented in lectures and papers. Keep current with lectures and readings and the homeworks will be less painful. One homework required coding; the others merely benefited from writing optional code (write the scripts and automate the manual work; you will be glad you did). Expect to spend 10-20 or more hours on each homework.
Final Exam
The final exam was cumulative. Since the amount of material is enormous, you need to learn the material as you progress through the course. The exam is fair, and tests important concepts explored by projects and homeworks. If you understand the projects and homeworks, the exam holds few surprises. However, the exam is 33% of the course grade, closed book, proctored, and over 3 hours long, so it can be very stressful.
Overall
- The amount of work required was enormous.
- Grading on the first two homeworks was quite brutal.
- The course requires tons of reading and tons of work.
- Over half the students who started the course dropped.
- The high drop rate eviscerated project groups, and forced reorganization.
- Concepts were not hard to learn, and were well supported by lectures and readings.
- Piazza became calmer after the drops, and the remaining students were very helpful.
- Office hours were excellent; the TAs and Dr. Pande were very involved and helpful.
- Grading was glacially slow. One cannot blame the TAs as the assignments are extremely detailed. Some revisions to organization could enable more automated grading.
The various optimizations can appear chaotic and unrelated, but when you consider them as a variety of approaches to addressing different challenges in embedded systems architectures, the course is fairly logical and well organized.
Nomenclature aside, the homeworks require more time than some projects in other courses, so the course has essentially 6 projects. There are some rough patches in the projects and homeworks, so ask questions on Piazza. The TAs were above average, but they do not hold your hand.
Only one homework was graded by the drop date (less than 8% of the grade), and the third and fourth homeworks were not graded until the middle of final exam week. Grading for the first project was not completed until after the final exam, and grading for the second project and the final (50% of the grade) is not yet complete.
Class averages were low:
- homework: 77%, 85%, 80%, 84%
- project: 90%, ?
- final: ?
Update:
The first project is now an individual project, and there are no team projects.
Much of the grading has been automated, so grades are returned much more quickly.