Review by Sravan Patchala. Feel free to contact him at sravps7@gmail.com.
Review
This course deals with decision making in stochastic systems. In simple terms, it helps you design policies that choose actions to minimise cost (or maximise profit) over some time horizon. The course requires a solid grasp of probability, so it is strongly advisable to take it only after completing a probability course.
The first half of the course is spent laying out the framework of the problem and developing dynamic programming algorithms to solve these decision problems. The second half introduces policy iteration and value iteration, and the course touches upon Q-learning at the end. On the whole, the course moves at a steady pace, with most of the time spent on proofs of the results.
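For readers who have not met these ideas before, here is a minimal sketch of value iteration on a made-up two-state problem; the states, costs, transition probabilities and discount factor below are purely illustrative and are not taken from the course material. The idea is simply to apply the Bellman optimality update until the value estimates stop changing, then read off the greedy policy.

```python
# Minimal value iteration sketch on a hypothetical two-state MDP.
# All numbers here are illustrative, not from the course.
import numpy as np

states = [0, 1]
actions = [0, 1]

# P[a][s, s'] = probability of moving from state s to s' under action a
P = {
    0: np.array([[0.9, 0.1],
                 [0.2, 0.8]]),
    1: np.array([[0.5, 0.5],
                 [0.4, 0.6]]),
}
# cost[s, a] = immediate cost of taking action a in state s
cost = np.array([[1.0, 2.0],
                 [0.5, 0.3]])

gamma = 0.9                      # discount factor
V = np.zeros(len(states))        # initial value estimates

# Repeatedly apply the Bellman optimality operator until convergence.
for _ in range(1000):
    Q = np.array([[cost[s, a] + gamma * P[a][s] @ V for a in actions]
                  for s in states])
    V_new = Q.min(axis=1)        # minimise expected discounted cost
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmin(axis=1)        # greedy policy w.r.t. the converged values
print("Optimal values:", V)
print("Optimal policy:", policy)
```

Much of the course is about proving why updates like this converge and when the resulting policy is optimal, rather than about implementing them.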
The course has a couple of quizzes along with a midsem and an endsem, all of which were pretty easy. At the end there is also a course project (and presentation), where the idea is to read and implement related research papers and perhaps suggest improvements to them. This takes quite a lot of time and effort, since the papers are not easy to decode.
On the whole, the course is moderately difficult, especially if you have not seen this kind of problem before. If you do take it, you will learn a lot from the course project, provided you can devote sufficient time to it. Otherwise, it might be quite difficult to get through with a decent grade.