Basic Information

  • Course Code: CS 753
  • Course Name: Automatic Speech Recognition
  • Course Offered In: Spring 2023
  • Instructors: Prof. Preethi Jyothi
  • Prerequisites: Linear Algebra (hard prerequisite); a formal course in machine learning is a must (it can be taken in parallel with the professor’s consent). Probability (EE 325) was very helpful for the course. Prior exposure to Markov chains also helps, though it is not necessary.
  • Difficulty (on a scale of 5): 4

Course Content

Statistical Speech Recognition, WFSTs and WFST algorithms, WFSTs in ASR + Basics of speech production, Hidden Markov Models, Neural Networks, Deep Neural Network (DNN)-based Acoustic Models, Recurrent Neural Network (RNN) Models for ASR, Acoustic Feature Extraction for ASR, Language Modeling.

Feedback on Lectures

Lectures were held only up to the midsem. Everything was taught comprehensively and in a way that was easy to understand. The course content is slightly on the difficult side, but the slides are very comprehensive and recordings were provided. Even so, I would recommend attending the lectures live, as you will learn a lot and it is a great course. Ma’am made the class fun and even changed the topic if it was getting monotonous. In one lecture, chocolates were thrown out to people who asked questions :)

Post midsem, a scientific seminar was conducted which involved studying research papers, so this course is definitely for those who are interested in getting a taste of research.

Feedback on Evaluations

The midsem (20%) and endsem (30%) were open slides and open notes, and both were concept-based. There were two team programming assignments (10% each), and the remaining 30% was a team seminar. The seminar involved going through 5 different research papers: presenting one as a PPT, making a scientific poster for one, modifying the code of one, writing a review of one, and writing a scientific article on one.

The seminar really helps you learn how to actually understand research papers and how to convey your research to an audience.

Study Material and Resources

  • Daniel Jurafsky and James H. Martin, “Speech and Language Processing”, 2nd edition, 2008.
  • Mark Gales and Steve Young, The application of hidden Markov models in speech recognition, Foundations and Trends in Signal Processing, 1(3):195-304, 2008.
  • Mehryar Mohri, Fernando Pereira and Michael Riley, Weighted Finite-state Transducers in Speech Recognition, Computer Speech and Language, 16(1):69-88, 2002.
  • Geoffrey Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, and Brian Kingsbury, Deep Neural Networks for Acoustic Modeling in Speech Recognition, IEEE Signal Processing Magazine, 29(6):82-97, 2012.

Many more relevant references are provided along with the lecture slides.

Follow-up Courses

There is no direct follow-up course as such, but this course will open up the NLP and speech processing domains for you.

Final Takeaway

If you want to do research in AI, or even scientific research in general, this course will really show you how conferences work, and you will also learn how to read and understand research papers.