Call Number | 13322 |
---|---|
Day & Time / Location | MW 1:10pm-2:25pm / To be announced |
Points | 3 |
Grading Mode | Standard |
Approvals Required | None |
Instructor | Shipra Agrawal |
Type | SEMINAR |
Method of Instruction | In-Person |
Course Description | Theory of Markov Decision Processes (MDP) and Dynamic Programming. Design and convergence properties of Reinforcement Learning (RL) algorithms including Q-learning and Policy iteration methods. Function approximation and deep RL algorithms: DQN, policy gradient, actor-critic methods. Exploration-Exploitation and regret bounds in RL. Multi-agent RL. RL with Human Feedback (RLHF). RL and Monte Carlo Tree Search (MCTS) for Agentic Systems. Note: Only one of ORCS E4529 or 6529 may be taken for credit. |
Web Site | Vergil |
Department | Industrial Engineering and Operations Research |
Enrollment | 0 students (50 max) as of 11:06AM Tuesday, October 14, 2025 |
Subject | Op Research - Computer Science |
Number | E6529 |
Section | 001 |
Division | School of Engineering and Applied Science: Graduate |
Section key | 20261ORCS6529E001 |