| Call Number | 13322 | 
|---|---|
| Day & Time / Location | MW 1:10pm-2:25pm / To be announced | 
| Points | 3 | 
| Grading Mode | Standard | 
| Approvals Required | None | 
| Instructor | Shipra Agrawal | 
| Type | SEMINAR | 
| Method of Instruction | In-Person | 
| Course Description | Theory of Markov Decision Processes (MDP) and Dynamic Programming. Design and convergence properties of Reinforcement Learning (RL) algorithms, including Q-learning and policy iteration methods. Function approximation and deep RL algorithms: DQN, policy gradient, actor-critic methods. Exploration-exploitation and regret bounds in RL. Multi-agent RL. RL with Human Feedback (RLHF). RL and Monte Carlo Tree Search (MCTS) for agentic systems. Note: Only one of ORCS E4529 or E6529 may be taken for credit. | 
| Web Site | Vergil | 
| Department | Industrial Engineering and Operations Research | 
| Enrollment | 0 students (50 max) as of 9:07PM Monday, November 3, 2025 | 
| Subject | Op Research - Computer Science | 
| Number | E6529 | 
| Section | 001 | 
| Division | School of Engineering and Applied Science: Graduate | 
| Section key | 20261ORCS6529E001 |