CAPP 30524 – Machine Learning for Public Policy

Spring 2016
Tuesday-Thursday 10:30-11:50
Ryerson 276

Contact Information:

Rayid Ghani
Office: Searle 219 (5735 S Ellis)
Office Hours: Tuesdays and Thursdays 12-1pm (or by appointment)
Email: rayid [at] uchicago [dot] edu

TA: Gustav Larsson
Email: larsson [at] cs [dot] uchicago [dot] edu

Course Description:

This course is an introduction to machine learning and its application to public policy problems. It is designed for students who want to learn how to use modern, scalable, computational data analysis methods and tools for social impact and policy problems.

Students will learn:

  1. What role machine learning can play in designing, implementing, evaluating, and improving public policy
  2. Machine learning methods and tools
  3. How to solve policy problems using machine learning methods and tools

This is a hands-on course in which students are expected to use Python (as well as other computational tools) to implement solutions to various policy problems. We will cover supervised and unsupervised learning algorithms and learn how to apply them to data from a variety of public policy areas, such as education, public health, sustainability, economic development, and public safety. Students will also complete a project in teams of 3-4.


Prerequisites:

  • Two courses in Computer Programming (Python experience required)
  • Two courses in Probability & Statistics
  • Prior experience with data analysis (using SQL, R, or Python) is highly recommended


  • Machine Learning Process
    1. Understand Problem
    2. Map to Machine Learning formulation
    3. Understand the Data
    4. Data Prep and Initial Analysis
    5. Feature Development
    6. Modeling
    7. Evaluation
    8. Deployment
  • Machine Learning Methods
    1. Unsupervised
      1. Clustering
      2. PCA
    2. Supervised
      1. Regression
      2. KNN
      3. Trees
      4. NN
      5. SVM
      6. Random Forests
      7. Ensemble Methods
    3. Semi-Supervised (Not covered in this class)
  • Applying these methods to Policy Problems
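
As a rough illustration of how steps 4 through 7 of the process above fit together, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy data, the function names, and the choice of a small k-nearest-neighbors classifier are not taken from the course materials, and a real project would more likely use a library such as scikit-learn.

```python
import math
import random
from collections import Counter


def make_toy_data(n=200, seed=0):
    """Hypothetical data: two numeric features per record and a binary label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Records with label 1 are shifted upward on both features.
        x1 = rng.gauss(2.0 * label, 1.0)
        x2 = rng.gauss(10.0 * label, 5.0)
        rows.append(((x1, x2), label))
    return rows


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Data prep: shuffle and hold out a fraction of records for evaluation."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_scaler(train):
    """Feature development: per-feature mean and std from training data only."""
    means, stds = [], []
    for j in range(len(train[0][0])):
        col = [x[j] for x, _ in train]
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        means.append(mean)
        stds.append(math.sqrt(var) or 1.0)  # guard against a constant feature
    return means, stds


def scale(x, scaler):
    """Standardize one feature vector with a previously fitted scaler."""
    means, stds = scaler
    return tuple((v - m) / s for v, m, s in zip(x, means, stds))


def knn_predict(train, x, k=5):
    """Modeling: majority vote among the k closest training records."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Evaluation: fraction of held-out records classified correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def run_pipeline():
    rows = make_toy_data()
    train, test = train_test_split(rows)
    scaler = fit_scaler(train)
    train_s = [(scale(x, scaler), y) for x, y in train]
    test_s = [(scale(x, scaler), y) for x, y in test]
    return accuracy(train_s, test_s)


if __name__ == "__main__":
    print(f"held-out accuracy: {run_pipeline():.2f}")
```

The scaler is fitted on the training split only, so no information from the held-out records leaks into feature development before evaluation.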


The following lecture list is a work in progress. The schedule is subject to change based on class interest and progress. In addition, we will have guest lectures, which will cause some of these lectures to be merged. If there are additional topics you’d like covered or guest lectures you’d like to see, please let me know.

If you plan to reuse the slides below, note that most of each lecture is delivered on the whiteboard or chalkboard, so the PowerPoint presentations are not very detailed.

  1. Course Overview and Introduction: Goals, Expectations, Structure [slides]
  2. Case Studies: Machine Learning used in Public Policy problems
  3. Machine Learning Process and Workflow/Pipeline Overview
  4. Project Proposal Presentations
  5. Machine Learning Methods: Unsupervised Learning
  6. Machine Learning Methods: Supervised Learning I
  7. Machine Learning Methods: Supervised Learning II
  8. Machine Learning Methods: Supervised Learning III
  9. Feature Development/Engineering
  10. Evaluation Methodology I – Offline Evaluation
  11. Evaluation Methodology II – Experiments
  12. Mapping Policy Problems to Machine Learning Problems
  13. Open Source and Commercial Machine Learning Tools Overview
  14. Putting it all together: Case Study I
  15. Text Analysis
  16. Network Analysis
  17. Putting it all together: Case Study II
  18. Advanced Topics: Reinforcement Learning, Active Learning
  19. Ethics, Privacy, Transparency
  20. Recap and Class Discussion


  • Regular assignments
    • Assignment 1
    • Assignment 2
    • Assignment 3
    • Assignment 4
  • Short response to the previous week’s lectures, due each Tuesday before class

Project: Students will form groups (3-4 students each) and work on a project that they will propose after week 2.

Grading Policy:

  • Assignments: 25%
  • Weekly Class Reviews: 10%
  • Class Participation: 15%
  • Project: 40%
    • Proposal writeup and presentation: 10%
    • Progress Report: 5%
    • Final Report: 15%
    • Final Presentation: 10%
  • Mid-Term (extended take-home assignment): 10%
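
The weights above can be combined into a single course score with a weighted sum. The sketch below assumes each component is scored on a 0-100 scale and that the final grade is a straight weighted average; the syllabus does not state this, and the function and key names are hypothetical.

```python
# Component weights as integer percentages of the overall grade.
WEIGHTS = {
    "assignments": 25,
    "weekly_reviews": 10,
    "participation": 15,
    "project": 40,
    "midterm": 10,
}

# Project sub-components, also as percentages of the overall grade.
PROJECT_PARTS = {
    "proposal": 10,
    "progress_report": 5,
    "final_report": 15,
    "final_presentation": 10,
}


def project_score(part_scores):
    """Combine project sub-scores (each 0-100) into one 0-100 project score."""
    total = sum(PROJECT_PARTS[name] * part_scores[name] for name in PROJECT_PARTS)
    return total / WEIGHTS["project"]


def final_grade(scores):
    """Weighted average of component scores (each assumed to be 0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {
        "assignments": 90,
        "weekly_reviews": 100,
        "participation": 80,
        "project": 85,
        "midterm": 70,
    }
    print(final_grade(example))  # 85.5
```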

Resources and Readings: