Rayid Ghani
Machine Learning Department and Heinz College of Information Systems and Public Policy
Office: Gates Hillman 8023; in front of my laptop, likely on Zoom
Email: rayid at cmu dot edu

I am a Professor in the Machine Learning Department (in the School of Computer Science) and the Heinz College of Information Systems and Public Policy at Carnegie Mellon University. I work on developing and using Machine Learning, AI, and Data Science methods for solving high-impact social good and public policy problems in a fair and equitable way across criminal justice, education, healthcare, energy, transportation, economic development, workforce development and public safety. My work includes collaborative projects with government agencies and NGOs, research in areas including explainability, bias/fairness/equity, and developing and teaching educational and training programs.

I also started (at University of Chicago) and run the Data Science for Social Good Summer Fellowship (now at CMU).

Areas of Interest and Experience: Machine Learning/Data Science/Artificial Intelligence, Public Policy, Social Good, Ethics, Fairness, Social Justice and Equity

Recently (and reluctantly) added buzzwords: Big Data, Data Science, Artificial Intelligence
Older buzzwords that are trendy now: Machine Learning
Not so old buzzwords that are not trendy now: Data Mining, Analytics

What I used to do:

  • Founder/Director of the Center for Data Science and Public Policy, Research Associate Professor in the Department of Computer Science, and a Senior Fellow at the Harris School of Public Policy at the University of Chicago.
  • Chief Scientist at Obama for America 2012 campaign focusing on analytics, technology, and data.
  • Senior Research Scientist and Director of Analytics research at Accenture Labs where I led a technology research team focused on applied R&D in analytics, machine learning, and data mining for large-scale & emerging business problems in various industries including healthcare, retail & CPG, manufacturing, intelligence, and financial services.

In my ample free time, I advise several analytics start-ups and non-profits, speak at, organize and participate in academic and industry analytics conferences, and publish in Machine Learning, AI, Data Science, and Public Policy conferences and journals.

CV

Recent and Upcoming Events

Panel at TWIML on Explainability and Machine Learning – August 11th

KDD Hands-on tutorial on Dealing with Bias and Fairness in Building Data Science Systems – August 25th

KDD Social Impact Session – August 25th

US GAO’s Comptroller General Forum on AI Oversight – September 9th and 10th


Research

 

My current research is focused on the question “How do we build AI-Human collaborative systems for social and policy problems that can be trusted to achieve fair and equitable policy outcomes?” and consists of three pillars:

1. Building Human-AI Collaborative Systems (for social and policy problems)

How do we use HCI and design approaches to build these systems? How do we make AI systems interpretable and explainable to users to help them improve their decision-making? How do we elicit and incorporate human feedback?

2. Ensuring that the social outcomes are fair and equitable.

How do we define equity? How do we measure and detect bias? How do we mitigate bias to create fair and equitable outcomes?

3. Ensuring that these systems are robust and resilient

How do we explicitly build these systems to match their use and deployment settings? What is the appropriate model selection and validation methodology to make them robust to changes over time? How do we make them resilient to gaming? How do we create transparency and trust for different stakeholders including those who will be impacted by the system?

 

This work is done through three types of activities:

  1. Collaborative Applied Projects: with governments and non-profits to solve problems in policy and social good
  2. Training: students and professionals in governments and non-profits in the use of data-driven methods (Machine Learning, AI, Data Science) for policy and social good
  3. Research, Tools, and Methodology Development: to develop new methods and tools that are needed across policy areas, with special emphasis on increasing fairness, reducing bias, and making machine learning/AI algorithms and models more understandable and transparent.

Our areas of interest span health, criminal justice, education, public safety, workforce development, sustainability, transportation, social services, and economic development and we work closely with government agencies (local, federal, international) and NGOs.

Historically, my research has spanned from general machine learning and data science to privacy preserving data analysis, text analysis, semi-supervised learning, active learning, information retrieval, Natural Language Processing, and knowledge management. Most of my work has focused on developing and using machine learning & data mining approaches to solve large-scale problems in corporate, political, and non-profit areas.

My current interests lie at the intersection of Machine Learning, Public Policy, and Social Sciences. I’m interested in solving large-scale and high-impact social problems using data-driven and evidence-based methods. A lot of government, civic, and non-profit organizations are realizing the value of better data and have been focusing on improving data collection and data standardization. My goal is to build on these efforts, and work with these organizations to use this data to help improve outcomes in a fair and equitable manner. My work involves developing and using machine learning and social science methods that can be operationalized to solve policy and social challenges across health, criminal justice, education, public safety, social services, and economic development.

If you’re interested in working with me, I’m currently working on the following types of research problems:
  • Machine Learning methods specifically targeted at the needs of social good and public policy problems.
  • Challenges in building, validating, and deploying Machine Learning/AI-based systems
  • Building explainable and interpretable models for use in human-in-the-loop systems.
  • Dealing with defining, detecting, reducing, and mitigating bias and increasing fairness to create AI systems that result in equitable outcomes.

Recent Papers

Validation of a Machine Learning Model to Predict Childhood Lead Poisoning in JAMA Network Open

Predictive Fairness to Reduce Misdemeanor Recidivism Through Social Service Interventions @ ACM FAT* 2020

An Experience-Centered Approach to Training Effective Data Scientists in the Big Data Journal

Predictive Analytics for Retention in Care in an Urban HIV Clinic in Nature Scientific Reports

Projects, Programs, Tools


Data Science for Social Good Summer Fellowship

Full-time summer program to train aspiring data scientists to work on machine learning, data science, and AI projects with social impact in a fair and equitable manner. Working closely with governments and nonprofits, fellows take on real-world problems in education, health, criminal justice, sustainability, public safety, workforce development, human services, transportation, economic development, international development, and more.


Triage

An open source machine learning toolkit to help data scientists, machine learning developers, and analysts quickly prototype, build and evaluate end-to-end predictive risk modeling systems for public policy and social good problems.


Aequitas: Bias and Fairness Audit Toolkit

An open source bias audit toolkit for machine learning developers, analysts, and policymakers to audit machine learning models for discrimination and bias, and make informed and equitable decisions around developing and deploying predictive risk-assessment tools.
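To give a rough sense of what a bias audit of this kind measures, here is a minimal sketch in plain Python. It does not use the Aequitas API; the function names, the choice of false positive rate as the metric, and the toy records are all assumptions made for illustration. It computes the false positive rate for each group and then each group's disparity relative to a chosen reference group.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Return {group: FPR} from (group, y_true, y_pred) triples with 0/1 labels.

    Hypothetical helper for illustration; not part of any toolkit.
    """
    false_pos = defaultdict(int)   # predicted 1 when the true label is 0
    negatives = defaultdict(int)   # all records whose true label is 0
    for group, y_true, y_pred in records:
        if y_true == 0:
            negatives[group] += 1
            if y_pred == 1:
                false_pos[group] += 1
    return {g: false_pos[g] / negatives[g] for g in negatives}


def disparities(rates, reference_group):
    """Return each group's rate divided by the reference group's rate."""
    ref = rates[reference_group]
    if ref == 0:
        raise ValueError("reference group rate is zero; ratio is undefined")
    return {g: r / ref for g, r in rates.items()}


# Toy data, invented for this example: (group, true label, predicted label).
records = [
    ("A", 0, 1), ("A", 0, 0), ("A", 0, 0), ("A", 0, 0), ("A", 1, 1),
    ("B", 0, 1), ("B", 0, 1), ("B", 0, 0), ("B", 0, 0), ("B", 1, 0),
]

rates = false_positive_rates(records)
ratios = disparities(rates, reference_group="A")
print(rates)
print(ratios)
```

A ratio far from 1.0 for some group flags a disparity worth examining. Which metric matters (false positives, false negatives, or something else) depends on the policy setting and the intervention being considered.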

Teaching

I’m currently co-teaching (with Kit Rodolfa) “Machine Learning for Public Policy Lab” at Carnegie Mellon University, a course designed to provide students training and experience in solving real-world problems using machine learning, with a focus on problems from public policy and social good.

I also co-created (with Julia Lane and Frauke Kreuter) and have been teaching Applied Data Analytics for Policy courses for government agencies since 2015.

 

I frequently do talks, workshops, and trainings for organizations on the following topics:

  • Ethics, Fairness, Bias, and Equity in AI and Machine Learning
  • Fair and Equitable Decision-Making using AI
  • The Future of Machine Learning/AI/Data Science
  • How to Scope Actionable Goal-Driven Data Science Projects
  • The Value of Data-Driven Decision-Making for Governments and Non-Profits

 

Selected Publications (Full List)

Data Science Education and Training

Big Data and Social Science: A Practical Guide to Methods and Tools. Ian Foster, Rayid Ghani, Ron Jarmin, Frauke Kreuter, and Julia Lane. Chapman and Hall/CRC Press, 2016. (Second edition coming this year)

An Experience-Centered Approach to Training Effective Data Scientists. Kit T Rodolfa, Adolfo De Unanue, Matt Gee, and Rayid Ghani. Big Data Journal. 2019.

Change Through Data: A Data Analytics Training Program for Government Employees. Frauke Kreuter, Rayid Ghani, Julia Lane. Harvard Data Science Review, 1(2). 2019

Machine Learning (Book Chapter). Rayid Ghani and Malte Schierholz. In Big Data and Social Science: A Practical Guide to Methods and Tools. Chapman and Hall/CRC Press, 2016.

Bias, Fairness, and Equity in AI, Machine Learning Systems

Bias and Fairness (in Machine Learning) Book Chapter. Kit T. Rodolfa, Pedro Saleiro, Rayid Ghani. In Big Data and Social Science: A Practical Guide to Methods and Tools. Chapman and Hall/CRC Press, 2020

Predictive Fairness to Reduce Misdemeanor Recidivism Through Social Service Interventions. K. Rodolfa; E. Salomon; L. Haynes; I. Mendieta; J. Larson; R. Ghani. Proceedings of the ACM Conference on Fairness, Accountability, and Transparency (ACM FAT*) 2020.

Aequitas: A Bias and Fairness Audit Toolkit. Pedro Saleiro, Benedict Kuester, Loren Hinkson, Jesse London, Abby Stevens, Ari Anisfeld, Kit T. Rodolfa, Rayid Ghani.

Overview Papers

Machine learning and artificial intelligence research for patient benefit: 20 critical questions on transparency, replicability, ethics, and effectiveness. Vollmer Sebastian, Mateen Bilal A, Bohner Gergo, Király Franz J, Ghani Rayid, Jonsson Pall, et al.

Artificial Intelligence for Social Good.
Gregory D. Hager, Ann Drobnis, Fei Fang, Rayid Ghani, Amy Greenwald, Terah Lyons, David C. Parkes, Jason Schultz, Suchi Saria, Stephen F. Smith, and Milind Tambe. Computing Community Consortium. March 2017.

Case Studies: Applying Machine Learning/Data Science/AI to tackle Social and Policy Problems

Validation of a Machine Learning Model to Predict Childhood Lead Poisoning. Eric Potash, Rayid Ghani, Joe Walsh, Emile Jorgensen, Cortland Lohff, Nik Prachand, Raed Mansour. JAMA Netw Open. 2020;3(9):e2012734. doi:10.1001/jamanetworkopen.2020.12734

Predictive Analytics for Retention in Care in an Urban HIV Clinic. Arthi Ramachandran, Avishek Kumar, Hannes Koenig, Adolfo De Unanue, Christina Sung, Joe Walsh, John Schneider, Rayid Ghani & Jessica P. Ridgway. Nature Scientific Reports 10, 6421 (2020). https://doi.org/10.1038/s41598-020-62729-x

Using Machine Learning to Help Vulnerable Tenants in New York City. Teng Ye, Rebecca Johnson, Samantha Fu, Jerica Copeny, Bridgit Donnelly, Alex Freeman, Mirian Lima, Joe Walsh, and Rayid Ghani. Proceedings of the 2nd ACM SIGCAS Conference on Computing and Sustainable Societies (COMPASS ’19). ACM, New York, NY, USA, 248-258.

Deploying Machine Learning Models for Public Policy: A Framework. Klaus Ackermann, Joe Walsh, Adolfo De Unánue, Hareem Naveed, Andrea Navarrete Rivera, Sun-Joo Lee, Jason Bennett, Michael Defoe, Crystal Cody, Lauren Haynes and Rayid Ghani. 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2018).

Reducing Incarceration through Prioritized Interventions. Matthew J. Bauman, Kate Boxer, Tzu-Yun Lin, Erika Salomon, Hareem Naveed, Lauren Haynes, Joe Walsh, Jen Helsby, Steve Yoder, Robert Sullivan, Rayid Ghani. ACM SIGCAS Conference on Computing and Sustainable Societies, 2018.

Improving Government Response to Citizen Requests Online. Garren Gaut, Andrea Navarrete, Laila Wahedi, Paul van der Boor, Adolfo de Unánue, Jorge Díaz, Eduardo Clark, Rayid Ghani. ACM SIGCAS Conference on Computing and Sustainable Societies, 2018.

Using Machine Learning to Assess the Risk of and Prevent Water Main Breaks. Avishek Kumar, Syed Ali Asad Rizvi, Benjamin Brooks, Ali Vanderveld, Kevin Hayes Wilson, Chad Kenney, Adria Finch, Andrew Maxwell, Sam Edelstein, Joe Zuckerbraun and Rayid Ghani. 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2018).

Machine Learning for Social Services: A case study of prenatal case management in Illinois. Ian Pan, Laura B. Nolan, Rashida R. Brown, Romana Khan, Paul van der Boor, Daniel G. Harris, Rayid Ghani. American Journal of Public Health, 2017.

Early Intervention Systems – Predicting Adverse Interactions Between Police and the Public. Jennifer Helsby, Samuel Carton, Kenneth Joseph, Ayesha Mahmud, Youngsoo Park, Andrea Navarrete, Klaus Ackermann, Joe Walsh, Lauren Haynes, Crystal Cody, Major Estella Patterson, Rayid Ghani. Criminal Justice Policy Review, 2017.

Building Better Early Intervention Systems. Crystal Cody, Estella Patterson, Kerr Putney, Jennifer Helsby, Joe Walsh, Lauren Haynes, and Rayid Ghani. Police Chief Magazine. International Association of Chiefs of Police. 2016

Detecting fraud, corruption, and collusion in international development contracts. Emily Grace, Ankit Rai, Elissa Redmiles, Rayid Ghani. 2016 IEEE International Conference on Big Data.

The Legislative Influence Detector: Finding Text Reuse in State Legislation. Burgess et al. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2016).

Identifying Police Officers at Risk of Adverse Events. Carton et al. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2016).

Designing Policy Recommendations to Reduce Home Abandonment in Mexico. Ackermann et al. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2016).

Identifying Earmarks in Congressional Bills. Khabsa et al. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2016).

A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes.
Himabindu Lakkaraju, Everaldo Aguiar, Carl Shan, David Miller, Nasir Bhanpuri, Rayid Ghani, Kecia Addison. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2015)

Predictive Modeling for Public Health: Preventing Childhood Lead Poisoning.
Eric Potash, Joe Brew, Alexander Loewi, Subhabrata Majumdar, Andrew Reece, Joe Walsh, Eric Rozier, Emile Jorgenson, Raed Mansour, Rayid Ghani. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2015)

Early Prediction of Code Blue Using Electronic Medical Records.
Sriram Somanchi, Samrachana Adhikari, Allen Lin, Elena Eneva, and Rayid Ghani. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2015)

Who, When, and Why: A Machine Learning Approach to Prioritizing Students at Risk of not Graduating High School on Time.
Everaldo Aguiar, Himabindu Lakkaraju, Nasir Bhanpuri, David Miller, Ben Yuhas, Kecia Addison, Shihching Liu, Marilyn Powell, and Rayid Ghani. 5th International Learning Analytics and Knowledge (LAK) Conference 2015.

Early Code Blue Prediction Using Patient Medical Records.
Sriram Somanchi, Samrachana Adhikari, Allen Lin, Elena Eneva, and Rayid Ghani. Workshop on Machine Learning for Clinical Data Analysis and Healthcare – held with NIPS 2013.