Machine Learning Department and Heinz College of Information Systems and Public Policy
Office: Gates Hillman 8023
Email: rayid at cmu dot edu
I am a Professor in the Machine Learning Department (in the School of Computer Science) and the Heinz College of Information Systems and Public Policy at Carnegie Mellon University. I work on developing and using Machine Learning, AI, and Data Science methods to solve high-impact social good and public policy problems in a fair and equitable way across criminal justice, education, healthcare, energy, transportation, economic development, workforce development, and public safety. See my congressional testimony to the Task Force on AI on ways to reduce AI bias in financial services to get an idea of my views in this space. My work includes collaborative projects with government agencies and NGOs; research in areas including explainability and bias/fairness/equity; and developing and teaching experiential education and training programs.
I also founded the Data Science for Social Good Summer Fellowship at the University of Chicago and continue to run it, now at CMU.
Areas of Interest and Experience: Machine Learning/Data Science/Artificial Intelligence, Public Policy, Social Good, Ethics, Fairness, Social Justice and Equity
Recently (and reluctantly) added buzzwords: Big Data, Data Science, Artificial Intelligence
Older buzzwords that are trendy now: Machine Learning
Not so old buzzwords that are not trendy now: Data Mining, Analytics
What I used to do:
- Founder/Director of the Center for Data Science and Public Policy, Research Associate Professor in the Department of Computer Science, and a Senior Fellow at the Harris School of Public Policy at the University of Chicago.
- Chief Scientist at the Obama for America 2012 campaign, focusing on analytics, technology, and data.
- Senior Research Scientist and Director of Analytics research at Accenture Labs where I led a technology research team focused on applied R&D in analytics, machine learning, and data mining for large-scale & emerging business problems in various industries including healthcare, retail & CPG, manufacturing, intelligence, and financial services.
In my ample free time, I advise several analytics start-ups and non-profits, speak at, organize and participate in academic and industry analytics conferences, and publish in Machine Learning, AI, Data Science, and Public Policy conferences and journals.
My current research is focused on the question “How do we build AI-Human collaborative systems for social and policy problems that can reliably help achieve fair and equitable outcomes?” and consists of three pillars:
1. Building AI systems designed to explicitly collaborate with humans to help them make better decisions (for social and policy problems)
How do we use HCI and design approaches to build these systems? How do we augment the typical “prediction scores” with additional information that helps human users make better decisions? How do we make AI systems interpretable and explainable to users to help them improve their decision-making? How do we elicit and incorporate human feedback?
2. Designing them to support fair and equitable social outcomes.
My work is focused on embedding fairness and equity in the entire process of scoping, formulating, designing, developing, validating, and deploying AI systems. This includes developing methods for eliciting fairness values from different stakeholders, auditing AI systems for bias and fairness, designing them to be fair, and reducing bias.
3. Designing these systems to be reliable, robust and resilient to changing environments and policies
How do we explicitly build these systems to match their use and deployment settings? What is the appropriate model selection and validation methodology to make them robust to changes over time? How do we make them resilient to gaming? How do we create transparency and trust for different stakeholders including those who will be impacted by the system?
The work in my group is across three types of activities:
- Collaborative Applied Projects: with governments, nonprofits, and industry to solve problems in policy and social good
- Training: students and professionals in governments and nonprofits in the use of data-driven methods (Machine Learning, AI, Data Science) for policy and social good
- Research, Tools, and Methodology Development: to develop new methods and tools that are needed across policy areas with special emphasis on increasing fairness, reducing bias, making machine learning/AI algorithms and models more understandable and transparent.
Our areas of interest span health, criminal justice, education, public safety, workforce development, sustainability, transportation, social services, and economic development and we work closely with government agencies (local, federal, international) and NGOs.
Historically, my research has ranged from general machine learning and data science to privacy-preserving data analysis, text analysis, semi-supervised learning, active learning, information retrieval, natural language processing, and knowledge management. Most of my work has focused on developing and using machine learning and data mining approaches to solve large-scale problems in corporate, political, and non-profit areas.
My current interests lie at the intersection of Machine Learning, Public Policy, and Social Sciences. I’m interested in solving large-scale, high-impact social problems using data-driven and evidence-based methods. Many government, civic, and non-profit organizations are realizing the value of better data and have been focusing on improving data collection and data standardization. My goal is to build on these efforts and work with these organizations to use this data to help improve outcomes in a fair and equitable manner. My work involves developing and using machine learning and social science methods that can be operationalized to solve policy and social challenges across health, criminal justice, education, public safety, social services, and economic development.
- Machine Learning methods specifically targeted at the needs of social good and public policy problems.
- Challenges in building, validating, and deploying Machine learning/AI based systems
- Building explainable and interpretable models for use in human-in-the-loop systems.
- Defining, detecting, reducing, and mitigating bias and increasing fairness to create AI systems that result in equitable outcomes.
Predictive Fairness to Reduce Misdemeanor Recidivism Through Social Service Interventions @ ACM FAT* 2020
An Experience-Centered Approach to Training Effective Data Scientists in the Big Data Journal
Predictive Analytics for Retention in Care in an Urban HIV Clinic in Nature Scientific Reports
Projects, Programs, Tools
Data Science for Social Good Summer Fellowship
Full-time summer program to train aspiring data scientists to work on machine learning, data science, and AI projects with social impact in a fair and equitable manner. Working closely with governments and nonprofits, fellows take on real-world problems in education, health, criminal justice, sustainability, public safety, workforce development, human services, transportation, economic development, international development, and more.
An open source machine learning toolkit to help data scientists, machine learning developers, and analysts quickly prototype, build and evaluate end-to-end predictive risk modeling systems for public policy and social good problems.
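The evaluation step of such a system can be illustrated with a small sketch. Because agencies can usually intervene on only a limited number of cases, a risk model is often judged by how many of its top-k ranked entities turn out to be true positives. The choice of precision at top k as the metric, the helper name `precision_at_k`, and the toy data are all assumptions for illustration; this is not the toolkit's API.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of the k highest-scored entities whose true label is 1.

    Hypothetical helper for illustration only; not part of any toolkit's API.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    # Rank entity indices from highest to lowest risk score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Count true positives among the k entities we could actually intervene on.
    return sum(labels[i] for i in ranked[:k]) / k


# Toy example (made-up data): 6 entities, capacity to intervene on 3.
scores = [0.91, 0.15, 0.78, 0.60, 0.05, 0.83]
labels = [1, 0, 0, 1, 0, 1]
print(precision_at_k(scores, labels, k=3))
```

In this toy data, two of the three highest-scored entities are true positives. Fixing k to the available intervention capacity ties the evaluation to how the model would actually be used, rather than to a threshold-free summary such as AUC.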
Aequitas: Bias and Fairness Audit Toolkit
An open source bias audit toolkit for machine learning developers, analysts, and policymakers to audit machine learning models for discrimination and bias, and make informed and equitable decisions around developing and deploying predictive risk-assessment tools.
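A group-level audit of this kind can be sketched in a few lines: compute an error metric separately for each group, then express each group's value as a ratio to a chosen reference group. The sketch below uses the false negative rate, on the assumption that the intervention is assistive, so missing someone who needs help is the harm of interest. The function names, the metric choice, and the toy data are hypothetical; this is not Aequitas's actual API.

```python
from collections import defaultdict
from typing import Dict, Sequence


def group_fnr(groups: Sequence[str], y_true: Sequence[int],
              y_pred: Sequence[int]) -> Dict[str, float]:
    """False negative rate per group: FN / (FN + TP) among truly positive cases.

    Hypothetical helper for illustration; not the Aequitas API.
    """
    positives: Dict[str, int] = defaultdict(int)  # label-1 cases per group
    missed: Dict[str, int] = defaultdict(int)     # label-1 cases predicted 0
    for g, t, p in zip(groups, y_true, y_pred):
        if t == 1:
            positives[g] += 1
            if p == 0:
                missed[g] += 1
    return {g: missed[g] / positives[g] for g in positives}


def disparity(metric_by_group: Dict[str, float], reference: str) -> Dict[str, float]:
    """Ratio of each group's metric to the reference group's metric."""
    ref = metric_by_group[reference]
    if ref == 0:
        raise ValueError("reference group metric is zero; ratio is undefined")
    return {g: v / ref for g, v in metric_by_group.items()}


# Toy example (made-up data) with two groups, "a" and "b".
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
y_true = [1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
fnr = group_fnr(groups, y_true, y_pred)
print(fnr)
print(disparity(fnr, reference="a"))
```

Here group "a" misses 1 of its 3 true positives and group "b" misses 2 of 3, so group "b" has twice the reference group's false negative rate. Which metric to audit and which group to treat as the reference are policy choices that should come from stakeholders, not from the code.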
Early Intervention Systems to Reduce Adverse Police Incidents (with Charlotte Mecklenburg Police Department)
Building a Machine Learning-based Early Intervention System that helps identify officers at risk of adverse incidents — such as sustained complaints, unjustified uses of force, and preventable accidents — in order to match them with appropriate early interventions.
I typically teach “Machine Learning for Public Policy Lab” and “Machine Learning in Practice” at Carnegie Mellon University, courses designed to give students training and experience in solving real-world problems using machine learning, with a focus on problems from public policy and social good. In 2022, I co-taught “Designing Better Human-AI Futures”, a first-year seminar course for the College of Humanities and Social Sciences.
I created and taught Machine Learning for Public Policy courses at the University of Chicago from 2015 to 2019, as well as the Machine Learning & Public Policy Lab in Winter 2017 and a Data Analytics for Campaigns class in Winter 2015.
Since 2015, I have co-created and taught (with Julia Lane and Frauke Kreuter) Applied Data Analytics for Policy courses for government agencies.
I frequently give talks, workshops, and trainings for organizations on the following topics:
- Ethics, Fairness, Bias, and Equity in AI and Machine Learning: Hands-on tutorial on Dealing with Bias and Fairness in ML Systems
- Fair and Equitable Decision-Making Using AI
- The Future of Machine Learning/AI/Data Science
- How to Scope Actionable Goal-Driven Data Science Projects
- The Value of Data-Driven Decision-Making for Governments and Non-Profits
Selected Publications (Full List)
Data Science Education and Training
Big Data and Social Science: A Practical Guide to Methods and Tools. Ian Foster, Rayid Ghani, Ron Jarmin, Frauke Kreuter, and Julia Lane. Chapman and Hall/CRC Press, 2016. (Second edition 2020)
An Experience-Centered Approach to Training Effective Data Scientists. Kit T Rodolfa, Adolfo De Unanue, Matt Gee, and Rayid Ghani. Big Data Journal. 2019.
Taking our Medicine: Standardizing Data Science Education With Practice at the Core. Kit Rodolfa and Rayid Ghani. Commentary, Harvard Data Science Review, 2021.
Change Through Data: A Data Analytics Training Program for Government Employees. Frauke Kreuter, Rayid Ghani, Julia Lane. Harvard Data Science Review, 1(2), 2019.
Machine Learning (Book Chapter). Rayid Ghani and Malte Schierholz. In Big Data and Social Science: A Practical Guide to Methods and Tools. Chapman and Hall/CRC Press, 2016.
Bias, Fairness, and Equity in AI/Machine Learning Systems
Empirical observation of negligible fairness–accuracy trade-offs in machine learning for public policy. Kit T. Rodolfa, Hemank Lamba, Rayid Ghani. Nature Machine Intelligence 3, 896–904 (2021).
Bias and Fairness (in Machine Learning) Book Chapter. Kit T. Rodolfa, Pedro Saleiro, Rayid Ghani. In Big Data and Social Science: A Practical Guide to Methods and Tools. Chapman and Hall/CRC Press, 2020.
An Empirical Comparison of Bias Reduction Methods on Real-World Problems in High-Stakes Policy Settings. Hemank Lamba, Kit T. Rodolfa, Rayid Ghani. ACM SIGKDD Explorations, 2021.
Predictive Fairness to Reduce Misdemeanor Recidivism Through Social Service Interventions. K. Rodolfa, E. Salomon, L. Haynes, I. Mendieta, J. Larson, R. Ghani. Proceedings of the ACM Conference on Fairness, Accountability, and Transparency (ACM FAT*) 2020.
Aequitas: A Bias and Fairness Audit Toolkit. Pedro Saleiro, Benedict Kuester, Loren Hinkson, Jesse London, Abby Stevens, Ari Anisfeld, Kit T. Rodolfa, Rayid Ghani.
Explainability and Human-AI Interaction
Explainable Machine Learning for Public Policy: Use Cases, Gaps, and Research Directions. Kasun Amarasinghe, Kit Rodolfa, Hemank Lamba, Rayid Ghani. Data and Policy (2023). Cambridge University Press.
Machine Learning Informed Decision-Making with Interpreted Model’s Outputs: A Field Intervention. Leid Zejnilovic, Susana Lavado, Carlos Soares, Íñigo Martínez De Rituerto De Troya, Andrew Bell, Rayid Ghani. Academy of Management Proceedings, 2021.
On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods. Kasun Amarasinghe, Kit T Rodolfa, Sérgio Jesus, Valerie Chen, Vladimir Balayan, Pedro Saleiro, Pedro Bizarro, Ameet Talwalkar, Rayid Ghani. arXiv.
Machine learning and artificial intelligence research for patient benefit: 20 critical questions on transparency, replicability, ethics, and effectiveness.
Artificial Intelligence for Social Good. Gregory D. Hager, Ann Drobnis, Fei Fang, Rayid Ghani, Amy Greenwald, Terah Lyons, David C. Parkes, Jason Schultz, Suchi Saria, Stephen F. Smith, and Milind Tambe. Computing Community Consortium. March 2017.
Case Studies: Applying Machine Learning/Data Science/AI to tackle Social and Policy Problems
Bandit Data-Driven Optimization. Zheyuan Ryan Shi, Zhiwei Steven Wu, Rayid Ghani, Fei Fang. AAAI-22: the 36th AAAI Conference on Artificial Intelligence, 2022.
A recommendation and risk classification system for connecting rough sleepers to essential outreach services. Harrison Wilde, Lucia L. Chen, Austin Nguyen, Zoe Kimpel, Joshua Sidgwick, Adolfo De Unanue, Davide Veronese, Bilal Mateen, Rayid Ghani, and Sebastian Vollmer. Data & Policy 3 (2021).
Validation of a Machine Learning Model to Predict Childhood Lead Poisoning. JAMA Netw Open. 2020;3(9):e2012734. doi:10.1001/jamanetworkopen.2020.12734
Predictive Analytics for Retention in Care in an Urban HIV Clinic. Arthi Ramachandran, Avishek Kumar, Hannes Koenig, Adolfo De Unanue, Christina Sung, Joe Walsh, John Schneider, Rayid Ghani, Jessica P. Ridgway. Nature Scientific Reports 10, 6421 (2020). https://doi.org/10.1038/s41598-020-62729-x
Using Machine Learning to Help Vulnerable Tenants in New York City. Teng Ye, Rebecca Johnson, Samantha Fu, Jerica Copeny, Bridgit Donnelly, Alex Freeman, Mirian Lima, Joe Walsh, and Rayid Ghani. Proceedings of the 2nd ACM SIGCAS Conference on Computing and Sustainable Societies (COMPASS ’19). ACM, New York, NY, USA, 248-258.
Deploying Machine Learning Models for Public Policy: A Framework. Klaus Ackermann, Joe Walsh, Adolfo De Unánue, Hareem Naveed, Andrea Navarrete Rivera, Sun-Joo Lee, Jason Bennett, Michael Defoe, Crystal Cody, Lauren Haynes and Rayid Ghani. 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2018).
Reducing Incarceration through Prioritized Interventions. Matthew J. Bauman, Kate Boxer, Tzu-Yun Lin, Erika Salomon, Hareem Naveed, Lauren Haynes, Joe Walsh, Jen Helsby, Steve Yoder, Robert Sullivan, Rayid Ghani. ACM SIGCAS Conference on Computing and Sustainable Societies, 2018.
Improving Government Response to Citizen Requests Online. Garren Gaut, Andrea Navarrete, Laila Wahedi, Paul van der Boor, Adolfo de Unánue, Jorge Díaz, Eduardo Clark, Rayid Ghani. ACM SIGCAS Conference on Computing and Sustainable Societies, 2018.
Using Machine Learning to Assess the Risk of and Prevent Water Main Breaks. Avishek Kumar, Syed Ali Asad Rizvi, Benjamin Brooks, Ali Vanderveld, Kevin Hayes Wilson, Chad Kenney, Adria Finch, Andrew Maxwell, Sam Edelstein, Joe Zuckerbraun and Rayid Ghani. 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2018).
Machine Learning for Social Services: A case study of prenatal case management in Illinois. Ian Pan, Laura B. Nolan, Rashida R. Brown, Romana Khan, Paul van der Boor, Daniel G. Harris, Rayid Ghani. American Journal of Public Health, 2017.
Early Intervention Systems – Predicting Adverse Interactions Between Police and the Public. Jennifer Helsby, Samuel Carton, Kenneth Joseph, Ayesha Mahmud, Youngsoo Park, Andrea Navarrete, Klaus Ackermann, Joe Walsh, Lauren Haynes, Crystal Cody, Major Estella Patterson, Rayid Ghani. Criminal Justice Policy Review, 2017.
Building Better Early Intervention Systems. Crystal Cody, Estella Patterson, Kerr Putney, Jennifer Helsby, Joe Walsh, Lauren Haynes, and Rayid Ghani. Police Chief Magazine. International Association of Chiefs of Police. 2016.
Detecting fraud, corruption, and collusion in international development contracts. Emily Grace, Ankit Rai, Elissa Redmiles, Rayid Ghani. 2016 IEEE International Conference on Big Data.
The Legislative Influence Detector: Finding Text Reuse in State Legislation. Burgess et al. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2016).
Identifying Police Officers at Risk of Adverse Events. Carton et al. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2016).
Designing Policy Recommendations to Reduce Home Abandonment in Mexico. Ackermann et al. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2016).
Identifying Earmarks in Congressional Bills. Khabsa et al. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2016).
A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes. Himabindu Lakkaraju, Everaldo Aguiar, Carl Shan, David Miller, Nasir Bhanpuri, Rayid Ghani, Kecia Addison. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2015).
Predictive Modeling for Public Health: Preventing Childhood Lead Poisoning. Eric Potash, Joe Brew, Alexander Loewi, Subhabrata Majumdar, Andrew Reece, Joe Walsh, Eric Rozier, Emile Jorgenson, Raed Mansour, Rayid Ghani. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2015).
Early Prediction of Code Blue Using Electronic Medical Records. Sriram Somanchi, Samrachana Adhikari, Allen Lin, Elena Eneva, and Rayid Ghani. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2015).
Who, When, and Why: A Machine Learning Approach to Prioritizing Students at Risk of not Graduating High School on Time. Everaldo Aguiar, Himabindu Lakkaraju, Nasir Bhanpuri, David Miller, Ben Yuhas, Kecia Addison, Shihching Liu, Marilyn Powell, and Rayid Ghani. 5th International Learning Analytics and Knowledge (LAK) Conference 2015.
Early Code Blue Prediction Using Patient Medical Records. Sriram Somanchi, Samrachana Adhikari, Allen Lin, Elena Eneva, and Rayid Ghani. Workshop on Machine Learning for Clinical Data Analysis and Healthcare, held with NIPS 2013.