I am a Professor in the Machine Learning Department (in the School of Computer Science) and the Heinz College of Information Systems and Public Policy at Carnegie Mellon University and lead the Data Science and Public Policy Group as well as the Data Science for Social Good Program. I’m also the co-lead of the Responsible AI Initiative at CMU. I work on developing and using Machine Learning, AI, and Data Science methods for solving high impact social good and public policy problems with a deliberate focus on fairness and equity across human services, public health and healthcare, criminal justice, education, energy, transportation, economic development, workforce development, and public safety.

My work includes collaborative projects with government agencies and NGOs, research in areas including explainability, bias/fairness/equity, and developing and teaching experiential education and training programs.

I also started (at the University of Chicago) and run the Data Science for Social Good Summer Fellowship (now running at CMU).

AI Governance & Regulation: See my testimony to the United States Senate Committee on Homeland Security and Governmental Affairs Hearing on “Governing AI Through Acquisition and Procurement” and my congressional testimony to the Task Force on AI on ways to reduce AI bias in Financial Services to get an idea of my views in this space.

Areas of Interest and Experience: Machine Learning/Data Science/Artificial Intelligence, Public Policy, Social Good, Ethics, Fairness, Social Justice and Equity

Policy/Societal Areas: Human Services, Public Health and Healthcare, Criminal Justice, Housing, Economic Development, Workforce Development, Transportation, Environment and Sustainability

Recently (and reluctantly) added buzzwords: Artificial Intelligence, Data Science, and Big Data (not so recently)
Older buzzwords that are trendy now: Machine Learning
Not so old buzzwords that are not trendy now: Data Mining, Analytics

What I used to do:

  • Founder/Director of the Center for Data Science and Public Policy, Research Associate Professor in the Department of Computer Science, and a Senior Fellow at the Harris School of Public Policy at the University of Chicago.
  • Chief Scientist at Obama for America 2012 campaign focusing on analytics, technology, and data.
  • Senior Research Scientist and Director of Analytics research at Accenture Labs where I led a technology research team focused on applied R&D in analytics, machine learning, and data mining for large-scale & emerging business problems in various industries including healthcare, retail & CPG, manufacturing, intelligence, and financial services.

In my ample free time, I advise several analytics start-ups and non-profits, speak at, organize, and participate in academic and industry analytics conferences, and publish in Machine Learning, AI, Data Science, and Public Policy conferences and journals.

Recent Talks and Events

The Impact, Implications, and Opportunities of AI. Panel at NASPAA (Network of Schools of Public Policy, Affairs, and Administration) – October 2024
AI and Disinformation – Voter Protection Corps Webinar – September 2024
Brookings AI Policy Idea Incubator Convening – Panel on US Competitiveness and AI – May 2024
National Academies Panel on Navigating the AI Landscape: Strategies for State and Local Leaders – April 2024
SXSW Panel on Can AI Solve the Extreme Weather Pandemic? – March 2024

Research

I broadly work on developing and using Machine Learning, Data Science, and AI to help tackle social and policy problems as well as on Responsible AI efforts for governance and regulation. My current research is focused on the question “How do we build AI-Human collaborative systems for social and policy problems that can reliably help achieve fair and equitable outcomes?” and consists of three pillars:

1. Building AI systems designed to explicitly collaborate with humans to help them make better decisions (for social and policy problems)

How do we use HCI and design approaches to build these systems? How do we augment the typical “prediction scores” with additional information that helps human users make better decisions? How do we make AI systems interpretable and explainable to users to help them improve their decision-making? How do we elicit and incorporate human feedback?

2. Designing these systems to deliberately support fair and equitable social outcomes

My work is focused on embedding fairness and equity in the entire process of scoping, formulating, designing, developing, validating, and deploying AI systems. This includes developing methods for eliciting fairness values from different stakeholders, auditing AI systems for bias and fairness, designing them to be fair, and reducing bias. 

3. Designing these systems to be reliable, robust and resilient to changing environments and policies

How do we explicitly build these systems to match their use and deployment settings? What is the appropriate model selection and validation methodology to make them robust to changes over time? How do we make them resilient to gaming? How do we create transparency and trust for different stakeholders including those who will be impacted by the system?

The work in my group is across three types of activities:

  1. Collaborative Applied Projects: with governments, non-profits, and industry to solve problems in policy and social good
  2. Training: students and professionals in governments and non-profits in the use of data-driven methods (Machine Learning, AI, Data Science) for policy and social good
  3. Research, Tools, and Methodology Development: to develop new methods and tools that are needed across policy areas, with special emphasis on increasing fairness, reducing bias, and making machine learning/AI algorithms and models more understandable and transparent

Our areas of interest span health, criminal justice, education, public safety, workforce development, sustainability, transportation, social services, and economic development and we work closely with government agencies (local, federal, international) and NGOs.

Historically, my research has ranged from general machine learning and data science to privacy-preserving data analysis, text analysis, semi-supervised learning, active learning, information retrieval, Natural Language Processing, and knowledge management. Most of my work has focused on developing and using machine learning & data mining approaches to solve large-scale problems in corporate, political, and non-profit areas.

My current interests lie at the intersection of Machine Learning, Public Policy, and Social Sciences. I’m interested in solving large-scale and high-impact social problems using data-driven and evidence-based methods. A lot of government, civic, and non-profit organizations are realizing the value of better data and have been focusing on improving data collection and data standardization. My goal is to build on these efforts and work with these organizations to use this data to help improve outcomes in a fair and equitable manner. My work involves developing and using machine learning and social science methods that can be operationalized to solve policy and social challenges across health, criminal justice, education, public safety, social services, and economic development.

If you’re interested in working with me, I’m currently working on the following types of research problems:
  • Machine Learning methods specifically targeted at the needs of social good and public policy problems.
  • Challenges in building, validating, and deploying Machine Learning/AI-based systems.
  • Building explainable and interpretable models for use in human-in-the-loop systems.
  • Defining, detecting, and mitigating bias and increasing fairness to create AI systems that result in equitable outcomes.

Recent Publications

Aequitas Flow: Streamlining Fair ML Experimentation in Journal of Machine Learning Research 2024

Beyond Implicit Bias – from a National Academies of Sciences, Engineering, and Medicine Workshop on the Science of Implicit Bias

On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods in AAAI 2024

Preventing Eviction-Caused Homelessness through ML-Informed Distribution of Rental Assistance in AAAI 2024

Explainable machine learning for public policy: Use cases, gaps, and research directions in Data & Policy

Empirical observation of negligible fairness–accuracy trade-offs in machine learning for public policy in Nature Machine Intelligence

Projects, Programs, Tools

Data Science for Social Good Summer Fellowship

Full-time summer program to train aspiring data scientists to work on machine learning, data science, and AI projects with social impact in a fair and equitable manner. Working closely with governments and nonprofits, fellows take on real-world problems in education, health, criminal justice, sustainability, public safety, workforce development, human services, transportation, economic development, international development, and more.

Triage

An open source machine learning toolkit to help data scientists, machine learning developers, and analysts quickly prototype, build and evaluate end-to-end predictive risk modeling systems for public policy and social good problems.

Aequitas: Bias and Fairness Audit Toolkit

An open source bias audit toolkit for machine learning developers, analysts, and policymakers to audit machine learning models for discrimination and bias, and make informed and equitable decisions around developing and deploying predictive risk-assessment tools.

Supporting Proactive Behavioral Health Outreach Programs to Improve Mental and Behavioral Health Outcomes

In partnership with Johnson County, Kansas (JoCo) and Douglas County, Kansas (DoCo), our goal is to support caseworkers in addressing mental health needs in their respective counties. JoCo and DoCo caseworkers and service providers are critical to improving overall community well-being and preventing deaths of despair. However, identifying people in need of assistance is hampered [...]

Teaching

I typically teach “Machine Learning for Public Policy Lab” and “Machine Learning in Practice” at Carnegie Mellon University. Both courses are designed to give students training and experience in solving real-world problems using machine learning, with a focus on problems from public policy and social good. In 2022, I co-taught “Designing Better Human-AI Futures”, a first-year seminar course for the College of Humanities and Social Sciences.

I co-created (with Julia Lane and Frauke Kreuter) the Applied Data Analytics for Policy courses for government agencies and have taught them since 2015.

 

I frequently do talks, workshops, and trainings for organizations on the following topics:

  • Ethics, Fairness, Bias, and Equity in AI and Machine Learning: Hands-on tutorial on Dealing with Bias and Fairness in ML Systems  
  • Fair and Equitable Decision-Making using AI
  • The Future of Machine Learning/AI/Data Science
  • How to Scope Actionable Goal-Driven Data Science Projects
  • The Value of Data-Driven Decision-Making for Governments and Non-Profits

Selected Publications (Full List)

Congressional and Senate Testimonies

United States Senate Committee on Homeland Security and Governmental Affairs Hearing on “Governing AI Through Acquisition and Procurement” 

Congressional testimony to the Task Force on AI on ways to reduce AI bias in Financial Services 

Data Science Education and Training

Big Data and Social Science: A Practical Guide to Methods and Tools. Ian Foster, Rayid Ghani, Ron Jarmin, Frauke Kreuter, and Julia Lane. Chapman and Hall/CRC Press, 2016. (Second edition 2020)

An Experience-Centered Approach to Training Effective Data Scientists. Kit T Rodolfa, Adolfo De Unanue, Matt Gee, and Rayid Ghani. Big Data Journal. 2019.

Taking our Medicine: Standardizing Data Science Education With Practice at the Core. Kit Rodolfa and Rayid Ghani. Commentary, Harvard Data Science Review, 2021.

Change Through Data: A Data Analytics Training Program for Government Employees. Frauke Kreuter, Rayid Ghani, Julia Lane. Harvard Data Science Review, 1(2). 2019

Machine Learning (Book Chapter). Rayid Ghani and Malte Schierholz. In Big Data and Social Science: A Practical Guide to Methods and Tools. Chapman and Hall/CRC Press, 2016.

Bias, Fairness, and Equity in AI/Machine Learning Systems

Aequitas Flow: Streamlining Fair ML Experimentation. S Jesus, P Saleiro, BM Jorge, RP Ribeiro, J Gama, P Bizarro, R Ghani. Journal of Machine Learning Research. 25(354):1–7, 2024.

Toward Operationalizing Pipeline-aware ML Fairness: A Research Agenda for Developing Practical Guidelines and Tools. Emily Black, Rakshit Naidu, Rayid Ghani, Kit Rodolfa, Daniel Ho, Hoda Heidari. EAAMO ’23: Equity and Access in Algorithms, Mechanisms, and Optimization, Boston, MA, USA, October 2023.

Systematic analysis of the impact of label noise correction on ML Fairness. IO Silva, C Soares, I Sousa, R Ghani – Australasian Joint Conference on Artificial Intelligence, 2023

Empirical observation of negligible fairness–accuracy trade-offs in machine learning for public policy. Kit T. Rodolfa, Hemank Lamba, Rayid Ghani. Nature Machine Intelligence 3, 896–904 (2021).

An Empirical Comparison of Bias Reduction Methods on Real-World Problems in High-Stakes Policy Settings. Hemank Lamba, Kit T. Rodolfa, Rayid Ghani. ACM SIGKDD Explorations, 2021.

Bias and Fairness (in Machine Learning) Book Chapter. Kit T. Rodolfa, Pedro Saleiro, Rayid Ghani. In Big Data and Social Science: A Practical Guide to Methods and Tools. Chapman and Hall/CRC Press, 2020

Predictive Fairness to Reduce Misdemeanor Recidivism Through Social Service Interventions. K. Rodolfa; E. Salomon; L. Haynes; I. Mendieta; J. Larson; R. Ghani. Proceedings of the ACM Conference on Fairness, Accountability, and Transparency (ACM FAT*) 2020.

Aequitas: A Bias and Fairness Audit Toolkit. Pedro Saleiro, Benedict Kuester, Loren Hinkson, Jesse London, Abby Stevens, Ari Anisfeld, Kit T. Rodolfa, Rayid Ghani.

“Explainability” and Human-AI Interaction

On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods. Kasun Amarasinghe, Kit T Rodolfa, Sérgio Jesus, Valerie Chen, Vladimir Balayan, Pedro Saleiro, Pedro Bizarro, Ameet Talwalkar, Rayid Ghani. AAAI-24: the 38th AAAI Conference on Artificial Intelligence, 2024.

Explainable Machine Learning for Public Policy: Use Cases, Gaps, and Research Directions. Kasun Amarasinghe, Kit Rodolfa, Hemank Lamba, Rayid Ghani. Data and Policy (2023). Cambridge University Press.

Machine Learning Informed Decision-Making with Interpreted Model’s Outputs: A Field Intervention. Leid Zejnilovic, Susana Lavado, Carlos Soares, Íñigo Martínez De Rituerto De Troya, Andrew Bell, Rayid Ghani. Academy of Management Proceedings, 2021.

Overview Papers

Machine learning and artificial intelligence research for patient benefit: 20 critical questions on transparency, replicability, ethics, and effectiveness. Sebastian Vollmer, Bilal A. Mateen, Gergo Bohner, Franz J. Király, Rayid Ghani, Pall Jonsson, et al.

Artificial Intelligence for Social Good. Gregory D. Hager, Ann Drobnis, Fei Fang, Rayid Ghani, Amy Greenwald, Terah Lyons, David C. Parkes, Jason Schultz, Suchi Saria, Stephen F. Smith, and Milind Tambe. Computing Community Consortium. March 2017.

Case Studies: Applying Machine Learning/Data Science/AI to Tackle Social and Policy Problems

Preventing Eviction-Caused Homelessness through ML-Informed Distribution of Rental Assistance. Catalina Vajiac; Arun Frey; Joachim Baumann; Abigail Smith; Kasun Amarasinghe; Alice Lai; Kit T. Rodolfa; Rayid Ghani. AAAI-24: the 38th AAAI Conference on Artificial Intelligence, 2024.

Bandit Data-Driven Optimization. Zheyuan Ryan Shi, Zhiwei Steven Wu, Rayid Ghani, Fei Fang. AAAI-22: the 36th AAAI Conference on Artificial Intelligence, 2022.

A recommendation and risk classification system for connecting rough sleepers to essential outreach services. Harrison Wilde, Lucia L. Chen, Austin Nguyen, Zoe Kimpel, Joshua Sidgwick, Adolfo De Unanue, Davide Veronese, Bilal Mateen, Rayid Ghani, and Sebastian Vollmer. Data & Policy 3 (2021).

Validation of a Machine Learning Model to Predict Childhood Lead Poisoning. Eric Potash, Rayid Ghani, Joe Walsh, Emile Jorgensen, Cortland Lohff, Nik Prachand, Raed Mansour. JAMA Netw Open. 2020;3(9):e2012734. doi:10.1001/jamanetworkopen.2020.12734

Predictive Analytics for Retention in Care in an Urban HIV Clinic. Arthi Ramachandran, Avishek Kumar, Hannes Koenig, Adolfo De Unanue, Christina Sung, Joe Walsh, John Schneider, Rayid Ghani & Jessica P. Ridgway. Nature Scientific Reports 10, 6421 (2020). https://doi.org/10.1038/s41598-020-62729-x

Using Machine Learning to Help Vulnerable Tenants in New York City. Teng Ye, Rebecca Johnson, Samantha Fu, Jerica Copeny, Bridgit Donnelly, Alex Freeman, Mirian Lima, Joe Walsh, and Rayid Ghani. Proceedings of the 2nd ACM SIGCAS Conference on Computing and Sustainable Societies (COMPASS ’19). ACM, New York, NY, USA, 248-258.

Deploying Machine Learning Models for Public Policy: A Framework. Klaus Ackermann, Joe Walsh, Adolfo De Unánue, Hareem Naveed, Andrea Navarrete Rivera, Sun-Joo Lee, Jason Bennett, Michael Defoe, Crystal Cody, Lauren Haynes and Rayid Ghani. 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2018).

Reducing Incarceration through Prioritized Interventions. Matthew J. Bauman, Kate Boxer, Tzu-Yun Lin, Erika Salomon, Hareem Naveed, Lauren Haynes, Joe Walsh, Jen Helsby, Steve Yoder, Robert Sullivan, Rayid Ghani. ACM SIGCAS Conference on Computing and Sustainable Societies, 2018.

Improving Government Response to Citizen Requests Online. Garren Gaut, Andrea Navarrete, Laila Wahedi, Paul van der Boor, Adolfo de Unánue, Jorge Díaz, Eduardo Clark, Rayid Ghani. ACM SIGCAS Conference on Computing and Sustainable Societies, 2018.

Using Machine Learning to Assess the Risk of and Prevent Water Main Breaks. Avishek Kumar, Syed Ali Asad Rizvi, Benjamin Brooks, Ali Vanderveld, Kevin Hayes Wilson, Chad Kenney, Adria Finch, Andrew Maxwell, Sam Edelstein, Joe Zuckerbraun and Rayid Ghani. 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2018).

Machine Learning for Social Services: A case study of prenatal case management in Illinois. Ian Pan, Laura B. Nolan, Rashida R. Brown, Romana Khan, Paul van der Boor, Daniel G. Harris, Rayid Ghani. American Journal of Public Health, 2017.

Early Intervention Systems – Predicting Adverse Interactions Between Police and the Public. Jennifer Helsby, Samuel Carton, Kenneth Joseph, Ayesha Mahmud, Youngsoo Park, Andrea Navarrete, Klaus Ackermann, Joe Walsh, Lauren Haynes, Crystal Cody, Major Estella Patterson, Rayid Ghani. Criminal Justice Policy Review, 2017.

Building Better Early Intervention Systems. Crystal Cody, Estella Patterson, Kerr Putney, Jennifer Helsby, Joe Walsh, Lauren Haynes, and Rayid Ghani. Police Chief Magazine. International Association of Chiefs of Police. 2016

Detecting fraud, corruption, and collusion in international development contracts. Emily Grace, Ankit Rai, Elissa Redmiles, Rayid Ghani. 2016 IEEE International Conference on Big Data.

The Legislative Influence Detector: Finding Text Reuse in State Legislation. Burgess et al. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2016).

Identifying Police Officers at Risk of Adverse Events. Carton et al. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2016).

Designing Policy Recommendations to Reduce Home Abandonment in Mexico. Ackermann et al. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2016).

Identifying Earmarks in Congressional Bills. Khabsa et al. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2016).

A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes. Himabindu Lakkaraju, Everaldo Aguiar, Carl Shan, David Miller, Nasir Bhanpuri, Rayid Ghani, Kecia Addison. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2015).

Predictive Modeling for Public Health: Preventing Childhood Lead Poisoning. Eric Potash, Joe Brew, Alexander Loewi, Subhabrata Majumdar, Andrew Reece, Joe Walsh, Eric Rozier, Emile Jorgenson, Raed Mansour, Rayid Ghani. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2015).

Early Prediction of Code Blue Using Electronic Medical Records. Sriram Somanchi, Samrachana Adhikari, Allen Lin, Elena Eneva, and Rayid Ghani. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2015).

Who, When, and Why: A Machine Learning Approach to Prioritizing Students at Risk of not Graduating High School on Time. Everaldo Aguiar, Himabindu Lakkaraju, Nasir Bhanpuri, David Miller, Ben Yuhas, Kecia Addison, Shihching Liu, Marilyn Powell, and Rayid Ghani. 5th International Learning Analytics and Knowledge (LAK) Conference 2015.

Early Code Blue Prediction Using Patient Medical Records. Sriram Somanchi, Samrachana Adhikari, Allen Lin, Elena Eneva, and Rayid Ghani. Workshop on Machine Learning for Clinical Data Analysis and Healthcare – held with NIPS 2013.