
Amsterdam’s Bold AI Experiment in Welfare Fraud Detection Ends in Controversy

Amsterdam's Smart Check AI system aimed to detect welfare fraud while ensuring fairness but failed to overcome bias issues, leading to the pilot’s termination and raising questions about ethical AI in social services.

Two Visions of AI in Welfare

Hans de Zwart, a former gym teacher turned digital rights advocate, was shocked when he learned about Amsterdam’s plan to use an algorithm called Smart Check to evaluate every welfare applicant for potential fraud. De Zwart, who had advised Amsterdam’s city government on AI ethics, saw fundamental, unfixable problems in applying such an algorithm to real people.

Meanwhile, Paul de Koning, a consultant managing the pilot phase of Smart Check, saw it as a promising tool to improve efficiency and reduce bias in Amsterdam’s social benefits system. His team had spent years developing the model, incorporating expert feedback, bias testing, and technical safeguards in an effort to create a fair and transparent system.

The Global Debate on Algorithmic Fairness

Smart Check’s story reflects a global debate about whether algorithms can fairly make life-impacting decisions. Past AI use cases in welfare and social services have often caused discriminatory outcomes, such as unfairly targeting nonwhite applicants or vulnerable groups. While proponents argue AI can improve public service efficiency and reduce fraud, many systems have been poorly designed or untested for bias, with limited recourse for affected individuals.

Amsterdam’s Ambitious but Flawed Approach

Amsterdam sought to build a responsible AI system that followed ethical guidelines and involved stakeholder consultation. Smart Check used an "explainable boosting machine" algorithm that weighed 15 non-demographic factors to assign each applicant a fraud risk score. The city disclosed multiple versions of the model and data for outside scrutiny, aiming for transparency.
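The city has not published its production code, but "explainable boosting machine" refers to a type of glassbox model available in the open-source interpret library. The sketch below is a minimal illustration of that technique only; the feature names and data are hypothetical, not the city's actual 15 factors.

```python
# Minimal sketch of training an explainable boosting machine (EBM) on
# non-demographic application features. Feature names and data are
# hypothetical; this is not Amsterdam's actual Smart Check model.
import pandas as pd
from interpret.glassbox import ExplainableBoostingClassifier

# Hypothetical training data: one row per past investigation,
# label = 1 if the investigation found a problem with the application.
X = pd.DataFrame({
    "had_prior_benefits": [0, 1, 1, 0, 1, 0],
    "months_since_last_application": [3, 24, 1, 12, 6, 36],
    "reported_other_income": [0, 0, 1, 0, 1, 0],
})
y = [0, 1, 0, 0, 1, 0]

ebm = ExplainableBoostingClassifier(random_state=0)
ebm.fit(X, y)

# EBMs are additive, so each feature's contribution to a risk score can
# be inspected directly -- the property behind the "explainable" label.
scores = ebm.predict_proba(X)[:, 1]
print(scores)
```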

However, the model was trained on historical investigation data, which carried forward past biases. Despite rigorous auditing, defining fairness mathematically proved challenging. The city aimed for an equal rate of wrongful investigations across demographic groups, but bias persisted.
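An equal rate of wrongful investigations across groups is commonly operationalized as false-positive-rate parity: among applicants who did nothing wrong, each group should be flagged at the same rate. A minimal sketch of that check, assuming hypothetical group labels and outcomes:

```python
# Minimal sketch of a false-positive-rate parity check: of the people
# who did NOT commit fraud, what fraction does the model flag in each
# group? Group labels and outcomes below are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "group":   ["A", "A", "A", "B", "B", "B"],
    "flagged": [1, 0, 0, 1, 1, 0],   # model says "investigate"
    "fraud":   [0, 0, 0, 0, 1, 0],   # ground truth from investigation
})

innocent = df[df["fraud"] == 0]
fpr_by_group = innocent.groupby("group")["flagged"].mean()
print(fpr_by_group)  # equal values across groups would satisfy this definition
```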

Opposition and Concerns

The Participation Council, representing welfare recipients and advocates, opposed Smart Check from the start, fearing its impact on vulnerable citizens’ rights. Their feedback led to some changes, but they ultimately called for discontinuing the project, citing disproportionate impact given the low fraud rate.

Pilot Results and Bias Challenges

When tested live, Smart Check flagged more applicants for investigation than human caseworkers did, and it was no more accurate. Bias shifted unpredictably, at times wrongly flagging Dutch nationals and women. Reweighting the training data reduced some bias but could not eliminate complex, intersecting biases.
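One common form of reweighting, shown below as an assumption about how such a step might look rather than a description of the city's actual procedure, gives each (group, label) combination a weight that makes group membership and the fraud label statistically independent in the weighted training data:

```python
# Minimal sketch of reweighting training examples so that group
# membership and the fraud label are independent in the weighted data
# (the idea behind "reweighing" in fairness toolkits). Data are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "A", "A", "A", "B", "B"],
    "fraud": [1, 1, 1, 0, 1, 0],
})

n = len(df)
p_group = df["group"].value_counts() / n
p_label = df["fraud"].value_counts() / n
p_joint = df.groupby(["group", "fraud"]).size() / n

# weight = P(group) * P(label) / P(group, label)
weights = df.apply(
    lambda r: p_group[r["group"]] * p_label[r["fraud"]]
              / p_joint[(r["group"], r["fraud"])],
    axis=1,
)
print(weights)  # pass as sample_weight when refitting the model
```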

Project Termination and Reflections

In late 2023, the responsible city official halted the pilot, citing the difficulty of justifying its biased outcomes. The city reverted to its analog process, which carries biases of its own. Some experts argue AI systems are held to higher standards than human workers, while others see ethical AI as an evolving field requiring trial and error.

The Broader Implications

The Smart Check experiment highlights deep challenges in defining and achieving fairness in AI systems affecting social welfare. It underscores the need for political and ethical deliberation beyond technical fixes, and the importance of involving affected communities in decisions. As AI increasingly influences public services, governments worldwide face urgent questions about responsible deployment and potential unintended harms.


This article was produced in partnership between MIT Technology Review, Lighthouse Reports, and Trouw, supported by the Pulitzer Center.
