Does Generative AI Threaten the Open Source Ecosystem?
26 octobre 2025 à 20:34
"Snippets of proprietary or copyleft reciprocal code can enter AI-generated outputs, contaminating codebases with material that developers can't realistically audit or license properly."
That's the warning from Sean O'Brien, who founded the Yale Privacy Lab at Yale Law School. ZDNet reports:
Open software has always counted on its code being regularly replenished. As part of the process of using it, users modify it to improve it. They add features and help to guarantee usability across generations of technology. At the same time, users improve security and patch holes that might put everyone at risk. But O'Brien says, "When generative AI systems ingest thousands of FOSS projects and regurgitate fragments without any provenance, the cycle of reciprocity collapses. The generated snippet appears originless, stripped of its license, author, and context." This means the developer downstream can't meaningfully comply with reciprocal licensing terms because the output cuts the human link between coder and code. Even if an engineer suspects that a block of AI-generated code originated under an open source license, there's no feasible way to identify the source project. The training data has been abstracted into billions of statistical weights, the legal equivalent of a black hole.
The result is what O'Brien calls "license amnesia." He says, "Code floats free of its social contract and developers can't give back because they don't know where to send their contributions...."
"Once AI training sets subsume the collective work of decades of open collaboration, the global commons idea, substantiated into repos and code all over the world, risks becoming a nonrenewable resource, mined and never replenished," says O'Brien. "The damage isn't limited to legal uncertainty. If FOSS projects can't rely upon the energy and labor of contributors to help them fix and improve their code, let alone patch security issues, fundamentally important components of the software the world relies upon are at risk."
O'Brien says, "The commons was never just about free code. It was about freedom to build together." That freedom, and the critical infrastructure that underlies almost all of modern society, is at risk because attribution, ownership, and reciprocity are blurred when AIs siphon up everything on the Internet and launder it (the analogy of money laundering is apt), so that all that code's provenance is obscured.
Read more of this story at Slashdot.