Python 'Chardet' Package Replaced With LLM-Generated Clone, Re-Licensed
6 mars 2026 à 18:00
Ancient Slashdot reader ewhac writes: The maintainers of the Python package `chardet`, which attempts to automatically detect the character encoding of a string, announced the release of version 7 this week, claiming a speedup factor of 43x over version 6. In the release notes, the maintainers claim that version 7 is, "a ground-up, MIT-licensed rewrite of chardet." Problem: The putative "ground-up rewrite" is actually the result of running the existing copyrighted codebase and test suite through the Claude LLM. In so doing, the maintainers claim that v7 now represents a unique work of authorship, and therefore may be offered under a new license. Version 6 and earlier was licensed under the GNU Lesser General Public License (LGPL). Version 7 claims to be available under the MIT license.
The maintainers appear to be claiming that, under the Oracle v. Google decision, which found that cloning public APIs is fair use, their v7 is a fair use re-implementation of the `chardet` public API. However, there is no evidence to suggest their re-write was under "clean room" conditions, which traditionally has shielded cloners from infringement suits. Further, the copyrightability of LLM output has yet to be settled. Recent court decisions seem to favor the view that LLM output is not copyrightable, as the output is not primarily the result of human creative expression -- the endeavor copyright is intended to protect. Spirited discussion has ensued in issue #327 on `chardet`s GitHub repo, raising the question: Can copyrighted source code be laundered through an LLM and come out the other end as a fresh work of authorship, eligible for a new copyright, copyright holder, and license terms? If this is found to be so, it would allow malicious interests to completely strip-mine the Open Source commons, and then sell it back to the users without the community seeing a single dime.
Read more of this story at Slashdot.