Most of the discussions about the impact of the latest generative AI systems on copyright have centred on text, images and video. That’s no surprise, since writers, artists and film-makers feel very strongly about their creations, and members of the public can relate easily to the issues that AI raises for this kind of creativity. But there’s another creative domain that has been massively affected by genAI: software engineering. More and more professional coders are using generative AI to write major elements of their projects for them. Some top engineers even claim that they have stopped coding completely, and now act more as a manager for the AI generation of code, because the available tools are now so powerful. This applies in the world of open source software too. But a recent incident shows that it raises some interesting copyright issues there that are likely to affect the entire software world.
It concerns a project called chardet, “a universal character encoding detector for Python. It analyzes byte strings and returns the detected encoding, confidence score, and language.” A long and detailed post on Ars Technica explains what has happened recently:
> The [chardet] repository was originally written by coder Mark Pilgrim in 2006 and released under an LGPL license that placed strict limits on how it could be reused and redistributed.
>
> Dan Blanchard took over maintenance of the repository in 2012 but waded into some controversy with the release of version 7.0 of chardet last week. Blanchard described that overhaul as “a ground-up, MIT-licensed rewrite” of the entire library built with the help of Claude Code to be “much faster and more accurate” than what came before.
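To make concrete what a library like chardet actually does, here is a toy sketch of an encoding detector in plain Python. This is emphatically not chardet’s algorithm – the real library uses statistical models trained on text in many encodings – but it illustrates the shape of the result described above: an encoding name, a confidence score and a language field.

```python
def detect(data: bytes) -> dict:
    # Toy illustration only: try a few candidate encodings in order and
    # report the first one that decodes cleanly, with a naive hard-coded
    # confidence. chardet's real detection is statistical, not trial-based.
    for encoding, confidence in (("ascii", 1.0), ("utf-8", 0.99), ("latin-1", 0.5)):
        try:
            data.decode(encoding)
            return {"encoding": encoding, "confidence": confidence, "language": ""}
        except UnicodeDecodeError:
            continue
    return {"encoding": None, "confidence": 0.0, "language": ""}

print(detect("café".encode("utf-8")))
# → {'encoding': 'utf-8', 'confidence': 0.99, 'language': ''}
```

The real chardet exposes essentially this interface (`chardet.detect(byte_string)` returning such a dictionary), which is why a functionally identical rewrite is straightforward to specify.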
Licensing lies at the heart of open source. When Richard Stallman invented the concept of free software, he did so using a new kind of software licence, the GPL. This allows anyone to use and modify software released under the GPL, provided that any modified versions they distribute are released under the same licence. As the above description makes clear, chardet was originally released under the LGPL – one of the GPL variants – but version 7.0 is licensed under the much more permissive MIT licence. According to Ars Technica:
> Blanchard says he was able to accomplish this “AI clean room” process by first specifying an architecture in a design document and writing out some requirements to Claude Code. After that, Blanchard “started in an empty repository with no access to the old source tree and explicitly instructed Claude not to base anything on LGPL/GPL-licensed code.”
That is, generative AI would appear to allow open source licences like the GPL to be circumvented by rewriting the code without copying anything directly from the original. That’s possible because AI is now so good at coding that the results can be better than the original, as Blanchard proved with version 7.0 of chardet. And because it is new code, it can be released under any licence. In fact, it is quite possible that code produced by genAI is not covered by copyright at all, for the same reason that artistic output created solely by AI can’t be copyrighted. If the licence can be changed or simply cancelled in this way, then there is no way to force people to release their own variants only under the GPL, as Stallman intended. Similarly, the incentive for people to contribute their own improvements to the main version is diminished.
The ramifications extend even further. These kinds of “AI clean room” implementations could be used to make new versions of any proprietary software. That’s been possible for decades – Stallman’s 1983 GNU project is itself a clean-room version of Unix – but generally requires many skilled coders working for long periods to achieve. The arrival of highly capable genAI coding tools has brought down the cost by many orders of magnitude, which means it is relatively inexpensive and quick to produce new versions of any software.
In effect, generative AI coding systems make copyright irrelevant for software, both open source and proprietary. That’s because what is important about computer code is not the details of how it is written, but what it does. AI systems can be guided to create drop-in replacements for other software that are functionally identical, but with completely different code underneath.
Companies that license their proprietary software will probably still be able to do so by offering support packages plus the promise that they take legal responsibility for their code in a way that AI-generated alternatives don’t: businesses would pay for a promise of reliability plus the ability to sue someone when things go wrong. But for the open source world these are not relevant. As a result, the latest progress in AI coding seems a serious threat to the underlying development model that has worked well for the last 40 years, and which underpins most software in use today. But a wise post by Salvatore “antirez” Sanfilippo sees opportunities too:
> AI can unlock a lot of good things in the field of open source software. Many passionate individuals write open source because they hate their day job, and want to make something they love, or they write open source because they want to be part of something bigger than economic interests. A lot of open source software is either written in the free time, or with severe constraints on the amount of people that are allocated for the project, or – even worse – with limiting conditions imposed by the companies paying for the developments. Now that code is every day less important than ideas, open source can be strongly accelerated by AI. The four hours allocated over the weekend will bring 10x the fruits, in the right hands (AI coding is not for everybody, as good coding and design is not for everybody).
Perhaps a new kind of open source will emerge – Open Source 2.0 – one in which people do not contribute their software patches to a project, as they do today, but instead send the prompts that produce better versions. People might start working directly on the prompts, collaborating on ways to fine-tune them. It’s open source hacking, but functioning at a level above the code itself.
One possibility is that such an approach could go some way to solving the so-called “Nebraska problem”: the fact that key parts of modern digital infrastructure are underpinned by “a project some random person in Nebraska has been thanklessly maintaining since 2003”. That person may not receive many more thanks than they have in the past, but with AI assistants constantly checking, rewriting and improving the code, at least the selfless dedication to their project becomes a little less onerous, and thus a little less likely to lead to programmer burnout.
Featured image by Lynn Greyling.
_Follow me @glynmoody on Mastodon and on Bluesky._
https://walledculture.org/why-genai-means-the-end-of-copyright-for-software-and-the-re-invention-of-open-source/