In June, Microsoft released a new artificial intelligence (AI) technology capable of generating its own computer code. The technology, called Copilot, speeds the work of professional programmers by suggesting ready-made blocks of computer code they can instantly add to their own. Copilot developed its skills by analyzing billions of lines of computer code posted to the internet.
Not every programmer welcomed the new technology.
Matthew Butterick, a programmer in Los Angeles, has filed a lawsuit seeking class-action status against Microsoft, GitHub, and OpenAI, the companies that designed and distributed Copilot. Butterick, 52, argues that what Copilot did amounts to piracy because the system does not acknowledge its debt to existing work. Butterick’s lawsuit claims that Microsoft and the other companies violated the legal rights of millions of programmers who spent years writing the original code that Copilot was trained on and from which the system pulls its recommendations.
To pull off its feats of prediction, Copilot draws on massive databases of open-source code (code that the public is free to inspect and modify), such as the code hosted on GitHub, which Microsoft acquired in 2018 for $7.5 billion. Microsoft distributes Copilot to programmers through GitHub. Most open-source code is released under a license, and many of those licenses require that the original creator be credited whenever the code is reused. Copilot, however, gives no attribution when it uses someone else's code to fill in lines for another user.
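The attribution at issue usually lives in a short license notice at the top of a source file. As a rough illustration of what "credited" means in practice, the sketch below checks whether a snippet carries such a notice; the notice patterns and sample snippets are hypothetical examples of common permissive-license phrasing, not taken from any real project or legal standard.

```python
import re

# Illustrative patterns for common license/attribution notices.
# These are examples only, not an exhaustive or authoritative list.
NOTICE_PATTERNS = [
    re.compile(r"MIT License", re.IGNORECASE),
    re.compile(r"Copyright \(c\) \d{4}", re.IGNORECASE),
    re.compile(r"Apache License", re.IGNORECASE),
]

def has_attribution(source: str) -> bool:
    """Return True if the snippet's opening lines contain a recognizable notice."""
    header = "\n".join(source.splitlines()[:10])  # notices usually sit at the top
    return any(p.search(header) for p in NOTICE_PATTERNS)

# Hypothetical snippets for demonstration.
snippet_with_notice = """\
# Copyright (c) 2018 Example Author
# Licensed under the MIT License.
def add(a, b):
    return a + b
"""

snippet_without_notice = """\
def add(a, b):
    return a + b
"""
```

When Copilot emits a block of code, it arrives looking like the second snippet: the functional lines survive, but the notice that the license requires does not travel with them.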
Even when the system does not set out to copy another creator's code, drawing from such a large database can reproduce recognizable lines verbatim. Tim Davis, a professor of computer science at Texas A&M University, took to Twitter after Copilot spat out large pieces of his copyrighted code with no attribution. “Not OK,” he commented.
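Claims like Davis's come down to showing that generated output matches existing code line for line. One simple way to demonstrate such overlap is to compare runs of consecutive lines between the generated output and the original; the sketch below uses that approach, with the window size, normalization, and sample snippets all being illustrative assumptions rather than any standard test.

```python
def line_ngrams(code: str, n: int = 3) -> set:
    """Collect every run of n consecutive non-blank, whitespace-stripped lines."""
    lines = [ln.strip() for ln in code.splitlines() if ln.strip()]
    return {tuple(lines[i:i + n]) for i in range(len(lines) - n + 1)}

def verbatim_overlap(generated: str, original: str, n: int = 3) -> bool:
    """True if any n consecutive lines of the output match the original exactly."""
    return bool(line_ngrams(generated, n) & line_ngrams(original, n))

# Hypothetical original and AI-suggested snippets for demonstration.
original = """\
def cs_multiply(a, b):
    check_inputs(a, b)
    result = sparse_dot(a, b)
    return result
"""

generated = """\
# suggested completion
def cs_multiply(a, b):
    check_inputs(a, b)
    result = sparse_dot(a, b)
    return result
"""
```

Here the suggestion reproduces a four-line run of the original exactly, which is the kind of verbatim match Davis posted screenshots of.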
“This whole arc that we’re seeing right now—this generative AI space—what does it mean for these new products to be sucking up the work of these creators?” said Butterick. His lawsuit is the first to tackle the legal gray area of AI-sourced code. Butterick says he supports coders learning from one another, but he believes it is unfair for an AI system to copy their code without credit.
The lawsuit could have an effect beyond auto-filled lines of code. Popular AI systems such as DALL-E, which creates artwork from text prompts, learn from thousands of copyrighted images to produce an “original” image; other tools generate blocks of text from parameters supplied by the user. In these cases it is technically possible for the AI to create a work nearly identical to one already made and copyrighted. Butterick’s lawsuit could decide whether AI systems must keep logs of copyright information for the material behind their products. According to The New York Times, most experts believe that training an AI system on copyrighted material is not necessarily illegal under existing law, but that it could be if the system produces material substantially similar to the data it was trained on.
GitHub initially declined to comment on the lawsuit to The New York Times, later emailing a statement saying the company has been “committed to innovating responsibly with Copilot from the start, and will continue to evolve the product to best serve developers across the globe.” The other defendants, Microsoft and OpenAI, declined to comment.