Written by Melissa Heikkilä
Source: MIT Technology Review
A new tool lets artists add invisible changes to the pixels of their artwork before uploading it online; if the images are later scraped into an AI training set, they can cause the resulting generative model to break in chaotic and unpredictable ways.
The tool, called “Nightshade,” is designed to push back against AI companies that use artists’ work to train models without the creator’s permission. Using it to “poison” this training data could harm future iterations of image-generating models, such as DALL-E, Midjourney, and Stable Diffusion, scrambling some of their outputs—dogs into cats, cars into cows, and so on. The study has been submitted to the computer security conference Usenix for peer review.
AI companies such as OpenAI, Meta, Google, and Stability AI have faced a series of lawsuits from artists who claim their copyrighted material and personal information were taken without consent or compensation. Ben Zhao, a professor at the University of Chicago who led the team that built Nightshade, said he hopes the tool will serve as a powerful deterrent against companies that disregard artists’ copyright and intellectual property, helping to shift the balance of power from AI companies back to artists. Meta, Google, Stability AI, and OpenAI did not respond to MIT Technology Review’s requests for comment.
Zhao’s team has also developed Glaze, a tool that lets artists “mask” their personal style to prevent it from being appropriated by AI companies. It works much like Nightshade: by changing an image’s pixels in subtle ways invisible to the human eye, it manipulates machine-learning models into interpreting the image as something different from what it actually shows.
The team intends to integrate Nightshade into Glaze, and artists will be able to choose whether to use the data-poisoning tool. The team also plans to open-source Nightshade, meaning anyone can modify it and build their own version. The more people use it and build their own versions, Zhao says, the more powerful the tool becomes: the datasets behind large AI models can contain billions of images, and the more poisoned images that make their way in, the more damage the technique does.
Nightshade exploits a security vulnerability in generative AI models that stems from how they are trained: on vast amounts of data, in this case images scraped from the internet. Nightshade tampers with those images.
Artists who want to post their work online without having it scraped by AI companies can upload it to Glaze and choose to mask it with an art style different from their own. They can then also opt in to Nightshade. When AI developers scrape more data from the internet to fine-tune an existing model or build a new one, these poisoned samples make their way into the model’s dataset and cause it to malfunction.
For example, poisoned data samples can manipulate a model into learning that images of hats are cakes and images of handbags are toasters. The poisoned data is very difficult to remove, because it requires tech companies to painstakingly find and delete each corrupted sample.
The researchers tested the attack on Stable Diffusion’s latest models and on an AI model they trained from scratch. When they fed Stable Diffusion just 50 poisoned images of dogs and then prompted it to create images of dogs, the outputs started to look strange: creatures with too many limbs and cartoonish faces. With 300 poisoned samples, an attacker can manipulate Stable Diffusion into generating images of dogs that look like cats.
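As an illustration only (this is not Nightshade’s actual technique, which perturbs image pixels against real diffusion models), the core idea of the attack, that enough poisoned training samples can drag a model’s learned concept of “dog” toward “cat,” can be sketched with a toy stand-in “model” in Python. All names and numbers here are hypothetical:

```python
import numpy as np

# Toy 2-D "embeddings" standing in for image features (hypothetical numbers):
# clean dog images cluster near the origin, clean cat images near (1, 1).
clean_dogs = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
clean_cats = np.array([[1.0, 1.0], [0.9, 1.0], [1.0, 0.9]])

# Poisoned samples: images whose features have been nudged to look cat-like,
# but which carry the label "dog" in the scraped training data.
poisoned_dogs = np.tile([0.95, 0.95], (9, 1))

def learn_concept(samples):
    """A stand-in 'model': the learned concept is just the mean feature vector."""
    return samples.mean(axis=0)

dog_concept_clean = learn_concept(clean_dogs)
dog_concept_poisoned = learn_concept(np.vstack([clean_dogs, poisoned_dogs]))
cat_concept = learn_concept(clean_cats)

# After training on the poisoned set, the model's notion of "dog" sits closer
# to the cat cluster than to where the clean dog concept used to be.
drifted = (np.linalg.norm(dog_concept_poisoned - cat_concept)
           < np.linalg.norm(dog_concept_poisoned - dog_concept_clean))
print(drifted)  # True: a prompt for "dog" would now pull up cat-like features
```

The toy model averages features per label, so a minority of mislabeled cat-like samples is enough to drag the “dog” concept across feature space, loosely mirroring the 300-sample result described above.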
Generative AI models are good at making connections between words, which helps the poison spread. Nightshade infects not only the word “dog” but also all similar concepts, such as “puppy,” “husky,” and “wolf.” The attack also works on tangentially related images: if the model scrapes a poisoned image associated with the prompt “fantasy art,” the prompts “dragon” and “a castle in the Lord of the Rings” will likewise be manipulated into outputting something else.
Zhao acknowledges that people could abuse data-poisoning techniques for malicious ends. But he also notes that attackers would need thousands of poisoned samples to inflict real damage on larger, more powerful models, which are trained on billions of data samples.
“We don’t yet know of strong defenses against these attacks. We haven’t seen poisoning attacks on modern [machine learning] models in the wild yet, but it could be just a matter of time. The time to work on defenses is now,” said Vitaly Shmatikov, a professor at Cornell University who studies the security of AI models and was not involved in the study.
Gautam Kamath, an assistant professor at the University of Waterloo who studies data privacy and the robustness of AI models and was also not involved in the study, called the work “fantastic.”
According to Kamath, the study shows that vulnerabilities “don’t magically disappear with these new models, they actually only get worse,” and that “this is especially true when these models become more powerful and people trust them more and more, because the risk only increases over time.”
Junfeng Yang, a computer science professor at Columbia University who has studied the security of deep-learning systems and was not involved in the study, said Nightshade could have a huge impact if it makes AI companies more respectful of artists’ rights, for example by becoming more willing to pay out royalties.
AI companies that develop text-to-image models, such as Stability AI and OpenAI, have offered to let artists opt out of having their images used to train future versions of those models. But artists say that is not enough. Eva Toorenent, an illustrator and artist who has used Glaze, said opt-out policies require artists to jump through hoops while leaving tech companies with all the power.
Toorenent hopes Nightshade will change that.
“This would make [AI companies] think twice, because they could potentially take our work without our consent and destroy their entire model,” she said.
Another artist, Autumn Beverly, said tools like Nightshade and Glaze have given her the confidence to post her work online again. She had previously removed her work from the internet after discovering it had been scraped without her consent into the popular LAION image database.
“I’m really grateful that we have a tool that helps artists regain control of their work,” she says.