AI art has rapidly become the hot new medium, as recent advances in AI art engines have made experimenting with AI more accessible to the general public than ever before. With such widespread usage, the question of how AI will (or should) handle attribution, ownership, and copyright is becoming increasingly urgent.
Proper attribution for artists was already an issue in the online art community, and the way generative AI training sets are built could quickly worsen the problem. Making sure artists get the credit and compensation they’re due is one of CO2ign’s tenets, so of course, this is a topic we’re keeping a close eye on.
The question we’re most interested in is: how can AI art exist alongside existing art in a way that doesn’t take advantage of human artists? AI-assisted art already looks like it’s here to stay, and there are many ways it could benefit artists and others – but we can also see the pitfalls. Getting to a place where AI art engines are both functional and ethical will take a lot of intentional, thoughtful effort on the part of model creators, users, and the art world.
Stable Diffusion and how open source AI works
To understand why artists are concerned about credit, we must understand how AI learns to create art. So, how do AI art generators work?
All AI models must be “trained” using an example set of images. A human gives the AI model input, known as a training set – such as a batch of 500 photographs labeled as apples. To the model, all 500 images are the answer to the question, “what does an apple look like?” The AI uses this set of training images to learn what features make up an “apple” and builds a mathematical model around it. Then, when prompted to produce an apple, the AI runs its calculations and outputs a new “apple.” The larger the training set, the more variation is possible for the output. And once the AI is trained (has a model of how to produce answers), it no longer needs the training set.
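The “learn, then discard” step is the crux, so here’s a toy sketch to make it concrete. This is nothing like a real diffusion model – the “model” here is just a few summary statistics – but it shows how generation can proceed without the training examples, which is why attribution is so hard to recover afterward.

```python
import random

def train(training_set):
    # Each example is a tiny feature vector, e.g. (redness, roundness) of an apple.
    # A real model learns billions of weights; here the "model" is just the
    # per-feature mean and range learned from the examples.
    n = len(training_set)
    dims = len(training_set[0])
    means = [sum(ex[d] for ex in training_set) / n for d in range(dims)]
    spread = [max(ex[d] for ex in training_set) - min(ex[d] for ex in training_set)
              for d in range(dims)]
    return means, spread

def generate(model):
    # Sample a new "apple" near what was learned. No training image is
    # consulted (or even retained) at generation time.
    means, spread = model
    return [m + random.uniform(-s / 2, s / 2) for m, s in zip(means, spread)]

apples = [(0.9, 0.8), (0.85, 0.9), (0.95, 0.85)]  # hypothetical feature vectors
model = train(apples)
del apples  # the training set can be thrown away; the model still generates
print(generate(model))
```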
Stable Diffusion is a set of AI models created by the company Stability.ai and recently released as open source for anyone to experiment with. The released Stable Diffusion models don’t include any attribution to the artists whose work was scraped for the training set, and because the model discards the training set once trained, none of the original images remain. While that might sound like a good thing, it means outputs can’t be traced back to the original images or their artists. Concerned parties have had to build third-party tools so artists can tell if their work was included in the training set.
A large part of the reason Stable Diffusion works so well is that it uses LAION-5B, a training set of 5 billion images. These images were “scraped” via crawlers from across the web, including from popular online gallery sites such as ArtStation, Pinterest, Imgur, and DeviantArt. It’s worth noting, however, that the creators of Stable Diffusion’s training set intended the dataset to be used only for research purposes. They “do not recommend using it for creating ready-to-go industrial products”; that is, they didn’t intend for the dataset to be used to create commercial, saleable products that replace hiring human artists.
Companies Launching AI Art Projects
Two of the companies to recently announce AI-generator projects using Stable Diffusion are DeviantArt, an online art gallery, and Celsys, the makers of Clip Studio Paint. Both experienced immediate, passionate pushback from their core user base (artists) – and not without reason. Artists have already experienced a negative impact on their careers thanks to AI art generators. For many, the companies going all in on AI need to do more to address artists’ concerns before plunging ahead with the emerging medium. (Update: as of Friday morning, 12/2, Celsys has backed off from this project due to community feedback.)
By using Stable Diffusion models for their products, companies like DeviantArt are trying to jump on the bandwagon of providing generative AI tools to drive users and profit without having to take responsibility for issues in the model or the processes used to create it. We believe all responsible participants in the space should be aware of the problems and work to combat them.
What are the issues with AI art?
Many of the issues with AI art already exist with art in general, especially online, but the scale and the “black box” nature of AI models hugely amplify them.
Theft and Copyright Infringement
Straight-up copying (obvious copyright infringement)
Contrary to some popular conceptions, AI isn’t just copy-pasting from existing images for a sort of “collage.” As mentioned above, the original training images aren’t available to the model when it’s run post-training.
However, all machine learning models, especially ones trained on small datasets, can be prone to “memorization” – learning the attributes of one particular work rather than generalizing – and can therefore produce something identical or near-identical to an item in their training data. That means there’s a possibility of very close copying – and no easy way for the end user (or even the platform using the model, like DeviantArt or Clip Studio) to tell when it’s happening. However, ignorance is not an excuse: if copying can happen, companies should ensure they have the rights to the works being copied.
Gray-area copying (less clear-cut legally, but still dubious ethically)
It’s relatively common to request that AI generate art in a specific artist’s style by name. Even when it’s not, AI-generated work can look close to existing work without strictly copying.
Even when humans create art, the spectrum of what is considered “copying” – vs. homage, inspiration, or reference – is ill-defined. Copying the works of the masters, or trying to reproduce a style element of your favorite artist, are common and valid methods of learning. Sometimes reproducing another artist’s work is the art – it can be a gray area. However, even in gray areas, artists often name their reference or inspiration, and using others’ creative efforts for money or acclaim without doing so is, at best, in bad taste. AI should follow the same standard.
Bias in Training Sets
The prompts given (input) to an AI are simple, and the images output are complex. Any details not specified are filled in by what the AI has learned, and what it has learned is subject to biases in the training set. These include racial bias and sexism – such as producing images of women and POC in response to prompts for “homemaker” or “janitor” – and users have also noticed an obvious slant toward European art. There are likely more issues that aren’t immediately obvious, which is another reason the rapid expansion of AI concerns many, from artists to sociologists. Of course, human art also reflects the biases of the artist – but AI bias is much harder to inspect.
Volume and Curation
Whether AI-generated art directly competes with human-generated art on quality may be a matter of perspective, but it can certainly beat out humans on speed and quantity. On a platform like ArtStation or DeviantArt, there’s a natural curation that comes from the fact that humans can only create and upload so many pieces in a given timespan. Since AI can generate a near-infinite number of pieces, there’s a new problem of curation and discovery – letting a human reasonably browse to the images they want to see without human-generated work drowning in a sea of AI content.
CO2ign’s Perspective on Generative AI Art
With the above in mind, here are the things we think should happen for all AI models provided for public use. We acknowledge that some of this will decrease the quality of the results – but we believe better results are not a justification to exploit artists.
- AI models should only be trained on public domain work or where the artist has given consent for the work to be included in training sets.
- AI models should only allow artists’ names as input with the artist’s consent.
- Artwork should have proper identification and attribution: AI work should clearly identify when the artist used an AI model, what model they used, the prompt entered, and any other human input from the artist (such as editing the piece after receiving the AI model output).
- Art hosting platforms should have an established perspective on whether they allow AI-generated work. Where AI work is permitted, there should be rate limits, filters, and other controls to prevent it from drowning out the work of other artists.
There have been some attempts in these areas – Stable Diffusion 2.0 has seemingly reduced the ability to prompt for arbitrary artist names. The Shutterstock + OpenAI partnership only allows AI art uploads from their generator, which is opt-in and pays royalties to the artists whose work is in the training set. And though they overshadowed it by launching a Stable Diffusion generator, DeviantArt took steps here by implementing an opt-out directive, allowing artists to opt out of their works being used in prompts. They also clearly label AI art as such.
Solutions for AI Art
Of course, making the above the standard for AI model usage will require a coordinated, long-term effort from all parties in the industry. Here are a few ideas on what could be done – both short-term and long-term – that would allow these tools to be useful without infringing on artists.
Short-term Solutions
First, we’ll cover some things that could be implemented relatively quickly, which would go a long way toward reassuring artists and keeping this issue from spiraling out of control.
AI model creators
Make good-faith efforts to avoid stealing from artists. Some ideas:
- Remove “portfolio” sites like ArtStation or DeviantArt and unsourced image collection sites like Pinterest or Imgur from all future training sets (even experimental ones).
- Develop opt-in training sets or models trained exclusively on public domain data for commercial use.
- Publicly attribute training sets so that artists are credited, if indirectly, when their work is included in a training set.
- Block using artists’ names as features in the model.
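As a rough illustration of the first idea above, a crawler building a training set could filter its queue against a blocklist of portfolio and unsourced-collection domains. The domain list and function here are hypothetical – just a sketch of the filtering step:

```python
from urllib.parse import urlparse

# Hypothetical blocklist built from the kinds of sites named above.
EXCLUDED_DOMAINS = {"artstation.com", "deviantart.com", "pinterest.com", "imgur.com"}

def allowed_for_training(url):
    host = urlparse(url).hostname or ""
    # Match the domain itself and any subdomain (e.g. www.artstation.com).
    return not any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)

crawl_queue = [
    "https://www.artstation.com/artwork/xyz",
    "https://example.org/public-domain/apple.jpg",
]
print([u for u in crawl_queue if allowed_for_training(u)])
# -> ['https://example.org/public-domain/apple.jpg']
```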
Art hosting platforms
- Allow artists to specify usage rights for their art and include those rights in the image’s metadata. For example, creative commons license information can be included in DC and XMP metadata.
- Notably, DeviantArt’s noai directive is a good step here. Essentially, it’s a feature that tells web crawlers (the bots responsible for scraping art to add to training sets) that they can’t use a particular webpage or image. Unfortunately, that instruction isn’t included in the image itself. Instead, it’s in the HTML or HTTP headers of each image’s display page, so the opt-out gets lost if the image is downloaded, re-uploaded, or even served from a different location on DeviantArt. Putting an AI opt-out in the image metadata would be more robust, as it would stay with the image no matter where it goes.
- Create policies that specify how much AI generators can be used in works uploaded. These policies would vary by platform but could be anything from not allowing usage of generative art tools whatsoever, only allowing AI images where the user provides their own art as input or substantially transforms the image after it’s generated, or even simply only allowing a limited number of images per user.
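To show what honoring a page-level opt-out like noai involves, here’s a sketch of the check a well-behaved crawler might run before scraping a page’s images. The exact header and token names (`X-Robots-Tag`, `noai`, `noimageai`) are our assumptions about the convention:

```python
import re

NOAI_TOKENS = {"noai", "noimageai"}

def opted_out(headers, html):
    """Return True if the page signals an AI-training opt-out."""
    # HTTP header form, e.g.  X-Robots-Tag: noai, noimageai
    # (a real crawler would need case-insensitive header lookup)
    tag = headers.get("X-Robots-Tag", "")
    if {t.strip().lower() for t in tag.split(",")} & NOAI_TOKENS:
        return True
    # <meta name="robots" content="noai"> form in the page HTML.
    for m in re.finditer(r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
                         html, re.IGNORECASE):
        if {t.strip().lower() for t in m.group(1).split(",")} & NOAI_TOKENS:
            return True
    return False

print(opted_out({"X-Robots-Tag": "noai, noimageai"}, ""))  # -> True
```

Because the signal lives in the page rather than the image file, it disappears once the image is downloaded and re-hosted – which is exactly why a metadata-level opt-out would be more robust.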
All other image-hosting platforms
Read the metadata of images upon upload to check for usage rights information per above, and retain and display that information along with the image. Google Images already does this for licensing information.
Don’t host generators using, or share art generated from, models which don’t take the above steps.
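As a sketch of what reading usage rights at upload time could look like: XMP metadata travels inside the image file as an XML packet, so even a crude byte scan can find it. The field names below (`xmpRights:WebStatement`, `cc:license`) are common conventions rather than guarantees, and a production system should use a proper XMP parsing library instead of regular expressions:

```python
import re

def extract_license(image_bytes):
    """Crudely pull a license URL from an embedded XMP packet, if any."""
    # XMP packets are delimited by <x:xmpmeta> ... </x:xmpmeta> in the raw bytes.
    m = re.search(rb"<x:xmpmeta.*?</x:xmpmeta>", image_bytes, re.DOTALL)
    if not m:
        return None
    packet = m.group(0)
    # Check attribute-form fields commonly used for license info.
    for field in (rb"xmpRights:WebStatement", rb"cc:license"):
        hit = re.search(field + rb'="([^"]*)"', packet)
        if hit:
            return hit.group(1).decode("utf-8", "replace")
    return None
```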
Long-term Solutions
Here are some ideas that will take considerable time and thought to achieve, or that depend on the short-term plans above.
(Once hosting platforms have executed the short-term goals above) When creating new training sets, read image metadata for licensing information and only use those which permit model training.
Licenses should be created that explicitly allow or prohibit use for model training. For example, a new Creative Commons CC-NT element that prohibits use for model training could be written. (Because all CC licenses require attribution even for derivative works, it’s arguable whether generating AI art based on copyrighted work would be allowed under the current CC licenses anyway – but it depends on how you define “derivative” and “attribution.”)
Copyright law should be updated to clarify fair use and derivative works, including algorithmically-generated works.
Art creation programs/platforms
Include information about the work’s provenance – information about the creator and what they used to create the image – in the metadata. This can include AI generation as well as human editing.
All other image-hosting platforms
(Once art creation programs have included provenance in their metadata) Read image metadata and display information on the provenance of images when viewed.
None of this is easy to solve – but it’s not impossible. The speed at which AI is being adopted and embraced by non-artists (not unreasonably – AI is fun to play with!) means these are problems that need to be considered now. The field is moving quickly, with new developments coming what feels like weekly. Fortunately, that also means none of the issues are set in stone yet.
AI art models need to be implemented with thought and care and with the input of the artists that AI will impact, but we believe AI could be a powerful and exciting tool for everyone.