My notes and other stuff

2023/10/08

Paper: AI Art and its Impact on Artists

This week, I wanted to cover a collective paper written by a group of researchers and artists collaborating (Harry H. Jiang, Lauren Brown, Jessica Cheng, Mehtab Khan, Abhishek Gupta, Deja Workman, Alex Hanna, Johnathan Flowers, Timnit Gebru), titled AI Art and its Impact on Artists.

The paper starts with an overview of how image generation works at a high level: first Convolutional Neural Networks (CNNs) doing image recognition, then variational autoencoders (VAEs), which use mirrored neural networks to enable some of the first generative models (such as VQ-VAE-2), followed by generative adversarial networks (GANs), where a generator network tries to fool a discriminator network that evaluates how realistic an image is. This latter type of tech eventually got augmented with an ability to consider tags describing the data, and was used for images as large as 512x512. Natural Language Processing (NLP) allowed increasing the complexity of texts and generated images, and the inclusion of Large Language Models (LLMs) led to natural language prompts.
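The adversarial setup can be sketched with the standard GAN objective. This is a toy illustration with hand-picked discriminator scores (not anything from the paper): the discriminator is penalized for scoring real images low or fakes high, while the generator is rewarded when fakes get scored as real.

```python
import numpy as np

def d_loss(d_real, d_fake):
    # Discriminator wants real scores near 1 and fake scores near 0.
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def g_loss(d_fake):
    # Generator wants the discriminator fooled (fake scores near 1).
    return -np.mean(np.log(d_fake))

# A confident, correct discriminator achieves a low loss...
good = d_loss(np.array([0.9, 0.95]), np.array([0.1, 0.05]))
# ...while a fooled one (fakes scored as real) suffers a high one.
fooled = d_loss(np.array([0.9, 0.95]), np.array([0.9, 0.95]))
```

The "competition" the paper describes is exactly this tension: improving `g_loss` (fooling the discriminator) is what raises `d_loss`, so training alternates between the two until the generated outputs become hard to tell apart from the training data.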

Eventually (in the last few years), diffusion models, inspired by non-equilibrium thermodynamics (they progressively apply noise to an image and then learn to de-noise the result), led to models not constrained by natural language understanding. This lands us close to where we are with Stable Diffusion, DALL-E, Midjourney, and others. Models of these types are trained on large image datasets such as JFT-300M or LAION (which has sub-variants), which contain hundreds of millions to billions of image-text pairs. In total, the paper lists roughly 20 commercial products using various datasets.
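The "apply noise, then de-noise" half of that description can be made concrete with the forward (noising) process, which has a well-known closed form. This is a minimal numpy sketch under assumed toy values (an 8x8 random "image" and a made-up linear noise schedule); real models learn the reverse, de-noising direction with a large neural network, which is omitted here.

```python
import numpy as np

def forward_noise(x0, t, betas, rng):
    """Forward diffusion step: blend a clean image x0 with Gaussian noise.

    Uses the closed form x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps,
    where abar_t is the cumulative product of (1 - beta) up to step t.
    """
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps

# Toy demo: a tiny "image" and a linear noise schedule over 100 steps.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))       # stand-in for an 8x8 grayscale image
betas = np.linspace(1e-4, 0.2, 100)    # hypothetical noise schedule

x_early, _ = forward_noise(x0, 5, betas, rng)   # mostly signal remains
x_late, _ = forward_noise(x0, 99, betas, rng)   # essentially pure noise
```

Training then amounts to showing the network `x_t` and asking it to predict the noise `eps` that was added, so that at generation time it can start from pure noise and iteratively remove it.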

The authors point out that there's a tendency to anthropomorphize image generators, talking about them like they're artists, even going as far as saying they are "inspired" by the data in their training set. The authors disagree, and present us with some philosophy of art to support their point, defining art as a uniquely human endeavor connected specifically to human culture and experience:

[W]hile non-human entities can have aesthetic experiences and express affect, a work of art is a cultural product that uses the resources of a culture to embody that experience in a form that all who stand before it can see. [...] Further, this process must be controlled by a sensitivity to the attitude of the perceiver insofar as the product is intended to be enjoyed by an audience. [...] This control over the process of production is what marks the unique contribution of humanity: while art is grounded in the very activities of living, it is the human recognition of cause and effect that transforms activities once performed under organic pressures into activities done for the sake of eliciting some response from a viewer.

As an example, they mention a robin singing or a peacock dancing under organic pressures, whereas human song and dance serve purposes beyond organic ones, including cultural ones and communication. Image generators, however, do not have that understanding of the perspective of the audience, and do not undergo a similar artistic process. Instead, they imitate whichever parts of the process are embodied in the works within the training set. Works from image generators may be aesthetic, but not artistic: truly artistic works generally must also be aesthetic, but aesthetics alone is mostly a matter of technique, which isn't sufficient to be truly artistic.

This plays out in how image generators can give good results, but to do so they require extensive training to be shown what the "right" output should be, whereas humans do not require such criteria. This makes image generators great at copying style, but, the authors say, it is very rare for artists to be able to copy each other's styles:

The very few artists who are able to do this copying are known for this skill. An artists’ ‘personal style’ is like their handwriting, authentic to them, and they develop this style (their personal voice and unique visual language) over years and through their lived experiences.

The adoption of any particular style of art, personal or otherwise, is a result of the ways in which the individual is in transaction with their cultural environment such that they take up the customs, beliefs, meanings, and habits, including those habits of aesthetic production, supplied by the larger culture. As philosopher John Dewey argues, an artistic style is developed through interaction with a cultural environment rather than bare mimicry or extrapolation from direct examples supplied by a data set.

In short, the development of an artist's style comes from repeated interactions with their environment and culture, and there's a cycle of influence and impact shaping it. It is unique to each of them and does not come in isolation, but from active participation and growth in a way that is constantly evolving. By comparison, image generators, once trained, stop changing until they are explicitly trained again, either from scratch or through fine-tuning. The abstract interpretations and sentimental imagery are missing, the paper argues.

image generators are not artists: they require human aims and purposes to direct their “production” or “reproduction,” and it is these human aims and purposes that shape the directions to which their outputs are produced. However, many people describe image generators as if these artifacts themselves are artists, which devalues artists’ works, robs them of credit and compensation, and ascribes accountability to the image generators rather than holding the entities that create them accountable.

This is why we need to be really careful about the words we choose to describe image generators. Anthropomorphisation shifts accountability and credit away from the stakeholders who produce and train these systems, and away from the artists whose output is used to train them, and onto the automation itself.

Impact on Artists

The paper at this point shifts to covering the impact of AI art on artists, under many lenses:

  1. Economic loss
  2. Digital artwork forgery
  3. Hegemonic views and stereotyping
  4. Effects on cultural production and consumption

For economic loss, the argument is that an artist's style is formed over years of honing their craft through practice, observation, schooling, and spending on materials (books, supplies, tutorials). Their output is then used without compensation by companies like Stability AI, companies backed by billions from venture capitalists, which then compete with them directly in the market. Folks like Sam Altman of OpenAI specifically call out the expectation that creatives' jobs will be replaced; Stability AI CEO Emad Mostaque has accused artists of wanting a “monopoly on visual communications” and “skill segregation”. The paper retorts:

To the contrary, current image generation business models like those of Midjourney, Open AI and Stability AI, stand to centralize power in the hands of a few corporations located in Western nations, while disenfranchising artists around the world.

The behavior observed is that image generators can output content much faster and cheaper, but without nearly as much depth of expression as a human. They allow flooding the market with "acceptable" imagery that supplants demand for artists. The paper then covers multiple examples of this already happening in the TV, movie, and gaming industries.

While this hurts fully employed artists, they point out that self-employed artists are also likely to suffer. They give the example of the Clarkesworld science fiction magazine, which got flooded with so many AI-generated sci-fi submissions that it had to stop accepting submissions altogether, and eventually re-opened them while only accepting work from previously published authors. The net impact, they say, is that rather than democratizing art, the number of artists who can share their work and receive recognition is reduced.

Many artists already have to use image generators in order to keep their jobs, and report their role slowly shifting to "clean up work, with no agency for creative decisions". Basically, if they want to keep working, they have to make the output of image generators good enough, which reinforces the pattern that de-skills their work. Actual artwork allowing full creative control is increasingly likely to only be affordable to people who are already independently wealthy, stalling the development of artists from other backgrounds.

In terms of digital artwork forgery, the lack of consent and attribution is also problematic. Copyrighted images and photographs are used to train image generators, which can often produce near-exact replicas. While artists have increasing trouble living from their art, some companies directly market their ability to replicate their styles. Often, people who ask for the images to be generated attach the artist's name to them (because it's their style), and the artist's reputation slowly gets tied to images they would never have agreed to produce.

In some cases, these images are used in harsher situations such as harassment, hate speech, or genocide denial. This existed before image generators, but is far faster and easier now. Artist Sarah Andersen states:

"Through the bombardment of my social media with these images, the alt-right created a shadow version of me, a version that advocated neo-Nazi ideology... I received outraged messages and had to contact my publisher to make my stance against this ultraclear.” She underscores how this issue is exacerbated by the advent of image generators, writing "The notion that someone could type my name into a generator and produce an image in my style immediately disturbed me... I felt violated”

Since an artist's style is a product of their own growth and history, this becomes far more personal than people realize.

Moving on to hegemonic views and stereotyping, the authors report that underrepresented groups, those more used to being invisible, can attest to seeing a distortion of themselves in the output of image generators, often warping reality based on stereotypes:

For instance, [Senegalese artist Linda Dounia Rebeiz] notes that the images generated by Dall-E 2 pertaining to her hometown Dakar were wildly inaccurate, depicting ruins and desert instead of a growing coastal city.

(As a personal note, I saw an article just yesterday on how challenging it is to ask for generated images of black doctors helping white children, and it similarly reflects how dominant views and media shape the output.)

The objectification of some cultures goes further: "synthetic models" are generated and licensed to organizations, and the benefits go to the people who generate the images rather than to the people from the cultures on which they are based. Once again, this brings back the question of where credit, attribution, and accountability end up being distributed.

This is where chilling effects on cultural production and consumption come into play. Since many artists already struggle to make ends meet and job prospects are rapidly worsening, students are dissuaded from honing their crafts, and both new and current artists are more reluctant to share their work in order to protect themselves from mass scraping. This causes tension, because they often build their audience and visibility by sharing content on social media, crowdfunding platforms, and trade shows, but are now incentivised against doing so to protect themselves from the unethical practices of corporations profiting from their work:

Artists’ reluctance to share their work and teach others also reduces the ability of prospective artists to learn from experienced ones, limiting the creativity of humans as a whole. Similar to the feedback loop created by next generations of large language models trained on the outputs of previous ones, if we, as humanity, rely solely on AI-generated works to provide us with the media we consume, the words we read, the art we see, we would be heading towards an ouroboros where nothing new is truly created, a stale perpetuation of the past.

What the authors are warning against is a potential feedback loop by which art stops progressing and becomes stale.

AI Art, US copyright law, and Ethics

The paper uses words such as unethical when describing image generators, and this section mostly gives weight to that element. Currently, it isn't exactly clear whether the way image models are trained represents copyright infringement. Class action lawsuits are kicking off, and the scale in play here, in terms of the number of artists involved, is somewhat unprecedented.

What the authors assert here is that these unanswered legal questions about whether copyright applies are used by the companies producing image generators to operate without accountability, so long as they aren't being sued for specific violations of existing copyright law. Since courts take time to work, economic and social harms to artists are allowed to go on.

In terms of authorship, for example, the generated images are not copyrightable under US law, although the prompts used might be copyrightable if they are independently creative. This makes the iterative workflow, where an artist keeps transforming generated output, hard to pin down copyright-wise. The way artists interact with the tools may end up defining the status, and given the uncertainty here, the authors call for more caution.

One of the major arguments used by the producers of image generators is the concept of fair use:

Fair use is a doctrine in copyright law that permits the unauthorized or unlicensed use of copyrighted works; whether it is to make copies, to distribute, or to create derivative works. Whether something constitutes fair use is determined on a case-by-case basis and the analysis is structured around four factors.

Of the four factors, the two most relevant here are the purpose and character of the use (including whether it is transformative) and the effect of the use upon the market for the original work.

So while arguments can often be made that image generators produce transformative work, they also often copy the style of smaller independent artists who can't necessarily afford to fight legal battles over copyright (unlike Getty Images). Because image generators frequently end up threatening the market for the original creator, the fair use argument may fall apart. This is without counting moral rights, which protect reputational interests.

The authors call out "data laundering" practices: datasets are assembled by universities and research labs that are funded by private corporations, and because the collection happens under an academic banner, the corporations behind them practically end up bypassing copyright claims for commercial uses. Investors and corporations are free to do whatever they want with limited responsibility. This is more or less a direct call-out from the authors to the ML and AI communities to figure out their ethics and take responsibility to protect people.

Most existing or suggested mechanisms to protect artists (e.g. watermarking) either don't work, or put the responsibility on artists to prove harm before any action is taken. The paper calls for better accountability from the entities who create image generators in the first place, rather than from the artists. The authors advocate for legislation that prevents training models without artists' consent, for funding AI research that isn't tangled with corporate interests, and for evaluating and scoping work based on how it can serve specific communities. This, however, would require shifting ML researchers' point of view to be aware of their relationship to power, rather than assuming their technology is neutral and that usage isn't their responsibility.

The authors conclude:

Image generators can still be a medium of artistic expression when their training data is not created from artists’ unpaid labor, their proliferation is not meant to supplant humans, and when the speed of content creation is not what is prioritized. [...] If we orient the goal of image generation tools to enhance human creativity rather than attempt to supplant it, we can have works of art [...] that explore its use as a new medium, and not those that appropriate artists’ work without their consent or compensation.

So there you have it. The stance isn't necessarily against image generators as such, but far more about how they are built and what impact that has.