You might have heard it in the past few months: some areas of the Internet are buzzing with discussions on “generative art”, that is, artwork generated by “AI” models that were fed an absurdly large number of images as training data. There are supporters, there are critics, and technology advances: amid all this, this post offers my humble experience with computer-generated imagery.
In this post, you’ll be guided by Yumiko, Satsuki, and Maya: the first two are characters created by someone I know (who wants to remain anonymous) that I then expanded, cooperating with their creator on a certain project a few years ago; the third is… well, a character with an interesting history, which will be explained later.
But first, an introduction for those unfamiliar with this world is in order.
What is this AI you speak of?
Unless you were living in hibernation in Alpha Centauri waiting for the right moment to start the invasion, you might have heard about the use of various methods of “machine learning” to have a computer program (to put it very simply) “learn” particular features from a data set (chess games, video games, images, sounds…) and use them to perform various tasks, such as playing games, generating text, tackling complex scientific problems, and many more things.
Scientific background or not, my mind is limited and I’d need someone more expert than me to provide a precise definition of these systems (feel free to do so in the comments!). In this space there is active scientific research, both by universities and by commercial companies, such as DeepMind (now owned by Google) and OpenAI (although they’re not really “open”…).
One important point, which is relevant for the background of all this, is that aside from the algorithms (the instructions required to “learn” and then “apply” the digested data), one of the key ingredients of this potentially palatable recipe is the model: you can view it as what the “AI” has learned after processing the data. While most of the algorithms are public (there are even research papers), models are often not, so only their creators can change, improve, or build upon them.
In January 2021, OpenAI announced DALL-E, a play on Pixar’s WALL-E and renowned artist Salvador Dalí:
an extension of their GPT-3 system (which generated text) that allowed the generation of images from natural language. This meant that the text “an apple on a wooden table” would produce (more or less) an image of an apple on a wooden table. To limit potential liability (they said “misuse” or similar terms, but it was mainly about liability), both sexual and violent images were removed from the (massive) data set used for training. However, despite the “Open” in the name, DALL-E was only available to OpenAI’s paying customers (like GPT-3). There was no way to alter, modify, or improve any of it unless OpenAI wanted to (and in part they did, with DALL-E 2).
Similar models were made by other companies, like Midjourney, but likewise, they kept everything to themselves. Some places started offering generation services (free, or at a price), but it looked like some sort of niche interest. Until summer 2022.
One event that changed everything
That summer, Patrick Esser (from RunwayML) and Robin Rombach (from the Machine Vision &amp; Learning research group at LMU Munich) released Stable Diffusion, another approach to image generation, trained on about 5 billion reference images. The big deal was that, although with some restrictions, the model was substantially more open than the others: it was actually available, and it allowed modification and reuse. That was when AI generation exploded.
Many, from researchers to particularly smart lay people, actually started working on improvements to the models, the algorithms, and everything that revolved around Stable Diffusion. Although Stable Diffusion aimed at all kinds of art, specialized models were created, for example, to draw anime-style art. In addition, NovelAI, a company providing a service to create stories using GPT-3, developed a custom (and high-quality) model to draw anime art. That model was also somehow leaked, which prompted further modifications (link in Chinese), although of questionable legality.
The rest of this post, in fact, deals with what I actually did with all this stuff.
Those who are familiar with me or my background know it already: art was never my forte. In high school, my lowest grades were in art and related subjects. I was never, ever able to draw more than stick figures. It wasn’t a big deal: whatever I lacked in art I made up for in other disciplines (like science).
So what has that to do with AI art? Well, there are a couple of reasons that might be worth telling.
Ideas, and lack of implementation
As I discussed with the creator of Yumiko and Satsuki several times, it would’ve been nice to actually see the characters “spring to life”, ever since they were conceived thanks to a totally random comment of mine that opened the lid on a train of thought and brought them into their current form.
Neither of us could do it. Yumiko’s creator is also an artistically challenged person, although less so than me. So we asked (and paid, of course) artists. Despite sometimes troublesome relationships (involving, in one case, a dispute on a payment site), it helped shape the characters and even prompted new ideas for them. One of the issues was that, often for cost or time reasons, we would cut some of the planned ideas for the images.
Example: one image arrived late and there were still some adjustments to make, but we glossed over them to avoid waiting even longer. Also, some of the artists didn’t trust their skill enough to draw certain types of characters (either women or men, for example). Nevertheless, quite an amount of art was produced over the past years (our wallets weren’t happy).
For these reasons, AI-generated imagery is a potentially useful way to prototype scenes, fine-tune them, and see whether they “fly” in the context of what they’re depicting (sometimes a good idea might have a bad implementation, for example). At that point, one can ask an artist to draw the final composition after gathering enough information.
Building upon a memory
The second reason involves memories, or rather, fading memories. Around twenty years ago, or perhaps a little more (in the 1999–2002 period), high-speed Internet had just arrived in my country, but I was stuck with a 28.8K modem (or an even more unreliable 56K one). Back then, I somehow found myself on Japanese sites, although I didn’t understand even a single word of what was written. One of these sites was an aggregator called Surfer’s Paradise (or “surpara”).
Clicking at random through the list of “newly added” sites led me to an artist’s page: I do not know whose it was. Even more clicking around, and I found myself in the rough sketch (らくがき) section. My attention was caught in particular by a rough drawing of a girl with long, light blue hair, wearing a futuristic armor (a mixture of black, white, and blue), with a shield, a blade that sprang out of a vambrace on the right wrist, and two long ribbon-like structures attached to the headgear. A scribbled text next to it read “i-Girl” (note: this is not the more famous image of an “i-Girl” related to the iMac). I liked the concept and fantasized a bit about it. That’s how “Maya” (at the time without a family name) was born.
However, I was never able to find the drawing again. I don’t even remember the web page, and the image was not in any of the many backups I made: I remember saving it, but it was likely lost. So for many years, “Maya” was just an idea that kept going back and forth in my head. At some point, two concepts took shape: a “civilian Maya” (the one you’ve seen so far) and a “battle Maya”, heavily inspired by 1980s anime like 超音戦士ボーグマン (Sonic Soldier Borgman), with the ability to transform and don some kind of powered suit (with a bracelet heavily borrowed from Borgman’s Baltector).
This is what allowed the character (and a few others around her) to “exist”: no longer as a vague (and fading) memory, but as something that can be seen. I wouldn’t have asked an artist to do this, as the details were too scant to work with. Generating images this way allowed me not only to give form to my memory, but also to start fleshing out more details (although it is unlikely to become anything: there’s already too much bad stuff out there).
What AI art gets near-perfectly
Being a system trained on existing art (and, for technical reasons, on mostly square, low-resolution images), it can get some things right and some things wrong. First, we’ll focus on the good points. Remember, however, that this is from the perspective of someone who has no artistic sense.
Certain models (like the leaked NovelAI one, or its evolution, the “Anything” model) get most anime-related stuff almost perfectly. The images of Satsuki and Yumiko you’ve seen here are extremely close to the character designs that were commissioned years ago. Of course the art style is different, but it’s really them. Even other characters that existed only as textual descriptions came out “just right”, like the fellow below (by the way, his name is Nindech: don’t make him angry, or he’ll eat you).
So, getting good results is definitely possible, for the most part. However, that doesn’t mean that they come out easily (more about this later).
What isn’t that great
Of course, like all software, this method has limitations, which become even more evident with anime-style art. One of the major issues is that the data fed to it was mostly rather low resolution (512×512), which means that a lot of fine detail is lost, and the algorithm has a hard time getting it right. Examples are fingers (you might have noticed this in the images in this post), but also things like weapons, which often end up barely attached to the characters’ hands (the suspicious “Anything v3.0” model gets them much better, to be honest).
Also, if there are not enough pixels to properly form the image, some details like faces are completely messed up. Photographic proof below.
The more acute observers might also have noticed an occasional lack of symmetry in clothing and backgrounds. Again, this is due in part to how the algorithm works, and in part to the low resolution of the input images. You can raise the resolution, of course, but that requires loads of video RAM on your GPU (around 16 GB or more).
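To give a rough idea of why resolution is so expensive: Stable Diffusion works on a latent grid downscaled 8× from the image (that factor is the standard Stable Diffusion VAE downscale), and its self-attention layers cost memory roughly proportional to the square of the number of latent positions. The quadratic scaling below is a deliberate simplification that ignores all the other layers; it's a back-of-the-envelope sketch, not a VRAM calculator.

```python
def latent_tokens(height: int, width: int, downscale: int = 8) -> int:
    """Number of latent-grid positions the U-Net attends over.

    The VAE downscales each dimension by 8, so a 512x512 image
    becomes a 64x64 latent grid (4096 positions).
    """
    return (height // downscale) * (width // downscale)


def attention_cost_ratio(h1: int, w1: int, h2: int, w2: int) -> float:
    """How much more memory self-attention needs at the second
    resolution, assuming cost ~ tokens**2 (a simplification)."""
    t1 = latent_tokens(h1, w1)
    t2 = latent_tokens(h2, w2)
    return (t2 ** 2) / (t1 ** 2)


print(latent_tokens(512, 512))                      # 4096
print(attention_cost_ratio(512, 512, 1024, 1024))   # 16.0
```

In other words, merely doubling the side length makes the attention layers roughly sixteen times heavier, which is why high-resolution generation eats VRAM so quickly.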
The AI also has no notion of what the elements in the scene actually are. As such, multiple characters in one scene often share the same traits (e.g., if one has glasses, everyone will have them), and this is even more evident when characters of different genders appear together.
In addition, most (but not all) anime-styled models have a major slant towards female characters. Getting male characters out that don’t look like eldritch abominations requires some work.
Is it really low-effort?
One of the major criticisms of AI-generated art (in general, not necessarily anime-styled), along with the “stealing” of art (we’ll discuss that later), is that it would, potentially, enable “low-effort” generation of art.
While it’s true that some (see DLsite.com, for example) are selling AI-generated art of abysmal quality (low resolution and not even checked), getting pictures right requires quite a bit of work. For each of the images in this post, I’ve gone through at least 30 iterations, adjusting the inputs, blacklisting some terms, and adjusting resolutions so they would look fine (only to, sometimes, redo everything due to a mistake). It takes more than an hour for complex subjects, and several hours for the most difficult ones.
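Stripped of the model itself, that iterate-and-blacklist loop is simple plumbing: join positive tags into a prompt, blacklisted terms into a negative prompt, and run one generation per seed. This is a hypothetical sketch; `generate` is a stub standing in for a real text-to-image call (such as a Stable Diffusion pipeline), and the tag names are made up.

```python
def build_prompts(tags, blacklist):
    """Join positive tags into a prompt and blacklisted terms into a
    negative prompt, the way many anime-style front ends expect them."""
    return ", ".join(tags), ", ".join(blacklist)


def iterate(tags, blacklist, seeds, generate):
    """Run one generation per seed, collecting (seed, image) pairs so
    the best candidates can be picked by eye afterwards."""
    prompt, negative = build_prompts(tags, blacklist)
    return [(seed, generate(prompt, negative, seed)) for seed in seeds]


# Stubbed "generation": returns a description instead of pixels.
def fake_generate(prompt, negative, seed):
    return f"image(prompt={prompt!r}, negative={negative!r}, seed={seed})"


results = iterate(
    ["1girl", "light blue hair", "futuristic armor"],   # hypothetical tags
    ["extra fingers", "lowres"],                        # blacklisted terms
    seeds=range(3),
    generate=fake_generate,
)
print(len(results))  # 3 candidates, one per seed
```

The real work, of course, is not the loop but looking at each candidate and deciding which tags to add, drop, or blacklist for the next round.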
There are tools to “help” the AI generate images correctly, which involve either slightly altering a provided image (as opposed to pure text input) or instructing it to touch only part of the image itself. These are considerably more complex to use. For one series of images, I think I spent about six hours in total (cumulative) to get them right.
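The second technique (touching only part of the image) is usually called inpainting: the user paints a mask, and only the masked region may be repainted. Reduced to its final compositing step, the idea is a masked blend; below is a toy version on integer “pixels” (real implementations blend in latent space and feather the mask edges):

```python
def composite(original, generated, mask):
    """Keep the original pixel wherever mask is 0, take the generated
    pixel wherever mask is 1 -- the core idea behind inpainting: the
    model is only allowed to "touch" the masked region."""
    return [
        [g if m else o for o, g, m in zip(row_o, row_g, row_m)]
        for row_o, row_g, row_m in zip(original, generated, mask)
    ]


original  = [[1, 1], [1, 1]]
generated = [[9, 9], [9, 9]]
mask      = [[0, 1], [1, 0]]   # 1 = region the model may repaint
print(composite(original, generated, mask))  # [[1, 9], [9, 1]]
```

This is why inpainting is so useful for fixing hands or faces: everything outside the mask is guaranteed to stay exactly as it was.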
Let’s get this clear: it’s not comparable to the effort one artist makes in creating a drawing, but it’s not at the level of “push button, receive cookie”. Absolutely not.
Will the AI overlords replace humans?
And this brings us to the main argument about AI-generated art, which applies even more to anime-styled art. Will the AI overlords make artists obsolete? Big question.
(The above is Yumiko. Don’t ask, long story.)
I’ve had quite a long debate with someone far more art-inclined than me on this topic. The point this person made is that, morally, if there’s no effort involved, it shouldn’t really be sold as “art”, although with a distinction between those who spread or sell generated images “as-is” and those who put effort into improving them. Of course, I’m summarizing a lot (this post is already very long as it is!), but that was the gist of our argument.
As I see it, the matter is undoubtedly complex and time will be definitely needed to sort most of the issues out.
There are several points that will need to be cleared up:
This is not “stealing”. At most it is plagiarism, or copyright infringement. They are not the same thing, as stealing implies physical deprivation of something, not an unauthorized copy or modification;
Whether AI art is a derivative work of the images it was trained on is likely a matter that will need courts to get involved, because at this point there is no clear way to call the generated art “original” or “derivative”, in particular when no preexisting subjects are involved;
There’s the whole matter of AI being used for things that people would find unpleasant, borderline legal, or outright revolting and creepy. Again, now that the models are out, while one can “censor” or “limit” them, other people may still be able to add whatever they want to them, so laws, or courts, may be needed if such limits are wanted.
Given the pace at which this field is evolving, I’m guessing it will take some time for everything to settle and for these issues to be discussed (if they are). Interesting (or troubling, depending on your point of view) times ahead.
So, what’s the conclusion of this somewhat long and rambling text (aside from “close browser tab”, of course)? I’m not an artist at all, and I admittedly enjoy the opportunity these methods give me to put form to concepts that, for lack of skill (and sometimes money), were left as random thoughts in moments of boredom. Likewise, they injected a second life (or rather a third, in the project’s development history) into old characters and stories: Yumiko and Satsuki’s creator actually enjoys those images quite a lot.
Like everything, it is not perfect, and there are problems of all kinds. It’s getting to the level of “good enough” for me, and I honestly look forward to any improvements the future may bring. And with a technology that is inherently somewhat disruptive, I guess the debates on AI-generated imagery won’t calm down any time soon, and some shift in perspective (good or bad) will have to occur.
So, this brings the journey narrated in this very long post to an end. The three real stars of it (certainly not me), Yumiko, Satsuki, and Maya, bid you farewell. Maybe they’ll pop here again with another post on AI generated art, maybe they won’t.