Hi there, do you have any info you can share on your process? What do you use for these generations and how do you use them? What do you mean when you talk about touch-ups that aren't done in editing programs like GIMP? How do you know for a fact that the generations aren't causing environmental damage and the data sets aren't stolen? I'm super super super interested in AI image generation, but I want it to be ethical and controlled like you're saying you're doing. You're honestly my dream AI user haha. So I'd love a guide <3
Oh, sure!! So, basically, these days a lot of data sets don't include stolen data anymore, because it leads to worse outcomes: that data usually isn't controlled at selection or groomed afterwards, so training relies on sheer volume instead of quality. The big mainstream models still use it, as do plenty of style imitators, etc., and I don't want to pretend this isn't still a problem. But you can find a lot of models that are trained on really tiny data sets of volunteer, public domain, and licensed work.
Those are the ones I use, and I find the easiest way to find them is Civitai's model search with keywords like "cc0" and "public domain." I recommend Civitai for two reasons: 1) it has preview images that give you a sense of each model's visual style, and 2) the models there are what's called "open weight."
Open weight models are the ones you can download and run on your own hardware at home. That is how I can guarantee no environmental damage beyond what is already done to sustain me as a USAmerican citizen: I own the hardware itself.
It's an entry-to-mid-range gaming desktop with a 600 W PSU. I know when it runs, and how much electricity (very little) and water (none) it uses every time it runs.
In fact, that's also how I know that most high-intensity video games are more resource intensive than the AI models I use. Most of those games won't run at all on a rig this cheap, and the ones that do run like shit and spike my power draw way higher anyway. Same for 3D modelling software. The resource usage of the AI is a lot more comparable to 2D software like video editors.
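If you want to put your own numbers on "very little electricity," the arithmetic is just average power draw times running time. Here's a minimal Python sketch. The wattage, seconds per generation, and electricity price are hypothetical placeholders, not measurements from the rig described above, and a 600 W PSU rating is a ceiling rather than what the computer actually pulls (a plug-in power meter will tell you that).

```python
# Rough energy estimate for local image generation.
# ASSUMPTION: every number in the example call is a hypothetical placeholder.
# A PSU's rating (e.g. 600 W) is its maximum, not the actual draw, so
# measure real draw with a plug-in power meter if you can.

def energy_kwh(avg_watts: float, seconds: float) -> float:
    """Convert an average power draw over a duration into kilowatt-hours."""
    return avg_watts * seconds / 3600 / 1000


def session_cost(avg_watts: float, seconds_per_gen: float, generations: int,
                 price_per_kwh: float) -> tuple[float, float]:
    """Return (total kWh, total cost) for a batch of generations."""
    kwh = energy_kwh(avg_watts, seconds_per_gen * generations)
    return kwh, kwh * price_per_kwh


# Hypothetical: 300 W average draw, 10 s per generation,
# 3,000 generations, electricity at 0.15 per kWh.
kwh, cost = session_cost(300, 10, 3000, 0.15)
print(f"{kwh:.2f} kWh, cost {cost:.2f}")
```

Swap in your own measured draw and timings; the structure of the calculation stays the same.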
When I started this blog, I didn't actually know the terms for what I meant by "touch ups," but I can give you more detail now, haha!
I meant 3 things: in-painting, posing, and upscaling.
In-painting is when you tell the AI model "I only want the prompt I am telling you now to apply to this specific area in the image." That way, you can make very fine, granular changes to things like how fingers are held, how hair is moved, how eyes are shaped, etc.
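Tools differ in the details, but a common final step in in-painting is the same idea: keep the original pixels outside the mask and take the newly generated pixels inside it. The toy sketch below shows only that compositing step on tiny grayscale "images" stored as nested lists. It does not run a diffusion model, and `composite_inpaint` is a hypothetical helper, not an InvokeAI function.

```python
# Toy illustration of the masking idea behind in-painting.
# ASSUMPTION: images are tiny grayscale grids (lists of lists of numbers) and
# the mask holds values from 0.0 (keep original) to 1.0 (use new pixels).
# Real tools work on full-colour images and also feed the mask to the model.

Grid = list[list[float]]


def composite_inpaint(original: Grid, generated: Grid, mask: Grid) -> Grid:
    """Blend generated pixels into the original only where the mask is set."""
    result = []
    for orig_row, gen_row, mask_row in zip(original, generated, mask):
        row = []
        for o, g, m in zip(orig_row, gen_row, mask_row):
            # m = 0 keeps the original pixel, m = 1 takes the new one,
            # and values in between give a soft (feathered) edge.
            row.append(o * (1 - m) + g * m)
        result.append(row)
    return result


original = [[10, 10, 10],
            [10, 10, 10]]
generated = [[99, 99, 99],
             [99, 99, 99]]
# Only the middle column is fully repainted; the right column is half-blended.
mask = [[0.0, 1.0, 0.5],
        [0.0, 1.0, 0.5]]

print(composite_inpaint(original, generated, mask))
```

The soft mask values are why in-painted fixes can blend into the surrounding image instead of leaving a hard seam.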
Posing is when you give the AI access to a reference image with a pose you want (I also only use my own images or public domain and freely licensed ones for this; Pexels is my favourite source), and tell it how to extract the pose. It can then limit the images it generates to that pose!
Upscaling is what it sounds like: it makes a small image bigger. It also, however, analyzes the image both in small parts and as a whole (which means it takes way longer). With the right tuning and prompts, you can then get it to add a bunch of additional details, like texturing and lighting effects, that are "coherent" with the rest of the image!
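One small piece of posing that's easy to show in code is making an extracted pose fit a canvas of a different size. The sketch assumes a pose extractor (for example an OpenPose-style preprocessor) has already turned the reference photo into named (x, y) keypoints; the keypoint names, photo size, and `fit_pose` helper are all hypothetical, and this is not how any particular tool does it internally.

```python
# Sketch: fitting extracted pose keypoints onto a canvas of a different size.
# ASSUMPTIONS: keypoints were already extracted from a reference photo by a
# pose preprocessor; names and sizes below are made up for illustration.

Point = tuple[float, float]


def fit_pose(keypoints: dict[str, Point], src_size: tuple[int, int],
             dst_size: tuple[int, int]) -> dict[str, Point]:
    """Scale keypoints uniformly to fit dst_size, centring leftover space."""
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    # One scale factor for both axes so the figure's proportions don't stretch.
    scale = min(dst_w / src_w, dst_h / src_h)
    offset_x = (dst_w - src_w * scale) / 2
    offset_y = (dst_h - src_h * scale) / 2
    return {name: (x * scale + offset_x, y * scale + offset_y)
            for name, (x, y) in keypoints.items()}


# Hypothetical keypoints from a 2000x3000 portrait photo, fitted to 1024x1024.
reference = {"nose": (1000, 600), "left_wrist": (400, 1500),
             "right_ankle": (1200, 2900)}
print(fit_pose(reference, (2000, 3000), (1024, 1024)))
```

Using a single scale factor keeps limb lengths in proportion, which matters for exactly the "human legs simply are not that long" kind of problem.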
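Many upscalers that work "in small parts" do it by splitting the enlarged image into overlapping tiles, processing each tile, and blending the overlaps so seams don't show. This sketch only computes where the tiles go; the 512 px tile size and 64 px overlap are illustrative assumptions, not settings from any specific program.

```python
# Sketch: computing overlapping tiles for a tiled upscale.
# ASSUMPTIONS: tile size and overlap are illustrative defaults; real tools
# pick their own and also blend the overlapping strips when reassembling.

def tile_boxes(width: int, height: int, tile: int = 512,
               overlap: int = 64) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) boxes that cover the whole image."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    step = tile - overlap

    def starts(length: int) -> list[int]:
        # Step across the image, then add one last tile flush with the far edge.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)
        return positions

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


# A 1024x1024 image upscaled 2x becomes 2048x2048.
boxes = tile_boxes(2048, 2048)
print(len(boxes), boxes[:2])
```

The overlap is the design choice that matters: each tile sees a bit of its neighbours, which helps the added detail stay coherent across tile borders.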
The most recent set I did for a commissioner had 25 "images" in the set at the end. It took over 3000 generations (most of which were very small generations to correct fine details, of course) to get the 25 finals how I wanted them, and then of course I had to do my manual corrections in GIMP for each final, as well.
That's why my first and final generations in each set look so dramatically different! There are literally thousands of steps between A and B.
But even so, there are really only about 5 stages.
The first stage is spitballing. If I'm not starting with a sketch or reference image of my own making, then I'll start with a generation that has some compositional elements I like.
So, I make a general prompt detailing things like lighting conditions, medium, genre, number of subjects, and maybe some light descriptions of the subjects, like "human man with dark skin and long hair" or "snow leopard with white fur and blue spots."
If I am starting with an image I made myself, well, I just skip this step lmao
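A general spitballing prompt like that is mostly a list of parts joined together. Here's a hypothetical Python helper that assembles one; the comma-separated layout and the "2 subjects" wording are assumptions, since every model has its own preferred prompt style.

```python
# Sketch: assembling a general "spitballing" prompt from parts.
# ASSUMPTION: comma-separated phrases are a common convention, but each model
# has its own preferred prompt style, so treat this as a starting point.

def build_prompt(lighting: str, medium: str, genre: str,
                 subjects: list[str]) -> str:
    """Join lighting, medium, genre, a subject count, and short subject notes."""
    n = len(subjects)
    count = "1 subject" if n == 1 else f"{n} subjects"
    parts = [lighting, medium, genre, count, *subjects]
    # Skip blank fields so optional parts don't leave stray commas behind.
    return ", ".join(p.strip() for p in parts if p.strip())


prompt = build_prompt(
    "soft morning light", "watercolor painting", "fantasy",
    ["human man with dark skin and long hair",
     "snow leopard with white fur and blue spots"],
)
print(prompt)
```

Keeping the parts separate makes it easy to change one thing (say, the lighting) between spitball batches while holding everything else steady.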
The next stage is where I refine the compositional elements into a final layout, building off whatever elements I liked from the spitballing. A lot of the time, I'll take several spitballs and collage them in GIMP or other software first, then take the collage back into my AI software. Other times I'll use in-painting for this, since the AI software I use (InvokeAI Community Edition) supports layers.
Going from an image with composition I like to an image with subjects I want in that composition tends to be the part I dislike the most, because it's the part where you have to build a lot of technical skill with very little guidance. You see, every model interprets the words in your prompts slightly differently, so every model has a "style" it wants you to prompt in to use it effectively. And bestie, figuring out that style is just a whole ass load of trial and error, and it sucks.
It also, by the way, is extremely reminiscent of the same goddamn skill wall in every artform, which is why I am so struck by the sheer ignorance of people who refuse to understand it. It's not actually that fucking easy!! Even if it were that easy, it would still be human art because a human triggered the artistic tool, but whatever. It's not even that easy of a fucking tool!!!!!!
Drafting is where I go from compositional prompts to detail-oriented prompts, which means it's also where I do most of my in-painting. This is the point where I begin to highlight specific parts of the face, body, and other focal point(s) in really tiny increments, and my prompts start looking more like, "dark skin, black skin, dark brown skin, high contrast reflections on skin, bare forehead, hairline, babyhairs, curly hair, coily hair, coiled hair, kinky hair, afrotexture hairline with baby hairs, black hair curling over brown skin," and carry on like that for um.
After I get everything settled how I want it, and take in the whole image in all its glory? Oh boy, wow, everything is wrong lmao.
This is the stage where I correct things like "this limb is attached at a slightly wrong angle" and "Well human legs simply are not that long, sir."
This is the part I actually enjoy the most, because the image is really starting to take shape towards the goal I had in mind at the beginning, and it's like. Very motivating to see it get more and more real!
The next stage is where I upscale the image at various levels of structural and creative divergence, then collage together the versions I like best.
The last stage is where I either correct the mistakes left behind by the collage process in the AI software, or swap back to traditional software and just do a paint-over, depending on how I'm feeling that day and about that piece.
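Upscaling "at various levels of divergence" can be planned ahead as a simple grid of settings, one upscale run per combination. In this hypothetical sketch, "structure" and "creativity" are stand-in names for whatever controls your upscaler exposes for staying close to the original versus inventing new detail, and the values are arbitrary.

```python
# Sketch: planning an upscale sweep across fidelity/divergence settings.
# ASSUMPTIONS: "structure" (how closely the result follows the original) and
# "creativity" (how far it may drift) are stand-in names, and the values are
# arbitrary examples; map them onto your own upscaler's controls.
from itertools import product


def upscale_plan(structures: list[int],
                 creativities: list[int]) -> list[dict[str, int]]:
    """Return one settings dict per combination, most faithful first."""
    plan = []
    # Highest structure first, and within each, lowest creativity first.
    for structure, creativity in product(sorted(structures, reverse=True),
                                         sorted(creativities)):
        plan.append({"structure": structure, "creativity": creativity})
    return plan


for settings in upscale_plan(structures=[-2, 0, 2], creativities=[0, 2, 4]):
    print(settings)
```

Ordering from most to least faithful makes it easier to see where a given image starts to drift too far when comparing results side by side.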