Maciej Mróz Personal Blog

Because why not

Aug 8, 2023 - 4 minute read - Technology

My Dog Is a Superhero

My dog is not really a superhero ;) Still, the title is catchy and Generative AI can make it look like one:

Dog as Superhero 1

What’s even cooler is that we can create multiple variants around this theme quite easily:

Dog as Superhero 2 Dog as Superhero 3 Dog as Superhero 4

These images all originate from a single photo of my French bulldog, processed with a generative AI model called Stable Diffusion XL (to be technically correct, it’s a pipeline and not just one model). The technique used is called inpainting, which in a nutshell regenerates the parts of the image selected by a user-supplied mask. Pretty cool if you ask me, especially considering I am not really an artist. I do know my way around plenty of content creation tools (this is where a demoscene/game development background and being a nerd pays off), but my artistic abilities were always … limited. With Generative AI anyone can do it, and while the quality may not be what’s expected of a professional artist, for many uses even the images above would simply be good enough.
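To make the role of the mask concrete, here is a minimal sketch in plain Python. It is not Stable Diffusion: the `generated` values are a made-up stand-in for what a model would produce, and `composite` is a hypothetical helper name. It only shows the common convention that pixels where the mask is 1 get replaced and pixels where it is 0 are kept.

```python
def composite(original, generated, mask):
    """Keep original pixels where mask is 0, take generated ones where mask is 1."""
    return [
        [g if m else o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


# A tiny 3x3 grayscale "photo": the center pixel is the part we want to keep.
original = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
# Stand-in for model output (assumption: a real model would generate these).
generated = [[90, 91, 92], [93, 94, 95], [96, 97, 98]]
# 1 = repaint this pixel, 0 = keep the original pixel.
mask = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]

result = composite(original, generated, mask)
print(result)
```

A real inpainting pipeline does much more than copy pixels (the new content is generated so that it blends with what is kept), but the mask plays the same selecting role.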

To be totally fair, maybe not exactly “anyone”, at least today. This was done in a Jupyter notebook, with a Python script talking straight to the “diffusers” library by Hugging Face (which implements diffusion pipelines on top of PyTorch), something a developer might be comfortable with but not really user-friendly for non-technical folks. It was also done on a Linux system with a pretty powerful GPU with 12 GB of VRAM (maybe not high end, but well above average at the time of writing). The tooling around Stable Diffusion improves every day, and there are already GUI tools people can use. These tools will only continue to get better, and we’ll also see more and more of this built into traditional content creation tools.
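For the curious, a notebook cell doing this kind of inpainting might look roughly like the sketch below. It is not the exact script behind the images above: the checkpoint name, the parameter values and the helper names `build_inpaint_kwargs` and `run_inpainting` are all assumptions made for illustration. The `low_vram` switch shows the usual trade of speed for memory.

```python
def build_inpaint_kwargs(prompt, image, mask_image, steps=30,
                         strength=0.99, guidance_scale=8.0):
    """Collect the arguments for the pipeline call (default values are illustrative)."""
    return {
        "prompt": prompt,
        "image": image,
        "mask_image": mask_image,
        "num_inference_steps": steps,
        "strength": strength,
        "guidance_scale": guidance_scale,
    }


def run_inpainting(photo_path, mask_path, prompt, low_vram=False):
    """Run SDXL inpainting on one photo; needs a CUDA GPU and downloaded weights."""
    # Heavy imports live inside the function so this file loads without a GPU.
    import torch
    from diffusers import AutoPipelineForInpainting
    from diffusers.utils import load_image

    pipe = AutoPipelineForInpainting.from_pretrained(
        "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",  # assumed checkpoint
        torch_dtype=torch.float16,
        variant="fp16",
    )
    if low_vram:
        # Slower, but keeps only the active sub-model on the GPU.
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")

    image = load_image(photo_path).resize((1024, 1024))
    mask = load_image(mask_path).resize((1024, 1024))  # white = repaint, black = keep
    return pipe(**build_inpaint_kwargs(prompt, image, mask)).images[0]
```

Calling something like `run_inpainting("dog.png", "mask.png", "a french bulldog in a superhero costume")` (file names are placeholders) requires the `torch` and `diffusers` packages, a CUDA GPU and a download of the model weights.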

For now, we are in a world of the arcane art of prompt engineering, trading off performance for memory usage (or just trying to avoid CUDA OutOfMemory errors), model fine-tuning, LoRA (a clever way of adjusting neural network behavior without fully retraining it), etc. The power we have today is already huge, but at the same time the entire Generative AI space is still in its infancy (that also includes large language models like GPT).
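As a rough illustration of why LoRA is cheap: instead of updating a full weight matrix W, it trains two small matrices B and A of rank r and uses W + BA. The sketch below just counts trainable parameters; the layer size and rank are hypothetical numbers picked for the example, not taken from Stable Diffusion XL.

```python
def full_params(d_out, d_in):
    """Trainable parameters when fine-tuning the whole d_out x d_in matrix."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for LoRA: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, for illustration only.
d_out, d_in, rank = 1280, 1280, 8

full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full fine-tuning: {full} parameters")
print(f"LoRA (rank {rank}): {lora} parameters, {full // lora}x fewer")
```

With these numbers the LoRA update is 80 times smaller than the full matrix, which is why such adapters are small enough to train and share easily.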

The next few years are going to be very exciting and at least a bit unpredictable, on many levels. The social and economic consequences of a rapid productivity boost are hard to predict, and many jobs will be displaced or will evolve and change in nature. There will still be a place for great artists, but I’d wager that only people with a blend of art director and technical artist skills will survive in the new reality, simply because the gruntwork will be delegated to AI running on a powerful GPU. On the business side it is extremely tempting, because if you invest in training a proprietary model, this alone is an insurmountable barrier to entry for most competitors except the ones with the deepest pockets. For context, right now we’re talking hundreds of thousands of dollars in infrastructure costs just for a single model training run, and for anyone serious about competing in this space, capital requirements are going to be a very high multiple of that number. AI specialists are few and work “elsewhere”, so even if there aren’t as many jobs available in this space (compared to the wider IT sector), I am pretty sure filling open positions is not easy, and most likely expensive. Winners are going to win big, so it’s not surprising money is flowing like crazy.

On top of that, there are a lot of safety concerns: in the short term simply related to what nefarious actors can do (with deepfakes being an obvious example), and in the longer term touching problems like the safety of Artificial General Intelligence (AGI). The problem of AGI deserves a post of its own. I am not convinced that a next-gen GPT will become a sentient superintelligence, but there are things in research labs today that might lead to one being created. If anything scares me today, it is the possibility of AI going rogue. For now, I guess the only right thing to do is to enjoy living in exciting times and, if you work in the AI space, to stay extra vigilant about what you do.