You’ve no doubt noticed the plethora of AI art generators that have emerged over the last year or so: super-intelligent engines that can create images that look like real photographs or works of art created by real people. As time goes on, they become more and more powerful and add more features – now you can even find an AI art tool in Microsoft Paint.
New to the DALL-E AI image model, available to ChatGPT Plus subscribers who pay $20 per month, is the ability to edit parts of an image, much like you can in Photoshop: you no longer have to regenerate an entirely new image just because you want to change one element – you can show DALL-E the part of the image you want to adjust, give it some new instructions, and leave everything else alone.
This addresses one of the key limitations of AI art: every generated image (and video) is completely unique, even if you use identical prompts, which makes it hard to achieve consistency or refine an idea. However, these AI art generators, built on so-called diffusion models, still have plenty of limitations to overcome – as we will show here.
Editing images in ChatGPT
If you are a ChatGPT Plus subscriber, you can load the app on the web or on your phone and ask for an image of anything: a cartoon dog detective solving a case in a cyberpunk setting, a hilly landscape with a lone figure in the middle distance and storm clouds gathering overhead, or whatever else you like. After a few seconds you will receive an image.
To edit an image, you can now click on the generated picture and then on Select in the upper-right corner (the icon looks like a pen drawing a line). You can then adjust the size of the selection brush using the slider in the upper-left corner and paint over the part of the image you want to change.
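If you would rather script this step than use the chat interface, OpenAI's Images API exposes DALL-E 3 generation directly. Here is a minimal sketch, assuming you have the openai Python package installed and an OPENAI_API_KEY environment variable set (the prompt and file names are just examples):

```python
# Minimal sketch: generating an image with DALL-E 3 via OpenAI's Images API.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

result = client.images.generate(
    model="dall-e-3",
    prompt="A cartoon dog detective solving a case in a cyberpunk city",
    size="1024x1024",
    n=1,
)

# The response contains a URL (or base64 data) for the generated image.
print(result.data[0].url)
```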
Editing interface in ChatGPT
Source: Lifehacker
This is a significant step forward: you can leave the rest of the image untouched and regenerate only the selection. Previously, if you sent a prompt asking to change one specific part of an image, the entire image would be regenerated and would likely look very different from the original.
Once you have made your selection, you will be asked to enter new instructions that apply only to the highlighted part of the image. As usual with AI image tools, the more specific you are, the better: you can ask for a person to look happier (or less happy), or for a building to be a different color. The requested changes will then be applied.
Success! ChatGPT and DALL-E replace one dog with another.
Source: Lifehacker / DALL-E
From my experiments, ChatGPT and DALL-E use the same sort of AI tricks we've seen in tools like Google's Magic Eraser: intelligently filling in the selected area based on existing information in the scene, while trying to leave everything outside the selection untouched.
It's not the most advanced selection tool, and I noticed inconsistencies around the borders and edges of objects – perhaps to be expected given how rough the manual selection is. For the most part, the editing feature worked well enough, although it was by no means reliable every time, which is undoubtedly something OpenAI will want to improve in the future.
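If you want to experiment with this kind of mask-based inpainting outside the chat interface, OpenAI's Images API has an edit endpoint that works on the same principle: you supply the original image plus a mask whose transparent pixels mark the region to repaint. Note that this endpoint currently targets DALL-E 2, so treat the sketch below as an illustration of the technique rather than the exact mechanism ChatGPT uses; the file names and prompt are placeholders.

```python
# Illustrative sketch of mask-based inpainting with OpenAI's image edit endpoint.
# Transparent areas of mask.png mark the region DALL-E may repaint; everything
# else is left untouched. Assumes dog.png and mask.png exist locally.
from openai import OpenAI

client = OpenAI()

result = client.images.edit(
    model="dall-e-2",
    image=open("dog.png", "rb"),
    mask=open("mask.png", "rb"),
    prompt="A golden retriever sitting in the meadow",
    n=1,
    size="1024x1024",
)

print(result.data[0].url)
```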
Where AI art reaches its limits
I tried out the new editing tool on various tricks. It was good at changing the color and position of a dog in a meadow, but less good at shrinking a giant man standing on castle walls – the man simply disappeared into a blur of rampart fragments, suggesting the AI was trying to paint around him without much success.
I asked for a car to be added to a cyberpunk scene, but no car showed up. In another castle scene, I asked for a flying dragon to be turned the other way, changed from green to red, and given flames coming from its mouth. After a few moments of processing, ChatGPT removed the dragon completely.
Fail! ChatGPT and DALL-E erased the dragon instead of changing it.
Source: Lifehacker / DALL-E
This feature is still brand new, and OpenAI isn't claiming it can replace human image editing yet – because it clearly can't. It will improve, but these errors help show where the challenges lie with this kind of AI-generated art.
What DALL-E and models like it are very good at is arranging pixels to provide a convincing approximation of a castle (for example), based on the millions of castle images they have presumably been trained on. However, the AI doesn't know what a castle is: it doesn't understand geometry or physical space, which is why my castles have turrets sticking out of nowhere. You'll notice this in many AI-generated images that include buildings, furniture, or other objects that aren't rendered properly.
It’s quite white, but far from “plain.”
Source: Lifehacker / DALL-E
In essence, these models are probability machines that don't (yet) understand what they're actually showing: that's why in many of OpenAI's Sora videos people disappear into nowhere – the AI is very clever at arranging pixels, but it isn't actually tracking individual people. You may have also read about AI struggling to create photos of mixed-race couples, because same-race couples are more common in the image training data.
Another recently noticed quirk is that AI image generators struggle to create a plain white background. They are incredibly clever tools in many ways, but they don't "think" the way you or I would, and they don't understand what they're doing the way a human artist does – and it's important to bear this in mind when using them.