OpenAI Dall-E 2 – the new version of Dall-E

When I wrote my last blog post, I had already sent a request to evaluate the new version, and I received access a few days later. Unfortunately, due to a lot of other engagements, I wasn’t able to test it out right away. Now that I have some time, I want to walk through some of the images that I generated.

The Dall-E trial comes with fifty free images that you can generate, after which you can buy a package of images based on your needs. This seems to be a fairly nice deal if you want to create artwork for a site and do not want to draw the entire thing yourself.

According to the website of OpenAI, DALL·E 2 can create original, realistic images and art from a text description. It can combine concepts, attributes, and styles. DALL·E 2 can make realistic edits to existing images from a natural language caption. It can add and remove elements while taking shadows, reflections, and textures into account.

Sample Generations

Imagine you wanted to see how a Renaissance artist would interpret Leonardo da Vinci having dinner with the Mona Lisa. I asked Dall-E 2 to interpret this concept. Out of the four generated paintings, this one stood out as commendable.

a renaissance painting of monalisa having dinner with leonardo da vinci

Of course, it is not limited to Renaissance painting. How about Dalí and his surrealist style? Let’s try one of those too. Imagine a dog and Dalí’s famous melting watch combined in one image.

a surrealist dream-like oil painting by Salvador Dalí of a dog climbing a melting watch

Dall-E generates several variations for the same prompt. I am showing just one image from each set, the one that to me is the best. Since taste differs from person to person, I do not claim these are the best images possible.

Suppose I am writing an article on the effects of nuclear reactors on the human population. I want an image that shows a nuclear reactor in the background and a lone growing plant in the foreground to show the power of life. Let’s ask Dall-E to create an image for us.

An oil painting of a small live plant in front of deserted nuclear facility

As you can see, it makes it so much easier for a non-artist to represent a concept in an image. That is where this ML program shines: it brings your concepts to life.

So far we have looked at a Renaissance style and a surrealist style of painting. What about a realistic 3D render? Or a watercolor painting? Let’s try both of these below.

3d render of a diver jumping from top of Niagara falls
watercolor painting of a man praying in front of a cross with a forest backdrop

If all of this is not impressive enough, let us see the same concept rendered in different mediums.

Same image in different mediums

What I tried to do here is create two different paintings of the same concept in different mediums. One of these is done in pastel, while the other is an oil painting. See the difference in the styles generated by Dall-E.

pastel painting of a quaint little village during sunset with a farmer tending his farm
oil painting of a quaint little village during sunset with a farmer tending his farm

As you can see, they are generated quite differently. The medium named in each prompt is picked up, and the image is drawn according to the characteristics of that medium.
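If you want to repeat this comparison with more mediums, here is a minimal sketch in Python that builds the prompt strings in the same "medium of subject" form I typed in above. The function name build_prompts and the list of mediums are my own illustrative assumptions. The script only produces text to paste into Dall-E and does not call any OpenAI API.

```python
def build_prompts(subject, mediums):
    """Return one prompt per medium, in the form '<medium> of <subject>'."""
    prompts = []
    for medium in mediums:
        # Keep the subject identical so only the medium changes between prompts.
        prompts.append(f"{medium.strip()} of {subject.strip()}")
    return prompts


# The subject and the two mediums are the ones used in this post.
subject = "a quaint little village during sunset with a farmer tending his farm"
mediums = ["pastel painting", "oil painting"]

for prompt in build_prompts(subject, mediums):
    print(prompt)
```

Keeping the subject fixed and changing only the medium makes it easier to tell which differences in the results come from the medium and which from the random variation between generations.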

Gotchas

Does everything work as expected? Sometimes I have seen that the image drawn differs from the concept requested. For example, see the image below.

a big frog holding a spear sitting on a lion in front of a live audience

There are quite a few things wrong with this render. We asked to see a big frog holding a spear. What we see is a lion holding a spear. We wanted a frog sitting on a lion, which is not what was drawn. We wanted a live audience as the backdrop; however, there is none. So not everything works as expected, but what it gives us is still awesome.

Conclusion

So does this ML rendering software work? Based on what I see, it is definitely a big jump over earlier attempts at generating images. The paintings rendered are varied and well executed for the concept requested. So, I would definitely say this is a great advancement in machine learning for generating images from natural language text. Hope you found this useful. Ciao for now!