AuraFlow vs. Stable Diffusion: How Do They Compare?

Generating images from text with deep learning is no longer new. Several models have been proposed for it; OpenAI’s DALL-E, Midjourney and Stable Diffusion are just some examples.

The trend started when GANs (generative adversarial networks) began producing convincing images. A GAN trains a generator network to create images while a discriminator network judges whether they look real.

One of the new kids on the block is AuraFlow (AF). What makes AuraFlow so attractive is its open licensing. It is still considered beta, as the current version is only 0.1. It is completely open source, so it is easy to get started. The diffusers-compatible model is hosted on the Hugging Face site, and downloading it does not require you to log in.

One of the gripes about Stable Diffusion 3 (SD3) is its restrictive licensing. You have to provide your information and agree to a non-commercial license. Although this is good enough for educational purposes, many people found it too intrusive and contrary to the spirit of open AI models. They have long worried that open source AI is in jeopardy.

What is Stable Diffusion?

Stable Diffusion comes from a company called Stability AI, which was started by Emad Mostaque in 2019. Stability AI is based in London and San Francisco, and as a startup it was able to raise a lot of funding from various venture capital firms. Its rise to prominence is of course due to Stable Diffusion, which was touted as open source but was released under a more restrictive license than open source purists would prefer. The current version is 3, and the model in our test needed very little memory and fit easily in our GPU.

What is AuraFlow?

As we said, AuraFlow is the new kid on the block. It is released by FAL AI, a California-based company started in 2021 by Burkay Gur and Gorkem Yurtseven, and it is licensed under Apache 2.0, whose terms are very generous. It is still in beta at version 0.1 and may not be stable to run. The model itself is also huge: in our tests it wanted 18 GB of GPU memory to load fully, and I was not able to load it on a 12 GB GPU.
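To make the memory numbers above concrete, here is a minimal sketch of a check one could run before trying to load a model. The 18 GB and 12 GB figures are the ones from our test. The helper names fits_on_gpu and gpu_total_gb are made up for illustration, and the check is deliberately crude: it compares two numbers and ignores activations, offloading and other overhead.

```python
def fits_on_gpu(required_gb: float, available_gb: float) -> bool:
    """Return True when the model's memory need fits in the GPU's memory."""
    return required_gb <= available_gb


def gpu_total_gb(device_index: int = 0):
    """Return the total memory of a CUDA GPU in GiB, or None if unavailable."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(device_index)
    return props.total_memory / 1024**3


# Figures from our test: AuraFlow wanted 18 GB, the card had 12 GB.
print(fits_on_gpu(18, 12))  # False

total = gpu_total_gb()
print("No CUDA GPU detected" if total is None else f"GPU memory: {total:.1f} GiB")
```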

With the introductions out of the way, we will run some basic tests with each of these models. I will start by introducing the small chunk of code that I used to run each model. There is nothing fancy in either script. On the first run, each one downloads its model; on later runs, it reuses the downloaded copy and only generates images from the provided prompts.

Let’s talk about Hugging Face Diffusers

We mentioned in passing that we will use Hugging Face Diffusers. So what is it? To quote Hugging Face: “Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you’re looking for a simple inference solution or want to train your own diffusion model, Diffusers is a modular toolbox that supports both.”

Put simply, it is a library from Hugging Face that makes it easy to load and run pretrained models from our own programs. We will use just a couple of these models for our tests.

Get our tools ready

$ python -m venv vhugger
$ .\vhugger\Scripts\activate.bat
$ pip install wheel
$ pip install transformers accelerate protobuf sentencepiece
$ pip install git+https://github.com/huggingface/diffusers.git

The first thing we did was create a virtual environment with Python’s venv module and activate it (the activate command shown is the Windows one). We then used pip to install the supporting libraries listed above, and installed the diffusers library directly from GitHub. Now let’s start creating the main programs. They look very similar and are very short.
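Before moving on, it can help to confirm that the environment really contains what the scripts need. The following is a small sketch using only the standard library. The function name installed_versions and the REQUIRED list are our own; torch is included on the assumption that it arrives as a dependency of accelerate, and whether that build supports your GPU depends on your platform.

```python
from importlib import metadata

# Packages from the pip commands above, plus torch, which the scripts import
# (assumed here to be pulled in as a dependency of accelerate).
REQUIRED = ["torch", "diffusers", "transformers", "accelerate",
            "protobuf", "sentencepiece"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name:15} {version or 'NOT INSTALLED'}")
```

Run it inside the activated virtual environment; any line that says NOT INSTALLED points at a pip command that still needs to be run.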

Let’s start with the one for AuraFlow (AF).

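import os

import torch
from diffusers import AuraFlowPipeline

# Load the AuraFlow weights in half precision (downloaded on the first run).
pipeline = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow",
    torch_dtype=torch.float16
)
# Keep idle sub-models on the CPU to reduce GPU memory use.
pipeline.enable_model_cpu_offload()

prompt = "A high resolution photograph of a sloping ridge with ..."

saveFile = "./images/aura.jpg"
# Make sure the output folder exists before saving.
os.makedirs(os.path.dirname(saveFile), exist_ok=True)

# Generate one 1024x1024 image from the prompt.
image = pipeline(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=25,
    guidance_scale=3.5,
).images[0]

image.save(saveFile)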

The one for Stable Diffusion (SD3) is not too different:

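import os

import torch
from diffusers import StableDiffusion3Pipeline
from huggingface_hub import login

# Uncomment on the first run to authenticate with your Hugging Face access token.
#login()

# Load the SD3 weights in half precision (downloaded on the first run).
pipeline = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16
)
# Keep idle sub-models on the CPU to reduce GPU memory use.
pipeline.enable_model_cpu_offload()

prompt = "A high resolution photograph of a sloping ridge with ..."

saveFile = "./images/stable.jpg"
# Make sure the output folder exists before saving.
os.makedirs(os.path.dirname(saveFile), exist_ok=True)

# Generate one 1024x1024 image from the prompt.
image = pipeline(
    prompt=prompt,
    negative_prompt="",
    num_inference_steps=25,
    height=1024,
    width=1024,
    guidance_scale=7.0,
).images[0]

image.save(saveFile)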

There is not much difference between these two scripts, because both use the diffusers library. As I said before, the Stable Diffusion license is a bit more restrictive, so we have to log in to get the model. Before that, we have to register with Hugging Face so we can get an access token. We need this only the first time, to download the model; on later runs, the application uses the cached version.
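If you are curious whether a model has already been downloaded, you can look for it in the local cache. The sketch below assumes the default Hugging Face Hub cache layout: a hub folder under .cache/huggingface in your home directory, overridable with the HF_HOME or HF_HUB_CACHE environment variables, with one models--owner--name folder per model. The function names are our own, and the check only looks for the folder; it does not verify that the download completed.

```python
import os
from pathlib import Path


def hub_cache_dir() -> Path:
    """Resolve the Hugging Face Hub cache directory from the environment."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface")
    return Path(hf_home) / "hub"


def model_cache_folder(repo_id: str) -> Path:
    """Return the folder where a model repo such as 'fal/AuraFlow' is cached."""
    return hub_cache_dir() / ("models--" + repo_id.replace("/", "--"))


for repo in ("fal/AuraFlow", "stabilityai/stable-diffusion-3-medium-diffusers"):
    folder = model_cache_folder(repo)
    state = "cached" if folder.exists() else "not downloaded yet"
    print(f"{repo}: {state} ({folder})")
```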

Compare output

Next, let’s run a few generations and compare the output. In all the outputs below, the left image is always AuraFlow and the right image is Stable Diffusion.

I don’t think it is fair to judge these models based on just a few outputs, especially since for AuraFlow I had only one output generated most of the time.
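For readers who want to build the same left/right comparison from their own outputs, here is one possible sketch. It is not necessarily how the images in this post were assembled. It assumes Pillow is available (it is installed as a dependency of diffusers), and the function names are our own.

```python
def side_by_side_layout(left_size, right_size):
    """Return (canvas_size, left_offset, right_offset) for two (w, h) images."""
    left_w, left_h = left_size
    right_w, right_h = right_size
    canvas = (left_w + right_w, max(left_h, right_h))
    return canvas, (0, 0), (left_w, 0)


def make_side_by_side(left_path, right_path, out_path):
    """Paste the left and right images onto one white canvas and save it."""
    from PIL import Image  # Pillow, imported lazily

    with Image.open(left_path) as left, Image.open(right_path) as right:
        canvas_size, left_at, right_at = side_by_side_layout(left.size, right.size)
        canvas = Image.new("RGB", canvas_size, "white")
        canvas.paste(left.convert("RGB"), left_at)
        canvas.paste(right.convert("RGB"), right_at)
    canvas.save(out_path)


# Two 1024x1024 outputs give a 2048x1024 canvas.
print(side_by_side_layout((1024, 1024), (1024, 1024)))
```

With the two scripts above, a call such as make_side_by_side("./images/aura.jpg", "./images/stable.jpg", "./images/compare.jpg") would put AuraFlow on the left and Stable Diffusion on the right; the compare.jpg name is only an example.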

Basic Realistic Photograph