Build an AI QR Code Generator with ControlNet, Stable Diffusion, and LangChain

Infuse Creativity into your QR Codes with Deep Lake, LangChain, Stable Diffusion and ControlNet and Create Eye-Catching Artistic Images
- Emanuele Fenocc...
- Dominik Benk
28 min read, Apr 1, 2024 (updated Apr 8, 2024)

Summary

We built a tool that generates artistic QR codes for a specific website URL using Deep Lake, LangChain, Stable Diffusion, and ControlNet via AUTOMATIC1111 and ComfyUI. If you want to try the code directly from our notebook, just download it from the repository.

Deep Lake is a Database for AI, designed to efficiently store and search large-scale AI data including audio, video, or embeddings from text documents, which is what we use in this article. It offers storage optimized for deep learning applications, featuring data streaming, vector search, data versioning, and seamless integration with other popular frameworks such as LangChain. LangChain, in turn, is a toolkit designed to simplify the development of large language model (LLM) workflows; in our case we focus on its ability to summarize and answer questions about large-scale documents such as web pages.

Stable Diffusion is a recent development in the field of image synthesis, with exciting potential for reducing high computational demand. It is primarily used for text-to-image generation, but it is capable of a variety of other tasks such as image modification, inpainting, outpainting, upscaling, and image-to-image generation conditioned on text input. Meanwhile, ControlNet is a neural network architecture that makes these diffusion models much easier to control by integrating extra conditions. These control techniques include edge and line detection, human poses, image segmentation, depth maps, image styles, or simple user scribbles. By applying these techniques, it is possible to condition our output image on QR codes as well. If you are interested in more details, we recommend reading the original ControlNet article.

By combining all of this, we can generate QR codes at scale that are unique and more likely to attract attention. These are the steps that we are going to walk you through:

Steps

Scraping the Content From a Website and Splitting It Into Documents
Saving the Documents Along With Their Embeddings to Deep Lake
Extracting the Most Relevant Documents
Creating Prompts to Generate an Image Based on Documents
1. Custom summary prompt + LLMChain
2. QA retrieval + LLM
Summarizing the Created Prompts
Generating Simple QR From URL and inserting custom logo
Generating Artistic QR Codes for Activeloop
1. Txt2Img
  1. Content prompt
  2. Deep Lake prompt
2. Img2Img with logo
  1. Content prompt
  2. Deep Lake prompt
Generating Artistic QR Codes for E-commerce
1. Img2Img with logo - Tommy Hilfiger
2. Img2Img with logo - Patagonia
Hands On with ComfyUI
Limitations of Our Approach
Conclusion
FAQs

 
      
# Install dependencies
!pip install langchain deeplake openai qrcode apify_client tiktoken langchain-openai python-dotenv pydantic==1.10.8
!apt install libzbar0
!pip install qreader opencv-python

 
      
import cv2
import numpy as np
import sys
import time

from langchain_community.vectorstores import DeepLake
from langchain_openai import OpenAIEmbeddings
from langchain.utilities import ApifyWrapper
from langchain.text_splitter import CharacterTextSplitter
from langchain.docstore.document import Document
from langchain.chains import RetrievalQA
from langchain_openai import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
import os
from dotenv import load_dotenv

# Load the API tokens from a local .env file into the environment
load_dotenv()

# Fail early with a clear message if one of the tokens is missing
for key in ("OPENAI_API_KEY", "ACTIVELOOP_TOKEN", "APIFY_API_TOKEN"):
    if not os.getenv(key):
        raise EnvironmentError(f"{key} is not set")

Step 1: Scraping the Content From a Website and Splitting It Into Documents

First of all, we need to collect the content that will be used to generate the QR codes. Since the goal is to personalize them for a specific website, we provide a simple pipeline that crawls data from a given URL. As an example, we use https://www.activeloop.ai/, from which we scraped 20 pages, but you could use any other website as long as doing so does not violate its Terms of Use. Or, if you wish to use another type of content, LangChain provides many other file loaders and website loaders, and you can personalize QR codes for them too!

 
      
# We use the crawler from ApifyWrapper(), which is available in LangChain.
# For convenience, we crawl at most 20 pages with a timeout of 300 seconds.
apify = ApifyWrapper()
loader = apify.call_actor(
    actor_id="apify/website-content-crawler",
    run_input={"startUrls": [{"url": "https://www.activeloop.ai/"}], "maxCrawlPages": 20},
    dataset_mapping_function=lambda item: Document(
        page_content=item["text"] or "", metadata={"source": item["url"]}
    ),
    timeout_secs=300,
)

# Load the pages and split them into chunks of at most 1000 characters
# (CharacterTextSplitter measures chunk_size in characters, not tokens)
pages = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0, separator=".")
docs = text_splitter.split_documents(pages)
docs
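To make the splitting step more concrete, here is a minimal pure-Python sketch of greedy separator-based chunking. It is a simplified illustration of the idea, not LangChain's actual implementation; the function name `split_on_separator` and the sample text are made up for this example.

```python
def split_on_separator(text: str, chunk_size: int = 1000, separator: str = ".") -> list:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole (simplification).
    """
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        # Try to append the next piece to the chunk being built
        candidate = piece if not current else current + separator + " " + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            # The chunk is full: store it and start a new one
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "Deep Lake stores embeddings. It supports vector search. It integrates with LangChain."
chunks = split_on_separator(sample, chunk_size=60)
for chunk in chunks:
    print(len(chunk), chunk)
```

With a small `chunk_size` the first two sentences fit into one chunk and the third starts a new one; with the article's value of 1000, most short pages would end up as a single chunk.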

Step 2: Saving the Documents Along With Their Embeddings to Deep Lake

Once the website is scraped and the pages are split into documents, it’s time to generate the embeddings and save them to Deep Lake. This means that we can come back to our previously scraped data at any time without recalculating the embeddings. To do that, you need to set your ACTIVELOOP_ORGANIZATION_ID.

 
      
activeloop_org = "YOUR_ACTIVELOOP_ORG_ID"
# initialize the embedding model
embeddings = OpenAIEmbeddings()

# initialize the database, can also be used to load the database
db = DeepLake(
    dataset_path=f"hub://{activeloop_org}/scraped-websites",
    embedding=embeddings,
    overwrite=False,
)

# save the documents
db.add_documents(docs)

Step 3: Extracting the Most Relevant Documents

Since we want to generate an image in the context of a website that can have hundreds of pages, it is useful to keep only the documents that are most relevant to our query, which saves money on chained API calls to the LLM. For this, we are going to leverage the Deep Lake Vector Store similarity search as well as its retrieval functionality.

To pre-filter the documents, choose the query that works best for the type of information you have scraped.

 
      
query_for_company = 'Business core of the company'

result = db.similarity_search(query_for_company, k=10)
result
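As a rough illustration of what a similarity search does, the sketch below ranks a few toy embedding vectors by cosine similarity to a query vector and keeps the top k. Cosine similarity is only one common choice; the metric Deep Lake actually uses depends on its configuration, and the vectors and page names here are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the names of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical 3-dimensional embeddings; real ones have many more dimensions
toy_embeddings = {
    "about-us": [0.9, 0.1, 0.0],
    "pricing": [0.1, 0.9, 0.1],
    "careers": [0.0, 0.2, 0.9],
}
query_embedding = [1.0, 0.2, 0.0]
best = top_k(query_embedding, toy_embeddings, k=2)
print(best)
```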

For the question-answering pipeline, we can then define the retriever:

 
      
retriever = db.as_retriever(
    search_kwargs={"k": 10}
)

Step 4: Creating Prompts to Generate an Image Based on Documents

The goal is to understand the content and generate prompts in an automated way, so that the process can be scalable. We start by initializing the LLM with a default gpt-3.5-turbo-instruct model and set medium temperature to introduce some randomness.

 
      
# Initialize LLM
llm = OpenAI(temperature=0.5)

Prompt templates are another of LangChain's many advantages; they significantly help with clarity and readability. To make the output description more precise, we should also provide examples, as can be seen here.

 
      
query = "You are a prompt generator. Based on the content, write a detailed one sentence description that can be used to generate an image"

prompt_template = """{query}:

Content: {text}
"""

# set the prompt template
PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["text"],
    partial_variables={"query": query}
)
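Since providing examples tends to make the generated description more precise, here is a minimal sketch of how a few-shot prompt could be assembled by hand. The example pairs, the `Description:` label, and the helper name `build_few_shot_prompt` are all hypothetical and not part of the original template.

```python
# Hypothetical example pairs; replace them with ones that match your own content.
EXAMPLES = [
    ("A company that sells hiking gear.",
     "A lone hiker on a mountain ridge at sunrise, detailed, vivid colors"),
    ("A database for AI data.",
     "A glowing data center reflected in a calm lake, futuristic, high detail"),
]


def build_few_shot_prompt(query: str, text: str, examples=EXAMPLES) -> str:
    """Assemble the instruction, worked examples, and the new content into one prompt."""
    parts = [f"{query}:", ""]
    for content, description in examples:
        parts += [f"Content: {content}", f"Description: {description}", ""]
    # The final, unanswered entry is the one the LLM is asked to complete
    parts += [f"Content: {text}", "Description:"]
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    "You are a prompt generator. Based on the content, write a detailed one sentence "
    "description that can be used to generate an image",
    "Activeloop builds Deep Lake.",
)
print(prompt)
```

The same string could be used as the `template` of a prompt template, with the last `Content:` line turned into a placeholder.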

The query tells the LLM what kind of output we expect, and text is the content provided to the LLM, from which it is supposed to produce a detailed description of the image. Additionally, to have more control over the output, we also create an alternative prompt that can generate a specific image type.

Using this, we then experimented with the following two approaches, which differ in what kind of text is provided.

Option 1: Custom Summary Prompt with LLMChain

The idea is simple: we chain the description prompt over the filtered documents and then apply it once again to the summarized descriptions. In other words, text is the variable that is filled in during the LLMChain operation.

 
      
# Initialize the chain
chain = LLMChain(llm=llm, prompt=PROMPT)

# Filter the most relevant documents
result = db.similarity_search(query_for_company, k=10)

# Run the chain on the text of the retrieved documents
content = "\n\n".join(doc.page_content for doc in result)
image_prompt = chain.invoke({"text": content})
image_prompt = image_prompt["text"]
image_prompt
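One way to read "apply the prompt to each document, then once again to the results" is as a small map-then-summarize loop. The sketch below shows that pattern with a stand-in `fake_llm` function so it runs offline; the helper names are hypothetical, and in practice `llm` would be a call to a real model.

```python
from typing import Callable, List


def describe_then_summarize(texts: List[str], llm: Callable[[str], str], instruction: str) -> str:
    """Apply the instruction to each text, then once more to the joined results."""
    descriptions = [llm(f"{instruction}:\n\nContent: {t}\n") for t in texts]
    combined = "\n".join(descriptions)
    return llm(f"{instruction}:\n\nContent: {combined}\n")


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: echoes the first three words of the content."""
    content = prompt.split("Content: ", 1)[1]
    return " ".join(content.split()[:3])


final = describe_then_summarize(
    ["Deep Lake stores tensors efficiently", "LangChain chains LLM calls together"],
    fake_llm,
    "Describe an image",
)
print(final)
```

With k documents this makes k + 1 LLM calls, which is why pre-filtering the documents first keeps the cost down.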

Option 2: Retrieval Question-Answering with LLM

Here we initialize a QA retriever, which allows us to ask for an explanation of a particular concept based on the filtered documents.

 
      
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=retriever
)

chain_answer = qa.invoke("Explain what is Deep Lake")
chain_answer
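The 'stuff' chain type places all retrieved documents into a single prompt and makes one LLM call. A minimal sketch of that idea is shown below; the prompt wording and the function name `stuff_documents` are illustrative assumptions, not LangChain's actual template.

```python
from typing import List


def stuff_documents(question: str, documents: List[str]) -> str:
    """Concatenate every retrieved document into a single prompt for one LLM call."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


stuffed = stuff_documents(
    "Explain what is Deep Lake",
    ["Deep Lake is a Database for AI.", "It stores embeddings and supports vector search."],
)
print(stuffed)
```

Because everything goes into one prompt, the number of retrieved documents (k) and the chunk size together have to fit within the model's context window.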

The answer is then used as text in the PromptTemplate without the need for any chain.

 
      
# Use the text of the QA answer as the content of the prompt template
answer = llm.invoke(PROMPT.format(text=chain_answer["result"]))
answer

Step 5: Summarizing the Created Prompts

We experimented with different prompt setups in the previous section, and yet there is more to explore. If you are interested in perfecting your LLM prompts even further, we have an amazing course that will provide you with many useful tips and tricks. Mastering prompts for image generation is, however, more art than science. Nevertheless, by providing the LLM with examples, we can see that it does a pretty good job of generating very specific image descriptions. Here are two different types of prompts that we were able to generate with our approach:

1. Content Prompt

This prompt summarizes all relevant documents scraped from Activeloop into a general but detailed image description: `high-tech, futuristic, AI-driven, advanced, complex, computer-generated, robot, machine learning, data visualization, interactive, cutting-edge technology, automation, precision, efficiency, innovation, digital transformation, smart technology, science fiction-inspired.`

2. Deep Lake Prompt

Here we show a Question-Answering example with a detailed image description of Deep Lake: `An aerial view of a serene, glassy lake surrounded by trees and mountains, with giant blocks of data floating on the surface, each block representing a different data type such as images, videos, audio, and tabular data, all stored as tensors, while a team of data scientists in a nearby cabin focus on their work to build advanced deep learning models, powered by GPUs that are seamlessly integrated with Deep Lake.`

Step 6: Generating Simple QR From URL and Inserting Custom Logo

Before we generate the art, we need to prepare the simple QR code for ControlNet, which can be created directly from Python code. It is important to set the error correction level to 'H', which increases the probability of the QR code being readable, as up to 30% of the code can be covered or destroyed by an image. To generate a QR code with a logo, we created a function that takes the logo image and places it on the previously generated QR code. Note also that some URLs might be too long to produce a QR code that is simple and reliable enough for scanning. For this purpose, we can use URL shorteners such as bit.ly.

 
      
import qrcode
from PIL import Image

def create_qrcode(url: str):
    # error correction level H tolerates the most damage to the code
    QRcode = qrcode.QRCode(
        error_correction=qrcode.constants.ERROR_CORRECT_H
    )

    # adding URL or text to QRcode
    QRcode.add_data(url)

    # render the QR code on a white background as an RGB image
    QRimg = QRcode.make_image(
        back_color="white").convert('RGB')
    return QRimg

def qr_with_logo(logo_path: str, QRimg: Image.Image, output_image_name: str):
    logo = Image.open(logo_path)

    # taking base width
    basewidth = 100

    # resize the logo to the base width, keeping its aspect ratio
    wpercent = (basewidth/float(logo.size[0]))
    hsize = int((float(logo.size[1])*float(wpercent)))
    logo = logo.resize((basewidth, hsize))

    # paste the logo in the center of the QR code
    pos = ((QRimg.size[0] - logo.size[0]) // 2,
        (QRimg.size[1] - logo.size[1]) // 2)
    QRimg.paste(logo, pos)

    # save the QR code generated
    QRimg.save(output_image_name)
    return QRimg
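Because the logo hides part of the code, it can be useful to check how much of the image it covers before generating anything. The sketch below is a rough heuristic under stated assumptions: it compares pixel areas only, the helper names and image sizes are made up, and the 30% figure is the upper bound mentioned above, so a smaller limit is likely safer in practice.

```python
def logo_coverage(qr_size: tuple, logo_size: tuple) -> float:
    """Fraction of the QR image area hidden by a logo pasted on top of it."""
    qr_w, qr_h = qr_size
    logo_w, logo_h = logo_size
    return (logo_w * logo_h) / (qr_w * qr_h)


def logo_fits(qr_size: tuple, logo_size: tuple, max_fraction: float = 0.3) -> bool:
    """True if the logo covers no more than max_fraction of the QR image."""
    return logo_coverage(qr_size, logo_size) <= max_fraction


# Hypothetical sizes in pixels (width, height)
coverage = logo_coverage((370, 370), (100, 100))
print(f"logo covers {coverage:.1%} of the QR image")
print(logo_fits((370, 370), (100, 100)))
print(logo_fits((150, 150), (100, 100)))
```

With PIL images, the two arguments would be `QRimg.size` and `logo.size`.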

Step 7: Generating Artistic QR Codes for Activeloop

First of all, we need to keep in mind that this is still a very fresh and unexplored topic, and the more visually pleasing the QR codes you want to generate, the higher the risk that a scanner cannot read them. This results in an endless cycle of adjusting parameters to find the most general setup. Many approaches can be applied, but their main difference is in the ControlNet units. We had the most success with the brightness and tile preprocessors, as well as the qrcode preprocessor. Sometimes, adding a depth preprocessor was also helpful. A great guide on how to set up the Stable Diffusion web UI with the ControlNet extension to generate your first QR codes can be found, for example, here. Nevertheless, there is no single setup that works 100% of the time, and a lot of experimenting is needed, especially in fine-tuning the control’s strength/start/end to achieve a desirable output.

For example, in most of the QR codes we used the following setup:

Negative prompt: ugly, disfigured, low quality, blurry, nsfw
Steps: 20
Sampler: DPM++ 2M Karras
CFG scale: 9
Size: 768x768
Model: dreamshaper_631BakedVae
ControlNet
- 0: preprocessor: none, model: control_v1p_sd15_qrcode, weight: 1.1, starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: False, control mode: Balanced
- 1: preprocessor: none, model: control_v1p_sd15_brightness, weight: 0.3, starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: False, control mode: Balanced

In case of Img2Img, we would also need to put an inpaint mask to disable any changes to the logo.
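If you prefer to drive the web UI from code rather than from the browser, the same settings can be collected in a request payload. The sketch below only builds the dictionary and makes no network call. The field names are assumptions based on the web UI's txt2img API and the ControlNet extension's API, so verify them against the versions you have installed; the input QR image (normally passed as a base64 string in each unit) is left out on purpose.

```python
import json


def controlnet_unit(model: str, weight: float, start: float = 0.0, end: float = 1.0) -> dict:
    """One ControlNet unit. Key names are assumptions; check your extension version."""
    return {
        "module": "none",              # preprocessor
        "model": model,
        "weight": weight,
        "guidance_start": start,       # starting control step
        "guidance_end": end,           # ending control step
        "resize_mode": "Crop and Resize",
        "pixel_perfect": False,
        "control_mode": "Balanced",
    }


def build_txt2img_payload(prompt: str) -> dict:
    """Collect the generation settings listed above into one request body."""
    return {
        "prompt": prompt,
        "negative_prompt": "ugly, disfigured, low quality, blurry, nsfw",
        "steps": 20,
        "sampler_name": "DPM++ 2M Karras",
        "cfg_scale": 9,
        "width": 768,
        "height": 768,
        "alwayson_scripts": {
            "controlnet": {
                "args": [
                    controlnet_unit("control_v1p_sd15_qrcode", 1.1),
                    controlnet_unit("control_v1p_sd15_brightness", 0.3),
                ]
            }
        },
    }


payload = build_txt2img_payload("a Deep Lake in the forest")
print(json.dumps(payload, indent=2))
```

Keeping the settings in one place like this makes it easier to sweep a single value, such as the qrcode unit's weight, while holding everything else fixed.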

Download and Run AUTOMATIC1111

In order to play with this interface, you have to clone the official repository into your workspace and double-click the webui-user.bat file.

 
      
!git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui

Integrate ControlNet in AUTOMATIC1111

If you want to install ControlNet, follow the instructions given in the official repository:

Open “Extensions” tab.
Open “Install from URL” tab in the tab.
Enter https://github.com/Mikubill/sd-webui-controlnet.git to "URL for extension’s git repository".
Press “Install” button.
Wait for 5 seconds, and you will see the message "Installed into stable-diffusion-webui\extensions\sd-webui-controlnet. Use Installed tab to restart".
Go to “Installed” tab, click "Check for updates", and then click "Apply and restart UI". (The next time you can also use these buttons to update ControlNet.)
Completely restart A1111 webui including your terminal. (If you do not know what is a "terminal", you can reboot your computer to achieve the same effect.)
Download models (see below).
After you put models in the correct folder, you may need to refresh to see the models. The refresh button is right to your “Model” dropdown.

If you visit the official page of AUTOMATIC1111, you can see the interface from which you can select the settings you prefer.

In the top part you can choose what type of task to perform (txt2img, img2img, and so on); in the middle part of the interface there are settings you can change. If you install the ControlNet plugin, you will find all the configuration options that let you control how much ControlNet weighs on the final result.

qr_code_controlnet_interface_1
qr_code_controlnet_interface_2

Txt2Img - Generating QR Code From a Simple QR and Previously Created Prompt

Content prompt

Deep Lake prompt

Img2Img with logo - Generating QR Code From a QR with Logo and Previously Created Prompt

Content prompt

Deep Lake prompt

Step 8: Generating Artistic QR Codes for E-commerce

The idea here is a little different from the previous examples in the context of Activeloop.
Now we focus on product advertising, and we want to generate a QR code for a single URL and its product only. The challenge is to generate the QR code while keeping the product as similar to the original as possible, to avoid misleading information. To do this, we experimented with many preprocessors such as tile, depth, reference_only, lineart, and styles, but we found most of them too unreliable and their output too far from the original input. At this moment, we believe that the most useful is the tile preprocessor, which can preserve a lot of information. The disadvantage, however, is that it does not allow for many changes during the control phase, and the QR fit can sometimes be questionable. In practice, this would be done by adding another ControlNet unit:

- 2: preprocessor: none, model: control_v11f1e_sd15_tile, weight: 1.0, starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: False, control mode: Balanced

Since the tile input image control is very strong, there’s not much else we can do. Styles are one of the few extra adjustments possible, and a very useful style cheat sheet can be found here. For our purposes, however, we did not end up using any of them.

As before, we generated the prompts automatically from the given websites. We randomly selected 2 products. In the first case (Tommy Hilfiger), we added the logo to the initial basic QR code, while in the second case (Patagonia), we only masked the logo that is already present on the product. To see the comparison, we also provide the original input images (Sources: Patagonia, Tommy Hilfiger).

Img2Img with Logo - Generating Tommy Hilfiger QR Code

Img2Img with logo - Generating Patagonia QR Code

Step 9: Hands On with ComfyUI

A different approach from the one described above is to take advantage of ComfyUI, a powerful and modular Stable Diffusion GUI.
The main idea is to create a schema in the GUI and transform this schema into code with an extension called ComfyUI-to-Python-Extension.
We need to load the Stable Diffusion and ControlNet checkpoints we want to use.

In our case we experimented with:

Diffusion models: v1-5-pruned-emaonly, dreamshaper_8 and revAnimated_v122EOL
ControlNet models: control_v1p_sd15_brightness, control_v1p_sd15_qrcode, control_v11f1e_sd15_tile and control_v11f1p_sd15_depth.

Install ComfyUI

 
      
!git clone https://github.com/comfyanonymous/ComfyUI.git
%cd ComfyUI
!pip install -r requirements.txt

Add the ComfyUI plugin to transform the schema into code

 
      
!git clone https://github.com/pydn/ComfyUI-to-Python-Extension.git
%cd ComfyUI-to-Python-Extension
!pip install -r requirements.txt

Download all the checkpoints in the right folder

 
      
# Use raw URLs for the repository files: the /blob/ pages return HTML, not the file itself
!wget https://raw.githubusercontent.com/efenocchi/QRCodeGenerator/main/3_workflow_qr_codes2.json https://raw.githubusercontent.com/efenocchi/QRCodeGenerator/main/2_workflow_qr_codes.json -P /content/ComfyUI/ComfyUI-to-Python-Extension
!wget https://raw.githubusercontent.com/efenocchi/QRCodeGenerator/main/workflow_api.py -P /content/ComfyUI/ComfyUI-to-Python-Extension
!wget https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.safetensors -P /content/ComfyUI/models/vae
!wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1p_sd15_depth.pth -P /content/ComfyUI/models/controlnet
!wget https://huggingface.co/latentcat/latentcat-controlnet/resolve/main/models/control_v1p_sd15_brightness.safetensors -P /content/ComfyUI/models/controlnet
!wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1e_sd15_tile.pth -P /content/ComfyUI/models/controlnet
!wget https://huggingface.co/autismanon/modeldump/resolve/main/dreamshaper_8.safetensors -P /content/ComfyUI/models/checkpoints
# Quote the URL so the shell does not interpret the & characters
!wget "https://civitai.com/api/download/models/46846?type=Model&format=SafeTensor&size=full&fp=fp32" -P /content/ComfyUI/models/checkpoints

If you have problems downloading the model from CivitAI, try downloading it manually after logging in, or download it directly from Hugging Face. In the latter case, remember to use the right file name when loading the model in the following steps.

 
      
# !wget https://huggingface.co/emmajoanne/models/resolve/main/revAnimated_v122.safetensors -P /content/ComfyUI/models/checkpoints

Run the ComfyUI GUI.

If you encounter problems, run the command below from the terminal. For advice and error resolution, refer to the official repository.

Since this is not a ComfyUI guide, some things may not be clear; if you want to know more, consult the official repository or a free guide like this one.

The command is shown for informational purposes only. To proceed with image generation, you do not need to create the schema from scratch with ComfyUI; you can directly execute the cells shown below.

 
      
#!python main.py

One of the schemas we used is shown below. It is composed of one diffusion model, v1-5-pruned-emaonly, and the ControlNet models control_v1p_sd15_brightness and control_v11f1e_sd15_tile.

schema ControlNet ComfyUI

As illustrated, the QR Code was generated by combining the basic QR Code image with a textual input, making it easy to merge the initial image with the generated one. In this case the positive prompt was simply “a cyborg character” and the negative one "ugly, artefacts, bad". This pipeline produced the images shown below:

To export the schema and transform it into Python code, follow these steps:

Enable Dev mode Options: click on the settings button (located above the Queue Prompt text in the window that appears when you start ComfyUI) and select the “Enable Dev mode Options” box.
Export the schema via the button "Save (API Format)".
Put this schema in the ComfyUI-to-Python-Extension folder.
Run the Python file comfyui_to_python.py.

As with the previous command, this explanation is provided for informational purposes only. To proceed with image generation, you do not need to create the schema from scratch with ComfyUI and convert it into Python code, because this step has already been done.

The following functions are used to create the QR code from a text and to load a logo in the center of it:

Go to the main ComfyUI folder:

 
      
# if you are in Colab you can simply run
# %cd /content/ComfyUI
%cd ..
!ls

img = create_qrcode('https://www.activeloop.com/')
img.save("activeloop_qr.jpg")
img_with_logo = qr_with_logo("activeloop_logo.jpg", img, "activeloop_qr_with_logo.jpg")
img_with_logo

Set the correct path to be able to work with ComfyUI

 
      
import os
import random
import sys
from typing import Sequence, Mapping, Any, Optional, Union
import torch

def get_value_at_index(obj: Union[Sequence, Mapping], index: int) -> Any:
    """Returns the value at the given index of a sequence or mapping.
    If the object is a sequence (like list or string), returns the value at the given index.
    If the object is a mapping (like a dictionary), returns the value at the index-th key.

    Some return a dictionary, in these cases, we look for the "result" key
    """
    try:
        return obj[index]
    except KeyError:
        return obj["result"][index]

def find_path(name: str, path: Optional[str] = None) -> Optional[str]:
    """
    Recursively looks at parent folders starting from the given path until it finds the given name.
    Returns the path as a string if found, or None otherwise.
    """
    # If no path is given, use the current working directory
    if path is None:
        path = os.getcwd()

    # Check if the current directory contains the name
    if name in os.listdir(path):
        path_name = os.path.join(path, name)
        print(f"{name} found: {path_name}")
        return path_name

    # Get the parent directory
    parent_directory = os.path.dirname(path)

    # If the parent directory is the same as the current directory, we've reached the root and stop the search
    if parent_directory == path:
        return None

    # Recursively call the function with the parent directory
    return find_path(name, parent_directory)

def add_comfyui_directory_to_sys_path() -> None:
    """
    Add 'ComfyUI' to the sys.path
    """
    comfyui_path = find_path("ComfyUI")
    if comfyui_path is not None and os.path.isdir(comfyui_path):
        sys.path.append(comfyui_path)
        print(f"'{comfyui_path}' added to sys.path")

add_comfyui_directory_to_sys_path()

Load all the models that you will need during the generation phase. In this case, we need one model to generate the image from the text and another two or three models to integrate the generated image with the image of our QR code.

 
      
from nodes import (
    KSampler,
    CLIPTextEncode,
    ControlNetApplyAdvanced,
    VAEDecode,
    CheckpointLoaderSimple,
    LoadImage,
    ControlNetLoader,
    NODE_CLASS_MAPPINGS,
    EmptyLatentImage,
    VAELoader,
    SaveImage,
)
controlnetapplyadvanced = ControlNetApplyAdvanced()
ksampler = KSampler()
vaedecode = VAEDecode()
saveimage = SaveImage()
vaeloader = VAELoader()

def load_checkpoints(diffuser_model: str = "dreamshaper_8.safetensors", controlnet_1: str = "control_v11f1e_sd15_tile.pth", controlnet_2: str = "control_v1p_sd15_brightness.safetensors", controlnet_3: str = None):
    with torch.inference_mode():

        checkpointloadersimple = CheckpointLoaderSimple()
        checkpointloadersimple_4 = checkpointloadersimple.load_checkpoint(
            ckpt_name=diffuser_model
        )
        emptylatentimage = EmptyLatentImage()
        emptylatentimage_5 = emptylatentimage.generate(
            width=768, height=768, batch_size=4
        )
        controlnetloader = ControlNetLoader()
        controlnetloader_10 = controlnetloader.load_controlnet(
            control_net_name=controlnet_1
        )

        controlnetloader_11 = controlnetloader.load_controlnet(
            control_net_name=controlnet_2
        )
        controlnetloader_12 = None
        if controlnet_3 is not None:
            controlnetloader_12 = controlnetloader.load_controlnet(
                control_net_name=controlnet_3
            )
        vaeloader_24 = vaeloader.load_vae(
            vae_name="vae-ft-mse-840000-ema-pruned.safetensors"
        )
        return checkpointloadersimple_4, emptylatentimage_5, controlnetloader_10, controlnetloader_11, controlnetloader_12, vaeloader_24

Choose which models to use by passing the same names as the model files in the “checkpoints” and “controlnet” folders. You can decide whether to use two ControlNets or three by passing the filenames for controlnet_1 and controlnet_2, or for controlnet_1, controlnet_2, and controlnet_3. By default only two are used.

 
      
checkpointloadersimple_4, emptylatentimage_5, controlnetloader_10, controlnetloader_11, controlnetloader_12, vaeloader_24 = load_checkpoints()

Encode the positive and negative prompts with CLIP; these texts determine which image to generate and which effects to avoid. The same function also loads the input QR image.

 
      
def load_text_and_image(prompt_text: str = None, prompt_text_negative: str = None, input_image_path: str = None):
    with torch.inference_mode():

        # encode the positive prompt with the CLIP model of the loaded checkpoint
        cliptextencode = CLIPTextEncode()
        cliptextencode_6 = cliptextencode.encode(
            text=prompt_text, clip=get_value_at_index(checkpointloadersimple_4, 1)
        )

        # encode the negative prompt
        cliptextencode_7 = cliptextencode.encode(
            text=prompt_text_negative,
            clip=get_value_at_index(checkpointloadersimple_4, 1),
        )

        # load the QR code image used as the ControlNet input
        loadimage = LoadImage()
        loadimage_14 = loadimage.load_image(image=input_image_path)

        return cliptextencode_6, cliptextencode_7, loadimage_14

This is the heart of the generation, and it lets you choose the values assigned to the different ControlNets. If you want to play with the parameters and see how the result varies, you can change the strength values passed to the function or modify the start_percent and end_percent values directly in the code.

strength: strength of controlnet; 1.0 is full strength, 0.0 is no effect at all.
start_percent: sampling step percentage at which controlnet should start to be applied - no matter what start_percent is set on timestep keyframes, they won’t take effect until this start_percent is reached.
end_percent: sampling step percentage at which controlnet should stop being applied - no matter what start_percent is set on timestep keyframes, they won’t take effect once this end_percent is reached.
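To get a feel for what start_percent and end_percent mean, the sketch below lists the sampling steps during which a ControlNet would be active under a simple linear reading of the percentages. This is a simplification made for illustration: the sampler may map percentages to noise levels rather than to step indices, and the function name `active_steps` is hypothetical.

```python
def active_steps(total_steps: int, start_percent: float, end_percent: float) -> list:
    """Steps whose progress step / total_steps lies in [start_percent, end_percent)."""
    return [
        step for step in range(total_steps)
        if start_percent <= step / total_steps < end_percent
    ]


# The windows used for the three ControlNets in this article, with 20 sampling steps
tile_steps = active_steps(20, 0.35, 0.6)
brightness_steps = active_steps(20, 0.0, 1.0)
depth_steps = active_steps(20, 0.0, 0.2)

print("tile:", tile_steps)
print("brightness:", brightness_steps)
print("depth:", depth_steps)
```

Under this reading, the depth control shapes only the first few steps, the tile control only the middle of sampling, and the brightness control the whole run.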

 
      
        1def run_inference(cliptextencode_6, cliptextencode_7, loadimage_14, tile_controlnet_values: float=0.5, brightness_controlnet_values: float = 0.35, depth_controlnet_values: float = 1.0, number_of_examples=3):
2  with torch.inference_mode():
3          for _ in range(number_of_examples):
4            controlnetapplyadvanced_28 = controlnetapplyadvanced.apply_controlnet(
                strength=tile_controlnet_values,
                start_percent=0.35,
                end_percent=0.6,
                positive=get_value_at_index(cliptextencode_6, 0),
                negative=get_value_at_index(cliptextencode_7, 0),
                control_net=get_value_at_index(controlnetloader_10, 0),
                image=get_value_at_index(loadimage_14, 0),
            )

            # Brightness ControlNet: applied over the whole sampling schedule
            controlnetapplyadvanced_27 = controlnetapplyadvanced.apply_controlnet(
                strength=brightness_controlnet_values,
                start_percent=0,
                end_percent=1,
                positive=get_value_at_index(controlnetapplyadvanced_28, 0),
                negative=get_value_at_index(controlnetapplyadvanced_28, 1),
                control_net=get_value_at_index(controlnetloader_11, 0),
                image=get_value_at_index(loadimage_14, 0),
            )

            # Depth ControlNet (optional): applied only if its checkpoint was loaded
            controlnetapplyadvanced_26 = None
            if controlnetloader_12 is not None:
                controlnetapplyadvanced_26 = controlnetapplyadvanced.apply_controlnet(
                    strength=depth_controlnet_values,
                    start_percent=0,
                    end_percent=0.2,
                    positive=get_value_at_index(controlnetapplyadvanced_27, 0),
                    negative=get_value_at_index(controlnetapplyadvanced_27, 1),
                    control_net=get_value_at_index(controlnetloader_12, 0),
                    image=get_value_at_index(loadimage_14, 0),
                )

            # Sample from the output of the last ControlNet that was actually applied
            final_conditioning = (
                controlnetapplyadvanced_26
                if controlnetapplyadvanced_26 is not None
                else controlnetapplyadvanced_27
            )

            ksampler_17 = ksampler.sample(
                # keep the seed within the unsigned 64-bit range
                seed=random.randint(1, 2**64 - 1),
                steps=20,
                cfg=8,
                sampler_name="euler",
                scheduler="normal",
                denoise=1,
                model=get_value_at_index(checkpointloadersimple_4, 0),
                positive=get_value_at_index(final_conditioning, 0),
                negative=get_value_at_index(final_conditioning, 1),
                latent_image=get_value_at_index(emptylatentimage_5, 0),
            )

            # Decode the latents and save the image for both branches
            vaedecode_18 = vaedecode.decode(
                samples=get_value_at_index(ksampler_17, 0),
                vae=get_value_at_index(vaeloader_24, 0),
            )

            saveimage_29 = saveimage.save_images(
                filename_prefix="ComfyUI", images=get_value_at_index(vaedecode_18, 0)
            )
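Each of the three `apply_controlnet` calls above is active for a different slice of the sampling schedule, set by `start_percent` and `end_percent`. As a rough illustration only, the sketch below maps those windows onto the 20 sampler steps. It assumes that the percentage maps linearly to the step index, which is a simplification: ComfyUI derives the actual cut-off points from the noise schedule. The helper name `active_controlnets` is hypothetical.

```python
# Windows copied from the apply_controlnet calls above: (start_percent, end_percent)
WINDOWS = {
    "tile": (0.35, 0.6),
    "brightness": (0.0, 1.0),
    "depth": (0.0, 0.2),
}


def active_controlnets(step, total_steps=20, windows=WINDOWS):
    """Return the ControlNets whose window covers `step` (0-based).

    Assumption: progress is simply step / total_steps. ComfyUI converts the
    percentages through the noise schedule, so the real boundaries can fall
    on slightly different steps.
    """
    progress = step / total_steps
    return [name for name, (start, end) in windows.items() if start <= progress < end]


for step in (0, 5, 8, 15):
    print(step, active_controlnets(step))
```

Under this simplified view, the depth ControlNet only shapes the first few steps, the tile ControlNet steps in around the middle, and the brightness ControlNet guides the whole run.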

 
      
%cd ComfyUI
# If you are in Colab, it could be quicker to use the following paths:
# OUTPUT_DIR = "/content/ComfyUI/output/"
# OUTPUT_DIR_NR = "/content/ComfyUI/output/non_readable/"

# If you are not in Colab:
OUTPUT_DIR = "output/"
OUTPUT_DIR_NR = "output/non_readable/"

# Create the folder for unreadable QR codes if it does not exist yet
os.makedirs(OUTPUT_DIR_NR, exist_ok=True)

Choose a prompt from the first step, based on your chosen company, or write a prompt from scratch.

 
      
print(image_prompt)
print(chain_answer)
print(answer)

 
      
prompt_text = "a Deep Lake in the forest"
prompt_text_negative = "ugly, bad, artifacts"
# The input image must be in the ComfyUI/input/ folder
input_image_path = "activeloop_qr.jpg"

cliptextencode_6, cliptextencode_7, loadimage_14 = load_text_and_image(prompt_text, prompt_text_negative, input_image_path)

Run the inference function and check the generated images in the output folder

 
      
run_inference(cliptextencode_6, cliptextencode_7, loadimage_14, tile_controlnet_values=0.5, brightness_controlnet_values=0.35, depth_controlnet_values=1.0, number_of_examples=1)
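Because the right ControlNet strengths depend on the prompt and the checkpoint, it can help to sweep a few combinations instead of tuning a single call by hand. The sketch below only builds the keyword arguments for such a sweep. The candidate values are assumptions chosen around the ones used above, and `strength_grid` is a hypothetical helper, not part of ComfyUI.

```python
from itertools import product


def strength_grid(tile_values, brightness_values, depth_values):
    """Build one keyword-argument dict per combination of ControlNet strengths."""
    return [
        {
            "tile_controlnet_values": tile,
            "brightness_controlnet_values": brightness,
            "depth_controlnet_values": depth,
        }
        for tile, brightness, depth in product(tile_values, brightness_values, depth_values)
    ]


# Illustrative candidate values (assumptions, not recommendations)
grid = strength_grid(
    tile_values=[0.4, 0.5, 0.6],
    brightness_values=[0.25, 0.35],
    depth_values=[1.0],
)

for params in grid:
    print(params)
    # In the notebook, each combination could be passed to the function used above:
    # run_inference(cliptextencode_6, cliptextencode_7, loadimage_14, number_of_examples=1, **params)
```

The scannability check later in this section can then discard the combinations that produce unreadable codes.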

Depending on the checkpoint loaded by CheckpointLoaderSimple (see the load_checkpoints function), you will get different results with the same prompt.

For example, we ran the same prompt through several models: some generate images rich in detail, while others produce a QR code that barely differs from the original. In other words, a prompt has to be tuned to the chosen model.

With v1-5-pruned-emaonly.safetensors

With dreamshaper_8.safetensors

With revAnimated_v122EOL.safetensors

Keep only truly scannable QR codes

 
      
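import os
import shutil

import cv2
from PIL import Image
from qreader import QReader

qreader = QReader()


def keep_readable_qrcodes(folder):
    """Move images without a decodable QR code to OUTPUT_DIR_NR and return the readable paths."""
    images = []
    for filename in os.listdir(folder):
        path = os.path.join(folder, filename)
        img = cv2.imread(path)  # None for sub-folders and non-image files
        if img is None:
            continue

        # OpenCV loads images as BGR; convert to RGB before decoding
        decoded_texts = qreader.detect_and_decode(image=cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        # A detected but undecodable QR code yields None, so check the entries themselves
        if decoded_texts and any(decoded_texts):
            print(decoded_texts)
            images.append(path)
        else:
            print(f"non readable: {filename}")
            shutil.move(path, os.path.join(OUTPUT_DIR_NR, filename))

    return images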

 
      
# images = keep_readable_qrcodes("/content/ComfyUI/output/")
keep_readable_qrcodes(OUTPUT_DIR)
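A code that decodes is not necessarily a code that decodes to the right address. As a small extension under stated assumptions, the sketch below compares the decoded strings with the URL that was encoded. `points_to_expected_url` and `normalize` are hypothetical helpers, the URLs are placeholders, and the input is assumed to be a sequence of decoded strings with `None` for codes that were detected but not decoded.

```python
def normalize(url):
    """Strip surrounding whitespace and a trailing slash before comparing."""
    return url.strip().rstrip("/")


def points_to_expected_url(decoded_texts, expected_url):
    """Return True if any decoded QR payload matches the expected URL."""
    target = normalize(expected_url)
    return any(
        text is not None and normalize(text) == target for text in decoded_texts
    )


# Placeholder inputs, made up for illustration
print(points_to_expected_url(("https://example.com/",), "https://example.com"))
print(points_to_expected_url((None,), "https://example.com"))
```

A check like this could run right after decoding, so that images pointing to a different address are treated like unreadable ones.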

Apply logo to generated images to make them more intriguing

 
      
generated_image = Image.open("<YOUR_GENERATED_IMAGE_PATH>")
img_with_logo = qr_with_logo("activeloop_logo.jpg", generated_image, "generated_image_with_logo.jpg")
img_with_logo

Limitations of Our Approach

Overall, the ControlNet model required extensive manual tuning of parameters. There are many ways to control the QR code generation process, but none of them is entirely reliable. The problem intensifies when you want to account for the input product image as well.
Adding an image to the input might offer more control and enable various use cases, but it significantly restricts what Stable Diffusion can generate: usually only the style of the image changes, and little of the QR structure is fitted. Moreover, we saw greater success with text-to-image than with image-to-image using logo masks. However, text-to-image wasn’t as desirable, because we believe logos are essential in product QR codes.
From our examples, it’s evident that the generated products don’t match the actual products one-to-one. If the goal is to advertise a specific product, even a minor mismatch could be misleading. Nonetheless, we believe that LoRA models or a different type of preprocessor model could address these issues.
Automated image prompts can sometimes be confusing, drawing focus to unimportant details within the context. This is particularly problematic if we don’t have enough relevant textual information to build upon. It is also an opportunity to make further use of the Deep Lake Vector Store, analyzing the image bind embeddings for a better understanding of the content on e-commerce websites.

Conclusion: Scalable Prompt Generation Achieved, QR Code Generation Remains Unreliable

Deep Lake combined with LangChain can significantly reduce the costs of analyzing the contents of a website to provide image descriptions in a scalable way. Thanks to the Deep Lake Vector Store, we can save a large number of documents and images along with their embeddings. This allows us to iteratively adjust the image prompts and efficiently filter based on embedding similarities. Taking into account all of the limitations we’ve discussed, we believe that more experimentation with ControlNet is needed to generate product QR codes that are reliable and applicable for real-world businesses. The choice of checkpoint also remains very important and must be made based on the specific use case.

I hope that you find this useful and already have many ideas on how to build on it further. Thank you for reading. I wish you a great day, and see you in the next one.

FAQs

What is prompt engineering?

Prompt engineering is the practice of carefully crafting inputs (prompts) to be given to AI models, particularly language models, in order to elicit the desired output. It involves understanding how the model interprets inputs and using that knowledge to achieve more accurate, relevant, or creative responses.

What is Stable Diffusion?

Stable Diffusion is a generative artificial intelligence model designed for text-to-image synthesis. It can generate photorealistic images from textual input, which lets users create artwork quickly and autonomously. Besides text-to-image, it can also be used for image-to-image generation and to create videos and animations. It was originally launched in 2022.

Is Stable Diffusion free to use?

Stable Diffusion is an open-source project, which means it is freely available for anyone to use. You can access and use Stable Diffusion without any cost, subject to the terms of its open-source license.

What is ControlNet?

ControlNet is an innovative neural network architecture that integrates extra conditions to manage the control of diffusion models. These techniques include edge and line detection, human poses, image segmentation, depth maps, image styles, or simple user scribbles, allowing for conditioned output images.

What is AUTOMATIC1111?

AUTOMATIC1111 is a robust web-based user interface (WebUI) tailored for Stable Diffusion, an AI model for text-to-image generation. It provides an intuitive platform for creating remarkable images from textual prompts.

How to use ComfyUI?

ComfyUI is a powerful and modular stable diffusion GUI. To use it, you need to clone the ComfyUI repository, install its requirements, and run the main.py file. It allows for the creation of schemas from the GUI and transforms these schemas into code for image generation tasks.

What is a QR code and how does it work?

A QR code, or Quick Response code, is a two-dimensional barcode that stores data. It works by encoding information in black squares arranged on a white grid. When scanned by a QR code reader or smartphone camera, the encoded data is decoded and can trigger actions such as opening a website or displaying text.
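The size of that grid is fixed by the version of the symbol: in the QR code standard, version 1 is 21×21 modules, and each later version adds 4 modules per side, up to version 40. A minimal sketch of that relationship follows; the function name `modules_per_side` is hypothetical.

```python
def modules_per_side(version):
    """Number of modules along one side of a QR code of the given version (1-40)."""
    if not 1 <= version <= 40:
        raise ValueError("QR code versions range from 1 to 40")
    return 17 + 4 * version


for version in (1, 2, 10, 40):
    side = modules_per_side(version)
    print(f"version {version}: {side} x {side} modules")
```

Longer URLs need a higher version, and therefore a denser grid, which leaves less room for artistic detail per module.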

How to make artistic QR codes?

Making artistic QR codes involves incorporating design elements, colors, and sometimes logos into the QR code without compromising its scan-ability. This can be achieved through specialized software or online tools that allow for the customization of QR codes while ensuring they remain functional.
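Artistic codes can stay scannable largely because QR codes carry built-in error correction. The standard defines four levels (L, M, Q, H), which can restore roughly 7%, 15%, 25% and 30% of the data. The sketch below is only a rule-of-thumb lookup. It assumes that stylization can be treated like damage, which real scanners may handle better or worse, and `lowest_sufficient_level` is a hypothetical helper.

```python
# Approximate share of the data each error correction level can restore
RECOVERY_CAPACITY = {"L": 0.07, "M": 0.15, "Q": 0.25, "H": 0.30}


def lowest_sufficient_level(expected_damage):
    """Pick the lowest level whose nominal capacity covers `expected_damage` (0-1).

    Returns None when even level H is nominally insufficient. Treat the
    result as a starting point and always test with a real scanner.
    """
    for level, capacity in RECOVERY_CAPACITY.items():
        if expected_damage <= capacity:
            return level
    return None


print(lowest_sufficient_level(0.10))
print(lowest_sufficient_level(0.28))
print(lowest_sufficient_level(0.50))
```

In practice, this is why base QR codes for artistic generation are often created with a high error correction level before any styling is applied.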

What is image synthesis?

Image synthesis is the process of generating new images from textual descriptions, existing images, or a combination of both, using artificial intelligence and machine learning models. It involves creating visually coherent and contextually relevant images based on the input provided.

What are LoRA models?

LoRA (Low-Rank Adaptation of Large Language Models) is a popular and lightweight training technique that significantly reduces the number of trainable parameters. It works by inserting a small number of new weights into the model and training only those. In this article, LoRA models are suggested as a possible way to make Stable Diffusion generate product QR codes that match the actual products more closely.
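To make the parameter saving concrete: for a weight matrix of shape d × k, LoRA keeps the original weights frozen and trains two small matrices of shapes d × r and r × k, so the trainable count drops from d·k to r·(d + k). The sketch below just does that arithmetic. The layer size and rank are illustrative assumptions, not values from this article.

```python
def lora_trainable_params(d, k, r):
    """Trainable parameters added by a rank-r LoRA update to a d x k weight matrix."""
    return r * (d + k)


d, k, r = 768, 768, 8  # illustrative layer size and rank
full = d * k
lora = lora_trainable_params(d, k, r)

print(f"full fine-tuning: {full} parameters")
print(f"LoRA (r={r}): {lora} parameters ({lora / full:.1%} of full)")
```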
