How to Train a Z-Image-Turbo LoRA with AI Toolkit

Mastering Z-Image-Turbo: A Guide to Training LoRA with AI Toolkit

Welcome to an in-depth guide on training a LoRA for the Z-Image-Turbo model using AI Toolkit. If you’ve been eager to fine-tune this exceptionally fast and efficient image generation model, you’re in the right place. We’ll walk through the entire process, from understanding the model’s unique properties to creating your own custom LoRA.

[0:08.835]

Z-Image-Turbo is a new model that has quickly captured the attention of the AI art community, largely because it generates high-quality images at remarkable speed. In this tutorial, we will train a custom LoRA for it using the user-friendly AI Toolkit.

[0:15.543]

Z-Image-Turbo is a distilled version of the Z-Image model, optimized for speed: it can generate images in as few as 8 steps. The developers plan to release Z-Image-Base, the non-distilled model that is ideal for training, but in the meantime the Turbo version can be trained with a specific approach. The model excels at photorealistic image generation and is surprisingly compact at only 6 billion parameters, much smaller than contemporaries like Flux (Flux.1 has 12 billion parameters).

“Z-Image is a powerful and highly efficient image generation model with 6B parameters. Currently there are three variants:

  • Z-Image-Turbo - A distilled version of Z-Image that matches or exceeds leading competitors with only 8 NFEs (Number of Function Evaluations).
  • Z-Image-Base - The non-distilled foundation model.
  • Z-Image-Edit - A variant fine-tuned on Z-Image specifically for image editing tasks.”

[1:02.903]

Training directly on a distilled, or “turbo,” model is tricky: traditional fine-tuning gradually undoes the distillation that makes the model fast. Flux Schnell posed the same challenge, and it was solved with a special training adapter that allows direct training on the turbo model without compromising its distilled behavior. That technique has now been adapted for Z-Image-Turbo.

[3:04.975]

To train Z-Image-Turbo before its base model is released, we need a Z-Image-Turbo Training Adapter. This specialized LoRA essentially “de-distills” the model during training, so your new LoRA can learn a specific style or character without breaking down the model’s distillation and introducing artifacts.

[3:19.863]

Here’s a visual breakdown of why the adapter is crucial. The Base image shows the model’s default output. Training without the adapter produces a degraded image with strange artifacts, because the model drifts back toward a non-distilled state. Training with the adapter preserves image quality while still applying the new information from your LoRA. When you’re ready to generate images (inference), you simply remove the training adapter, and your newly trained LoRA works on its own with the fast, distilled Z-Image-Turbo model.
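The adapter workflow can be pictured with standard LoRA arithmetic, where each LoRA adds a low-rank product `up @ down` to a frozen weight. The sketch below is a toy illustration, not AI Toolkit’s implementation: it assumes plain Python lists in place of tensors, made-up rank-1 factors, and a single `scale` standing in for the usual `alpha / rank` multiplier. It shows the forward pass seeing base + adapter + style during training, and only base + style at inference.

```python
# Toy model of stacking LoRA deltas on a frozen weight matrix.
# Assumptions: nested lists stand in for tensors; each LoRA is a hypothetical
# (up, down, scale) triple whose product up @ down is added to the base weight.

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_loras(base: Matrix, loras: list[tuple[Matrix, Matrix, float]]) -> Matrix:
    """Return base + sum(scale * up @ down) over all given LoRAs."""
    weight = [row[:] for row in base]  # copy so the frozen base is untouched
    for up, down, scale in loras:
        delta = matmul(up, down)
        weight = [
            [w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)
        ]
    return weight


base = [[1.0, 0.0], [0.0, 1.0]]                # frozen distilled weight (toy values)
adapter = ([[0.5], [0.0]], [[1.0, 0.0]], 1.0)   # hypothetical training adapter
style = ([[0.0], [0.25]], [[0.0, 1.0]], 1.0)    # hypothetical style LoRA being trained

# Training: the adapter is active alongside the style LoRA
# (we assume only the style factors would receive updates).
train_weight = apply_loras(base, [adapter, style])
# Inference: drop the adapter and keep only the style LoRA.
infer_weight = apply_loras(base, [style])

print(train_weight)  # [[1.5, 0.0], [0.0, 1.25]]
print(infer_weight)  # [[1.0, 0.0], [0.0, 1.25]]
```

Because the deltas simply add up, removing the adapter at inference leaves the style LoRA’s contribution intact.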

[9:00.675]

To begin, we’ll create a dataset. For this example, we’re using a collection of children’s drawings. The goal is to teach the model this specific, “bad art” style. Instead of using a trigger word, we’ll rely on descriptive captions for each image. This method teaches the model to associate common concepts (like “a house” or “a person”) with the new drawing style.
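A quick sanity check before training is to confirm that every image has a descriptive caption. The sketch below assumes the common layout of one same-named `.txt` caption per image (for example `house_01.png` and `house_01.txt`); `pair_captions` is a hypothetical helper, not part of AI Toolkit, and the file names and caption text are illustrative.

```python
# Pair each training image with its caption file and report images that lack one.
# Assumption: captions live next to the images as same-named .txt files
# (e.g. house_01.png + house_01.txt); adjust IMAGE_EXTS for your own data.
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def pair_captions(folder: Path) -> tuple[dict[str, str], list[str]]:
    """Return ({image file name: caption}, [image names with a missing or empty caption])."""
    captions: dict[str, str] = {}
    missing: list[str] = []
    for image in sorted(folder.iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue  # skip caption files and anything that is not an image
        caption_file = image.with_suffix(".txt")
        text = ""
        if caption_file.exists():
            text = caption_file.read_text(encoding="utf-8").strip()
        if text:
            captions[image.name] = text
        else:
            missing.append(image.name)
    return captions, missing


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Descriptive caption with no trigger word, as in the tutorial.
        (root / "house_01.png").write_bytes(b"")
        (root / "house_01.txt").write_text("a house with a tree and a big yellow sun", encoding="utf-8")
        (root / "person_01.jpg").write_bytes(b"")  # caption forgotten
        captions, missing = pair_captions(root)
        print(captions)  # {'house_01.png': 'a house with a tree and a big yellow sun'}
        print(missing)   # ['person_01.jpg']
```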

[10:10.875]

In the AI Toolkit, setting up the job is straightforward.

  1. Name your training job (e.g., “z_image_turbo_childrens_drawings”).
  2. For the Model Architecture, select Z-Image Turbo (w/ Training Adapter). This automatically configures the base model and the necessary training adapter LoRA.
  3. Most default settings are fine, but for a drastic style change like this, a few advanced tweaks can help. We will increase the steps to 5000 to give the model more time to learn.

Do not set the learning rate higher than the default of 1e-4; in practice, higher rates tend to “explode” the model during training, causing it to diverge.
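The job choices above can be summarized as a small settings object that also enforces the learning-rate ceiling. This is a hypothetical sketch: `TurboLoraJob` and its field names are made up for illustration and do not correspond to AI Toolkit’s config schema. The validation simply encodes the advice to keep the learning rate at or below the 1e-4 default.

```python
# Hypothetical job settings mirroring the UI choices above.
# The field names are illustrative, NOT AI Toolkit's actual config keys.
from dataclasses import dataclass

DEFAULT_LR = 1e-4  # the default learning rate; the tutorial warns against exceeding it


@dataclass
class TurboLoraJob:
    name: str
    steps: int
    architecture: str = "Z-Image Turbo (w/ Training Adapter)"
    learning_rate: float = DEFAULT_LR

    def __post_init__(self) -> None:
        if self.steps <= 0:
            raise ValueError("steps must be positive")
        if self.learning_rate > DEFAULT_LR:
            raise ValueError(
                f"learning_rate {self.learning_rate:g} is above {DEFAULT_LR:g}; "
                "higher rates tend to make training explode"
            )


# 5000 steps for a drastic style change, default learning rate.
job = TurboLoraJob(name="z_image_turbo_childrens_drawings", steps=5000)
print(job)

try:
    TurboLoraJob(name="too_hot", steps=5000, learning_rate=3e-4)
except ValueError as err:
    print(err)  # learning_rate 0.0003 is above 0.0001; higher rates tend to make training explode
```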

[11:47.885]

An experimental but powerful feature is Differential Guidance. In normal training, the model’s predictions gradually approach the target but never fully reach it. Differential Guidance amplifies the difference between the model’s current prediction and the target, effectively creating an “overshot” target beyond the real one. This encourages the model to learn more aggressively, producing a LoRA that captures the target style more thoroughly. For this training, we enable it with a scale of 3.
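The “overshot” target can be illustrated with a few numbers. This sketch assumes one plausible formulation, `prediction + scale * (target - prediction)`, which equals the ordinary target at scale 1 and lands beyond it for larger scales; `overshoot_target` is a hypothetical helper, and AI Toolkit’s actual Differential Guidance code may compute it differently.

```python
# Toy illustration of an "overshot" training target.
# Assumption: one plausible formulation,
#     boosted_target = prediction + scale * (target - prediction),
# which equals the ordinary target at scale 1 and overshoots it for scale > 1.


def overshoot_target(prediction: list[float], target: list[float], scale: float) -> list[float]:
    """Stretch the gap between the current prediction and the real target by `scale`."""
    return [p + scale * (t - p) for p, t in zip(prediction, target)]


prediction = [0.0, 0.5]  # what the model currently outputs (toy values)
target = [1.0, 1.0]      # what the training data says it should output

print(overshoot_target(prediction, target, scale=1.0))  # [1.0, 1.0]  ordinary target
print(overshoot_target(prediction, target, scale=3.0))  # [3.0, 2.0]  overshot target
```

In this formulation, the loss is computed against the boosted target, so each update pulls the model further toward the data than a plain step would, which matches the “learn more aggressively” behavior described above.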

[20:01.375]

Because we are teaching the model an entirely new style that affects the whole image composition, it helps to focus training on the early, high-noise timesteps. Setting the Timestep Bias to High Noise makes the trainer sample those timesteps more often, so the model prioritizes the overall structure and feel of the children’s drawings, which leads to a more coherent and effective style transfer.
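To make “focus on high-noise timesteps” concrete, the sketch below draws training timesteps so that noisier ones come up more often. It assumes timesteps in [0, 1) where larger values mean more noise, and models the bias with a simple power transform; `sample_timestep` and its `bias` parameter are hypothetical and are not AI Toolkit’s actual Timestep Bias implementation.

```python
# Sketch of sampling training timesteps with a bias toward high noise.
# Assumptions: t lies in [0, 1) with larger t meaning more noise, and the
# "High Noise" bias is modelled as t = u ** (1 / bias) for uniform u.
import random


def sample_timestep(rng: random.Random, bias: float = 1.0) -> float:
    """Draw t in [0, 1); bias = 1 is uniform, bias > 1 favours high noise."""
    if bias <= 0:
        raise ValueError("bias must be positive")
    return rng.random() ** (1.0 / bias)


rng = random.Random(0)
n = 10_000
uniform = [sample_timestep(rng, bias=1.0) for _ in range(n)]
high_noise = [sample_timestep(rng, bias=2.0) for _ in range(n)]

# The mean of u ** (1 / k) is k / (k + 1): about 0.5 for k = 1, about 0.667 for k = 2.
print(round(sum(uniform) / n, 2))
print(round(sum(high_noise) / n, 2))
```

A power transform is just one easy way to skew the distribution; any sampler that places more probability mass near the noisy end would serve the same purpose.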

[21:22.844]

After about 3,000 steps, the results are fantastic. The model has successfully learned the desired style. What were once photorealistic images are now charmingly crude drawings. The LoRA has effectively overridden the model’s base knowledge, transforming its output to match our dataset.

[23:34.197]

With our trained LoRA, we can head over to ComfyUI to generate some images. We load the base Z-Image-Turbo model and apply our new LoRA on its own, without the training adapter. The beauty of this setup is that we can still use the turbo settings: just 8 steps and a guidance scale of 1.
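The guidance scale of 1 also contributes to the speed. The sketch below shows the standard classifier-free guidance formula and counts model evaluations per image. `cfg_combine` and `forward_passes` are hypothetical helpers, and skipping the unconditional pass at scale 1 is an assumption about the sampler (a common optimization, but worth checking in your own setup).

```python
# Why guidance scale 1 keeps turbo inference cheap.
# Classifier-free guidance mixes two predictions: uncond + scale * (cond - uncond).
# At scale 1 this reduces to the conditional prediction, so a sampler can skip
# the unconditional forward pass. Values below are toy numbers, not model outputs.


def cfg_combine(uncond: float, cond: float, scale: float) -> float:
    """Classifier-free guidance for a single scalar prediction."""
    return uncond + scale * (cond - uncond)


def forward_passes(steps: int, guidance_scale: float) -> int:
    """Model evaluations per image, assuming the unconditional pass is skipped at scale 1."""
    return steps * (1 if guidance_scale == 1.0 else 2)


print(cfg_combine(0.25, 0.75, 1.0))  # 0.75, identical to the conditional prediction
print(forward_passes(8, 1.0))        # 8, matching the "8 NFEs" in the Z-Image description
print(forward_passes(8, 4.0))        # 16 when real guidance is applied
```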

[24:30.337]

The results speak for themselves. A prompt like “wide angle view of a dog flying a plane in the pilot seat” generates a convincing child’s drawing of the scene. The model now interprets every prompt through the lens of our trained style.

[24:54.401]

This technique opens up a world of creative possibilities. By training Z-Image-Turbo on any style you can imagine, you can combine its incredible speed with a completely custom aesthetic. Whether you’re aiming for a specific artistic movement or just want to make a photorealistic model draw like a child, this method provides a fast and effective way to do it.