# Seamlessly Morph Your Photos with Stable Diffusion and ComfyUI
This video tutorial dives into the powerful capabilities of ComfyUI and how it integrates with Stable Diffusion models to achieve remarkable image transformations. With this setup, you can take a single input image and drastically alter its context, style, and even blend multiple subjects seamlessly.
[00:01] The core technology discussed is Flux Kontext, a model designed to accept image inputs and understand natural language prompts. This allows for the creation of a consistent character that can be placed in diverse environments or rendered in various artistic styles, making it incredibly versatile.
[00:02] The workflow begins with the “Load Image” node. Here, you can upload your desired base image. The video demonstrates using an image of the speaker as an example.
[00:04] Next, you’ll need to “Set size of output”. This is controlled by the “Size” node, where you can input the desired width and height for your generated image.
[00:05] The crucial step of “Write prompt” is handled by the “CLIP Text Encode (Positive Prompt)” node. This is where you describe the scene or transformation you envision. The video shows an example prompt: “This man is sleeping in a bed.”
[00:06] Following the prompt, the “FluxGuidance” node allows you to adjust the “guidance” parameter, influencing how closely the output adheres to your prompt. A value of 2.5 is used in the demonstration.
[00:07] Finally, the “Generate” step runs the graph, and the finished result appears in the “Save Image” node. The video shows the resulting image of the person asleep in bed.
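For readers who want to see how these pieces fit together outside the graphical editor, here is a minimal, illustrative sketch of the same chain of nodes expressed in ComfyUI’s API (JSON) workflow format as a Python dict. The node ids, class names, and inputs are assumptions based on the node titles mentioned above, not the exact template; the real Kontext workflow ships with additional loader and sampling nodes that are omitted here.

```python
# Illustrative only: a simplified Flux Kontext graph in ComfyUI's API format.
# Node ids, class names, and upstream references (e.g. ["10", 1] for a model/CLIP
# loader that is not shown) are assumptions, not the exact shipped template.
workflow = {
    "1": {"class_type": "LoadImage",                     # Load Image
          "inputs": {"image": "my_photo.png"}},
    "2": {"class_type": "CLIPTextEncode",                # Write prompt
          "inputs": {"text": "This man is sleeping in a bed.",
                     "clip": ["10", 1]}},
    "3": {"class_type": "FluxGuidance",                  # guidance = 2.5
          "inputs": {"conditioning": ["2", 0], "guidance": 2.5}},
    "4": {"class_type": "EmptySD3LatentImage",           # Set size of output
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    # ... the sampler and "VAE Decode" nodes sit here in the full template ...
    "9": {"class_type": "SaveImage",                     # Generate / save result
          "inputs": {"images": ["8", 0], "filename_prefix": "kontext"}},
}
```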
[00:11] The presenter then illustrates the flexibility of this system by showing several generated images. These include transforming the subject into an “anime character” in various settings:
- [00:20] On a tropical beach.
- [00:21] In a neon-lit city.
- [00:23] As a comic book character with a red, monstrous companion.
- [00:24] Engaged in a comic book style fight.
- [00:25] Eating a meal in a comic book style restaurant.
- [00:26] Fishing for a boot in a serene lake.
- [00:28] As a character playing a video game.
[00:39] The key takeaway is that you can manipulate the “style” of the generated image, transforming it from realistic to anime and back again, demonstrating the model’s adaptability.
[00:46] The video then transitions to explaining how to “Install models”, specifically focusing on the workflow within ComfyUI.
[00:54] For those who need it, there’s a “text guide” available in the description, with links to download the necessary models.
[01:06] To get Flux Kontext running, you’ll first need to update to the latest ComfyUI version.
[01:18] You’ll need to download the Flux Kontext model and place it in your ComfyUI installation’s /models/diffusion_models/ directory. The video provides specific download links for different model sizes:
- Full 24GB: https://huggingface.co/black-forest-labs/FLUX-1-Kontext-dev/tree/main
- fp8 11GB: https://huggingface.co/comfyanonymous/FLUX-1-kontext-dev_ComfyUI/blob/main/split_files/diffusion_models/flux1-dev-kontext_fp8_scaled.safetensors
- GGUF 4-12GB: https://huggingface.co/QuantStack/FLUX-1-Kontext-dev-GGUF/tree/main
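If you prefer to fetch the weights from a script rather than the browser, a minimal sketch using the `huggingface_hub` library is shown below. The repo id and filename are taken from the fp8 link above, and the destination path assumes a default ComfyUI checkout at `./ComfyUI`; adjust both to match the variant you pick.

```python
from huggingface_hub import hf_hub_download

# Assumptions: the fp8 variant linked above and a ComfyUI checkout at ./ComfyUI.
hf_hub_download(
    repo_id="comfyanonymous/FLUX-1-kontext-dev_ComfyUI",
    filename="split_files/diffusion_models/flux1-dev-kontext_fp8_scaled.safetensors",
    local_dir="ComfyUI/models/diffusion_models",
)
# Note: with local_dir set, the file keeps the repo's sub-path inside that folder;
# move it up into models/diffusion_models/ if ComfyUI expects it there directly.
```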
[01:38] The choice of model depends on your “GPU” and its available “VRAM”. The GGUF models generally require less VRAM.
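A quick way to decide is to check how much VRAM your card reports. The thresholds in this sketch are rough rules of thumb based on the file sizes quoted above, not official requirements.

```python
import torch

# Rough heuristic based on the model file sizes listed above; not an official
# requirement table. Assumes a single CUDA GPU at index 0.
if torch.cuda.is_available():
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if vram_gb >= 24:
        print(f"{vram_gb:.0f} GB VRAM: the full 24 GB checkpoint should fit.")
    elif vram_gb >= 12:
        print(f"{vram_gb:.0f} GB VRAM: the fp8 (~11 GB) model is a safer choice.")
    else:
        print(f"{vram_gb:.0f} GB VRAM: pick one of the smaller GGUF quantizations.")
else:
    print("No CUDA GPU detected; expect CPU offloading to be slow.")
```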
[02:08] The video then demonstrates how to install the models using the “ComfyUI Manager”. You need to install the “missing custom nodes”.
[02:11] Within the manager, you’ll find a list of available nodes.
[02:43] To find the necessary models, you can search for “all submodels” in the manager. The presenter highlights the specific “ae.safetensors” model for the VAE.
[02:53] If the models aren’t automatically recognized, press ‘R’ or click “Refresh” to update the list.
[03:01] The workflow itself starts by loading the image using the “Load Image” node.
[03:06] Then, you set the desired output dimensions using the “Size” node. The presenter changes the dimensions to 1344x768.
[03:17] The prompt is then written in the “CLIP Text Encode (Positive Prompt)” node. The presenter changes the prompt to “This man is riding a motorcycle near a beautiful lake.”
[03:30] After these settings are configured, the generation process begins. The “SamplerCustomAdvanced” node is used for sampling, and the output is then decoded by the “VAE Decode” node.
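To make the same adjustments outside the editor, the illustrative `workflow` dict from the earlier sketch can simply be updated before the graph is queued; the node ids “4” and “2” are the assumed ids from that example, not fixed values.

```python
# Reusing the illustrative `workflow` dict from the earlier sketch
# (node ids "4" and "2" are assumptions from that example).
workflow["4"]["inputs"].update({"width": 1344, "height": 768})
workflow["2"]["inputs"]["text"] = (
    "This man is riding a motorcycle near a beautiful lake."
)
```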
[03:43] The model accurately depicts the character with the specified attire, though the helmet is missing, as it wasn’t explicitly mentioned in the prompt.
[04:11] The presenter then experiments with a new prompt: “He’s a comic book character eating sushi in a restaurant in Japan. The style is expressionism oil painting.”
[04:51] For more complex scenarios, you can use the “Image Concatenate” node to combine multiple images. This allows for generating images with multiple subjects, like the two people walking together in New York City.
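The node stitches two reference images side by side before they reach the model. If you want to prepare such a combined reference outside ComfyUI, a small Pillow sketch like the one below does the equivalent; the file names are placeholders.

```python
from PIL import Image

def concat_horizontal(path_a: str, path_b: str, out_path: str) -> None:
    """Stitch two images side by side, matching their heights first."""
    a, b = Image.open(path_a), Image.open(path_b)
    height = min(a.height, b.height)
    a = a.resize((round(a.width * height / a.height), height))
    b = b.resize((round(b.width * height / b.height), height))
    canvas = Image.new("RGB", (a.width + b.width, height))
    canvas.paste(a, (0, 0))
    canvas.paste(b, (a.width, 0))
    canvas.save(out_path)

concat_horizontal("person_a.png", "person_b.png", "combined_reference.png")
```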
[07:57] The Flux Kontext model is demonstrated to be incredibly powerful for “realizing your prompts” and creating diverse stylistic outputs.
[09:25] This flexibility allows for generating images of characters in various scenarios, such as a “comic book character eating sushi” or two characters “walking together in New York.”
[09:35] If you prefer not to work with ComfyUI directly, the workflow can also be run from the “command line”.
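As a sketch of that command-line route: assuming a ComfyUI server is already running on its default address (127.0.0.1:8188), a workflow exported in API format can be queued with a short script like this. The file name is a placeholder for whatever you exported from the editor.

```python
import json
import urllib.request

SERVER = "http://127.0.0.1:8188"  # default ComfyUI address; adjust if needed

# Load a workflow previously exported from ComfyUI in API format.
with open("kontext_workflow_api.json", "r", encoding="utf-8") as f:
    graph = json.load(f)

# Queue the job; finished results can be inspected via the /history endpoint.
req = urllib.request.Request(
    f"{SERVER}/prompt",
    data=json.dumps({"prompt": graph}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    prompt_id = json.load(resp)["prompt_id"]
print(f"Queued prompt {prompt_id}; see {SERVER}/history/{prompt_id} when it finishes.")
```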
This comprehensive approach to image generation using Flux Kontext in ComfyUI offers immense creative potential for artists and developers alike.