Image-to-Image Prompting
Transform existing images through AI by providing reference inputs combined with text instructions — controlling style, content, and the degree of transformation to produce targeted visual outputs.
Introduced: Image-to-image (img2img) generation became widely accessible with the release of Stable Diffusion in 2022. The technique uses an existing image as the starting point for the diffusion process, allowing the AI to transform, restyle, or enhance images while retaining structural elements from the original. Rather than generating from pure noise, the model partially corrupts the input image by adding controlled noise, then reconstructs it guided by a text prompt. The denoising strength parameter controls how much the output deviates from the input — a concept that gave users precise, slider-based control over the balance between faithfulness and creative transformation.
Modern LLM Status: Image-to-image is a standard workflow in all major image generation platforms including Stable Diffusion, Midjourney, and DALL-E. The technique forms the foundation for more specialized approaches like inpainting (editing specific regions), outpainting (extending image boundaries), and ControlNet workflows (guiding generation with structural maps). Every serious image generation pipeline now supports img2img as a core capability, and it remains the primary method for iterative visual refinement, style transfer, and concept art pipelines.
Start from an Image, Not from Noise
Standard text-to-image generation begins with pure random noise and gradually refines it into a coherent image guided only by a text prompt. Image-to-image flips this starting point: instead of noise, the model begins with an existing image. It partially corrupts that image by adding a controlled amount of noise, then reconstructs it under the guidance of a new text prompt. The result is a transformation that preserves structural elements from the original while applying the creative direction specified in the prompt.
The critical control is the denoising strength slider. This single parameter determines how much the output can deviate from the source image. At low values (0.2–0.4), the model makes subtle adjustments — cleaning up lines, refining details, or applying minor stylistic shifts while preserving most of the original composition. At high values (0.7–0.9), the model dramatically transforms the image, keeping only the rough layout and proportions while reimagining everything else. This gives practitioners precise, predictable control over the balance between faithfulness to the original and the degree of creative transformation.
Think of it like a painter working over a pencil sketch. A light touch preserves the sketch’s lines while adding color and detail. A heavy hand covers the sketch entirely, using it only as a loose compositional guide for something entirely new.
Text-to-image generation is inherently unpredictable — the same prompt can produce wildly different compositions across runs. Image-to-image solves this by anchoring generation to a known starting point. You control the composition, the subject placement, and the overall layout through your reference image, while the prompt controls the aesthetic and thematic transformation. This combination of visual anchoring and textual direction makes img2img far more controllable than pure text-to-image generation for iterative creative workflows.
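Under the hood, the strength slider is typically implemented by noising the input image forward through only part of the diffusion schedule, then denoising from that point. A minimal sketch of that mapping, assuming a 50-step schedule; the function name is illustrative, and the step-truncation rule mirrors the approach used by common img2img pipelines rather than any one library's exact code:

```python
# Illustrative sketch: how a denoising-strength value is commonly mapped
# onto the diffusion schedule. Real pipelines differ in details; the
# step-truncation idea below mirrors widely used img2img implementations.

def img2img_schedule(strength: float, num_inference_steps: int = 50):
    """Return (noising_steps, denoising_start_index) for a given strength.

    strength = 0.0 -> no noise added, output stays close to the input
    strength = 1.0 -> input fully noised, equivalent to text-to-image
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Noise the input forward for a strength-proportional slice of the schedule...
    noising_steps = min(int(num_inference_steps * strength), num_inference_steps)
    # ...then denoise over only those final steps, guided by the new prompt.
    start_index = num_inference_steps - noising_steps
    return noising_steps, start_index

# Low strength re-runs only a small tail of the schedule, so most of the
# original image survives; high strength re-runs most of it.
print(img2img_schedule(0.3))   # subtle refinement
print(img2img_schedule(0.75))  # heavy transformation
```

The slider is thus roughly linear in diffusion steps: raising the strength increases the fraction of the schedule, and therefore of the image, that is regenerated from scratch.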
The Image-to-Image Process
Four stages from reference image to transformed output
Provide Reference Image
Upload the source image that will serve as the structural foundation for the transformation. This can be a rough pencil sketch, a photograph, a previous AI generation, or a digital mockup. The reference image defines the composition, spatial layout, and proportional relationships that the model will work from.
A hand-drawn pencil sketch of a two-story building with large windows, a pitched roof, and surrounding landscaping.
Write Transformation Prompt
Describe the desired output, emphasizing the changes you want from the original rather than restating what is already in the image. Focus on the target style, medium, lighting, color palette, and any specific modifications. The prompt steers the reconstruction — think of it as instructions for how the model should reinterpret the reference during the denoising phase.
“Photorealistic architectural rendering, modern glass and steel building, golden hour lighting, lush green landscaping, professional photography, 8K resolution.”
Set Denoising Strength
Control how much the output can deviate from the source image. This is the most important parameter in the entire img2img workflow. Low values (0.2–0.4) preserve most of the original’s detail and make only subtle adjustments. Medium values (0.4–0.6) allow significant stylistic changes while maintaining the overall structure. High values (0.7–0.9) dramatically transform the image, retaining only rough composition and proportions.
For sketch-to-rendering: start at 0.65 to allow enough creative freedom for the model to add photorealistic detail while keeping the building’s layout intact.
Iterate with Adjustments
Review the output and refine by adjusting the denoising strength, modifying the prompt, or using a previous output as the new input for another pass. Iteration is central to the img2img workflow — each pass can bring the result closer to your vision. You can also feed the output back as a new reference for progressive refinement, gradually building toward the final result across multiple generations.
First pass at 0.65 produces a good rendering but the windows are too small. Lower denoising to 0.3 and add “large floor-to-ceiling windows” to the prompt. Run again using the first output as the new reference.
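The two-pass adjustment above can be scripted. In this sketch, `img2img` is a hypothetical placeholder for whatever backend you actually call (a local Stable Diffusion pipeline, a hosted API), not a real library function:

```python
# Hypothetical driver for the iterate-and-adjust loop described above.
# `img2img` is a stand-in for your actual generation backend.

def img2img(image, prompt, strength):
    # Placeholder: a real backend would return the transformed image.
    return f"render({image!r}, strength={strength})"

sketch = "building_sketch.png"
base_prompt = ("Photorealistic architectural rendering, modern glass and "
               "steel building, golden hour lighting")

# Pass 1: high strength to establish the photorealistic look.
first_pass = img2img(sketch, base_prompt, strength=0.65)

# Review: the windows came out too small. Pass 2 uses the first output as
# the new reference, a low strength to protect the composition, and an
# amended prompt that targets the fix.
final = img2img(first_pass,
                base_prompt + ", large floor-to-ceiling windows",
                strength=0.3)
```

The key moves are all in the second call: the reference changes to the previous output, the strength drops to protect what already works, and the prompt gains only the requested change.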
See the Difference
How denoising strength transforms the same sketch
Low Denoising (0.3)
A rough pencil sketch of a building with large windows and surrounding trees.
The sketch’s lines are cleaned up and refined. Pencil strokes become smoother, proportions are corrected slightly, and basic shading is added. The output still clearly reads as a sketch — the medium and style are preserved while the execution is improved. Fine details from the original, including line weight and hatching patterns, remain recognizable.
High Denoising (0.75)
“Photorealistic architectural rendering, modern glass and steel building, golden hour lighting, lush landscaping, professional photography.”
A fully photorealistic architectural rendering that preserves the sketch’s overall layout and proportions — the building’s footprint, window placement, and tree positions match the original composition. But every surface is now rendered with realistic materials: glass reflecting the sky, steel beams catching golden hour light, detailed landscaping with individual leaves and grass blades. The pencil sketch is gone; only its structure remains.
Image-to-Image in Action
Three practical transformation workflows
A hand-drawn pencil sketch of a fantasy castle on a cliff, with turrets, a drawbridge, and a winding path leading to the entrance. The sketch is rough but clearly conveys the spatial arrangement and scale of the structure.
Prompt: “Detailed digital illustration, fantasy castle perched on dramatic cliffs, epic scale, volumetric lighting, atmospheric fog, rich color palette, concept art quality, matte painting style.”
Denoising Strength: 0.70
Result: The rough sketch transforms into a polished digital illustration. The castle’s position on the cliff, the turret placement, and the winding path all match the original layout. But now every surface has texture — weathered stone walls, moss-covered battlements, volumetric fog rolling through the valley below. The sketch provided the composition; the prompt and high denoising provided the finish.
A photograph of a tree-lined suburban street in full summer — green canopy, bright sunlight, green lawns, a clear blue sky. The street has parked cars, houses with front porches, and a sidewalk running along both sides.
Prompt: “Same street scene in deep winter, heavy snowfall, bare tree branches, snow-covered roofs and lawns, overcast sky, warm light glowing from house windows, fresh tire tracks in the snow.”
Denoising Strength: 0.55
Result: The street layout, house positions, car placement, and sidewalk structure remain intact from the original photograph. But the season has changed entirely: green leaves become bare branches, lawns are blanketed in snow, the sky shifts from blue to overcast grey, and warm interior light spills from the windows. The moderate denoising strength preserves the exact spatial arrangement while allowing the seasonal transformation to feel natural and complete.
A portrait photograph of a person sitting in a garden, natural lighting, the subject centered in the frame with flowering bushes and a wooden fence in the background. Standard photographic quality with sharp focus on the subject.
Prompt: “Oil painting in the style of the Impressionists, visible brushstrokes, soft edges, vibrant dappled light, rich color palette with blues and warm yellows, canvas texture, gallery-quality fine art painting.”
Denoising Strength: 0.60
Result: The photograph’s composition is preserved — the subject’s pose, the garden layout, and the spatial relationships remain the same. But the photographic medium is replaced entirely with oil painting characteristics: visible brushstrokes define the flowering bushes, the subject’s features are softened with Impressionist handling, dappled light plays across the scene with broken color technique, and the entire surface has a canvas-like texture. The moderate denoising allows the style to change completely while the composition stays anchored to the original photograph.
When to Use Image-to-Image
Best for controlled transformations with a known starting point
Perfect For
When you have an image that is close to what you need but requires refinement, style adjustment, or quality improvement — img2img lets you evolve the existing image rather than start over.
Converting rough hand-drawn sketches, wireframes, or doodles into polished digital illustrations, renderings, or photorealistic outputs while preserving the original composition.
Transforming photographs into different artistic media — oil paintings, watercolors, anime, pixel art — while maintaining the subject, composition, and spatial relationships.
Applying the same stylistic transformation across a series of source images for consistent visual output — such as converting a product catalog to a unified illustration style.
Skip It When
If you have no reference image and want the model to create a scene entirely from your text description, standard text-to-image is the correct approach.
If the source image has nothing worth preserving — no useful composition, layout, or subject placement — then img2img adds complexity without benefit over pure text-to-image.
When you need specific pixels preserved exactly as they are, img2img will always introduce some variation. For precise edits to specific regions, use inpainting instead.
Use Cases
Where image-to-image delivers the most value
Concept Art Pipeline
Artists sketch rough compositions by hand, then use img2img to rapidly explore different rendering styles, lighting conditions, and color palettes — iterating from thumbnail to finished concept in a fraction of traditional timelines.
Photo Enhancement
Improve photograph quality by using low denoising strength to refine lighting, sharpen details, reduce noise, or subtly adjust the mood of existing photos without altering their composition or subject matter.
Seasonal Marketing Variants
Transform a single product or brand image across seasons — summer to winter, day to night, spring to autumn — creating campaign-ready visual variants from one source photograph while maintaining brand-consistent composition.
Architectural Sketch to Render
Convert architectural hand-drawn sketches or simple 3D wireframes into photorealistic building renderings, allowing architects and clients to visualize designs before committing to full 3D modeling.
Design Iteration
Use each generation as the input for the next pass, progressively refining details, adjusting elements, and converging on the final design through multiple controlled iterations rather than one-shot generation.
Historical Photo Colorization
Transform black-and-white or faded historical photographs into vivid, colorized versions using low denoising strength and prompts specifying realistic color palettes appropriate to the era and subject matter.
Where Image-to-Image Fits
Image-to-image bridges pure generation and precise editing
The most effective img2img workflows use multiple passes at different denoising strengths. Start with a high-denoising pass (0.7–0.8) to establish the overall look and feel, then feed that output back as the reference for a lower-denoising pass (0.3–0.4) to refine details without losing the composition you have established. This progressive approach gives you both creative freedom and fine-grained control in a single pipeline.
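That coarse-to-fine workflow reduces to a small loop over a descending strength schedule. As before, `img2img` is a placeholder for your real generation call, and the (0.75, 0.35) default simply encodes the high-then-low pattern described above:

```python
# Sketch of the progressive multi-pass workflow: a high-strength pass to
# set the look, then a low-strength pass to refine detail without losing
# the established composition. `img2img` is a placeholder, not a real API.

def img2img(image, prompt, strength):
    return (image, strength)  # placeholder: tag the input with the strength used

def progressive_refine(image, prompt, schedule=(0.75, 0.35)):
    """Feed each pass's output back as the reference for the next pass."""
    for strength in schedule:
        image = img2img(image, prompt, strength)
    return image
```

Descending schedules are the usual pattern: each later pass gets less freedom than the one before, so it can only polish what the earlier passes established.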