WAN Prompts Are Natural Language
Unlike anime image models fine-tuned on Danbooru tags (many Stable Diffusion 1.5 and SDXL checkpoints, for example), WAN video models respond to natural-language descriptions. Write prompts as sentences, not comma-separated tags.
Image model style (don't do this for WAN):
1girl, long hair, blue eyes, school uniform, wind, smile, masterpiece, best quality
WAN style (do this):
A girl with long hair and blue eyes wearing a school uniform. The wind blows her hair gently as she smiles at the camera. Anime illustration style.
Tags like masterpiece and best quality have no effect on WAN. They are conventions carried over from Danbooru-tagged image models.
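If you are migrating old tag prompts, a small script can at least drop the quality-only tags before you rewrite the rest by hand. The following is a minimal sketch. The helper name strip_quality_tags and the contents of QUALITY_TAGS are illustrative assumptions, not part of any WAN tooling.

```python
# Hypothetical helper for cleaning up legacy tag prompts.
# The tag set below is an assumption taken from the examples in this guide.
QUALITY_TAGS = {"masterpiece", "best quality", "ultra-detailed", "absurdres"}


def strip_quality_tags(tag_prompt: str) -> list[str]:
    """Split a comma-separated tag prompt and drop quality-only tags."""
    tags = [t.strip() for t in tag_prompt.split(",")]
    return [t for t in tags if t and t.lower() not in QUALITY_TAGS]


remaining = strip_quality_tags(
    "1girl, long hair, blue eyes, school uniform, wind, smile, masterpiece, best quality"
)
print(remaining)
```

The tags that remain still describe content, so they are the ones to turn into sentences.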
Describe Motion Explicitly
The biggest difference between image and video prompts is that video prompts must describe what moves.
Weak (no motion description):
A girl standing in a field of flowers. Anime style.
This might produce a nearly static video because you didn't describe any movement.
Better (explicit motion):
A girl standing in a field of flowers. The wind blows the flowers and her hair sways gently. She turns her head slightly to the right. Anime illustration style.
Motion Keywords That Work
- Subtle: "hair sways", "breathing", "blinking", "slight head tilt", "fabric rippling"
- Moderate: "turns to look", "walks forward", "reaches out hand", "nods"
- Dynamic: "running", "jumping", "dancing", "spinning around"
Start with subtle motion. More complex motion is harder for the model to get right.
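One way to make sure every prompt contains motion is to assemble it from separate parts and refuse to build it when the motion part is empty. This is a minimal sketch; build_motion_prompt and its parameters are hypothetical names chosen for illustration.

```python
def build_motion_prompt(
    subject: str,
    motions: list[str],
    style: str = "Anime illustration style",
) -> str:
    """Join a subject sentence, one or more motion sentences, and a style sentence."""
    if not any(m.strip() for m in motions):
        # A prompt without motion tends to give a nearly static clip.
        raise ValueError("describe at least one motion")
    sentences = [subject, *motions, style]
    # Normalize each piece so it ends with exactly one period.
    return " ".join(s.strip().rstrip(".") + "." for s in sentences if s.strip())


prompt = build_motion_prompt(
    "A girl standing in a field of flowers",
    [
        "The wind blows the flowers and her hair sways gently",
        "She turns her head slightly to the right",
    ],
)
print(prompt)
```

Keeping motion as its own list also makes it easy to start with one subtle motion and add more only after the first result looks right.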
Camera Work
You can describe camera movement in your prompt:
- Safe choices: "static camera", "the camera slowly pans to the right", "tracking shot following the character"
- Risky for short clips: "the camera pushes in", "dolly zoom", "camera orbits around". In clips under 10 seconds, the camera can run out of range and reverse direction partway through.
For your first videos, use a static camera or no camera instruction at all.
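The safe and risky lists above can be turned into a quick pre-flight check. This is a rough sketch based on plain substring matching; the function name camera_risk and the phrase lists are assumptions drawn from this section, so a reworded camera instruction will not be detected.

```python
# Phrase fragments taken from the lists in this section (illustrative only).
SAFE_CAMERA = ("static camera", "slowly pans", "tracking shot")
RISKY_CAMERA = ("pushes in", "dolly zoom", "orbits around")


def camera_risk(prompt: str) -> str:
    """Return 'risky', 'safe', or 'none' depending on the camera phrases found."""
    text = prompt.lower()
    if any(phrase in text for phrase in RISKY_CAMERA):
        return "risky"
    if any(phrase in text for phrase in SAFE_CAMERA):
        return "safe"
    return "none"


print(camera_risk("The camera slowly pans to the right."))
print(camera_risk("Dolly zoom on her face as she smiles."))
print(camera_risk("A girl smiles at the viewer."))
```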
Specify the Art Style
WAN models can produce a range of styles from anime to semi-realistic. If you want anime output, say so explicitly:
- "anime illustration style"
- "cel-shaded, flat colors"
- "2d anime"
Without a style instruction, the model may drift toward a more realistic look, especially for faces.
Negative Prompts
Keep negative prompts simple and focused on what actually goes wrong:
live action, realistic, photo, 3d render, blurry, distorted, ugly, low quality, text, watermark
Don't overload negative prompts with dozens of tags. WAN responds better to a few clear, specific negatives than to a long kitchen-sink list.
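If you build negative prompts in a script, a helper that removes duplicates and enforces a cap keeps the list from growing over time. This is a minimal sketch: build_negative_prompt is a hypothetical name, and the default cap of 15 terms is an arbitrary assumption, not a documented WAN limit.

```python
# Base list copied from the example negative prompt in this section.
BASE_NEGATIVES = [
    "live action", "realistic", "photo", "3d render", "blurry",
    "distorted", "ugly", "low quality", "text", "watermark",
]


def build_negative_prompt(extra: tuple[str, ...] = (), max_terms: int = 15) -> str:
    """Merge base and extra negatives, dropping duplicates and capping the length."""
    seen: set[str] = set()
    terms: list[str] = []
    for term in [*BASE_NEGATIVES, *extra]:
        key = term.strip().lower()
        if key and key not in seen:
            seen.add(key)
            terms.append(key)
    if len(terms) > max_terms:
        # The cap is an assumed guard against kitchen-sink lists.
        raise ValueError(f"{len(terms)} negative terms; keep it to {max_terms} or fewer")
    return ", ".join(terms)


print(build_negative_prompt(("Blurry", "extra fingers")))
```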
Common Mistakes
Too many instructions at once
The model handles one scene, one subject, and one action well. Asking for complex multi-character interactions or scene transitions in a single 5 to 10 second clip usually fails.
Conflicting motion
Don't combine contradictory instructions: "standing still" + "walking forward", or "static camera" + "camera orbits around".
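Contradictions like these are easy to miss in a prompt that was edited several times. A simple check against known conflicting pairs can catch them. This is only a sketch: find_conflicts and the CONFLICTS list are hypothetical, and substring matching will only find the exact phrases listed.

```python
# Pairs of phrases that should not appear in the same prompt.
# The two pairs come from the examples above; extend the list as needed.
CONFLICTS = [
    ("standing still", "walking forward"),
    ("static camera", "camera orbits around"),
]


def find_conflicts(prompt: str) -> list[tuple[str, str]]:
    """Return every conflicting pair where both phrases occur in the prompt."""
    text = prompt.lower()
    return [pair for pair in CONFLICTS if pair[0] in text and pair[1] in text]


print(find_conflicts("Static camera. She is standing still, then walking forward."))
```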
Overly long prompts
Keep prompts concise: 2-4 sentences is usually enough. Very long prompts dilute the model's attention and tend to produce generic results.
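The 2-4 sentence guideline can be checked with a rough sentence counter. The sketch below is a heuristic under stated assumptions: count_sentences and is_concise are hypothetical names, and the naive split on ".", "!" and "?" will miscount abbreviations and decimal numbers.

```python
import re


def count_sentences(prompt: str) -> int:
    """Count sentences by splitting on runs of '.', '!' or '?'."""
    return len([s for s in re.split(r"[.!?]+", prompt) if s.strip()])


def is_concise(prompt: str, min_sentences: int = 2, max_sentences: int = 4) -> bool:
    """Return True when the prompt falls inside the suggested sentence range."""
    return min_sentences <= count_sentences(prompt) <= max_sentences


example = (
    "A girl standing in a field of flowers. "
    "The wind blows the flowers and her hair sways gently. "
    "She turns her head slightly to the right. "
    "Anime illustration style."
)
print(count_sentences(example), is_concise(example))
```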
Expecting image-model quality tags to work
Tags such as masterpiece, best quality, ultra-detailed, and absurdres do nothing for WAN. Describe what you want to see instead.
T2V vs I2V Prompts
A Text-to-Video (T2V) prompt has to describe both appearance and motion. If you are using Image-to-Video (I2V), your prompt should describe only the motion, not the appearance. The appearance is already defined by the input image.
- T2V prompt: "A girl with long blue hair in a school uniform stands on a rooftop. The wind blows her hair. She looks at the camera. Anime style."
- I2V prompt: "The wind blows her hair gently. She blinks and tilts her head slightly toward the camera."
In I2V, describing the character's appearance in the prompt can conflict with the reference image and produce worse results.
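If the same script generates prompts for both modes, the difference can be encoded once so that appearance text never leaks into an I2V prompt. This is a minimal sketch; compose_prompt and its mode strings "t2v" and "i2v" are hypothetical and do not correspond to any WAN API.

```python
def compose_prompt(mode: str, motion: str, appearance: str = "", style: str = "") -> str:
    """Build a prompt for T2V (appearance + motion + style) or I2V (motion only)."""
    if mode not in ("t2v", "i2v"):
        raise ValueError("mode must be 't2v' or 'i2v'")
    # In I2V the input image already defines appearance and style, so both are dropped.
    parts = [motion] if mode == "i2v" else [appearance, motion, style]
    return " ".join(p.strip() for p in parts if p.strip())


t2v = compose_prompt(
    "t2v",
    motion="The wind blows her hair. She looks at the camera.",
    appearance="A girl with long blue hair in a school uniform stands on a rooftop.",
    style="Anime style.",
)
i2v = compose_prompt(
    "i2v",
    motion="The wind blows her hair gently. She blinks and tilts her head slightly toward the camera.",
    appearance="A girl with long blue hair in a school uniform.",
)
print(t2v)
print(i2v)
```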