Reference images & multi-character shots

Use generateImage's referenceImages to keep a character visually consistent across images and to compose several characters into one image (a two-shot of both speakers, a group scene). Generate one canonical portrait per character, store its URL in state, then pass it as a reference for every later shot.

Generated characters drift: ask for "Mara, the red-haired smuggler" twice and you get two different women. The fix is to generate each character once, keep that image's URL as their canonical reference, and pass it via referenceImages for every later shot. The model preserves what the reference depicts — identity, outfit, style — while the prompt directs the new pose, expression, or scene.

This recipe covers the two uses:

One reference — the same character in a new pose, expression, or situation.
Several references — several characters composed into a single image: a two-shot of both speakers in a dialogue, a group scene, a confrontation.

Canonical references

Generate the reference when the character first appears and store the URL in state:

// types.ts
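export interface Character {
  name: string
  referenceUrl: string // canonical base portrait, never overwritten
  currentReferenceUrl?: string // optional override while a changed look is active (see "Changing how a character looks")
}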

export interface State {
  cast: Record<string, Character>
}

// On first appearance — a clean, neutral portrait makes the best reference
const { url } = await io.activities.generateImage(`ref-${name}`, {
  prompt: `Portrait of ${visualDescription}, neutral expression, facing the viewer, plain background`,
  width: 768,
  height: 1024,
})
state.set(draft => { draft.cast[name] = { name, referenceUrl: url } })

Pre-authored games can use a game asset as the reference instead — assets["Mara Portrait"].url works anywhere a generateImage URL does.
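To keep "generate each character once" from depending on call-site discipline, the first-appearance step can be wrapped in a get-or-create helper. The following is a minimal sketch, not part of the platform API: ensureReference and the GenerateImage type are hypothetical names, and the only thing assumed about generateImage is the shape used above (a key plus options, resolving to an object with a url).

```typescript
// Assumed shape of the generateImage activity, based only on how the recipe calls it.
type GenerateImage = (
  key: string,
  options: { prompt: string; width?: number; height?: number },
) => Promise<{ url: string }>

interface Character {
  name: string
  referenceUrl: string
}

// Hypothetical helper: returns the stored character if one exists, otherwise
// generates the canonical portrait exactly once. It does not write to state;
// the caller persists the returned character.
async function ensureReference(
  cast: Readonly<Record<string, Character>>,
  name: string,
  visualDescription: string,
  generateImage: GenerateImage,
): Promise<Character> {
  const existing = cast[name]
  if (existing) return existing // canonical reference already generated

  const { url } = await generateImage(`ref-${name}`, {
    prompt: `Portrait of ${visualDescription}, neutral expression, facing the viewer, plain background`,
    width: 768,
    height: 1024,
  })
  return { name, referenceUrl: url }
}
```

In a game, the last argument would be a thin wrapper around io.activities.generateImage, and the returned character would be stored with state.set as in the snippet above.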

Single-character shots

Identity comes from the reference, so the prompt should name only the change — re-describing the character's appearance fights the reference instead of reinforcing it:

const mara = state.get().cast['Mara']
const { url } = await io.activities.generateImage(`shot-${seq}`, {
  prompt: 'She leans against the doorframe, arms crossed, a wry half-smile',
  referenceImages: [mara.referenceUrl],
})

Multi-character shots

Pass each character's reference and describe the scene. The output composes the referenced subjects into one image — this is what makes dialogue and drama scenes work, since the two speakers can finally share a frame:

const { cast } = state.get()
const { url } = await io.activities.generateImage(`duel-${seq}`, {
  prompt: 'The woman from image 1 and the man from image 2 face each other across a ' +
    'rain-slick alley at night, she stepping forward, he holding his ground. ' +
    'Keep their identities and the illustration style of the reference images.',
  referenceImages: [cast['Mara'].referenceUrl, cast['Dorian'].referenceUrl],
})

Tips for multi-reference prompts:

Refer to subjects by reference index — "the woman from image 1", "the man from image 2" — matching the order of referenceImages. This is the official FLUX.2 convention; every official multi-reference example uses it.
Never mix character names into the prompt. Names mean nothing to the model, so "the woman from image 2 ... Mara's hand on her shoulder" reads as two different people. If an LLM writes your prompts, it will leak names from the story context — tell it not to, and consider a deterministic name→index rewrite as a backstop.
Describe one clear interaction. Crowded prompts with several simultaneous actions degrade composition.
Wider aspect ratios (e.g. 1344×768) give two characters room to share the frame.
If the output's style drifts from the references, add an explicit style clause — "keep the illustration style of the reference images" (or assign one: "apply the style of image 1 to the whole scene").
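The name→index backstop mentioned in the tips could look roughly like the sketch below. It is an illustration under assumptions: the Subject type and its descriptor field ("woman", "man") are hypothetical additions, not part of the recipe's Character type, and the subjects array is assumed to be in the same order as referenceImages.

```typescript
// Hypothetical type: `descriptor` is an assumed field used to build the phrase.
interface Subject {
  name: string
  descriptor: string // e.g. "woman", "man"
}

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Replace each whole-word name (and its possessive) with
// "the <descriptor> from image N", where N is the subject's 1-based position.
function rewriteNamesToIndices(prompt: string, subjects: readonly Subject[]): string {
  return subjects.reduce((text, subject, i) => {
    const phrase = `the ${subject.descriptor} from image ${i + 1}`
    const pattern = new RegExp(`\\b${escapeRegExp(subject.name)}\\b(['’]s)?`, 'g')
    return text.replace(pattern, (_match: string, possessive?: string) =>
      possessive ? `${phrase}'s` : phrase)
  }, prompt)
}

// Order must match referenceImages: Mara is image 1, Dorian is image 2.
const subjects: Subject[] = [
  { name: 'Mara', descriptor: 'woman' },
  { name: 'Dorian', descriptor: 'man' },
]
const rewritten = rewriteNamesToIndices("Dorian grabs Mara's wrist", subjects)
// rewritten === "the man from image 2 grabs the woman from image 1's wrist"
```

The sketch does not restore capitalization at the start of a sentence and does not handle one name being a prefix of another (such as "Mara" and "Mara Voss"), so it works as a safety net behind the LLM instruction rather than as a replacement for it.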

Changing how a character looks (outfits, conditions)

The reference carries the character's look as much as their identity. So when the story changes their look (an evening gown, rain-soaked, hair down), don't restate the change in each shot's prompt. In a multi-subject prompt, a mid-sentence change instruction ("the woman from image 1, now wearing an emerald gown, ...") frequently binds to the wrong subject, and the other character ends up in the gown.

Instead, bake the change into the reference once and use the result for every later shot:

// One edit over the base reference when the look changes…
const { url } = await io.activities.generateImage(`look-${seq}`, {
  prompt: 'Now wearing a deep emerald evening gown. Keep the person from image 1\'s ' +
    'identity, framing, and the illustration style of the reference image. Change nothing else.',
  referenceImages: [mara.referenceUrl],
})
state.set(draft => { draft.cast['Mara'].currentReferenceUrl = url })

// …then every shot just uses the current reference: nothing to misbind.
referenceImages: [mara.currentReferenceUrl ?? mara.referenceUrl]

Keep the base reference and phrase each look change relative to it, rather than chaining edits off the previous look. That way changes don't accumulate drift, and reverting is just a matter of dropping currentReferenceUrl.
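The base-plus-override pattern can be captured in two small helpers so that call sites never repeat the fallback logic. This is a sketch under assumptions: activeReference and revertLook are hypothetical names, and the Character shape here assumes the optional currentReferenceUrl field used in the snippet above.

```typescript
interface Character {
  name: string
  referenceUrl: string // base reference, never overwritten
  currentReferenceUrl?: string // present only while a changed look is active
}

// The URL to pass in referenceImages for any shot of this character:
// the current look if one is set, otherwise the base reference.
function activeReference(character: Character): string {
  return character.currentReferenceUrl ?? character.referenceUrl
}

// Revert to the base look by dropping the override. The base is untouched,
// so there is nothing to regenerate.
function revertLook(character: Character): void {
  delete character.currentReferenceUrl
}
```

A shot would then pass referenceImages: [activeReference(mara)], and revertLook would be called on the draft inside state.set, for example state.set(draft => { revertLook(draft.cast['Mara']) }).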

Notes

Reference URLs must be from this platform: a previous generateImage result or a game asset URL.
Up to 10 references, though 2–4 compose most reliably.
Each reference adds a credit surcharge on top of the image's pixel-count cost.
referenceImages is supported on Flux models only.
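The reference-count note can be enforced before a call is made. The helper below is a hypothetical sketch, not a platform function: the limit of 10 and the 2 to 4 range come from the notes above, while checkReferenceCount and its return convention (a warning string, or undefined when the count is in the reliable range) are assumptions made for illustration.

```typescript
const MAX_REFERENCES = 10 // hard limit stated in the notes
const RELIABLE_MAX = 4 // upper end of the range that composes most reliably

// Throws when the hard limit is exceeded; returns a warning message when the
// count is allowed but above the reliable range; returns undefined otherwise.
function checkReferenceCount(referenceImages: readonly string[]): string | undefined {
  const count = referenceImages.length
  if (count > MAX_REFERENCES) {
    throw new RangeError(`Got ${count} references; the limit is ${MAX_REFERENCES}`)
  }
  if (count > RELIABLE_MAX) {
    return `${count} references may compose unreliably; 2 to 4 work best`
  }
  return undefined
}
```

Because each reference also adds a credit surcharge, the warning case is a reasonable place to log or trim the list before generating.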