AI comic drawing

Today, I came across StoryDiffusion, a project released by Nankai University and ByteDance. It proposes a consistency self-attention mechanism that can generate comics in various styles while keeping a character's appearance and costume consistent, thereby achieving coherent storytelling.

Features

Cartoon character generation

StoryDiffusion can create stunningly consistent cartoon-style characters.

Multi-character generation

StoryDiffusion can maintain the identity consistency of multiple characters at once, keeping each of them consistent across a series of images.

Long video generation

StoryDiffusion uses an image semantic motion predictor to generate high-quality videos by using either generated consistent images or user-input images as conditions.

Video editing demonstration

The project also includes a video editing demo that showcases the performance of the motion predictor.

Method

Structure of consistency self-attention

StoryDiffusion's generation pipeline produces theme-consistent images.

To create theme-consistent images that describe stories, StoryDiffusion integrates a consistency self-attention mechanism into a pre-trained text-to-image diffusion model.

StoryDiffusion divides the story text into multiple prompts and uses these prompts to batch-generate images.

Consistency self-attention establishes connections between multiple images generated in batches to maintain thematic consistency.

Structure of the motion predictor

StoryDiffusion's pipeline then generates transition videos from the theme-consistent images obtained in the previous step.

To model large-scale character movements effectively, StoryDiffusion encodes the conditional images into an image semantic space that captures spatial information, and predicts the transition embeddings within that space.

These predicted embeddings are then decoded using a video generation model and serve as control signals in cross-attention to guide the generation of each frame.

Example

I ran an example with Huacheng myself: