Add text to images - AnyText

Today I tried AnyText, an open-source project from Alibaba, and found it very interesting. It can both generate new text inside an image and edit text that is already there.

Results

Here are the results from my own run:

Features

Supports text at various angles
Supports multiple languages

Technology

AnyText is built on a diffusion pipeline with two main parts: the auxiliary latent module and the text embedding module. The former takes inputs such as text glyphs, positions, and masked images and produces latent features for text generation or editing. The latter uses an OCR model to encode stroke data into embeddings, which are then fused with the image caption embeddings from the tokenizer, so that the generated text blends seamlessly with the background. AnyText is trained with a text-control diffusion loss and a text perceptual loss to further improve writing accuracy.
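
To make the fusion step in the text embedding module more concrete, here is a minimal sketch in plain Python. It is not AnyText's actual code: the function name, the placeholder token id, and the use of plain lists in place of tensors are all my own assumptions for illustration. The idea it shows is only this: each text line gets a placeholder token in the caption, and the embedding at that position is swapped for an embedding derived from the rendered glyph (here assumed to be already computed by an OCR encoder and projected to the caption embedding size).

```python
PLACEHOLDER_ID = 0  # hypothetical token id that marks "a text line goes here"


def fuse_text_embeddings(token_ids, caption_emb, glyph_emb, placeholder_id=PLACEHOLDER_ID):
    """Swap placeholder-token embeddings for glyph embeddings.

    token_ids:   list of int, the tokenized caption
    caption_emb: list of vectors, one per token (from the tokenizer's embedding table)
    glyph_emb:   list of vectors, one per text line (assumed to come from an
                 OCR encoder, already projected to the caption embedding size)
    """
    n_slots = sum(1 for tok in token_ids if tok == placeholder_id)
    if n_slots != len(glyph_emb):
        raise ValueError("need exactly one glyph embedding per placeholder token")

    glyphs = iter(glyph_emb)
    fused = []
    for tok, emb in zip(token_ids, caption_emb):
        # Placeholder positions take the next glyph embedding in order;
        # every other position keeps its caption embedding unchanged.
        fused.append(list(next(glyphs)) if tok == placeholder_id else list(emb))
    return fused


# Toy caption with two text lines, using 2-dimensional embeddings.
token_ids = [5, 0, 7, 0]
caption_emb = [[0.1, 0.2], [0.0, 0.0], [0.3, 0.4], [0.0, 0.0]]
glyph_emb = [[9.0, 9.5], [8.0, 8.5]]
fused = fuse_text_embeddings(token_ids, caption_emb, glyph_emb)
```

In the real model the fused sequence would then go through the text encoder and condition the diffusion process; the sketch stops at the substitution step because that is the part the description above spells out.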

Comparison

Here is how the results of the different approaches compare: