Today I tried AnyText, an open-source project from Alibaba, and found it very interesting. It can both generate text inside images and edit text that is already in an image.
Results
Here are the results from my own run:
Features
Supports text at various angles
Supports multiple languages
Technology
AnyText is built on a diffusion pipeline with two main components: an auxiliary latent module and a text embedding module. The auxiliary latent module takes inputs such as text glyphs, positions, and masked images and produces latent features for text generation or editing. The text embedding module uses an OCR model to encode stroke data into embeddings, which are then fused with the image caption embeddings from the tokenizer so that the generated text blends seamlessly with the background. Training combines a text-control diffusion loss with a text perceptual loss to further improve writing accuracy.
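To make the embedding fusion and the combined training loss more concrete, here is a minimal sketch in plain Python. It is not AnyText's actual code: the placeholder token, the function names, the toy two-dimensional embeddings, and the loss weight are all assumptions made for illustration. The idea it shows is that a caption contains a placeholder for each text line, and the placeholder's caption embedding is swapped for the OCR-derived glyph embedding before conditioning the diffusion model.

```python
from typing import List, Sequence

# Hypothetical marker for "a line of text goes here" inside the caption.
PLACEHOLDER = "*"


def fuse_embeddings(
    tokens: Sequence[str],
    caption_emb: Sequence[Sequence[float]],
    glyph_emb: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Replace each placeholder token's embedding with the next glyph embedding.

    caption_emb holds one vector per caption token (from the tokenizer/text encoder).
    glyph_emb holds one vector per text line (assumed to come from an OCR encoder).
    """
    if len(tokens) != len(caption_emb):
        raise ValueError("need exactly one caption embedding per token")

    glyphs = iter(glyph_emb)
    fused: List[List[float]] = []
    for token, emb in zip(tokens, caption_emb):
        if token == PLACEHOLDER:
            glyph = next(glyphs, None)
            if glyph is None:
                raise ValueError("more placeholders than glyph embeddings")
            fused.append(list(glyph))
        else:
            fused.append(list(emb))
    return fused


def total_loss(diffusion_loss: float, perceptual_loss: float, weight: float) -> float:
    """Weighted sum of the two training losses; `weight` is a hypothetical hyperparameter."""
    return diffusion_loss + weight * perceptual_loss


if __name__ == "__main__":
    tokens = ["a", "sign", "that", "says", PLACEHOLDER]
    caption_emb = [[0.0, 0.0] for _ in tokens]   # toy caption embeddings
    glyph_emb = [[1.0, 2.0]]                     # toy OCR-derived embedding
    print(fuse_embeddings(tokens, caption_emb, glyph_emb))
    print(total_loss(1.0, 2.0, 0.5))
```

In a real model the embeddings would be high-dimensional tensors and the glyph embedding would typically pass through a projection layer first; the list-based version here only illustrates the substitution step and the shape of the loss.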
Comparison
Here is how the results of different approaches compare: