Monday, July 22, 2024

You type what you want and Google generates the image for you. Impressive Imagen results


Microsoft and Google are competing to generate images from text descriptions. While Microsoft-backed OpenAI recently opened DALL-E 2 to a small group of users, Google has launched its own program, Imagen: you write what you want and the model creates an image that matches the text.

Imagen was designed by the Google Brain team, a research group that essentially has carte blanche over which projects to develop but focuses mainly on certain branches of machine learning. Among them is the ability to generate an image from its textual description.

In the case of Imagen, the textual description can be as exotic as you like, for example: “An alien octopus floating through a portal reading a newspaper.” Imagen’s result is what you see in the following image.

These models, trained on image datasets, are known as “text-to-image diffusion models”.

A diffusion model is essentially a generative model, used to generate data similar to the data it was trained on. The most common example is adding Gaussian noise to an image and then running the reverse process, in which the model learns to recover the starting image from what looks like pure noise.
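The noising and recovery steps described above can be sketched in a few lines of Python. This is only an illustrative toy, not Imagen's implementation: the linear noise schedule, the function names, and the four-pixel "image" are all assumptions made for the example, and the true noise stands in for the prediction a trained network would have to make.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances (assumed linear schedule, for illustration)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def signal_fraction(betas, t):
    """Cumulative product of (1 - beta) up to and including step t."""
    a_bar = 1.0
    for beta in betas[: t + 1]:
        a_bar *= 1.0 - beta
    return a_bar


def add_noise(pixels, betas, t, rng):
    """Forward process: blend the image with Gaussian noise at step t."""
    a_bar = signal_fraction(betas, t)
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [
        math.sqrt(a_bar) * p + math.sqrt(1.0 - a_bar) * e
        for p, e in zip(pixels, noise)
    ]
    return noisy, noise


def recover_image(noisy, noise, betas, t):
    """Reverse the blend, given the noise a trained model would try to predict."""
    a_bar = signal_fraction(betas, t)
    return [
        (x - math.sqrt(1.0 - a_bar) * e) / math.sqrt(a_bar)
        for x, e in zip(noisy, noise)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    betas = linear_beta_schedule(1000)
    image = [0.1, -0.5, 0.9, 0.0]  # hypothetical pixel values in [-1, 1]
    noisy, noise = add_noise(image, betas, 999, rng)
    print("signal left at the last step:", signal_fraction(betas, 999))
    print("noisy pixels:", noisy)
    print("recovered pixels:", recover_image(noisy, noise, betas, 999))
```

In a real diffusion model the noise is not known at generation time: a neural network is trained to estimate it from the noisy image, and the image is recovered gradually over many small steps.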

Source: AssemblyAI

For a text-conditioned diffusion model to generate data of a different kind from its input, as in the text-to-image case, datasets made of image-text pairs are usually used: that is, an image together with its textual description.

At the moment it is not open to the public as it is “dangerous”


Google researchers found that excellent results like Imagen’s can be obtained using pre-trained text models, such as Google’s T5 “text-to-text” framework (the name comes from the five “T’s” in “Text-To-Text Transfer Transformer”). A Transformer does not examine the words of a sentence sequentially; it performs only a small, fixed number of steps (chosen empirically). At each step, it applies a self-attention mechanism that directly models the relationships between all the words in the sentence, regardless of their position.
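To make the self-attention step concrete, here is a minimal sketch in plain Python. It is a deliberate simplification and not T5's code: the function names and the toy embeddings are hypothetical, and the word vectors are used directly as queries, keys, and values, whereas real Transformers use learned projections, several attention heads, and positional information.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention with identity Q/K/V (an assumption)."""
    dim = len(embeddings[0])
    outputs, weights = [], []
    for query in embeddings:
        # Compare this word with every word in the sentence, near or far.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        attn = softmax(scores)
        weights.append(attn)
        # The new representation is a weighted mix of all the word vectors.
        outputs.append(
            [sum(w * value[d] for w, value in zip(attn, embeddings))
             for d in range(dim)]
        )
    return outputs, weights


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings for a three-word sentence.
    sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    outputs, weights = self_attention(sentence)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each row of weights shows how much one word attends to every other word. Because every pair of words is compared in a single step, the distance between two words in the sentence does not affect the computation.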

According to the Brain team, increasing the size of the language model in Imagen improves sample fidelity and image-text alignment much more than increasing the size of the image diffusion model.

The results published on the Imagen beta site are really excellent. To demonstrate the capabilities of the new diffusion model, Google has created a benchmark for evaluating text-to-image models called DrawBench. In direct side-by-side comparisons, human evaluators preferred Imagen to other models in terms of both sample quality and image-text alignment. The model was compared with VQ-GAN+CLIP, Latent Diffusion Models and DALL-E 2.


Imagen is currently accessible only through the demo on its site because, as the Brain team said: “Imagen relies on text encoders trained on uncurated web-scale data, and thus inherits the social biases and limitations of large language models. As such, there is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place.”

