Images Becoming Worlds: World Models, Visual Embeddings, and Visual Relations

Giulia Flenghi, Ph.D.

Recent developments in artificial intelligence, from visual embeddings and self-supervised systems to so-called “world models,” are transforming how machines organize, relate, and interpret images. These systems do not merely recognize objects or classify content; they construct complex relational spaces in which images are associated according to geometric, semantic, material, or compositional similarities.

My research investigates how different AI architectures, including JEPA, DINO, and CLIP, as well as recent generative models such as SANA-WM, construct forms of visual knowledge through the computational representation of images. Using datasets drawn from art history, visual culture, and architectural representation (including geometric mosaics, Euclidean diagrams, marble surfaces, stylistic variations, and images containing other images), the research explores the logics by which artificial systems organize visual experience and establish relationships among forms, materials, and representations.

Particular attention is devoted to questions of geometry, abstraction, and “twofoldness,” namely the capacity of computational systems to understand images not only as transparent windows onto the world but also as objects existing within it. In this sense, embedding spaces may be understood as emergent forms of visual and cultural organization, capable of revealing the biases, hierarchies, and implicit relational structures encoded in the models.

This research builds on my previous studies of Byzantine mosaic decoration, in which I used AI-based methods to automatically classify geometric ornamental patterns and to analyze visual structures in cultural heritage datasets. More broadly, my work explores the relationships among artificial intelligence, visual representation, and the construction of art-historical knowledge.