Speaker: Lorenzo Baraldi

Multimodal and Embodied AI for Digital Humanities

Research Seminar
Recent progress in the Computer Vision and Natural Language Processing communities has made it possible to connect vision and language in a variety of tasks at the intersection of Vision, Language, and Embodied AI. These tasks range from retrieving images or parts of images given textual queries, to generating meaningful descriptions of images, answering questions, and navigating agents in unseen environments via natural language instructions.
