I-X Seminar Series: Sequential Modeling Enables Scalable Learning for Large Vision Models with Yutong Bai

Key Details:

Time: 16.00 – 17.00
Date: Tuesday 16 November
Location: Livestreamed

Registration is now closed

Speaker

Yutong Bai

Yutong is a fifth-year CS PhD student at Johns Hopkins University, advised by Prof. Alan Yuille, and is currently a visiting student at UC Berkeley, advised by Prof. Alyosha Efros. She previously interned at Meta AI (FAIR Labs) and Google Brain, and was selected as a 2023 Apple Scholar and an EECS Rising Star.

Talk Title

Sequential Modeling Enables Scalable Learning for Large Vision Models

Talk Summary

In this presentation, Yutong will introduce a novel sequential modeling approach that enables learning a Large Vision Model (LVM) without using any linguistic data. To do this, she will define a common format, “visual sentences”, in which raw images and videos, as well as annotated data sources such as semantic segmentations and depth reconstructions, can be represented without any meta-knowledge beyond the pixels. Once this wide variety of visual data (comprising 420 billion tokens) is represented as sequences, the model can be trained to minimize a cross-entropy loss for next-token prediction. By training across various scales of model architecture and data diversity, the work provides empirical evidence that the models scale effectively. Many different vision tasks can then be solved by designing suitable visual prompts at test time.
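
For readers unfamiliar with the training objective mentioned in the summary, the following is a minimal sketch of a next-token cross-entropy loss over a tokenized sequence. It is not the speaker's implementation: the token ids, the tiny vocabulary, the way the "visual sentence" is assembled, and the function names `log_softmax` and `next_token_cross_entropy` are all illustrative assumptions, and the per-position scores stand in for the output of a real model.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def next_token_cross_entropy(logits, tokens):
    """Mean cross-entropy for predicting tokens[t + 1] from logits[t].

    logits[t] holds one raw score per vocabulary entry, produced (in a real
    system) by a model that has seen tokens[0..t]. Here they are plain lists.
    """
    if len(logits) != len(tokens) - 1:
        raise ValueError("need exactly one score vector per predicted token")
    total = 0.0
    for t, scores in enumerate(logits):
        target = tokens[t + 1]
        total -= log_softmax(scores)[target]
    return total / len(logits)


# Hypothetical token ids: an image followed by its annotation, concatenated
# into one sequence (a toy stand-in for a "visual sentence").
image_tokens = [3, 1]
annotation_tokens = [0, 2]
visual_sentence = image_tokens + annotation_tokens

# Uniform scores over a 4-token vocabulary: the model is maximally unsure,
# so the loss is approximately log(4).
uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(len(visual_sentence) - 1)]
uniform_loss = next_token_cross_entropy(uniform_logits, visual_sentence)
```

A model that assigns a much higher score to the correct next token at every position would drive this loss toward zero, which is what minimizing the objective encourages.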
