I-X Seminar Series: How to train your Vicuna – finetuning & evaluating LLMs in the wild with Hao Zhang

Key Details:

Time: 14.00 – 15.30
Date: Tuesday 3 October
Location: Online and In Person

I-X 5 Meeting Room, Level 5
Translation and Innovation Hub (I-HUB)
Imperial White City Campus
84 Wood Lane
London W12 7SL

Registration is now closed.

Speaker

Hao Zhang from Halıcıoğlu Data Science Institute and the Department of Computer Science and Engineering at UC San Diego.

Hao Zhang is an Assistant Professor at the Halıcıoğlu Data Science Institute and the Department of Computer Science and Engineering at UC San Diego. Before joining UCSD, Hao was a postdoctoral researcher at UC Berkeley working with Ion Stoica (2021 – 2023). Hao completed his Ph.D. in Computer Science at Carnegie Mellon University with Eric Xing (2014 – 2020). During his Ph.D., Hao took a leave of absence to work at the ML startup Petuum (2016 – 2021).

Hao’s research lies at the intersection of machine learning and systems. His past work includes Vicuna, FastChat, Alpa, vLLM, Poseidon, and Petuum. His research has been recognized with the Jay Lepreau Best Paper Award at OSDI’21 and an NVIDIA pioneer research award at NeurIPS’17. Parts of Hao’s research have been commercialized at multiple start-ups, including Petuum and AnyScale.

Talk Title

How to train your Vicuna – finetuning & evaluating LLMs in the wild

Talk Summary

Since the release of Meta’s Llama weights, open-source development of large language models (LLMs) has been progressing rapidly, almost daily. This talk will share our experience with serving and evaluating 20+ LLM-based chatbots, including Vicuna, within the Chatbot Arena. I will start by briefly explaining Vicuna, an open-source chatbot we finetuned from Llama, and the Chatbot Arena platform we developed to evaluate the quality of such models in the wild. I will then discuss the underlying system challenges we faced: how to serve many LLMs with high throughput and low latency, given only a limited number of university-donated GPUs. I’ll cover two key enabling techniques behind the scenes: paged attention (vLLM, SOSP’23) and statistical multiplexing with model parallelism (AlpaServe, OSDI’23). This is joint work with members of the LMSYS Org team.
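To give a flavour of the first technique mentioned above, the sketch below illustrates the core idea of paged attention: instead of reserving one contiguous region of GPU memory per sequence, the KV cache is split into fixed-size blocks, and each sequence keeps a block table mapping its logical positions to physical blocks, much like virtual-memory paging. This is a simplified illustration, not vLLM's actual implementation; the class, block size, and method names are invented for this example.

```python
# Illustrative sketch of paged KV-cache bookkeeping (not vLLM's real code).
# Blocks are allocated on demand as a sequence grows and returned to a
# shared free pool when the sequence finishes, so memory is never
# pre-reserved for a sequence's maximum length.

BLOCK_SIZE = 4  # tokens per cache block (illustrative value)

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))  # pool of physical block ids
        self.block_tables = {}                      # seq_id -> list of block ids
        self.lengths = {}                           # seq_id -> tokens cached so far

    def append_token(self, seq_id):
        """Reserve cache space for one new token of a sequence."""
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:
            # Current block is full (or sequence is new): page in a fresh block.
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1

    def free_sequence(self, seq_id):
        """Return all of a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(6):                       # a 6-token sequence needs ceil(6/4) = 2 blocks
    cache.append_token("seq-0")
print(len(cache.block_tables["seq-0"]))  # -> 2
cache.free_sequence("seq-0")             # blocks immediately reusable by other sequences
```

Because many concurrent chat sessions share one free pool, memory freed by a finished conversation is immediately available to others, which is what makes high-throughput serving on a limited GPU budget feasible.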
