I-X Seminar Series: How to train your Vicuna – finetuning & evaluating LLMs in the wild with Hao Zhang

Key Details:

Time: 14.00 – 15.30
Date: Tuesday 3 October
Location: Online and In Person

I-X 5 Meeting Room, Level 5
Translation and Innovation Hub (I-HUB)
Imperial White City Campus
84 Wood Lane
London W12 7SL

Registration is now closed
Recorded Event

Speaker

Hao Zhang from the Halıcıoğlu Data Science Institute and the Department of Computer Science and Engineering at UC San Diego.

Hao Zhang is an Assistant Professor at the Halıcıoğlu Data Science Institute and the Department of Computer Science and Engineering at UC San Diego. Before joining UCSD, Hao was a postdoctoral researcher at UC Berkeley working with Ion Stoica (2021 – 2023). Hao completed his Ph.D. in Computer Science at Carnegie Mellon University with Eric Xing (2014 – 2020). During his Ph.D., Hao took a leave of absence to work at the ML startup Petuum (2016 – 2021).

Hao’s research interests lie at the intersection of machine learning and systems. Hao’s past work includes Vicuna, FastChat, Alpa, vLLM, Poseidon, and Petuum. Hao’s research has been recognized with the Jay Lepreau best paper award at OSDI’21 and an NVIDIA pioneer research award at NeurIPS’17. Parts of Hao’s research have been commercialized at multiple start-ups, including Petuum and AnyScale.

Talk Title

How to train your Vicuna – finetuning & evaluating LLMs in the wild

Talk Summary

Since the release of Meta’s Llama weights, open-source development of large language models (LLMs) has been seeing rapid progress almost every day. This talk will share our experience serving and evaluating 20+ LLM-based chatbots, including Vicuna, within the Chatbot Arena. I will start by briefly explaining Vicuna, an open-source chatbot we finetuned from Llama, and the Chatbot Arena platform we developed to evaluate the quality of such models in the wild. I will then discuss the underlying system challenges we faced: how to serve many LLMs with high throughput and low latency, given only a limited number of university-donated GPUs. I’ll cover two key enabling techniques behind the scenes: paged attention (vLLM, SOSP’23) and statistical multiplexing with model parallelism (AlpaServe, OSDI’23). This is joint work with members of the LMSYS Org team.
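For readers unfamiliar with paged attention, the sketch below illustrates the core idea behind vLLM’s KV-cache management as described in the SOSP’23 paper: the cache is split into fixed-size blocks, and each sequence keeps a block table mapping logical token positions to physical blocks, much like virtual-memory paging. This is a minimal, illustrative sketch; all names and sizes are hypothetical placeholders, not vLLM’s actual API.

```python
# Illustrative paged KV-cache sketch (hypothetical names/sizes, not vLLM's API).
import numpy as np

BLOCK_SIZE = 16      # tokens stored per KV-cache block (hypothetical)
NUM_BLOCKS = 1024    # physical blocks in the shared GPU cache pool
HEAD_DIM = 128       # per-token key/value vector size (hypothetical)

# One shared pool of physical KV blocks, used by all active sequences.
kv_pool = np.zeros((NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM), dtype=np.float32)
free_blocks = list(range(NUM_BLOCKS))

class Sequence:
    """Tracks the logical-to-physical block mapping for one request."""
    def __init__(self):
        self.block_table = []  # logical block i -> physical block id
        self.length = 0        # number of tokens cached so far

    def append_kv(self, kv_vector):
        # Allocate a new physical block only when the last one is full,
        # so waste is at most one partially filled block per sequence.
        if self.length % BLOCK_SIZE == 0:
            self.block_table.append(free_blocks.pop())
        block = self.block_table[self.length // BLOCK_SIZE]
        kv_pool[block, self.length % BLOCK_SIZE] = kv_vector
        self.length += 1

    def gather_kv(self):
        # Attention reads the cache through the block table, so a
        # logically contiguous sequence can live in scattered blocks.
        parts = [kv_pool[b] for b in self.block_table]
        return np.concatenate(parts)[: self.length]

# Usage: cache three tokens' KV vectors for one request.
seq = Sequence()
for _ in range(3):
    seq.append_kv(np.random.rand(HEAD_DIM).astype(np.float32))
print(seq.gather_kv().shape)  # (3, 128)
```

Because blocks are allocated on demand from a shared pool rather than reserved contiguously per request, memory fragmentation stays low, which is what lets a serving system batch many more concurrent requests onto the same GPUs.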

Event Recording

Watch on YouTube
