Training a language model on a single GPU in one day.
Good Morning AI Runners
Here's what we've got for you today:
Google AI: General Purpose Robotics.
Training a language model on a single GPU in one day.
Google AI: General Purpose Robotics
Researchers at Google AI have developed a new model called Robotics Transformer 1 (RT-1) that is designed to improve the capabilities of robots in various tasks.
RT-1 is a transformer-based robotic model that can execute over 700 real-world instructions at a 97% success rate.
RT-1 was trained on a dataset of over 130,000 episodes of robotic data collected over 17 months using 13 different robots.
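To make the idea concrete, here is a minimal, hypothetical sketch of an RT-1-style policy in PyTorch: a Transformer fuses a language-instruction embedding with image tokens and predicts discretized action tokens, with one classification head per action dimension. This is not Google's actual RT-1 code; the class name `TinyRobotPolicy`, the layer counts, and the embedding dimensions are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyRobotPolicy(nn.Module):
    """Toy RT-1-style policy: fuse an instruction embedding with image
    tokens in a Transformer, then predict discretized action tokens."""

    def __init__(self, d_model=256, n_action_dims=7, n_bins=256):
        super().__init__()
        self.instr_proj = nn.Linear(512, d_model)  # e.g. a sentence embedding
        self.img_proj = nn.Linear(768, d_model)    # e.g. per-patch vision features
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # One classification head per action dimension, over discrete bins,
        # mirroring RT-1's idea of emitting actions as tokens.
        self.action_heads = nn.ModuleList(
            [nn.Linear(d_model, n_bins) for _ in range(n_action_dims)]
        )

    def forward(self, instr_emb, img_tokens):
        # instr_emb: (B, 512), img_tokens: (B, n_patches, 768)
        tokens = torch.cat(
            [self.instr_proj(instr_emb).unsqueeze(1), self.img_proj(img_tokens)],
            dim=1,
        )
        pooled = self.encoder(tokens).mean(dim=1)
        return [head(pooled) for head in self.action_heads]  # logits per action dim

policy = TinyRobotPolicy()
logits = policy(torch.randn(2, 512), torch.randn(2, 8, 768))
print(len(logits), logits[0].shape)  # 7 heads, each (2, 256)
```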
In simple terms: thanks to a new class of model from Google AI, robots can now learn from large, task-agnostic datasets, absorbing vast amounts of data the same way machine learning models do.
Why is it significant? Watch this video to see a robot following the command "bring me the rice chips from the drawer." This kitchen is brand-new to the robot!
If you want to learn more about RT-1, check out the announcement thread below:
Introducing RT-1, a robotic model that can execute over 700 instructions in the real world at 97% success rate!
Generalizes to new tasks✅
Robust to new environments and objects✅
Fast inference for real time control✅
Can absorb multi-robot data✅
Powers SayCan✅
🧵👇
— Karol Hausman (@hausman_k), 5:43 PM · Dec 13, 2022
Training a language model on a single GPU in one day.
While most of the community asks how to push the limits of extreme computation, the authors of this paper ask the opposite question: how far can we get with a single GPU in just one day?
The paper explores how far a language model can go when trained on a single GPU, and shows that its performance closely follows the scaling laws observed in large-compute settings.
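As a toy illustration of what "follows scaling laws" means, here is a sketch that fits a power law, loss ≈ a · C^(−b), to a few (compute, loss) pairs. The numbers below are invented for the example, not results from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (compute, loss) pairs; real values would come from training runs.
compute = np.array([1e15, 1e16, 1e17, 1e18])
loss = np.array([4.2, 3.5, 2.9, 2.4])

def power_law(c, a, b):
    # loss ≈ a * compute^(-b), the typical shape of observed scaling laws
    return a * c ** (-b)

(a, b), _ = curve_fit(power_law, compute, loss, p0=(100.0, 0.1))
print(f"fitted exponent b = {b:.3f}")
```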
It provides a long list of techniques for training a language model in a single day on a single, modest GPU; two of the most common ones are sketched below.
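The paper's full recipe has many more ingredients, but here is a minimal sketch of two staples of budget training, mixed precision and gradient accumulation, written against PyTorch and assuming a Hugging Face-style model whose forward pass returns an output with a `.loss`. The function name `train_one_day`, the `loader`, and all hyperparameters are illustrative placeholders.

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Minimal single-GPU budget-training loop: mixed precision + gradient
# accumulation. `model` is assumed to return an output with a `.loss`;
# `loader` is assumed to yield dicts of input tensors.
def train_one_day(model, loader, lr=1e-4, accum_steps=8, max_steps=100_000):
    device = "cuda"
    model = model.to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    scaler = GradScaler()  # scales the loss so fp16 gradients don't underflow
    for step, batch in enumerate(loader):
        batch = {k: v.to(device) for k, v in batch.items()}
        with autocast():  # run the forward pass in half precision to save memory
            loss = model(**batch).loss / accum_steps
        scaler.scale(loss).backward()
        if (step + 1) % accum_steps == 0:
            # Simulate a large batch by stepping only every `accum_steps` batches.
            scaler.step(opt)
            scaler.update()
            opt.zero_grad(set_to_none=True)
        if step + 1 >= max_steps:
            break
```

Gradient accumulation matters here because a single modest GPU cannot fit the large batches that big-lab runs use; accumulating gradients over several small batches approximates the same update at a fraction of the memory.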
This could expand the pool of people who can train models by showing how much a model can learn on a tight budget, in contrast to the big labs with multi-billion-dollar budgets.
That's it from RuntheAI for today.
If you enjoyed today's post, please share it with a friend.
THANK YOU FOR READING AND SEE YOU TOMORROW. SUBSCRIBE TO STAY UPDATED!