Learning to play Minecraft with Video PreTraining
The internet contains an enormous amount of publicly available videos that we…
DALL·E 2 pre-training mitigations
We observed that our internal predecessors to DALL·E 2 would sometimes reproduce…
A hazard analysis framework for code synthesis large language models
Codex, a large language model (LLM) trained on a variety of codebases,…
Efficient training of language models to fill in the middle
We show that autoregressive language models can learn to infill text after…
Introducing Whisper
Other existing approaches frequently use smaller, more closely paired audio-text training datasets,…
Scaling laws for reward model overoptimization
In reinforcement learning from human feedback, it is common to optimize against…
A system for generating 3D point clouds from complex prompts
While recent work on text-conditional 3D object generation has shown promising results,…
Forecasting potential misuses of language models for disinformation campaigns and how to reduce risk
As generative language models improve, they open up new possibilities in fields…
An early look at the labor market impact potential of large language models
We investigate the potential implications of Generative Pre-trained Transformer (GPT) models and…