Foundation models, large neural networks trained on massive datasets, have shown impressive generalization in both the language and vision domains. Because fine-tuning such models for new tasks at test time is impractical given their billions of parameters, prompts have instead been employed to re-purpose them for test-time tasks on the fly. In this report, we ideate an equivalent foundation model for motion generation and the prompt formats that could condition such a model, as sketched below. The central goal is to learn a behavior prior for motion generation that can be re-used in a novel scene.
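To make the notion of "prompting" a motion model concrete, the following is a minimal sketch, not the report's actual architecture: a frozen behavior prior rolls out poses autoregressively, while a small prompt encoder maps a novel-scene description into a conditioning vector that steers generation at test time. All names here (MotionPrior, PromptEncoder, generate, the dimensions) are hypothetical placeholders.

```python
import torch
import torch.nn as nn

class MotionPrior(nn.Module):
    """Frozen behavior prior: autoregressively predicts the next pose."""
    def __init__(self, pose_dim=63, cond_dim=128, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(pose_dim + cond_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, pose_dim)

    def forward(self, poses, cond):
        # poses: (B, T, pose_dim); cond: (B, cond_dim), broadcast over time
        cond_seq = cond.unsqueeze(1).expand(-1, poses.size(1), -1)
        h, _ = self.rnn(torch.cat([poses, cond_seq], dim=-1))
        return self.head(h)  # next-pose predictions, (B, T, pose_dim)

class PromptEncoder(nn.Module):
    """Maps a test-time prompt (e.g., scene or goal features) to a conditioning vector."""
    def __init__(self, prompt_dim=32, cond_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(prompt_dim, cond_dim), nn.ReLU(),
                                 nn.Linear(cond_dim, cond_dim))

    def forward(self, prompt):
        return self.net(prompt)

@torch.no_grad()
def generate(prior, prompt_enc, prompt, seed_pose, steps=30):
    """Roll out a motion conditioned on the prompt, keeping the prior frozen."""
    cond = prompt_enc(prompt)                   # (B, cond_dim)
    poses = seed_pose.unsqueeze(1)              # (B, 1, pose_dim)
    for _ in range(steps):
        next_pose = prior(poses, cond)[:, -1:]  # predict one more frame
        poses = torch.cat([poses, next_pose], dim=1)
    return poses                                # (B, steps + 1, pose_dim)

# Usage: re-use the same frozen prior in a new scene by changing only the prompt.
prior, prompt_enc = MotionPrior(), PromptEncoder()
motion = generate(prior, prompt_enc, torch.randn(1, 32), torch.zeros(1, 63))
print(motion.shape)  # torch.Size([1, 31, 63])
```

The design choice this sketch illustrates is the one the report argues for: the expensive behavior prior is trained once and left untouched, and only the lightweight prompt pathway adapts the model to a novel scene at test time.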