MMF is a modular framework for vision and language multimodal research from Facebook AI Research (FAIR). It contains reference implementations of state-of-the-art vision and language models and has powered multiple research projects at FAIR.

Core advantages (features)

MMF is powered by PyTorch and features:

  • Model Zoo: Reference implementations for state-of-the-art vision and language models including VisualBERT, ViLBERT, M4C (SoTA on TextVQA and TextCaps), Pythia (VQA 2018 challenge winner), and many others. See the full list of projects in MMF here.
  • Multi-Tasking: Support for training on multiple datasets together.
  • Datasets: Includes built-in support for various datasets including VQA, VizWiz, TextVQA, Visual Dialog and COCO Captioning. Running a single command automatically downloads and sets up the dataset for you.
  • Modules: Provides implementations of many commonly used layers in vision and language.
  • Distributed: Support for distributed training using PyTorch's DistributedDataParallel. Built-in hyperparameter sweep support helps you scale training across multiple nodes.
  • Unopinionated: Makes no assumptions about the dataset and model implementations built on top of it.
  • Customization: Supports custom losses, metrics, scheduling, optimizers, and TensorBoard logging to fit your needs.
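
To give a feel for how pluggable components such as custom losses can work in a modular framework, here is a minimal pure-Python sketch of a name-based registry. It is an illustration only and not MMF's actual API: `LOSS_REGISTRY`, `register_loss`, `build_loss`, and the `mean_absolute_error` loss are all hypothetical names chosen for this example, and MMF's own registration mechanism may differ.

```python
from typing import Callable, Dict, Sequence

# Hypothetical registry (not MMF's API): maps a name, as it might appear
# in a config file, to a loss function.
LossFn = Callable[[Sequence[float], Sequence[float]], float]
LOSS_REGISTRY: Dict[str, LossFn] = {}


def register_loss(name: str) -> Callable[[LossFn], LossFn]:
    """Return a decorator that stores a loss function under `name`."""

    def decorator(fn: LossFn) -> LossFn:
        if name in LOSS_REGISTRY:
            raise ValueError(f"loss '{name}' is already registered")
        LOSS_REGISTRY[name] = fn
        return fn

    return decorator


@register_loss("mean_absolute_error")
def mean_absolute_error(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Average absolute difference between predictions and targets."""
    if len(predictions) != len(targets) or len(predictions) == 0:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(predictions)


def build_loss(name: str) -> LossFn:
    """Look up a registered loss by name, as a trainer might do from a config."""
    if name not in LOSS_REGISTRY:
        raise ValueError(f"unknown loss '{name}'")
    return LOSS_REGISTRY[name]


# Usage: select the loss by name, then call it like any other function.
loss_fn = build_loss("mean_absolute_error")
value = loss_fn([1.0, 2.0, 4.0], [1.0, 3.0, 2.0])
```

The point of the pattern is that training code only needs the name of a component, so a user can add a new loss, metric, or optimizer without editing the framework itself.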
