Mixture-of-Transformers: Sparse and Scalable Architecture for Multi-Modal Models (arxiv.org)
2 points by mfiguiere a day ago