Homepage: https://bytedance.github.io/vidi-website/

Github: https://github.com/bytedance/vidi

Demo: https://vidi.byteintl.com/

We introduce Vidi, a family of Large Multimodal Models (LMMs) for a wide range of video understanding and editing (VUE) scenarios. The first release focuses on temporal retrieval (TR), i.e., identifying the time ranges in input videos corresponding to a given text query.

This model is that first release, targeting the temporal retrieval task.

Please find the inference and evaluation code at https://github.com/bytedance/vidi.
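
To make the temporal retrieval task concrete, below is a minimal sketch of the expected input/output shape of a TR call. The function name `retrieve_time_ranges`, its signature, and the `TimeRange` type are hypothetical illustrations of the task interface, not the actual Vidi API; use the inference code in the repository above for real runs.

```python
# Hypothetical sketch of the temporal-retrieval (TR) interface.
# Names and signatures are illustrative only -- the real inference
# and evaluation code lives at https://github.com/bytedance/vidi.
from dataclasses import dataclass


@dataclass
class TimeRange:
    """One retrieved segment, in seconds from the start of the video."""
    start: float
    end: float


def retrieve_time_ranges(video_path: str, query: str) -> list[TimeRange]:
    """Given a video and a free-form text query, return the time ranges
    in the video whose content matches the query (the TR task)."""
    raise NotImplementedError("see https://github.com/bytedance/vidi")


# Expected shape of a call and its result, e.g.:
#   retrieve_time_ranges("demo.mp4", "a dog catches a frisbee")
#   -> [TimeRange(start=12.4, end=18.9), TimeRange(start=73.0, end=80.5)]
```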

Citation

If you find Vidi useful for your research and applications, please cite using this BibTeX:

@article{Vidi2025vidi,
  title={Vidi: Large Multimodal Models for Video Understanding and Editing},
  author={Vidi Team, Celong Liu, Chia-Wen Kuo, Dawei Du, Fan Chen, Guang Chen, Jiamin Yuan, Lingxi Zhang, Lu Guo, Lusha Li, Longyin Wen, Qingyu Chen, Rachel Deng, Sijie Zhu, Stuart Siew, Tong Jin, Wei Lu, Wen Zhong, Xiaohui Shen, Xin Gu, Xing Mei, Xueqiong Qu},
  journal={arXiv preprint arXiv:2504.15681},
  year={2025}
}