⚙️ Supports 6 modalities: images/video, text, audio, depth, thermal, and IMU.
Interestingly, ImageBind was trained only on naturally image-paired data, with no manual labels, yet it learned to align all six modalities through self-supervised learning.
• No need for data pairing every modality with every other (e.g., audio and text never have to be paired directly; each only needs to be paired with images)
• Uses contrastive learning to build a single joint embedding space
• Competes with CLIP and AudioCLIP, with better accuracy and broader modality coverage
• Enables zero-shot retrieval (e.g., finding a relevant video from just a sentence)
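The contrastive objective and the zero-shot retrieval described above can be sketched in plain Python. This is a minimal, hypothetical sketch of a generic symmetric InfoNCE loss (the CLIP-style objective), not ImageBind's actual code: the toy 2-D vectors stand in for encoder outputs, and the names `info_nce` and `retrieve` and the temperature of 0.07 are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors: List[Vector], others: List[Vector], temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of paired embeddings (assumed setup).

    anchors[i] (e.g. an image) and others[i] (e.g. its audio clip) are the positive
    pair; every others[j] with j != i is a negative for anchors[i], and vice versa.
    """
    q = [normalize(v) for v in anchors]
    k = [normalize(v) for v in others]
    n = len(q)
    # Similarity matrix scaled by the (assumed) temperature.
    logits = [[dot(q[i], k[j]) / temperature for j in range(n)] for i in range(n)]

    def cross_entropy_diag(matrix: List[List[float]]) -> float:
        # Mean of -log softmax(row)[i]: the positive pair sits on the diagonal.
        total = 0.0
        for i, row in enumerate(matrix):
            m = max(row)  # subtract the max for numerical stability
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum - row[i]
        return total / len(matrix)

    transposed = [list(col) for col in zip(*logits)]
    # Average the anchor->other and other->anchor directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


def retrieve(query: Vector, candidates: List[Vector]) -> int:
    """Zero-shot retrieval: index of the candidate with the highest cosine similarity."""
    q = normalize(query)
    scores = [dot(q, normalize(c)) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


images = [[1.0, 0.0], [0.0, 1.0]]
audio_aligned = [[0.9, 0.1], [0.1, 0.9]]
audio_shuffled = [[0.1, 0.9], [0.9, 0.1]]
print(info_nce(images, audio_aligned) < info_nce(images, audio_shuffled))  # True
print(retrieve([0.2, 0.8], audio_aligned))  # 1
```

Training pushes the loss down, which pulls each positive pair together and pushes mismatched pairs apart; once every modality lives in the same space, retrieval across modalities reduces to a nearest-neighbour search by cosine similarity.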
#ImageBind #MultimodalAI #MetaAI #DeepLearning #SelfSupervised
NVIDIA introduces GENMO, a unified generalist model for human motion that seamlessly combines motion estimation and generation within a single framework. GENMO supports conditioning on videos, 2D keypoints, text, music, and 3D keyframes, enabling highly versatile motion understanding and synthesis.
Currently, no official code release is available.
Review:
https://t.ly/Q5T_Y
Paper:
https://lnkd.in/ds36BY49
Project Page:
https://lnkd.in/dAYHhuFU
#NVIDIA #GENMO #HumanMotion #DeepLearning #AI #ComputerVision #MotionGeneration #MachineLearning #MultimodalAI #3DReconstruction