Question 11 (Expert):
In Vision Transformers (ViT), how are image patches typically converted into input tokens for the transformer encoder?
A) Raw pixel values are used directly
B) Each patch is flattened and linearly projected
C) Patches are processed through a CNN first
D) Edge detection is applied before projection
#Python #ViT #ComputerVision #DeepLearning #Transformers
✅ By: https://yangx.top/DataScienceQ
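💡 For context, here is a minimal sketch of the standard ViT tokenization step: each non-overlapping patch is flattened into a vector and linearly projected to the embedding dimension. The class name `PatchEmbedding` and the hyperparameters (224×224 input, 16×16 patches, 768-dim embeddings) are illustrative defaults, not a specific library's API.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches, flatten each patch,
    and linearly project it to the transformer embedding dimension."""
    def __init__(self, img_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        self.patch_size = patch_size
        self.num_patches = (img_size // patch_size) ** 2
        # Each flattened patch has patch_size * patch_size * in_channels values.
        self.proj = nn.Linear(patch_size * patch_size * in_channels, embed_dim)

    def forward(self, x):
        # x: (batch, channels, height, width)
        B, C, H, W = x.shape
        p = self.patch_size
        # Carve out non-overlapping p x p patches.
        x = x.unfold(2, p, p).unfold(3, p, p)          # (B, C, H/p, W/p, p, p)
        x = x.permute(0, 2, 3, 1, 4, 5).contiguous()   # (B, H/p, W/p, C, p, p)
        x = x.view(B, self.num_patches, C * p * p)     # flatten each patch
        return self.proj(x)                            # (B, num_patches, embed_dim)

tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])
```

In practice many implementations express the same flatten-and-project operation as a single `nn.Conv2d` with kernel size and stride equal to the patch size, which is mathematically equivalent.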
Are you preparing for AI interviews, or do you want to test your knowledge of Vision Transformers (ViT)?
Basic Concepts (Q1–Q15)
Architecture & Components (Q16–Q30)
Attention & Transformers (Q31–Q45)
Training & Optimization (Q46–Q55)
Advanced & Real-World Applications (Q56–Q65)
Answer Key & Explanations
#VisionTransformer #ViT #DeepLearning #ComputerVision #Transformers #AI #MachineLearning #MCQ #InterviewPrep
✉️ Our Telegram channels: https://yangx.top/addlist/0f6vfFbEMdAwODBk
📱 Our WhatsApp channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A