Question 11 (Expert):
In Vision Transformers (ViT), how are image patches typically converted into input tokens for the transformer encoder?
A) Raw pixel values are used directly
B) Each patch is flattened and linearly projected
C) Patches are processed through a CNN first
D) Edge detection is applied before projection
#Python #ViT #ComputerVision #DeepLearning #Transformers
✅ By: https://yangx.top/DataScienceQ
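💡 For context, here is a minimal sketch of the standard ViT tokenization step: each non-overlapping patch is flattened into a vector and linearly projected to the embedding dimension. The class name `PatchEmbedding` and the hyperparameters (224×224 input, 16×16 patches, 768-dim embeddings) are illustrative defaults, not a specific library's API.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches, flatten each patch,
    and linearly project it to the transformer embedding dimension."""
    def __init__(self, img_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        self.patch_size = patch_size
        self.num_patches = (img_size // patch_size) ** 2
        # Each flattened patch has patch_size * patch_size * in_channels values.
        self.proj = nn.Linear(patch_size * patch_size * in_channels, embed_dim)

    def forward(self, x):
        # x: (batch, channels, height, width)
        B, C, H, W = x.shape
        p = self.patch_size
        # Carve out non-overlapping p x p patches.
        x = x.unfold(2, p, p).unfold(3, p, p)          # (B, C, H/p, W/p, p, p)
        x = x.permute(0, 2, 3, 1, 4, 5).contiguous()   # (B, H/p, W/p, C, p, p)
        x = x.view(B, self.num_patches, C * p * p)     # flatten each patch
        return self.proj(x)                            # (B, num_patches, embed_dim)

tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])
```

In practice many implementations express the same flatten-and-project operation as a single `nn.Conv2d` with kernel size and stride equal to the patch size, which is mathematically equivalent.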
Are you preparing for AI interviews, or do you want to test your knowledge of Vision Transformers (ViT)?
Basic Concepts (Q1–Q15)
Architecture & Components (Q16–Q30)
Attention & Transformers (Q31–Q45)
Training & Optimization (Q46–Q55)
Advanced & Real-World Applications (Q56–Q65)
Answer Key & Explanations
#VisionTransformer #ViT #DeepLearning #ComputerVision #Transformers #AI #MachineLearning #MCQ #InterviewPrep
✉️ Our Telegram channels: https://yangx.top/addlist/0f6vfFbEMdAwODBk
📱 Our WhatsApp channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A