Complete Statistical Theory of Learning
https://www.youtube.com/watch?v=Ow25mjFjSmg
#statistics #machine_learning
#theory
Complete Statistical Theory of Learning (Vladimir Vapnik) | MIT Deep Learning Series
Lecture by Vladimir Vapnik in January 2020, part of the MIT Deep Learning Lecture Series.
Slides: http://bit.ly/2ORVofC
Associated podcast conversation: https://www.youtube.com/watch?v=bQa7hpUpMzM
Series website: https://deeplearning.mit.edu
Playlist: ht…
Backward Feature Correction: How Deep Learning Performs Deep Learning
Summary: How does a 110-layer ResNet learn a high-complexity classifier from relatively few training examples and in a short training time? We present a theory towards explaining this in terms of hierarchical learning, by which we mean that the learner represents a complicated target function by decomposing it into a sequence of simpler functions, reducing sample and time complexity. This paper formally analyzes how multi-layer neural networks can perform such hierarchical learning efficiently and automatically by applying SGD. On the conceptual side, we present, to the best of our knowledge, the first theoretical result showing how deep neural networks can be sample- and time-efficient on certain hierarchical learning tasks for which no known non-hierarchical algorithm (such as kernel methods, linear regression over feature mappings, tensor decomposition, sparse coding, or simple combinations of these) is efficient. We establish a principle called "backward feature correction", whereby training the higher layers of a network can improve the features of the lower ones. We believe this is the key to understanding the deep learning process in multi-layer neural networks.
Paper: https://arxiv.org/pdf/2001.04413.pdf
#theory #deep_learning
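The "backward feature correction" idea above can be pictured with a toy experiment: keep training the whole network end to end so that gradients from the higher layers continue to adjust the lower-layer features, versus freezing the lower layers early so those features stop receiving corrections. The sketch below is only an illustration of that intuition in PyTorch, not the paper's formal construction; the synthetic hierarchical target, network sizes, and training schedule are assumptions chosen for the example.

```python
# Illustrative sketch (not the paper's construction): end-to-end SGD vs.
# freezing the lower layers early on a synthetic hierarchical target.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy hierarchical target: an "outer" cubic applied to an "inner" quadratic of x.
X = torch.randn(2048, 16)
h = (X[:, :8] * X[:, 8:]).sum(dim=1, keepdim=True)   # simpler inner function
y = h ** 3                                            # harder composed target
y = (y - y.mean()) / y.std() + 0.05 * torch.randn_like(y)  # standardize + noise

def make_net():
    return nn.Sequential(
        nn.Linear(16, 64), nn.ReLU(),   # lower block: should learn h-like features
        nn.Linear(64, 64), nn.ReLU(),   # higher block: composes them into the target
        nn.Linear(64, 1),
    )

def train(net, params, steps, lr=1e-2):
    opt = torch.optim.SGD(list(params), lr=lr)
    loss = None
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(X), y)
        loss.backward()
        opt.step()
    return loss.item()

# (a) Freeze the lower block early: after a short warm-up its features stop
#     receiving corrections from the layers above.
frozen = make_net()
train(frozen, frozen.parameters(), steps=200)
for p in frozen[:2].parameters():
    p.requires_grad_(False)
loss_frozen = train(frozen, (p for p in frozen.parameters() if p.requires_grad), steps=1800)

# (b) End-to-end SGD throughout: backward-flowing gradients keep adjusting the
#     lower-level features while the higher layers improve.
e2e = make_net()
loss_e2e = train(e2e, e2e.parameters(), steps=2000)

print(f"frozen-lower-layers loss: {loss_frozen:.4f}   end-to-end loss: {loss_e2e:.4f}")
```

On a composed target like this, the end-to-end run would be expected to reach a lower loss, since its lower-layer features keep being corrected as the higher layers fit the outer function, which is the informal picture the paper formalizes.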