Noam Chomsky: Language, Cognition, and Deep Learning | Artificial Intelligence
Noam Chomsky is one of the greatest minds of our time and is one of the most cited scholars in history. He is a linguist, philosopher, cognitive scientist, historian, social critic, and political activist. He has spent over 60 years at MIT and recently also joined the University of Arizona. This conversation is part of the Artificial Intelligence podcast.
https://www.youtube.com/watch?v=cMscNuSUy0I
#natural_language_processing #deep_learning
Dive into Deep Learning (D2L Book)
Dive into Deep Learning: an interactive deep learning book with code, math, and discussions, based on the NumPy interface
https://github.com/d2l-ai/d2l-en
#deep_learning
An Overview of Recent State of the Art Deep Learning Algorithms/Architectures
Lecture on most recent research and developments in deep learning, and hopes for 2020. This is not intended to be a list of SOTA benchmark results, but rather a set of highlights of machine learning and AI innovations and progress in academia, industry, and society in general. This lecture is part of the MIT Deep Learning Lecture Series.
https://www.youtube.com/watch?v=0VH1Lim8gL8&t=999s
#deep_learning #artificial_intelligence
Deep Reasoning Papers
A repository which contains recent papers including Neural Symbolic Reasoning, Logical Reasoning, Visual Reasoning, natural language reasoning and any other topics connecting deep learning and reasoning.
https://github.com/floodsung/Deep-Reasoning-Papers
#reasoning #deep_learning #artificial_intelligence
An overview of gradient descent optimization algorithms
Abstract: Gradient descent optimization algorithms, while increasingly popular, are often used as black-box optimizers, as practical explanations of their strengths and weaknesses are hard to come by. This article aims to provide the reader with intuitions with regard to the behaviour of different algorithms that will allow her to put them to use. In the course of this overview, we look at different variants of gradient descent, summarize challenges, introduce the most common optimization algorithms, review architectures in a parallel and distributed setting, and investigate additional strategies for optimizing gradient descent.
https://arxiv.org/pdf/1609.04747.pdf
#deep_learning #optimization
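A minimal sketch of two of the update rules the overview covers, classical momentum and Adam, in plain Python. The toy objective f(x, y) = x^2 + 10*y^2 and every hyperparameter value below are assumptions chosen for illustration, not settings taken from the paper.

```python
import math

def grad(theta):
    # Gradient of the toy objective f(x, y) = x**2 + 10 * y**2 (an assumption)
    x, y = theta
    return [2.0 * x, 20.0 * y]

def sgd_momentum(theta, lr=0.01, gamma=0.9, steps=500):
    # Classical momentum: v_t = gamma * v_{t-1} + lr * g_t, then theta -= v_t
    v = [0.0] * len(theta)
    for _ in range(steps):
        g = grad(theta)
        v = [gamma * vi + lr * gi for vi, gi in zip(v, g)]
        theta = [ti - vi for ti, vi in zip(theta, v)]
    return theta

def adam(theta, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    # Adam: moving averages of the gradient (m) and of the squared gradient (v)
    m = [0.0] * len(theta)
    v = [0.0] * len(theta)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        # Bias correction compensates for initializing m and v at zero
        m_hat = [mi / (1 - beta1 ** t) for mi in m]
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        theta = [ti - lr * mh / (math.sqrt(vh) + eps)
                 for ti, mh, vh in zip(theta, m_hat, v_hat)]
    return theta
```

Both sgd_momentum([1.0, 1.0]) and adam([1.0, 1.0]) should end near the minimum at (0, 0). Adam divides each coordinate's step by sqrt(v_hat), which roughly equalizes step sizes between the shallow x direction and the steep y direction.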
Abstract: Gradient descent optimization algorithms, while increasingly popular, are often used as black-box optimizers, as practical explanations of their strengths and weaknesses are hard to come by. This article aims to provide the reader with intuitions with regard to the behaviour of different algorithms that will allow her to put them to use. In the course of this overview, we look at different variants of gradient descent, summarize challenges, introduce the most common optimization algorithms, review architectures in a parallel and distributed setting, and investigate additional strategies for optimizing gradient descent
https://arxiv.org/pdf/1609.04747.pdf
#deep_learning #optimization
A cool 3D representation of the structure of the BERT language model
Blogpost: https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/blocks/bert-encoder
#NLP #deep_learning
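The BERT encoder visualized in the post is a stack of Transformer encoder blocks built around self-attention. Below is a minimal plain-Python sketch of a single scaled dot-product attention head, softmax(Q K^T / sqrt(d_k)) V. Multi-head projections, masking, residual connections, layer normalization and the feed-forward sublayer are all omitted, and the weight matrices are hypothetical inputs rather than trained BERT weights.

```python
import math

def matmul(a, b):
    # (n x k) @ (k x m) for nested lists
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax(row):
    # Subtract the max for numerical stability
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """x: n_tokens x d_model; w_q, w_k, w_v: d_model x d_k (hypothetical weights)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(w_k[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights
```

Each output row is a weighted average of the value vectors, with weights given by how strongly that token's query matches every token's key.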
Critique of Honda Prize for Dr. Hinton
Summary: Hinton has made significant contributions to artificial neural networks (NNs) and deep learning, but Honda credits him for fundamental inventions of others whom he did not cite. Science must not allow corporate PR to distort the academic record. Sec. I: Modern backpropagation was created by Linnainmaa (1970), not by Rumelhart & Hinton & Williams (1985). Ivakhnenko's deep feedforward nets (since 1965) learned internal representations long before Hinton's shallower ones (1980s). Sec. II: Hinton's unsupervised pre-training for deep NNs in the 2000s was conceptually a rehash of my unsupervised pre-training for deep NNs in 1991. And it was irrelevant for the deep learning revolution of the early 2010s which was mostly based on supervised learning - twice my lab spearheaded the shift from unsupervised pre-training to pure supervised learning (1991-95 and 2006-11). Sec. III: The first superior end-to-end neural speech recognition was based on two methods from my lab: LSTM (1990s-2005) and CTC (2006). Hinton et al. (2012) still used an old hybrid approach of the 1980s and 90s, and did not compare it to the revolutionary CTC-LSTM (which was soon on most smartphones). Sec. IV: Our group at IDSIA had superior award-winning computer vision through deep learning (2011) before Hinton's (2012). Sec. V: Hanson (1990) had a variant of "dropout" long before Hinton (2012). Sec. VI: In the 2010s, most major AI-based services across the world (speech recognition, language translation, etc.) on billions of devices were mostly based on our deep learning techniques, not on Hinton's. Repeatedly, Hinton omitted references to fundamental prior art (Sec. I & II & III & V). However, as Elvis Presley put it, "Truth is like the sun. You can shut it out for a time, but it ain't goin' away."
http://people.idsia.ch/~juergen/critique-honda-prize-hinton.html
#deep_learning
The Cost of Training NLP Models: A Concise Overview
Abstract: We review the cost of training large-scale language models, and the drivers of these costs. The intended audience includes engineers and scientists budgeting their model-training experiments, as well as non-practitioners trying to make sense of the economics of modern-day Natural Language Processing (NLP).
https://arxiv.org/abs/2004.08900
#nlp #deep_learning
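A back-of-the-envelope sketch of the kind of budgeting arithmetic the paper discusses. It assumes the widely used rule of thumb of about 6 FLOPs per parameter per training token, which is not necessarily the paper's own method; the model size, token count, GPU throughput, utilization and hourly price are all hypothetical.

```python
def training_flops(n_params, n_tokens):
    # Rule of thumb (assumption): about 6 FLOPs per parameter per training token,
    # roughly 2 for the forward pass and 4 for the backward pass.
    return 6 * n_params * n_tokens

def training_cost(n_params, n_tokens, peak_flops, utilization, usd_per_gpu_hour):
    """Return (gpu_hours, usd) for a single training run; all inputs are hypothetical."""
    sustained = peak_flops * utilization  # FLOP/s actually achieved per GPU
    gpu_hours = training_flops(n_params, n_tokens) / sustained / 3600
    return gpu_hours, gpu_hours * usd_per_gpu_hour

# Hypothetical run: 100M parameters, 10B tokens, 100 TFLOP/s peak, 30% utilization, $2 per GPU-hour
hours, usd = training_cost(1e8, 1e10, 1e14, 0.3, 2.0)
print(f"{hours:.1f} GPU-hours, ${usd:.0f}")  # → 55.6 GPU-hours, $111
```

This covers one run only; hyperparameter searches and repeated experiments multiply the total.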
Grad-CAM++: Generalized Gradient-based Visual Explanations for Deep Convolutional Networks
Abstract: Over the last decade, Convolutional Neural Network (CNN) models have been highly successful in solving complex vision based problems. However, deep models are perceived as "black box" methods considering the lack of understanding of their internal functioning. There has been a significant recent interest to develop explainable deep learning models, and this paper is an effort in this direction. Building on a recently proposed method called Grad-CAM, we propose Grad-CAM++ to provide better visual explanations of CNN model predictions (when compared to Grad-CAM), in terms of better localization of objects as well as explaining occurrences of multiple objects of a class in a single image. We provide a mathematical explanation for the proposed method, Grad-CAM++, which uses a weighted combination of the positive partial derivatives of the last convolutional layer feature maps with respect to a specific class score as weights to generate a visual explanation for the class label under consideration. Our extensive experiments and evaluations, both subjective and objective, on standard datasets showed that Grad-CAM++ indeed provides better visual explanations for a given CNN architecture when compared to Grad-CAM.
https://arxiv.org/pdf/1710.11063.pdf
#deep_learning #computer_vision
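A minimal plain-Python sketch of the Grad-CAM++ weighting described in the abstract, assuming the feature maps A and the gradients dS/dA of the class score S for the last convolutional layer are already available as nested lists. It follows the closed form obtained when the class score is passed through an exponential, as in the paper: alpha = g^2 / (2 g^2 + sum_ab A_ab * g^3) with g = dS/dA_ij. The common exp(S) factor is dropped because it rescales every channel weight equally and disappears after normalization.

```python
def grad_cam_pp(activations, grads):
    """activations, grads: K maps of shape H x W (nested lists) from the last conv layer.
    grads[k][i][j] is dS/dA[k][i][j] for the class score S."""
    weights = []
    for a_k, g_k in zip(activations, grads):
        a_sum = sum(sum(row) for row in a_k)  # sum over all spatial positions (a, b)
        w_k = 0.0
        for g_row in g_k:
            for g in g_row:
                denom = 2.0 * g ** 2 + a_sum * g ** 3
                alpha = g ** 2 / denom if denom != 0.0 else 0.0
                w_k += alpha * max(g, 0.0)  # only positive gradients contribute
        weights.append(w_k)
    height, width = len(activations[0]), len(activations[0][0])
    # Weighted combination of feature maps, followed by ReLU
    cam = [[max(0.0, sum(w * a_k[i][j] for w, a_k in zip(weights, activations)))
            for j in range(width)] for i in range(height)]
    peak = max(max(row) for row in cam)
    if peak > 0.0:
        cam = [[v / peak for v in row] for row in cam]  # normalize to [0, 1]
    return cam
```

In a real CNN the map would then be upsampled to the input resolution and overlaid on the image.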
Towards Biologically Plausible Deep Learning
Abstract: Neuroscientists have long criticized deep learning algorithms as incompatible with current knowledge of neurobiology. We explore more biologically plausible versions of deep representation learning, focusing here mostly on unsupervised learning but developing a learning mechanism that could account for supervised, unsupervised and reinforcement learning. The starting point is that the basic learning rule believed to govern synaptic weight updates (Spike-Timing-Dependent Plasticity) arises out of a simple update rule that makes a lot of sense from a machine learning point of view and can be interpreted as gradient descent on some objective function so long as the neuronal dynamics push firing rates towards better values of the objective function (be it supervised, unsupervised, or reward-driven). The second main idea is that this corresponds to a form of the variational EM algorithm, i.e., with approximate rather than exact posteriors, implemented by neural dynamics. Another contribution of this paper is that the gradients required for updating the hidden states in the above variational interpretation can be estimated using an approximation that only requires propagating activations forward and backward, with pairs of layers learning to form a denoising auto-encoder. Finally, we extend the theory about the probabilistic interpretation of auto-encoders to justify improved sampling schemes based on the generative interpretation of denoising auto-encoders, and we validate all these ideas on generative learning tasks.
https://arxiv.org/abs/1502.04156
#deep_learning #neuroscience
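A minimal rate-based sketch of the gradient-descent interpretation mentioned in the abstract, as we understand it: the STDP-like update is taken as presynaptic activity times the change in postsynaptic potential, delta W_ij proportional to s_i * (V_j(t+1) - V_j(t)). If the neuronal dynamics move V toward lower values of an objective J, this update matches a gradient step on J. The linear potentials, the squared-error objective and the "nudge" dynamics are simplifying assumptions, not the paper's full model.

```python
def potentials(weights, pre_rates):
    # Postsynaptic potential V_j = sum_i W[i][j] * s_i (linear, rate-based simplification)
    return [sum(weights[i][j] * pre_rates[i] for i in range(len(pre_rates)))
            for j in range(len(weights[0]))]

def stdp_like_update(weights, pre_rates, v_now, v_next, lr=0.1):
    """weights[i][j]: synapse from presynaptic neuron i to postsynaptic neuron j.
    Update: delta W[i][j] = lr * pre_rates[i] * (v_next[j] - v_now[j])."""
    return [[w_ij + lr * pre_rates[i] * (v_next[j] - v_now[j])
             for j, w_ij in enumerate(row)]
            for i, row in enumerate(weights)]

# Hypothetical dynamics: V moves a fraction `nudge` toward a target, i.e. down the
# gradient of J = 0.5 * sum_j (V_j - target_j)**2.
pre = [1.0, 0.5]
w = [[0.2, 0.0], [0.4, 1.0]]
target = [1.0, 0.0]
nudge = 0.5
v = potentials(w, pre)
v_next = [vj + nudge * (tj - vj) for vj, tj in zip(v, target)]
w_stdp = stdp_like_update(w, pre, v, v_next, lr=0.1)
# Explicit gradient step on J (dJ/dW_ij = s_i * (V_j - target_j)) with step size lr * nudge
w_grad = [[w[i][j] - 0.1 * nudge * pre[i] * (v[j] - target[j]) for j in range(2)]
          for i in range(2)]
```

The local update never sees J directly; it only needs the presynaptic rate and the change in postsynaptic potential, yet w_stdp and w_grad coincide.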
AlphaGo - The Movie | Full Documentary
Summary: With more board configurations than there are atoms in the universe, the ancient Chinese game of Go has long been considered a grand challenge for artificial intelligence. On March 9, 2016, the worlds of Go and artificial intelligence collided in South Korea for an extraordinary best-of-five-game competition, coined The DeepMind Challenge Match. Hundreds of millions of people around the world watched as a legendary Go master took on an unproven AI challenger for the first time in history.
https://www.youtube.com/watch?v=WXuK6gekU1Y
#artificial_intelligence #reinforcement_learning #deep_learning
Keras Website Has Been Updated
Quote from its developer (François Chollet): Keras has a new website, which includes a 100% refreshed list of developer guides and code examples.
https://keras.io/
#deep_learning #programming
Ilya Sutskever: Deep Learning
Brief Biography: Ilya Sutskever is the co-founder of OpenAI, is one of the most cited computer scientists in history with over 165,000 citations, and to me, is one of the most brilliant and insightful minds ever in the field of deep learning. There are very few people in this world who I would rather talk to and brainstorm with about deep learning, intelligence, and life than Ilya, on and off the mic.
https://www.youtube.com/watch?v=13CZPWmke6A
#deep_learning #artificial_intelligence #reinforcement_learning
SIREN: Implicit Neural Representations with Periodic Activation Functions
Abstract: Implicitly defined, continuous, differentiable signal representations parameterized by neural networks have emerged as a powerful paradigm, offering many possible benefits over conventional representations. However, current network architectures for such implicit neural representations are incapable of modeling signals with fine detail, and fail to represent a signal's spatial and temporal derivatives, despite the fact that these are essential to many physical signals defined implicitly as the solution to partial differential equations. We propose to leverage periodic activation functions for implicit neural representations and demonstrate that these networks, dubbed sinusoidal representation networks or Sirens, are ideally suited for representing complex natural signals and their derivatives. We analyze Siren activation statistics to propose a principled initialization scheme and demonstrate the representation of images, wavefields, video, sound, and their derivatives. Further, we show how Sirens can be leveraged to solve challenging boundary value problems, such as particular Eikonal equations (yielding signed distance functions), the Poisson equation, and the Helmholtz and wave equations. Lastly, we combine Sirens with hypernetworks to learn priors over the space of Siren functions.
Paper: https://arxiv.org/abs/2006.09661
Website: https://vsitzmann.github.io/siren/
Explanatory Video: https://youtu.be/Q5g3p9Zwjrk
#deep_learning #neural_network
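A minimal plain-Python sketch of a Siren: hidden layers compute sin(omega_0 * (W x + b)) with omega_0 = 30, the first layer's weights are drawn from U(-1/n_in, 1/n_in) and later layers from U(-sqrt(6/n_in)/omega_0, sqrt(6/n_in)/omega_0), following the initialization scheme the paper proposes. Drawing the biases from the same range is a simplification of ours, and the layer sizes are hypothetical. Because the derivative of sin(omega_0 (w x + b)) is omega_0 * w * cos(omega_0 (w x + b)), itself a phase-shifted sine, derivatives of a Siren are again Siren-like, which is why these networks can represent a signal's derivatives.

```python
import math
import random

OMEGA_0 = 30.0  # frequency factor omega_0 used in the SIREN paper

def init_layer(n_in, n_out, is_first, rng):
    # First layer: U(-1/n_in, 1/n_in); later layers: U(-sqrt(6/n_in)/omega_0, +sqrt(6/n_in)/omega_0)
    bound = 1.0 / n_in if is_first else math.sqrt(6.0 / n_in) / OMEGA_0
    w = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-bound, bound) for _ in range(n_out)]  # simplification (assumption)
    return w, b

def sine_layer(x, w, b):
    # phi(x) = sin(omega_0 * (W x + b))
    return [math.sin(OMEGA_0 * (sum(wi * xi for wi, xi in zip(row, x)) + bi))
            for row, bi in zip(w, b)]

def make_siren(sizes, seed=0):
    rng = random.Random(seed)
    return [init_layer(n_in, n_out, i == 0, rng)
            for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:]))]

def siren_forward(layers, x):
    for w, b in layers[:-1]:
        x = sine_layer(x, w, b)
    w, b = layers[-1]
    # Final layer is linear
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
```

For example, make_siren([2, 256, 256, 1]) would map 2D pixel coordinates to a single intensity value; fitting it to an image still requires a training loop, which is omitted here.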
Grounding Language in Play: A scalable approach for controlling robots with natural language
https://language-play.github.io/
#nlp #reinforcement_learning #deep_learning
PyTorch Internals
Summary: This article is for those of you who have used PyTorch, and thought to yourself, "It would be great if I could contribute to PyTorch," but were scared by PyTorch's behemoth of a C++ codebase. I'm not going to lie: the PyTorch codebase can be a bit overwhelming at times. The purpose of this talk is to put a map in your hands: to tell you about the basic conceptual structure of a "tensor library that supports automatic differentiation", and give you some tools and tricks for finding your way around the codebase. I'm going to assume that you've written some PyTorch before, but haven't necessarily delved deeper into how a machine learning library is written.
http://blog.ezyang.com/2019/05/pytorch-internals/
#pytorch #deep_learning
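A central idea in the article is that a tensor is a view over a flat storage described by sizes, strides and an offset. Below is a minimal plain-Python sketch of that bookkeeping; the helper names are ours, not PyTorch's. In PyTorch itself, Tensor.stride() and Tensor.storage_offset() expose these values, and Tensor.t() returns a transposed view that shares storage.

```python
def contiguous_strides(sizes):
    # Row-major strides: the last dimension is contiguous (stride 1)
    strides, step = [], 1
    for size in reversed(sizes):
        strides.append(step)
        step *= size
    return strides[::-1]

def storage_index(index, strides, offset=0):
    # Physical position of a logical element: offset + sum(index[d] * stride[d])
    return offset + sum(i * s for i, s in zip(index, strides))

def transpose_view(sizes, strides, dim0, dim1):
    # A transpose only swaps sizes and strides; the underlying storage is untouched
    sizes, strides = list(sizes), list(strides)
    sizes[dim0], sizes[dim1] = sizes[dim1], sizes[dim0]
    strides[dim0], strides[dim1] = strides[dim1], strides[dim0]
    return sizes, strides

storage = [0, 1, 2, 3, 4, 5]                                # backing storage of a 2 x 3 tensor
sizes, strides = [2, 3], contiguous_strides([2, 3])         # strides == [3, 1]
t_sizes, t_strides = transpose_view(sizes, strides, 0, 1)   # [3, 2], [1, 3]
# Element [1, 2] of the original and element [2, 1] of the transpose share one storage slot
assert storage[storage_index([1, 2], strides)] == storage[storage_index([2, 1], t_strides)]
```

Selecting row 1 works the same way: it is a view with offset 3, sizes [3] and strides [1], again without copying data.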
Neural Architecture Search without Training
Abstract: The time and effort involved in hand-designing deep neural networks is immense. This has prompted the development of Neural Architecture Search (NAS) techniques to automate this design. However, NAS algorithms tend to be extremely slow and expensive; they need to train vast numbers of candidate networks to inform the search process. This could be remedied if we could infer a network's trained accuracy from its initial state. In this work, we examine how the linear maps induced by data points correlate for untrained network architectures in the NAS-Bench-201 search space, and motivate how this can be used to give a measure of modelling flexibility which is highly indicative of a network's trained performance. We incorporate this measure into a simple algorithm that allows us to search for powerful networks without any training in a matter of seconds on a single GPU.
Explanatory Video: https://www.youtube.com/watch?v=a6v92P0EbJc
GitHub Repo: https://github.com/BayesWatch/nas-without-training
Paper: https://arxiv.org/abs/2006.04647
#deep_learning #neural_architecture_search
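The abstract describes a score built from correlations between the linear maps (Jacobians) induced by different inputs. The sketch below instead uses the simpler score from the later ICML 2021 version of the same paper: record which ReLU units fire for each input in a mini-batch as a binary code, build the kernel K_H[i][j] = N_A - Hamming(c_i, c_j), and score the untrained network by log|det K_H|, where higher is taken to indicate a more promising architecture. The tiny one-hidden-layer network and its random weights are hypothetical.

```python
import math
import random

def relu_codes(batch, w1, b1):
    # Binary code per input: 1 if the hidden ReLU unit is active, else 0
    codes = []
    for x in batch:
        pre = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w1, b1)]
        codes.append([1 if p > 0 else 0 for p in pre])
    return codes

def hamming_kernel(codes):
    # K_H[i][j] = N_A - d_H(c_i, c_j), where N_A is the number of ReLU units
    n_a = len(codes[0])
    return [[n_a - sum(a != b for a, b in zip(ci, cj)) for cj in codes] for ci in codes]

def log_abs_det(matrix):
    # Gaussian elimination with partial pivoting; returns -inf for a singular matrix
    m = [list(map(float, row)) for row in matrix]
    n = len(m)
    log_det = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return float("-inf")
        m[col], m[pivot] = m[pivot], m[col]
        log_det += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return log_det

def naswot_score(batch, w1, b1):
    return log_abs_det(hamming_kernel(relu_codes(batch, w1, b1)))

# Hypothetical untrained network: 4 inputs, 64 ReLU units, random Gaussian weights
rng = random.Random(0)
w1 = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(64)]
b1 = [0.0] * 64
batch = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
score = naswot_score(batch, w1, b1)
```

Inputs that produce identical activation patterns make K_H singular and drive the score to -inf, which matches the intuition that a network unable to tell its inputs apart is a poor candidate.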
An Introduction to Deep Reinforcement Learning
Abstract: Deep reinforcement learning is the combination of reinforcement learning (RL) and deep learning. This field of research has been able to solve a wide range of complex decision-making tasks that were previously out of reach for a machine. Thus, deep RL opens up many new applications in domains such as healthcare, robotics, smart grids, finance, and many more. This manuscript provides an introduction to deep reinforcement learning models, algorithms and techniques. Particular focus is on the aspects related to generalization and how deep RL can be used for practical applications. We assume the reader is familiar with basic machine learning concepts.
Paper: https://arxiv.org/pdf/1811.12560.pdf
#reinforcement_learning
#deep_learning
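A minimal sketch of tabular Q-learning, the value-based starting point that deep RL methods such as DQN build on: DQN replaces the table with a neural network regressed toward the same target r + gamma * max_a' Q(s', a'), usually adding a replay buffer and a target network. The five-state chain environment and all hyperparameters below are hypothetical.

```python
import random

N_STATES = 5        # hypothetical chain environment: states 0..4, state 4 is the goal
MOVES = (-1, +1)    # action 0 = left, action 1 = right

def env_step(state, action):
    nxt = min(max(state + MOVES[action], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                best = max(q[state])       # exploit, breaking ties randomly
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            nxt, reward, done = env_step(state, action)
            # TD target r + gamma * max_a' Q(s', a'); no bootstrapping from a terminal state
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q
```

After training, the greedy action in every non-terminal state should be "right", with values shrinking by a factor of gamma per step of distance from the goal.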
Backward Feature Correction: How Deep Learning Performs Deep Learning
Summary: How does a 110-layer ResNet learn a high-complexity classifier using relatively few training examples and short training time? We present a theory towards explaining this in terms of hierarchical learning. We refer to hierarchical learning as the process in which the learner represents a complicated target function by decomposing it into a sequence of simpler functions to reduce sample and time complexity. This paper formally analyzes how multi-layer neural networks can perform such hierarchical learning efficiently and automatically by applying SGD. On the conceptual side, we present, to the best of our knowledge, the FIRST theory result indicating how deep neural networks can be sample and time efficient on certain hierarchical learning tasks, when NO KNOWN non-hierarchical algorithms (such as kernel methods, linear regression over feature mappings, tensor decomposition, sparse coding, and their simple combinations) are efficient. We establish a principle called "backward feature correction", where training higher layers in the network can improve the features of lower-level ones. We believe this is the key to understanding the deep learning process in multi-layer neural networks.
Paper: https://arxiv.org/pdf/2001.04413.pdf
#theory #deep_learning
New Deep Learning Course by Yann LeCun & Alfredo Canziani (Recommended)
Course Intro: This course concerns the latest techniques in deep learning and representation learning, focusing on supervised and unsupervised deep learning, embedding methods, metric learning, convolutional and recurrent nets, with applications to computer vision, natural language understanding, and speech recognition.
Additional Info: This course is available in 11 languages, including Persian, and I personally translated some of the course materials into Persian :).
https://atcold.github.io/pytorch-Deep-Learning/
#deep_learning #course