ββNot Enough Data? Deep Learning to the Rescue!
The authors use a powerful pre-trained #NLP NN model to artificially synthesize new #labeled #data for #supervised learning. They mainly focus on cases with scarce labeled data. Their method referred to as language-model-based data augmentation (#LAMBADA), involves fine-tuning a SOTA language generator to a specific task through an initial training phase on the existing (usually small) labeled data.
Using the fine-tuned model and given a class label, new sentences for the class are generated. Then they filter these new sentences by using a classifier trained on the original data.
In a series of experiments, they show that LAMBADA improves classifiers performance on a variety of datasets. Moreover, LAMBADA significantly improves upon the SOTA techniques for data augmentation, specifically those applicable to text classification tasks with little data.
paper: https://arxiv.org/abs/1911.03118
The authors use a powerful pre-trained #NLP NN model to artificially synthesize new #labeled #data for #supervised learning. They mainly focus on cases with scarce labeled data. Their method referred to as language-model-based data augmentation (#LAMBADA), involves fine-tuning a SOTA language generator to a specific task through an initial training phase on the existing (usually small) labeled data.
Using the fine-tuned model and given a class label, new sentences for the class are generated. Then they filter these new sentences by using a classifier trained on the original data.
In a series of experiments, they show that LAMBADA improves classifiers performance on a variety of datasets. Moreover, LAMBADA significantly improves upon the SOTA techniques for data augmentation, specifically those applicable to text classification tasks with little data.
paper: https://arxiv.org/abs/1911.03118
Media is too big
VIEW IN TELEGRAM
StyleGANv2 π:
- significantly better samples (better FID scores & reduced artifacts)
- no more progressive growing
- improved Style-mixing
- smoother interpolations (extra regularization)
- faster training
paper: https://arxiv.org/abs/1912.04958
code: https://github.com/NVlabs/stylegan2
origin video: https://www.youtube.com/watch?v=c-NJtV9Jvp0&feature=emb_logo
- significantly better samples (better FID scores & reduced artifacts)
- no more progressive growing
- improved Style-mixing
- smoother interpolations (extra regularization)
- faster training
paper: https://arxiv.org/abs/1912.04958
code: https://github.com/NVlabs/stylegan2
origin video: https://www.youtube.com/watch?v=c-NJtV9Jvp0&feature=emb_logo
AR-Net: A simple autoregressive NN for #timeSeries
AR-Net, has 2 distinct advantages over its traditional counterpart:
* scales well to large orders, making it possible to estimate long-range dependencies (important in high-resolution monitoring applications, such as those in the data center domain);
* automatically selects and estimates the important coefficients of a sparse AR process, eliminating the need to know the true order of the AR process
To overcome the scalability challenge, they train a NN with #SGD to learn the AR (#autoregression) coefficients. AR-Net effectively learns near-identical weights as classic AR implementations & is equally good at predicting the next value of the time series.
Also, AR-Net automatically learns the relevant weights, even if the underlying data is generated by a noisy & extremely sparse AR process.
blog: https://ai.facebook.com/blog/ar-net-a-simple-autoregressive-neural-network-for-time-series/
paper: https://arxiv.org/abs/1911.03118
AR-Net, has 2 distinct advantages over its traditional counterpart:
* scales well to large orders, making it possible to estimate long-range dependencies (important in high-resolution monitoring applications, such as those in the data center domain);
* automatically selects and estimates the important coefficients of a sparse AR process, eliminating the need to know the true order of the AR process
To overcome the scalability challenge, they train a NN with #SGD to learn the AR (#autoregression) coefficients. AR-Net effectively learns near-identical weights as classic AR implementations & is equally good at predicting the next value of the time series.
Also, AR-Net automatically learns the relevant weights, even if the underlying data is generated by a noisy & extremely sparse AR process.
blog: https://ai.facebook.com/blog/ar-net-a-simple-autoregressive-neural-network-for-time-series/
paper: https://arxiv.org/abs/1911.03118
ProteinNet: a standardized data set for machine learning of protein structure
Link: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2932-0
Github: https://github.com/aqlaboratory/proteinnet
#biolearning #medical #dl
Link: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2932-0
Github: https://github.com/aqlaboratory/proteinnet
#biolearning #medical #dl
BioMed Central
ProteinNet: a standardized data set for machine learning of protein structure - BMC Bioinformatics
Background Rapid progress in deep learning has spurred its application to bioinformatics problems including protein structure prediction and design. In classic machine learning problems like computer vision, progress has been driven by standardized data setsβ¦
ββThe 2019 AI Index report!
Stanford University Human-Centered Artificial Intelligence released an annual report about the state of #AI this year. 160 pages of text with various metrics and measurements. Good to read with a cup of coffee ;)
TL;DR by TheVerge in the picture but inside report more interesting!
site of the report: https://hai.stanford.edu/ai-index/2019
Stanford University Human-Centered Artificial Intelligence released an annual report about the state of #AI this year. 160 pages of text with various metrics and measurements. Good to read with a cup of coffee ;)
TL;DR by TheVerge in the picture but inside report more interesting!
site of the report: https://hai.stanford.edu/ai-index/2019
ββA Deep Neural Network's Loss Surface Contains Every Low-dimensional Pattern
New work from #DeepMind built in top of Loss Landscape Sightseeing with Multi-Point Optimization
ArXiV: https://arxiv.org/abs/1912.07559
Predecessorβs github: https://github.com/universome/loss-patterns
New work from #DeepMind built in top of Loss Landscape Sightseeing with Multi-Point Optimization
ArXiV: https://arxiv.org/abs/1912.07559
Predecessorβs github: https://github.com/universome/loss-patterns
ββSingle Headed Attention RNN
tl;dr: stop thinking with your (#attention) head :kekeke:
* obtain strong results on a byte level #languageModeling dataset (enwik8) in under 24 hours on a single GPU (12GB Titan V)
* support long-range dependencies (up to 5000 tokens) without increasing compute time or memory usage substantially by using a simpler attention mechanism
* avoid the fragile training process required by standard #Transformer models such as a long warmup
* back off toward a standard #LSTM allowing you to drop retained memory states (needed for a Transformer model) if memory becomes a major constraint
* provide a smaller model that features only standard components such as the LSTM, single-headed attention, and feed-forward modules such that they can easily be productionized using existing optimized tools and exported to various formats (i.e. ONNX)
paper: https://arxiv.org/abs/1911.11423
code: https://github.com/Smerity/sha-rnn
tweet: https://twitter.com/Smerity/status/1199529360954257408?s=20
tl;dr: stop thinking with your (#attention) head :kekeke:
* obtain strong results on a byte level #languageModeling dataset (enwik8) in under 24 hours on a single GPU (12GB Titan V)
* support long-range dependencies (up to 5000 tokens) without increasing compute time or memory usage substantially by using a simpler attention mechanism
* avoid the fragile training process required by standard #Transformer models such as a long warmup
* back off toward a standard #LSTM allowing you to drop retained memory states (needed for a Transformer model) if memory becomes a major constraint
* provide a smaller model that features only standard components such as the LSTM, single-headed attention, and feed-forward modules such that they can easily be productionized using existing optimized tools and exported to various formats (i.e. ONNX)
paper: https://arxiv.org/abs/1911.11423
code: https://github.com/Smerity/sha-rnn
tweet: https://twitter.com/Smerity/status/1199529360954257408?s=20
ββSpineNet: Learning Scale-Permuted Backbone for Recognition and Localization
Abstract: CNN typically encodes an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection). The encoder-decoder architectures are proposed to resolve this by applying a decoder network onto a backbone model designed for classification tasks. In this paper, we argue that encoder-decoder architecture is ineffective in generating strong multi-scale features because of the scale-decreased backbone. We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that are learned on an object detection task by Neural Architecture Search. SpineNet achieves the SOTA performance of a one-stage object detector on COCO with 60% less computation and outperforms ResNet-FPN counterparts by 6% AP. SpineNet architecture can transfer to classification tasks, achieving 6% top-1 accuracy improvement on a challenging iNaturalist fine-grained dataset.
So, by Google's beloved method of creating a new SOTA, there is a new one! They just permute ResNet layers by NAS with adding resample cross-scale connections for correct connection scales output between layers. It seems that no need FPN cause the whole backbone is FPN. They train from scratch on RetinaNet just replace ResNet backbone with SpineNet and get SOTA. On two-stage detectors, there is the same result by replacing the backbone with SpineNet. If you want just classify something with that backbone it is performed very well too. So new architecture for any application!
Good job.
paper: https://arxiv.org/abs/1912.05027
code: Very wanted, but not release yet
#CV #ObjectDetection #GoogleResearch #NAS #SOTA
Abstract: CNN typically encodes an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection). The encoder-decoder architectures are proposed to resolve this by applying a decoder network onto a backbone model designed for classification tasks. In this paper, we argue that encoder-decoder architecture is ineffective in generating strong multi-scale features because of the scale-decreased backbone. We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that are learned on an object detection task by Neural Architecture Search. SpineNet achieves the SOTA performance of a one-stage object detector on COCO with 60% less computation and outperforms ResNet-FPN counterparts by 6% AP. SpineNet architecture can transfer to classification tasks, achieving 6% top-1 accuracy improvement on a challenging iNaturalist fine-grained dataset.
So, by Google's beloved method of creating a new SOTA, there is a new one! They just permute ResNet layers by NAS with adding resample cross-scale connections for correct connection scales output between layers. It seems that no need FPN cause the whole backbone is FPN. They train from scratch on RetinaNet just replace ResNet backbone with SpineNet and get SOTA. On two-stage detectors, there is the same result by replacing the backbone with SpineNet. If you want just classify something with that backbone it is performed very well too. So new architecture for any application!
Good job.
paper: https://arxiv.org/abs/1912.05027
code: Very wanted, but not release yet
#CV #ObjectDetection #GoogleResearch #NAS #SOTA
ODS breakfast in Paris! π«π· As an exception, because of the manifestations, this Saturday, we meet at 9:45 at Malongo CafΓ©, 50 Rue Saint-AndrΓ© des Arts. We can have lunch after that as well. See you tomorrow.
ββNew tutorial on QA task
The T5 team competed against T5 in a "pub quiz" on (context-free) questions from the TriviaQA/NQ validation sets.
Result: team got 20% right; T5 got 35%.
Colab link: https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/master/notebooks/t5-trivia.ipynb
Source: twitter
#NLU #NLP #Colab #Tutorial #QA
The T5 team competed against T5 in a "pub quiz" on (context-free) questions from the TriviaQA/NQ validation sets.
Result: team got 20% right; T5 got 35%.
Colab link: https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/master/notebooks/t5-trivia.ipynb
Source: twitter
#NLU #NLP #Colab #Tutorial #QA
ββYour Classifier is Secretly an Energy-Based Model and You Should Treat it Like One
Classifiers are secretly energy-based models! Every softmax giving
The authors did math tricks to the joint Energy-Based Model (EBM) to a usual #classifier with #softmax. So it turns that in this usual classifier hiding the EBM.
Our key observation in this work is that one can slightly re-interpret the logits obtained from
In their result, they show that approach gives more adversarial robustness and for generated images maximize class confidence than its classified. It also gives more unify the density of confidence classes of all data. And of course, it can generate pictures. Interesting result by math trick!
paper: http://arxiv.org/abs/1912.03263
tweet: http://twitter.com/DavidDuvenaud/status/1204143678865866752
github: https://github.com/wgrathwohl/JEM
Classifiers are secretly energy-based models! Every softmax giving
p(c|x)
has an unused degree of freedom, which we use to compute the input density p(x).
This makes classifiers into #generative models without changing the architecture.The authors did math tricks to the joint Energy-Based Model (EBM) to a usual #classifier with #softmax. So it turns that in this usual classifier hiding the EBM.
Our key observation in this work is that one can slightly re-interpret the logits obtained from
fΞΈ
to define p(x, y)
and p(x)
as well. Without changing fΞΈ,
one can re-use the logits to define an energy-based model of the joint distribution of data point x
and labels y.
The normalizing constant cancels out, yielding the standard Softmax parameterization. Thus, we have found a generative model hidden within every standard discriminative model!In their result, they show that approach gives more adversarial robustness and for generated images maximize class confidence than its classified. It also gives more unify the density of confidence classes of all data. And of course, it can generate pictures. Interesting result by math trick!
paper: http://arxiv.org/abs/1912.03263
tweet: http://twitter.com/DavidDuvenaud/status/1204143678865866752
github: https://github.com/wgrathwohl/JEM
Story of Parisian ODS Breakfasts (and tips for setting up one at your city)
ODS breakfasts are informal regular meetings of data science enthusiasts to discuss professional themes as long with personal ones. It is like coffee break after a conference but without any conference.
Link: https://medium.com/@isprokin/ods-breakfast-is-coming-to-your-city-paris-and-562b1244febd
P.S. please press the clap button in the bottom of medium article to give the article 50 claps, to help promoting data breakfasts.
ODS breakfasts are informal regular meetings of data science enthusiasts to discuss professional themes as long with personal ones. It is like coffee break after a conference but without any conference.
Link: https://medium.com/@isprokin/ods-breakfast-is-coming-to-your-city-paris-and-562b1244febd
P.S. please press the clap button in the bottom of medium article to give the article 50 claps, to help promoting data breakfasts.
Medium
ODS breakfast is coming to your city! Paris, andβ¦
β¦it depends on you!
#NLP #News (by Sebastian Ruder):
* 2020 NLP wish lists
* #HuggingFace + #fastai
* #NeurIPS 2019
* #GPT2 things
* #ML Interviews
blog post: http://newsletter.ruder.io/archive/211277
* 2020 NLP wish lists
* #HuggingFace + #fastai
* #NeurIPS 2019
* #GPT2 things
* #ML Interviews
blog post: http://newsletter.ruder.io/archive/211277
ODS breakfast in Paris! βοΈ π«π· See you this Saturday at 10:30 (many people come around 11:00) at Malongo CafΓ©, 50 Rue Saint-AndrΓ© des Arts.
ββBETO: Spanish BERT
#BETO trained on a big Spanish corpus (3B number of tokens).
Similar to a BERT-Base & was trained with the Whole Word Masking technique.
Weight available for tensorflow & pytorch (cased & uncased versions)
blog post: https://medium.com/dair-ai/beto-spanish-bert-420e4860d2c6
github: https://github.com/dccuchile/beto
#BETO trained on a big Spanish corpus (3B number of tokens).
Similar to a BERT-Base & was trained with the Whole Word Masking technique.
Weight available for tensorflow & pytorch (cased & uncased versions)
blog post: https://medium.com/dair-ai/beto-spanish-bert-420e4860d2c6
github: https://github.com/dccuchile/beto
ββDriverless DeLorean drifting
#Stanford researchers taught autonomous car driving AI to drift to handle hazardous conditions better.
Link: https://news.stanford.edu/2019/12/20/autonomous-delorean-drives-sideways-move-forward/
#Autonomous #selfdriving #RL #CV #DL #DeLorean
#Stanford researchers taught autonomous car driving AI to drift to handle hazardous conditions better.
Link: https://news.stanford.edu/2019/12/20/autonomous-delorean-drives-sideways-move-forward/
#Autonomous #selfdriving #RL #CV #DL #DeLorean
Becoming an Independent Researcher and getting published in ICLR with spotlight
* It is possible to get published as an independent researcher, but it is really HARD.
* Now you need a top tier publication (ACL/EMNLP/CVPR/ICCV/ICLR/NeurIPS or ICML paper) to get accepted into PhD program.
* Mind acceptance rate of 20% and keep on grinding.
Link: https://medium.com/@andreas_madsen/becoming-an-independent-researcher-and-getting-published-in-iclr-with-spotlight-c93ef0b39b8b
#Academia #PhD #conference #learning
* It is possible to get published as an independent researcher, but it is really HARD.
* Now you need a top tier publication (ACL/EMNLP/CVPR/ICCV/ICLR/NeurIPS or ICML paper) to get accepted into PhD program.
* Mind acceptance rate of 20% and keep on grinding.
Link: https://medium.com/@andreas_madsen/becoming-an-independent-researcher-and-getting-published-in-iclr-with-spotlight-c93ef0b39b8b
#Academia #PhD #conference #learning
Medium
Becoming an Independent Researcher and getting published in ICLR with spotlight
Why I became an Independent Researcher, why you should avoid it, and advice for those who make the difficult choice anyway.
ββtop podcast episodes from 2k19 by @lexfridman:
β«οΈ on nature:
Glenn Villeneuve on @joerogan #1395 | link
β«οΈ on perception:
Donald Hoffman on @SamHarrisOrg's Making Sense #178 | link
β«οΈ on physics:
Garrett Lisi on @EricRWeinstein's The Portal #15 | link
β«οΈ on consciousness:
Philip Goff on @seanmcarroll's Mindscape #71 | link
β«οΈ on nature:
Glenn Villeneuve on @joerogan #1395 | link
β«οΈ on perception:
Donald Hoffman on @SamHarrisOrg's Making Sense #178 | link
β«οΈ on physics:
Garrett Lisi on @EricRWeinstein's The Portal #15 | link
β«οΈ on consciousness:
Philip Goff on @seanmcarroll's Mindscape #71 | link