Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)
The approach casts every language problem as a text-to-text task. For example, English-to-German translation – input: "translate English to German: That is good.", target: "Das ist gut."; or sentiment identification – input: "sentiment: This movie is terrible!", target: "negative".
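A minimal sketch of that text-to-text casting, using the two examples from the post. The helper name and the dict layout are assumptions made for illustration; they are not taken from the T5 codebase.

```python
# Hypothetical helper: the name and the dict layout are illustrative only,
# not part of the T5 codebase.
def to_text_to_text(task_prefix: str, text: str, target: str) -> dict:
    """Cast one labelled example as a plain (input string, target string) pair."""
    return {"input": f"{task_prefix}: {text}", "target": target}


examples = [
    to_text_to_text("translate English to German", "That is good.", "Das ist gut."),
    to_text_to_text("sentiment", "This movie is terrible!", "negative"),
]

for example in examples:
    print(example["input"], "->", example["target"])
```

Because both tasks end up as string pairs, one model with one loss can in principle be trained on all of them.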
Transfer learning for NLP usually uses unlabeled data for pre-training, so the authors assembled the "Colossal Clean Crawled Corpus" (C4), ~750GB of cleaned text from Common Crawl.
The authors compared different architectural variants, including encoder-decoder models and language models, in various configurations and with various objectives. The encoder-decoder architecture performed best in the text-to-text setting.
More in the Twitter thread: https://twitter.com/colinraffel/status/1187161460033458177?s=20
Paper: https://arxiv.org/abs/1910.10683
Code/models/data/etc: https://github.com/google-research/text-to-text-transfer-transformer
#NLP #DL #transformer
How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats
An article on how a business task can be decomposed into an ML problem.
Link: https://eng.uber.com/uber-eats-trip-optimization/
#Uber #ml #taskdesign #analytics
Using GPS and sensor data from Android phones, Uber engineers develop a state model for trips taken by Uber Eats delivery-partners, helping to optimize trip timing for delivery-partners and eaters alike.
Two papers stating random architecture search is a competitive (in some cases superior) baseline for NAS methods.
These papers demonstrate that Neural Architecture Search can be stochastic.
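A rough sketch of what a random-search baseline for NAS looks like. The search space, the scoring function, and all names here are assumptions for illustration; in a real setting the score would come from training each sampled network and measuring validation accuracy.

```python
import random

# Toy search space; the option names and values are assumptions for illustration.
SEARCH_SPACE = {
    "num_layers": [2, 4, 8],
    "width": [64, 128, 256],
    "activation": ["relu", "tanh"],
}


def sample_architecture(rng: random.Random) -> dict:
    """Draw one architecture uniformly at random from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def toy_score(arch: dict) -> float:
    """Stand-in for 'train the network and return validation accuracy'."""
    bonus = 0.05 if arch["activation"] == "relu" else 0.0
    return 0.5 + 0.03 * arch["num_layers"] + 0.0005 * arch["width"] + bonus


def random_search(num_trials: int, seed: int = 0):
    """Evaluate num_trials random architectures and keep the best one."""
    rng = random.Random(seed)
    candidates = [sample_architecture(rng) for _ in range(num_trials)]
    best = max(candidates, key=toy_score)
    return best, toy_score(best)


best_arch, best_score = random_search(num_trials=20)
print(best_arch, round(best_score, 3))
```

Any NAS method can then be compared against this baseline under the same evaluation budget.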
Paper 1: https://arxiv.org/abs/1902.08142
Paper 2: https://arxiv.org/abs/1902.07638
#NAS #nn #DL
ICCV 2019 papers
ICCV 2019 is one of the major tier-A conferences on computer vision. Here are the papers presented at the conference. We are definitely going to post short descriptions of the most influential ones, but if you don't want to wait, here is the link:
Link: http://openaccess.thecvf.com/ICCV2019.py
#CV #Papers
FUNIT: Few-Shot Unsupervised Image-to-Image Translation
A team of NVIDIA researchers has defined new AI techniques that give computers enough smarts to see a picture of one animal and recreate its expression and pose on the face of any other creature. The work is powered in part by generative adversarial networks (GANs), an emerging AI technique that pits one neural network against another.
Blog: https://blogs.nvidia.com/blog/2019/10/27/ai-gans-pets-ganimals/
Paper: https://arxiv.org/abs/1905.01723
Code: https://github.com/NVlabs/FUNIT
GANimal app: http://nvidia-research-mingyuliu.com/ganimal/
#CV #GAN #ICCV
YOLACT: Real-time Instance Segmentation
Fully-convolutional model for real-time instance segmentation that achieves 29.8 mAP on MS COCO at 33.5 fps evaluated on a single Titan Xp, which is significantly faster than any previous competitive approach. They obtain this result after training on only one GPU.
video: https://www.youtube.com/watch?v=0pMfmo8qfpQ
paper: https://arxiv.org/abs/1904.02689
code: https://github.com/dbolya/yolact
#yolo #instance_segmentation #segmentation #real_time
🎃Moscow Data Halloween on the 31st of October
It’s gonna be one of the most unusual data science meetups!
We will have several Black ML talks, Data Science PPT Karaoke from Hell, costume contest with prizes, lots of fun and afterparty.
Registration link: https://corp.mail.ru/ru/press/events/678/
NLP News: Deep Learning Indaba, EurNLP, ML echo chamber, Pretrained LMs, Reproducibility papers
The famous Sebastian Ruder (research scientist @ DeepMindAI) wrote an interesting article about the latest NLP news.
article: http://newsletter.ruder.io/issues/deep-learning-indaba-eurnlp-ml-echo-chamber-pretrained-lms-reproducibility-papers-199557
tweet: https://twitter.com/seb_ruder/status/1186567939232817153?s=20
#NLP #News #Conference
🏆 Moscow ML Trainings meetup on the 2nd of November
ML Trainings are based on Kaggle and other platform competitions and are held regularly with free attendance and a live stream. Winners and top-performing participants discuss competition tasks and share their solutions and results.
Program and the registration link - https://corp.mail.ru/ru/press/events/682/
Live stream link - https://youtu.be/VNsXzK4C7gg
* Note: this time all the talks will be in Russian. Usually, we have one talk in English. @mltrainings
ODS breakfast in Paris! See you this Saturday (2nd of November) at 10:30 at Malongo Café, 50 Rue Saint-André des Arts.
6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints
A deep learning approach to category-level 6D object pose tracking on RGB-D data. The method tracks, in real time, novel object instances of known object categories such as bowls, laptops, and mugs. 6-PACK learns to compactly represent an object by a handful of 3D keypoints, based on which the interframe motion of an object instance can be estimated through keypoint matching.
These keypoints are learned end-to-end without manual supervision to be most effective for tracking. Their experiments show that the method substantially outperforms existing methods on the NOCS category-level 6D pose estimation benchmark and supports a physical robot to perform simple vision-based closed-loop manipulation tasks.
preprint: https://arxiv.org/abs/1910.10750
code: https://github.com/j96w/6-PACK
tweet: https://twitter.com/RobobertoMM/status/1187617487837257733?s=20
video: https://www.youtube.com/watch?v=INBjNZsnfy4
#CV #DL #PatternRecognition
Keras Tuner
Fully-featured, scalable, easy-to-use hyperparameter tuning for Keras & beyond.
It supports RandomSearch, BayesianOptimization, and Hyperband. It can run locally or in a distributed setting. It's possible to have both multi-device single-model training (one machine training one model over 8 GPUs) and distributed search (many models in parallel) at the same time.
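Of the three search strategies, Hyperband is built around successive halving: start many configurations on a small budget, keep the better half, and double the budget for the survivors. The sketch below illustrates only that idea in plain Python. It is not Keras Tuner's API, and the toy loss function, the learning-rate range, and all names are assumptions.

```python
import math
import random


def toy_validation_loss(learning_rate: float, epochs: int) -> float:
    """Stand-in for training: loss is lowest near lr=0.01 and shrinks with epochs."""
    return abs(math.log10(learning_rate) + 2.0) + 1.0 / epochs


def successive_halving(num_configs: int = 16, min_epochs: int = 1, seed: int = 0) -> float:
    """Return the learning rate that survives repeated halving."""
    rng = random.Random(seed)
    # Sample learning rates log-uniformly between 1e-4 and 1.
    configs = [10 ** rng.uniform(-4, 0) for _ in range(num_configs)]
    epochs = min_epochs
    while len(configs) > 1:
        # Rank the surviving configs under the current budget and keep the best half.
        configs.sort(key=lambda lr: toy_validation_loss(lr, epochs))
        configs = configs[: max(1, len(configs) // 2)]
        epochs *= 2  # survivors get twice the budget in the next round
    return configs[0]


best_lr = successive_halving()
print(f"best learning rate: {best_lr:.4f}")
```

With a real model the ranking can change as the budget grows, which is the reason for re-evaluating survivors at each round instead of trusting the first cheap measurement.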
documentation: https://keras-team.github.io/keras-tuner/
tweet: https://twitter.com/fchollet/status/1189992078991708160?s=21
#DL #keras #Tuning #BayesianOptimization
🔥DeepMind’s AlphaStar beats top human players at strategy game StarCraft II
AlphaStar by Google’s DeepMind can now play StarCraft 2 so well that it places in the 99.8th percentile on the European server. In other words, it is way better than even great human players, achieving performance on par with the gods of StarCraft.
The solution basically combines reinforcement learning with a quality-diversity algorithm, which is similar to an evolutionary algorithm.
What’s difficult about StarCraft and how it differs from recent #Go and #Chess AI solutions: StarCraft is famous for its rock-paper-scissors-like, non-transitive game design, unlike chess and Go. Even finding a winning strategy is not enough to win, since the result depends on execution at different macro and micro levels and at different timescales.
How this applies to the real world: basically, it is like running logistics, manufacturing, and research with complex operations and different units.
Why this matters: it brings AI one step closer to running a real business.
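A tiny illustration of what non-transitive means, using plain rock-paper-scissors as a stand-in for StarCraft strategies (this is an analogy only, not AlphaStar's training code). Repeatedly switching to the best response against the previous move never settles on a single winner:

```python
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value


def payoff(a: str, b: str) -> int:
    """+1 if a beats b, -1 if b beats a, 0 for a draw."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1


def best_response(opponent_move: str) -> str:
    """The move that beats a known opponent move."""
    return max(MOVES, key=lambda move: payoff(move, opponent_move))


# Following best responses cycles instead of converging on one "best" move.
cycle = ["rock"]
for _ in range(3):
    cycle.append(best_response(cycle[-1]))
print(cycle)  # ['rock', 'paper', 'scissors', 'rock']
```

This is one intuition for why training against a diverse population of opponents can help in such games: a single fixed opponent always has a counter.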
Blog post: https://deepmind.com/blog/article/AlphaStar-Grandmaster-level-in-StarCraft-II-using-multi-agent-reinforcement-learning
Nature: https://www.nature.com/articles/d41586-019-03298-6
arXiv: https://arxiv.org/abs/1902.01724
Nontechnical video: https://www.youtube.com/watch?v=6eiErYh_FeY
#Google #GoogleAI #AlphaStar #Starcraft #Deepmind #nature #AlphaZero
SinGAN: Learning a Generative Model from a Single Natural Image
Best Paper Award at #ICCV2019. A generative model that learns from a single natural image and then generates random samples.
arXiv: https://arxiv.org/pdf/1905.01164v2.pdf
Github: https://github.com/tamarott/SinGAN
#GAN #ICCV #BestPaperAward
Matus Telgarsky’s Deep Learning Theory course
Course syllabus and lecture handouts from the University of Illinois.
Link: http://mjt.cs.illinois.edu/courses/dlt-f19/
#MOOC #DL #Theory #Course
Prescribed Generative Adversarial Networks
Adding noise to the generator's output prevents the mode collapse common in GANs and also allows approximate log-likelihood evaluation.
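A rough intuition for why the noise helps with likelihoods: if a sample is x = G(z) + Gaussian noise, then p(x) is an average of Gaussian densities centred at G(z), which can be estimated by sampling z. The sketch below uses a naive Monte Carlo average and a hypothetical one-dimensional linear "generator" so the result can be checked against a closed form. The paper's own estimator may differ, and every name here is an assumption.

```python
import math
import random


def toy_generator(z: float) -> float:
    """Hypothetical 1-D 'generator'; a real one would be a neural network."""
    return 2.0 * z + 1.0


def log_normal_pdf(x: float, mean: float, std: float) -> float:
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - 0.5 * ((x - mean) / std) ** 2


def estimate_log_likelihood(x: float, noise_std: float,
                            num_samples: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of log p(x) for x = G(z) + noise, with z ~ N(0, 1)."""
    rng = random.Random(seed)
    log_terms = [
        log_normal_pdf(x, toy_generator(rng.gauss(0.0, 1.0)), noise_std)
        for _ in range(num_samples)
    ]
    # log-mean-exp, computed stably by factoring out the largest term
    largest = max(log_terms)
    total = sum(math.exp(t - largest) for t in log_terms)
    return largest + math.log(total) - math.log(num_samples)


# For this linear toy generator the exact marginal is N(1, 2^2 + noise_std^2),
# which gives a reference value to compare the estimate against.
noise_std = 0.5
estimate = estimate_log_likelihood(1.5, noise_std)
exact = log_normal_pdf(1.5, 1.0, math.sqrt(4.0 + noise_std ** 2))
print(round(estimate, 3), round(exact, 3))
```

Without the added noise the generator's output distribution has no such density to average, which is why a plain GAN does not offer this kind of evaluation.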
#GAN
Link: https://arxiv.org/abs/1910.04302
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
A method for pre-training seq2seq models by denoising text.
BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.
They evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token.
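The two noising steps named above can be sketched in a few lines. The masking probability, the uniform span lengths, and the function names are arbitrary assumptions for illustration and do not reproduce the paper's exact sampling scheme.

```python
import random
from typing import List

MASK = "<mask>"


def shuffle_sentences(sentences: List[str], rng: random.Random) -> List[str]:
    """Return a copy of the sentences in random order."""
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    return shuffled


def infill_spans(tokens: List[str], rng: random.Random,
                 mask_prob: float = 0.15, max_span: int = 3) -> List[str]:
    """Replace randomly chosen token spans with a single mask token each."""
    noised = []
    i = 0
    while i < len(tokens):
        if rng.random() < mask_prob:
            span_length = rng.randint(1, max_span)
            noised.append(MASK)  # the whole span collapses to one mask token
            i += span_length
        else:
            noised.append(tokens[i])
            i += 1
    return noised


rng = random.Random(0)
sentences = [
    "BART corrupts the input text .",
    "Then it learns to reconstruct the original .",
    "Sentence order is shuffled as well .",
]
corrupted = [" ".join(infill_spans(s.split(), rng)) for s in shuffle_sentences(sentences, rng)]
print(corrupted)
```

Because a span of unknown length becomes one mask token, the model also has to infer how many tokens are missing, not only which ones.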
BART matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, Q&A, and summarization tasks, with gains of up to 6 ROUGE.
Paper: https://arxiv.org/abs/1910.13461
#nlp #bert
Function-Space Distributions over Kernels
A function-space approach to kernel learning helps to incorporate interpretable inductive biases, manage uncertainty, and discover rich representations of data.
arXiv: https://arxiv.org/abs/1910.13565
#gaussianprocess #NeurIPS #NeurIPS2019 #FKL #kernellearning
Forwarded from Spark in me (Alexander)
The current state of "DIY" ML hardware
(i.e. that you can actually assemble and maintain and use in a small team)
Wanted to write a large post, but decided to just write a TL;DR.
In case you need a super-computer / cluster / devbox with 4 - 16 GPUs.
The bad
- Nvidia DGX and similar - 3-5x overpriced (sic!)
- Cloud providers (Amazon) - 2-3x overpriced
The ugly
- Supermicro GPU server solutions. This server hardware is a bit overpriced, but its biggest problem is old processor sockets
- Custom shop built machines (with water) - very nice, but (except for water) you just pay US$5 - 10 - 15k for work you can do yourself in one day
- 2 CPU professional level motherboards - very cool, but powerful Intel Xeons are also very overpriced
The good
- Powerful AMD processor with 12-32 cores + top tier motherboard. This will support 4 GPUs at x8 speed and have a 10 Gb/s Ethernet port
- Just add more servers with 10 Gb/s connection and probably later connect them into a ring ... cheap / powerful / easy to maintain
More democratization soon?
Probably the following technologies will untie our hands
- Single slot GPUs - Zotac clearly thought about it, maybe it will become mainstream in the professional market
- PCIE 4.0 => enough speed for ML even on cheaper motherboards
- New motherboards for AMD processors => maybe more PCIE slots will become normal
- Intel Optane persistent memory => slow and expensive now, maybe RAM / SSD will merge (imagine having 2 TB of cheap RAM on your box)
There is a good chat in ODS on the same topic.
#hardware
Forwarded from Spark in me (Alexander)
Open STT v1.0 release
We have finally released Open STT v1.0 =)
Highlights
- 20 000 hours of annotated data
- 2 new large and diverse domains
- 12k speakers (to be released soon)
- Overall quality improvement
- See the posts and releases below for more details
+---------------+------+--------+------+
| Domain        | Utts | Hours  | GB   |
+---------------+------+--------+------+
| Radio         | 8.3M | 11,996 | 1367 |
+---------------+------+--------+------+
| Public Speech | 1.7M | 2,709  | 301  |
+---------------+------+--------+------+
| Youtube       | 2.6M | 2,117  | 346  |
+---------------+------+--------+------+
| Books         | 1.3M | 1,632  | 180  |
+---------------+------+--------+------+
| Calls         | 695K | 819    | 91   |
+---------------+------+--------+------+
| Other         | 1.9M | 835    | 95   |
+---------------+------+--------+------+
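As a quick sanity check on the "20 000 hours" headline, the per-domain figures can be summed. The dictionary below simply copies the Hours and GB columns from the table; the variable names are illustrative.

```python
# (hours, size in GB) per domain, copied from the table above
domains = {
    "Radio": (11996, 1367),
    "Public Speech": (2709, 301),
    "Youtube": (2117, 346),
    "Books": (1632, 180),
    "Calls": (819, 91),
    "Other": (835, 95),
}

total_hours = sum(hours for hours, _ in domains.values())
total_gb = sum(gb for _, gb in domains.values())
print(total_hours, total_gb)  # 20108 2380
```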
How can I help?
- Share our dataset
- Share / publish your dataset - the more domains the better
- Upvote on Habr
- Upvote on TDS (when released)
- We have an Open Collective page for donations
Links
- Open STT https://github.com/snakers4/open_stt
- Release https://github.com/snakers4/open_stt/releases
- Open TTS https://github.com/snakers4/open_tts
- Habr https://habr.com/ru/post/474462/
- Towards Data Science (coming soon)
- Blog https://spark-in.me/post/open-stt-release-v10
- Open collective https://opencollective.com/open_stt