Data Science by ODS.ai 🦜
46.1K subscribers
663 photos
77 videos
7 files
1.75K links
First Telegram Data Science channel. Covering technical and popular topics related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math, and applications of the former. To reach editors contact: @malev
Amazon’s SageMaker Object2Vec, a highly customizable algorithm that can learn embeddings of various types of high-dimensional objects.

Link: https://aws.amazon.com/ru/blogs/machine-learning/introduction-to-amazon-sagemaker-object2vec/

#Object2Vec #Amazon #Embeddings
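The underlying idea, learning embeddings so that related pairs score higher than unrelated ones, can be sketched in plain Python. The toy pairs, dimensionality and training loop below are purely illustrative and have nothing to do with SageMaker's actual implementation:

```python
import math, random

random.seed(0)

# Invented toy pairs: 1 = related, 0 = unrelated.
pairs = [("cat", "kitten", 1), ("dog", "puppy", 1),
         ("cat", "puppy", 0), ("dog", "kitten", 0),
         ("cat", "dog", 0), ("kitten", "puppy", 0)]
items = sorted({w for a, b, _ in pairs for w in (a, b)})
dim = 4
emb = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)] for w in items}

def score(a, b):
    """Dot product of the two embeddings, squashed to (0, 1)."""
    s = sum(x * y for x, y in zip(emb[a], emb[b]))
    return 1.0 / (1.0 + math.exp(-s))

# SGD on a logistic loss over the pair labels.
lr = 0.5
for epoch in range(200):
    for a, b, label in pairs:
        g = score(a, b) - label          # dLoss/d(dot product)
        ea, eb = emb[a][:], emb[b][:]    # copy before updating either side
        for i in range(dim):
            emb[a][i] -= lr * g * eb[i]
            emb[b][i] -= lr * g * ea[i]

print(round(score("cat", "kitten"), 2), round(score("cat", "puppy"), 2))
```

After training, related pairs ("cat", "kitten") score near 1 and unrelated pairs near 0; the learned vectors can then be reused as features downstream, which is the point of the embedding approach.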
Prototypical Clustering Networks for Dermatological Disease Diagnosis

Paper will be presented at the ML4D workshop at #NIPS2018

Link: https://arxiv.org/abs/1811.03066

#nn #bio #medical
Monitor Your PyTorch Models With Five Extra Lines of Code

Ever felt like manually managing your Visdom / TensorBoard server and logs is a pain across experiments, projects and teams?
Weights & Biases provides a simple cloud-based experiment logging and plotting system, with easy integration for PyTorch models.

Link: https://www.wandb.com/blog/monitor-your-pytorch-models-with-five-extra-lines-of-code

#pytorch
New paper on Lipschitz neural net architectures. Uses sorting as an activation function, with matrix-norm-constrained weights. Universal Lipschitz function approximation. Enforces adversarial robustness (a margin) using hinge loss.

Link: https://arxiv.org/abs/1811.05381

#nn #lipschitz
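The sorting activation itself (GroupSort; with groups of two it is also known as MaxMin) is simple enough to sketch in a few lines of plain Python:

```python
def groupsort(x, group_size=2):
    """Sorting activation: split the vector into groups and sort each group
    ascending. Since this is just a permutation within each group, it is
    1-Lipschitz and preserves the vector's norm, unlike ReLU, which
    discards the magnitude of negative pre-activations."""
    assert len(x) % group_size == 0
    out = []
    for i in range(0, len(x), group_size):
        out.extend(sorted(x[i:i + group_size]))
    return out

x = [3.0, -1.0, 0.5, 2.0]
print(groupsort(x))   # -> [-1.0, 3.0, 0.5, 2.0]
```

Norm preservation is what lets sorting-based layers approximate arbitrary Lipschitz functions where norm-constrained ReLU networks cannot.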
​​Neural network 3D visualization framework. Very nice in-depth visualizations.

Now you can actually see how the layers look.

Github: https://github.com/tensorspace-team/tensorspace
LiveDemo (!): https://tensorspace.org/html/playground/vgg16.html

#visualization #nn
​​Really interesting talk at MLconfSF by Franziska Bell on how #Uber uses NLP for customer experience. Most of what was described covers recent advances in their COTA platform.

Link: https://eng.uber.com/cota/
​​DeepMasterPrints: Generating MasterPrints for Dictionary Attacks via Latent Variable Evolution

Using GANs to generate master fingerprints that unlock 22-78% of phone fingerprint sensors (depending on the sensor's security level). It doesn't get much more "adversarial" than that.

This work can potentially be used to create a fingerprint that matches 22-78% of fingerprints in the wild: a skeleton key fitting many fingerprint-protected systems, from home alarms to phone locks.

ArXiV: https://arxiv.org/pdf/1705.07386.pdf

#GAN #security #fingerprint
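The search itself can be caricatured in plain Python: evolve a latent vector to maximize how many sensors it unlocks. Everything below (the "sensors", the threshold, and the simple (1+1) evolution strategy standing in for the paper's CMA-ES over a GAN latent space) is an invented toy:

```python
import random

random.seed(1)
DIM = 8

# Each toy "sensor" stores a template vector and accepts any print closer
# than a fixed threshold. The generator is replaced by the identity
# (the latent vector *is* the print) to keep the sketch tiny.
sensors = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(50)]
THRESHOLD = 3.5

def matches(print_vec):
    """How many sensors this single print unlocks."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
    return sum(dist(print_vec, t) < THRESHOLD for t in sensors)

# Latent variable evolution, reduced to a (1+1) evolution strategy:
# mutate the latent vector, keep the mutant if it unlocks at least as many sensors.
best = [random.gauss(0, 1) for _ in range(DIM)]
best_score = matches(best)
start_score = best_score
for _ in range(2000):
    cand = [v + random.gauss(0, 0.3) for v in best]
    s = matches(cand)
    if s >= best_score:
        best, best_score = cand, s

print(f"master print unlocks {best_score}/{len(sensors)} toy sensors")
```

Even this crude search drives a single "print" toward the region matching many templates at once, which is the essence of the dictionary-attack idea.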
Spotify announced its new Data Science Challenge

Spotify Sequential Skip Prediction Challenge is a part of #WSDM Cup 2019. The dataset comprises 130M Spotify listening sessions, and the task is to predict if a track is skipped. The challenge is live today, and runs until Jan 4.

Link: https://www.crowdai.org/challenges/spotify-sequential-skip-prediction-challenge

#kaggle #CompetitiveDataScience #Spotify
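To get a feel for the task, here is a deliberately simple baseline on invented sessions (not the challenge's real data or its official metric): predict the majority skip behaviour observed in the first half of each session:

```python
# Toy sessions: (observed first-half skip labels, true second-half skip labels).
sessions = [
    ([1, 1, 0, 1], [1, 1, 0, 1]),
    ([0, 0, 0, 1], [0, 0, 1, 0]),
    ([1, 0, 1, 1], [1, 1, 1, 0]),
]

def majority_baseline(first_half, n_predict):
    """Predict the same label for every remaining track in the session."""
    skip = sum(first_half) * 2 >= len(first_half)  # ties -> predict "skipped"
    return [int(skip)] * n_predict

correct = total = 0
for first, second in sessions:
    preds = majority_baseline(first, len(second))
    correct += sum(p == t for p, t in zip(preds, second))
    total += len(second)

print(f"baseline accuracy: {correct}/{total}")
```

Sequential models competing in the challenge must beat precisely this kind of per-session prior by exploiting track features and the order of listens.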
ImageNet/ResNet-50 training time dramatically reduced (6.6 min -> 224 sec)

ResNet-50 on ImageNet now (allegedly) down to 224 sec (3.7 min) using 2176 V100s. Ingredients: an increasing batch-size schedule, LARS, 5-epoch LR warmup, synchronized BN without moving averages, and mixed-precision fp16 training. Communication: "2D-Torus" all-reduce on NCCL2, over NVLink2 and 2 IB EDR interconnects.

1.28M images over 90 epochs at a 68K batch size, so the entire optimization takes only ~1700 updates to converge.
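Sanity-checking that update count (quick ceil-division, numbers from the post):

```python
# 1.28M ImageNet images, batch size 68K, 90 epochs.
images, batch, epochs = 1_280_000, 68_000, 90
updates_per_epoch = -(-images // batch)    # ceil division: 19 batches per epoch
total_updates = updates_per_epoch * epochs
print(total_updates)                       # 1710, i.e. ~1700 parameter updates
```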

ArXiV: https://arxiv.org/abs/1811.05233

#ImageNet #ResNet
Gradient Descent Provably Optimizes Over-parameterized Neural Networks

The paper shows that the loss of a two-layer neural network can be driven to zero in polynomial time using gradient descent.

ArXiV: https://arxiv.org/pdf/1810.02054.pdf

#nn #dl
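The claim is easy to see empirically. Below is a toy, pure-Python illustration: an over-parameterized two-layer ReLU network (trainable first layer, fixed random-sign output layer) driven to near-zero training loss by plain full-batch gradient descent. Sizes, learning rate and data are arbitrary toy choices, not the paper's setup:

```python
import math, random

random.seed(0)

n, d, m = 5, 3, 200          # samples, input dim, hidden width (m >> n)
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]

W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
a = [random.choice([-1.0, 1.0]) / math.sqrt(m) for _ in range(m)]  # fixed output layer

def predict(x):
    return sum(a[r] * max(0.0, sum(W[r][j] * x[j] for j in range(d)))
               for r in range(m))

def loss():
    return 0.5 * sum((predict(x) - t) ** 2 for x, t in zip(X, y))

init_loss = loss()
lr = 0.1
for _ in range(1500):
    grad = [[0.0] * d for _ in range(m)]       # full-batch gradient w.r.t. W
    for x, t in zip(X, y):
        err = predict(x) - t
        for r in range(m):
            pre = sum(W[r][j] * x[j] for j in range(d))
            if pre > 0:                        # ReLU gate
                for j in range(d):
                    grad[r][j] += err * a[r] * x[j]
    for r in range(m):
        for j in range(d):
            W[r][j] -= lr * grad[r][j]

print(f"training loss: {init_loss:.3f} -> {loss():.6f}")
```

With width well above the sample count, the activation patterns barely change during training and the dynamics stay close to a linear (NTK-like) regime, which is the intuition behind the proof.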
And the same for #ResNet, #RNN and feed-forward #nn without residual connections.

Gradient Descent Finds Global Minima of Deep Neural Networks
ArXiV: https://arxiv.org/pdf/1811.03804.pdf

On the Convergence Rate of Training Recurrent Neural Networks
ArXiV: https://arxiv.org/pdf/1810.12065.pdf

A Convergence Theory for Deep Learning via Over-Parameterization
ArXiV: https://arxiv.org/pdf/1811.03962.pdf

#dl
​​Interpolations between a pomeranian and a pomegranate.

#GAN #visualization
A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks

HMTL is a Hierarchical Multi-Task Learning model which combines a set of four carefully selected semantic tasks. The model achieves state-of-the-art results on Named Entity Recognition, Entity Mention Detection and Relation Extraction. Using SentEval, the authors show that, moving from the bottom to the top layers of the model, it tends to learn increasingly complex semantic representations.

ArXiV: https://arxiv.org/abs/1811.06031
Github: https://github.com/huggingface/hmtl

#SOTA #NLP #MultiTask
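The hierarchical wiring, with lower-level task outputs feeding higher-level task heads, can be sketched structurally. All functions below are toy stand-ins, not the huggingface code:

```python
def encode(tokens):
    """Shared bottom encoder: per-token lengths as fake features."""
    return [[float(len(t))] for t in tokens]

def ner_layer(encoded, tokens):
    """Low-level task head: toy rule tags capitalized tokens as entities."""
    return [1 if t[0].isupper() else 0 for t in tokens]

def relation_layer(encoded, ner_tags):
    """Higher-level head consumes BOTH the shared encoding and the lower
    task's output: toy rule says a relation exists if >= 2 entities."""
    return sum(ner_tags) >= 2

tokens = "Alice works at Acme".split()
enc = encode(tokens)
tags = ner_layer(enc, tokens)
has_relation = relation_layer(enc, tags)
print(tags, has_relation)
```

The design point is the supervision schedule: "simple" tasks train the lower layers, and harder tasks sitting on top reuse those intermediate predictions rather than learning everything from raw text.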
California wildfire #visualization

How weather conditions during California's fire season have evolved over time.
Nice paper from the #GoogleAI team on grading prostate cancer in prostatectomy specimens.

The model outperforms humans on the silver standard labels (panel of experts), but there is no clear winner for outcome prediction in the K-M plot/c-index.

Β«the mean accuracy among 29 general pathologists was 0.61. The DLS achieved an... accuracy of 0.70 (p=0.002) and trended towards better patient risk stratificationΒ»

Post: https://ai.googleblog.com/2018/11/improved-grading-of-prostate-cancer.html
ArXiV: https://arxiv.org/abs/1811.06497

#DL #medical #cancer
Difference between machine learning and AI:

If it is written in Python, it's probably machine learning

If it is written in PowerPoint, it's probably AI