Data Science by ODS.ai 🦜
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of the former. To reach editors contact: @malev
DeepPrivacy model for making people in photos unrecognizable (by humans)


arXiv: https://arxiv.org/pdf/1909.04538.pdf

#MaskRCNN #DeepPrivacy #CV #DL
Self-supervised QA from Facebook AI

Researchers from Facebook AI published a paper that explores unsupervised extractive question answering: training data is generated without supervision and then used to train a supervised question answering model. This approach achieves 56.41 F1 on the SQuAD v1 dataset.
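
For readers unfamiliar with the F1 figure quoted for SQuAD: it is a token-overlap score between the predicted answer and the reference answer. Below is a simplified sketch, assuming plain lowercasing and whitespace tokenization; the official evaluation script also strips punctuation and articles, and the function name `token_f1` is just an illustrative choice.

```python
from collections import Counter


def token_f1(prediction, truth):
    """Token-level F1 between a predicted answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = truth.lower().split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


exact = token_f1("Paris", "paris")
partial = token_f1("in Paris France", "Paris")
wrong = token_f1("London", "Paris")
```

An answer that contains the reference plus extra words is penalized on precision, so it gets partial credit rather than zero.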


Original paper: https://research.fb.com/wp-content/uploads/2019/07/Unsupervised-Question-Answering-by-Cloze-Translation.pdf?
Code for experiments: https://github.com/facebookresearch/UnsupervisedQA


#NLP #BERT #FacebookAI #SelfSupervised
Simple, Scalable Adaptation for Neural Machine Translation

Fine-tuning pre-trained Neural Machine Translation (NMT) models is the dominant approach for adapting to new languages and domains. However, fine-tuning requires adapting and maintaining a separate model for each target task. Researchers from Google propose a simple yet efficient approach for adaptation in #NMT. Their proposed approach consists of injecting tiny task-specific adapter layers into a pre-trained model. These lightweight adapters, with just a small fraction of the original model size, adapt the model to multiple individual tasks simultaneously.
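
To make the adapter idea more concrete, here is a rough pure-Python sketch of one adapter layer. The post only says that adapters are tiny task-specific layers injected into a frozen pre-trained model; the particular structure used here (layer norm, down-projection to a small bottleneck, ReLU, up-projection, residual connection), the zero initialization, the omission of biases, and all names and task keys are assumptions for illustration, not the paper's exact recipe.

```python
import math
import random


def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learned scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class Adapter:
    """Bottleneck adapter: h + W_up * relu(W_down * layer_norm(h))."""

    def __init__(self, d_model, d_bottleneck, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.d_bottleneck = d_bottleneck
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                       for _ in range(d_bottleneck)]
        # Assumption: zero-initialized up-projection, so a fresh adapter is the identity.
        self.w_up = [[0.0] * d_bottleneck for _ in range(d_model)]

    def __call__(self, h):
        z = [max(0.0, v) for v in matvec(self.w_down, layer_norm(h))]
        delta = matvec(self.w_up, z)
        return [hv + dv for hv, dv in zip(h, delta)]  # residual connection

    def num_params(self):
        return 2 * self.d_model * self.d_bottleneck


# One small adapter per task; the (hypothetical) base model stays shared and frozen.
adapters = {"en-fr/medical": Adapter(8, 2, seed=1), "en-de/news": Adapter(8, 2, seed=2)}
h = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 0.25, 1.0]
out = adapters["en-fr/medical"](h)
```

With this setup only the adapter weights would be trained per task, which is why the per-task cost stays at a small fraction of the full model.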

I guess this can be applied not only to #NMT but also to many other #NLP, #NLU and #NLG tasks.

Paper: https://arxiv.org/pdf/1909.08478.pdf

#BERT #NMT #FineTuning
Communication-based Evaluation for Natural Language Generation (#NLG) that dramatically outperforms standard n-gram-based methods.

Have you ever thought that n-gram overlap measures like #BLEU or #ROUGE are not good enough for #NLG evaluation, and that human evaluation is too expensive? Researchers from Stanford University think so too. The main shortcoming of #BLEU and #ROUGE is that they fail to take into account the communicative function of language; a speaker's goal is not only to produce well-formed expressions, but also to convey relevant information to a listener.

The researchers propose an approach based on a color reference game. In this game, a speaker and a listener see a set of three colors. The speaker is told one color is the target and tries to communicate the target to the listener using a natural language utterance. A good utterance is more likely to lead the listener to select the target, while a bad utterance is less likely to do so. In turn, effective metrics should assign high scores to good utterances and low scores to bad ones.
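
The game above can be sketched as a toy scoring function: an utterance is scored by how likely a listener is to pick the target color after hearing it. Everything in this sketch is an assumption for illustration (the tiny word-to-RGB lexicon, the distance-based softmax listener, the temperature value); the paper uses learned listener models, not this hand-written one.

```python
import math

# Hypothetical lexicon: color words mapped to prototype RGB values.
PROTOTYPES = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}


def listener_probs(utterance, colors, temperature=50.0):
    """Toy listener: softmax over negative distance from each color to the mentioned prototypes."""
    words = [w for w in utterance.lower().split() if w in PROTOTYPES]
    if not words:
        # An utterance with no known color word carries no information.
        return [1.0 / len(colors)] * len(colors)
    scores = [-min(math.dist(c, PROTOTYPES[w]) for w in words) / temperature
              for c in colors]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def communication_score(utterance, colors, target_index):
    """Score an utterance by the listener's probability of selecting the target."""
    return listener_probs(utterance, colors)[target_index]


colors = [(250, 10, 10), (10, 240, 20), (20, 20, 230)]
good = communication_score("the red one", colors, target_index=0)
bad = communication_score("the blue one", colors, target_index=0)
vague = communication_score("that one", colors, target_index=0)
```

Note that an n-gram metric would rate "the red one" and "the blue one" as nearly identical, while the communication-based score separates them.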

Paper: https://arxiv.org/pdf/1909.07290.pdf
Code: https://github.com/bnewm0609/comm-eval

#NLP #NLU
Can't agree more. Worst release of #python
#python 3.8 is released. The worst python release so far. :=

I hope that python4 will concentrate on removing useless stuff from the core, on performance, and on extending typing support.

Ideally, asyncio should be moved to a separate package, := should be undone. We all make mistakes.
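
For context, `:=` is the assignment expression ("walrus operator") added in Python 3.8 by PEP 572. A small sketch of what it changes, next to the pre-3.8 spelling; the function names are made up for this example.

```python
import re


def first_number(text):
    # Python 3.8+: bind the match object and test it in a single expression.
    if (m := re.search(r"\d+", text)) is not None:
        return int(m.group())
    return None


def first_number_classic(text):
    # Pre-3.8 equivalent: assignment and test are separate statements.
    m = re.search(r"\d+", text)
    if m is not None:
        return int(m.group())
    return None


with_walrus = first_number("python 3.8")
without_walrus = first_number_classic("python 3.8")
```

Both versions behave the same; whether the shorter form is worth a new operator is exactly what the post above argues about.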

https://docs.python.org/3/whatsnew/3.8.html
​​Generative Image Translation for Data Augmentation in Colorectal Histopathology Images

A #GAN that generates near-realistic #histology images, as judged in a Turing test with 4 pathologists. The generated images can be used to train #DL models for detecting rare histological patterns.

arXiv: https://arxiv.org/abs/1910.05827
Code: https://github.com/BMIRDS/HistoGAN

#CV #healthlearning #biolearning #medical
ODS breakfast in Paris! See you this Saturday (19th) at 10:30 at Malongo Café, 50 Rue Saint-André des Arts.
🎓 Reinforcement Learning Course from OpenAI

Reinforcement Learning is becoming a significant part of the data scientist's toolbox.
OpenAI created and published one of the best courses in #RL. The algorithm implementations are written in #Tensorflow.
But if you are more comfortable with #PyTorch, we have found a #PyTorch implementation of these algorithms.

OpenAI Course: https://spinningup.openai.com/en/latest/
Tensorflow Code: https://github.com/openai/spinningup
PyTorch Code: https://github.com/kashif/firedup

#MOOC #edu #course #OpenAI
Applying deep learning and Tensorflow to improve brain MRI images quality

Taking brain MRI images is a complicated procedure, as the orientation, location, and coverage need to be correct in all three spatial dimensions. The quality and consistency of positioning and orientation of the slices rely heavily on the skill and experience of the scan operator. This process can be time-consuming and difficult, especially for complex anatomies. As a result, there can be inconsistencies from scan operator to scan operator. This lack of consistency can make the radiologist's job of interpreting these images more difficult, especially when a patient is being scanned as a follow-up to a previous MRI exam and the radiologist is trying to identify subtle changes in anatomy or disease progression over time.

The researchers from the GE Healthcare Magnetic Resonance Imaging team developed an approach to aid the scan operator. The approach is based on 3 deep neural networks, can be adapted to take MRI images of other body parts, and achieves a 99.2% accuracy score. The researchers note that Tensorflow significantly helped them develop the approach and deliver it to production.

Medium article: https://medium.com/tensorflow/intelligent-scanning-using-deep-learning-for-mri-36dd620882c4
GE Healthcare website: https://www.gehealthcare.com

#Tensorflow #medicine #casestudy #DL #CV
Using open repositories to create an ageing mirror

@Genekogan on Twitter reported working on a prototype capable of #aging a person's image in real time, continuing the trend started by #FaceApp.

Github: https://github.com/genekogan/glow/
Client: https://github.com/genekogan/ofxRunway

#GAN #DL #CV #WIP
Applying deep learning to Airbnb search

The story of how the #Airbnb research team moved from #GBDT (gradient boosting) to #NN (neural networks) for search, with all the metrics and hypotheses.

Link: https://blog.acolyer.org/2019/10/09/applying-deep-learning-to-airbnb-search/
ODS breakfast in Paris! See you this Saturday (26th) at 10:30 at Malongo Café, 50 Rue Saint-André des Arts.
Efficient multi-lingual language model fine-tuning

Most of the world’s text is not in English. To enable researchers and practitioners to build impactful solutions in their domains, understanding how our NLP architectures fare in many languages needs to be more than an afterthought.
In this post, we introduce our latest paper that studies multilingual text classification and introduces #MultiFiT, a novel method based on #ULMFiT.

MultiFiT, trained on 100 labeled documents in the target language, outperforms multi-lingual BERT. It also outperforms the cutting-edge LASER algorithm, even though LASER requires a corpus of parallel texts and MultiFiT does not.

Post: http://nlp.fast.ai/classification/2019/09/10/multifit.html
Paper: https://arxiv.org/abs/1909.04761
Tweet: https://twitter.com/seb_ruder/status/1186744388908654597?s=20

#NLP #DL #FineTuning
Learning a unified embedding for visual search at #Pinterest

How Pinterest used a #multitask approach to create a single unified image embedding for different kinds of search, replacing three separate embeddings.

Link: https://blog.acolyer.org/2019/10/11/learning-a-unified-embedding-for-visual-search-at-pinterest/

#Search #CV #embeddings