Data Science by ODS.ai 🦜
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of the former. To reach editors contact: @malev
Communication-based Evaluation for Natural Language Generation (#NLG) that dramatically outperforms standard n-gram-based methods.

Have you ever thought that n-gram overlap measures like #BLEU or #ROUGE are not good enough for #NLG evaluation, and that human-based evaluation is too expensive? Researchers from Stanford University think so too. The main shortcoming of #BLEU and #ROUGE is that they fail to take into account the communicative function of language: a speaker's goal is not only to produce well-formed expressions, but also to convey relevant information to a listener.

The researchers propose an approach based on a color reference game. In this game, a speaker and a listener see a set of three colors. The speaker is told one color is the target and tries to communicate the target to the listener using a natural language utterance. A good utterance is more likely to lead the listener to select the target, while a bad utterance is less likely to do so. In turn, effective metrics should assign high scores to good utterances and low scores to bad ones.
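The game suggests a simple way to score utterances. Here is a toy, stdlib-only sketch of the idea (the similarity numbers and function names are ours, not the paper's actual listener model): the metric scores an utterance by how likely a listener is to pick the target color after hearing it.

```python
import math

def listener_probs(similarities):
    """Softmax over the candidate colors: how likely the listener
    is to pick each color given the utterance."""
    exps = [math.exp(s) for s in similarities]
    total = sum(exps)
    return [e / total for e in exps]

def communicative_score(similarities, target_idx):
    """Score of the utterance = probability that the listener
    selects the target color."""
    return listener_probs(similarities)[target_idx]

# An utterance that clearly describes color 0 gets a high score
# when color 0 is the target, and a low score otherwise.
good = communicative_score([2.0, 0.1, 0.1], target_idx=0)
bad = communicative_score([2.0, 0.1, 0.1], target_idx=1)
```

The key property is that the same utterance is rewarded or penalized depending on whether it actually helps the listener, which n-gram overlap cannot capture.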

Paper: https://arxiv.org/pdf/1909.07290.pdf
Code: https://github.com/bnewm0609/comm-eval

#NLP #NLU
Can't agree more. Worst release of #python
#python 3.8 is released. The worst Python release so far. :=

I hope that Python 4 will concentrate on removing useless stuff from the core, on performance, and on extending typing support.

Ideally, asyncio should be moved to a separate package, and := should be undone. We all make mistakes.
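For context, the contested := (the "walrus operator", new in 3.8) assigns a value inside an expression. A minimal example:

```python
# The walrus operator binds a name as part of an expression,
# so a value can be computed and tested in one place.
data = [1, 2, 3, 4, 5]
if (n := len(data)) > 3:
    print(f"list is long ({n} elements)")
```

Proponents argue it avoids computing `len(data)` twice; critics (like the post above) see it as one more way to do the same thing.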

https://docs.python.org/3/whatsnew/3.8.html
Generative Image Translation for Data Augmentation in Colorectal Histopathology Images

A #GAN that generates near-real #histology images, as judged in a Turing test with 4 pathologists. The results can be used for training #DL models to detect rare histological patterns.

ArXiV: https://arxiv.org/abs/1910.05827
Code: https://github.com/BMIRDS/HistoGAN

#CV #healthlearning #biolearning #medical
ODS breakfast in Paris! See you this Saturday (19th) at 10:30 at Malongo Café, 50 Rue Saint-André des Arts.
🎓 Reinforcement Learning Course from OpenAI

Reinforcement Learning is becoming a significant part of the data scientist's toolbox.
OpenAI created and published one of the best courses in #RL, with the algorithm implementations written in #Tensorflow.
If you are more comfortable with #PyTorch, we have found a #PyTorch implementation of these algorithms as well.

OpenAI Course: https://spinningup.openai.com/en/latest/
Tensorflow Code: https://github.com/openai/spinningup
PyTorch Code: https://github.com/kashif/firedup

#MOOC #edu #course #OpenAI
Applying deep learning and Tensorflow to improve brain MRI images quality

Taking brain MRI images is a complicated procedure, as the orientation, location, and coverage need to be correct in all three spatial dimensions. The quality and consistency of the positioning and orientation of the slices rely heavily on the skill and experience of the scan operator. This process can be time-consuming and difficult, especially for complex anatomies. As a result, there can be inconsistencies from one scan operator to another. This lack of consistency can make the radiologist's job of interpreting these images more difficult, especially when a patient is being scanned as a follow-up to a previous MRI exam and they are trying to identify subtle changes in anatomy or disease progression over time.

The researchers from the GE Healthcare Magnetic Resonance Imaging team developed an approach to aid the scan operator. The approach is based on 3 deep neural networks, can be adapted to MRI images of other body parts, and achieves a 99.2% accuracy score. The researchers note that Tensorflow significantly helped them develop the approach and deliver it to production.

Medium article: https://medium.com/tensorflow/intelligent-scanning-using-deep-learning-for-mri-36dd620882c4
GE Healthcare website: https://www.gehealthcare.com

#Tensorflow #medicine #casestudy #DL #CV
Using open repositories to create an ageing mirror

@Genekogan on Twitter reported working on a prototype capable of #aging a person's image in real time, developing the trend started by #FaceApp.

Github: https://github.com/genekogan/glow/
Client: https://github.com/genekogan/ofxRunway

#GAN #DL #CV #WIP
Applying deep learning to Airbnb search

The story of how the #Airbnb research team moved from #GBDT (gradient boosting) to #NN (neural networks) for search, with all the metrics and hypotheses.

Link: https://blog.acolyer.org/2019/10/09/applying-deep-learning-to-airbnb-search/
ODS breakfast in Paris! See you this Saturday (26th) at 10:30 at Malongo Café, 50 Rue Saint-André des Arts.
Efficient multi-lingual language model fine-tuning

Most of the world's text is not in English. To enable researchers and practitioners to build impactful solutions in their domains, understanding how our NLP architectures fare in many languages needs to be more than an afterthought.
In this post, we introduce our latest paper that studies multilingual text classification and introduces #MultiFiT, a novel method based on #ULMFiT.

MultiFiT, trained on 100 labeled documents in the target language, outperforms multi-lingual BERT. It also outperforms the cutting-edge LASER algorithm, even though LASER requires a corpus of parallel texts and MultiFiT does not.

Post: http://nlp.fast.ai/classification/2019/09/10/multifit.html
Paper: https://arxiv.org/abs/1909.04761
Tweet: https://twitter.com/seb_ruder/status/1186744388908654597?s=20

#NLP #DL #FineTuning
Learning a unified embedding for visual search at #Pinterest

How Pinterest used a #multitask approach to learn a single unified image embedding for search, instead of maintaining three separate specialized ones.
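A minimal stdlib-only sketch of the multi-task idea (toy numbers and function names, not Pinterest's actual model): one shared embedding function feeds several task-specific heads, so a single embedding serves every search surface.

```python
def shared_embed(image_features):
    """Stand-in for the shared trunk: produces one embedding
    (here, simply normalized features) used by all tasks."""
    total = sum(image_features)
    return [x / total for x in image_features]

def task_head(embedding, weights):
    """Each search task scores the *same* shared embedding
    with its own lightweight head."""
    return sum(e * w for e, w in zip(embedding, weights))

emb = shared_embed([1.0, 2.0, 1.0])
click_score = task_head(emb, [0.1, 0.8, 0.1])   # e.g. click task
shop_score = task_head(emb, [0.5, 0.2, 0.3])    # e.g. shopping task
```

Training all heads jointly against the shared trunk is what lets one embedding replace several specialized ones.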

Link: https://blog.acolyer.org/2019/10/11/learning-a-unified-embedding-for-visual-search-at-pinterest/

#Search #CV #embeddings
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)

The approach casts every language problem as a text-to-text task. For example, English-to-German translation: input "translate English to German: That is good.", target "Das ist gut."; or sentiment analysis: input "sentiment: This movie is terrible!", target "negative".
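The text-to-text framing above can be sketched in a few lines (the task prefixes come from the examples in the paper; the helper name is ours, not from the released code):

```python
def to_text_to_text(task_prefix, text):
    """Cast any NLP task as plain text in, plain text out:
    the task is identified by a natural-language prefix."""
    return f"{task_prefix}: {text}"

# Translation: the model is trained to emit "Das ist gut."
src = to_text_to_text("translate English to German", "That is good.")

# Sentiment: the model is trained to emit "negative".
sent = to_text_to_text("sentiment", "This movie is terrible!")
```

Because every task shares the same string-to-string interface, one model and one training objective cover translation, classification, summarization, and more.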

Transfer learning for NLP usually uses unlabeled data for pre-training, so they assembled the "Colossal Clean Crawled Corpus" (C4), ~750GB of cleaned text from Common Crawl.

The authors compared different architectural variants, including encoder-decoder models and language models, in various configurations and with various objectives. The encoder-decoder architecture performed best in their text-to-text setting.

More details in the author's Twitter thread: https://twitter.com/colinraffel/status/1187161460033458177?s=20

Paper: https://arxiv.org/abs/1910.10683
Code/models/data/etc: https://github.com/google-research/text-to-text-transfer-transformer

#NLP #DL #transformer
ICCV 2019 papers

ICCV 2019 is one of the major tier-A conferences on Computer Vision. These are the papers presented at the conference. We are definitely going to post short descriptions of the most influential ones, but if you don't want to wait, here is the link:

Link: http://openaccess.thecvf.com/ICCV2019.py

#CV #Papers