DensePose: Dense Human Pose Estimation In The Wild
Facebook AI Research presented a paper on dense pose estimation, which will help Facebook better understand the videos it processes.
NEW: DensePose-COCO, a large-scale ground-truth dataset with image-to-surface correspondences manually annotated on 50K COCO images.
Project website: http://densepose.org/
Arxiv: https://arxiv.org/abs/1802.00434
#facebook #fair #cvpr #cv #CNN #dataset
ModaNet: A Large-Scale Street Fashion Dataset with Polygon Annotations
The latest segmentation and detection approaches (DeepLabv3+, Faster R-CNN) applied to street fashion images. The arXiv paper describes both the networks and the dataset.
Arxiv link: https://arxiv.org/abs/1807.01394
Paperdoll dataset: http://vision.is.tohoku.ac.jp/~kyamagu/research/paperdoll/
#segmentation #dataset #fashion #sv
Hey, our fellow colleagues in the OpenDataScience community are labeling a meme dataset. You can help them with the annotation just by viewing memes in this bot: @MemezoidBot
#DataSet #labeling
27.23TB of research data in torrents! Includes datasets such as:
- Breast Cancer Cell Segmentation
- Liver Tumor Segmentation
- MRI Lesion Segmentation in Multiple Sclerosis
- Electron Microscopy, Hippocampus
- Digital Surface & Digital Terrain Model
And course recordings, including:
- Introduction to Computer Science [CS50x] [Harvard] [2018]
- Artificial Intelligence (edX)
- Richard Feynman's Lectures on Physics (The Messenger Lectures)
- [Coursera] Machine Learning (Stanford University) (ml)
- [Coursera] Natural Language Processing (Stanford University) (nlp)
- [Coursera] Neural Networks for Machine Learning (University of Toronto) (neuralnets)
http://academictorrents.com/
#course #torrent #dataset
#Google introduced Conceptual Captions, a new dataset and challenge for image captioning consisting of ~3.3 million image/caption pairs for the machine learning community to train and evaluate their own image captioning models.
Link: https://ai.googleblog.com/2018/09/conceptual-captions-new-dataset-and.html
#dataset
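Conceptual Captions is distributed as TSV files pairing each caption with an image URL (the images themselves are downloaded separately). A minimal parsing sketch, assuming that tab-separated caption/URL layout; the file name is up to you:

```python
import csv

def load_pairs(tsv_path):
    """Parse a Conceptual Captions-style TSV file into (caption, image_url) pairs.

    Assumes one caption and one image URL per line, tab-separated.
    Lines that do not have exactly two fields are skipped.
    """
    pairs = []
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) == 2:
                caption, url = row
                pairs.append((caption.strip(), url.strip()))
    return pairs
```

From here, a training pipeline would fetch each URL and drop pairs whose images are no longer available.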
And #Google also launched #DataSet Search. This is a big step for the DS community: it is now much easier to find interesting datasets.
https://toolbox.google.com/datasetsearch
Google announced the updated YouTube-8M dataset
The updated set now includes a subset with verified 5-second segment-level labels, along with the 3rd Large-Scale Video Understanding Challenge and Workshop at #ICCV19.
Link: https://ai.googleblog.com/2019/06/announcing-youtube-8m-segments-dataset.html
#Google #YouTube #CV #DL #Video #dataset
New dataset with adversarial examples
Natural Adversarial Examples are real-world, unmodified examples that consistently confuse classifiers. The new dataset contains 7,500 images, which the authors labeled by hand over several months.
ArXiV: https://arxiv.org/abs/1907.07174
Dataset and code: https://github.com/hendrycks/natural-adv-examples
#Dataset #Adversarial
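The released dataset (ImageNet-A) covers a 200-class subset of ImageNet, so a standard 1000-way classifier is typically evaluated by restricting its argmax to that subset. A sketch of that restriction, assuming logits are a plain list indexed by class:

```python
def restricted_argmax(logits, allowed_classes):
    """Argmax over a subset of class indices.

    ImageNet-A labels come from a subset of the ImageNet classes, so a
    full 1000-way classifier is scored by picking its highest-scoring
    class among only the allowed indices.
    """
    best = None
    for c in allowed_classes:
        if best is None or logits[c] > logits[best]:
            best = c
    return best
```

The exact index list for the subset ships with the dataset repo; here it is just an argument.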
The Open Images Dataset V4 by GoogleAI
#GoogleAI presents #OpenImagesV4, a #dataset of 9.2M images with unified annotations for:
- image #classification
- object #detection
- visual relationship detection
30.1M image-level labels for 19.8k concepts and 15.4M bounding boxes for 600 object classes.
paper: https://arxiv.org/abs/1811.00982v2
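The box annotations are distributed as CSV files with coordinates normalized to [0, 1] relative to the image size, so converting a box to pixel coordinates only needs the image dimensions. A minimal sketch of that conversion:

```python
def box_to_pixels(xmin, xmax, ymin, ymax, width, height):
    """Convert Open Images-style normalized [0, 1] box coordinates
    into pixel coordinates (left, top, right, bottom) for an image
    of the given width and height."""
    return (round(xmin * width), round(ymin * height),
            round(xmax * width), round(ymax * height))
```

Usage: for a 400x200 image, a box with XMin=0.25, XMax=0.75, YMin=0.0, YMax=0.5 maps to (100, 0, 300, 100).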
Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study
The authors use the NER task to analyze the generalization behavior of existing models from different perspectives. In-depth experiments diagnose the bottlenecks of existing neural NER models in terms of breakdown performance, annotation errors, dataset bias, and category relationships, and suggest directions for improvement.
The authors also release two datasets for future research: ReCoNLL and PLONER.
The main findings of the paper:
- the performance of existing models (including the state-of-the-art model) is heavily influenced by the degree to which test entities have been seen in the training set with the same label
- the proposed measure makes it possible to detect human annotation errors; once these errors are fixed, previous models can achieve new state-of-the-art results
- the authors introduce two measures to characterize data bias, and a cross-dataset generalization experiment shows that the performance of NER systems is influenced not only by whether a test entity has been seen in the training set but also by whether its context has been observed
- providing more training samples is not a guarantee of better results; a targeted increase in training samples is more effective
- the relationships between entity categories influence the difficulty of model learning, leading to hard test samples that are difficult to solve with common learning methods
Paper: https://arxiv.org/abs/2001.03844
Github: https://github.com/pfliu-nlp/Named-Entity-Recognition-NER-Papers
Website: http://pfliu.com/InterpretNER/
#nlp #generalization #NER #annotations #dataset
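The first finding above rests on a simple diagnostic: the fraction of test entities that occur in the training set with the same label. A minimal sketch of that measure (the (entity, label) pair representation is an assumption, standing in for the spans extracted from each dataset):

```python
def seen_entity_ratio(train_entities, test_entities):
    """Fraction of test (entity, label) pairs that also appear in the
    training set with the same label. A high ratio means the test set
    mostly rewards memorization of seen entities."""
    train = set(train_entities)
    if not test_entities:
        return 0.0
    seen = sum(1 for pair in test_entities if pair in train)
    return seen / len(test_entities)
```

Note that an entity seen with a *different* label (e.g. "Paris" as LOC in training but ORG at test time) counts as unseen under this measure.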
MLSUM: The Multilingual Summarization Corpus
The first large-scale MultiLingual SUMmarization dataset, comprising over 1.5M article/summary pairs in French, German, Russian, Spanish, and Turkish. It is complementary to the CNN/Daily Mail summarization dataset for English.
For each language, the authors selected an online newspaper covering 2010 to 2019 that met the following requirements:
1. It is a generalist newspaper: ensuring that a broad range of topics is represented for each language minimizes the risk of training topic-specific models, which would hinder comparative cross-lingual analyses.
2. It has a large number of articles in its public online archive.
3. It provides human-written highlights/summaries for the articles that can be extracted from the HTML code of the web page.
The paper also reviews other similar datasets.
paper: https://arxiv.org/abs/2004.14900
github: https://github.com/recitalAI/MLSUM
Instructions and code will be released soon.
#nlp #corpus #dataset #multilingual
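As a toy example of the kind of corpus statistic such article/summary pairs enable, here is a whitespace-token compression ratio, a basic descriptive measure for summarization corpora (illustrative only, not a metric from the paper):

```python
def compression_ratio(article, summary):
    """Ratio of summary length to article length in whitespace tokens.

    Lower values mean more aggressive compression; summarization
    corpora are often characterized by the distribution of this ratio.
    """
    article_tokens = article.split()
    summary_tokens = summary.split()
    if not article_tokens:
        return 0.0
    return len(summary_tokens) / len(article_tokens)
```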