Data Science by ODS.ai 🦜
46.2K subscribers
649 photos
75 videos
7 files
1.74K links
First Telegram Data Science channel. Covering all technical and popular staff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of former. To reach editors contact: @malev
加入频道
Albumentation – fast & flexible image augmentations

Image Augmentations is a powerful technique to improve model robustness and performance. There are many image augmentations libraries on the market: torchvision, imgaug, DALI, Augmentor, SOLT, etc.
In all of them, authors focussed on variety at the cost of speed, or the speed at the cost of flexibility.

Requirements for augmentations:
* Variety: they want to have a large set of standard and exotic augmentation for image classification, segmentation, and detection in one place.
* Performance: transforms should be as fast as possible.
* Flexibility: it should be easy to add new transforms or new types of transforms.
* Conciseness: all complexity of implementation should be hidden behind the API.

Albumentations were born out of necessity. The authors were actively participating in various Deep Learning competitions. To get to the top they needed something better than what was already available. All of them, independently, started working on more powerful augmentation pipelines. Later they merged their efforts and released the code in the form of the library.

To date Albumentations has more than 70 transforms and supports image classification, #segmentation, object and keypoint detection tasks.

The library was adopted by academics, Kaggle, and other communities.

ODS: #tool_albumentations
Link: https://albumentations.ai/
Github: https://github.com/albumentations-team/albumentations
Paper: https://www.mdpi.com/2078-2489/11/2/125

P.S. Following trend setup by #Catalyst team, we provide extensive description of project with the help of its creators.

#guestpost #augmentation #CV #DL #imageprocessing #ods #objectdetection #imageclassification #tool
​​Data Version Control
open-source version control system for ML projects

DVC is a new type of experiment management software that has been built on top of the existing engineering toolset particularly on a source code version control system (currently Git). DVC reduces the gap between existing tools and data science needs, allowing users to take advantage of experiment management software while reusing existing skills and intuition.

Key features:
[0] simple command line Git-like experience. It does not require installing and maintaining any databases. It does not depend on any proprietary online services
[1] management and versioning of datasets and ML models. Data is saved in S3, Google Cloud, Azure, Alibaba cloud, SSH server, HDFS, or even local HDD RAID
[2] makes projects reproducible and shareable; helping to answer questions about how a model was built
[3] helps manage experiments with Git tags/branches and metrics tracking

The main commands :feelsgoodmeme:
$ dvc add <name_file>
$ dvc run <name_file>
$ dvc [push/pull]


webpage: https://dvc.org
docs: https://dvc.org/doc
github: https://github.com/iterative/dvc
:ods: channel: #tool_dvc

#dvc #version #control #ml #projects #system #git