Data Science by ODS.ai 🦜
46.1K subscribers
663 photos
77 videos
7 files
1.75K links
First Telegram Data Science channel. Covering all technical and popular staff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of former. To reach editors contact: @malev
加入频道
​​HiPlot: High-dimensional interactive plots made easy

Interactive parameters' performance #visualization tool. This new Facebook AI's release enables researchers to more easily evaluate the influence of their hyperparameters, such as learning rate, regularizations, and architecture.

Link: https://ai.facebook.com/blog/hiplot-high-dimensional-interactive-plots-made-easy
Github: https://github.com/facebookresearch/hiplot
Demo: https://facebookresearch.github.io/hiplot/_static/demo/demo_basic_usage.html
Pip: pip install hiplot

#hyperopt #facebook #opensource
Data Science by ODS.ai 🦜
☺️526 responses collected thanks to you! Now we are looking for a volunteer to perform an #exploratory analysis of responses an publish it as a an example on github in a form of #jupyter notebook. If you are familiar with git, jupyter, basics of #exploratory…
Our channel audience data

On 9th of February we announced that we are going to share the results of the audience research with you. And here is the release. Please feel free to open issues, suggest improvements or corrections and submit pull requests.

Stay tuned for further releases, we are going to develop concept of Ultimate posts in the form of updated github repositories, containing all the best information, insights and materials on various topics.

Project github pages site: https://open-data-science.github.io/ods_channel_stats_eda/
Github: https://github.com/open-data-science/ods_channel_stats_eda
Non-verbous audience stats: https://open-data-science.github.io/ods_channel_stats_eda/research_eda_concise_version.html

#audience #eda #opensource #introspect
​​Ultimate post on where to start learning DS

Most common request we received through the years was to share insights and advices on how to start career in data science and to recommend decent cources. Apparently, using hashtag #wheretostart wasn't enough so we were sharing some general advices.

So we assembled a through guide on how to start learning machine learning and created another #ultimatepost (in a form of a github repo, so it will be keep updated and anyone can submit worthy piece of advice to it).

We welcome you to share your stories and advices on how to start rolling into data science, as well as to spread the link to the repo to those your friends who might benefit from it.

Link: Ultimate post

#entrylevel #beginner #junior #MOOC #learndatascience #courses #mlcourse #opensource
👍13😁21
🦜 Hi!

We are the first Telegram Data Science channel.


Channel was started as a collection of notable papers, news and releases shared for the members of Open Data Science (ODS) community. Through the years of just keeping the thing going we grew to an independent online Media supporting principles of Free and Open access to the information related to Data Science.


Ultimate Posts

* Where to start learning more about Data Science. https://github.com/open-data-science/ultimate_posts/tree/master/where_to_start
* @opendatascience channel audience research. https://github.com/open-data-science/ods_channel_stats_eda


Open Data Science

ODS.ai is an international community of people anyhow related to Data Science.

Website: https://ods.ai



Hashtags

Through the years we accumulated a big collection of materials, most of them accompanied by hashtags.

#deeplearning #DL — post about deep neural networks (> 1 layer)
#cv — posts related to Computer Vision. Pictures and videos
#nlp #nlu — Natural Language Processing and Natural Language Understanding. Texts and sequences
#audiolearning #speechrecognition — related to audio information processing
#ar — augmeneted reality related content
#rl — Reinforcement Learning (agents, bots and neural networks capable of playing games)
#gan #generation #generatinveart #neuralart — about neural artt and image generation
#transformer #vqgan #vae #bert #clip #StyleGAN2 #Unet #resnet #keras #Pytorch #GPT3 #GPT2 — related to special architectures or frameworks
#coding #CS — content related to software engineering sphere
#OpenAI #microsoft #Github #DeepMind #Yandex #Google #Facebook #huggingface — hashtags related to certain companies
#productionml #sota #recommendation #embeddings #selfdriving #dataset #opensource #analytics #statistics #attention #machine #translation #visualization


Chats

- Data Science Chat https://yangx.top/datascience_chat
- ODS Slack through invite form at website

ODS resources

* Main website: https://ods.ai
* ODS Community Telegram Channel (in Russian): @ods_ru
* ML trainings Telegram Channel: @mltrainings
* ODS Community Twitter: https://twitter.com/ods_ai

Feedback and Contacts

You are welcome to reach administration through telegram bot: @opendatasciencebot
👍56🔥157🥰2😁2🎉21👎1👏1
Hi, our friends @mike0sv and @agusch1n just open-sourced MLEM - a tool that helps you deploy your ML models as part of the DVC ecosystem

It’s a Python library + Command line tool.

TLDR:
📦 MLEM can package an ML model into a Docker image or a Python package, and deploy it to Heroku (we made them promise to add SageMaker, K8s and Seldon-core soon :parrot:).

⚙️ MLEM saves all model metadata to a human-readable text file: Python environment, model methods, model input & output data schema and more.

💅 MLEM helps you turn your Git repository into a Model Registry with features like ML model lifecycle management.

Read more in release blogpost: https://dvc.org/blog/MLEM-release
Also, check out the project: https://github.com/iterative/mlem
And the website: https://mlem.ai

Guys are happy to hear your feedback, discuss how this could be helpful for you, how MLEM compares to MLflow, etc.
Ask in the comments!

#mlops #opensource #deployment #dvc
🔥32👍13👎3🤔1
#opensource : RuLeanALBERT от Yandex Research
2.9B трансформер для русского, которая влезет в домашнюю ПеКарню ресерчера

Мало того, что это самая большая БЕРТ-подобная модель для русского языка, которая показывает крутые результаты в бенчмарках, так еще и с кодом для fine-tuning-а

GitHub

А в статье можете узнать, как обучалась эта модель (а-ля коллаборативное глубокое обучение) на фреймворке по децентрализованному обучению Hivemind
👍283
​​OBELISC: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

The OBELICS dataset is a game-changer in the world of machine learning and AI! Unlike existing closed-source datasets, OBELICS is a vast, open-source, web-scale dataset specially curated for training large multimodal models. Boasting 141 million web pages from Common Crawl, 353 million high-quality images, and an impressive 115 billion text tokens, OBELICS sets a new standard in the richness and diversity of training data.

But it's not just about the numbers; it's about results. To prove its mettle, models with 9 and 80 billion parameters were trained on OBELICS, showcasing competitive performance across various multimodal benchmarks. Named IDEFICS, these models outperformed or matched their closed-source counterparts, proving that OBELICS isn't just a theoretical concept—it's a practical, high-impact alternative.

Paper link: https://huggingface.co/papers/2306.16527
Model card link: https://huggingface.co/HuggingFaceM4/idefics-80b-instruct
Blogpost link: https://huggingface.co/blog/idefics

A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-obelisc

#deeplearning #cv #nlp #largelanguagemodel #opensource
👍8🔥32🥰1