Data Science by ODS.ai 🦜

HiPlot: High-dimensional interactive plots made easy

Interactive parameters' performance #visualization tool. This new Facebook AI's release enables researchers to more easily evaluate the influence of their hyperparameters, such as learning rate, regularizations, and architecture.

Link: https://ai.facebook.com/blog/hiplot-high-dimensional-interactive-plots-made-easy
Github: https://github.com/facebookresearch/hiplot
Demo: https://facebookresearch.github.io/hiplot/_static/demo/demo_basic_usage.html
Pip: pip install hiplot

#hyperopt #facebook #opensource

11.8K views17:18

🤡 15 😎 35

Data Science by ODS.ai 🦜

☺️526 responses collected thanks to you! Now we are looking for a volunteer to perform an #exploratory analysis of responses an publish it as a an example on github in a form of #jupyter notebook. If you are familiar with git, jupyter, basics of #exploratory…

Our channel audience data

On 9th of February we announced that we are going to share the results of the audience research with you. And here is the release. Please feel free to open issues, suggest improvements or corrections and submit pull requests.

Stay tuned for further releases, we are going to develop concept of Ultimate posts in the form of updated github repositories, containing all the best information, insights and materials on various topics.

Project github pages site: https://open-data-science.github.io/ods_channel_stats_eda/
Github: https://github.com/open-data-science/ods_channel_stats_eda
Non-verbous audience stats: https://open-data-science.github.io/ods_channel_stats_eda/research_eda_concise_version.html

#audience #eda #opensource #introspect

GitHub

GitHub - open-data-science/ods_channel_stats_eda: Public analysis of ODS Channel questionnaire statistics

Public analysis of ODS Channel questionnaire statistics - open-data-science/ods_channel_stats_eda

12.2K views17:15

Data Science by ODS.ai 🦜

Overview of Open Source projects growth metrics

Quantative analytics of top starred repositories.

Link: https://medium.com/runacapital/open-source-growth-benchmarks-and-the-20-fastest-growing-oss-startups-d3556a669fe6

#opensource #analytics #statistics #growth

16.5K views11:35

☺️ 12 ⭐️ 30

Data Science by ODS.ai 🦜

Ultimate post on where to start learning DS

Most common request we received through the years was to share insights and advices on how to start career in data science and to recommend decent cources. Apparently, using hashtag #wheretostart wasn't enough so we were sharing some general advices.

So we assembled a through guide on how to start learning machine learning and created another #ultimatepost (in a form of a github repo, so it will be keep updated and anyone can submit worthy piece of advice to it).

We welcome you to share your stories and advices on how to start rolling into data science, as well as to spread the link to the repo to those your friends who might benefit from it.

Link: Ultimate post

#entrylevel #beginner #junior #MOOC #learndatascience #courses #mlcourse #opensource

👍13😁2❤1

27.2K viewsedited 10:54

☺️ 121 ⭐️ 110

Data Science by ODS.ai 🦜

Open Software Packaging for Science

#opensource alternative to #conda.

Mamba (drop-in replacement) direct link: https://github.com/TheSnakePit/mamba
Link: https://medium.com/@QuantStack/open-software-packaging-for-science-61cecee7fc23

#python #packagemanagement

GitHub

GitHub - mamba-org/mamba: The Fast Cross-Platform Package Manager

The Fast Cross-Platform Package Manager. Contribute to mamba-org/mamba development by creating an account on GitHub.

👍1

17.9K views12:21

Data Science by ODS.ai 🦜

Hands on ML notebook series

Updated our ultimate post with a series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in python using Scikit-Learn and TensorFlow.

Link: https://github.com/ageron/handson-ml

#wheretostart #opensource #jupyter

GitHub

GitHub - ageron/handson-ml: ⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead. - ageron/handson-ml

18.7K viewsedited 18:45

Data Science by ODS.ai 🦜

🦜 Hi!

We are the first Telegram Data Science channel.

Channel was started as a collection of notable papers, news and releases shared for the members of Open Data Science (ODS) community. Through the years of just keeping the thing going we grew to an independent online Media supporting principles of Free and Open access to the information related to Data Science.

Ultimate Posts

* Where to start learning more about Data Science. https://github.com/open-data-science/ultimate_posts/tree/master/where_to_start
* @opendatascience channel audience research. https://github.com/open-data-science/ods_channel_stats_eda

Open Data Science

ODS.ai is an international community of people anyhow related to Data Science.

Website: https://ods.ai

Hashtags

Through the years we accumulated a big collection of materials, most of them accompanied by hashtags.

#deeplearning #DL — post about deep neural networks (> 1 layer)
#cv — posts related to Computer Vision. Pictures and videos
#nlp #nlu — Natural Language Processing and Natural Language Understanding. Texts and sequences
#audiolearning #speechrecognition — related to audio information processing
#ar — augmeneted reality related content
#rl — Reinforcement Learning (agents, bots and neural networks capable of playing games)
#gan #generation #generatinveart #neuralart — about neural artt and image generation
#transformer #vqgan #vae #bert #clip #StyleGAN2 #Unet #resnet #keras #Pytorch #GPT3 #GPT2 — related to special architectures or frameworks
#coding #CS — content related to software engineering sphere
#OpenAI #microsoft #Github #DeepMind #Yandex #Google #Facebook #huggingface — hashtags related to certain companies
#productionml #sota #recommendation #embeddings #selfdriving #dataset #opensource #analytics #statistics #attention #machine #translation #visualization

Chats

- Data Science Chat https://yangx.top/datascience_chat
- ODS Slack through invite form at website

ODS resources

* Main website: https://ods.ai
* ODS Community Telegram Channel (in Russian): @ods_ru
* ML trainings Telegram Channel: @mltrainings
* ODS Community Twitter: https://twitter.com/ods_ai

Feedback and Contacts

You are welcome to reach administration through telegram bot: @opendatasciencebot

GitHub

ultimate_posts/where_to_start at master · open-data-science/ultimate_posts

Ultimate posts for opendatascience telegram channel - open-data-science/ultimate_posts

👍56🔥15❤7🥰2😁2🎉2⚡1👎1👏1

30.8K viewsedited 11:15

Data Science by ODS.ai 🦜

Hi, our friends @mike0sv and @agusch1n just open-sourced MLEM - a tool that helps you deploy your ML models as part of the DVC ecosystem

It’s a Python library + Command line tool.

TLDR:
📦 MLEM can package an ML model into a Docker image or a Python package, and deploy it to Heroku (we made them promise to add SageMaker, K8s and Seldon-core soon :parrot:).

⚙️ MLEM saves all model metadata to a human-readable text file: Python environment, model methods, model input & output data schema and more.

💅 MLEM helps you turn your Git repository into a Model Registry with features like ML model lifecycle management.

Read more in release blogpost: https://dvc.org/blog/MLEM-release
Also, check out the project: https://github.com/iterative/mlem
And the website: https://mlem.ai

Guys are happy to hear your feedback, discuss how this could be helpful for you, how MLEM compares to MLflow, etc.
Ask in the comments!

#mlops #opensource #deployment #dvc

🔥32👍13👎3🤔1

24K views16:00

Data Science by ODS.ai 🦜

Forwarded from DataGym Channel [Power of data]

#opensource : RuLeanALBERT от Yandex Research
2.9B трансформер для русского, которая влезет в домашнюю ПеКарню ресерчера

Мало того, что это самая большая БЕРТ-подобная модель для русского языка, которая показывает крутые результаты в бенчмарках, так еще и с кодом для fine-tuning-а

GitHub

А в статье можете узнать, как обучалась эта модель (а-ля коллаборативное глубокое обучение) на фреймворке по децентрализованному обучению Hivemind

GitHub

GitHub - yandex-research/RuLeanALBERT: RuLeanALBERT is a pretrained masked language model for the Russian language that uses a…

RuLeanALBERT is a pretrained masked language model for the Russian language that uses a memory-efficient architecture. - yandex-research/RuLeanALBERT

👍28❤3

19.7K views10:14

Data Science by ODS.ai 🦜

OBELISC: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

The OBELICS dataset is a game-changer in the world of machine learning and AI! Unlike existing closed-source datasets, OBELICS is a vast, open-source, web-scale dataset specially curated for training large multimodal models. Boasting 141 million web pages from Common Crawl, 353 million high-quality images, and an impressive 115 billion text tokens, OBELICS sets a new standard in the richness and diversity of training data.

But it's not just about the numbers; it's about results. To prove its mettle, models with 9 and 80 billion parameters were trained on OBELICS, showcasing competitive performance across various multimodal benchmarks. Named IDEFICS, these models outperformed or matched their closed-source counterparts, proving that OBELICS isn't just a theoretical concept—it's a practical, high-impact alternative.

Paper link: https://huggingface.co/papers/2306.16527
Model card link: https://huggingface.co/HuggingFaceM4/idefics-80b-instruct
Blogpost link: https://huggingface.co/blog/idefics

A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-obelisc

#deeplearning #cv #nlp #largelanguagemodel #opensource

👍8🔥3❤2🥰1

13.5K views04:31

About

Blog

Apps

Platform