Data Science by ODS.ai 🦜

AutoFlip: An Open Source Framework for Intelligent Video Reframing

Google released a tool for smart video cropping. Video cropping doesn't seem like a problem until you realize that the object that should be in focus can be in different parts of the picture. Now there is a solid attempt to provide a one-click solution for cropping.

Interesting part: #AutoFlip is an application of the #MediaPipe framework for building multimodal ML #pipelines.

Github: https://github.com/google/mediapipe/blob/master/mediapipe/docs/autoflip.md
MediaPipe: https://github.com/google/mediapipe/

#Google #GoogleAI #DL #CV

11.3K views, edited 05:19

📹 26 🙄 11

Data Science by ODS.ai 🦜

Racial Disparities in Automated Speech Recognition

Unsurprisingly, speech recognition tools show #bias due to the lack of diversity in the datasets. A group of researchers addressed the issue and published their results as a paper and a #reproducible research repo.

Project link: https://fairspeech.stanford.edu
Paper: https://www.pnas.org/cgi/doi/10.1073/pnas.1915768117
Github: https://github.com/stanford-policylab/asr-disparities

#speechrecognition #voice #audiolearning #dl #microsoft #google #apple #ibm #amazon

9.56K views, 12:32

🙂 9 😧 13

Data Science by ODS.ai 🦜

Lo-Fi Player

The team behind the Magenta project, which does research on deep learning and music powered by TensorFlow at Google, has released a new fun project, lofi-player, powered by their open-source library magenta.js.

It's basically a lo-fi music generator, lo-fi being a popular genre on YouTube streams and elsewhere. You can customize the vibe to your liking, from sad to moody, slow to fast, etc.

It is based on their earlier work: MusicVAE to sample the latent space of music and MelodyRNN to generate music sequences for different instruments. The project is not about new research, but about showing what can be done with an existing library in a creative way.

They also created a YouTube stream where you can listen to lo-fi generated by the application, and users in the chat can tune the lo-fi player together during the stream :)

#magenta #lo-fi #music #google #tensorflow #fun

Lo-Fi Player

Interactive lofi beat player.

16.2K views, edited 13:33

🎸 65 🎼 28

Data Science by ODS.ai 🦜

Waymo started driverless tests in Phoenix

This #Google company plans to expand the tests to cover the whole state later.

Blog: https://blog.waymo.com/2020/10/waymo-is-opening-its-fully-driverless.html
Redditors' experience: https://www.reddit.com/r/waymo/comments/j7rphd/4_minute_full_video_in_waymo_one_no_driver_short/

#autonomousrobots #selfdriving #rl #DL

15K views, 09:26

🚙 29 🤖 26

Data Science by ODS.ai 🦜

Introducing Model Search: An Open Source Platform for Finding Optimal ML Models

#Google has released an open source #AutoML framework capable of hyperparameter tuning and ensembling.

Blog post: https://ai.googleblog.com/2021/02/introducing-model-search-open-source.html
Repo: https://github.com/google/model_search

👍1

16.1K views, 12:06

Data Science by ODS.ai 🦜

🦜 Hi!

We are the first Telegram Data Science channel.

The channel was started as a collection of notable papers, news and releases shared with the members of the Open Data Science (ODS) community. Over the years of simply keeping it going, we grew into an independent online media outlet supporting the principles of free and open access to information related to Data Science.

Ultimate Posts

* Where to start learning more about Data Science. https://github.com/open-data-science/ultimate_posts/tree/master/where_to_start
* @opendatascience channel audience research. https://github.com/open-data-science/ods_channel_stats_eda

Open Data Science

ODS.ai is an international community of people connected in one way or another to Data Science.

Website: https://ods.ai

Hashtags

Over the years we have accumulated a big collection of materials, most of them accompanied by hashtags.

#deeplearning #DL: posts about deep neural networks (> 1 layer)
#cv: posts related to Computer Vision. Pictures and videos
#nlp #nlu: Natural Language Processing and Natural Language Understanding. Texts and sequences
#audiolearning #speechrecognition: posts related to audio information processing
#ar: augmented reality related content
#rl: Reinforcement Learning (agents, bots and neural networks capable of playing games)
#gan #generation #generatinveart #neuralart: posts about neural art and image generation
#transformer #vqgan #vae #bert #clip #StyleGAN2 #Unet #resnet #keras #Pytorch #GPT3 #GPT2: posts related to specific architectures or frameworks
#coding #CS: content related to software engineering
#OpenAI #microsoft #Github #DeepMind #Yandex #Google #Facebook #huggingface: hashtags related to specific companies
#productionml #sota #recommendation #embeddings #selfdriving #dataset #opensource #analytics #statistics #attention #machine #translation #visualization

Chats

- Data Science Chat https://yangx.top/datascience_chat
- ODS Slack, via the invite form on the website

ODS resources

* Main website: https://ods.ai
* ODS Community Telegram Channel (in Russian): @ods_ru
* ML trainings Telegram Channel: @mltrainings
* ODS Community Twitter: https://twitter.com/ods_ai

Feedback and Contacts

You are welcome to reach the administration through the Telegram bot: @opendatasciencebot

👍56🔥15❤7🥰2😁2🎉2⚡1👎1👏1

30.8K views, edited 11:15

Data Science by ODS.ai 🦜

Imagen — new neural network for picture generation from Google

TLDR: A competitor to DALL-E was released.

Imagen is a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. #Google's key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model.

Website: https://imagen.research.google

#GAN #CV #DL #Dalle

🔥38👍23🤯3❤2😱1

24.3K views, 10:57

Data Science by ODS.ai 🦜

Forwarded from Machinelearning

⚡️ Gemma 3 QAT

Google DeepMind has released updated versions of its Gemma 3 language models that are significantly more memory-efficient without a substantial loss in performance.

Key technology: QAT (Quantization-Aware Training)

What is it? QAT is a training technique in which the model, during fine-tuning, "learns" to work with reduced numerical precision (using fewer bits to represent numbers). This simulates the conditions under which the model will run after quantization (compression).

Ordinary post-training quantization can lead to a drop in accuracy. QAT lets the model adapt in advance to running in a low-precision mode, minimizing the quality loss after the final quantization.

Each model (1B, 4B, 12B, 27B) was fine-tuned for roughly 5,000 steps with simulated low-bit weights. A technique similar to knowledge distillation was used: the original unquantized model acted as the "teacher".
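To make the idea concrete, here is a minimal sketch of the "fake quantization" trick that QAT relies on, written in plain Python. It is an illustration under stated assumptions, not Google's actual training recipe: the function names `fake_quantize` and `qat_step`, the symmetric per-tensor scaling, the toy one-layer model and the learning rate are all hypothetical. The forward pass uses weights rounded to a 4-bit grid, while the update is applied to the full-precision weights (the straight-through estimator).

```python
def fake_quantize(weights, bits=4):
    """Round weights to a signed integer grid, then map them back to floats.

    Assumption: symmetric per-tensor scaling; real schemes often use
    per-block or per-channel scales.
    """
    qmax = 2 ** (bits - 1) - 1       # 7 for 4 bits
    qmin = -(2 ** (bits - 1))        # -8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    result = []
    for w in weights:
        q = round(w / scale)         # nearest integer level
        q = max(qmin, min(qmax, q))  # clamp to the representable range
        result.append(q * scale)     # back to float ("dequantize")
    return result


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def qat_step(weights, inputs, target, lr=0.05, bits=4):
    """One gradient step on squared error for a toy linear model.

    The prediction is computed with fake-quantized weights, but the
    gradient is applied to the full-precision weights, treating the
    rounding as identity in the backward pass (straight-through estimator).
    """
    w_q = fake_quantize(weights, bits)
    error = dot(w_q, inputs) - target
    grads = [2 * error * x for x in inputs]
    return [w - lr * g for w, g in zip(weights, grads)]
```

In a distillation-like setup such as the one described above, `target` would be the output of the original unquantized model on the same input.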

The benefit of the QAT approach for Gemma 3 turned out to be huge. It is officially stated that the quantized Gemma 3 QAT models retain almost all of their quality while requiring about 3 times less memory.

For example, the memory needed to store the weights of the largest model, with 27B parameters, dropped from ~54 GB (in bfloat16) to ~14 GB in a 4-bit integer format, a memory saving of roughly 3-4x.
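These figures follow from simple arithmetic: memory for weights is roughly the parameter count times the number of bits per weight. A quick check, assuming exactly 16 and 4 bits per weight and ignoring quantization scales, activations and the KV cache (the helper name `weight_memory_gb` is hypothetical):

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


bf16_gb = weight_memory_gb(27e9, 16)  # bfloat16: 2 bytes per weight
int4_gb = weight_memory_gb(27e9, 4)   # 4-bit: half a byte per weight

print(f"bfloat16: {bf16_gb:.1f} GB")          # bfloat16: 54.0 GB
print(f"4-bit:    {int4_gb:.1f} GB")          # 4-bit:    13.5 GB
print(f"ratio:    {bf16_gb / int4_gb:.1f}x")  # ratio:    4.0x
```

The idealized 13.5 GB is slightly below the quoted ~14 GB; real 4-bit formats also store scaling factors alongside the weights, which adds some overhead.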

ollama run hf.co/google/gemma-3-4b-it-qat-q4_0-gguf

✔️HF

@ai_machinelearning_big_data

#google #gemma #AI #ML #LLM #Quantization

👍5🔥5❤1🥰1

2.9K views, 06:31

Data Science by ODS.ai 🦜

Forwarded from Machinelearning

🖥 It's official: Google has released Gemini CLI, an AI agent for working in the terminal

• A lightweight and powerful tool for command-line development
• Runs on Gemini 2.5 Pro
• The agent's code is open source (Apache 2.0)
• Supports a context of 1 million tokens
• Free tier: up to 60 requests per minute and 1,000 per day
• Grounding with Google Search
• MCP support
• VS Code integration (Gemini Code Assist)

Run from the command line: npx https://github.com/google-gemini/gemini-cli

Announcement: https://blog.google/technology/developers/introducing-gemini-cli-open-source-ai-agent/

Github: https://github.com/google-gemini/gemini-cli/

@ai_machinelearning_big_data

#AI #ML #agent #Google

🔥8❤7👍2

2.32K views, 14:31
