On Artificial Intelligence

Solving Rubik’s Cube with a Robot Hand

This is fascinating; make sure you read it.

Summary: The OpenAI team trained a pair of neural networks to solve the Rubik’s Cube with a human-like robot hand. The networks are trained entirely in simulation, using the same reinforcement learning code as OpenAI Five paired with a new technique called Automatic Domain Randomization (ADR). The system can handle situations it never saw during training, such as being prodded by a stuffed giraffe. This shows that reinforcement learning isn’t just a tool for virtual tasks, but can also solve physical-world problems requiring unprecedented dexterity.

https://openai.com/blog/solving-rubiks-cube/
#reinforcement_learning #machine_learning #robotics
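
The summary above names Automatic Domain Randomization without showing what it does. As a rough illustration only: the idea, as we understand it from the linked post, is that training starts in a single non-randomized simulated environment and the amount of randomization grows as the policy gets better. The sketch below is a heavily simplified, single-parameter version of that idea. The class name, the thresholds, and the step size are all assumptions made for illustration and are not taken from OpenAI's code.

```python
import random


class RandomizationRange:
    """Sampling range for one simulator parameter, e.g. a cube size multiplier.

    Hypothetical helper: names, thresholds and step sizes are illustrative.
    """

    def __init__(self, nominal, step, max_half_width):
        self.nominal = nominal
        self.step = step
        self.max_half_width = max_half_width
        self.half_width = 0.0  # start with no randomization at all

    def sample(self, rng):
        """Draw a parameter value for the next simulated episode."""
        return rng.uniform(self.nominal - self.half_width,
                           self.nominal + self.half_width)

    def update(self, success_rate, expand_at=0.8, shrink_at=0.4):
        """Widen the range when the policy does well, narrow it when it struggles."""
        if success_rate >= expand_at:
            self.half_width = min(self.half_width + self.step, self.max_half_width)
        elif success_rate < shrink_at:
            self.half_width = max(self.half_width - self.step, 0.0)
        return self.half_width
```

A training loop would call `sample` before each episode and `update` after measuring the recent success rate, so the environments get harder only as fast as the policy can keep up.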

Proximal Policy Optimization

Blog post:
https://openai.com/blog/openai-baselines-ppo/

YouTube Video:
https://www.youtube.com/watch?v=5P7I-xPq8u8&list=PLLO4N3-FoY3feUsA3_XZvn5sXy9Ms8ayE&index=2
#reinforcement_learning #optimization

We’re releasing a new class of reinforcement learning algorithms, Proximal Policy Optimization (PPO), which perform comparably or better than state-of-the-art approaches while being much simpler to implement and tune. PPO has become the default reinforcement…
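
For readers who want to see why PPO is considered simple to implement, here is a minimal sketch of the clipped surrogate objective that the PPO paper describes. It works on plain Python lists rather than tensors, and the function name is our own; `epsilon=0.2` is a commonly used default, not something stated in the post above.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch of samples.

    ratios: new_policy_prob / old_policy_prob for each sampled action.
    advantages: advantage estimate for each sampled action.
    """
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and equal in length")
    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Keep the probability ratio inside [1 - epsilon, 1 + epsilon].
        clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the clipped and unclipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)
```

Training maximizes this value. Because of the clipping, a single update gains nothing from pushing the new policy far away from the one that collected the data, which is a large part of what keeps the method stable.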

PyTorch tutorials for various RL algorithms:

actor-critic / proximal policy optimization (PPO) / ACER / DDPG / twin delayed DDPG (TD3) / soft actor-critic / generative adversarial imitation learning (GAIL) / hindsight experience replay (HER)

https://github.com/higgsfield/RL-Adventure-2
#reinforcement_learning #pytorch

Distill and Transfer Learning for Robust Multitask Reinforcement Learning

"Most deep reinforcement learning algorithms are data inefficient in complex and rich environments, limiting their applicability to many scenarios. One direction for improving data efficiency is multitask learning with shared neural network parameters, where efficiency may be improved through transfer across related tasks. In practice, however, this is not usually observed, because gradients from different tasks can interfere negatively, making learning unstable and sometimes even less data efficient. Another issue is the different reward schemes between tasks, which can easily lead to one task dominating the learning of a shared model. We propose a new approach for joint training of multiple tasks, which we refer to as Distral (DIStill & TRAnsfer Learning). Instead of sharing parameters between the different workers, we propose to share a distilled policy that captures common behavior across tasks. Each worker is trained to solve its own task while constrained to stay close to the shared policy, while the shared policy is trained by distillation to be the centroid of all task policies. Both aspects of the learning process are derived by optimizing a joint objective function. We show that our approach supports efficient transfer on complex 3D environments, outperforming several related methods. Moreover, the proposed learning process is more robust and more stable---attributes that are critical in deep reinforcement learning."

https://www.youtube.com/watch?v=scf7Przmh7c
#reinforcement_learning #multi_task_learning #transfer_learning
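
The two ingredients the abstract describes, task policies kept close to a shared policy and a shared policy distilled as their centroid, can be illustrated for a single state with a handful of discrete actions. This is a toy sketch, not the paper's full objective (which also involves entropy terms and discounted returns). The function names and `kl_weight` are assumptions made for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distill_shared_policy(task_policies):
    """Centroid of the task policies.

    Minimizing the summed KL(task || shared) over the shared policy
    gives the per-action average of the task policies.
    """
    n_tasks = len(task_policies)
    n_actions = len(task_policies[0])
    return [sum(policy[a] for policy in task_policies) / n_tasks
            for a in range(n_actions)]


def regularized_objective(expected_reward, task_policy, shared_policy, kl_weight=0.5):
    """Task reward minus a penalty for drifting away from the shared policy."""
    return expected_reward - kl_weight * kl_divergence(task_policy, shared_policy)
```

For example, `distill_shared_policy([[0.9, 0.1], [0.1, 0.9]])` averages two opposed task policies into a uniform shared policy, and each task then pays a KL penalty in proportion to how far it specializes away from that average.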

A fruitful relationship between neuroscience and AI

https://deepmind.com/blog/article/Dopamine-and-temporal-difference-learning-A-fruitful-relationship-between-neuroscience-and-AI
#reinforcement_learning #machine_learning #neuroscience #artificial_intelligence

Dopamine and temporal difference learning: A fruitful relationship between neuroscience and AI

Learning and motivation are driven by internal and external rewards. Many of our day-to-day behaviours are guided by predicting, or anticipating, whether a given action will result in a positive...
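
To make "temporal difference learning" concrete, here is a minimal sketch of the standard tabular TD(0) update. The TD error it returns is the reward prediction error, the quantity usually compared with dopamine signals in this line of work. The state names, `alpha`, and `gamma` are illustrative assumptions, not values from the article.

```python
def td_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update in place and return the TD error.

    values: dict mapping each state to its current value estimate.
    alpha: learning rate; gamma: discount factor.
    """
    # Difference between what happened and what was predicted.
    td_error = reward + gamma * values[next_state] - values[state]
    # Move the prediction a small step toward the observed outcome.
    values[state] += alpha * td_error
    return td_error
```

With `values = {"cue": 0.0, "end": 0.0}`, repeated calls to `td_update(values, "cue", 1.0, "end")` return a shrinking error as the reward becomes predicted: the surprise fades once the estimate for "cue" approaches the reward.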
