Reinforcement Learning Course

Deep Learning with Yacine on MSN

DeepSeek R1 Explained: GRPO, Reinforcement Learning & SFT

Dive into DeepSeek R1 and explore GRPO, reinforcement learning, and supervised fine-tuning (SFT) in an easy-to-understand way ...

The post-training revolution: How reinforcement learning is upending the AI infra stack

TechCrunch was proud to host Scale Venture Partners at Disrupt 2025 in San Francisco. Here’s an overview of their AI Stage session. The reinforcement learning market has exploded, with enterprises ...

The Robot Report

AgiBot deploys its Real-World Reinforcement Learning system

AgiBot said its Real-World Reinforcement Learning system lets robots learn new skills in minutes on a pilot production line.

Unite.AI

How RL-as-a-Service is Unleashing a New Wave of Autonomy

Reinforcement learning has long been one of artificial intelligence's most promising yet underexplored fields. This is the technology behind the most incredible AI achievements, from algorithms ...

TMCnet

Cognizant's AI Lab Announces Breakthrough Research for Fine-Tuning LLMs and Records its 61st U.S. Patent Issuance

Cognizant (Nasdaq: CTSH) today announced a breakthrough from its AI Lab that introduces a novel, efficiency-focused method ...

Healthcare IT News

NTU leads app-based psychological first aid training in Singapore

Featuring AI-powered role-play simulations, the app allows learners to practise recognising distress and offering empathetic ...

Unite.AI

The End of Tabula Rasa: How Pre-Trained World Models are Redefining Reinforcement Learning

For a long time, the core idea in reinforcement learning (RL) was that AI agents should learn every new task from scratch, like a blank slate. This "tabula rasa" approach led to amazing achievements, ...

11d

Inside Ring-1T: Ant engineers solve reinforcement learning bottlenecks at trillion scale

Ant Group, an affiliate of Alibaba, released Ring-1T, which it says is the first trillion-parameter open-source model.

IEEE

DRLLog: Deep Reinforcement Learning for Online Log Anomaly Detection

Abstract: System logs record the system’s status and application behavior, providing support for various system management and diagnostic tasks. However, existing methods for log anomaly detection ...

IEEE

Hierarchical Safe Reinforcement Learning Control for Leader-Follower Systems With Prescribed Performance

Abstract: This paper proposes a hierarchical safe reinforcement learning with prescribed performance control (HSRL-PPC) scheme to address the challenges of interconnected leader-follower systems ...

GitHub

RLFR: Extending Reinforcement Learning for LLMs with Flow Environment

We introduce RLFR, which offers a novel perspective on shaping RLVR with flow rewards derived from latent space, thus extending RLVR with latent reward utilization. Our approach highlights the ...

26d

Nvidia researchers boost LLMs' reasoning skills by getting them to 'think' during pre-training

By teaching models to reason during foundational training, the verifier-free method aims to reduce logical errors and boost ...
