Reinforcement Learning Example

How AI and QSAR Modeling Accelerate Ligand-Based Drug Design

Ligand-based drug design combines AI and QSAR modeling to prioritize drug candidates, minimizing preclinical failures and ...

22d

Alibaba's Metis agent cuts redundant AI tool calls from 98% to 2% — and gets more accurate doing it

Alibaba's HDPO framework trains AI agents to skip unnecessary tool calls, cutting redundant invocations from 98% to 2% while boosting reasoning accuracy.

MIT Technology Review

How robots learn: A brief, contemporary history

The latest boom in robotics represents a revolution in the way machines have learned to interact with the world.

Scientific Research Publishing

Ribba, B. (2023) Reinforcement Learning as an Innovative Model-Based Approach: Examples from Precision Dosing, Digital Health and Computational Psychiatry. Frontiers in ...

ABSTRACT: Personalized dosing of mood stabilizers remains challenging due to substantial inter-individual variability in symptom severity, treatment responsiveness, and vulnerability to adverse ...

Microsoft

Experiential Reinforcement Learning

Reinforcement Learning is at the core of building and improving frontier AI models and products. Yet most state-of-the-art RL methods learn primarily from outcomes: a scalar reward signal that says ...

IEEE

Reinforcement Learning for Robust Dynamic Event-Driven Constrained Control

Abstract: We consider a robust dynamic event-driven control (EDC) problem of nonlinear systems having both unmatched perturbations and unknown styles of constraints. Specifically, the constraints ...

acm.org

Rediscovering Reinforcement Learning

Reinforcement learning (RL) is machine learning (ML) in which the learning system adjusts its behavior to maximize the amount of reward and minimize the amount of punishment it receives over time ...

Scientific Research Publishing

Ribba, B. (2023) Reinforcement Learning as an Innovative Model-Based Approach: Examples from Precision Dosing, Digital Health and Computational Psychiatry. Frontiers in ...

ABSTRACT: Depression treatment often involves a complex and lengthy trial-and-error process, where clinicians sequentially prescribe medications to identify the most ...

GitHub

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Our training pipeline is adapted from verl and rllm(DeepScaleR). The installation commands that we verified as viable are as follows: conda create -y -n rlvr_train ...

IEEE

GAME-RL: Generating Adversarial Malware Examples against API Call Based Detection via Reinforcement Learning

Abstract: The adversarial example presents new security threats to trustworthy detection systems. In the context of evading dynamic detection based on API call sequences, a practical approach involves ...

marktechpost

LLMs Can Learn Complex Math from Just One Example: Researchers from University of Washington, Microsoft, and USC Unlock the Power of 1-Shot Reinforcement Learning with ...

Recent advancements in LLMs such as OpenAI-o1, DeepSeek-R1, and Kimi-1.5 have significantly improved their performance on complex mathematical reasoning tasks. Reinforcement Learning with Verifiable ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results