🗼
PhD student at the University of Tokyo working on Reinforcement Learning and broader Machine Learning
Pinned
-
gradientregularization_trl (Public)
Implementation for our paper "Gradient Regularization prevents Reward Hacking in RLHF and RLVR". Implemented for TRL and Hugging Face Transformers.
Python 11
-
OffPolicyCorrectedRewardModeling (Public)
Implementation for our COLM paper "Off-Policy Corrected Reward Modeling for RLHF".
Python 8
-
tf2multiagentrl (Public)
Clean implementation of Multi-Agent Reinforcement Learning methods (MADDPG, MATD3, MASAC, MAD4PG) in TensorFlow 2.x.
-
OfflineRLStructuredNonstationarity (Public)
Implementation for our RLC paper "Offline Reinforcement Learning from Datasets with Structured Non-Stationarity".
Python 7



