Research

My research spans the following directions:

Diffusion Models and Reinforcement Learning

Diffusion models have demonstrated remarkable success in sample generation, particularly for complex, high-dimensional distributions. Reinforcement learning, in contrast, provides a principled framework for sequential decision making under uncertainty. Integrating these two paradigms has the potential to advance both fields:

  • ICML 2025: convergence guarantees for consistency models under relaxed assumptions on the data distribution, together with an analysis of multistep sampling techniques.
  • ICLR 2025: a stable and efficient imitation learning method that diffuses states and applies score matching (see the sketch after this list).
  • preprint: reward maximization for diffusion models via a reduction to supervised learning.
  • preprint: a one-stage training procedure for expressive policies with test-time scaling.
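
The imitation-learning result above builds on score matching. As a rough sketch of that primitive (not the papers' actual objectives), the snippet below implements a denoising score matching loss in PyTorch: perturb samples with Gaussian noise, then regress a network onto the known score of the perturbed distribution. The network architecture, `sigma`, and the toy data are placeholder choices of mine.

```python
import torch

def denoising_score_matching_loss(score_net, x, sigma=0.1):
    # Perturb clean samples with Gaussian noise of scale sigma.
    noise = torch.randn_like(x) * sigma
    x_noisy = x + noise
    # For Gaussian perturbation, the score of q(x_noisy | x) at x_noisy
    # is exactly -noise / sigma^2, so we regress the network onto it.
    target = -noise / sigma**2
    return ((score_net(x_noisy) - target) ** 2).sum(dim=-1).mean()

# Toy usage on 2-D data (illustrative only).
score_net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
)
loss = denoising_score_matching_loss(score_net, torch.randn(256, 2))
loss.backward()
```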

Incentive Mechanisms for Data Sharing

High-quality data is costly to collect but crucial for machine learning. While agents can reduce costs and improve model accuracy by sharing data, a naive pooling protocol invites free-riding (illustrated in the sketch after the list below). In the following line of work, we design mechanisms that incentivize truthful data sharing:

  • NeurIPS 2023: mechanism design for normal mean estimation that incentivizes agents to collect sufficient data and report it truthfully.
  • ICML 2025: mechanisms for agents with heterogeneous cost functions, which raise new fundamental challenges.
  • preprint: mechanisms that avoid strong assumptions on the data distribution, with both theoretical justification and empirical demonstrations on language and image data.
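
To make the free-riding problem concrete, the toy computation below works through the normal mean estimation setting: with i.i.d. N(mu, sigma^2) samples, the pooled sample mean has mean squared error sigma^2 / n. Once others contribute enough data, an agent's marginal benefit from collecting one more point is negligible, so any positive collection cost pushes a rational agent toward contributing nothing. The function name and numbers here are illustrative, not from the papers; the mechanisms above counteract exactly this by tying the estimate or payment an agent receives to its own report.

```python
def pooled_mse(m_i, m_others, sigma2=1.0):
    """MSE of the pooled sample mean of i.i.d. N(mu, sigma2) data when
    agent i contributes m_i points and the others m_others in total."""
    return sigma2 / (m_i + m_others)

# Under naive pooling, agent i's accuracy barely improves with its own effort.
for m_i in (0, 1, 10, 100):
    print(f"m_i={m_i:>3}  pooled MSE={pooled_mse(m_i, m_others=1000):.6f}")
```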

Robust Decision Making

Data corruption poses a threat to sequential decision making: an adversary may manipulate the dataset to mislead the learning agent into making suboptimal decisions. Our goal is to design reinforcement learning algorithms that are robust to data corruption:

  • ICML 2021: a robust policy gradient algorithm for online RL, together with information-theoretic limits.
  • AISTATS 2022: robust algorithms for offline RL.
  • AISTATS 2023: robust RL in the distributed setting, where each data source may provide a different number of data points and a fraction of the sources can be arbitrarily corrupted; we design a novel robust mean estimation algorithm and apply it to RL (a minimal sketch follows this list).
  • AAAI 2024: a refined instance-dependent analysis showing that exact recovery of the optimal policy is possible even under data corruption and heavy-tailed reward distributions.
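
A core primitive behind the distributed result above is robust mean estimation. The papers' estimators are more refined (handling high dimensions and unequal sample sizes), but as a hedged one-dimensional sketch, the classical trimmed mean below already tolerates an eps-fraction of arbitrarily corrupted points.

```python
import numpy as np

def trimmed_mean(samples, eps):
    """Sort, discard the eps-fraction of smallest and largest values,
    and average the rest; robust to an eps-fraction of outliers."""
    s = np.sort(np.asarray(samples, dtype=float))
    k = int(np.floor(eps * len(s)))
    return s[k:len(s) - k].mean() if k > 0 else s.mean()

# 10% of points are corrupted to +1e6: the naive mean is dragged to ~1e5,
# while the trimmed mean stays within a small constant of the true mean 0.
rng = np.random.default_rng(0)
data = np.concatenate([rng.standard_normal(900), np.full(100, 1e6)])
print(np.mean(data), trimmed_mean(data, eps=0.1))
```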