
Bolin Ding

AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization
Larger or Smaller Reward Margins to Select Preferences for Alignment?
Learning Bayesian Nash Equilibrium in Auction Games via Approximate Best Response
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
$\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
Auctionformer: A Unified Deep Learning Algorithm for Solving Equilibrium Strategies in Auction Games
