Bolin Ding
Latest
AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization
Larger or Smaller Reward Margins to Select Preferences for Alignment?
Learning Bayesian Nash Equilibrium in Auction Games via Approximate Best Response
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
β-DPO: Direct Preference Optimization with Dynamic β
Auctionformer: A Unified Deep Learning Algorithm for Solving Equilibrium Strategies in Auction Games