Xue Wang

AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization
Larger or Smaller Reward Margins to Select Preferences for Alignment?
Learning Bayesian Nash Equilibrium in Auction Games via Approximate Best Response
Auctionformer: A Unified Deep Learning Algorithm for Solving Equilibrium Strategies in Auction Games
