Xue Wang
Latest
AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization
Larger or Smaller Reward Margins to Select Preferences for Alignment?
Learning Bayesian Nash Equilibrium in Auction Games via Approximate Best Response
Auctionformer: A Unified Deep Learning Algorithm for Solving Equilibrium Strategies in Auction Games