To adapt Large Language Models (LLMs) to ranking tasks, existing list-wise methods, exemplified by list-wise Direct Preference Optimization (DPO), optimize partial-order or full-order ranking consistency to enhance LLMs' ranking abilities. However, we argue that optimizing top-K ranking consistency is more appropriate for real-world applications, for two main reasons: (1) users typically care only about the top-K results, making top-K ranking more important, and (2) tail items often lack precise feedback, making top-K ranking more reliable. Based on this, we propose K-order Ranking Preference Optimization (KPO), which extends DPO's Plackett-Luce model to accommodate top-K rankings. Additionally, recognizing that the number of important items can vary across queries, we extend KPO to dynamically determine the appropriate K for different samples and introduce a curriculum learning strategy to improve training efficiency. Extensive experiments demonstrate the effectiveness of KPO, highlighting its high sample efficiency and robustness to noise.
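Conceptually, the top-K extension truncates the Plackett-Luce likelihood to the first K positions, so only the ordering of the top-K items contributes to the objective. Below is a minimal PyTorch sketch of such a loss under DPO-style implicit rewards β·log(π_θ/π_ref); the function name `kpo_loss`, the tensor layout, and the default hyperparameters are illustrative assumptions, not the repository's actual API.

```python
import torch

def kpo_loss(policy_logps, ref_logps, beta=0.1, k=3):
    """Sketch of a top-K Plackett-Luce, DPO-style ranking loss.

    policy_logps, ref_logps: (batch, N) log-probabilities of N candidate
    responses under the policy and reference models, with the columns
    ordered so that the first k positions form the preferred top-K ranking.
    """
    # DPO-style implicit rewards: beta * log(pi_theta / pi_ref)
    rewards = beta * (policy_logps - ref_logps)  # (batch, N)

    loss = 0.0
    for pos in range(k):
        # Plackett-Luce factor: the item at `pos` is chosen ahead of
        # every remaining (not yet ranked) candidate.
        remaining = rewards[:, pos:]  # (batch, N - pos)
        log_choice_prob = rewards[:, pos] - torch.logsumexp(remaining, dim=-1)
        loss = loss - log_choice_prob

    return loss.mean()
```

Setting k equal to the number of candidates recovers a full-order list-wise objective, while a smaller k ignores the ordering among tail items, matching the top-K motivation above.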
Citation:
@inproceedings{cai2025kpo,
title={K-order Ranking Preference Optimization for Large Language Models},
author={Cai, Shihao and Gao, Chongming and Zhang, Yang and Shi, Wentao and Zhang, Jizhi and Bao, Keqin and Wang, Qifan and Feng, Fuli},
booktitle={Findings of the Association for Computational Linguistics: ACL 2025},
year={2025}
}