Debias Can be Unreliable: Mitigating Bias Issue in Evaluating Debiasing Recommendation

Abstract

Recent work has improved recommendation models remarkably by equipping them with debiasing methods. Because fully-exposed datasets are unavailable, most existing approaches resort to randomly-exposed datasets as a proxy for evaluating debiased models, employing the traditional evaluation scheme to measure recommendation performance. However, in this study, we reveal that the traditional evaluation scheme is not suitable for randomly-exposed datasets, leading to inconsistency between the Recall performance obtained on randomly-exposed datasets and that obtained on fully-exposed datasets. Such inconsistency indicates that experimental conclusions about previous debiasing techniques may be unreliable, and calls for unbiased Recall evaluation using randomly-exposed datasets. To bridge this gap, we propose the Unbiased Recall Evaluation (URE) scheme, which adjusts the utilization of randomly-exposed datasets to unbiasedly estimate the true Recall performance on fully-exposed datasets. We provide theoretical evidence for the rationality of URE and perform extensive experiments on real-world datasets to validate its soundness.

Publication
In WWW 2025 (short paper)

Citation:

@article{wang2024debias,
  title={Debias Can be Unreliable: Mitigating Bias Issue in Evaluating Debiasing Recommendation},
  author={Wang, Chengbing and Shi, Wentao and Zhang, Jizhi and Wang, Wenjie and Pan, Hang and Feng, Fuli},
  journal={arXiv preprint arXiv:2409.04810},
  year={2024}
}
Authors

Chengbing Wang
Wentao Shi
Jizhi Zhang
Hang Pan
Fuli Feng