When using agent-task datasets to enhance agent capabilities for Large Language Models (LLMs), current methodologies often treat all tokens within a sample equally. However, we argue that tokens serving different roles—specifically, reasoning tokens versus boilerplate tokens (e.g., those governing output format)—differ significantly in importance and learning complexity, necessitating their disentanglement and distinct treatment. To address this, we propose a novel Shuffle-Aware Discriminator (SHAD) for adaptive token discrimination. SHAD classifies tokens by exploiting predictability differences observed after shuffling input-output combinations across samples: boilerplate tokens, due to their repetitive nature among samples, maintain predictability, whereas reasoning tokens do not. Using SHAD, we propose the Reasoning-highlighted Fine-Tuning (RFT) method, which adaptively emphasizes reasoning tokens during fine-tuning, yielding notable performance gains over common Supervised Fine-Tuning (SFT).
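The abstract compresses two mechanisms: SHAD's shuffle-based token discrimination and RFT's reasoning-weighted loss. Below is a minimal PyTorch sketch of both ideas; the `classify_tokens` and `rft_loss` helpers, the fixed `margin` threshold, and the constant `reasoning_weight` are illustrative assumptions, not the authors' implementation (the paper's emphasis on reasoning tokens is adaptive, whereas this sketch uses a constant up-weight).

```python
# Minimal sketch of SHAD-style token discrimination and an RFT-style
# weighted loss. Shapes, helper names, and the weighting scheme are
# illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F


def classify_tokens(loss_orig: torch.Tensor,
                    loss_shuf: torch.Tensor,
                    margin: float = 0.5) -> torch.Tensor:
    """Return a boolean mask marking reasoning tokens.

    loss_orig: per-token loss of a reference model tuned on the original
               (correctly paired) data, shape (seq_len,).
    loss_shuf: per-token loss of a reference model tuned on data whose
               input-output pairings were shuffled across samples.
    Boilerplate tokens stay predictable after shuffling (loss barely
    rises); reasoning tokens become much harder to predict.
    """
    return (loss_shuf - loss_orig) > margin


def rft_loss(logits: torch.Tensor,
             targets: torch.Tensor,
             reasoning_mask: torch.Tensor,
             reasoning_weight: float = 2.0) -> torch.Tensor:
    """Cross-entropy that up-weights reasoning tokens.

    logits: (seq_len, vocab); targets: (seq_len,); reasoning_mask: (seq_len,).
    """
    per_token = F.cross_entropy(logits, targets, reduction="none")
    weights = torch.where(reasoning_mask,
                          torch.full_like(per_token, reasoning_weight),
                          torch.ones_like(per_token))
    return (weights * per_token).sum() / weights.sum()


# Toy usage: random tensors stand in for real per-token reference losses
# and model outputs.
seq_len, vocab = 8, 100
loss_orig = torch.rand(seq_len)
loss_shuf = loss_orig + torch.rand(seq_len)  # shuffling raises some losses
mask = classify_tokens(loss_orig, loss_shuf)

logits = torch.randn(seq_len, vocab)
targets = torch.randint(0, vocab, (seq_len,))
print("reasoning tokens:", mask.tolist())
print("RFT loss:", rft_loss(logits, targets, mask).item())
```

In the actual pipeline, `loss_orig` and `loss_shuf` would come from reference models fine-tuned on the original and shuffled corpora, respectively; the toy tensors here only stand in for those per-token losses.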
Citation:
@inproceedings{yeDisentanglingReasoningTokens,
  title     = {Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning},
  booktitle = {Findings of the Association for Computational Linguistics: {ACL} 2025},
  publisher = {Association for Computational Linguistics},
  author    = {Ye, Ziang and Zhang, Zhenru and Zhang, Yang and Ma, Jianxin and Lin, Junyang and Feng, Fuli},
  year      = {2025}
}