Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning

Abstract

When using agent-task datasets to enhance agent capabilities for Large Language Models (LLMs), current methodologies often treat all tokens within a sample equally. However, we argue that tokens serving different roles—specifically, reasoning tokens versus boilerplate tokens (e.g., those governing output format)—differ significantly in importance and learning complexity, necessitating their disentanglement and distinct treatment. To address this, we propose a novel Shuffle-Aware Discriminator (SHAD) for adaptive token discrimination. SHAD classifies tokens by exploiting predictability differences observed after shuffling input-output combinations across samples: boilerplate tokens, due to their repetitive nature among samples, maintain predictability, whereas reasoning tokens do not. Using SHAD, we propose the Reasoning-highlighted Fine-Tuning (RFT) method, which adaptively emphasizes reasoning tokens during fine-tuning, yielding notable performance gains over common Supervised Fine-Tuning (SFT).
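The shuffle-then-compare idea in the abstract can be illustrated with a minimal sketch. It assumes you already have two per-token loss vectors: one from a reference model on the original data, and one from the same model after fine-tuning on data whose input-output pairings were shuffled across samples. The function names and the sigmoid weighting are hypothetical illustrations, not the authors' implementation.

```python
import math

def shad_token_weights(loss_ref, loss_shuf, alpha=1.0):
    """Toy SHAD-style token weighting (hypothetical helper).

    loss_ref:  per-token losses of a reference model on the original data.
    loss_shuf: per-token losses after fine-tuning on shuffled pairings.
    Boilerplate tokens stay predictable after shuffling, so their loss gap
    (loss_shuf - loss_ref) stays small; reasoning tokens lose predictability,
    so their gap grows. A sigmoid turns the gap into a soft weight in (0, 1).
    """
    return [1.0 / (1.0 + math.exp(-alpha * (s - r)))
            for r, s in zip(loss_ref, loss_shuf)]

def rft_loss(token_losses, weights):
    """Reasoning-highlighted loss: weighted average of per-token SFT losses."""
    return sum(w * l for w, l in zip(weights, token_losses)) / sum(weights)

# Token 0 behaves like boilerplate (no gap); token 1 like reasoning (big gap).
w = shad_token_weights(loss_ref=[2.0, 2.0], loss_shuf=[2.0, 5.0])
loss = rft_loss(token_losses=[2.0, 5.0], weights=w)
```

In this toy example the reasoning-like token receives a weight near 1 while the boilerplate-like token stays at 0.5, so its loss dominates the fine-tuning objective.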

Publication
In Findings of ACL 2025

Citation:

@inproceedings{yeDisentanglingReasoningTokens,
	title     = {Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning},
	booktitle = {Findings of the Association for Computational Linguistics, {ACL} 2025},
	publisher = {Association for Computational Linguistics},
	author    = {Ye, Ziang and Zhang, Zhenru and Zhang, Yang and Ma, Jianxin and Lin, Junyang and Feng, Fuli},
	year      = {2025}
}
Fuli Feng