Large language models (LLMs) often exhibit hallucinations, producing incorrect or outdated knowledge. Hence, model editing methods have emerged to enable targeted knowledge updates. A prevailing paradigm is the locate-then-edit approach, which first locates influential parameters and then edits them by introducing a perturbation. While effective, current studies have demonstrated that this perturbation inevitably disrupts knowledge originally preserved in LLMs, especially in sequential editing scenarios. To address this, we introduce AlphaEdit, a novel solution that projects the perturbation onto the null space of the preserved knowledge before applying it to the parameters. We theoretically prove that this projection ensures the output of post-edited LLMs remains unchanged when queried about the preserved knowledge, thereby mitigating the disruption. Extensive experiments on various LLMs, including LLaMA3, GPT2-XL, and GPT-J, show that AlphaEdit boosts the performance of most locate-then-edit methods by an average of 36.7%, requiring only a single additional line of code for the projection.
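To make the projection concrete, below is a minimal sketch of the core idea: build a projector onto the null space of the preserved-knowledge keys, then right-multiply the edit perturbation by it. This is an illustrative sketch, not the repository's implementation; the helper name, the tolerance `tol`, and the convention that keys are stacked column-wise in `K0` while the edited layer computes `v = W @ k` are all assumptions.

```python
import torch

def null_space_projector(K0: torch.Tensor, tol: float = 1e-2) -> torch.Tensor:
    """Projector onto the null space of the preserved-knowledge keys.

    K0: (d, n) matrix whose columns are key vectors of knowledge to preserve.
    Returns P: (d, d) symmetric matrix with P @ K0 ~= 0.
    """
    # Eigendecomposition of the symmetric PSD key covariance K0 K0^T.
    cov = K0 @ K0.T
    eigvals, eigvecs = torch.linalg.eigh(cov)
    # Eigenvectors with (near-)zero eigenvalues span the null space of K0^T.
    null_mask = eigvals < tol
    U_null = eigvecs[:, null_mask]
    return U_null @ U_null.T

# The single extra projection step: given a locate-then-edit perturbation
# `delta` of shape (d_out, d), project it before applying the update.
#   delta = delta @ null_space_projector(K0)
# Then (W + delta) @ K0 == W @ K0, so outputs on preserved knowledge
# are unchanged, while delta can still act outside that subspace.
```

Because `P @ K0 ≈ 0`, any perturbation of the form `delta @ P` leaves the layer's responses to the preserved keys untouched, which is exactly the invariance the theoretical result above guarantees.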
Citation:
@inproceedings{fang2025alphaedit,
title = {AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models},
author = {Junfeng Fang and
Houcheng Jiang and
Kun Wang and
Yunshan Ma and
Xiang Wang and
Xiangnan He and
Tat{-}Seng Chua},
booktitle = {International Conference on Learning Representations},
year = {2025}
}