As Large Language Models (LLMs) demonstrate impressive capabilities, demystifying their internal mechanisms becomes increasingly vital. Neuron attribution, which attributes LLM outputs to specific neurons to reveal the semantic properties they learn, has emerged as a key interpretability approach. However, while neuron attribution has made significant progress in deciphering text-only LLMs, its application to Multimodal LLMs (MLLMs) remains less explored. To address this gap, we propose a novel Neuron Attribution method tailored for MLLMs, termed NAM. Specifically, NAM not only reveals the modality-specific semantic knowledge learned by neurons within MLLMs, but also highlights several intriguing properties of neurons, such as cross-modal invariance and semantic sensitivity. These properties collectively elucidate the inner working mechanisms of MLLMs, providing a deeper understanding of how MLLMs process and generate multi-modal content. Through theoretical analysis and empirical validation, we demonstrate the efficacy of NAM and the valuable insights it offers. Furthermore, leveraging NAM, we introduce a multi-modal knowledge editing paradigm, underscoring the practical significance of our approach for downstream applications of MLLMs.
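To make the general idea of neuron attribution concrete, below is a minimal, generic sketch of activation-times-gradient scoring of FFN neurons on a toy block, written in PyTorch. It is not the NAM algorithm itself (whose formulation is given in the paper); the toy layer sizes, the `target_token` index, and the random hidden state are illustrative placeholders only.

```python
# Minimal, generic sketch of activation-based neuron attribution
# (activation x gradient of the target-token logit). This is NOT the
# NAM algorithm from the paper; it only illustrates the general idea
# of scoring individual FFN neurons for their contribution to an output.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for one transformer FFN block followed by an unembedding.
hidden, ffn, vocab = 16, 64, 100
up = nn.Linear(hidden, ffn)      # "neurons" live along this ffn dimension
down = nn.Linear(ffn, hidden)
unembed = nn.Linear(hidden, vocab)

x = torch.randn(1, hidden)       # hypothetical last-token hidden state
target_token = 7                 # hypothetical token whose logit we attribute

acts = torch.relu(up(x))         # neuron activations, shape (1, ffn)
acts.retain_grad()               # keep gradients for this intermediate tensor
logit = unembed(down(acts))[0, target_token]
logit.backward()

# Attribution score per neuron: activation times its gradient w.r.t. the logit.
scores = (acts * acts.grad).squeeze(0)
top = torch.topk(scores, k=5)
print("top neurons:", top.indices.tolist())
print("scores:", [round(s, 4) for s in top.values.tolist()])
```

In a real MLLM, `x` would be the hidden state of an image or text token at a given layer, and the same scoring could be compared across modalities; the paper's NAM method defines its own attribution criterion beyond this sketch.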
Citation:
```bibtex
@inproceedings{NAM,
  author    = {Junfeng Fang and Zac Bi and Ruipeng Wang and Houcheng Jiang and Yuan Gao and Kun Wang and An Zhang and Jie Shi and Xiang Wang and Tat-Seng Chua},
  title     = {Towards Neuron Attributions in Multi-Modal Large Language Models},
  booktitle = {NeurIPS},
  url       = {https://openreview.net/forum?id=jMJVFP4BH6},
  year      = {2024}
}
```