Poly-semanticity - ML Notes

Definition

Poly-semanticity is the phenomenon in which a single component of a neural network (a neuron, an attention head, or a feature dimension) responds to or encodes several distinct concepts at once. It is observed particularly in large language models and other deep networks. It complicates interpretation because the activation of a poly-semantic component cannot be read as evidence for any one concept. Understanding poly-semanticity therefore matters for model interpretability, and for any analysis or optimization that depends on knowing what individual components represent.
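The definition can be made concrete with a toy sketch. The setup below is an illustrative assumption, not taken from the referenced work: three hypothetical concepts ("cat", "car", "cloud") are assigned hand-picked directions in a two-neuron activation space, and a neuron counts as responding to a concept when its ReLU activation exceeds an arbitrary threshold.

```python
import math

# Hypothetical concept labels, chosen only for illustration.
FEATURES = ["cat", "car", "cloud"]

# Hand-picked unit directions in a 2-neuron activation space (an assumption
# of this sketch). "cat" and "car" each get their own axis; "cloud" has no
# axis left, so its direction loads on both neurons.
DIRECTIONS = {
    "cat": (1.0, 0.0),
    "car": (0.0, 1.0),
    "cloud": (math.sqrt(0.5), math.sqrt(0.5)),
}


def neuron_activations(feature):
    """ReLU activations of the two neurons when only `feature` is present."""
    return tuple(max(0.0, w) for w in DIRECTIONS[feature])


def concepts_per_neuron(threshold=0.5):
    """Map each neuron index to the features that drive it above `threshold`."""
    n_neurons = 2
    return {
        i: [f for f in FEATURES if neuron_activations(f)[i] > threshold]
        for i in range(n_neurons)
    }


def polysemantic_neurons(threshold=0.5):
    """Indices of neurons that respond to more than one feature."""
    return [i for i, feats in concepts_per_neuron(threshold).items() if len(feats) > 1]


if __name__ == "__main__":
    for neuron, feats in concepts_per_neuron().items():
        print(f"neuron {neuron} responds to: {feats}")
    print("poly-semantic neurons:", polysemantic_neurons())
```

In this sketch both neurons respond to two concepts each, so neither can be labeled with a single meaning. Raising the threshold makes them look mono-semantic again, which suggests that how many concepts a unit appears to encode can depend on how "responds to" is measured.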

Tags

Interpretability, Neural Networks, Semantics, Model Analysis, Representation Learning

References

Elhage, N., Nanda, N., Olsson, C., Henighan, T., Joseph, N., Mann, B., Askell, A., Ndousse, K., Jones, C., DasSarma, N., Hernandez, D., Drain, D., Ganguli, D., Chen, Z., Hatfield-Dodds, Z., Kernion, J., Nova, T., Lovitt, L., Sellitto, M., Kundu, S., ... Kaplan, J. (2022). Transformer Circuits Thread. https://transformer-circuits.pub/
Cammarata, N., Carter, S., Goh, G., Olah, C., Petrov, M., & Schubert, L. (2020). Thread: Circuits. Distill, 5(3). https://doi.org/10.23915/distill.00024