| Title | Tags | Full Notes |
|---|---|---|
| The Super Weight in Large Language Models | Model internals | |
| Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | | |
| ReFT: Representation Finetuning for Language Models | Fine-tuning, Model representations | |
| Dropout: A Simple Way to Prevent Neural Networks from Overfitting | Model performance, Optimization, Model architecture | |
| Observation-based unit test generation at Meta | Automated testing, Software engineering | |
| Gecko: Versatile Text Embeddings Distilled from Large Language Models | Embeddings, Model distillation | |
| Language Models of Code are Few-Shot Commonsense Learners | Code models, Transfer learning | |
| Q-Sparse: All Large Language Models can be Fully Sparsely-Activated | Efficiency, Model performance | |
| CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning | Code generation, Reinforcement learning | |
| Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models | Data pruning, Perplexity | |
| The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | Hardware optimization, Model compression | |
| The Curse of Recursion: Training on Generated Data Makes Models Forget | Model collapse, Training data | |
| Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Prompting strategies | |
| Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | Attention mechanisms, Context window | |
| TextGrad | Agent systems, Text optimization | |
| Scalable MatMul-free Language Modeling | Attention mechanisms, Model efficiency | |
| Will humans even write code in 2040 and what would that mean for extreme heterogeneity in computing? | AI in software development, Future of coding | |
| Scalable Extraction of Training Data from (Production) Language Models | Data extraction, Model safety | |
| TinyStories: How Small Can Language Models Be and Still Speak Coherent English? | Dataset creation, Small language models | |
| GAIA: A Benchmark for General AI Assistants | AI assistants, Benchmarking | |
| Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet | Feature extraction, Interpretability | |
| Mixture-of-Agents Enhances Large Language Model Capabilities | Model performance, Multi-agent systems | |
| Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems | Transformers, Model architecture, Model performance | |
| Large Language Models Understand and Can Be Enhanced by Emotional Stimuli | Emotional stimuli, Model behavior | |
| Automated Unit Test Improvement using Large Language Models at Meta | Automated testing, Software engineering | |
| Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data | Data accumulation, Model collapse prevention | |
| A* Search Without Expansions: Learning Heuristic Functions With Deep Q-Networks | Reinforcement learning, Search algorithms | |
| Large Language Models: A Survey | LLM capabilities, Survey | |
| A Language Model’s Guide Through Latent Space | Interpretability, Latent space | |
| Extracting Latent Steering Vectors from Pretrained Language Models | Interpretability, Latent space | |
| Many-shot jailbreaking | Jailbreaking, Model safety | |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Transfer learning | |
| Activation Addition: Steering Language Models Without Optimization | Activation manipulation, Model steering | |
| Evaluating Large Language Models Trained on Code | Code generation, Model evaluation | |
| To Believe or Not to Believe Your LLM | Hallucination detection, Uncertainty quantification | |
| Mixture of Agents | Multi-agent systems, Prompting | |
| OpenELM: An Efficient Language Model Family with Open Training and Inference Framework | Efficiency, Model architecture | |
| Deep Reinforcement Learning from Human Preferences | Human feedback, Reinforcement learning | |
| Federated Large Language Model: A Position Paper | Distributed training, Federated learning | |
| MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning | Music representation, Self-supervised learning | |
| LLaMA: Open and Efficient Foundation Language Models | Model architecture, Open-source LLMs | |
| Phi-1: Textbooks Are All You Need | Curated datasets, Model efficiency | |
| Layer Normalization | Model internals, Optimization | |
| Attention Is All You Need | OG papers, Transformers | |
| Explore the Limits of Omni-modal Pretraining at Scale | Multi-modal models, Pretraining | |
| Gaussian Error Linear Units (GELUs) | Activation functions, Model internals | |
| A Fast, Performant, Secure Distributed Training Framework For Large Language Model | Distributed training, Security | |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Efficiency, Model distillation | |
| Improving Language Understanding by Generative Pre-Training | OG papers, Pre-training | |
| Training language models to follow instructions with human feedback | Instruction following, Reinforcement learning | |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | Fine-tuning, Open-source LLMs | |
| RoFormer: Enhanced Transformer with Rotary Position Embedding | Embeddings, Model architecture | |
| Spatially embedded recurrent neural networks reveal widespread links between structural and functional neuroscience findings (Nature Machine Intelligence) | Biological Brains | |
| Understanding Human Cognition Through Computational Modeling (Hsiao, 2024, Topics in Cognitive Science) | Biological Brains | |