
Papers

A collection of papers with summaries and quick-access links.

Paper Notes

| Title | Tags | Full Notes |
| --- | --- | --- |
| The Super Weight in Large Language Models | Model internals | |
| Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | | |
| ReFT: Representation Finetuning for Language Models | Fine-tuning, Model representations | |
| Dropout: A Simple Way to Prevent Neural Networks from Overfitting | Model performance, Optimization, Model architecture | |
| Observation-based unit test generation at Meta | Automated testing, Software engineering | |
| Gecko: Versatile Text Embeddings Distilled from Large Language Models | Embeddings, Model distillation | |
| Language Models of Code are Few-Shot Commonsense Learners | Code models, Transfer learning | |
| Q-Sparse: All Large Language Models can be Fully Sparsely-Activated | Efficiency, Model performance | |
| CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning | Code generation, Reinforcement learning | |
| Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models | Data pruning, Perplexity | Details |
| The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | Hardware optimization, Model compression | |
| The Curse of Recursion: Training on Generated Data Makes Models Forget | Model collapse, Training data | |
| Chain of thought prompting | Prompting strategies | |
| Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | Attention mechanisms, Context window | |
| TextGrad | Agent systems, Text optimization | |
| Scalable MatMul-free Language Modeling | Attention mechanisms, Model efficiency | |
| Will humans even write code in 2040 and what would that mean for extreme heterogeneity in computing? | AI in software development, Future of coding | |
| Scalable Extraction of Training Data from (Production) Language Models | Data accumulation, Model performance, Curated datasets | |
| TinyStories: How Small Can Language Models Be and Still Speak Coherent English? | Dataset creation, Small language models | |
| GAIA: A Benchmark for General AI Assistants | AI assistants, Benchmarking | |
| Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet | Feature extraction, Interpretability | |
| Mixture-of-Agents Enhances Large Language Model Capabilities | Model performance, Multi-agent systems | |
| Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems | Transformers, Model architecture, Model performance | |
| Large Language Models Understand and Can Be Enhanced by Emotional Stimuli | Emotional stimuli, Model behavior | |
| Automated Unit Test Improvement using Large Language Models at Meta | Automated testing, Software engineering | |
| Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data | Data accumulation, Model collapse prevention | |
| A* Search Without Expansions: Learning Heuristic Functions With Deep Q-Networks | Reinforcement learning, Search algorithms | |
| Large Language Models: A Survey | LLM capabilities, Survey | |
| A Language Model’s Guide Through Latent Space | Interpretability, Latent space | |
| Extracting Latent Steering Vectors from Pretrained Language Models | Interpretability, Latent space | |
| Many-shot jailbreaking | Jailbreaking, Model safety | |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Transfer learning | |
| Activation Addition: Steering Language Models Without Optimization | Activation manipulation, Model steering | |
| Evaluating Large Language Models Trained on Code | Code generation, Model evaluation | |
| To Believe or Not to Believe Your LLM | Hallucination detection, Uncertainty quantification | |
| Mixture of Agents | Multi-agent systems, Prompting | |
| OpenELM: An Efficient Language Model Family with Open Training and Inference Framework | Efficiency, Model architecture | |
| Deep Reinforcement Learning from Human Preferences | Human feedback, Reinforcement learning | |
| Federated Large Language Model: A Position Paper | Distributed training, Federated learning | |
| MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning | Music representation, Self-supervised learning | |
| LLaMA: Open and Efficient Foundation Language Models | Model architecture, Open-source LLMs | |
| Phi1: Textbooks Are All You Need | Curated datasets, Model efficiency | |
| Layer Normalization | Model internals, Optimization | |
| Attention Is All You Need | OG papers, Transformers | |
| Explore the Limits of Omni-modal Pretraining at Scale | Multi-modal models, Pretraining | |
| Gaussian Error Linear Units (GELUs) | Activation functions, Model internals | |
| A Fast, Performant, Secure Distributed Training Framework For Large Language Model | Distributed training, Security | |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Efficiency, Model distillation | |
| Improving Language Understanding by Generative Pre-Training | OG papers, Pre-training | |
| Training language models to follow instructions with human feedback | Instruction following, Reinforcement learning | |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | Fine-tuning, Open-source LLMs | |
| RoFormer: Enhanced Transformer with Rotary Position Embedding | Embeddings, Model architecture | |
| Spatially embedded recurrent neural networks reveal widespread links between structural and functional neuroscience findings | Biological Brains | |
| Understanding Human Cognition Through Computational Modeling | Biological Brains | |
References
  1. Yu, M., Wang, D., Shan, Q., Reed, C., & Wan, A. (2024). The Super Weight in Large Language Models. arXiv. 10.48550/ARXIV.2411.07191
  2. Manvi, R., Singh, A., & Ermon, S. (2024). Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation. arXiv. 10.48550/ARXIV.2410.02725
  3. Wu, Z., Arora, A., Wang, Z., Geiger, A., Jurafsky, D., Manning, C. D., & Potts, C. (2024). ReFT: Representation Finetuning for Language Models. arXiv. 10.48550/ARXIV.2404.03592
  4. Alshahwan, N., Harman, M., Marginean, A., Tal, R., & Wang, E. (2024). Observation-based unit test generation at Meta. arXiv. 10.48550/ARXIV.2402.06111
  5. Lee, J., Dai, Z., Ren, X., Chen, B., Cer, D., Cole, J. R., Hui, K., Boratko, M., Kapadia, R., Ding, W., Luan, Y., Duddu, S. M. K., Abrego, G. H., Shi, W., Gupta, N., Kusupati, A., Jain, P., Jonnalagadda, S. R., Chang, M.-W., & Naim, I. (2024). Gecko: Versatile Text Embeddings Distilled from Large Language Models. arXiv. 10.48550/ARXIV.2403.20327