| Title | Tags | Full Notes |
|---|---|---|
| The Super Weight in Large Language Models | Model internals | |
| Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | | |
| ReFT: Representation Finetuning for Language Models | Fine-tuning, Model representations | |
| Dropout: A Simple Way to Prevent Neural Networks from Overfitting | Model performance, Optimization, Model architecture | |
| Observation-based unit test generation at Meta | Automated testing, Software engineering | |
| Gecko: Versatile Text Embeddings Distilled from Large Language Models | Embeddings, Model distillation | |
| Language Models of Code are Few-Shot Commonsense Learners | Code models, Transfer learning | |
| Q-Sparse: All Large Language Models can be Fully Sparsely-Activated | Efficiency, Model performance | |
| CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning | Code generation, Reinforcement learning | |
| Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models | Data pruning, Perplexity | |
| The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | Hardware optimization, Model compression | |
| The Curse of Recursion: Training on Generated Data Makes Models Forget | Model collapse, Training data | |
| Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Prompting strategies | |
| Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | Attention mechanisms, Context window | |
| TextGrad | Agent systems, Text optimization | |
| Scalable MatMul-free Language Modeling | Attention mechanisms, Model efficiency | |
| Will humans even write code in 2040 and what would that mean for extreme heterogeneity in computing? | AI in software development, Future of coding | |
| Scalable Extraction of Training Data from (Production) Language Models | Data extraction, Model safety | |
| TinyStories: How Small Can Language Models Be and Still Speak Coherent English? | Dataset creation, Small language models | |
| GAIA: A Benchmark for General AI Assistants | AI assistants, Benchmarking | |
| Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet | Feature extraction, Interpretability | |
| Mixture-of-Agents Enhances Large Language Model Capabilities | Model performance, Multi-agent systems | |
| Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems | Transformers, Model architecture, Model performance | |
| Large Language Models Understand and Can Be Enhanced by Emotional Stimuli | Emotional stimuli, Model behavior | |
| Automated Unit Test Improvement using Large Language Models at Meta | Automated testing, Software engineering | |
| Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data | Data accumulation, Model collapse prevention | |
| A* Search Without Expansions: Learning Heuristic Functions With Deep Q-Networks | Reinforcement learning, Search algorithms | |
| Large Language Models: A Survey | LLM capabilities, Survey | |
| A Language Model’s Guide Through Latent Space | Interpretability, Latent space | |
| Extracting Latent Steering Vectors from Pretrained Language Models | Interpretability, Latent space | |
| Many-shot jailbreaking | Jailbreaking, Model safety | |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Transfer learning | |
| Activation Addition: Steering Language Models Without Optimization | Activation manipulation, Model steering | |
| Evaluating Large Language Models Trained on Code | Code generation, Model evaluation | |
| To Believe or Not to Believe Your LLM | Hallucination detection, Uncertainty quantification | |
| Mixture of Agents | Multi-agent systems, Prompting | |
| OpenELM: An Efficient Language Model Family with Open Training and Inference Framework | Efficiency, Model architecture | |
| Deep Reinforcement Learning from Human Preferences | Human feedback, Reinforcement learning | |
| Federated Large Language Model: A Position Paper | Distributed training, Federated learning | |
| MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning | Music representation, Self-supervised learning | |
| LLaMA: Open and Efficient Foundation Language Models | Model architecture, Open-source LLMs | |
| Phi-1: Textbooks Are All You Need | Curated datasets, Model efficiency | |
| Layer Normalization | Model internals, Optimization | |
| Attention Is All You Need | OG papers, Transformers | |
| Explore the Limits of Omni-modal Pretraining at Scale | Multi-modal models, Pretraining | |
| Gaussian Error Linear Units (GELUs) | Activation functions, Model internals | |
| A Fast, Performant, Secure Distributed Training Framework For Large Language Model | Distributed training, Security | |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Efficiency, Model distillation | |
| Improving Language Understanding by Generative Pre-Training | OG papers, Pre-training | |
| Training language models to follow instructions with human feedback | Instruction following, Reinforcement learning | |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | Fine-tuning, Open-source LLMs | |
| RoFormer: Enhanced Transformer with Rotary Position Embedding | Embeddings, Model architecture | |
| Spatially embedded recurrent neural networks reveal widespread links between structural and functional neuroscience findings (Nature Machine Intelligence) | Biological Brains | |
| Understanding Human Cognition Through Computational Modeling (Hsiao, 2024, Topics in Cognitive Science) | Biological Brains | |