Title | Tags | Full Notes |
---|---|---|
The Super Weight in Large Language Models | Model internals | |
Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | | |
ReFT: Representation Finetuning for Language Models | Fine-tuning, Model representations | |
Dropout: A Simple Way to Prevent Neural Networks from Overfitting | Model performance, Optimization, Model architecture | |
Observation-based unit test generation at Meta | Automated testing, Software engineering | |
Gecko: Versatile Text Embeddings Distilled from Large Language Models | Embeddings, Model distillation | |
Language Models of Code are Few-Shot Commonsense Learners | Code models, Transfer learning | |
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated | Efficiency, Model performance | |
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning | Code generation, Reinforcement learning | |
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models | Data pruning, Perplexity | |
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | Hardware optimization, Model compression | |
The Curse of Recursion: Training on Generated Data Makes Models Forget | Model collapse, Training data | |
Chain of thought prompting | Prompting strategies | |
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | Attention mechanisms, Context window | |
TextGrad | Agent systems, Text optimization | |
Scalable MatMul-free Language Modeling | Attention mechanisms, Model efficiency | |
Will humans even write code in 2040 and what would that mean for extreme heterogeneity in computing? | AI in software development, Future of coding | |
Scalable Extraction of Training Data from (Production) Language Models | Training data, Model safety | |
TinyStories: How Small Can Language Models Be and Still Speak Coherent English? | Dataset creation, Small language models | |
GAIA: A Benchmark for General AI Assistants | AI assistants, Benchmarking | |
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet | Feature extraction, Interpretability | |
Mixture-of-Agents Enhances Large Language Model Capabilities | Model performance, Multi-agent systems | |
Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems | Transformers, Model architecture, Model performance | |
Large Language Models Understand and Can Be Enhanced by Emotional Stimuli | Emotional stimuli, Model behavior | |
Automated Unit Test Improvement using Large Language Models at Meta | Automated testing, Software engineering | |
Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data | Data accumulation, Model collapse prevention | |
A* Search Without Expansions: Learning Heuristic Functions With Deep Q-Networks | Reinforcement learning, Search algorithms | |
Large Language Models: A Survey | LLM capabilities, Survey | |
A Language Model’s Guide Through Latent Space | Interpretability, Latent space | |
Extracting Latent Steering Vectors from Pretrained Language Models | Interpretability, Latent space | |
Many-shot jailbreaking | Jailbreaking, Model safety | |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Transfer learning | |
Activation Addition: Steering Language Models Without Optimization | Activation manipulation, Model steering | |
Evaluating Large Language Models Trained on Code | Code generation, Model evaluation | |
To Believe or Not to Believe Your LLM | Hallucination detection, Uncertainty quantification | |
Mixture of Agents | Multi-agent systems, Prompting | |
OpenELM: An Efficient Language Model Family with Open Training and Inference Framework | Efficiency, Model architecture | |
Deep Reinforcement Learning from Human Preferences | Human feedback, Reinforcement learning | |
Federated Large Language Model: A Position Paper | Distributed training, Federated learning | |
MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning | Music representation, Self-supervised learning | |
LLaMA: Open and Efficient Foundation Language Models | Model architecture, Open-source LLMs | |
Phi-1: Textbooks Are All You Need | Curated datasets, Model efficiency | |
Layer Normalization | Model internals, Optimization | |
Attention Is All You Need | OG papers, Transformers | |
Explore the Limits of Omni-modal Pretraining at Scale | Multi-modal models, Pretraining | |
Gaussian Error Linear Units (GELUs) | Activation functions, Model internals | |
A Fast, Performant, Secure Distributed Training Framework For Large Language Model | Distributed training, Security | |
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Efficiency, Model distillation | |
Improving Language Understanding by Generative Pre-Training | OG papers, Pre-training | |
Training language models to follow instructions with human feedback | Instruction following, Reinforcement learning | |
Llama 2: Open Foundation and Fine-Tuned Chat Models | Fine-tuning, Open-source LLMs | |
RoFormer: Enhanced Transformer with Rotary Position Embedding | Embeddings, Model architecture | |
Spatially embedded recurrent neural networks reveal widespread links between structural and functional neuroscience findings | Biological Brains | |
Understanding Human Cognition Through Computational Modeling | Biological Brains | |