100 Must-Read Generative AI Papers of 2024


2024 was a banner year for generative AI research. What surprised us most was how dramatically the field's focus shifted: across 2023 and 2024 the landscape began to look very different. Because large models can already do so much, research attention has increasingly moved to the application layer.

Paper collection: https://github.com/aishwaryanr/awesome-generative-ai-guide

[Figure: classification framework of the paper collection] The framework treats AI research as a system running from input to output, just like a real deployment. It is divided into several layers, each with its own focus:

Input layer: The starting point of LLM applications, covering research on input processing and prompt engineering. By carefully shaping how input is framed, we can get large language models (LLMs) to produce better results.

Data/model layer: This layer concerns the model's "fuel" and "engine". Research here covers improving data quality, generating synthetic data, and ensuring models are trained on rich, diverse datasets. It also includes foundational innovations such as new model architectures, multimodal capabilities (combining text, images, and more), cost and size optimization, model alignment, and extending context length.

Application layer: Research on applying LLMs in the real world. Whether through domain-specific models (code generation, text-to-SQL, or medical applications) or techniques like fine-tuning, retrieval-augmented generation (RAG), and multi-agent systems, this layer is where theory turns into practical tools.

Output layer: How do we make sure model outputs are reliable? Research here centers on evaluation methods, from human-in-the-loop systems to benchmarks and LLM-as-a-judge, offering a range of effective ways to assess AI output.

Challenges: The limitations of generative AI: adversarial attacks, model interpretability, hallucination, and more. These are real obstacles we must overcome to make AI safer and more reliable.

Input Layer

Prompt Engineering

  • DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
  • The Prompt Report: A Systematic Survey of Prompting Techniques
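To make the prompt-engineering idea concrete, here is a toy sketch (a hypothetical illustration, not drawn from either paper above): the same question, framed with an instruction and a few worked examples, typically yields more reliable completions than a bare query. The model call itself is omitted; only the prompt construction is shown.

```python
# Minimal few-shot prompt builder: output quality hinges on how the
# input is framed, so we assemble instruction + examples + query.

def build_few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a prompt with an instruction, worked examples, and the new query."""
    lines = [task, ""]
    for question, answer in examples:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
        lines.append("")  # blank line between examples
    lines.append(f"Q: {query}")
    lines.append("A:")  # leave the answer slot open for the model
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    task="Classify the sentiment of each review as positive or negative.",
    examples=[
        ("The battery lasts all day, fantastic.", "positive"),
        ("Broke after two uses.", "negative"),
    ],
    query="Arrived late but works perfectly.",
)
print(prompt)
```

Techniques surveyed in The Prompt Report (few-shot, chain-of-thought, role prompting, and so on) are all variations on this theme of restructuring the input string.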

Data/Model Layer

1. Data Quality / Synthetic Data Generation

  • On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey
  • Detecting Pretraining Data from Large Language Models
  • A Survey on Data Synthesis and Augmentation for Large Language Models
  • Scaling Synthetic Data Creation with 1,000,000,000 Personas
  • Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts

2. Foundation Models

  • DeepSeek-V3 Technical Report
  • xLSTM: Extended Long Short-Term Memory
  • Sparks of Artificial General Intelligence: Early Experiments with GPT-4
  • A Survey of Large Language Models
  • SAM 2: Segment Anything in Images and Videos
  • Qwen Technical Report
  • RWKV: Reinventing RNNs for the Transformer Era
  • KAN: Kolmogorov-Arnold Networks
  • The Llama 3 Herd of Models
  • Segment Anything
  • Differential Transformer
  • Foundation Models for Music: A Survey

3. Model Optimization (Size, Cost)

  • A Survey of Small Language Models
  • The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  • TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
  • A Survey on LLM Inference-Time Self-Improvement
  • FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  • LLM Pruning and Distillation in Practice: The Minitron Approach
  • What is the Role of Small Models in the LLM Era: A Survey

4. Multimodality

  • Towards Generalist Biomedical AI
  • MusicLM: Generating Music From Text
  • The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
  • Multimodal Chain-of-Thought Reasoning in Language Models

5. Model Alignment

  • Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
  • RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  • The Capacity for Moral Self-Correction in Large Language Models
  • sDPO: Don't Use Your Data All at Once

6. Long Context

  • LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
  • Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction
  • YaRN: Efficient Context Window Extension of Large Language Models
  • LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
  • LongNet: Scaling Transformers to 1,000,000,000 Tokens

Application Layer

1. Domain-Specific Models

  • Qwen2.5-Coder Technical Report
  • A Survey of Large Language Models for Healthcare: From Data, Technology, and Applications to Accountability and Ethics
  • ChemCrow: Augmenting Large-Language Models with Chemistry Tools
  • MentaLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models
  • A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?
  • A Survey on Language Models for Code
  • PMC-LLaMA: Towards Building Open-Source Language Models for Medicine
  • ChemLLM: A Chemical Large Language Model
  • A Survey of Large Language Models in Medicine: Progress, Application, and Challenge
  • Can Large Language Models Unlock Novel Scientific Research Ideas?

2. RAG

  • Corrective Retrieval Augmented Generation
  • HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction
  • Active Retrieval Augmented Generation
  • GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning
  • Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
  • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
  • Retrieval-Augmented Generation for Large Language Models: A Survey
  • Text2SQL is Not Enough: Unifying AI and Databases with TAG
  • Searching for Best Practices in Retrieval-Augmented Generation
  • Seven Failure Points When Engineering a Retrieval Augmented Generation System
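To make the retrieve-then-generate pattern these papers study concrete, here is a deliberately tiny sketch (illustrative only: the word-overlap retriever stands in for a real vector index, the two-document corpus is made up, and the generation step is stubbed out):

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank documents by shared words with the query (a stand-in for embedding search)."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]

def build_rag_prompt(query: str, corpus: list[str]) -> str:
    """Prepend the retrieved context to the question before calling the LLM."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "RWKV reinvents RNNs for the transformer era.",
    "FlashAttention-2 speeds up attention with better parallelism.",
]
print(build_rag_prompt("How does FlashAttention improve parallelism?", corpus))
```

Variants in the list above change individual stages of this loop: Self-RAG and Corrective RAG add critique/retry steps, GNN-RAG and HybridRAG swap in graph-based retrieval, and "Seven Failure Points" catalogs where the pipeline breaks in practice.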

3. Agents

  • The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
  • Large Language Model-Brained GUI Agents: A Survey
  • A Survey on Large Language Model based Autonomous Agents
  • Augmented Language Models: a Survey
  • A Taxonomy of AgentOps for Enabling Observability of Foundation Model based Agents
  • Toolformer: Language Models Can Teach Themselves to Use Tools

4. Multi-Agent Systems

  • Emergent Autonomous Scientific Research Capabilities of Large Language Models
  • OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
  • AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems
  • Mora: Enabling Generalist Video Generation via A Multi-Agent Framework
  • AIOS: LLM Agent Operating System
  • AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
  • Large Language Model-Based Agents for Software Engineering: A Survey

5. LLM Fine-Tuning

  • Instruction Tuning with GPT-4
  • LLMs + Persona-Plug = Personalized LLMs
  • Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
  • QLoRA: Efficient Finetuning of Quantized LLMs
  • LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
  • LoRA+: Efficient Low Rank Adaptation of Large Models
  • SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL
  • A Survey on Employing Large Language Models for Text-to-SQL Tasks
  • Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought

Output Layer

LLM Evaluation

  • A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
  • Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models
  • RAGEval: Scenario-Specific RAG Evaluation Dataset Generation Framework
  • Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
  • A Survey on LLM-as-a-Judge
  • AgentBench: Evaluating LLMs as Agents
  • A Survey on Evaluation of Large Language Models
  • Self-Taught Evaluators
  • PromptBench: A Unified Library for Evaluation of Large Language Models
  • A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B
  • Evaluating Large Language Models: A Comprehensive Survey
  • Mathematical Capabilities of ChatGPT
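The LLM-as-a-judge approach several of these papers examine boils down to two steps: build a grading prompt, then parse a structured score out of the judge model's free-text reply. A toy harness (illustrative; the judge call itself is stubbed out, and the prompt wording is a made-up example):

```python
import re
from typing import Optional

def build_judge_prompt(question: str, answer: str) -> str:
    """Grading prompt sent to the judge model."""
    return (
        "Rate the following answer for correctness on a scale of 1-5.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with 'Score: <n>' and a one-line justification."
    )

def parse_score(judge_reply: str) -> Optional[int]:
    """Extract the 1-5 score from the judge's reply, or None if it ignored the format."""
    match = re.search(r"Score:\s*([1-5])", judge_reply)
    return int(match.group(1)) if match else None

print(parse_score("Score: 4 - mostly correct, minor omission."))
```

The "Replacing Judges with Juries" paper above extends this pattern by averaging such scores across a panel of diverse judge models instead of trusting a single one.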

Challenges

Limitations of Generative AI

  • LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
  • A Survey on Hallucination in Large Vision-Language Models
  • A Survey of Hallucination in Large Foundation Models
  • Chain-of-Verification Reduces Hallucination in Large Language Models
  • A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
  • One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era
  • Knowledge Conflicts for LLMs: A Survey
