Paper ToDo List about LLM

A list of LLM-related papers I have read or plan to read

Posted Dec 4, 2024

By YK LI

14 min read

Papers Worth a Close Read

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Distilling System 2 into System 1

Paper Statistics by Category

📊 Overall Statistics

Total papers: 100
To read: 90
In progress: 0
Completed: 10

📊 Statistics by Category

Reasoning & Chain of Thought: 15 papers
Architecture & Optimization: 14 papers
Training Methods: 15 papers
Safety & Alignment: 9 papers
Applications: 18 papers
Evaluation & Analysis: 27 papers

Surveys

| ID | Status | Year | Date Added | Date Completed | Paper Title |
|---|---|---|---|---|---|
| 1 | ⏳ | 2025 | 2025-01-16 | - | A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers |

Reasoning & Chain of Thought

| ID | Status | Year | Date Added | Date Completed | Paper Title |
|---|---|---|---|---|---|
| 1 | ✅ | 2022 | 2024-11-05 | 2024-11-28 | CoT: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
| 2 | ✅ | 2022 | 2024-11-05 | 2024-11-28 | CoT-SC: Self-Consistency Improves Chain of Thought Reasoning in Language Models |
| 3 | ✅ | 2023 | 2024-11-05 | 2024-11-28 | ToT: Tree of Thoughts: Deliberate Problem Solving with Large Language Models |
| 4 | ✅ | 2023 | 2024-11-05 | 2024-11-28 | GoT: Graph of Thoughts: Solving Elaborate Problems with Large Language Models |
| 5 | ✅ | 2023 | 2024-11-05 | 2024-11-28 | XoT: Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation |
| 6 | ✅ | 2024 | 2024-11-05 | 2024-11-28 | DoT: On the Diagram of Thought |
| 7 | ✅ | 2024 | 2024-12-05 | 2025-01-07 | Chain-of-Thought Reasoning without Prompting |
| 8 | ✅ | 2024 | 2024-12-09 | 2025-01-10 | Reverse Thinking Makes LLMs Stronger Reasoners |
| 9 | ⏳ | 2024 | 2024-12-09 | - | Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models |
| 10 | ✅ | 2024 | 2024-12-11 | 2025-01-10 | Training Large Language Models to Reason in a Continuous Latent Space |
| 11 | ✅ | 2024 | 2024-12-19 | 2025-01-10 | Are Your LLMs Capable of Stable Reasoning? |
| 12 | ⏳ | 2024 | 2024-12-19 | - | Compressed Chain of Thought: Efficient Reasoning Through Dense Representations |
| 13 | ✅ | 2024 | 2025-01-03 | 2025-01-10 | Let’s Verify Step by Step |
| 14 | ✅ | 2024 | 2025-01-07 | 2025-01-13 | Test-time Computing: from System-1 Thinking to System-2 Thinking |
| 15 | ✅ | 2024 | 2025-01-08 | 2025-01-13 | BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning |
| 16 | ⏳ | 2024 | 2025-01-09 | - | Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought |
| 17 | ⏳ | 2024 | 2025-01-09 | - | rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking |
| 18 | ⏳ | 2024 | 2025-01-14 | - | Enabling Scalable Oversight via Self-Evolving Critic |
| 19 | ⏳ | 2024 | 2025-01-14 | - | Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains |
| 20 | ⏳ | 2024 | 2025-01-14 | - | Demystifying Domain-adaptive Post-training for Financial LLMs |
| 21 | ⏳ | 2024 | 2025-01-17 | - | Titans: Learning to Memorize at Test Time |
| 22 | ⏳ | 2025 | 2025-01-23 | - | Evolving Deeper LLM Thinking |

Architecture & Optimization

| ID | Status | Year | Date Added | Date Completed | Paper Title |
|---|---|---|---|---|---|
| 1 | ✅ | 2024 | 2024-12-05 | 2025-01-07 | Cut Your Losses in Large-Vocabulary Language Models |
| 2 | ⏳ | 2024 | 2024-12-05 | - | Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention |
| 3 | ⏳ | 2024 | 2024-12-05 | - | Xmodel-1.5: An 1B-scale Multilingual LLM |
| 4 | ⏳ | 2024 | 2024-12-05 | - | Densing Law of LLMs |
| 5 | ⏳ | 2024 | 2024-12-09 | - | EXAONE 3.5: Series of Large Language Models for Real-world Use Cases |
| 6 | ⏳ | 2024 | 2024-12-11 | - | Fully Open Source Moxin-7B Technical Report |
| 7 | ⏳ | 2024 | 2024-12-16 | - | Byte Latent Transformer: Patches Scale Better Than Tokens |
| 8 | ⏳ | 2024 | 2024-12-17 | - | Hugging Face - Scaling Test-Time Compute with Open Models |
| 9 | ⏳ | 2024 | 2024-12-17 | - | Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters |
| 10 | ⏳ | 2024 | 2024-12-20 | - | Qwen2.5 Technical Report |
| 11 | ⏳ | 2024 | 2024-12-25 | - | Revisiting In-Context Learning with Long Context Language Models |
| 12 | ⏳ | 2024 | 2024-12-26 | - | Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization |
| 13 | ⏳ | 2024 | 2024-12-30 | - | Introducing DeepSeek-V3! |
| 14 | ⏳ | 2024 | 2025-01-07 | - | Metadata Conditioning Accelerates Language Model Pre-training |
| 15 | ⏳ | 2024 | 2025-01-15 | - | MiniMax-01: Scaling Foundation Models with Lightning Attention |
| 16 | ⏳ | 2024 | 2025-01-16 | - | MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech, and Multimodal Live Streaming on Your Phone |
| 17 | ⏳ | 2024 | 2025-01-23 | - | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning |

Training Methods

| ID | Status | Year | Date Added | Date Completed | Paper Title |
|---|---|---|---|---|---|
| 1 | ⏳ | 2024 | 2024-12-05 | - | DELIFT: Data Efficient Language Model Instruction Fine-Tuning |
| 2 | ⏳ | 2024 | 2024-12-05 | - | RedPajama: an Open Dataset for Training Large Language Models |
| 3 | ⏳ | 2024 | 2024-12-05 | - | Distilling System 2 into System 1 |
| 4 | ⏳ | 2024 | 2024-12-05 | - | Memory-Efficient Fine-Tuning of Transformers via Token Selection |
| 5 | ⏳ | 2024 | 2024-12-05 | - | Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method |
| 6 | ⏳ | 2024 | 2024-12-17 | - | SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs |
| 7 | ⏳ | 2024 | 2024-12-17 | - | GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers |
| 8 | ⏳ | 2024 | 2024-12-20 | - | How to Synthesize Text Data without Model Collapse? |
| 9 | ⏳ | 2024 | 2024-12-20 | - | Offline Reinforcement Learning for LLM Multi-Step Reasoning |
| 10 | ⏳ | 2024 | 2024-12-25 | - | RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response |
| 11 | ⏳ | 2024 | 2024-12-25 | - | OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning |
| 12 | ⏳ | 2024 | 2024-12-30 | - | Reinforcement Learning Overview |
| 13 | ⏳ | 2024 | 2024-12-31 | - | Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging |
| 14 | ⏳ | 2024 | 2025-01-02 | - | Reinforcement Fine-Tuning for Large Language Models |
| 15 | ⏳ | 2024 | 2025-01-07 | - | Process Reinforcement through Implicit Rewards |
| 16 | ⏳ | 2024 | 2025-01-15 | - | Transformer²: Self-adaptive LLMs |

Datasets

| ID | Status | Year | Date Added | Date Completed | Paper Title |
|---|---|---|---|---|---|
| 1 | ⏳ | 2025 | 2025-01-15 | - | OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training |

Safety & Alignment

| ID | Status | Year | Date Added | Date Completed | Paper Title |
|---|---|---|---|---|---|
| 1 | ✅ | 2022 | 2024-12-05 | 2024-12-30 | Constitutional AI: Harmlessness from AI Feedback |
| 2 | ✅ | 2024 | 2024-12-05 | 2025-01-07 | Evaluating the role of ‘Constitutions’ for learning from AI feedback |
| 3 | ✅ | 2024 | 2024-12-05 | 2025-01-10 | Direct Preference Optimization Using Sparse Feature-Level Constraints |
| 4 | ⏳ | 2024 | 2024-12-05 | - | Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models |
| 5 | ⏳ | 2024 | 2024-12-05 | - | First-Person Fairness in Chatbots |
| 6 | ⏳ | 2024 | 2024-12-20 | - | LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps |
| 7 | ⏳ | 2024 | 2024-12-25 | - | OpenAI o1 System Card |
| 8 | ⏳ | 2024 | 2024-12-25 | - | NILE: Internal Consistency Alignment in Large Language Models |
| 9 | ⏳ | 2024 | 2025-01-07 | - | Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models |

Applications

| ID | Status | Year | Date Added | Date Completed | Paper Title |
|---|---|---|---|---|---|
| 1 | ⏳ | 2024 | 2024-12-05 | - | From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond |
| 2 | ⏳ | 2024 | 2024-12-05 | - | LongKey: Keyphrase Extraction for Long Documents |
| 3 | ⏳ | 2024 | 2024-12-05 | - | Are Large Language Models Capable of Generating Human-Level Narratives? |
| 4 | ⏳ | 2024 | 2024-12-05 | - | Do LLMs Plan Like Human Writers? Comparing Journalist Coverage of Press Releases with LLMs |
| 5 | ⏳ | 2024 | 2024-12-05 | - | Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents |
| 6 | ⏳ | 2024 | 2024-12-05 | - | To the Globe (TTG): Towards Language-Driven Guaranteed Travel Planning |
| 7 | ⏳ | 2024 | 2024-12-05 | - | Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation |
| 8 | ⏳ | 2024 | 2024-12-05 | - | Enhancing Dialogue State Tracking Models through LLM-backed User-Agents Simulation |
| 9 | ⏳ | 2024 | 2024-12-05 | - | Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System |
| 10 | ⏳ | 2024 | 2024-12-05 | - | InCharacter: Evaluating Personality Fidelity in Role-Playing Agents |
| 11 | ⏳ | 2024 | 2024-12-05 | - | IBSEN: Director-Actor Agent Collaboration for Interactive Drama Script Generation |
| 12 | ⏳ | 2024 | 2024-12-05 | - | Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization |
| 13 | ⏳ | 2024 | 2024-12-20 | - | Multi-LLM Text Summarization |
| 14 | ⏳ | 2024 | 2024-12-25 | - | Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding |
| 15 | ⏳ | 2024 | 2024-12-30 | - | DRT-o1 - applies long chain-of-thought reasoning to machine translation |
| 16 | ⏳ | 2024 | 2024-12-31 | - | HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs |
| 17 | ⏳ | 2024 | 2025-01-07 | - | VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction |
| 18 | ⏳ | 2024 | 2025-01-07 | - | Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization |

Evaluation & Analysis

| ID | Status | Year | Date Added | Date Completed | Paper Title |
|---|---|---|---|---|---|
| 1 | ⏳ | 2024 | 2024-12-05 | - | Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions |
| 2 | ⏳ | 2024 | 2024-12-05 | - | Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS |
| 3 | ⏳ | 2024 | 2024-12-05 | - | O1 Replication Journey – Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? |
| 4 | ⏳ | 2024 | 2024-12-05 | - | Natural Language Reinforcement Learning |
| 5 | ⏳ | 2024 | 2024-12-05 | - | Survey of User Interface Design and Interaction Techniques in Generative AI Applications |
| 6 | ⏳ | 2024 | 2024-12-05 | - | Learning to Retrieve Iteratively for In-Context Learning |
| 7 | ⏳ | 2024 | 2024-12-05 | - | Large Language Models Can Self-Improve in Long-context Reasoning |
| 8 | ⏳ | 2024 | 2024-12-05 | - | Measuring Psychological Depth in Language Models |
| 9 | ⏳ | 2023 | 2024-12-05 | - | The Bitter Lesson |
| 10 | ⏳ | 2024 | 2024-12-05 | - | GAIA: A Benchmark for General AI Assistants |
| 11 | ⏳ | 2024 | 2024-12-05 | - | Evaluation of OpenAI o1: Opportunities and Challenges of AGI |
| 12 | ⏳ | 2024 | 2024-12-05 | - | Scalable and Domain-General Abstractive Proposition Segmentation |
| 13 | ⏳ | 2024 | 2024-12-06 | - | OpenAI o1 System Card |
| 14 | ⏳ | 2024 | 2024-12-11 | - | Frame Representation Hypothesis: Multi-Token LLM Interpretability |
| 15 | ⏳ | 2019 | 2024-12-11 | - | Implicit Generation and Generalization in Energy-Based Models |
| 16 | ⏳ | 2024 | 2024-12-16 | - | Large Action Models: From Inception to Implementation |
| 17 | ⏳ | 2024 | 2024-12-19 | - | Emergence of Abstractions: Concept Encoding and Decoding Mechanism |
| 18 | ⏳ | 2024 | 2024-12-25 | - | B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners |
| 19 | ⏳ | 2024 | 2024-12-25 | - | Outcome-Refining Process Supervision for Code Generation |
| 20 | ⏳ | 2024 | 2024-12-25 | - | LearnLM: Improving Gemini for Learning |
| 21 | ⏳ | 2024 | 2024-12-25 | - | DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought |
| 22 | ⏳ | 2024 | 2024-12-25 | - | ResearchTown: Simulator of Human Research Community |
| 23 | ⏳ | 2024 | 2024-12-26 | - | In Case You Missed It: ARC ‘Challenge’ Is Not That Challenging |
| 24 | ⏳ | 2024 | 2024-12-26 | - | Ensembling Large Language Models with Process Reward-Guided Tree Search |
| 25 | ⏳ | 2024 | 2024-12-27 | - | Token-Budget-Aware LLM Reasoning |
| 26 | ⏳ | 2024 | 2024-12-30 | - | Large Language Models can Learn Rules |
| 27 | ⏳ | 2024 | 2025-01-04 | - | ProgCo: Program Helps Self-Correction of Large Language Models |

Legend:

⏳ To read
📝 In progress
✅ Completed

Reference

Awesome-Story-Generation


This post is licensed under CC BY 4.0 by the author.