Paper ToDo List about Text2Image
A list of papers already read and still to be read.
📊 Statistics
- Total papers: 43
- To read: 35
- In progress: 1
- Completed: 7 (counts are tallied from the status column of the table below; see the script sketch after the legend)
| ID | Status | Year | Date Added | Date Completed | Paper Title |
|---|---|---|---|---|---|
| 1 | ⏳ | 2023 | 2024-11-05 | - | Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack |
| 2 | ⏳ | 2024 | 2024-11-05 | - | NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation |
| 3 | ⏳ | 2024 | 2024-11-05 | - | ITERCOMP: ITERATIVE COMPOSITION-AWARE FEEDBACK LEARNING FROM MODEL GALLERY FOR TEXT-TO-IMAGE GENERATION |
| 4 | ⏳ | 2024 | 2024-11-05 | - | Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs |
| 5 | ⏳ | 2024 | 2024-11-05 | - | SIMPLIFYING, STABILIZING & SCALING CONTINUOUS TIME CONSISTENCY MODELS |
| 6 | ⏳ | 2024 | 2024-11-05 | - | GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation |
| 7 | ⏳ | 2024 | 2024-11-05 | - | IN-CONTEXT LORA FOR DIFFUSION TRANSFORMERS |
| 8 | ⏳ | 2024 | 2024-11-05 | - | Training-free Regional Prompting for Diffusion Transformers |
| 9 | ⏳ | 2024 | 2024-11-05 | - | Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent |
| 10 | ⏳ | 2024 | 2024-11-05 | - | MagicQuill: An Intelligent Interactive Image Editing System |
| 11 | ⏳ | 2024 | 2024-11-05 | - | Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement |
| 12 | ⏳ | 2024 | 2024-11-05 | - | FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations |
| 13 | ⏳ | 2024 | 2024-11-05 | - | Generating Compositional Scenes via Text-to-image RGBA Instance Generation |
| 14 | ⏳ | 2024 | 2024-11-05 | - | Style-Friendly SNR Sampler for Style-Driven Generation |
| 15 | ⏳ | 2024 | 2024-11-05 | - | OminiControl: Minimal and Universal Control for Diffusion Transformer |
| 16 | ⏳ | 2024 | 2024-11-05 | - | Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator |
| 17 | ⏳ | 2024 | 2024-11-05 | - | DREAMRUNNER: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation |
| 18 | ⏳ | 2024 | 2024-11-05 | - | One Diffusion to Generate Them All |
| 19 | ⏳ | 2024 | 2024-11-05 | - | DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting |
| 20 | ⏳ | 2024 | 2024-11-05 | - | UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing |
| 21 | ⏳ | 2024 | 2024-11-05 | - | Diffusion Self-Distillation for Zero-Shot Customized Image Generation |
| 22 | ⏳ | 2024 | 2024-11-05 | - | X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models |
| 23 | ⏳ | 2024 | 2024-11-05 | - | SWITTI: Designing Scale-Wise Transformers for Text-to-Image Synthesis |
| 24 | ⏳ | 2024 | 2024-11-05 | - | AMO Sampler: Enhancing Text Rendering with Overshooting |
| 25 | ⏳ | 2024 | 2024-11-05 | - | OmniCreator: Self-Supervised Unified Generation with Universal Editing |
| 26 | ✅ | 2024 | 2024-11-05 | 2024-11-18 | LLaVA-o1: Let Vision Language Models Reason Step-by-Step |
| 27 | ✅ | 2023 | 2024-11-05 | 2024-11-20 | Emu1: Generative Pretraining in Multimodality |
| 28 | ✅ | 2023 | 2024-11-05 | 2024-11-20 | Emu2: Generative Multimodal Models are In-Context Learners |
| 29 | ✅ | 2024 | 2024-11-05 | 2024-11-20 | Emu3: Next-Token Prediction is All You Need |
| 30 | ✅ | 2024 | 2024-11-05 | 2024-12-03 | ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting |
| 31 | ✅ | 2024 | 2024-11-05 | 2024-12-04 | Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models |
| 32 | 📝 | 2024 | 2024-11-05 | - | QWEN2VL-FLUX: UNIFYING IMAGE AND TEXT GUIDANCE FOR CONTROLLABLE IMAGE GENERATION |
| 33 | ⏳ | 2024 | 2024-11-05 | - | UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics |
| 34 | ✅ | 2024 | 2024-12-12 | 2024-12-13 | DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation |
| 35 | ⏳ | 2024 | 2024-12-12 | - | FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models |
| 36 | ⏳ | 2024 | 2024-12-13 | - | Learning Flow Fields in Attention for Controllable Person Image Generation |
| 37 | ⏳ | 2024 | 2024-12-13 | - | StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements |
| 38 | ⏳ | 2024 | 2024-12-13 | - | EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM |
| 39 | ⏳ | 2024 | 2024-12-17 | - | SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding |
| 40 | ⏳ | 2024 | 2024-12-17 | - | SEED-Story: Multimodal Long Story Generation with Large Language Model |
| 41 | ⏳ | 2024 | 2024-12-30 | - | From Elements to Design: A Layered Approach for Automatic Graphic Design Composition |
| 42 | ⏳ | 2024 | 2024-12-31 | - | 1.58-bit FLUX: A New Paradigm for Efficient Image Generation |
| 43 | ⏳ | 2024 | 2025-01-04 | - | Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis |
Legend:
- ⏳ To read
- 📝 In progress
- ✅ Completed
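
To keep the statistics block from drifting out of sync with the table, here is a minimal Python sketch that recounts the status column straight from this post's Markdown source. The filename `text2image-todo.md` is a placeholder assumption, not the actual path of this post.

```python
# Minimal sketch: recount the status emojis in the paper table of this post.
# Assumption: the post is saved locally as "text2image-todo.md" (placeholder name).
from collections import Counter

STATUS = {"⏳": "To read", "📝": "In progress", "✅": "Completed"}

def count_statuses(path: str) -> Counter:
    """Count status emojis in the second column of the Markdown table."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            # A table row looks like: | 1 | <status emoji> | 2023 | 2024-11-05 | - | Title |
            cells = [c.strip() for c in line.strip().strip("|").split("|")]
            if len(cells) >= 2 and cells[1] in STATUS:
                counts[STATUS[cells[1]]] += 1
    return counts

if __name__ == "__main__":
    counts = count_statuses("text2image-todo.md")
    print(f"Total papers: {sum(counts.values())}")
    for label in STATUS.values():
        print(f"{label}: {counts[label]}")
```

Run against the current table, this should print Total papers: 43, To read: 35, In progress: 1, and Completed: 7, matching the statistics block above.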