Parallel Jacobi Decoding for Fast Autoregressive Image Generation

CVPR 2026
Westlake University
*Corresponding author: wanghuan [at] westlake [dot] edu [dot] cn
PJD Teaser

Images generated by Lumina-mGPT using vanilla autoregressive decoding (left) and our Parallel Jacobi Decoding (right). Our method reduces the required autoregressive steps by up to 6.8× while maintaining visual fidelity.

Abstract

Autoregressive (AR) models have demonstrated remarkable performance in generating high-fidelity images. However, their inherently sequential next-token prediction makes inference slow. Recent studies have introduced Jacobi-style decoding to accelerate autoregressive image generation. Extending the draft sequence initially improves efficiency, yet the acceleration quickly saturates because error propagation along the one-dimensional sequence hinders convergence. Observing that images exhibit strong local spatial correlations, we propose Parallel Jacobi Decoding (PJD), a training-free decoding approach that expands draft tokens in the two-dimensional spatial domain to enable efficient, spatially parallel refinement. PJD also adjusts the attention mask to mitigate error accumulation and improve convergence stability. Extensive experiments on diverse datasets show that PJD achieves 4.8×–6.4× acceleration across multiple autoregressive image generation models while maintaining competitive generation quality.
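For readers unfamiliar with the one-dimensional Jacobi-style decoding that the abstract takes as its starting point, the following is a minimal sketch of the greedy variant. It is an illustration under stated assumptions, not the implementation used in the paper: `toy_next_token` is a hypothetical stand-in for an autoregressive model, and the zero-initialized drafts and window size are arbitrary choices.

```python
from typing import List, Tuple


def toy_next_token(prefix: List[int]) -> int:
    """Hypothetical greedy 'model' (assumption): the next token depends only
    on the token two positions back. Needs a prefix of length >= 2."""
    return (prefix[-2] * 3 + 1) % 11


def sequential_decode(prompt: List[int], n: int) -> Tuple[List[int], int]:
    """Vanilla AR decoding: one model call per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(toy_next_token(seq))
    return seq[len(prompt):], n


def jacobi_decode(prompt: List[int], n: int, window: int = 4) -> Tuple[List[int], int]:
    """Greedy Jacobi decoding with a sliding draft window."""
    committed = list(prompt)
    draft = [0] * min(window, n)  # arbitrary initial guesses
    passes = 0
    while draft:
        # One "parallel forward pass": every draft position is predicted
        # from the committed tokens plus the drafts to its left.
        preds = [toy_next_token(committed + draft[:i]) for i in range(len(draft))]
        passes += 1
        # preds[0] is always exact (its context is fully committed).
        # preds[i] is exact only if every draft to its left was right.
        accepted = 1
        while accepted < len(draft) and draft[accepted - 1] == preds[accepted - 1]:
            accepted += 1
        committed.extend(preds[:accepted])
        remaining = n - (len(committed) - len(prompt))
        # Rejected predictions are reused as the next drafts.
        draft = (preds[accepted:] + [0] * window)[:min(window, remaining)]
    return committed[len(prompt):], passes


reference, ar_passes = sequential_decode([3, 5], 12)
tokens, jacobi_passes = jacobi_decode([3, 5], 12)
```

In this greedy form the output matches sequential decoding token for token, and the saving comes only from drafts that happen to be correct. A single wrong draft invalidates every prediction to its right, which is the error propagation the abstract refers to.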

Method Overview

PJD Overview
An illustration of one PJD iteration. (Left) Three rows become simultaneously active, each initializing three draft tokens. (Middle) All active rows are processed in one forward pass of the autoregressive transformer, followed by row-wise validation: accepted tokens are committed, while rejected ones are reused as the initial drafts for the next iteration. (Right) Each row’s sliding window advances after validation.
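The mechanics in the caption (several active rows, one shared forward pass, row-wise validation, and a sliding window per row) can be sketched with a toy example. Everything beyond the caption is an assumption made for illustration: `toy_model` is a hypothetical stand-in for the transformer in which a token depends only on its left and upper neighbours, drafts start at 0, and the acceptance rule is a simple exact-match check rather than the validation criterion used in the paper.

```python
from typing import Dict, List, Optional, Tuple

VOCAB = 7  # toy vocabulary size (assumption)


def toy_model(left: int, up: int) -> int:
    """Hypothetical stand-in for the AR transformer (assumption): a token
    depends only on its left and upper neighbours (0 outside the grid)."""
    return (left + 2 * up + 1) % VOCAB


def raster_decode(h: int, w: int) -> Tuple[List[List[int]], int]:
    """Vanilla raster-order decoding: one pass per token."""
    grid = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            left = grid[r][c - 1] if c > 0 else 0
            up = grid[r - 1][c] if r > 0 else 0
            grid[r][c] = toy_model(left, up)
    return grid, h * w


def row_parallel_decode(h: int, w: int, max_rows: int = 3, window: int = 3):
    grid: List[List[Optional[int]]] = [[None] * w for _ in range(h)]
    pos = [0] * h                           # committed tokens per row
    drafts: Dict[Tuple[int, int], int] = {}  # draft tokens by (row, col)
    passes = 0

    def read(r: int, c: int) -> int:
        """Value the model sees at (r, c): committed token, else draft, else 0."""
        if r < 0 or c < 0:
            return 0
        if grid[r][c] is not None:
            return grid[r][c]
        return drafts.get((r, c), 0)

    while any(p < w for p in pos):
        # (Left) the first incomplete rows are active; new window slots get a draft.
        active = [r for r in range(h) if pos[r] < w][:max_rows]
        slots = [(r, c) for r in active for c in range(pos[r], min(pos[r] + window, w))]
        for s in slots:
            drafts.setdefault(s, 0)

        # (Middle) one "forward pass" over all active rows at once.
        preds = {(r, c): toy_model(read(r, c - 1), read(r - 1, c)) for r, c in slots}
        passes += 1

        # Row-wise validation: a prediction is accepted only if the context
        # it was computed from is already known to be correct.
        accepted = set()

        def context_ok(r: int, c: int) -> bool:
            if r < 0 or c < 0 or grid[r][c] is not None:
                return True
            return (r, c) in accepted and drafts[(r, c)] == preds[(r, c)]

        for r in active:  # top to bottom, so upper rows are validated first
            for c in range(pos[r], min(pos[r] + window, w)):
                if not (context_ok(r, c - 1) and context_ok(r - 1, c)):
                    break
                accepted.add((r, c))

        # (Right) commit accepted tokens, advancing each row's window;
        # rejected predictions become the drafts for the next iteration.
        for r, c in slots:
            if (r, c) in accepted:
                grid[r][c] = preds[(r, c)]
                del drafts[(r, c)]
                pos[r] += 1
            else:
                drafts[(r, c)] = preds[(r, c)]
    return grid, passes


reference_grid, raster_passes = raster_decode(4, 6)
pjd_grid, pjd_passes = row_parallel_decode(4, 6)
```

Under this toy dependency structure the row-parallel loop reproduces the raster-order result while committing tokens from several rows per pass. A real transformer conditions on all preceding tokens, so the acceptance rule and the attention mask in the actual method necessarily differ from this sketch.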

Main Results

Qualitative Results

Qualitative-Lumina
Qualitative comparisons of 768 × 768 image generation with Lumina-mGPT across different methods. Our approach accelerates image generation by significantly reducing the number of steps, while maintaining the same level of image quality.
Qualitative-Llama
Qualitative comparison of 512 × 512 image generation results on LlamaGen-XL using four decoding strategies: Vanilla AR, SJD, GSD, and our PJD method. Across all prompts, our approach achieves the fastest generation with the fewest sampling steps.
Qualitative-Janus
Qualitative comparisons of 384 × 384 image generation on Janus-Pro across multiple prompts. For each pair, the left image is generated by Vanilla AR and the right image is generated by our method. Our approach significantly reduces the number of sampling steps while preserving comparable image quality.

BibTeX


@inproceedings{liao2026parallel,
  title={Parallel Jacobi Decoding for Fast Autoregressive Image Generation},
  author={Liao, Boya and Li, Ying and Jian, Siyong and Wang, Huan},
  booktitle={CVPR},
  year={2026}
}