🦙PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs


Introduction

What is in PanoLlama:

  1. New Paradigm: We define a new paradigm for panoramic image generation (PIG), modeling it as a next-token-prediction task to better address the multilevel coherence challenge.
  2. New Strategy: Building on token redirection, we develop a training-free next-crop prediction strategy that enables endless PIG with existing VAR models (see the sketch after this list). Compared to current methods with complex designs, PanoLlama offers a more straightforward and efficient framework, achieving SOTA performance in coherence (47.50%), fidelity & diversity (28.16%), and aesthetics (15%).
  3. Additional Applications: Beyond basic panorama generation, we enable applications that other PIG methods cannot achieve, including multi-scale generation, mask-free layout control, and multi-guidance synthesis.
  4. New Benchmark: Since prior PIG works lack a standardized prompt set and typically rely on 5-20 specific prompts, we construct a dataset of 1,000 detailed prompts across 100+ themes. Together with a comprehensive set of baselines and metrics, this establishes a new benchmark for panorama generation.
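
A minimal sketch of the next-crop idea, under heavy assumptions: `model.generate` and `model.generate_masked` are hypothetical helpers, not PanoLlama's actual API, and the raster-order details of the real token redirection differ. The point is only that the overlap columns of the previous crop are redirected into the leading token positions of the next crop, which the model then completes by ordinary next-token prediction.

import torch

@torch.no_grad()
def expand_horizontally(model, cond, n_crops=4, rows=32, cols=32, overlap=24):
    # First crop: plain next-token prediction over a rows x cols token grid.
    crops = [model.generate(cond, num_tokens=rows * cols).view(rows, cols)]
    for _ in range(n_crops - 1):
        grid = torch.full((rows, cols), -1, dtype=torch.long)
        # Token redirection: the last `overlap` columns of the previous crop
        # become the first `overlap` columns of the next crop.
        grid[:, :overlap] = crops[-1][:, -overlap:]
        # Hypothetical helper: walks the grid in raster order, teacher-forcing
        # known tokens (!= -1) and sampling only the missing positions.
        crops.append(model.generate_masked(cond, grid))
    # Stitch along the width, dropping the duplicated overlap columns;
    # the resulting token map is decoded by the VQVAE to obtain pixels.
    return torch.cat([crops[0]] + [c[:, overlap:] for c in crops[1:]], dim=1)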

For more details, please visit our paper page.

Get Started

Configuration   Set up the environment by installing the required packages:

pip install -r requirements.txt

Pre-trained Models   Download the pre-trained models $\Phi$ from LlamaGen and place them in the /models folder under the corresponding modules:

| module | model | params | tokens | weight |
| --- | --- | --- | --- | --- |
| text encoder | FLAN-T5-XL | 3B | / | flan-t5-xl |
| image tokenizer | VQVAE | 72M | 16x16 | vq_ds16_t2i.pt |
| token generator | LlamaGen-XL | 775M | 32x32 | t2i_XL_stage2_512.pt |
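
The README does not spell out the exact directory layout; one plausible arrangement, with the sub-paths being our assumption rather than the repository's documented structure, is:

models/
├── flan-t5-xl/             # text encoder (FLAN-T5-XL)
├── vq_ds16_t2i.pt          # image tokenizer (VQVAE)
└── t2i_XL_stage2_512.pt    # token generator (LlamaGen-XL)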

Generation   We support panorama expansion in the horizontal direction, the vertical direction, or both. Try the following command to generate a horizontal panorama:

python -m token_generator.sample \
    --seed -1 \
    --times 12 \
    --addit-cols 24 \
    --lam 1 \
    --gen-mode h \
    --n 1
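
Vertical and bidirectional expansion presumably use the same entry point with a different --gen-mode value; the accepted values are defined in token_generator/sample.py, and the v below is our guess, not confirmed by this README:

python -m token_generator.sample \
    --seed -1 \
    --times 12 \
    --addit-cols 24 \
    --lam 1 \
    --gen-mode v \
    --n 1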

Citation

If you find our work helpful, please consider citing:

@article{zhou2024panollama,
  title={PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs},
  author={Zhou, Teng and Zhang, Xiaoyu and Tang, Yongchuan},
  journal={arXiv preprint arXiv:2411.15867},
  year={2024}
}
