
SpatialGen

[Figure: Image-to-Scene results (Img2Scene) | Text-to-Scene results (Text2Scene)]

SpatialGen is a multi-view, multi-modal diffusion model that generates multi-view, multi-modal outputs from a semantic layout.

✨ News

  • [Aug, 2025] Initial release of SpatialGen-1.0!

📋 Release Plan

  • Provide inference code for SpatialGen.
  • Provide training instructions for SpatialGen.
  • Release SpatialGen dataset.

SpatialGen Models

Model | Download
SpatialGen-1.0 | 🤗 HuggingFace
FLUX.1-Layout-ControlNet | 🤗 HuggingFace

Usage

🔧 Installation

Tested with the following environment:

  • Python 3.10
  • PyTorch 2.3.1
  • CUDA Version 12.1
    # Clone the repository
    git clone https://github.com/manycore-research/SpatialGen.git
    cd SpatialGen

    # Create and activate a virtual environment
    python -m venv .venv
    source .venv/bin/activate

    pip install -r requirements.txt
    # Optional: fix the [flux inference bug](https://github.com/vllm-project/vllm/issues/4392)
    pip install nvidia-cublas-cu12==12.4.5.8
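Before running inference, it can help to confirm that the active environment resembles the tested one. The following is a minimal sketch, not part of the repository: the helper names `installed_versions` and `report` are hypothetical, and it only compares version strings against the values listed above. A mismatch does not necessarily mean the code will fail.

```python
import sys

# Versions the README lists as tested; other versions may also work.
TESTED = {"python": "3.10", "torch": "2.3.1", "cuda": "12.1"}


def installed_versions():
    """Collect the Python, PyTorch, and CUDA versions visible to this interpreter."""
    found = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        found["torch"] = None
        found["cuda"] = None
    else:
        # Drop local build suffixes such as "+cu121".
        found["torch"] = torch.__version__.split("+")[0]
        # None for CPU-only builds of PyTorch.
        found["cuda"] = torch.version.cuda
    return found


def report(found):
    """Return one human-readable line per component in TESTED."""
    lines = []
    for name, tested in TESTED.items():
        have = found.get(name)
        if have is None:
            lines.append(f"{name}: not found, tested with {tested}")
        else:
            status = "matches" if have == tested else "differs"
            lines.append(f"{name}: found {have}, tested with {tested} ({status})")
    return lines


if __name__ == "__main__":
    for line in report(installed_versions()):
        print(line)
```

Run it inside the activated `.venv` so that it inspects the same interpreter the inference scripts will use.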

📊 Dataset

We provide SpatialGen-Testset, which contains 48 rooms, each labeled with a 3D layout, and 4.8K rendered images (48 x 100 views, including RGB, normal, depth, and semantic maps) for multi-view diffusion (MVD) inference.
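The view count above can be checked with a short sketch. The 0-based `(room, view)` indexing and the modality names are illustrative assumptions. The README does not describe the on-disk layout of the test set, so this does not reflect actual file names.

```python
from itertools import product

NUM_ROOMS = 48          # rooms in SpatialGen-Testset
VIEWS_PER_ROOM = 100    # rendered views per room
# Map types listed for each view; these short names are hypothetical.
MODALITIES = ("rgb", "normal", "depth", "semantic")


def iter_views():
    """Yield (room_index, view_index) pairs using a hypothetical 0-based scheme."""
    return product(range(NUM_ROOMS), range(VIEWS_PER_ROOM))


total_views = sum(1 for _ in iter_views())
print(total_views)  # 4800, i.e. the "4.8K" figure

# Assumption: if every view stored one map per modality, the file count would be:
print(total_views * len(MODALITIES))  # 19200
```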

Inference

    # Single image-to-3D scene
    bash scripts/infer_spatialgen_i2s.sh

    # Text-to-image-to-3D scene
    bash scripts/infer_spatialgen_t2s.sh
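To launch the two scripts from Python, for example inside a larger pipeline, a thin wrapper around `subprocess` is enough. This is a sketch under assumptions: `build_command` and `run_inference` are hypothetical helpers, not part of SpatialGen, and they pass no arguments because the README does not document any script options.

```python
import subprocess
from pathlib import Path

# Script paths as given in the README, relative to the repository root.
SCRIPTS = {
    "i2s": "scripts/infer_spatialgen_i2s.sh",  # single image-to-3D scene
    "t2s": "scripts/infer_spatialgen_t2s.sh",  # text-to-image-to-3D scene
}


def build_command(mode):
    """Return the argv list for the given mode ("i2s" or "t2s")."""
    if mode not in SCRIPTS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(SCRIPTS)}")
    return ["bash", SCRIPTS[mode]]


def run_inference(mode, repo_root="."):
    """Run the chosen script from the repository root, raising if it fails."""
    root = Path(repo_root)
    command = build_command(mode)
    script = root / command[1]
    if not script.is_file():
        raise FileNotFoundError(f"{script} not found; is {root} the SpatialGen checkout?")
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(command, cwd=root, check=True)
```

For example, `run_inference("i2s", repo_root="SpatialGen")` would run the image-to-scene script from a checkout in `./SpatialGen`.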

License

SpatialGen-1.0 is derived from Stable-Diffusion-v2.1, which is licensed under the CreativeML Open RAIL++-M License. FLUX.1-Layout-ControlNet is licensed under the FLUX.1-dev Non-Commercial License.

Acknowledgements

We would like to thank the following projects that made this work possible:

DiffSplat | SD 2.1 | TAESD | FLUX | SpatialLM
