TripoSplat converts a single 2D image into high-quality 3D Gaussians. Because 3D Gaussian rendering is already supported by many mainstream rendering and game engines, TripoSplat serves as a powerful pipeline tool for asset creation, AR/VR, game development, simulation environments, and beyond.
Where Current 3D Gaussian Generators Fall Short
The 3D Gaussian Splatting (3DGS) technique emerged from the field of novel view synthesis. In a traditional setup, users photograph an object or scene from hundreds of viewpoints and run a 3D reconstruction algorithm to fit a representation that can be rendered from any new angle. 3DGS quickly came to dominate this space for two reasons: it represents intricate visual details far more flexibly than traditional polygon meshes, and it is much faster to fit and render than neural representations such as Neural Radiance Fields (NeRFs). However, these properties do not come from the Gaussian representation alone; they depend heavily on a sophisticated optimization (fitting) algorithm. A critical part of this process is deciding how many Gaussian particles to place, and where, to achieve the best rendering quality. This process is known as density control.
In the original Gaussian Splatting paper, density control relies on two kinds of operations: densification (cloning or splitting Gaussians in poorly reconstructed regions) and pruning (removing nearly transparent Gaussians). As a result, the number of Gaussians in the optimized asset adapts to the complexity of the scene: complex areas end up densely populated with particles, while flat, simple areas use far fewer.
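For readers new to density control, here is a minimal NumPy sketch of one densify-and-prune pass in the style of the original 3DGS heuristics. It is a simplification, not the reference implementation: Gaussians are treated as isotropic, the cloned copy is not shifted along the gradient, and the function name `density_control_step` and all threshold values are illustrative assumptions.

```python
import numpy as np


def density_control_step(means, scales, opacities, grad_norms, rng,
                         grad_thresh=2e-4, scale_thresh=0.01,
                         min_opacity=0.005, split_factor=1.6):
    """One simplified densify-and-prune pass (illustrative thresholds).

    means: (N, 3) centers; scales: (N,) isotropic scales (a simplification of
    the anisotropic covariances used in practice); opacities: (N,);
    grad_norms: (N,) accumulated view-space positional gradient magnitudes.
    """
    high_grad = grad_norms > grad_thresh      # regions the optimizer struggles with
    small = scales <= scale_thresh
    clone_mask = high_grad & small            # under-reconstruction: duplicate small Gaussians
    split_mask = high_grad & ~small           # over-reconstruction: break up large Gaussians

    # Clone: copy as-is (the original also nudges the copy along the gradient; omitted here).
    c_means, c_scales, c_opac = means[clone_mask], scales[clone_mask], opacities[clone_mask]

    # Split: two children per parent, centers drawn around the parent, scales shrunk.
    p_means = np.repeat(means[split_mask], 2, axis=0)
    p_scales = np.repeat(scales[split_mask], 2)
    s_means = p_means + rng.normal(size=p_means.shape) * p_scales[:, None]
    s_scales = p_scales / split_factor
    s_opac = np.repeat(opacities[split_mask], 2)

    # Split parents are replaced by their children; everything else is kept.
    keep = ~split_mask
    means = np.concatenate([means[keep], c_means, s_means])
    scales = np.concatenate([scales[keep], c_scales, s_scales])
    opacities = np.concatenate([opacities[keep], c_opac, s_opac])

    # Prune: drop nearly transparent Gaussians.
    alive = opacities >= min_opacity
    return means[alive], scales[alive], opacities[alive]


# Usage: Gaussian 0 is cloned, 1 is pruned, 2 is split into two, 3 is kept.
rng = np.random.default_rng(0)
m, s, o = density_control_step(
    means=np.zeros((4, 3)),
    scales=np.array([0.005, 0.005, 0.05, 0.05]),
    opacities=np.array([0.9, 0.001, 0.9, 0.9]),
    grad_norms=np.array([1e-3, 0.0, 1e-3, 0.0]),
    rng=rng,
)
print(len(m))  # -> 5
```

The output size depends on the data, which is exactly the property that is hard to reproduce inside a fixed-shape generative model.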
Things change dramatically when moving from optimization to generative modeling. Diffusion and flow-matching models, which have been hugely successful in image and video generation, operate on data of fixed, known length and rely on highly vectorized computation on modern Transformer architectures. Both constraints make it very difficult to use splitting and pruning, which change the number of Gaussians on the fly, during generative training.
Current 3D Gaussian generators work around this by binding Gaussians to structural elements whose number can be fixed in advance, such as image pixels or voxels, yielding pixel-aligned or structure-aligned Gaussians. While widely adopted, this approach lacks truly adaptive density control, and that severely limits the representational power of the generated 3D Gaussians.
Introducing Density-Sampled Gaussians (DeG)
TripoSplat seamlessly integrates adaptive density control into the generative modeling pipeline. The core intuition is to model an underlying 3D density distribution and optimize it directly via rendering supervision. To achieve this, we introduce a novel representation: Density-Sampled Gaussians (DeG). Instead of predicting explicit coordinates, TripoSplat samples 3D Gaussian centers from a hierarchical, octree-structured density function.
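The post does not specify how the octree density is parameterized. As a hedged illustration, the sketch below assumes a piecewise-constant density over a regular grid of leaf cells (a full octree of depth d has 2^d cells per axis) and samples centers top-down: at each level, a node's sample count is split among its eight children with a multinomial draw, and each leaf places its samples uniformly inside its cell. The names `build_pyramid`, `sample_octree`, and `leaf_mass` are hypothetical.

```python
import numpy as np

# All eight child offsets of an octree node, in a fixed order.
CHILD_OFFSETS = np.array([[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)])


def build_pyramid(leaf_mass):
    """Sum 2x2x2 blocks repeatedly; returns per-level masses from root (1,1,1) to leaves."""
    levels = [leaf_mass]
    while levels[-1].shape[0] > 1:
        m = levels[-1]
        r = m.shape[0] // 2
        levels.append(m.reshape(r, 2, r, 2, r, 2).sum(axis=(1, 3, 5)))
    return levels[::-1]


def sample_octree(leaf_mass, n, rng):
    """Draw n points in [0, 1)^3 from a piecewise-constant density.

    leaf_mass: (R, R, R) non-negative array, R a power of two, positive total mass
    (a hypothetical stand-in for the learned octree density).
    """
    levels = build_pyramid(leaf_mass)
    nodes = np.zeros((1, 3), dtype=np.int64)   # integer index of each active node
    counts = np.array([n])                     # samples routed to each active node
    for child_level in levels[1:]:
        child_idx = 2 * nodes[:, None, :] + CHILD_OFFSETS[None, :, :]   # (M, 8, 3)
        child_mass = child_level[child_idx[..., 0], child_idx[..., 1], child_idx[..., 2]]
        next_nodes, next_counts = [], []
        for idx, mass, c in zip(child_idx, child_mass, counts):
            split = rng.multinomial(c, mass / mass.sum())   # route samples to children
            nz = split > 0                                  # empty children are never expanded
            next_nodes.append(idx[nz])
            next_counts.append(split[nz])
        nodes, counts = np.concatenate(next_nodes), np.concatenate(next_counts)
    # At the leaves, place each sample uniformly inside its cell.
    cells = np.repeat(nodes, counts, axis=0)
    return (cells + rng.random(cells.shape)) / leaf_mass.shape[0]


rng = np.random.default_rng(0)
density = rng.random((16, 16, 16)) ** 4      # arbitrary toy density
centers = sample_octree(density, n=20_000, rng=rng)
print(centers.shape)  # -> (20000, 3)
```

Only branches that actually receive samples are expanded, which is one reason a hierarchical layout is attractive when the density is concentrated near surfaces.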
To make this non-differentiable sampling process trainable end to end, we draw inspiration from reinforcement learning. We frame the problem with a policy-gradient approach, in which the gradient used to update the density function is estimated from the rendering error using a difference reward: the density at a given location is adjusted according to how much the Gaussians sampled there reduce the rendering error. This naturally mirrors density control: the model learns to allocate more Gaussians to underfitted regions ("splitting") and fewer to well-fitted regions ("pruning").
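To make the policy-gradient idea concrete, here is a toy, self-contained sketch; it is not TripoSplat's training code. The assumptions are ours: the "density" is a softmax over a handful of cells, the "renderer" simply adds a fixed amount `w` of coverage per Gaussian in a cell, and the rendering error is the squared mismatch with a target coverage. Each sampled Gaussian receives a difference reward (the error without it minus the error with it), and the density logits follow the REINFORCE (score-function) update. The names `render_error` and `train_density` are hypothetical.

```python
import numpy as np


def render_error(counts, target, w):
    """Toy 'rendering error': squared mismatch between target coverage and w * (#Gaussians per cell)."""
    return float(np.sum((target - w * counts) ** 2))


def train_density(target, n_samples=16, w=0.25, lr=0.05, steps=2000, seed=0):
    """REINFORCE with per-sample difference rewards on a softmax density over cells."""
    rng = np.random.default_rng(seed)
    K = len(target)
    logits = np.zeros(K)                                   # density parameters
    for _ in range(steps):
        p = np.exp(logits - logits.max())
        p /= p.sum()                                       # softmax density over cells
        cells = rng.choice(K, size=n_samples, p=p)         # sample Gaussian locations
        counts = np.bincount(cells, minlength=K).astype(float)
        err = render_error(counts, target, w)
        grad = np.zeros(K)
        for k in np.flatnonzero(counts):
            without = counts.copy()
            without[k] -= 1.0
            # Difference reward: error without one Gaussian in cell k minus error with it.
            # Positive means that Gaussian helps (underfitted cell); negative means it hurts.
            d = render_error(without, target, w) - err
            score = -p                                     # d/dlogits log p_k = onehot(k) - p
            score[k] += 1.0
            grad += counts[k] * d * score                  # every Gaussian in cell k gets the same reward
        logits += lr * grad                                # ascent on the negative expected error
    p = np.exp(logits - logits.max())
    return p / p.sum()


# 16 Gaussians of weight 0.25 match the target exactly with 0, 4, 12, 0 per cell.
p = train_density(np.array([0.0, 1.0, 3.0, 0.0]))
print(np.round(p, 2))
```

Because the counterfactual error without Gaussian i does not depend on where Gaussian i was sampled, subtracting it acts as a per-sample baseline: the update remains an unbiased estimate of the gradient of the negative expected error while giving each Gaussian its own credit. In this toy, the density concentrates on the cell with the largest target, keeps a smaller share on the cell with the smaller target, and nearly vanishes on cells with zero target.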
Our practical pipeline consists of two stages:
- We train an autoencoder to compress these DeG-represented 3D objects into fixed-size latent vectors.
- We train a flow-matching model to generate these latents from a single input image.
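The second stage can be illustrated with a generic conditional flow-matching objective. The post does not state which formulation TripoSplat uses; the sketch below assumes the common linear (rectified-flow) path between Gaussian noise and a data latent, with `velocity_fn(x, t, cond)` standing in for the image-conditioned network and `cond` for the image features. Both are hypothetical placeholders, and the autoencoder from the first stage is not shown.

```python
import numpy as np


def flow_matching_loss(velocity_fn, latents, cond, rng):
    """Conditional flow-matching loss on a straight noise-to-data path (assumed formulation)."""
    x0 = rng.standard_normal(latents.shape)                # noise sample
    t = rng.uniform(0.0, 1.0, size=(latents.shape[0], 1))  # one time per example, in [0, 1)
    xt = (1.0 - t) * x0 + t * latents                      # point on the straight path
    target = latents - x0                                  # constant velocity of that path
    pred = velocity_fn(xt, t, cond)
    return float(np.mean((pred - target) ** 2))


def sample_latents(velocity_fn, cond, shape, rng, steps=32):
    """Generate latents by Euler-integrating dx/dt = v(x, t, cond) from t=0 (noise) to t=1."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full((shape[0], 1), i * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x


# Sanity check with an oracle: if every latent for an image were the same vector z,
# the exact velocity field would be (z - x) / (1 - t).
rng = np.random.default_rng(0)
z = rng.standard_normal((2, 8))
oracle = lambda x, t, cond: (cond - x) / (1.0 - t)
print(flow_matching_loss(oracle, z, z, rng))
print(np.allclose(sample_latents(oracle, z, z.shape, rng), z))  # -> True
```

With the oracle the loss is zero up to floating-point error and the final Euler step lands on `z`; a trained network would instead approximate the average velocity over all latents compatible with the input image.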

This learned density control allows TripoSplat to deliver significantly higher visual quality with the same number of Gaussians, or to match the quality of structure-aligned methods such as TRELLIS while using drastically fewer particles.

Unmatched Quality, Controllable Budgets
TripoSplat delivers significantly higher-quality 3D Gaussians than current state-of-the-art open-source 3D generators. To evaluate this empirically, we conducted a user study with 94 input images spanning a wide variety of artistic styles and geometric complexities. We collected 399 pairwise human preference judgments from 32 independent participants and converted them into Elo ratings (the five methods' scores average 1000). A higher Elo score indicates a stronger human preference for visual fidelity and asset accuracy. TripoSplat outperforms all baselines by a substantial margin, scoring 141 points above the next-best model, Hunyuan3D 2.1 (1137 vs. 996).
| Method | TripoSplat | TRELLIS | TRELLIS.2 | UniLat3D | Hunyuan3D 2.1 |
|---|---|---|---|---|---|
| Elo ↑ | 1137 | 975 | 992 | 900 | 996 |
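For reference, the standard Elo model maps a rating gap to an expected win rate. Assuming the conventional 400-point logistic scale (the post does not describe its exact fitting procedure, K-factor, or match ordering), the 141-point gap between TripoSplat (1137) and Hunyuan3D 2.1 (996) corresponds to TripoSplat being preferred in roughly 69% of head-to-head comparisons. The sketch below shows that formula together with a basic sequential Elo update; `k=32`, the starting rating of 1000, and the toy match list are illustrative choices, not the study's data.

```python
def expected_score(r_a, r_b):
    """Probability that A is preferred over B under the standard 400-point Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_ratings(matches, models, k=32.0, init=1000.0):
    """Sequential Elo over (winner, loser) pairs; k and init are illustrative choices."""
    ratings = {m: init for m in models}
    for winner, loser in matches:
        delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += delta   # winner gains more when the win was less expected
        ratings[loser] -= delta    # zero-sum: loser drops by the same amount
    return ratings


print(round(expected_score(1137, 996), 2))              # -> 0.69
print(elo_ratings([("A", "B")], ["A", "B"]))            # -> {'A': 1016.0, 'B': 984.0}
```

Sequential Elo depends on the order in which comparisons are processed; many evaluation leaderboards instead fit a Bradley-Terry model to all comparisons at once.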
Furthermore, DeG introduces a highly practical property: inference-time budget control. Because Gaussian locations are independently sampled from the learned density distribution, you can simply specify your desired sample count at inference time depending on your deployment scenario:
- For background elements or simple geometry: You can request a low Gaussian count to minimize storage and maximize rendering frame rates.
- For hero assets: You can scale up the count to preserve intricate, high-fidelity details.
- For multi-platform deployment: You can instantly extract multiple versions of an asset at different resolutions to serve as a Level of Detail (LoD) system.

This control over the computational budget is a major advantage for production pipelines, since raw Gaussians can still be more expensive to render than traditional polygon meshes.
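Budget control follows directly from i.i.d. sampling. The hedged sketch below assumes the learned density has been evaluated on a voxel grid (`cell_probs`, a hypothetical stand-in) and only handles Gaussian centers; the post does not say how colors, opacities, or covariances are predicted for the sampled positions. One convenient design choice, our assumption rather than a documented TripoSplat feature, is to sample once at the largest budget and take prefixes, so every lower LoD is a subset of the higher ones. The names `sample_centers` and `build_lods` are hypothetical.

```python
import numpy as np


def sample_centers(cell_probs, n, rng):
    """Draw n i.i.d. centers from a piecewise-constant density on a cubic voxel grid over [0, 1)^3."""
    res = cell_probs.shape[0]
    flat = cell_probs.ravel() / cell_probs.sum()
    idx = rng.choice(flat.size, size=n, p=flat)                  # pick voxels by probability
    cells = np.stack(np.unravel_index(idx, cell_probs.shape), axis=1)
    return (cells + rng.random((n, 3))) / res                    # uniform jitter inside each voxel


def build_lods(cell_probs, budgets, rng):
    """Nested LoDs: sample once at the largest budget; each smaller LoD is a prefix."""
    budgets = sorted(budgets)
    centers = sample_centers(cell_probs, budgets[-1], rng)
    return {n: centers[:n] for n in budgets}


rng = np.random.default_rng(0)
density = np.ones((8, 8, 8))   # toy uniform density
lods = build_lods(density, budgets=[100_000, 10_000, 1_000], rng=rng)
print({n: pts.shape for n, pts in lods.items()})
# -> {1000: (1000, 3), 10000: (10000, 3), 100000: (100000, 3)}
```

Since each center is drawn independently, the first n samples of a larger draw are distributed exactly like a fresh draw of size n, so nesting costs nothing in sampling quality while keeping LoD levels consistent with each other.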

Open for Everyone
The weights and inference code for TripoSplat will be open-sourced very soon. Stay tuned!



