Research from VAST AI

May 8, 2026

Generative 3D Gaussians with Learned Density Control

Runjie Yan, Yan-Pei Cao, Peng Wang, Ding Liang, Yuan-Chen Guo

SIGGRAPH 2026

TripoSplat introduces Density-Sampled Gaussians (DeG) for fully adaptive, grid-free 3D generation. We achieve differentiable densification by parameterizing primitive centers as samples from a learnable spatial density, optimized directly via a novel render-loss gradient. To model these unstructured sets, our VecSeq diffusion framework resolves permutation ambiguity by anchoring latents to a deterministic 3D Sobol sequence. TripoSplat achieves state-of-the-art single-image-to-3D generation while uniquely enabling variable-resolution decoding from a single compact latent code.
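To illustrate why anchoring to a deterministic sequence gives an unordered set a canonical order, here is a minimal sketch. It is not the paper's code: it swaps in a Halton sequence (bases 2, 3, 5) for the 3D Sobol sequence because Halton fits in a few lines of standard-library Python, and the names `radical_inverse` and `anchors` are hypothetical. It shows only that slot `i` always maps to the same 3D point, and that the first `n` anchors do not change when more are requested.

```python
from typing import List, Tuple


def radical_inverse(index: int, base: int) -> float:
    """Mirror the base-`base` digits of `index` around the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def anchors(count: int) -> List[Tuple[float, ...]]:
    """Return `count` deterministic points in the unit cube.

    Assumption: Halton (bases 2, 3, 5) stands in for the Sobol sequence
    named in the abstract; both are deterministic low-discrepancy sequences.
    """
    return [
        tuple(radical_inverse(i, base) for base in (2, 3, 5))
        for i in range(1, count + 1)
    ]


if __name__ == "__main__":
    coarse = anchors(4)
    fine = anchors(8)
    # Slot i is tied to the same anchor regardless of how many are generated.
    print(coarse == fine[:4])
    for point in coarse:
        print(tuple(round(c, 4) for c in point))
```

Because each slot index has a fixed spatial anchor, a model that attaches one latent per slot no longer has to choose among the many equivalent orderings of the same point set.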

April 9, 2026

AniGen: Unified S^3 Fields for Animatable 3D Asset Generation

Yi-Hua Huang, Zi-Xin Zou, Yuting He, Chirui Chang, Cheng-Feng Pu, Ziyi Yang, Yuan-Chen Guo, Yan-Pei Cao, Xiaojuan Qi

SIGGRAPH 2026

AniGen overcomes the brittleness of sequential "generate-then-rig" pipelines by directly generating fully animatable 3D assets from a single image. We achieve this by unifying shape, skeleton, and skinning weights into a shared continuous spatial representation: $S^3$ Fields. To model rigs of arbitrary complexity, we introduce a joint-count agnostic Dual Skin Field, paired with a confidence-decaying skeleton field that explicitly resolves topological ambiguities at kinematic boundaries. By generating these compressed fields via a two-stage structured latent flow-matching architecture, AniGen guarantees intrinsic structural consistency between geometry and articulation. The result is a robust, end-to-end foundation for generating production-ready, instantly animatable characters and objects across diverse categories.

March 2, 2026

FACE: A Face-based Autoregressive Representation for High-Fidelity and Efficient Mesh Generation

Hanxiao Wang, Yuan-Chen Guo, Ying-Tian Liu, Zi-Xin Zou, Biao Zhang, Weize Quan, Ding Liang, Yan-Pei Cao, Dong-Ming Yan

CVPR 2026 (Highlight)

Autoregressive mesh generation is traditionally bottlenecked by the quadratic compute cost of modeling flattened vertex coordinate sequences. FACE addresses this by generating at the level of faces rather than coordinates. Through a novel “one-face-one-token” strategy, we encapsulate each triangle face in a single token. A flattened triangle occupies nine coordinate tokens (three vertices with three coordinates each), so sequence length drops by a factor of nine. The resulting compression ratio of 0.11 (roughly 1/9) doubles the efficiency of the prior state of the art without relying on brittle, lossy traversal heuristics. By coupling this Autoregressive Autoencoder (ARAE) with latent diffusion, FACE provides a robust, scalable, and compute-efficient foundation for high-fidelity direct mesh generation.
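The arithmetic behind the factor of nine and the 0.11 ratio can be checked with a short sketch. It assumes one token per vertex coordinate in the flattened baseline, with no vertex sharing between faces; the function names and the 4,000-face mesh size are hypothetical and chosen only for illustration.

```python
def flattened_length(num_faces: int, verts_per_face: int = 3,
                     coords_per_vert: int = 3) -> int:
    """Tokens needed when every vertex coordinate is its own token."""
    return num_faces * verts_per_face * coords_per_vert


def face_token_length(num_faces: int) -> int:
    """Tokens needed under a one-face-one-token scheme."""
    return num_faces


num_faces = 4000  # hypothetical mesh size
flat = flattened_length(num_faces)
faces = face_token_length(num_faces)

compression_ratio = faces / flat          # 1/9, about 0.11
pairwise_saving = (flat / faces) ** 2     # quadratic attention term shrinks 81x

print(f"flattened tokens: {flat}")
print(f"face tokens:      {faces}")
print(f"compression:      {compression_ratio:.2f}")
print(f"pairwise saving:  {pairwise_saving:.0f}x")
```

Since self-attention cost grows with the square of sequence length, a ninefold shorter sequence reduces the pairwise term by a factor of 81 under these assumptions.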

February 4, 2026

SkinTokens: A Learned Compact Representation for Unified Autoregressive Rigging

Jia-Peng Zhang, Cheng-Feng Pu, Meng-Hao Guo, Yan-Pei Cao, Shi-Min Hu

Current auto-rigging methods fail to scale because they treat skinning as a brittle, high-dimensional regression task decoupled from skeleton generation. SkinTokens solves this by compressing sparse skinning matrices into discrete token sequences. This enables TokenRig, an autoregressive framework that jointly synthesizes skeletal topology and surface deformations as a single coherent sequence. Post-trained via reinforcement learning (GRPO) with explicit geometric rewards, TokenRig generalizes robustly to complex, in-the-wild assets. By seamlessly unifying skeleton and skin prediction, the framework achieves up to a 133% improvement in skinning accuracy, establishing a scalable, end-to-end foundation for animation-ready 3D content.

December 14, 2025

LegoACE: Autoregressive Construction Engine for Expressive LEGO® Assemblies

Hao Xu, Yuqing Zhang, Yiqian Wu, Xinyang Zheng, Yutao Liu, Xiangjun Tang, Yunhan Yang, Ding Liang, Yingtian Liu, Yuanchen Guo, Yanpei Cao, Xiaogang Jin

SIGGRAPH Asia 2025

Built on native brick tokenization and backed by our 55,000-model LegoVerse dataset, LegoACE is an autoregressive engine that generates expressive LEGO assemblies from text or normal maps.