To interact seamlessly with both humans and 3D environments, AI agents must not only perceive the 3D world accurately but also align human language with 3D spatial representations. Prior work has made significant progress by integrating language features into geometrically detailed 3D scene representations built with 3D Gaussian Splatting (3DGS), but these approaches rely on computationally intensive offline preprocessing of language features for each input image, which limits adaptability to new environments. This work introduces Online Language Splatting, the first framework to achieve online, near real-time, open-vocabulary language mapping within a 3DGS-SLAM system without requiring pre-generated language features. The key challenge lies in efficiently fusing high-dimensional language features into 3D representations while balancing computation speed, memory usage, rendering quality, and open-vocabulary capability. To this end, the design comprises three components: (1) a high-resolution CLIP embedding module that generates detailed language feature maps in 18 ms per frame, (2) a two-stage online auto-encoder that compresses 768-dimensional CLIP features to 15 dimensions while preserving open-vocabulary capabilities, and (3) a color-language disentangled optimization approach that improves rendering quality. Experimental results show that the online method not only surpasses state-of-the-art offline methods in accuracy but also runs more than 40x more efficiently, demonstrating its potential for dynamic and interactive AI applications.
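To make the figures in the abstract concrete, the following is a minimal sketch of the shape of a two-stage compression from 768 to 15 dimensions, together with the arithmetic implied by the stated numbers. It is not the framework's implementation: the intermediate width `HIDDEN_DIM`, the use of plain linear maps, the random untrained weights, and all function names are assumptions made only for illustration.

```python
import random

CLIP_DIM = 768    # CLIP feature size stated in the abstract
CODE_DIM = 15     # compressed size stated in the abstract
HIDDEN_DIM = 64   # ASSUMPTION: the intermediate width is not given in the abstract


def make_linear(in_dim, out_dim, rng):
    """Return a random, untrained weight matrix (out_dim rows, in_dim columns)."""
    scale = in_dim ** -0.5
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(out_dim)]


def apply_linear(weights, vec):
    """Multiply a weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def encode_two_stage(feature, stage1, stage2):
    """Hypothetical encoder: 768 -> HIDDEN_DIM -> 15, one linear map per stage."""
    hidden = apply_linear(stage1, feature)
    return apply_linear(stage2, hidden)


rng = random.Random(0)
stage1 = make_linear(CLIP_DIM, HIDDEN_DIM, rng)
stage2 = make_linear(HIDDEN_DIM, CODE_DIM, rng)

# Stand-in for one pixel's CLIP feature; a real feature would come from CLIP.
feature = [rng.gauss(0.0, 1.0) for _ in range(CLIP_DIM)]
code = encode_two_stage(feature, stage1, stage2)

# Arithmetic implied by the abstract's numbers.
compression_ratio = CLIP_DIM / CODE_DIM   # values stored per feature shrink 51.2x
embedding_fps = 1000.0 / 18.0             # embedding module alone, frames per second

print(len(code), compression_ratio, round(embedding_fps, 1))
```

Storing 15 values instead of 768 per feature is what keeps per-Gaussian memory manageable, and 18 ms per frame bounds the embedding module alone at roughly 55 frames per second, before any mapping or rendering cost is counted.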
3D Gaussian Splatting, CLIP Embedding, Auto-encoder
3DGS-SLAM system with language mapping
Custom 3D scene datasets
Mapping accuracy, Processing speed
Cloud-based, On-premises
Yes
Yes
Real-time language mapping, High-dimensional feature compression
Yes
GPU for real-time processing
Windows, Linux
Compatible with various 3D scene representation formats
Data encryption and secure processing
ISO/IEC 27001
None
No
Available through forums and support channels
Research team from the study
Large-scale 3D scene datasets
18ms per frame
Optimized for real-time processing
Visual representation of language mapping
Ensuring accurate and unbiased language mapping
Limited to specific 3D environments
Robotics, Gaming, Simulation
Interactive AI applications, Real-time 3D mapping
Robotics companies, AR/VR developers
Can be integrated with existing 3D mapping systems
Highly scalable
Enterprise support available
Available upon request
Web-based interface
Yes
Supports multiple languages
Subscription-based
Yes
Collaborations with robotics research groups
Pending
Compliant with industry standards
1.0
Software as a Service (SaaS)
Yes
RESTful API available
B2B
0.00
USD
Proprietary
01/03/2025
01/03/2025
123-456-7890
Supports integration with popular 3D mapping frameworks
Yes