Architecture: Mixture of Experts (MoE)
Reportedly a sparse MoE design for efficient scaling and inference.
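For intuition, here is a minimal sketch of top-k expert routing, the core mechanism of a sparse MoE layer: a learned gate sends each token to only k of the experts, so most weights sit idle per token. All shapes, names, and modules below are invented for illustration and say nothing about this model's actual design.

```python
import torch
import torch.nn.functional as F

def moe_forward(x, gate, experts, top_k=2):
    """Sparse MoE layer sketch: route each token to its top-k experts.

    x: (tokens, d_model); gate: nn.Linear(d_model, n_experts);
    experts: list of per-expert feed-forward modules.
    """
    logits = gate(x)                            # (tokens, n_experts)
    weights, idx = logits.topk(top_k, dim=-1)   # pick k experts per token
    weights = F.softmax(weights, dim=-1)        # renormalize over the chosen k
    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        rows, slots = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
        if rows.numel():
            out[rows] += weights[rows, slots].unsqueeze(-1) * expert(x[rows])
    return out

# Tiny usage example with made-up dimensions.
d, n_exp = 16, 8
gate = torch.nn.Linear(d, n_exp)
experts = [
    torch.nn.Sequential(torch.nn.Linear(d, 4 * d), torch.nn.ReLU(), torch.nn.Linear(4 * d, d))
    for _ in range(n_exp)
]
y = moe_forward(torch.randn(32, d), gate, experts)  # (32, 16)
```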
Parameters: ~1T total / ~100B active
Rumored 1 trillion total parameters, with roughly 100B active per token during inference.
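The total/active gap falls out of the routing arithmetic: if only the shared blocks plus k of E experts run per token, active parameters are roughly shared + (k/E) x expert parameters. A back-of-envelope check in which every number (expert count, k, layer budgets) is hypothetical, chosen only to land near the rumored figures:

```python
# Hypothetical MoE parameter budget; all values are illustrative, not leaked specs.
n_experts, top_k = 128, 8      # experts per MoE layer, experts used per token
expert_params = 900e9          # parameters across all experts combined
shared_params = 40e9           # attention, embeddings, router (always active)

total = shared_params + expert_params
active = shared_params + expert_params * top_k / n_experts

print(f"total  = {total / 1e12:.2f}T parameters")   # 0.94T
print(f"active = {active / 1e9:.0f}B per token")    # 96B
```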
Context Length: 1M tokens
Potentially supports context windows of up to 1 million tokens.
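A 1M-token window is largely a memory problem, since the KV cache grows linearly with sequence length. A rough estimate under assumed dimensions (layer count, KV-head count, and head size are invented for illustration, and grouped-query attention is assumed):

```python
# KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes_per_value.
# All model dimensions below are assumptions, not known specs.
layers, kv_heads, head_dim = 60, 8, 128    # hypothetical GQA configuration
seq_len, bytes_per_value = 1_000_000, 2    # 1M tokens, fp16/bf16

kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value
print(f"KV cache = {kv_bytes / 2**30:.0f} GiB per sequence")  # ~229 GiB
```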
Modalities: Text + Vision + Audio
Reported native multimodal support for text, image, and audio inputs.
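At the API level, native multimodality usually means mixed-type content parts in one message. The sketch below shows that common pattern only; the schema, field names, and model id are hypothetical, since no official API has been published.

```python
import json

# Illustrative request shape for a multimodal chat API (hypothetical schema).
request = {
    "model": "hypothetical-multimodal-model",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize the chart and the spoken question."},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "audio_url", "audio_url": {"url": "https://example.com/question.wav"}},
        ],
    }],
}
print(json.dumps(request, indent=2))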
Reasoning: Advanced chain-of-thought
Said to offer enhanced multi-step reasoning with an extended thinking mode.
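Extended thinking is typically exposed either as a built-in reasoning trace or as a prompting convention. The exact interface here is unknown; this is just a minimal sketch of the classic chain-of-thought pattern of eliciting intermediate steps before the answer, with an invented helper name.

```python
def cot_prompt(question: str) -> str:
    # Chain-of-thought scaffold: ask for intermediate steps, then a
    # clearly delimited final answer for easy parsing.
    return (
        f"Question: {question}\n"
        "Think through the problem step by step, showing each intermediate result.\n"
        "Then give the final answer on its own line, prefixed with 'Answer:'."
    )

print(cot_prompt("A train travels 120 km in 1.5 h. What is its average speed?"))
```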
Language Support: 50+ languages
Reported strong performance in Chinese, English, Japanese, and Korean, plus 46+ other languages.
Training Data: 15T+ tokens
Reportedly trained on more than 15 trillion tokens from diverse, high-quality sources.
Code Capability: Top-tier code generation
Claimed best-in-class results on HumanEval, SWE-bench, and competitive programming tasks.
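HumanEval results are conventionally reported as pass@k. For reference, this is the unbiased estimator from the benchmark's original paper (Chen et al., 2021); only the sample counts in the demo are made up.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that at
    least one of k samples is correct, given n generations of which c passed."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# e.g. 200 samples per task, 170 passing: pass@1 equals the plain pass rate.
print(round(pass_at_k(200, 170, 1), 3))   # 0.85
print(round(pass_at_k(200, 170, 10), 4))  # close to 1.0
```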
* Rumored specs are based on community reports and unverified leaks; this page will be updated with official specs upon release.