📌 TOPINDIATOURS Eksklusif ai: Moonshot's Kimi K2 Thinking emerges as leading
Even as concern and skepticism grows over U.S. AI startup OpenAI's buildout strategy and high spending commitments, Chinese open source AI providers are escalating their competition and one has even caught up to OpenAI's flagship, paid proprietary model GPT-5 in key third-party performance benchmarks with a new, free model.
The Chinese AI startup Moonshot AI’s new Kimi K2 Thinking model, released today, has vaulted past both proprietary and open-weight competitors to claim the top position in reasoning, coding, and agentic-tool benchmarks.
Despite being fully open-source, the model now outperforms OpenAI’s GPT-5, Anthropic’s Claude Sonnet 4.5 (Thinking mode), and xAI's Grok-4 on several standard evaluations — an inflection point for the competitiveness of open AI systems.
Developers can access the model via platform.moonshot.ai and kimi.com; weights and code are hosted on Hugging Face. The open release includes APIs for chat, reasoning, and multi-tool workflows.
Users can try out Kimi K2 Thinking directly through its own ChatGPT-like website competitor and on a Hugging Face space as well.
Modified Standard Open Source License
Moonshot AI has formally released Kimi K2 Thinking under a Modified MIT License on Hugging Face.
The license grants full commercial and derivative rights — meaning individual researchers and developers working on behalf of enterprise clients can access it freely and use it in commercial applications — but adds one restriction:
"If the software or any derivative product serves over 100 million monthly active users or generates over $20 million USD per month in revenue, the deployer must prominently display 'Kimi K2' on the product’s user interface."
For most research and enterprise applications, this clause functions as a light-touch attribution requirement while preserving the freedoms of standard MIT licensing.
It makes K2 Thinking one of the most permissively licensed frontier-class models currently available.
A New Benchmark Leader
Kimi K2 Thinking is a Mixture-of-Experts (MoE) model built around one trillion parameters, of which 32 billion activate per inference.
It combines long-horizon reasoning with structured tool use, executing up to 200–300 sequential tool calls without human intervention.
According to Moonshot’s published test results, K2 Thinking achieved:
-
44.9 % on Humanity’s Last Exam (HLE), a state-of-the-art score;
-
60.2 % on BrowseComp, an agentic web-search and reasoning test;
-
71.3 % on SWE-Bench Verified and 83.1 % on LiveCodeBench v6, key coding evaluations;
-
56.3 % on Seal-0, a benchmark for real-world information retrieval.
Across these tasks, K2 Thinking consistently outperforms GPT-5’s corresponding scores and surpasses the previous open-weight leader MiniMax-M2—released just weeks earlier by Chinese rival MiniMax AI.
Open Model Outperforms Proprietary Systems
GPT-5 and Claude Sonnet 4.5 Thinking remain the leading proprietary “thinking” models.
Yet in the same benchmark suite, K2 Thinking’s agentic reasoning scores exceed both: for instance, on BrowseComp the open model’s 60.2 % decisively leads GPT-5’s 54.9 % and Claude 4.5’s 24.1 %.
K2 Thinking also edges GPT-5 in GPQA Diamond (85.7 % vs 84.5 %) and matches it on mathematical reasoning tasks such as AIME 2025 and HMMT 2025.
Only in certain heavy-mode configurations—where GPT-5 aggregates multiple trajectories—does the proprietary model regain parity.
That Moonshot’s fully open-weight release can meet or exceed GPT-5’s scores marks a turning point. The gap between closed frontier systems and publicly available models has effectively collapsed for high-end reasoning and coding.
Surpassing MiniMax-M2: The Previous Open-Source Benchmark
When VentureBeat profiled MiniMax-M2 just a week and a half ago, it was hailed as the “new king of open-source LLMs,” achieving top scores among open-weight systems:
-
τ²-Bench 77.2
-
BrowseComp 44.0
-
FinSearchComp-global 65.5
-
SWE-Bench Verified 69.4
Those results placed MiniMax-M2 near GPT-5-level capability in agentic tool use. Yet Kimi K2 Thinking now eclipses them by wide margins.
Its BrowseComp result of 60.2 % exceeds M2’s 44.0 %, and its SWE-Bench Verified 71.3 % edges out M2’s 69.4 %. Even on financial-reasoning tasks such as FinSearchComp-T3 (47.4 %), K2 Thinking performs comparably while maintaining superior general-purpose reasoning.
Technically, both models adopt sparse Mixture-of-Experts architectures for compute efficiency, but Moonshot’s network activates more experts and deploys advanced quantization-aware training (INT4 QAT).
This design doubles inference speed relative to standard precision without degrading accuracy—critical for long “thinking-token” sessions reaching 256 k context windows.
Agentic Reasoning and Tool Use
K2 Thinking’s defining capability lies in its explicit reasoning trace. The model outputs an auxiliary field, reasoning_content, revealing intermediate logic before each final response. This transparency preserves coherence across long multi-turn tasks and multi-step tool calls.
A reference implementation published by Moonshot demonstrates how the model autonomously conducts a “daily news report” workflow: invoking date and web-search tools, analyzing retrieved content, and composing structured output—all while maintaining internal reasoning state.
This end-to-end autonomy enables the model to plan, search, execute, and synthesize evidence across hundreds of steps, mirroring the emerging class of “agentic AI” systems that operate with minimal supervision.
Efficiency and Access
Despite its trillion-parameter scale, K2 Thinking’s runtime cost remains modest. Moonshot lists usage at:
-
$0.15 / 1 M tokens (cache hit)
-
$0.60 / 1 M tokens (cache miss)
-
$2.50 / 1 M tokens output
These rates are competitive even against MiniMax-M2’s $0.30 input / $1.20 output pricing—and an order of magnitude below GPT-5 ($1.25 input / $10 output).
Comparative Context: Open-Weight Acceleration
The rapid succession of M2 and K2 Thinking illustrates how quickly open-source research is catching frontier systems. MiniMax-M2 demonstrated that open models could approach GPT-5-class agentic capability at a fraction of the compute cost. Moonshot has now advanced that frontier further, pushing open weights beyond parity into outright leadership.
Both models rely on sparse activation for efficiency, but K2 Thinking’s higher activation count (32 B vs 10 B active parameters) yields stronger r…
Konten dipersingkat otomatis.
đź”— Sumber: venturebeat.com
📌 TOPINDIATOURS Hot ai: General Atomics unveils new military drone for air-to-grou
The California-based aerospace firm General Atomics Aeronautical Systems unveiled the Gambit 6 unmanned combat aerial vehicle (UCAV) at the International Fighter Conference in Rome.Â
The new combat drone expands the company’s modular Gambit series with a version built specifically for ground strike missions.
A new chapter for the Gambit series
The Gambit 6 adds precision strike and electronic warfare capabilities to a family already known for air-to-air and reconnaissance operations. It is designed to handle suppression of enemy air defenses, deep strikes, and naval attacks in contested environments.
The aircraft unveiled on November 4 features an internal weapons bay that reduces radar visibility and supports precision-guided weapons such as the GBU-53/B StormBreaker.
The defense technology firm confirmed plans to collaborate with European industries for assembly and mission integration. “International deliveries will begin in 2027, with European missionized versions scheduled for 2029,” the company said.
Modular design for efficiency and flexibility
Like its predecessors, the Gambit 6 is built around a shared “Gambit Core,” a modular design that includes landing gear, avionics, and structural elements used across all models.
Earlier versions in the family include the Gambit 1 for long-range surveillance, Gambit 2 for air combat, Gambit 3 for training, Gambit 4 for stealth reconnaissance, and Gambit 5 for carrier missions. The Gambit 6 continues this evolution, focusing on strike and electronic attack.
Another related system, the YFQ-42A based on the Gambit 2, is intended to serve as an AI-powered wingman for the U.S. Air Force’s F-35 and Next Generation Air Dominance (NGAD) fighters. It began flight testing in August 2025, showing how uncrewed aircraft can operate alongside piloted jets in future air combat.
Smart systems and combat roles
The Gambit 6’s open avionics architecture supports autonomy software, electronic warfare systems, and precision sensors. Its digital design allows software and mission upgrades without major hardware changes. This enables it to adapt to new threats and operate in formation with both crewed and uncrewed aircraft.
The new combat drone is optimized for suppression of enemy air defenses and deep precision strikes. It can coordinate with human pilots, carrying out reconnaissance and jamming missions to protect fighter formations. General Atomics said the system will “reduce risk to human pilots by conducting independent or coordinated attacks.”
The Gambit 6 will also support distributed autonomy, allowing multiple drones to work together in complex strike packages. Each aircraft can share data in real time, helping crews make faster and safer tactical decisions in battle.
Production and market outlook
The U.S. defense contractor plans to build the Gambit 6 using its existing production lines in Poway, California, where it manufactures over 100 drones a year. It has delivered more than 1,200 aircraft worldwide, logging over nine million flight hours since 1992.
Localized assembly centers in Europe, including in Germany, will produce versions tailored for regional defense needs. This structure supports faster delivery and national customization.
The Gambit 6 enters a growing market for collaborative combat aircraft. Competitors include Boeing’s MQ-28 Ghost Bat, Anduril’s YFQ-44A, Lockheed Martin’s Vectis, and Shield AI’s X-BAT. General Atomics said it unveiled the new system in Rome to align with European defense discussions and to offer it for foreign acquisition programs.
đź”— Sumber: interestingengineering.com
🤖 Catatan TOPINDIATOURS
Artikel ini adalah rangkuman otomatis dari beberapa sumber terpercaya. Kami pilih topik yang sedang tren agar kamu selalu update tanpa ketinggalan.
✅ Update berikutnya dalam 30 menit — tema random menanti!