ByteDance Introduces Astra: A Dual-Model Architecture for Mobile Robot Navigation
The spread of robots across sectors, from industrial manufacturing to daily life, is driving demand for more capable navigation systems. Yet contemporary robot navigation systems struggle in diverse, complex indoor environments, exposing the limits of traditional modular approaches. Addressing the fundamental questions of “Where am I?”, “Where am I going?”, and “How do I get there?”, ByteDance has developed Astra, a dual-model architecture designed to overcome these bottlenecks and enable general-purpose mobile robots.
Traditional navigation systems typically consist of multiple, smaller, and often rule-based modules to handle the core challenges of target localization, self-localization, and path planning. Target localization involves understanding natural language or image cues to pinpoint a destination on a map. Self-localization requires a robot to determine its precise position within a map, especially challenging in repetitive environments like warehouses where traditional methods often rely on artificial landmarks (e.g., QR codes). Path planning further divides into global planning for rough route generation and local planning for real-time obstacle avoidance and reaching intermediate waypoints.
While foundation models have shown promise in integrating smaller models to tackle broader tasks, the optimal number of models and their effective integration for comprehensive navigation remained an open question.
ByteDance’s Astra, detailed in their paper “Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning” (website: https://astra-mobility.github.io/), addresses these limitations. Following the System 1/System 2 paradigm, Astra features two primary sub-models: Astra-Global and Astra-Local. Astra-Global handles low-frequency tasks like target and self-localization, while Astra-Local manages high-frequency tasks such as local path planning and odometry estimation. This architecture promises to revolutionize how robots navigate complex indoor spaces.
Astra-Global: The Intelligent Brain for Global Localization
Astra-Global serves as the intelligent core of the Astra architecture, responsible for critical low-frequency tasks: self-localization and target localization. It functions as a Multimodal Large Language Model (MLLM), adept at processing both visual and linguistic inputs to achieve precise global positioning within a map. Its strength lies in utilizing a hybrid topological-semantic graph as contextual input, allowing the model to accurately locate positions based on query images or text prompts.
The construction of this localization system begins with offline mapping. The research team developed an offline method to build a hybrid topological-semantic graph G = (V, E, L), whose three components are described below (a minimal data-structure sketch follows the list):
- V (Nodes): Keyframes, obtained by temporal downsampling of input video and SfM-estimated 6-Degrees-of-Freedom (DoF) camera poses, act as nodes encoding camera poses and landmark references.
- E (Edges): Undirected edges establish connectivity based on relative node poses, crucial for global path planning.
- L (Landmarks): Semantic landmark information is extracted by Astra-Global from visual data at each node, enriching the map’s semantic understanding. These landmarks store semantic attributes and are connected to multiple nodes via co-visibility relationships.
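The paper describes the map at this structural level only; as a concrete illustration, a minimal Python sketch of how such a graph might be laid out is shown below. All class, field, and method names are hypothetical, not taken from an Astra codebase.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Keyframe node: a temporally downsampled frame with its SfM-estimated pose."""
    node_id: int
    pose: tuple                                          # 6-DoF camera pose
    landmark_ids: list = field(default_factory=list)     # landmark references

@dataclass
class Landmark:
    """Semantic landmark extracted by Astra-Global from a node's visual data."""
    landmark_id: int
    attributes: dict                                     # e.g. {"type": "sofa"}
    covisible_nodes: list = field(default_factory=list)  # co-visibility links

@dataclass
class HybridMap:
    """Hybrid topological-semantic graph G = (V, E, L)."""
    nodes: dict = field(default_factory=dict)       # V: node_id -> Node
    edges: set = field(default_factory=set)         # E: undirected node pairs
    landmarks: dict = field(default_factory=dict)   # L: landmark_id -> Landmark

    def connect(self, a: int, b: int) -> None:
        """Undirected edge between two nodes, used later for global planning."""
        self.edges.add(frozenset((a, b)))

    def nodes_seeing(self, landmark_id: int) -> list:
        """Landmark-to-node association via co-visibility."""
        return [self.nodes[n] for n in self.landmarks[landmark_id].covisible_nodes]
```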
In practical localization, Astra-Global’s self-localization and target localization capabilities leverage a coarse-to-fine two-stage process for visual-language localization. The coarse stage analyzes input images and localization prompts, detects landmarks, establishes correspondence with a pre-built landmark map, and filters candidates based on visual consistency. The fine stage then uses the query image and coarse output to sample reference map nodes from the offline map, comparing their visual and positional information to directly output the predicted pose.
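As a rough illustration of that control flow, the two stages could be wired together as in the sketch below. The `mllm.*` calls and the `sample_nodes_near` helper are hypothetical stand-ins for Astra-Global inference and offline-map queries, not a published API.

```python
def localize(query_image, prompt, hybrid_map, mllm):
    """Coarse-to-fine visual-language localization (schematic)."""
    # Coarse stage: detect landmarks in the query, match them against the
    # pre-built landmark map, and keep only visually consistent candidates.
    detected = mllm.detect_landmarks(query_image, prompt)
    candidates = mllm.match_to_map(detected, hybrid_map.landmarks)
    candidates = [c for c in candidates
                  if mllm.visually_consistent(query_image, c)]

    # Fine stage: sample reference nodes from the offline map around the
    # coarse candidates, then compare visual and positional information to
    # output the predicted pose directly.
    reference_nodes = hybrid_map.sample_nodes_near(candidates)
    return mllm.predict_pose(query_image, reference_nodes)   # 6-DoF pose
```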
For language-based target localization, the model interprets natural language instructions, identifies relevant landmarks using their functional descriptions within the map, and then leverages landmark-to-node association mechanisms to locate relevant nodes, retrieving target images and 6-DoF poses.
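Building on the graph sketch above, language-only target lookup reduces to a description match followed by a co-visibility query; `match_landmark_by_description` is again an assumed helper, not a documented call.

```python
def locate_target(instruction, hybrid_map, mllm):
    """Language-based target localization (schematic), e.g.
    instruction = "take me to the coffee machine in the lounge"."""
    # Match the instruction against landmarks' stored functional descriptions.
    landmark = mllm.match_landmark_by_description(instruction,
                                                  hybrid_map.landmarks)
    # Landmark-to-node association: retrieve nodes that co-observe the
    # landmark; each node carries a target image and a 6-DoF pose.
    return hybrid_map.nodes_seeing(landmark.landmark_id)
```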
To empower Astra-Global with robust localization abilities, the team employed a meticulous training methodology. Using Qwen2.5-VL as the backbone, they combined Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO). SFT involved diverse datasets for various tasks, including coarse and fine localization, co-visibility detection, and motion trend estimation. In the GRPO phase, a rule-based reward function (including format, landmark extraction, map matching, and extra landmark rewards) was used to train for visual-language localization. Experiments showed GRPO significantly improved Astra-Global’s zero-shot generalization, achieving 99.9% localization accuracy in unseen home environments, surpassing SFT-only methods.
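The four reward components are named in the paper, but their exact predicates and weights are not given in this summary; the following self-contained sketch uses equal, purely illustrative weights over a dictionary-style model response.

```python
def rule_based_reward(response: dict, truth: dict) -> float:
    """Composite GRPO reward (schematic): format, landmark extraction,
    map matching, and extra-landmark terms, equally weighted by assumption."""
    pred = set(response.get("landmarks", []))
    gt = set(truth["landmarks"])
    reward = 0.0
    if {"landmarks", "node_id"} <= response.keys():      # format reward
        reward += 0.25
    if pred & gt:                                        # landmark-extraction reward
        reward += 0.25
    if response.get("node_id") == truth["node_id"]:      # map-matching reward
        reward += 0.25
    if gt <= pred and pred - gt:                         # extra-landmark reward
        reward += 0.25
    return reward

# Example: full marks for matching the map node and adding a valid extra
# landmark beyond the reference set.
print(rule_based_reward(
    {"landmarks": ["sofa", "tv", "plant"], "node_id": 42},
    {"landmarks": ["sofa", "tv"], "node_id": 42}))       # -> 1.0
```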
Astra-Local: The Intelligent Assistant for Local Planning
Astra-Local acts as the intelligent assistant for Astra’s high-frequency tasks, a multi-task network capable of efficiently generating local paths and accurately estimating odometry from sensor data. Its architecture comprises three core components: a 4D spatio-temporal encoder, a planning head, and an odometry head.
The 4D spatio-temporal encoder replaces traditional mobile stack perception and prediction modules. It begins with a 3D spatial encoder that processes N omnidirectional images through a Vision Transformer (ViT) and Lift-Splat-Shoot to convert 2D image features into 3D voxel features. This 3D encoder is trained using self-supervised learning via 3D volumetric differentiable neural rendering. The 4D spatio-temporal encoder then builds upon the 3D encoder, taking past voxel features and future timestamps as input to predict future voxel features through ResNet and DiT modules, providing current and future environmental representations for planning and odometry.
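Structurally, that pipeline can be pictured as the PyTorch-style skeleton below, with the ViT, Lift-Splat-Shoot, and ResNet+DiT modules injected as black boxes; the shapes and interfaces are illustrative assumptions, not the paper's implementation.

```python
import torch.nn as nn

class SpatioTemporal4DEncoder(nn.Module):
    """Schematic 4D encoder: per-view ViT features are lifted into a shared
    3D voxel grid, then a temporal network predicts future voxel features."""
    def __init__(self, vit, lift_splat_shoot, temporal_net):
        super().__init__()
        self.vit = vit                  # 2D feature extractor per camera view
        self.lss = lift_splat_shoot     # lifts 2D features into 3D voxels
        self.temporal = temporal_net    # ResNet + DiT future-voxel predictor

    def forward(self, images, past_voxels, future_timestamp):
        # 3D spatial encoding of the N omnidirectional views.
        feats = [self.vit(img) for img in images]
        current_voxels = self.lss(feats)        # 2D features -> 3D voxel grid
        # 4D step: predict voxel features at the queried future timestamp,
        # conditioned on the history of past voxel features.
        future_voxels = self.temporal(past_voxels + [current_voxels],
                                      future_timestamp)
        return current_voxels, future_voxels    # fed to planning and odometry
```

At training time, the 3D encoder itself would be supervised by the self-supervised neural-rendering objective described above rather than by task labels.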
The planning head, based on pre-trained 4D features, robot speed, and task information, generates executable trajectories using Transformer-based flow matching. To prevent collisions, the planning head incorporates a masked ESDF loss (Euclidean Signed Distance Field). This loss calculates the ESDF of a 3D occupancy map and applies a 2D ground truth trajectory mask, significantly reducing collision rates. Experiments demonstrate its superior performance in collision rate and overall score on out-of-distribution (OOD) datasets compared to other methods.
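One plausible reading of that loss, sketched in PyTorch with an assumed safety margin and an assumed interpretation of the ground-truth-trajectory mask (the paper's exact formulation may differ):

```python
import torch

def masked_esdf_loss(esdf: torch.Tensor, pred_xy: torch.Tensor,
                     gt_mask: torch.Tensor, margin: float = 0.2) -> torch.Tensor:
    """Schematic masked-ESDF collision penalty.
    esdf    : (H, W) signed distance to the nearest obstacle, derived from
              the 3D occupancy map.
    pred_xy : (T, 2) predicted waypoints in grid coordinates.
    gt_mask : (H, W), 0 on cells the ground-truth trajectory covers and 1
              elsewhere, so the penalty is suppressed where the demonstration
              itself passes close to obstacles (an assumption).
    """
    xs, ys = pred_xy[:, 0].long(), pred_xy[:, 1].long()
    dists = esdf[xs, ys]                            # distance at each waypoint
    keep = gt_mask[xs, ys]
    penalty = torch.clamp(margin - dists, min=0.0)  # fires only near obstacles
    return (penalty * keep).mean()
```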
The odometry head predicts the robot’s relative pose using current and past 4D features together with additional sensor data (e.g., IMU and wheel measurements). It trains a Transformer model to fuse information from the different sensors: each sensor modality is processed by a dedicated tokenizer, and the resulting tokens are combined with modality embeddings and temporal positional embeddings before being fused by the Transformer.
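A schematic PyTorch version of that fusion pattern is below; the tokenizers, embedding sizes, sequence pooling, and pose parameterization are all illustrative stand-ins.

```python
import torch
import torch.nn as nn

class OdometryHead(nn.Module):
    """Schematic multi-sensor fusion for relative-pose prediction."""
    def __init__(self, tokenizers: dict, dim: int = 256, max_steps: int = 32):
        super().__init__()
        self.tokenizers = nn.ModuleDict(tokenizers)   # one tokenizer per sensor
        self.modality_emb = nn.ParameterDict(
            {name: nn.Parameter(torch.zeros(dim)) for name in tokenizers})
        self.temporal_emb = nn.Parameter(torch.zeros(max_steps, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                           batch_first=True)
        self.fuser = nn.TransformerEncoder(layer, num_layers=4)
        self.pose_head = nn.Linear(dim, 6)            # relative 6-DoF pose

    def forward(self, inputs: dict) -> torch.Tensor:
        tokens = []
        for name, data in inputs.items():             # e.g. "voxel", "imu", "wheel"
            t = self.tokenizers[name](data)           # (B, T, dim) per sensor
            t = t + self.modality_emb[name]           # modality embedding
            t = t + self.temporal_emb[: t.shape[1]]   # temporal positional embedding
            tokens.append(t)
        fused = self.fuser(torch.cat(tokens, dim=1))  # joint attention across sensors
        return self.pose_head(fused.mean(dim=1))      # pool -> relative pose
```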
🔗 Source: syncedreview.com
Intel’s Core Ultra 3 and Xeon 6+ mark US return to cutting-edge chipmaking
Intel has unveiled its most advanced client and server processors yet, marking a major leap in semiconductor technology.
The new Intel Core Ultra series 3 and Xeon 6+ chips promise higher performance, energy efficiency, and AI capabilities, all built on Intel’s groundbreaking 18A process.
Panther Lake, the codename for Intel Core Ultra series 3, is set to power consumer and commercial AI PCs, gaming devices, and edge solutions. It will begin high-volume production this year, with the first units shipping before the end of 2025.
“We are entering an exciting new era of computing, made possible by great leaps forward in semiconductor technology that will shape the future for decades to come,” said Intel CEO Lip-Bu Tan.
“Our next-gen compute platforms, combined with our leading-edge process technology, manufacturing and advanced packaging capabilities, are catalysts for innovation across our business as we build a new Intel.”
Panther Lake features a scalable, multi-chiplet architecture offering partners flexibility across form factors, segments, and price points.
Highlights include up to 16 cores, combining new performance-cores (P-cores) and efficient-cores (E-cores), delivering more than 50% faster CPU performance than the previous generation.
A new Intel® Arc GPU with up to 12 Xe cores promises 50% faster graphics performance, while a balanced XPU design delivers up to 180 TOPS for AI acceleration.
Scalable AI and robotics
Panther Lake’s reach goes beyond PCs. Intel is also targeting edge applications, including robotics, through a new Intel Robotics AI software suite and reference board. This allows customers to innovate with sophisticated AI capabilities for robot controls and perception.
Clearwater Forest, Intel’s codename for Xeon 6+, is the company’s first Intel 18A-based server processor and will launch in the first half of 2026.
Tailored for hyperscale data centers, cloud providers, and telcos, Xeon 6+ features up to 288 E-cores and a 17% uplift in Instructions Per Cycle (IPC) over the prior generation.
“The United States has always been home to Intel’s most advanced R&D, product design and manufacturing – and we are proud to build on this legacy as we expand our domestic operations and bring new innovations to the market,” Tan said.
Intel 18A: U.S. technology leader
Intel 18A, the first 2-nanometer class node developed and manufactured in the United States, delivers up to 15% better performance per watt and 30% improved chip density compared to Intel 3.
Key innovations include RibbonFET, a new transistor architecture that enables efficient scaling, and PowerVia, a backside power delivery system that improves energy flow.
Foveros 3D chip stacking technology enables flexible integration of multiple chiplets, forming advanced system-on-chip designs for both client and server applications.
Panther Lake and Xeon 6+, along with multiple future generations, will leverage Intel 18A and advanced packaging.
Fab 52: U.S. foundry milestone
Both chips are being manufactured at Intel’s state-of-the-art Fab 52 in Chandler, Arizona, part of Intel’s $100 billion investment to expand domestic operations.
The facility strengthens U.S. manufacturing leadership, supports a resilient semiconductor supply chain, and positions Intel to serve both its own product lines and foundry customers.
Fab 52 builds on Intel’s 56 years of R&D and manufacturing advancements in Oregon, Arizona, and New Mexico.
The site is central to Intel’s strategy for providing advanced AI compute platforms while maintaining a trusted U.S. semiconductor supply base.
🔗 Source: interestingengineering.com