TOPINDIATOURS Update ai: ByteDance Introduces Astra: A Dual-Model Architecture for Autonom

📌 TOPINDIATOURS Update ai: ByteDance Introduces Astra: A Dual-Model Architecture f

The increasing integration of robots across various sectors, from industrial manufacturing to daily life, highlights a growing need for advanced navigation systems. However, contemporary robot navigation systems face significant challenges in diverse and complex indoor environments, exposing the limitations of traditional approaches. Addressing the fundamental questions of “Where am I?”, “Where am I going?”, and “How do I get there?”, ByteDance has developed Astra, an innovative dual-model architecture designed to overcome these traditional navigation bottlenecks and enable general-purpose mobile robots.

Traditional navigation systems typically consist of multiple, smaller, and often rule-based modules to handle the core challenges of target localization, self-localization, and path planning. Target localization involves understanding natural language or image cues to pinpoint a destination on a map. Self-localization requires a robot to determine its precise position within a map, especially challenging in repetitive environments like warehouses where traditional methods often rely on artificial landmarks (e.g., QR codes). Path planning further divides into global planning for rough route generation and local planning for real-time obstacle avoidance and reaching intermediate waypoints.

While foundation models have shown promise in integrating smaller models to tackle broader tasks, the optimal number of models and their effective integration for comprehensive navigation remained an open question.

ByteDance’s Astra, detailed in their paper “Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning” (website: https://astra-mobility.github.io/), addresses these limitations. Following the System 1/System 2 paradigm, Astra features two primary sub-models: Astra-Global and Astra-Local. Astra-Global handles low-frequency tasks like target and self-localization, while Astra-Local manages high-frequency tasks such as local path planning and odometry estimation.

Astra-Global: The Intelligent Brain for Global Localization

Astra-Global serves as the intelligent core of the Astra architecture, responsible for critical low-frequency tasks: self-localization and target localization. It functions as a Multimodal Large Language Model (MLLM), adept at processing both visual and linguistic inputs to achieve precise global positioning within a map. Its strength lies in utilizing a hybrid topological-semantic graph as contextual input, allowing the model to accurately locate positions based on query images or text prompts.

The construction of this robust localization system begins with offline mapping. The research team developed an offline method to build a hybrid topological-semantic graph G=(V,E,L):

V (Nodes): Keyframes, obtained by temporal downsampling of input video and SfM-estimated 6-Degrees-of-Freedom (DoF) camera poses, act as nodes encoding camera poses and landmark references.
E (Edges): Undirected edges establish connectivity based on relative node poses, crucial for global path planning.
L (Landmarks): Semantic landmark information is extracted by Astra-Global from visual data at each node, enriching the map’s semantic understanding. These landmarks store semantic attributes and are connected to multiple nodes via co-visibility relationships.
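The three components of G=(V,E,L) can be pictured as a small data structure. What follows is a minimal sketch, not the authors' implementation: the class and field names, the pose tuple layout, and the distance threshold used to create edges are all assumptions made for illustration. Only the roles of V, E, and L come from the description above.

```python
from dataclasses import dataclass, field
import math


@dataclass
class Node:
    """Keyframe node (V). `pose` is an assumed (x, y, z, roll, pitch, yaw) tuple."""
    node_id: int
    pose: tuple
    landmark_ids: set = field(default_factory=set)


@dataclass
class Landmark:
    """Semantic landmark (L). Field names are hypothetical."""
    landmark_id: int
    description: str
    node_ids: set = field(default_factory=set)  # nodes that co-observe it


@dataclass
class HybridGraph:
    nodes: dict = field(default_factory=dict)      # V
    edges: set = field(default_factory=set)        # E, undirected
    landmarks: dict = field(default_factory=dict)  # L

    def add_node(self, node_id, pose):
        self.nodes[node_id] = Node(node_id, pose)

    def connect_nearby(self, max_dist):
        """Link nodes whose positions lie within max_dist of each other.

        Assumption: the source only says edges follow relative node poses;
        a plain distance threshold is used here as a stand-in.
        """
        ids = sorted(self.nodes)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                pa, pb = self.nodes[a].pose[:3], self.nodes[b].pose[:3]
                if math.dist(pa, pb) <= max_dist:
                    self.edges.add(frozenset((a, b)))  # frozenset = undirected

    def observe(self, landmark_id, description, node_id):
        """Record a co-visibility link between a landmark and a node."""
        lm = self.landmarks.setdefault(
            landmark_id, Landmark(landmark_id, description)
        )
        lm.node_ids.add(node_id)
        self.nodes[node_id].landmark_ids.add(landmark_id)
```

Storing each edge as a `frozenset` keeps the graph undirected without duplicate entries, and the two-way landmark/node references mirror the co-visibility relationships mentioned for L.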

In practical localization, Astra-Global’s self-localization and target localization capabilities leverage a coarse-to-fine two-stage process for visual-language localization. The coarse stage analyzes input images and localization prompts, detects landmarks, establishes correspondence with a pre-built landmark map, and filters candidates based on visual consistency. The fine stage then uses the query image and coarse output to sample reference map nodes from the offline map, comparing their visual and positional information to directly output the predicted pose.

For language-based target localization, the model interprets natural language instructions, identifies relevant landmarks using their functional descriptions within the map, and then leverages landmark-to-node association mechanisms to locate relevant nodes, retrieving target images and 6-DoF poses.
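The lookup chain described here (instruction, then landmark, then associated nodes and poses) can be illustrated with a toy example. This is only a sketch under stated assumptions: in Astra the instruction is interpreted by the MLLM, whereas the code below substitutes a naive word-overlap score, and the function name, dictionary layouts, and sample data are hypothetical.

```python
def locate_target(instruction, landmarks, node_poses):
    """Return (node_id, pose) pairs for the landmark best matching the instruction.

    landmarks:  {landmark_id: (functional_description, [node_ids])}
    node_poses: {node_id: pose}
    Word overlap is a stand-in for the model's language understanding.
    """
    words = set(instruction.lower().split())
    best_id, best_score = None, 0
    for lm_id, (description, _) in landmarks.items():
        score = len(words & set(description.lower().split()))
        if score > best_score:
            best_id, best_score = lm_id, score
    if best_id is None:
        return []  # no landmark shares any word with the instruction
    # Follow the landmark-to-node association to retrieve candidate poses.
    return [(n, node_poses[n]) for n in landmarks[best_id][1]]


# Hypothetical map content, for illustration only.
landmarks = {
    1: ("coffee machine for making drinks", [3, 4]),
    2: ("bed for resting", [7]),
}
node_poses = {
    3: (1.0, 2.0, 0.0, 0.0, 0.0, 0.0),
    4: (1.5, 2.0, 0.0, 0.0, 0.0, 1.57),
    7: (8.0, 3.0, 0.0, 0.0, 0.0, 0.0),
}
targets = locate_target("go to the coffee machine", landmarks, node_poses)
```

Here `targets` holds the poses of nodes 3 and 4, the two keyframes associated with the matched landmark.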

To empower Astra-Global with robust localization abilities, the team employed a meticulous training methodology. Using Qwen2.5-VL as the backbone, they combined Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO). SFT involved diverse datasets for various tasks, including coarse and fine localization, co-visibility detection, and motion trend estimation. In the GRPO phase, a rule-based reward function (including format, landmark extraction, map matching, and extra landmark rewards) was used to train for visual-language localization. Experiments showed GRPO significantly improved Astra-Global’s zero-shot generalization, achieving 99.9% localization accuracy in unseen home environments, surpassing SFT-only methods.
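To make the GRPO step more concrete, the sketch below shows the group-relative part of the method: several responses to the same prompt are scored, and each score is normalized against the mean and standard deviation of its group. The reward function is an assumption. The source names the four reward components but not how they are scored or weighted, so the equal weights and the 0 to 1 component scores here are hypothetical, as is the use of the population standard deviation.

```python
import statistics


def rule_based_reward(format_ok, landmark_score, match_score,
                      extra_landmark_score, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four reward components named in the article.

    The weights and the score ranges are assumptions for illustration.
    """
    parts = (1.0 if format_ok else 0.0, landmark_score,
             match_score, extra_landmark_score)
    return sum(w * p for w, p in zip(weights, parts))


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    Each advantage is (reward - group mean) / (group std + eps), so
    responses are ranked against their siblings, not a learned critic.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Three hypothetical responses to the same localization prompt.
rewards = [
    rule_based_reward(True, 0.5, 1.0, 0.0),
    rule_based_reward(True, 1.0, 1.0, 0.5),
    rule_based_reward(False, 0.0, 0.0, 0.0),
]
advantages = group_relative_advantages(rewards)
```

Responses scoring above the group average receive a positive advantage and are reinforced; those below receive a negative one.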

Astra-Local: The Intelligent Assistant for Local Planning

Astra-Local acts as the intelligent assistant for Astra’s high-frequency tasks, a multi-task network capable of efficiently generating local paths and accurately estimating odometry from sensor data. Its architecture comprises three core components: a 4D spatio-temporal encoder, a planning head, and an odometry head.

The 4D spatio-temporal encoder replaces traditional mobile stack perception and prediction modules. It begins with a 3D spatial encoder that processes N omnidirectional images through a Vision Transformer (ViT) and Lift-Splat-Shoot to convert 2D image features into 3D voxel features. This 3D encoder is trained using self-supervised learning via 3D volumetric differentiable neural rendering. The 4D spatio-temporal encoder then builds upon the 3D encoder, taking past voxel features and future timestamps as input to predict future voxel features through ResNet and DiT modules, providing current and future environmental representations for planning and odometry.

The planning head, based on pre-trained 4D features, robot speed, and task information, generates executable trajectories using Transformer-based flow matching. To prevent collisions, the planning head incorporates a masked ESDF loss (Euclidean Signed Distance Field). This loss calculates the ESDF of a 3D occupancy map and applies a 2D ground truth trajectory mask, significantly reducing collision rates. Experiments demonstrate its superior performance in collision rate and overall score on out-of-distribution (OOD) datasets compared to other methods.
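The idea behind an ESDF-based collision penalty can be shown on a tiny grid. The sketch below is a simplification under several assumptions and not the authors' loss: it works on a 2D occupancy grid instead of a 3D map, uses a brute-force distance computation, and treats the ground-truth trajectory mask as marking cells that are excluded from the penalty. The hinge form, the `margin` value, and all names are hypothetical.

```python
import math


def esdf_2d(occ):
    """Brute-force signed distance field for a small grid (1 = occupied).

    Free cells get +distance to the nearest occupied cell; occupied cells
    get -distance to the nearest free cell.
    """
    rows, cols = len(occ), len(occ[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    occupied = [p for p in cells if occ[p[0]][p[1]]]
    free = [p for p in cells if not occ[p[0]][p[1]]]
    sdf = [[0.0] * cols for _ in range(rows)]
    for r, c in cells:
        others = free if occ[r][c] else occupied
        if not others:  # grid is entirely free or entirely occupied
            sdf[r][c] = -math.inf if occ[r][c] else math.inf
            continue
        d = min(math.dist((r, c), p) for p in others)
        sdf[r][c] = -d if occ[r][c] else d
    return sdf


def masked_esdf_loss(waypoints, sdf, gt_mask, margin=1.5):
    """Mean hinge penalty max(0, margin - esdf) over predicted waypoints.

    Waypoints on cells flagged in gt_mask are skipped (an assumed reading
    of how the ground-truth trajectory mask is applied).
    """
    penalties = [max(0.0, margin - sdf[r][c])
                 for r, c in waypoints if not gt_mask[r][c]]
    return sum(penalties) / len(penalties) if penalties else 0.0


occ = [[0, 0, 0],
       [0, 1, 0],
       [0, 0, 0]]
gt_mask = [[0, 1, 0],
           [0, 0, 0],
           [0, 0, 0]]
no_mask = [[0, 0, 0] for _ in range(3)]
sdf = esdf_2d(occ)
waypoints = [(0, 0), (0, 1)]
loss_masked = masked_esdf_loss(waypoints, sdf, gt_mask)
loss_unmasked = masked_esdf_loss(waypoints, sdf, no_mask)
```

A waypoint closer to an obstacle than `margin` contributes a penalty that grows as the distance shrinks, which is what pushes predicted trajectories away from occupied space.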

The odometry head predicts the robot’s relative pose using current and past 4D features and additional sensor data (e.g., IMU, wheel data). It trains a Transformer model to fuse information from different sensors. Each sensor modality is processed by a specific tokenizer, combined with modality embeddings and temporal positional embeddi…

Content automatically shortened.

🔗 Source: syncedreview.com

📌 TOPINDIATOURS Exclusive ai: 5 ways Iran shields its ballistic missile arsenal fr

For decades, militaries have invested heavily in protecting their most important assets, ballistic missile systems among them. The current conflict between Iran, Israel, and the United States is a striking example of why ensuring the survival of missile systems is as necessary as operating them.

Today, missile infrastructure is often protected using a series of methods. For instance, Iran has developed extensive underground missile bases and missile cities designed to conceal the launch systems and ensure they can survive potential strikes.

These strategies rely on terrain, engineering, and mobility to make missile systems harder to detect and destroy. Here is a look at how Iran currently shields its most prized military assets for future use.

1. The missile cities

For starters, Iran embeds large ballistic missile facilities deep inside mountains, under hundreds of meters of hard rock, which greatly reduces vulnerability to conventional and even nuclear‑grade bunker‑buster weapons.

Known as “missile cities”, these locations feature multiple hardened tunnels, interconnected storage halls, and separate zones for warheads, fuel, and guidance systems. The overall setup allows the system to survive even when some entrances are hit or compromised.

The depth and rock overburden in particular make it extremely difficult for satellite-guided munitions to collapse or neutralize the entire complex.

2. Robust entrances

The entrances to these tunnel-based missile sites are often set in deep rock, hidden behind natural terrain, and reinforced with steel-clad concrete “liners” that are thick on the outside. These effectively turn the opening into a reinforced bunker entrance.

Inside, the tunnel design often includes blast-trap dead-end shafts aligned with the entrance axis. When a strike hits, the shockwave is channeled into these stub tunnels rather than penetrating the main complex. Heavy blast-resistant doors and compartmentalized chambers further help reduce pressure and limit damage inside the facility.

3. Camouflage, decoys, and deception

Visual and infrared camouflage, decoys, and fake launchers together form a strategy Iran uses to confuse surveillance systems such as satellites and reconnaissance aircraft, in a bid to defend its ballistic missile systems. These tactics make it harder for adversaries to identify which targets are real and worth striking.

Satellite imagery has revealed dummy missile launchers, inflatable replicas, and wooden models placed in open areas to attract attention and mislead targeting systems.

Some missile storage buildings are painted to resemble civilian structures, while others are covered with thermal-masking nets or earth barriers to reduce their heat and radar signatures.

4. Dispersal tactics

Iran relies heavily on mobile missile launch vehicles called transporter-erector-launchers (TELs) instead of fixed launch pads. Because these launchers can be moved from place to place, and the arsenal is stored in parts at different locations, destroying the entire ballistic missile system becomes difficult.

These vehicles often use hardened shelters or underground garages. They can be quickly driven onto highways, desert roads, or remote valleys before launch. This mobility, along with decoy buildings and multiple road routes, makes them much harder for satellites or surveillance systems to track and target.

5. Physical hardening

Iran builds concrete-and-earth “sarcophagus-style” covers over key structures, then buries them under soil. The facilities then look like natural terrain in satellite images, which helps hide them from optical and synthetic-aperture-radar surveillance.

At several missile-related sites near Tehran and other areas, some buildings have been fully buried under concrete roofs and soil, improving protection against air strikes and precision-guided bombs.

After past attacks, Iran has also repaired damaged areas by filling tunnel entrances, rebuilding protective walls, and installing temporary roofs to restore operations and reduce visible damage.

Conclusion

Iran’s approach to protecting its missile forces shows how modern warfare increasingly depends on engineering, mobility, and deception, not just firepower. In today’s conflict environment, the survivability of these systems has become as strategically important as the missiles themselves.

🔗 Source: interestingengineering.com

🤖 TOPINDIATOURS Note

This article is an automated summary of several trusted sources. We pick trending topics so you always stay up to date.