TOPINDIATOURS Update ai: ByteDance Introduces Astra: A Dual-Model Architecture for Autonom

📌 TOPINDIATOURS Update ai: ByteDance Introduces Astra: A Dual-Model Architecture f

The increasing integration of robots across various sectors, from industrial manufacturing to daily life, highlights a growing need for advanced navigation systems. However, contemporary robot navigation systems face significant challenges in diverse and complex indoor environments, exposing the limitations of traditional approaches. Addressing the fundamental questions of “Where am I?”, “Where am I going?”, and “How do I get there?”, ByteDance has developed Astra, an innovative dual-model architecture designed to overcome these traditional navigation bottlenecks and enable general-purpose mobile robots.

Traditional navigation systems typically consist of multiple, smaller, and often rule-based modules to handle the core challenges of target localization, self-localization, and path planning. Target localization involves understanding natural language or image cues to pinpoint a destination on a map. Self-localization requires a robot to determine its precise position within a map, especially challenging in repetitive environments like warehouses where traditional methods often rely on artificial landmarks (e.g., QR codes). Path planning further divides into global planning for rough route generation and local planning for real-time obstacle avoidance and reaching intermediate waypoints.

While foundation models have shown promise in integrating smaller models to tackle broader tasks, the optimal number of models and their effective integration for comprehensive navigation remained an open question.

ByteDance’s Astra, detailed in their paper “Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning” (website: https://astra-mobility.github.io/), addresses these limitations. Following the System 1/System 2 paradigm, Astra features two primary sub-models: Astra-Global and Astra-Local. Astra-Global handles low-frequency tasks like target and self-localization, while Astra-Local manages high-frequency tasks such as local path planning and odometry estimation. This architecture promises to revolutionize how robots navigate complex indoor spaces.
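The split between a low-frequency and a high-frequency model can be pictured as a dual-rate loop. The sketch below is only an illustration of that scheduling idea: the function name `run_dual_rate`, the tick counts, and the string stand-ins for model outputs are all hypothetical and do not come from the paper.

```python
def run_dual_rate(ticks, global_every):
    """Call the slow module every `global_every` ticks and the fast one every tick."""
    log = []
    goal = None
    for t in range(ticks):
        if t % global_every == 0:
            # Stand-in for a low-frequency Astra-Global call (localization).
            goal = f"goal@{t}"
            log.append(("global", t))
        # Stand-in for a high-frequency Astra-Local call (planning, odometry).
        log.append(("local", t, goal))
    return log


log = run_dual_rate(ticks=6, global_every=3)
```

Here the slow module runs twice while the fast module runs six times, and each fast step reuses the most recent slow result.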

Astra-Global: The Intelligent Brain for Global Localization

Astra-Global serves as the intelligent core of the Astra architecture, responsible for critical low-frequency tasks: self-localization and target localization. It functions as a Multimodal Large Language Model (MLLM), adept at processing both visual and linguistic inputs to achieve precise global positioning within a map. Its strength lies in utilizing a hybrid topological-semantic graph as contextual input, allowing the model to accurately locate positions based on query images or text prompts.

The construction of this robust localization system begins with offline mapping. The research team developed an offline method to build a hybrid topological-semantic graph G=(V,E,L):

  • V (Nodes): Keyframes are obtained by temporally downsampling the input video. Each keyframe becomes a node that stores its SfM-estimated 6-Degrees-of-Freedom (DoF) camera pose and references to landmarks.
  • E (Edges): Undirected edges establish connectivity based on relative node poses, crucial for global path planning.
  • L (Landmarks): Semantic landmark information is extracted by Astra-Global from visual data at each node, enriching the map’s semantic understanding. These landmarks store semantic attributes and are connected to multiple nodes via co-visibility relationships.
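A minimal sketch of how a graph like G=(V,E,L) could be held in memory, assuming plain Python containers. The class names, the pose tuple layout, and the breadth-first search used for global routing are illustrative assumptions, not the paper's implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """A keyframe: 6-DoF camera pose plus references to visible landmarks."""
    node_id: int
    pose: tuple  # assumed layout: (x, y, z, roll, pitch, yaw)
    landmark_ids: set = field(default_factory=set)


@dataclass
class Landmark:
    """A semantic landmark linked to the nodes that see it (co-visibility)."""
    landmark_id: int
    description: str
    node_ids: set = field(default_factory=set)


class HybridGraph:
    """Toy container for G = (V, E, L)."""

    def __init__(self):
        self.nodes = {}      # V
        self.edges = {}      # E, stored as adjacency sets
        self.landmarks = {}  # L

    def add_node(self, node_id, pose):
        self.nodes[node_id] = Node(node_id, pose)
        self.edges.setdefault(node_id, set())

    def add_edge(self, a, b):
        # Undirected edge: store both directions.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def add_landmark(self, landmark_id, description, visible_from):
        self.landmarks[landmark_id] = Landmark(
            landmark_id, description, set(visible_from))
        for nid in visible_from:
            self.nodes[nid].landmark_ids.add(landmark_id)

    def shortest_path(self, start, goal):
        """Breadth-first search over E; returns node ids or None."""
        parent = {start: None}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            if cur == goal:
                path = []
                while cur is not None:
                    path.append(cur)
                    cur = parent[cur]
                return path[::-1]
            for nxt in sorted(self.edges[cur]):
                if nxt not in parent:
                    parent[nxt] = cur
                    queue.append(nxt)
        return None


g = HybridGraph()
for i in range(4):
    g.add_node(i, (float(i), 0.0, 0.0, 0.0, 0.0, 0.0))
g.add_edge(0, 1)
g.add_edge(1, 2)
g.add_edge(2, 3)
g.add_landmark(0, "coffee machine", visible_from=[2, 3])
route = g.shortest_path(0, 3)
```

Storing landmark-to-node links in both directions keeps the two lookups the text describes cheap: from a node to what it sees, and from a landmark to where it is visible.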

In practical localization, Astra-Global’s self-localization and target localization capabilities leverage a coarse-to-fine two-stage process for visual-language localization. The coarse stage analyzes input images and localization prompts, detects landmarks, establishes correspondence with a pre-built landmark map, and filters candidates based on visual consistency. The fine stage then uses the query image and coarse output to sample reference map nodes from the offline map, comparing their visual and positional information to directly output the predicted pose.
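The two stages can be sketched with toy data. In Astra the multimodal model itself performs both stages and outputs a pose directly; in the sketch below, landmark voting and cosine similarity over made-up descriptors are simplified stand-ins, and every name and value is hypothetical.

```python
import math


def coarse_candidates(detected, landmark_to_nodes, top_k=2):
    """Coarse stage: vote for map nodes that share landmarks with the query."""
    votes = {}
    for name in detected:
        for node_id in landmark_to_nodes.get(name, ()):
            votes[node_id] = votes.get(node_id, 0) + 1
    # Most votes first; ties broken by node id so the result is deterministic.
    ranked = sorted(votes, key=lambda n: (-votes[n], n))
    return ranked[:top_k]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fine_localize(query_desc, candidates, node_desc, node_pose):
    """Fine stage stand-in: return the pose of the most similar candidate."""
    best = max(candidates, key=lambda n: cosine(query_desc, node_desc[n]))
    return best, node_pose[best]


landmark_to_nodes = {"sofa": [0, 1], "tv": [1, 2], "fridge": [3]}
node_desc = {0: [1.0, 0.0], 1: [0.7, 0.7], 2: [0.0, 1.0], 3: [-1.0, 0.0]}
node_pose = {n: (float(n), 0.0, 0.0) for n in node_desc}

candidates = coarse_candidates(["sofa", "tv"], landmark_to_nodes)
best_node, pose = fine_localize([0.6, 0.8], candidates, node_desc, node_pose)
```

The coarse stage narrows four nodes to two using landmarks alone, so the more expensive comparison in the fine stage only runs on a short candidate list.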

For language-based target localization, the model interprets natural language instructions, identifies relevant landmarks using their functional descriptions within the map, and then leverages landmark-to-node association mechanisms to locate relevant nodes, retrieving target images and 6-DoF poses.
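As a rough illustration of the landmark-to-node lookup, the sketch below matches an instruction against functional descriptions by word overlap. Word overlap is a deliberately crude stand-in for the language understanding the model performs, and the dictionary layout, descriptions, and poses are invented for the example.

```python
def locate_target(instruction, landmarks, node_pose):
    """Return (landmark name, [(node_id, pose), ...]) for the best match."""
    words = set(instruction.lower().split())

    def overlap(name):
        described = set(landmarks[name]["function"].lower().split())
        return len(words & described)

    best = max(sorted(landmarks), key=overlap)
    if overlap(best) == 0:
        return None, []
    # Follow the landmark-to-node association to retrieve candidate poses.
    return best, [(n, node_pose[n]) for n in landmarks[best]["nodes"]]


landmarks = {
    "sofa": {"function": "a place to sit and rest", "nodes": [0, 1]},
    "fridge": {"function": "keeps food and drinks cold", "nodes": [3]},
}
node_pose = {n: (float(n), 0.0, 0.0, 0.0, 0.0, 0.0) for n in range(4)}

name, hits = locate_target("bring me cold drinks", landmarks, node_pose)
```

The instruction never names the fridge; it is resolved through the landmark's functional description, which is the behavior the paragraph above describes.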

To empower Astra-Global with robust localization abilities, the team employed a meticulous training methodology. Using Qwen2.5-VL as the backbone, they combined Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO). SFT involved diverse datasets for various tasks, including coarse and fine localization, co-visibility detection, and motion trend estimation. In the GRPO phase, a rule-based reward function (including format, landmark extraction, map matching, and extra landmark rewards) was used to train for visual-language localization. Experiments showed GRPO significantly improved Astra-Global’s zero-shot generalization, achieving 99.9% localization accuracy in unseen home environments, surpassing SFT-only methods.
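To make the GRPO step more concrete, here is a hedged sketch of a rule-based reward followed by group-relative normalization, (r - mean) / std within one group of sampled responses. The weights, the response dictionary fields, and the use of population standard deviation are assumptions. The extra landmark term is left out because the text does not define it.

```python
import statistics


def rule_reward(response, truth):
    """Sum of format, landmark-extraction, and map-matching terms."""
    if not response.get("well_formed", False):
        return 0.0  # malformed output earns nothing
    reward = 0.2  # format term (assumed weight)
    predicted = set(response.get("landmarks", []))
    expected = set(truth["landmarks"])
    if predicted | expected:
        # Intersection-over-union of predicted and expected landmark sets.
        reward += 0.3 * len(predicted & expected) / len(predicted | expected)
    if response.get("node") == truth["node"]:
        reward += 0.5  # map-matching term (assumed weight)
    return reward


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


truth = {"landmarks": ["sofa", "tv"], "node": 1}
group = [
    {"well_formed": True, "landmarks": ["sofa", "tv"], "node": 1},
    {"well_formed": True, "landmarks": ["sofa"], "node": 2},
    {"well_formed": False},
]
rewards = [rule_reward(r, truth) for r in group]
advantages = group_relative_advantages(rewards)
```

Because advantages are computed relative to the group, no separate value model is needed: a response is reinforced only to the extent that it beats the other samples for the same prompt.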

Astra-Local: The Intelligent Assistant for Local Planning

Astra-Local handles Astra’s high-frequency tasks. It is a multi-task network that generates local paths and estimates odometry from sensor data. Its architecture comprises three core components: a 4D spatio-temporal encoder, a planning head, and an odometry head.

The 4D spatio-temporal encoder replaces the perception and prediction modules of a traditional mobile robot stack. It begins with a 3D spatial encoder that processes N omnidirectional images through a Vision Transformer (ViT) and uses Lift-Splat-Shoot to convert 2D image features into 3D voxel features. This 3D encoder is trained using self-supervised learning via 3D volumetric differentiable neural rendering. The 4D spatio-temporal encoder then builds upon the 3D encoder, taking past voxel features and future timestamps as input to predict future voxel features through ResNet and DiT modules, providing current and future environmental representations for planning and odometry.

The planning head, based on pre-trained 4D features, robot speed, and task information, generates executable trajectories using Transformer-based flow matching. To prevent collisions, the planning head incorporates a masked ESDF loss (Euclidean Signed Distance Field). This loss calculates the ESDF of a 3D occupancy map and applies a 2D ground truth trajectory mask, significantly reducing collision rates. Experiments demonstrate its superior performance in collision rate and overall score on out-of-distribution (OOD) datasets compared to other methods.
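The masked ESDF idea can be illustrated on a small 2D grid. This is a sketch under stated assumptions: the article describes a 3D occupancy map, while this example uses 2D for brevity; the hinge form, the `margin` value, and the reading that cells covered by the ground truth trajectory are exempt from the penalty are all guesses, since the text does not say how the mask is applied.

```python
import math


def esdf_2d(occ):
    """Brute-force signed distance: positive in free cells, negative in obstacles."""
    rows, cols = len(occ), len(occ[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    occupied = [p for p in cells if occ[p[0]][p[1]]]
    free = [p for p in cells if not occ[p[0]][p[1]]]
    dist = [[0.0] * cols for _ in range(rows)]
    for r, c in cells:
        # Distance to the nearest cell of the opposite kind.
        targets = free if occ[r][c] else occupied
        d = min((math.hypot(r - tr, c - tc) for tr, tc in targets),
                default=math.inf)
        dist[r][c] = -d if occ[r][c] else d
    return dist


def masked_esdf_loss(waypoints, dist, gt_mask, margin=1.5):
    """Mean hinge penalty for waypoints closer than `margin` to an obstacle.

    Cells where gt_mask is 1 are skipped (assumed interpretation of the mask).
    """
    total = 0.0
    for r, c in waypoints:
        if gt_mask[r][c]:
            continue
        total += max(0.0, margin - dist[r][c])
    return total / len(waypoints)


occ = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
no_mask = [[0] * 4 for _ in range(3)]
dist = esdf_2d(occ)
safe_loss = masked_esdf_loss([(0, 3), (1, 3), (2, 3)], dist, no_mask)
risky_loss = masked_esdf_loss([(0, 1), (1, 1), (2, 1)], dist, no_mask)
```

The trajectory that passes through the obstacle cell gets a positive loss, while the one that keeps its distance gets zero, which is the gradient signal a planner would need in order to steer away from collisions.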

The odometry head predicts the robot’s relative pose using current and past 4D features and additional sensor data (e.g., IMU, wheel data). It trains a Transformer model to fuse information from different sensors. Each sensor modality is processed by a specific tokenizer, combined with modality embeddings and temporal positional embeddi…

Content automatically shortened.

🔗 Source: syncedreview.com


📌 TOPINDIATOURS Hot ai: US’ refueling plane that crashed in Iraq had no parachutes

Following the crash of a US Boeing KC-135 “Stratotanker” refueling jet in Iraq on Thursday (March 12), it came to light that the crew likely lacked any parachutes. According to reports, these pieces of kit were removed in 2008 to save time and money.

The crash occurred during America’s ongoing “Operation Epic Fury” in Iran. Officials have confirmed that the loss was not caused by enemy or friendly fire. According to reports, two KC-135 aircraft were involved in the operation, and the second landed safely.

This is the first KC-135 lost since 2013, when one with a three-person crew crashed in the Kyrgyz Republic after its flight control system malfunctioned during a combat aerial refueling mission.

Reports indicate that the lost KC-135 had five crew members on board at the time of the incident, and no casualty reports have been publicly released. US officials also confirmed that rescue operations were underway to find and recover the crew.

KC-135 lost over Iraq

The KC-135 was developed from Boeing’s 367-80 prototype, the same design that led to the 707 airliner, and can weigh up to 130 tons when fully laden. It can operate at altitudes of between 30,000 and 40,000 feet and acts as a flying gas station, refueling fighters, bombers, and surveillance aircraft mid-air.

The exact cause of the crash has not been publicly announced, but it was likely the result of mechanical failure, a midair refueling accident, or a fuel system malfunction. Other possibilities include structural failure or controlled flight into terrain.

What appears most surprising on the surface is the apparent lack of parachutes for the crew. As it turns out, however, this is not uncommon for such aircraft.

“Removing parachutes from military aircraft may sound peculiar, but KC-135s are not like other aircraft. They seldom have mishaps, and the likelihood a KC-135 crew member would ever need to use a parachute is extremely low,” an Air Force press release said in 2008.

“However, a lot of time, manpower, and money go into buying, maintaining, and training to use parachutes. With the Air Force hungry for cost-saving efficiency under its Air Force for Smart Operations in the 21st Century Program, commonly known as AFSO 21, the parachutes were deemed obsolete,” it added.

Most large military and civilian aircraft, including airliners, cargo aircraft, and AWACS, do not carry parachutes. Instead, safety doctrine focuses on redundant systems, controlled emergency landings, and crew resource management to get the plane down safely without the need to bail out.

Parachutes may not have helped anyway

Even if the crew had carried them, it is unlikely they could have escaped the aircraft at altitude, deployed the parachutes safely, and survived. Parachutes make more sense for fighter ejection seats and other aircraft with specially designed escape systems.

Seen in this light, the lack of parachutes in the KC-135 should not come as much of a shock. What is more concerning is that the KC-135 is now very old (first built in the late 1950s) and is in dire need of retirement or upgrade.

The airframe is currently scheduled to remain in service until at least 2050, and upgrade programs are constantly being delayed. It is now hoped that this latest tragedy may force the issue to prevent further similar losses in the future.

“Please keep these brave airmen, their families, friends, and units in your thoughts,” the Joint Chiefs chairman, General Dan Caine, told the press in a Pentagon briefing on the matter. “In the coming hours and days, our service members make an incredible sacrifice to go forward and do the things that the nation asks of them,” he added.

🔗 Source: interestingengineering.com


🤖 TOPINDIATOURS Note

This article is an automated summary drawn from several trusted sources. We pick trending topics so you can stay up to date.
