TOPINDIATOURS AI Exclusive: Mark Zuckerberg Humiliated as AI Glasses Debut Fails in Front of Huge Crowd

📌 TOPINDIATOURS AI Exclusive: Mark Zuckerberg Humiliated as AI Glasses Debut Fails in Front of Huge Crowd

On Wednesday, Meta CEO Mark Zuckerberg unveiled a slew of new augmented reality glasses, including what he claimed to be the “first AI glasses with high resolution,” a new $799 version of the company’s Meta Ray-Ban smart glasses featuring a tiny display visible to the wearer.

But it didn’t take long for the company’s MetaConnect 2025 keynote to descend into chaos. The social media giant’s demos repeatedly failed, leading to awkward stares, deafening silences, and muted laughter.

The poor showing painfully demonstrates that the tech is far from ready, even as companies continue to shove AI into every aspect of our daily lives.

The stakes are high. Meta is spending tens of billions of dollars to build out infrastructure and hire industry-leading staff to support AI. Zuckerberg has also repeatedly doubled down on smart glasses being the future of the company, as well as AI-powered “superintelligence” as a whole.

“This is one of those special moments where we get to show you something we’ve poured our lives into,” he told the crowd at this week’s event.

Yet getting the tech to work on stage in front of a huge crowd proved too much, demonstrating once again that there’s still a glaring gap between the AI industry’s breathless promises and cold, hard reality.

According to Zuckerberg’s vision of the near future, wearers of Meta’s glasses can converse with an AI chatbot that tells them what they’re looking at or coaches them through tasks, like cooking a dish.

“Let’s try it! It’s not something I’ve made before,” food content creator Jack Mancuso told Zuckerberg enthusiastically, after the CEO challenged him to make a steak sauce with the help of a new feature called “Live AI.”

“Can you help me create a Korean-inspired steak sauce?” Mancuso asked his glasses.

“What do I do first?” Mancuso interjected after the robotic voice started making suggestions.

“What do I do first?” the influencer repeated after several seconds of total silence that followed.

“You already combined the base ingredients,” the AI told Mancuso, who was standing in front of an empty glass bowl that he hadn’t touched yet.

A separate attempt by Zuckerberg to make a video call with his glasses ended with him awkwardly trying to explain why it wasn’t working.

“This is, uh… it happens,” the CEO stammered.

“Let’s try it again, I keep messing this up,” he added.

Is this really all Meta has to show at this point? If so, the company still has an immense amount left to prove if it wants to justify its enormous spending spree.



🔗 Source: futurism.com


📌 TOPINDIATOURS AI Update: ByteDance Introduces Astra: A Dual-Model Architecture for General-Purpose Mobile Robot Navigation

The increasing integration of robots across various sectors, from industrial manufacturing to daily life, highlights a growing need for advanced navigation systems. However, contemporary robot navigation systems face significant challenges in diverse and complex indoor environments, exposing the limitations of traditional approaches. Addressing the fundamental questions of “Where am I?”, “Where am I going?”, and “How do I get there?”, ByteDance has developed Astra, an innovative dual-model architecture designed to overcome these traditional navigation bottlenecks and enable general-purpose mobile robots.

Traditional navigation systems typically consist of multiple, smaller, and often rule-based modules to handle the core challenges of target localization, self-localization, and path planning. Target localization involves understanding natural language or image cues to pinpoint a destination on a map. Self-localization requires a robot to determine its precise position within a map, especially challenging in repetitive environments like warehouses where traditional methods often rely on artificial landmarks (e.g., QR codes). Path planning further divides into global planning for rough route generation and local planning for real-time obstacle avoidance and reaching intermediate waypoints.
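To make that classical decomposition concrete, the sketch below outlines the interfaces such a modular stack typically exposes. It is an illustrative skeleton under our own naming, not code from ByteDance or the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Pose = Tuple[float, float, float]  # (x, y, heading) in the map frame

@dataclass
class NavGoal:
    description: str            # natural-language or image-derived destination
    map_pose: Optional[Pose]    # filled in once target localization succeeds

class TraditionalNavStack:
    """Illustrative interfaces of a classic, rule-based navigation stack."""

    def localize_target(self, instruction: str) -> NavGoal:
        # Target localization: resolve an instruction to a destination on the map.
        raise NotImplementedError

    def localize_self(self, sensor_frame) -> Pose:
        # Self-localization: e.g. match artificial landmarks such as QR codes.
        raise NotImplementedError

    def plan_global(self, start: Pose, goal: Pose) -> List[Pose]:
        # Global planning: coarse route over a graph or grid (A*, Dijkstra, ...).
        raise NotImplementedError

    def plan_local(self, route: List[Pose], sensor_frame) -> Pose:
        # Local planning: short-horizon waypoint tracking with obstacle avoidance.
        raise NotImplementedError
```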

While foundation models have shown promise in integrating smaller models to tackle broader tasks, the optimal number of models and their effective integration for comprehensive navigation remained an open question.

ByteDance’s Astra, detailed in their paper “Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning” (website: https://astra-mobility.github.io/), addresses these limitations. Following the System 1/System 2 paradigm, Astra features two primary sub-models: Astra-Global and Astra-Local. Astra-Global handles low-frequency tasks like target and self-localization, while Astra-Local manages high-frequency tasks such as local path planning and odometry estimation. This architecture promises to revolutionize how robots navigate complex indoor spaces.
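As a rough illustration of the System 1/System 2 split, the loop below runs a slow global model for (re)localization and goal updates alongside a fast local model for planning. The function names, update rates, and dictionary payloads are assumptions made for the sketch, not details from the paper.

```python
import time

GLOBAL_PERIOD_S = 1.0   # assumed low-frequency cadence for Astra-Global calls
LOCAL_PERIOD_S = 0.05   # assumed high-frequency cadence for Astra-Local calls

def astra_global_step(observation):
    """Stub for the slow 'System 2' model: target/self-localization on the map."""
    return {"robot_pose": (0.0, 0.0, 0.0), "goal_pose": (5.0, 2.0, 0.0)}

def astra_local_step(global_state, observation):
    """Stub for the fast 'System 1' model: local path + odometry update."""
    return {"velocity_cmd": (0.2, 0.0)}

def control_loop(duration_s: float = 0.2):
    global_state, cmd, last_global = None, None, 0.0
    t0 = time.monotonic()
    while time.monotonic() - t0 < duration_s:
        observation = {}   # camera / IMU / wheel readings would be gathered here
        now = time.monotonic()
        if global_state is None or now - last_global >= GLOBAL_PERIOD_S:
            global_state = astra_global_step(observation)   # infrequent, expensive
            last_global = now
        cmd = astra_local_step(global_state, observation)    # frequent, cheap
        time.sleep(LOCAL_PERIOD_S)
    return global_state, cmd

if __name__ == "__main__":
    print(control_loop())
```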

Astra-Global: The Intelligent Brain for Global Localization

Astra-Global serves as the intelligent core of the Astra architecture, responsible for critical low-frequency tasks: self-localization and target localization. It functions as a Multimodal Large Language Model (MLLM), adept at processing both visual and linguistic inputs to achieve precise global positioning within a map. Its strength lies in utilizing a hybrid topological-semantic graph as contextual input, allowing the model to accurately locate positions based on query images or text prompts.

The construction of this robust localization system begins with offline mapping. The research team developed an offline method to build a hybrid topological-semantic graph G=(V,E,L):

  • V (Nodes): Keyframes, obtained by temporally downsampling the input video, act as nodes; each node encodes an SfM-estimated 6-Degrees-of-Freedom (DoF) camera pose and references to landmarks.
  • E (Edges): Undirected edges establish connectivity based on relative node poses, crucial for global path planning.
  • L (Landmarks): Semantic landmark information is extracted by Astra-Global from visual data at each node, enriching the map’s semantic understanding. These landmarks store semantic attributes and are connected to multiple nodes via co-visibility relationships.
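A minimal data-structure sketch of such a hybrid topological-semantic graph is shown below. Field names and attribute formats are illustrative assumptions, not the paper's exact schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Pose6DoF = Tuple[float, float, float, float, float, float]  # x, y, z, roll, pitch, yaw

@dataclass
class Node:                      # V: keyframe with an SfM-estimated camera pose
    node_id: int
    pose: Pose6DoF
    landmark_ids: List[str] = field(default_factory=list)

@dataclass
class Landmark:                  # L: semantic landmark seen from one or more nodes
    landmark_id: str
    attributes: Dict[str, str]                   # e.g. {"type": "door", "color": "red"}
    covisible_nodes: List[int] = field(default_factory=list)

@dataclass
class TopoSemanticMap:
    nodes: Dict[int, Node] = field(default_factory=dict)
    edges: List[Tuple[int, int]] = field(default_factory=list)   # E: undirected connectivity
    landmarks: Dict[str, Landmark] = field(default_factory=dict)

    def add_edge(self, a: int, b: int) -> None:
        self.edges.append((a, b))

    def nodes_for_landmark(self, landmark_id: str) -> List[Node]:
        """Landmark-to-node association used for language-based target localization."""
        lm = self.landmarks[landmark_id]
        return [self.nodes[i] for i in lm.covisible_nodes]
```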

In practical localization, Astra-Global’s self-localization and target localization capabilities leverage a coarse-to-fine two-stage process for visual-language localization. The coarse stage analyzes input images and localization prompts, detects landmarks, establishes correspondence with a pre-built landmark map, and filters candidates based on visual consistency. The fine stage then uses the query image and coarse output to sample reference map nodes from the offline map, comparing their visual and positional information to directly output the predicted pose.
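The two-stage flow can be summarized as in the sketch below, where detect_landmarks and predict_pose stand in for calls to the Astra-Global MLLM. Both helpers and the simple overlap-based candidate filter are hypothetical stand-ins rather than the model's actual interface.

```python
from typing import Dict, List, Tuple

Pose6DoF = Tuple[float, float, float, float, float, float]

def detect_landmarks(query_image, prompt: str) -> List[str]:
    # Stand-in for the MLLM's landmark extraction from the query image + prompt.
    return ["door_red", "exit_sign"]

def match_to_map(detected: List[str], landmark_to_nodes: Dict[str, List[int]]) -> List[int]:
    # Coarse stage: keep map nodes whose landmarks overlap the detected set.
    candidates: List[int] = []
    for lm in detected:
        candidates.extend(landmark_to_nodes.get(lm, []))
    return sorted(set(candidates))

def predict_pose(query_image, reference_nodes: List[int]) -> Pose6DoF:
    # Stand-in for the fine stage: the MLLM compares the query against sampled
    # reference nodes' images/poses and outputs the predicted 6-DoF pose directly.
    return (1.0, 2.0, 0.0, 0.0, 0.0, 1.57)

def localize(query_image, prompt: str, landmark_to_nodes: Dict[str, List[int]]) -> Pose6DoF:
    """Coarse-to-fine visual-language localization, as a two-stage sketch."""
    detected = detect_landmarks(query_image, prompt)
    candidates = match_to_map(detected, landmark_to_nodes)   # coarse stage
    return predict_pose(query_image, candidates[:5])         # fine stage on top candidates

if __name__ == "__main__":
    demo_map = {"door_red": [3, 7], "exit_sign": [7, 12]}
    print(localize(None, "Where am I?", demo_map))
```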

For language-based target localization, the model interprets natural language instructions, identifies relevant landmarks using their functional descriptions within the map, and then leverages landmark-to-node association mechanisms to locate relevant nodes, retrieving target images and 6-DoF poses.
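Under the same assumptions, language-based target localization reduces to matching the instruction against landmark functional descriptions and then following the landmark-to-node associations. The matching rule below (word overlap) is a deliberately crude stand-in for the MLLM's semantic reasoning.

```python
def locate_target(instruction: str, landmarks: dict, landmark_to_nodes: dict,
                  node_poses: dict):
    """Return candidate (node_id, pose) pairs for a natural-language goal.

    landmarks:         {landmark_id: functional description}
    landmark_to_nodes: {landmark_id: [node_id, ...]}   (co-visibility associations)
    node_poses:        {node_id: 6-DoF pose}
    """
    hits = [lid for lid, desc in landmarks.items()
            if any(word in instruction.lower() for word in desc.lower().split())]
    return [(nid, node_poses[nid]) for lid in hits for nid in landmark_to_nodes.get(lid, [])]

if __name__ == "__main__":
    lms = {"coffee_machine": "kitchen coffee machine", "dock": "charging dock"}
    l2n = {"coffee_machine": [4], "dock": [9]}
    poses = {4: (3.0, 1.0, 0.0, 0, 0, 0), 9: (0.5, 0.5, 0.0, 0, 0, 0)}
    print(locate_target("Go to the coffee machine", lms, l2n, poses))
```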

To empower Astra-Global with robust localization abilities, the team employed a meticulous training methodology. Using Qwen2.5-VL as the backbone, they combined Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO). SFT involved diverse datasets for various tasks, including coarse and fine localization, co-visibility detection, and motion trend estimation. In the GRPO phase, a rule-based reward function (including format, landmark extraction, map matching, and extra landmark rewards) was used to train for visual-language localization. Experiments showed GRPO significantly improved Astra-Global’s zero-shot generalization, achieving 99.9% localization accuracy in unseen home environments, surpassing SFT-only methods.
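A rule-based reward of that shape might look like the sketch below. The output tag format, the weights, and treating “extra landmarks” as a penalty are all assumptions made for illustration; the paper defines its own reward terms.

```python
import re

# Assumed weights for the reward terms; the paper's values and definitions may differ.
W_FORMAT, W_EXTRACT, W_MATCH, PEN_EXTRA = 0.1, 0.4, 0.4, 0.05

def localization_reward(response: str, gt_landmarks: set, map_landmarks: set) -> float:
    """Rule-based reward sketch for GRPO training on visual-language localization.

    Terms: output-format check, recall of ground-truth landmarks, fraction of
    extracted landmarks that exist in the offline map, and a penalty for extra
    landmarks not in the ground truth."""
    match = re.search(r"<landmarks>(.*?)</landmarks>", response, re.S)
    if match is None:
        return 0.0                                   # malformed output earns nothing
    predicted = {s.strip() for s in match.group(1).split(",") if s.strip()}

    r_format = 1.0
    r_extract = len(predicted & gt_landmarks) / max(len(gt_landmarks), 1)
    r_match = len(predicted & map_landmarks) / max(len(predicted), 1)
    r_extra = -PEN_EXTRA * len(predicted - gt_landmarks)

    return W_FORMAT * r_format + W_EXTRACT * r_extract + W_MATCH * r_match + r_extra

if __name__ == "__main__":
    resp = "<landmarks>red door, exit sign</landmarks>"
    print(localization_reward(resp, {"red door"}, {"red door", "exit sign"}))
```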

Astra-Local: The Intelligent Assistant for Local Planning

Astra-Local is the intelligent assistant for Astra’s high-frequency tasks: a multi-task network that efficiently generates local paths and accurately estimates odometry from sensor data. Its architecture comprises three core components: a 4D spatio-temporal encoder, a planning head, and an odometry head.

The 4D spatio-temporal encoder replaces traditional mobile stack perception and prediction modules. It begins with a 3D spatial encoder that processes N omnidirectional images through a Vision Transformer (ViT) and Lift-Splat-Shoot to convert 2D image features into 3D voxel features. This 3D encoder is trained using self-supervised learning via 3D volumetric differentiable neural rendering. The 4D spatio-temporal encoder then builds upon the 3D encoder, taking past voxel features and future timestamps as input to predict future voxel features through ResNet and DiT modules, providing current and future environmental representations for planning and odometry.
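The “lift” and “splat” steps can be illustrated with the dense NumPy sketch below: per-pixel features are weighted by a categorical depth distribution and accumulated into a voxel grid. The shapes, loop-based splatting, and random demo inputs are simplifications; Astra’s encoder operates on ViT features and is trained with differentiable neural rendering, which is not shown here.

```python
import numpy as np

def lift_splat(feat_2d, depth_prob, points_3d, voxel_shape=(16, 16, 8), voxel_size=0.5):
    """Lift per-pixel image features along depth bins, then splat (sum-pool) them
    into a 3D voxel grid expressed in the robot frame.

    feat_2d:    (P, C)     flattened image features
    depth_prob: (P, D)     per-pixel categorical depth distribution
    points_3d:  (P, D, 3)  3D location of each pixel/depth-bin sample (metres)
    """
    P, C = feat_2d.shape
    voxels = np.zeros(voxel_shape + (C,), dtype=np.float32)

    # Lift: weight each pixel's feature by its probability at every depth bin.
    lifted = depth_prob[..., None] * feat_2d[:, None, :]          # (P, D, C)

    # Splat: accumulate lifted features into the voxel containing each 3D sample.
    idx = np.floor(points_3d / voxel_size).astype(int)            # (P, D, 3)
    valid = np.all((idx >= 0) & (idx < np.array(voxel_shape)), axis=-1)
    for p, d in zip(*np.nonzero(valid)):
        x, y, z = idx[p, d]
        voxels[x, y, z] += lifted[p, d]
    return voxels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P, D, C = 64, 4, 8
    feats = rng.normal(size=(P, C)).astype(np.float32)
    probs = rng.dirichlet(np.ones(D), size=P).astype(np.float32)
    pts = rng.uniform(0.0, 4.0, size=(P, D, 3)).astype(np.float32)
    print(lift_splat(feats, probs, pts).shape)   # (16, 16, 8, 8)
```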

The planning head, based on pre-trained 4D features, robot speed, and task information, generates executable trajectories using Transformer-based flow matching. To prevent collisions, it incorporates a masked Euclidean Signed Distance Field (ESDF) loss: the loss is computed from the ESDF of a 3D occupancy map with a 2D ground-truth trajectory mask applied, significantly reducing collision rates. Experiments demonstrate superior collision rate and overall score on out-of-distribution (OOD) datasets compared to other methods.
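One plausible reading of that loss, reduced to 2D for clarity, is sketched below: compute an ESDF over the occupancy grid, mask out cells covered by the ground-truth trajectory, and penalize predicted waypoints that come within a safety margin of obstacles. The masking rule, margin, and 2D reduction are assumptions; the paper applies the loss over a 3D occupancy map.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def masked_esdf_loss(occupancy_2d, traj_xy, gt_mask, cell_size=0.1, margin=0.3):
    """Sketch of a masked ESDF collision penalty (2D version for illustration).

    occupancy_2d: (H, W) bool grid, True where occupied
    traj_xy:      (T, 2) predicted trajectory points in metres
    gt_mask:      (H, W) bool mask, True on cells covered by the ground-truth trajectory
    """
    # ESDF: distance (in metres) from every free cell to the nearest obstacle.
    esdf = distance_transform_edt(~occupancy_2d) * cell_size

    # Mask out cells the expert trajectory traverses, so the loss does not punish
    # driving close to obstacles where the demonstration itself did so (assumed rule).
    esdf_masked = np.where(gt_mask, np.inf, esdf)

    # Penalize predicted waypoints that fall within `margin` of an obstacle.
    idx = np.clip((traj_xy / cell_size).astype(int), 0, np.array(occupancy_2d.shape) - 1)
    d = esdf_masked[idx[:, 0], idx[:, 1]]
    return float(np.mean(np.maximum(0.0, margin - d)))

if __name__ == "__main__":
    occ = np.zeros((20, 20), dtype=bool); occ[10, 10] = True      # one obstacle cell
    mask = np.zeros((20, 20), dtype=bool)
    traj = np.array([[0.5, 0.5], [1.0, 1.0], [1.05, 1.0]])        # metres
    print(masked_esdf_loss(occ, traj, mask))
```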

The odometry head predicts the robot’s relative pose using current and past 4D features together with additional sensor data (e.g., IMU and wheel readings). It trains a Transformer model to fuse information from the different sensors: each sensor modality is processed by a dedicated tokenizer and combined with modality embeddings and temporal positional embeddings.
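A toy version of that fusion might look like the PyTorch sketch below. The input dimensions, tokenizer choices (simple linear layers), layer counts, and mean-pooled pose head are assumptions for illustration only, not the paper’s configuration.

```python
import torch
import torch.nn as nn

class OdometryFusion(nn.Module):
    """Per-sensor tokenization + modality/temporal embeddings feeding a Transformer
    that regresses the relative pose. Dimensions and layer counts are illustrative."""

    def __init__(self, d_model=128, n_steps=8):
        super().__init__()
        # One tokenizer per modality (hypothetical input sizes).
        self.tok_visual = nn.Linear(256, d_model)   # pooled 4D voxel feature per step
        self.tok_imu = nn.Linear(6, d_model)        # accel + gyro
        self.tok_wheel = nn.Linear(2, d_model)      # left/right wheel odometry
        self.modality_emb = nn.Embedding(3, d_model)
        self.temporal_emb = nn.Embedding(n_steps, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.head = nn.Linear(d_model, 6)           # relative 6-DoF pose

    def forward(self, visual, imu, wheel):
        # visual: (B, T, 256), imu: (B, T, 6), wheel: (B, T, 2)
        B, T, _ = visual.shape
        t = torch.arange(T, device=visual.device)
        tokens = []
        for m, (tok, x) in enumerate([(self.tok_visual, visual),
                                      (self.tok_imu, imu),
                                      (self.tok_wheel, wheel)]):
            e = tok(x) + self.modality_emb(torch.tensor(m, device=x.device)) \
                       + self.temporal_emb(t)
            tokens.append(e)
        seq = torch.cat(tokens, dim=1)               # (B, 3*T, d_model)
        fused = self.encoder(seq)
        return self.head(fused.mean(dim=1))          # pooled -> relative pose

if __name__ == "__main__":
    model = OdometryFusion()
    out = model(torch.randn(2, 8, 256), torch.randn(2, 8, 6), torch.randn(2, 8, 2))
    print(out.shape)  # torch.Size([2, 6])
```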

Content shortened automatically.

🔗 Source: syncedreview.com


🤖 TOPINDIATOURS Note

This article is an automatically generated summary drawn from several trusted sources. We select trending topics so you always stay up to date.

✅ Next update in 30 minutes: a random topic awaits!