Adobe Research Unlocking Long-Term Memory in Video World Models

Video world models, which predict future frames conditioned on actions, hold immense promise for artificial intelligence, enabling agents to plan and reason in dynamic environments. Recent advancements, particularly with video diffusion models, have shown impressive capabilities in generating realistic future sequences. However, a significant bottleneck remains: maintaining long-term memory. Current models struggle to remember events and states from far in the past due to the high computational cost associated with processing extended sequences using traditional attention layers. This limits their ability to perform complex tasks requiring sustained understanding of a scene.

A new paper, “Long-Context State-Space Video World Models” by researchers from Stanford University, Princeton University, and Adobe Research, proposes an innovative solution to this challenge. They introduce a novel architecture that leverages State-Space Models (SSMs) to extend temporal memory without sacrificing computational efficiency.

The core problem lies in the quadratic computational complexity of attention mechanisms with respect to sequence length. As the video context grows, the resources required for attention layers explode, making long-term memory impractical for real-world applications. This means that after a certain number of frames, the model effectively “forgets” earlier events, hindering its performance on tasks that demand long-range coherence or reasoning over extended periods.
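A back-of-the-envelope sketch makes the scaling gap concrete. The token counts, state dimension, and per-frame sizes below are illustrative assumptions, not figures from the paper:

```python
# Hypothetical sizes: compare how the cost of full self-attention grows with
# video length versus a linear-time SSM scan. Attention over T tokens needs
# O(T^2) pairwise scores; an SSM scan performs O(T) state updates.

def attention_cost(num_frames: int, tokens_per_frame: int = 256) -> int:
    """Pairwise score count for full attention over all frame tokens."""
    t = num_frames * tokens_per_frame
    return t * t  # every token attends to every token: quadratic

def ssm_cost(num_frames: int, tokens_per_frame: int = 256, state_dim: int = 64) -> int:
    """State-update count for a linear SSM scan: linear in sequence length."""
    t = num_frames * tokens_per_frame
    return t * state_dim

# Doubling the context doubles the SSM cost but quadruples the attention cost.
for frames in (100, 200, 400):
    print(frames, attention_cost(frames), ssm_cost(frames))
```

Under these toy numbers, each doubling of context length widens the gap by another factor of two, which is exactly why attention-only world models run out of practical memory budget first.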

The authors’ key insight is to leverage the inherent strengths of State-Space Models (SSMs) for causal sequence modeling. Unlike previous attempts that retrofitted SSMs for non-causal vision tasks, this work fully exploits their advantages in processing sequences efficiently.

The proposed Long-Context State-Space Video World Model (LSSVWM) incorporates several crucial design choices:

  1. Block-wise SSM Scanning Scheme: This is central to their design. Instead of processing the entire video sequence with a single SSM scan, they employ a block-wise scheme. This strategically trades off some spatial consistency (within a block) for significantly extended temporal memory. By breaking down the long sequence into manageable blocks, they can maintain a compressed “state” that carries information across blocks, effectively extending the model’s memory horizon.
  2. Dense Local Attention: To compensate for the potential loss of spatial coherence introduced by the block-wise SSM scanning, the model incorporates dense local attention. This ensures that consecutive frames within and across blocks maintain strong relationships, preserving the fine-grained details and consistency necessary for realistic video generation. This dual approach of global (SSM) and local (attention) processing allows them to achieve both long-term memory and local fidelity.
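The block-wise scanning idea can be sketched with a toy linear recurrence. This is a minimal illustration of the principle (state carried across block boundaries), not the paper's actual SSM layer; the coefficients and block size are arbitrary assumptions:

```python
# Toy block-wise SSM scan: split a long sequence into blocks, scan each block
# with a linear recurrence h[t] = a*h[t-1] + b*x[t], and carry only the final
# hidden state across block boundaries as compressed long-term memory.

def ssm_scan_block(xs, h0, a=0.9, b=0.1):
    """Scan one block; return per-step outputs and the final state."""
    h, ys = h0, []
    for x in xs:
        h = a * h + b * x  # linear state-space recurrence
        ys.append(h)
    return ys, h

def blockwise_scan(sequence, block_size, a=0.9, b=0.1):
    """Process a long sequence block by block; state flows between blocks."""
    h, outputs = 0.0, []
    for start in range(0, len(sequence), block_size):
        block = sequence[start:start + block_size]
        ys, h = ssm_scan_block(block, h, a, b)  # h carries memory onward
        outputs.extend(ys)
    return outputs

# Because the final state is threaded through, the block-wise scan performs
# the same updates in the same order as one full-sequence scan.
seq = [float(i) for i in range(12)]
full, _ = ssm_scan_block(seq, 0.0)
blocked = blockwise_scan(seq, block_size=4)
print(blocked == full)  # → True
```

The point of the sketch is that blocking changes where computation is chunked, not what the recurrence remembers: the compressed state is the only thing that must cross block boundaries, which is what keeps the memory horizon long at linear cost.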

The paper also introduces two key training strategies to further improve long-context performance:

  • Diffusion Forcing: This technique encourages the model to generate frames conditioned on a prefix of the input, effectively forcing it to learn to maintain consistency over longer durations. When no prefix is sampled and all tokens remain noised, training reduces to standard diffusion forcing, which the authors highlight as the special case of long-context training in which the prefix length is zero. This pushes the model to generate coherent sequences even from minimal initial context.
  • Frame Local Attention: For faster training and sampling, the authors implemented a “frame local attention” mechanism. This utilizes FlexAttention to achieve significant speedups compared to a fully causal mask. By grouping frames into chunks (e.g., chunks of 5 with a frame window size of 10), frames within a chunk maintain bidirectionality while also attending to frames in the previous chunk. This allows for an effective receptive field while optimizing computational load.
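The frame-local attention pattern can be sketched as a boolean mask. This follows the chunking described above (chunks of 5 frames, an effective window of 10), but the mask-building code itself is an illustrative assumption, not the authors' FlexAttention implementation:

```python
# Sketch of a frame-local attention mask: frames are grouped into chunks of 5,
# and each frame attends bidirectionally within its own chunk plus to every
# frame of the previous chunk, giving an effective 10-frame window.

def frame_local_mask(num_frames: int, chunk: int = 5):
    """mask[q][k] is True if query frame q may attend to key frame k."""
    mask = [[False] * num_frames for _ in range(num_frames)]
    for q in range(num_frames):
        for k in range(num_frames):
            cq, ck = q // chunk, k // chunk
            # bidirectional inside the chunk, plus the whole previous chunk
            mask[q][k] = ck == cq or ck == cq - 1
    return mask

mask = frame_local_mask(15)
# Frame 7 sits in chunk 1, so it sees frames 0-9 (chunks 0 and 1) only.
print([k for k in range(15) if mask[7][k]])
```

In a FlexAttention setting, a predicate like the `ck == cq or ck == cq - 1` check above would play the role of the mask function, letting the kernel skip the masked-out blocks instead of materializing a full causal mask.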

The researchers evaluated their LSSVWM on challenging datasets, including Memory Maze and Minecraft, which are specifically designed to test long-term memory capabilities through spatial retrieval and reasoning tasks.

The experiments demonstrate that their approach substantially surpasses baselines in preserving long-range memory. Qualitative results, as shown in supplementary figures (e.g., S1, S2, S3), illustrate that LSSVWM can generate more coherent and accurate sequences over extended periods compared to models relying solely on causal attention or even Mamba2 without frame local attention. For instance, on reasoning tasks for the maze dataset, their model maintains better consistency and accuracy over long horizons. Similarly, for retrieval tasks, LSSVWM shows improved ability to recall and utilize information from distant past frames. Crucially, these improvements are achieved while maintaining practical inference speeds, making the models suitable for interactive applications.

The paper “Long-Context State-Space Video World Models” is available on arXiv.

The post Adobe Research Unlocking Long-Term Memory in Video World Models with State-Space Models first appeared on Synced.

🔗 Source: syncedreview.com


📌 OpenAI’s New AI Web Browser Is a Bit of a Mess

Earlier this week, OpenAI unveiled an AI browser, dubbed Atlas, which is built around its blockbuster AI product, ChatGPT.

“A browser built with ChatGPT takes us closer to a true super-assistant that understands your world and helps you achieve your goals,” the company boasted in its announcement.

Thanks to an “agent mode,” the browser can complete entire tasks, such as booking flights or buying groceries online, a process OpenAI engineers quickly dubbed “vibe lifing.”

It’s not the first time an AI company has attempted to shoehorn AI chatbot functionality into a web browser. Atlas joins the likes of AI startup Perplexity’s Comet and Google’s AI model Gemini, which Google has baked into its ever-popular Chrome browser.

But given early adopters’ experience with the new tool so far, OpenAI has its work cut out to justify the existence of its newfangled browser — and that’s not to mention the glaring cybersecurity concerns experts have highlighted.

And for a company that’s planning to spend over $1 trillion in the next year to build out enormous data centers to support its AI operations, it’s not exactly a confidence-inducing product launch. As The Verge reports, Atlas’ functionality leaves a lot to be desired.

“The immediately obvious problem is that ChatGPT simply doesn’t feel like an adequate portal to the web,” wrote the site’s Emma Roth, who took the browser for an early spin.

For one thing, ChatGPT’s suggestions “aren’t always relevant.” Roth recounts being served several search results, including local news that wasn’t actually local to her.

OpenAI appears to be aware of how confined this basic functionality is — by making it easy for users to revert to a far more familiar method of searching the web.

“The limited search experience is probably why ChatGPT Atlas includes a link to Google in the top-right corner of each search results page,” Roth quipped.

Other users found that Atlas is heavily restricting many websites, including the New York Times and online banking portals.

Overall, the experience appears almost indistinguishable from similar offerings, which doesn’t bode well. The browser itself is built on Google’s Chromium, an open-source browser project used by a wide swathe of browser companies, including Opera, Arc, and Brave.

“It’s basically a ChatGPT-flavored version of Gemini in Chrome and Perplexity’s AI assistant in Comet, and after a bit of early testing, seems to work about as well,” Roth wrote.

Worst of all, its flagship “agentic mode,” intended to complete entire workflows without interruption, is painfully slow.

Roth asked it to “fill up my Amazon cart with items based on my recent browsing history,” which took an agonizing “ten minutes to add just three items.”

Comet completed the same task in just two minutes — which, for the record, also feels way too slow.

“At times, Atlas struggled with clicking the correct button; it was like watching my toddler feed himself — inefficient but ultimately successful,” Wall Street Journal columnist Nicole Nguyen wrote in a write-up about AI browsers. It took 16 minutes for Atlas to “find flights for a coming trip,” for instance.

Besides an underwhelming user experience, experts have warned of glaring cybersecurity concerns plaguing the crop of AI browsers. Just this week, web browser company Brave outlined major security flaws with Comet, noting how easily it falls prey to “prompt injections,” allowing hackers to deliver hidden messages to an AI to carry out harmful instructions.

How OpenAI will address the issue with its Atlas browser remains to be seen.

“There will always be some residual risks around prompt injections because that’s just the nature of systems that interpret natural language and execute actions,” UCL Interaction Center assistant professor George Chalhoub told Fortune. “In the security world, it’s a bit of a cat-and-mouse game, so we can expect to see other vulnerabilities emerge.”

“The main risk is that it collapses the boundary between the data and the instructions: it could turn an AI agent in a browser from a helpful tool to a potential attack vector against the user,” he added. “So it can go and extract all of your emails and steal your personal data from work, or it can log into your Facebook account and steal your messages, or extract all of your passwords, so you’ve given the agent unfiltered access to all of your accounts.”

Atlas is only the beginning. The browser is an early step in OpenAI’s larger ambitions to build out an entire operating system. Earlier this month, the company introduced apps in ChatGPT and an “Apps SDK,” which allows developers to build their own apps — including “mature” ones — using the chatbot.

More on Atlas: OpenAI Announces Browser-Based AI Agent for “Vibe Lifing”

The post OpenAI’s New AI Web Browser Is a Bit of a Mess appeared first on Futurism.

🔗 Source: futurism.com

