Tech Landscape #420
An image model with a logical brain, ChatGPT gets brainier and less cringe, and perhaps the metaverse isn’t over already?
Hello!
As I write this I’m in a hotel room in Prague, Czechia, with possibly the heaviest cold I’ve had in years, and a looming work deadline. So I beg your pardon but this is going to be as brief as I can make it.
Hope you’re well! (I’m not.)
Synthetic Audio-Video
Luma Labs announced Uni-1
A unified multimodal (image and text) model with a “logical brain” that uses structured internal reasoning for an understanding of time, space, and motion, promising character and scene consistency even across style variants.
I’m very curious to try this. On the surface it seems like Google’s Nano Banana, but the spatial understanding is intriguing; my article The Future of AI Images (previously only for paid subscribers, now available to everyone) is about how physics and causality are currently lacking in image (and video) models, and Uni-1 appears to be an attempt at mitigating it.
It will launch soon as the engine of Luma’s new Agents feature, an AI orchestration platform that uses various AI tools to handle a creative project from start to finish while maintaining context using Uni-1. instagram.com/lumalabsai
This is one of the futures of creation: set your goals, let the agents sort it out.
Lightricks released LTX-2.3, an updated video model featuring sharper details, improved prompt understanding and image-to-video, and support for 1080x1920 portrait video. It’s available under an open source license, and there’s a free LTX Desktop video editor available for macOS and Windows. ltx.io
It’s not great (to put it mildly) when used on its own, as you can see in the few tests ⬇️ I ran; visual quality is decent, but motion and physics are way off. But as an open source model it can be combined with other tools and workflows to improve or extend its output. Companies seem to be backing away from releasing open source video models, presumably due to the cost of training them, so it’s good to have a company still committed to it.
Utopai Studios announced PAI, an AI model “built for long-form video storytelling, with a focus on continuity across scenes, narrative structure”. utopaistudios.com
A gaming startup turned AI content studio is now releasing its own video model. I’ve no idea how good this might be; I’m on the waitlist to find out.
Kling released VIDEO 3.0 Motion Control, a full-body motion transfer model (image plus video to video) which now lets you upload image references of a face from multiple angles for improved consistency. app.klingai.com
I ran a couple of quick tests ⬇️ and the motion and camera tracking are very good, lip-syncing a little less so. But I need to spend some more time with it.
Viggle launched V4, adding 3D world architecture to its motion-transfer model for more complex high-speed actions and improved character stability. viggle.ai
The original Viggle was a real ‘wow’ moment; now it sort of feels like a feature that a lot of other models could copy, and have copied.
MiniMax unveiled Music 2.5+, adding an instrumental mode to its AI music model. minimax.io
Creative Tools
Krea added Voice Mode to its iPad app. instagram.com/krea_ai
This is one of the other futures of creation: sculpting with natural language.
Grok Imagine can now extend videos from any chosen frame. x.com/grok
Autodesk Flow Studio added Wonder 3D, which generates fully editable and textured 3D assets from text or image prompts. blogs.autodesk.com
Metaverse-ish
Pico announced OS 6, an upgraded XR operating system with redesigned UI, cross-platform mixed 2D and 3D apps, and support for many types of input including ‘look and pinch’. It also teased Project Swarm, a new flagship headset with 4K low-latency displays. youtube.com
I honestly thought that ByteDance had abandoned Pico; the XR headset market isn’t exactly prospering (has Samsung even released the Galaxy XR anywhere outside of the U.S. yet?). But moving to a cross-platform (OpenXR, WebXR, PCVR, Android) OS is a shrewd move.
Roblox added real-time chat rephrasing, filtering profanity into more respectful language. about.roblox.com
Google announced global Play Store changes, opening Android to third-party app stores and alternative payment systems with reduced developer fees. android-developers.googleblog.com
This was driven in large part by the dispute with Epic Games, which is now settled. The two companies are working together on ‘metaverse browsers’, whatever they are.
8th Wall’s WebAR engine transitioned to open-source, making its core architecture and augmented reality feature modules freely available on GitHub. 8thwall.com
Social & Messaging
X launched Creator Subscriptions 2.0, adding Exclusive Threads that are locked to subscribers, plus tools for promotion and management. x.com/XCreators
X is launching a standalone Chat app, available to test on iOS. x.com/mjboswell
If Musk’s concept for X is “the everything app”, why launch a separate app for this?
X is testing its Money service by letting William Shatner auction exclusive beta invites. techcrunch.com
Mastodon added a new Share Button, making it easier to post external content into the platform. blog.joinmastodon.org
I say “easier”, but it’s still a little more complex than sharing to literally any other social platform.
Assistants & Search
OpenAI released GPT-5.4 Thinking, with a focus on agents in professional workflows (especially finance) and built-in “computer use” capabilities for interacting directly with software, and GPT-5.3 Instant with improvements to everyday conversation (“less cringe”).
Google added Canvas to AI Mode in Search for all users in the U.S. to “draft documents or create custom, interactive tools right within Search”. blog.google
AI Mode is Gemini in disguise.
Google’s NotebookLM added Cinematic Video Overviews to transform research notes into fully animated narrated videos. blog.google
MiniMax Agent added MaxClaw, a personalised autonomous agent in the mobile app. x.com/MiniMaxAgent
Qualcomm announced Snapdragon Wear Elite, a processor platform designed for AI-powered wearables with on-device computer vision and voice interaction. qualcomm.com


