Tech Landscape #343
New AR creator tools, this week's synthetic video model, and an all-AI social network.
Hello!
The sun finally came out in London and I spent the weekend outside enjoying it, meaning I haven’t got time to write an intro.
So let’s get right on with it. Hope you’re well!
XR / Immersive
The annual Augmented World Expo (AWE) took place this week. There were several announcements of digital tools and new hardware (VR shoes!); here are a few of the most notable.
Snap added GenAI Suite to Lens Studio.
Lens Studio 5.0 came out of beta and brought with it an Assistant for coding help, face and full-environment AI effects, 3D asset creation, and more.
ar.snap.com/blog/genai-suite-lens-studio-5.0
Enabling anyone to create new Machine Learning AR effects is really powerful. Snap has always had the best tools, but being limited to Snapchat has been a big restriction to their usefulness; with the move to make Lenses available on the web, 🤞 this is going to open them up to many more users.
Niantic launched Studio.
The visual editor for web-based XR experiences also includes a web gaming engine, and is available for free in public beta.
8thwall.com/blog/post/170524048722/introducing-niantic-studio
Niantic’s 8th Wall Web XR platform has always been impressive, but required coding skills. A visual editor with real-time previews and collaboration features opens it up to more types of creator.
XREAL launched the Beam Pro, an Android-based “phone-sized tablet” that’s dedicated to running apps in XR. x.com/XREAL_Global
You can run the XREAL glasses from your phone, but the Beam Pro has a hardware stack that’s optimised for the job, and has twin cameras for recording spatial images and video. I wouldn’t mind trying the XREAL glasses.
Synthetic Media
Runway announced Gen-3 Alpha.
The text-to-video model promises “highly detailed videos with complex scene changes, a wide range of cinematic choices, and detailed art directions”.
runwayml.com/blog/introducing-gen-3-alpha/
The model is due to be opened up to everyone “over the coming days”, so for now all we have to go on are the (cherry-picked) samples on the announcement page. But with this, Luma’s Dream Machine, Kling (in China), the forthcoming releases of Google’s Veo, ByteDance’s Dreamina, OpenAI’s Sora, and more, it’s clear we’re stepping firmly into the era of synthetic video.
Hedra is a new video avatar tool, with lip-syncing and video motion from a single image. x.com/hedra_labs
There are other avatar tools which use video that look better, but this is very impressive coming from only an image.

Luma’s Dream Machine gained an extend feature, adding 5 seconds to generated clips (to a maximum of around 1 minute total). x.com/LumaLabsAI
With competitor tools the image quality usually degrades quickly over time, but here it manages to maintain consistency quite impressively. Take a look at this example; it definitely loses some coherence, but not catastrophically.

Midjourney added style and personalisation blending by chaining reference attributes. x.com/midjourney
What I admire most about Midjourney is that, despite being by far the best-quality image model, it’s intent on pushing beyond raw quality to really explore aesthetics.
Eleven Labs launched a Text to SoundFX API, for developers of third-party apps. x.com/elevenlabsio
As a showcase it made this Video to SoundFX app, which analyses frames of videos and generates sounds based on their content. It’s mostly for fun; it doesn’t really sync the sounds at all, so it’s more vibe-y. Google DeepMind’s preview of its own video-to-audio work is really impressive; take a look at the drums example.
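For a sense of what “for developers of third-party apps” means in practice, here’s a minimal sketch of building a request against ElevenLabs’ sound-generation endpoint. The endpoint path, payload field names, and the `xi-api-key` header follow ElevenLabs’ public API conventions, but treat them as assumptions and check the official API reference before relying on them; the key and prompt are placeholders.

```python
# Sketch: constructing (not sending) a request to ElevenLabs' sound-effects
# endpoint. Path, fields, and header are assumptions based on the public docs.
import json
import urllib.request

API_URL = "https://api.elevenlabs.io/v1/sound-generation"  # assumed path

def build_sfx_request(prompt, api_key, duration_seconds=None):
    """Build a sound-effect generation request object for later sending."""
    body = {"text": prompt}
    if duration_seconds is not None:
        body["duration_seconds"] = duration_seconds
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_sfx_request("glass shattering on a stone floor", api_key="YOUR_KEY",
                        duration_seconds=3.0)
print(req.full_url)
```

A video-to-sound app like the showcase would just loop this over captions generated from sampled video frames.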
Also worth reading: Roblox’s Road to 4D Generative AI, “where the fourth dimension is interaction”.
Assistants & Chat
Anthropic launched Claude 3.5 Sonnet.
The update has enhanced reasoning, coding, and vision capabilities; Anthropic claims the midweight 3.5 Sonnet is better than the heavyweight 3.0 Opus. It also added Artifacts, which breaks generated content (such as code) into its own window next to the conversation, making it easier to see and track changes.
anthropic.com/news/claude-3-5-sonnet
I don’t use LLMs to a degree where I can confidently assert that any model is better than others, but chatter on social media has been very positive so far. The Artifacts feature is most intriguing; a glimpse into the future?
Somewhat related to the above, this is an excellent read:
A new AI companion app called Dot launched on the App Store. fastcompany.com
It’s nice enough, but there are a bunch of competitors out there doing this already, and in my (limited) testing I haven’t seen anything that distinguishes it. It could benefit from a full voice interface, IMO; long text chats aren’t very interesting to use.

Genspark is an ‘AI agent engine’ that generates custom information pages (“Sparkpages”) from user queries. mainfunc.ai
This is very much like what Arc and Perplexity already offer, and what Google has announced; it also handled the first few requests I gave it poorly (Fitzrovia is not in South East London). Unless it can improve quickly and find a niche, I don’t see a future for it.
AI & Social
TikTok announced Symphony Avatars.
The new feature allows users to create and animate personalised avatars, either stock or custom, for videos.
newsroom.tiktok.com/en-us/announcing-symphony-avatars
This isn’t a new technology; tools such as Synthesia and HeyGen have been offering it for a while. But adding it to one of the largest social media platforms is going to mainstream it very quickly. All generated videos will be labelled as “generated with AI”.
In general I feel the requirement by most social networks to disclose media made with AI is reactionary, unnecessary, and destined to lead to situations where photos containing small edits with Generative Fill in Photoshop get flagged, which is counter-productive. The broad application of the label is a concession; a shield against a moral panic.
It’s absurd because there’s no requirement to disclose synthetic images made with other technologies. If I make an image that’s intended to mislead, what does it matter if I make it in Midjourney, Unreal Engine, or Photoshop? YouTube gets this right, requiring disclosure only when “content a viewer could easily mistake for a real person, place, scene, or event is made with altered or synthetic media, including generative AI”. Symphony Avatars clearly fit this definition.
Butterflies is a new ‘AI social network’ where you create various “butterflies” (synthetic personas) and they all interact with each other. x.com/butterflies_ai
I’ve been trying this out for a few days and I don’t really get what it’s for. Maybe it’s not intended for me, which is fine; I also don’t understand the attraction of things like character.ai, but they’re very popular (this interview with the founder makes reference to that).

Messenger is adding AI chatbots for businesses, and paid marketing messages. facebook.com
Chatbots are back! It’s 2017 all over again! Except, maybe, this time the technology can live up to the promise. But I doubt this is coming to Europe imminently; Meta has delayed bringing its AI tools here following intervention from the EU.
Messaging & Social
YouTube is experimenting with ‘community notes’ for users to add contextual notes and fact-checking to videos. blog.youtube
Community Notes is the best innovation from X (Twitter) in years. There’s an argument that it delegates labour to volunteer users rather than to the platform, but you could say the same thing about Wikipedia.

Threads launched its API, for third parties to view, publish, manage, and measure their own content. developers.facebook.com
The API should enable more brands and creators to cross-post to the platform, as well as adding tools like post scheduling. It doesn’t allow for building rival interfaces, though.

Instagram Live streams can now be for Close Friends only. threads.net/@metanewsroom
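To make the Threads API item above concrete for anyone thinking about a scheduling tool: publishing goes through a Graph-style two-step flow (create a media container, then publish it). The sketch below only builds the request URLs; the base host and parameter names follow Meta’s published Threads API docs as I understand them, and the user ID and token are placeholders.

```python
# Sketch of the Threads API's two-step publish flow: step 1 creates a text-post
# container, step 2 publishes it. Endpoint shapes are assumptions from Meta's
# Graph-style docs; USER_ID and ACCESS_TOKEN would come from OAuth.
from urllib.parse import urlencode

GRAPH = "https://graph.threads.net/v1.0"  # assumed base host

def container_url(user_id, text, token):
    """URL to POST for creating a text-post container (step 1)."""
    params = urlencode({"media_type": "TEXT", "text": text, "access_token": token})
    return f"{GRAPH}/{user_id}/threads?{params}"

def publish_url(user_id, container_id, token):
    """URL to POST for publishing a previously created container (step 2)."""
    params = urlencode({"creation_id": container_id, "access_token": token})
    return f"{GRAPH}/{user_id}/threads_publish?{params}"

# Step 1's response carries a container id, which step 2 consumes as creation_id.
print(container_url("1234567890", "Hello from the API", "ACCESS_TOKEN"))
print(publish_url("1234567890", "<container-id>", "ACCESS_TOKEN"))
```

A scheduler would simply hold the container id from step 1 and fire step 2 at the chosen time.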
ByteDance launched Whee, a new app focused on private photo sharing with close friends. androidpolice.com
Currently in testing on Android in select countries. This looks quite a bit like Instagram, and quite a bit like TikTok Notes. Is this just experimentation, or is it ByteDance hedging its bets in case of an eventual ban in the US?
Also of note: the Governor of New York signed a bill that bans “addictive feeds” (in other words, algorithmic recommendation feeds) for social media users aged under 18. Is there any evidence of harm? No. But we’re in the middle of a moral panic, and if you keep repeating the word “doomscrolling” enough this is what happens.