Tech Landscape #359

An abundance of new and updated AI models; a peek into the future; and how synthetic podcasts conquered the world.

Dec 09, 2024

Hello!

By the time you read this I’ll have been away for the weekend; I’m writing it on Thursday evening, so hopefully there weren’t any massive announcements on Friday, because if there were, they’re not in here.

But this is a bumper edition, with lots to cover and little time to do it in, so please excuse the brevity and inevitable sloppy typos.

And with that said, I’m going to get right on with it. Hope you’re well!

a distance shot of a man walking through a London park at night, illuminated by street lamps. it is early winter and the trees are bare. a strong wind is blowing fallen leaves, and the air is damp and drizzly. the wide-angle shot emphasises the small figure of the man and the solitude of the evening. — That time of year. Generated with Midjourney.

Synthetic Images & Video

Amazon announced Nova

The family of AI foundation models includes three versions of a text model (Micro, Lite, and Pro), some with multimodal understanding (image, text, and video), plus the Canvas image generator and the Reel video generator.

aboutamazon.com/news/aws/amazon-nova-artificial-intelligence-bedrock-aws

There are more models coming to the family next year. All of these are largely aimed at developers and businesses, and I haven’t got my hands on them to try them yet, but of course they claim best-in-class performance.

Google released a “private preview” of Veo, its video generation model. cloud.google.com
This is only available by permission, so as with Amazon’s Nova tools I haven’t been able to test it.
Runway announced Frames, a new image model offering high quality and stylistic consistency. runwayml.com
The (cherry-picked) examples look impressive, but it hasn’t rolled out yet so it’s one more thing I haven’t tested. This isn’t the most auspicious start to a newsletter, is it?

Content Creation Tools

Luma revamped Dream Machine from a simple video generator to a suite of creative tools. instagram.com
It comes with Photon, a new image model offering quality and detail at very high speed for fast iteration, and will soon feature Ray 2, a video model that will also be available on Amazon Bedrock.
Stability AI released ControlNets for Stable Diffusion 3.5, enabling Canny (edge) and Depth image guidance, and upscaling. stability.ai
Kling added AI Try-on, to apply new outfits to characters. instagram.com
I’ve tried this! It’s pretty impressive. Funny that AI try-on is one of the first (almost) solved problems of generative AI; high street fashion brand Mango is already using it.

Runway added Character Reference to Act-One, enabling re-dubbing and new facial performances in existing videos. x.com/runwayml
This is quite neat. For example: if you shoot a video and decide you need a different take of a line, you can just get the actor to re-record their part without having to do a full re-shoot. Here’s a silly test I did very quickly:

Hailuo added I2V Live, an image-to-video model specialising in 2D animation. x.com/Hailuo_AI
I’ve tried this too! But I’ll have to show you next week.
Kaiber released an improved version of Video Restyle, using an image to transform the style of a video. instagram.com
Haiper added a keyframe timeline for fine control over generated outputs. x.com/HaiperGenAI
Tencent released the Hunyuan video model, the largest open source video model to date. aivideo.hunyuan.tencent.com
The examples I’ve seen look impressive. It’s available to try, but apparently very slow and will need some tuning before it becomes properly usable.
Nim added camera controls for video generation, and released the model as open source. x.com/nimvideo

This week we got a couple of little glimpses of the future of generative AI: world models. Put another way: turning a single image into a navigable 3D scene. Put yet another way: the holodeck (v0.1).

First came a showcase from World Labs, a new startup led by the estimable Fei-Fei Li. This demonstrates the basic concept with some hands-on demos, and videos that are definitely worth taking a few minutes to watch.

Then Google DeepMind showed off Genie 2, which takes the concept even further, adding objects, characters, and gameplay into the mix.

David Holz of Midjourney has previously said that the company is working on “the holodeck”, and expects to release something similar in the near future.

It’s clearly still very early days, but there’s something in this; if it works as promised it’s going to change creation workflows in a lot of different fields.

Synthetic Audio

Eleven Labs launched Conversational AI, a platform for building interactive voice agents. elevenlabs.io
This would have made it so much easier to build our recent “AI Granny” campaign for O2.
Eleven Reader added GenFM, which generates a ‘podcast-style’ summary of a provided article or document. elevenlabs.io

Eleven Labs’ GenFM is clearly influenced by the Audio Overview feature in Google’s NotebookLM, which must be one of 2024’s breakthrough technologies — it’s inspired several similar features, and even made its way into Spotify Wrapped 2024.

While GenFM is impressive, it’s not quite up to the quality of Google’s tool; it’s a little too perfect, less ‘human’. Check out this very interesting snippet from an interview with the author Steven Johnson, where he talks about how the NotebookLM team added ‘disfluencies’ and vocal patterns to achieve higher realism.

NVIDIA teased Fugatto, a cutting-edge generative audio model. blogs.nvidia.com
No indication of when it will be released, though.

Assistants

Anthropic’s Claude can now reply in a variety of written styles, including one trained on your own style. anthropic.com
OpenAI brought the ChatGPT o1 model out of preview, with improvements to speed and reasoning ability. x.com/OpenAI
There’s also a $200 per month Pro subscription, if you’ve got deep pockets.
X added a Grok button, for AI context on individual posts. x.com/TheGregYang

Social

Threads released more new and updated features: a redesigned tab bar to show all feeds; advanced search, enabling filtering by author or time period; and another ‘fediverse’ update, enabling users to follow people from other (non-Threads) servers. It’s also testing per-post analytics.
Coincidentally, Bluesky now has 24 million users.
Instagram added new features to Broadcast Channels: (optional) user replies, conversation prompts, and metrics / insights. about.fb.com

Metaverse-ish

Roblox launched Party, enabling groups of up to six friends to join experiences together, with text chat available to those aged 13+. corp.roblox.com
Roblox now offers 25% more Robux when purchased outside of mobile app stores. x.com/Roblox
The 30% app store tax is a curse and I’m unhappy that businesses have to work around it.
Microsoft is discontinuing Xbox Avatars from January 9 2025. eurogamer.net

Everything Else

Xreal launched One and One Pro MR glasses, with high-resolution, low-latency video displays and spatial anchoring. prnewswire.com
While everyone is looking at Apple and Meta, Xreal is quietly plugging away to create what are (apparently) genuinely impressive MR glasses.
Humane is spinning off CosmOS, the operating system that powers the Ai Pin. humane.com
The Ai Pin is a flop; I’m not sure the OS behind it is particularly attractive to developers, but maybe.
Nike is shutting down RTFKT, the NFT sneaker platform it purchased back in December 2021. x.com/RTFKT
I never really got it; but then I never really got anything about the NFT bubble.