Tech Landscape #342

Cranking up the Dream Machine, the hope of open image models, a possible exclusive(?), Meta adds fuel to my theory, and more.

Jun 17, 2024

Hello!

I’ll tell you what, it was quite interesting (considering recent discourse) that Apple said it trains its AI models on data scraped from the public web and nobody batted an eyelid.

Anyway, I’ve written quite a lot of context and reaction in this edition and I’m all out of energy, so let’s get right on with it.

Hope you’re well!

Synthetic A/V

Luma launched Dream Machine.

The new video generation model can quickly (around two minutes) produce up to five seconds of high-quality video, with movement and physics that beat most rivals.

twitter.com/i/status/1801116605457289345

This is a step beyond existing services such as Runway and Pika, and is approaching the quality of OpenAI’s Sora (from what we’ve seen of it). It works especially well when using images in the prompt. Luma was previously known for its 3D spatial rendering software (using NeRFs), and this shows in Dream Machine’s generations of physical spaces, as you can see in this ⬇️ video.

I would add that, as impressive as this looks, I still think we’re some distance away from generative video being used in narrative stories; it’s hard to direct, to guarantee quality, to maintain consistency between generations, and to make small changes. In order to get usable results you need to work within its strict parameters.

Kaiber added Motion Brush, to control the output of generated video. threads.net/@kaiber.ai
Krea’s Video Enhancer now supports higher resolution and 120fps, and is open to all users. x.com/krea_ai
Suno added Audio Input, which extends user-uploaded sounds into full songs. x.com/suno_ai_
As with Udio’s version (in last week’s newsletter) it could end up being really useful for musicians who want to turn a hook or loop into a full song and experiment with different styles quickly. But expect legal challenges.
Suno also launched Radio, which uses song similarity to create an infinite playlist. x.com/suno_ai_

Synthetic Images

Stable Diffusion 3’s open model is now available.

The Medium version of the model, which optimises for speed and performance and runs on consumer hardware, is free for non-commercial use, with the price increasing for creators and businesses.

stability.ai/news/stable-diffusion-3-medium

Here’s the reason this is important: Stable Diffusion is a decent enough model, but more critically it’s an open model. Previous versions have been released for anyone to use; that has directly led to incredible experimentation and innovation from the synthetic art community, and is the foundation of several businesses (such as Leonardo ⬇️) which customise and fine-tune the foundation model to their own needs (and quite often the fine-tuned versions are better than the original).

Without Stable Diffusion, every image model would be owned by companies which restrict its use; the pace of innovation would be dramatically slower.

But: it costs millions to train a foundation model, and the investors in Stability AI would like to see some of their money back. So the company is trying to thread the needle by reserving the most powerful versions of Stable Diffusion 3 for paying customers (either through Stable Assistant or its API), and releasing the less powerful but more accessible Medium version with a license that permits non-commercial experimentation.

Can it work? I hope so. Open models are necessary.

Leonardo introduced Phoenix, a new image model (based on Stable Diffusion XL) with improved quality, prompt understanding, and text rendering. threads.net/@leonardoaiofficial
Phoenix is still technically in preview, and while it’s impressive in some aspects (especially prompt adherence) it’s not always an improvement on the current Kino XL model. Here are a few quick tests I ran.
Midjourney added Personalization, which uses your preferences to influence image generation towards your own taste. x.com/midjourney
This is really interesting; it’s a step towards a new frontier of image synthesis where creators can better express their own taste even if they don’t know how to express it in words. You can see how mine manifests in the side-by-side comparison ⬇️.

Side-by-side comparison of two images of a Latina ballerina in a studio; the personalised image on the right has a softer, paler look to it. — Midjourney default (left) and with my personalised style (right).

Every time you write a prompt there's a lot that remains ‘unspoken’. Our algorithms usually fill in the blank with their own ‘preferences’, which are really the combined biases and preferences of our community. But of course everyone is different! Model personalization learns what you like so that it's more likely to fill in the blanks with your tastes.

Shutterstock and Databricks partnered to launch ImageAI, a new text-to-image generation model trained on Shutterstock’s image library. prnewswire.com
Getty Images and Picsart are partnering on a new image model, offering commercially-safe AI image generation from licensed content. picsart.com

These ‘safe’ models, trained on licensed data and with copyright mitigation in place, are absolutely critical to doing anything for risk-averse brands; the problem is, they’re still not good enough. That’s the consequence of using a limited data source.

Exclusive? I’m not 100% convinced it is, but I haven’t seen anyone else report it: CapCut (owned by ByteDance, parent of TikTok) is launching its own image and video model: Dreamina. dreamina.capcut.com

It’s currently available in China (under the name Jimeng), but is new to everywhere else. Closed, invite-only test at the moment; visit the link ⬆️ to request access.

Social AI art app Remix rolled out a major redesign, with Magic Camera for AI selfies, a choice of 20 image models, and more. threads.net/@getremixai

Social & Messaging

Posts shared from Instagram to Threads now show media inline rather than as a preview card. threads.net/@tb_99999
If you’ve been reading my newsletter for a while you’ll know that I have a theory that Meta is going to make Instagram a full-screen video-first app, like TikTok, and that Threads is intended to be not a simple X (Twitter) rival, but the new home of the Feed. I think this is more evidence of it. Have a look ⬇️ at the post I added on Instagram (right) then sent to Threads (left); I think the carousel might actually be more effective on Threads because you can see more of it.

A carousel of images in Threads, and the same carousel in Instagram — A carousel posted to Threads (left) from Instagram (right)

Follow me on Threads (@stopsatgreen) and Instagram (@techlandscape).

X made all Likes private, so you can no longer see what people get up to. x.com/XEng
This is, on balance, probably a net good as it closes off an abuse vector; but you just know that the main reason it was implemented is to hide Elon’s interactions with far-right wingnuts.
WhatsApp enhanced calling, adding screen sharing with audio, larger group video calls, and improved quality. blog.whatsapp.com
Google users in Brazil can use WhatsApp to contact businesses on Search, where they will also be able to directly schedule beauty and doctor appointments. blog.google
BeReal has been bought by games publisher Voodoo to “further accelerate its diversification into consumer apps”. blog.voodoo.io
40 million active users, €500 million valuation. I still don’t believe this has much of a future, but I wait to be proved wrong.

Gaming & Metaverse-ish

Horizon Worlds is fully expanding to mobile and Web, with the exception of worlds that are members-only or contain in-world purchases. meta.com
PS5 gamers can now join Discord calls directly, making it easier to connect with friends during gameplay. discord.com

Fortnite’s next big music event features Metallica, taking place on 22 June, with events and items available in-game now. BBC Newsround published Who are Metallica and why are they in Fortnite? which a) made me feel very old, and b) curiously didn’t mention the use of one of their songs in the Season 4 finale of Stranger Things.

The Monstercat record label is launching a Roblox experience. Monstercat is music provider to Epic Games’ Rocket League and has previously collaborated with Fortnite on in-game music and an event featuring Kaskade.

Cricket’s governing body The ICC launched ‘fan zones’ in existing popular sporting experiences.

Renault launched a series of mods to put its new Renault 5 E-TECH in popular gaming titles, from the usual suspects (Fortnite and Roblox) to the more unusual (Stardew Valley, Stray).