The Golden Age of Tinkering, Reachy Mini on Qwen+Cerebras, Compute Predictions, Fuzzing Artifacts, Flatter orgs

Matt (00:01.482)
Whoa!

Wilhelm (00:03.05)
Whoa, we're back. Wait, is it recording? It must be. no, he died.

Matt (00:10.233)
Wait, what?

Wilhelm (00:12.035)
The robot.

Matt (00:15.864)
Bringing back to life.

Wilhelm (00:17.848)
Britchie, come back to life. What happened to you?

did I hit the off button? I think I hit the off button. There's a little off button on the back. He came last week and I've been playing around with, he's booting up again. It takes a while. He's powered by a Raspberry Pi.

Matt (00:29.849)
I'll turn them on.

Matt (00:38.617)
Goooo

Wilhelm (00:40.59)
It's actually really fun to assemble this. I mean, I've been looking forward to this guy arriving for months. But you actually have to assemble it yourself, and it takes like three hours or something.

Matt (00:51.843)
my god, I'm jealous

Wilhelm (00:54.776)
But it's really fun. then, yeah, you can just hack around with it. Come on, Richie, come back to life. He'll jump up in a second.

Matt (01:02.669)
Well, while you're doing that, I turned Gladstone back on again because he was broken.

Wilhelm (01:11.737)
yeah, I noticed that.

Matt (01:13.73)
Yeah. Look at this.

Wilhelm (01:16.846)
Do you agree that you're very cute, Richie?

Can you hear it? Are you here, Richie? now he's here. Hey there, ready for some fun? It's, so it's interesting. I'll... Can you hear it? I mean, it's speaking, but maybe you can't hear it. Richie didn't expect a talking robot on your desk, did you? It has some sass. I don't know if you can hear it.

Matt (01:33.005)
He has them.

Yep.

Matt (01:42.52)
Yeah, I can hear it. That's kind of weird.

Wilhelm (01:50.499)
you don't like my robot.

Matt (01:52.984)
I don't know. I don't know. I expected him to have a cuter voice.

Wilhelm (02:01.358)
Oh, I don't like his Kiwi accent. Wait, let me turn him off because he's just actually, I think I can mute him. I think I can mute him. Okay. And then he won't interrupt while we're talking. He'll just keep looking around. Maybe I can have him in the background somehow. No, I don't know if that's possible. Yeah, I really wanted.

Matt (02:11.646)
He's so funny.

He's so funny.

Wilhelm (02:25.388)
to use all the 11 Labs voices with him. So you have to build a bit more yourself to make sure you can use... Otherwise you can just use the GPT real-time model or Google has one, but then you don't have as much customizability on the voices. But yeah, I don't know where to start with this. It's like, it's...

Matt (02:28.152)
Hmm?

Matt (02:40.577)
Yep.

Matt (02:48.791)
Yeah, I don't know about dogs, it's like, it's...

Wilhelm (02:51.946)
It's extremely experimental. You kind of have to build all the software yourself. even all the, well, I guess not all the software. I'm sure there's like some SDK, but let's just say even like codex 5.5 extra high running for like three hours, output something that's like barely usable. So you have to like kind of, I don't know, figure out your way around that or whatever to make, to just get to like a conversational robot that like kind of.

Matt (02:53.112)
You

Matt (02:56.759)
Okay.

Matt (03:17.687)
Next, we'll just get to the conversation.

Wilhelm (03:21.938)
bobs its head around a little bit and moves its antenna and like it kind of feels natural and it doesn't feel completely like like artificial and then I haven't even integrated the camera stuff yet that's all still to come.

Matt (03:22.612)
All right.

Matt (03:34.881)
Dude this is so cool. So when Gladstone says, Wags's tail, you could create like a, like Wags's tail emoji and you could make it as like a head bob.

Wilhelm (03:39.363)
Mm.

Wilhelm (03:47.126)
Yeah, exactly. So it has all these emotions that you can trigger and I'll show you more screen riches.

Matt (03:54.657)
So could you do it as like the, the, could you map like emojis? Cause the model's really good at like outputting emojis, like in its, in its sentences, could you map emojis to like, to its voice patterns? So it doesn't say the emoji, but the emoji triggers like, like something.

Wilhelm (04:02.19)
interesting.

Wilhelm (04:09.55)
Yeah, yeah, yeah. That sounds smart. There's some people who've built so that you can see the way it works now is like with these like angled brackets in the output. And then there's someone who set up all these different emotes because the robot, can like move its body and then it can like shift its head around like in all these ways. And then has these two antenna and it can like move the antenna up.

Matt (04:29.175)
This is so cool!

So it is a Raspberry Pi.

Wilhelm (04:35.926)
It's a Raspberry Pi inside of it, yeah. Or that's for the wireless version at least. So the wireless version has a battery and a Raspberry Pi and the wired version, I think, has no computer in it. So it's just linked to your computer directly.

Matt (04:47.764)
And so you could theoretically run like Gladstone on it, right?

Wilhelm (04:52.576)
Yeah, yeah, yeah, yeah. it's fun as well because this made me finally check out some of the open source models because I guess what you want the most.

Matt (04:59.125)
Mmm.

Wilhelm (05:07.414)
like to have a natural interaction is you want it to be really fast, like really, really fast. So even, and then even with Haiku, like even Haiku, like the latest Haiku is just like not fast enough basically to have like a normal sounding, like you say something to the robot and then it waits like three seconds and then you get the whole thing back. So I tested out a bunch of them and I ended up running.

Matt (05:11.146)
Yeah.

Wilhelm (05:31.046)
quen on cerebrus for, the inference. And then I think it's the 11Labs speech to text model on the, on the front end or on the beginning of the flow, which I think is 11Labs scribe V2, then quen on cerebrus, which is really, really fast. And actually what you care about is not even just tokens per second, but time to first token.

Matt (05:36.031)
Hmm

Wilhelm (05:56.47)
And then you have the 11 Labs text to speech model, the voice model on the other end. So that's the pipeline for this at the moment. I haven't tried Kimi yet, although I think it takes a bit longer to get to time to first token or something, or maybe the output isn't as quick. I forget. But then obviously that can do tool calling and stuff. And I think it's less conversational and more like an actual agent, which will be fun.

Matt (05:57.205)
Bye.

Matt (06:18.294)
station.

Matt (06:23.508)
very cool dude I'm actually so jealous like this like local LLM thing I don't know I feel like it's coming back a little bit with ad-thopic restricting like the harnesses that you can use the subscriptions on I think people are like re- at least I'm like rethinking the hardware that I already have in my house I'm like oh could I run could I run something on this is it possible it's like maybe

Wilhelm (06:45.761)
Hmm

Wilhelm (06:50.176)
Yeah, yeah, that's interesting. Yeah, that's fascinating. Anyway, man, wait, we need to roll the intro, and then I need to, we haven't spoken in forever, so I need to ask you how you're doing.

Matt (06:52.84)
Yeah.

Matt (07:09.654)
Dude I'm just looking your mate, can we say his name on the pod?

Wilhelm (07:17.925)
sure.

Matt (07:18.645)
the guy who's on the sleep report.

Wilhelm (07:22.75)
Lucas. Yeah, yeah,

Matt (07:23.807)
He's absolutely annihilating his 14 day average is 91.6 % and for anyone who's listening, basically it's actually insane. So I think this is a random number generator and it's ridiculous. So Will and his friend Lucas, is Lucas even on the Discord server or does he just send us his?

Wilhelm (07:45.128)
No, I told him to join, but he said he's too old for Discord and I should just keep him posted. Sorry, he doesn't want to join the Discord.

Matt (07:50.133)
Okay, so we have a Discord server. So we have a Discord server and Lucas is just sending, Lucas's agent is sending my agent Lucas's sleep data, which my agent plots out every day. Well, my agent fetches them from a centralized server. Friends.FYI. And so, yeah, so I have mine, Will's, and Lucas's sleep data. And Lucas, the absolute killer.

Wilhelm (08:10.296)
Shout out.

Matt (08:18.453)
He's 91.6 % average over the last 14 days. Like I want 84 and you dude, what have you been up to? You're an 80. You're an 80.

Wilhelm (08:18.958)
Okay, so...

Wilhelm (08:26.976)
I want... well, wait a second. At least I've been wearing my watch consistently.

Matt (08:32.238)
I have also been wearing my watch, my garment just wasn't sinking.

Wilhelm (08:35.31)
Oh, I see, Wait, did the colors change? No, the colors didn't change. Yeah, I had a couple rough nights actually. Quite personal that you know that. And I have to justify my rough nights. Don't know why.

Matt (08:36.998)
It backfilled the data. It backfilled it.

No, but I'm still blue.

Yeah.

Matt (08:51.635)
Yeah, you really do. You had a 52, which is pretty, pretty bad.

Wilhelm (08:56.098)
that was after a night out. That was after a night out. I think that was expected. That was expected. But we recovered.

Matt (08:59.335)
Yes, sir.

Yeah, and then I had a 57 like a few days later, which was also really bad. I think that was when I was an AI engineer and dude, I was just so sleepy the last day. It was really rough.

Wilhelm (09:13.144)
just living the life. No, my theory on Lucas, to be honest, I mean, this guy, he's like a startup founder, CTO kind of guy, right? Used to working like really, really hard, optimizing the numbers. And then his startup got acquired. And think now he's just looking for new numbers to optimize. So he's found this one.

Matt (09:34.74)
okay, so he's like, okay, I was like, this is you didn't tell me any of this backstory. I just thought I was being absolutely shamed by some randomer. Okay, he's like fully into this. Okay.

Wilhelm (09:38.146)
He has like an eight sleep set up at home. I think he's like, he quit drinking like years ago and he's trying to be like really healthy.

Wilhelm (09:52.131)
Ha

Well, I think he cares about his sleep score, yeah. And that's why he was so keen to join.

Matt (09:57.647)
Okay, I thought I was being shamed

because like an average like he's getting like 94, 95 all week that's pretty intense.

Wilhelm (10:08.428)
Yeah, that's pretty good. I think he's also, you know, one thing I noticed is when I do a little bit of training, I sleep better as well. So like when you, I mean, it makes sense, right? That like when you physically exhaust yourself, then you sleep better. So I've...

Matt (10:10.238)
It's really good.

Matt (10:15.699)
Hmm.

Matt (10:20.376)
Yeah.

That's not true. That's not true. If you go too hard, you have much worse sleep.

Wilhelm (10:30.316)
I mean, okay,

Matt (10:31.773)
because you spike your courses or levels and then you end up with like funky sleep hands.

Wilhelm (10:37.346)
But this is if you exercise too close to going to sleep, right? This is like the...

Matt (10:41.062)
No, no, no, no, no, no, no. This is like, if you do like a big triathlon training camp or whatever you, whatever you feel like you're doing and you say you ride safe, like for argument's sake, you ride some ridiculous like, I don't know, 600 kilometers in a week and you ran like, I don't know, 50 K or something ridiculous. So something like really, really big numbers, like probably pro athlete numbers. If you went from zero to that guaranteed by the end of that week, you're not sleeping properly.

Wilhelm (11:10.796)
Hmm, I mean interesting. feel like just for me, whenever I did all the, I used to do way more triathlon training, like something like 15 hours a week or something, and I would sleep like a baby every night, like no sleep problems at all.

Matt (11:21.907)
Yeah, but 15 hours is okay. I think if you try to do like 50, I think of training you would...

Wilhelm (11:25.943)
Right.

of training. I mean, think even pros do like 20, 25, right?

Matt (11:34.042)
Nah, they don't. Nah, they do way more. They do way more, They do way more.

Wilhelm (11:39.105)
I mean that's what the pro triathletes tend to do I think. I think the top ones, the best triathlete in the world, think he has a crazy, unusually high, but when he does 35 or something, or he does 30.

Matt (11:50.492)
Okay, but this, okay, no, no, no, so they're out.

Okay, there were, okay, so there were...

is okay this is a bit weird because like if you look at like super in like um if you look at like trail runners for instance they'll be on their feet for like a huge amount of the day like uh like kylie and johna will be like like super successful trail runner he'll go out and he'll go like ski touring for like eight hours and he won't even consider that training really he'll come back and then he'll do his like sprints so i guess like i guess that's i'm saying yes sure that's not training but like that

Wilhelm (12:11.768)
Mm-hmm.

Wilhelm (12:17.08)
Mm-hmm.

Wilhelm (12:21.486)
Sure, sure, sure. Yeah, yeah.

Matt (12:29.579)
very easily could do 50 hours of activity in one week.

Wilhelm (12:32.79)
Yeah, I mean, I think all I'm saying is basically that I've noticed for me on days where I don't do exercise, I'm not as tired so I don't sleep as well. I wake up more often in the night. And then on days where I like have done like a run or something, I sleep better. So I think that's showing up in my data as well.

Matt (12:41.955)
definitely not. Yeah.

Matt (12:50.61)
Am I looking tanned? You can't really see because I've got like such white on my background.

Wilhelm (12:56.172)
You are looking, yeah, what's been going on? Are you spending a lot of time outside? you had a week off.

Matt (13:00.531)
Yeah, I had a week off. Yeah, last week. That's why we couldn't do pod because I was like so busy. Like all of the times where you wanted to do the pod is like my evening, my afternoon, late afternoon, evening time.

Wilhelm (13:12.51)
Yeah, yeah, it's like prime surfing weather probably.

Matt (13:16.114)
a prime kiting and winging weather because the wind picks up in the late afternoon and then it's like basically around till sunset. So like now for instance there is absolutely pumping wind outside but I'm saving myself. I've a wind yesterday.

Wilhelm (13:19.341)
Hmm.

Wilhelm (13:30.03)
Brutal, brutal. Okay, wait, so they walk me through your week off.

Matt (13:31.494)
Thank you.

We got, what did we do? Did a bunch of kiting, bunch of winging, explored around Portugal a little bit, went up to Obidôs, which is like a, there's a lagoon, but there's also the town of Obidôs. It's actually a crazy town. It's really well known for two things, I think, or at least two things that I know of. There's probably more, but the two things are, a tiny town, like really, really small, like old, like fort town thing on a hill, like everything in Portugal.

It's number two things. There's a chocolate festival, which they have every year and down all down the main street, pretty much every shop will serve you chocolate, a chocolate cup with a little shot of cherry liqueur in it. It's like a thing. And the other thing they're really known for is they do this. They have this like medieval festival where they do jousting on horses. Like

Wilhelm (14:07.278)
nice.

Wilhelm (14:19.182)
That sounds great.

Wilhelm (14:27.074)
No way.

Matt (14:28.709)
proper jousting and you can rent like the outfits and like go hang out and it's apparently a big vibe.

Wilhelm (14:33.534)
No way. Okay, and so you did, and they were both on at the same time, I hope, the Chocolate Festival and the jousting while you were there.

Matt (14:38.042)
No, God no. No, no, they weren't. no. That was, it was way less interesting when we were there. But I went for the kite into the lagoon, because the lagoon is beautiful. It's really, really stunning.

Wilhelm (14:47.466)
yeah, okay, I see, Mate, you need to send me videos of you doing all this water sports stuff.

Matt (14:53.874)
Oh, I can show you a little one now. I've got a... Because I've actually just ordered some new kit, but when you come and visit... Yeah... You can come to this lagoon next to where I live. And the video is taken through a tree, but look how pretty it is! Little jibe. Look at that. That's me.

Wilhelm (15:02.894)
That's, yeah, that. wow.

Wilhelm (15:12.376)
Look at that. yeah. That's beautiful. That's beautiful.

Wilhelm (15:20.962)
That's amazing. Is that you? Wait, is that a... Are you windsurfing or kitesurfing there?

Matt (15:28.295)
That's a wing. It's a wing foil.

Wilhelm (15:31.648)
It's a wing foil.

Matt (15:34.427)
Yeah, there's so many sports, Yesterday, yesterday I yesterday I went kiting and I also went winging and I went kiting on two different boards because I went twin tipping and then the wind died off and I went on the surfboard for some waves and then I went winging afterwards. It was so fun.

Wilhelm (15:36.75)
They are so funny.

Wilhelm (15:56.032)
living the life and now you're back at work with the Postgres cap.

Matt (15:57.935)
Yeah. Yeah. Now I'm back at work. yeah. This is planet scale cap. It's actually, I think it's kind of cool. And also it's like hiding my very salty hair.

Wilhelm (16:08.718)
Mmm. Do you, you should put all this stuff on Strava, I feel like I see Juliet's stuff on Strava, but I don't see yours and that's a real shame.

Matt (16:14.265)
Yeah.

Matt (16:18.713)
She lets me into putting her stuff on show. I like, I don't know. And maybe if I do some like training. Yeah. I think that there is a point where it's, stops being like casual and it goes like training. And I think I'm going to do some of that soon because I actually have two quite big things coming up. I have a wing foil race in Portimao. Yeah. I entered a race, which is going to be my first water sports race since I rage quit in 2020.

Wilhelm (16:31.544)
sure.

Wilhelm (16:40.717)
No way.

Matt (16:48.528)
It's quite exciting. Yeah, so that's like, it's a World Series race. I'm going to get absolutely annihilated. It's gonna be very fun. So yeah, I've got that in Portemount on the 24th of May, 25th of May, like it's a couple of days in May. And then at the end of June, I've got Marathon du Mont Blanc, which is like a trail running race with two and a half meters of elevation, two and a half thousand meters of elevation.

Wilhelm (16:48.608)
No way. That is very exciting.

Wilhelm (16:59.661)
When is it?

Wilhelm (17:17.654)
A marathon?

Matt (17:19.031)
A marathon, yeah, like 43, 45k.

Wilhelm (17:22.35)
You're doing a trail marathon with 2000bhp elevation in... What did you say? In June? That's very soon,

Matt (17:26.81)
Two and a half thousand, that extra half thousand hertz. Yeah, I'm telling you.

Matt (17:33.635)
Yeah, and the dune.

Yeah, I'm quite excited.

Wilhelm (17:39.98)
Wow, I don't think I could do that.

Matt (17:42.607)
Yeah, you could. Do you wanna come?

Wilhelm (17:45.23)
bad. It's very tempting. I have actually booked all my summer travel now, and it is a bit of insanity, it's exciting. I'll be in Europe in early June and then in end of August.

Matt (17:56.079)
when you come to Europe.

Matt (18:03.407)
Okay, have I sent you my latest? So I did some vibe coding with Julia on the weekend. Did I send you it? No, I didn't send you my little piece of vibe coding. Okay, I'm gonna send you it right now and you can open it. want your reactions. Yeah, don't necessarily share the link that much.

Wilhelm (18:09.582)
Hell yeah, I don't think so.

Wilhelm (18:17.516)
my reaction.

Wilhelm (18:24.096)
as in don't read out the URL on the pod.

Matt (18:27.19)
Yeah, I mean, people can find it if they really want it, but it's a weird.

Wilhelm (18:32.159)
Hahaha!

I know that I've seen what it is. Anyway, it would be a little bit weird, Aw, that's beautiful, man. That's so, ooh, I like the fonts. I like the fonts.

Matt (18:50.22)
I thought you'd like the fonts.

Wilhelm (18:52.108)
So this is, wait, do you wanna, or am I allowed to narrate a little bit of what this is? Okay, you know what? That's funny actually. So the font is Montserrat, which is actually, that used to be, or maybe still is the SimplePol body font.

Matt (18:56.588)
Yeah, you could know, right? Good for it.

Matt (19:06.912)
Nice. It's good font. I was looking for Airbnb-esque fonts and their font is... Yeah, their font is... whatever it's called. Yeah.

Wilhelm (19:09.496)
good fun.

Wilhelm (19:15.502)
You can actually book when you want to stay with you. It's like a beautiful website. that's cool.

Matt (19:21.42)
Yeah, it's not conf-

It's full Vibecode. Juliet had never deployed a website, and so this was the first one.

Wilhelm (19:31.426)
What happened when Claude inevitably suggested, you want to deploy this? I recommend Vercel.

Matt (19:39.48)
Well, it never did that because it has my Cloudflare worker skill. But, and also we needed too much stuff that Versaille wouldn't be able to offer. We needed, okay, so it has email sending included. So it sends transactional emails, which from Cloudflare now is very, very easy. We have email sending in open beta. I think it's open beta. Yeah. And so that's pretty cool. Like you literally, from your worker, you have access to send emails.

Wilhelm (19:43.849)
I see, I see, I see.

Wilhelm (19:50.041)
Go on.

Wilhelm (19:54.295)
Nice.

Wilhelm (20:02.069)
Nice.

Matt (20:09.368)
So we needed that. What else did we need? We needed turnstile. We needed a bunch of bot protection. Yeah. Rebooked.

Wilhelm (20:13.686)
I saw that.

I was just going to book myself in for February 2030.

You

Matt (20:24.267)
Okay, so for people who obviously can't see anything of what Will's doing, he's doing a horrific job narrating and using the website. It's like an Airbnb for the spare room in my flat. And yeah, that's pretty much it. And it was fully vibe coded with a little bit of opus, a little bit of 5.5, a bunch of skills that I have off to hand and...

Wilhelm (20:31.406)
Yeah, sorry.

Matt (20:53.167)
My girlfriend smashed it. She just, she just went to town and did it, asked a few questions. Actually the hardest part of that whole situation was installing Node on her laptop. Actually the hardest part, like really annoying. Like I wanted to install it properly with like Node version manager. Um, but for some reason, like it just didn't go on the path properly. And then I just couldn't be bothered to, and Opus kept on being like, Oh no, we should just install it directly with Brew. I'm like, no.

Wilhelm (21:05.198)
Mm-hmm. Mm-hmm.

Wilhelm (21:11.394)
Mm-hmm.

Wilhelm (21:15.372)
Mmm.

Wilhelm (21:22.337)
Mm-hmm.

Matt (21:22.338)
And then I was like, this would have been so much faster if I'd have just done it myself. And then I just gave up and she was like, why can't we just do what the AI wants? So I just did what the AI wanted and now she has no 24 pinned. So that'll be fine for a bit.

Wilhelm (21:26.794)
Haha

Wilhelm (21:34.605)
Nice.

That will be fine for him. Yeah, yeah. I feel like getting Cloud Code running for the first time on someone who's not a developer or whatever, that can be an extremely emotional experience.

Matt (21:48.013)
Yeah.

Matt (21:52.791)
It was way too hard. And we started with Cursor as well because, yeah, like we just started with Cursor because I had a subscription and it was already installed on that laptop because it's my personal laptop. like, like that was even more tough to get working and just like be good. No, like nothing wrong with Cursor, but I think that the IDE is a very scary place to someone who hasn't written code. Like, did you just written some Python before?

Wilhelm (21:57.393)
yeah.

Wilhelm (22:16.259)
Mm-hmm.

Wilhelm (22:21.529)
yeah.

Matt (22:22.765)
But the ID is a very scary place if you haven't touched. Like if you're used to, even if you're used to like PyCharm and writing some scripts and then like pressing play on the scripts, like to have like a full VS code experience, bit scary. And they've like added more stuff with AI, not removed stuff. Like now you have like a massive sidebar taking up more space and like, do I allow all, do I like?

Wilhelm (22:42.092)
Yeah, yeah, it's a lot going on.

Matt (22:48.704)
What's it trying to do? It's like playing with my machine. Like nothing's working. It was real tough. So once we got it all set up, actually just removing her from the code was even better. Because like code code, just asking stuff in natural language and not even seeing the code really was like kind of better in some ways.

Wilhelm (22:55.478)
Yeah, yeah, 100%.

Wilhelm (23:07.704)
Did you consider using like Claude code in the Claude desktop app or co-work or something like that?

Matt (23:13.004)
I didn't but I probably should Think about that. That might be a good that probably would have been a much better stepping stone. Yeah

Wilhelm (23:23.934)
feel less comfortable in there because I feel like I know less of what's going on and it's like spitting up some kind of sandbox and stuff in the background and all this kind of other fancy stuff but I think it's like less scary.

Matt (23:35.66)
Yeah, that could have been cool. Even simple things like I remember I cd'd into a directory and that was kind of magic. She'd never used a terminal before and so that felt a bit scary already. That was already on the back foot.

Wilhelm (23:52.492)
Yeah, yeah, yeah. Like, what is CD? What is going on? No, this is beautiful. This is awesome. Now, big question. Is this agent-first UI? How easy is it for my clanker to buck some time? Do you have an LLMs.txt?

Matt (23:56.405)
Yeah.

Wilhelm (24:17.944)
Does this support code mode?

Matt (24:21.046)
No, no, there's nothing. There's nothing. I think. I think. Okay?

Matt (24:29.258)
I'm gonna go on a limb here and this is not because I haven't done it, but I think there are some things that are gonna stay deeply human and I want people to humanly request to join my... I think this is a website for humans.

Wilhelm (24:43.158)
Well, I want nothing to do with that then. This is the wrong podcast.

Matt (24:48.032)
Yeah man, maybe I have to just make an LMS or TXC.

Wilhelm (24:50.278)
No, no, Actually, if you wanted to be human, need to... I think the move is to authenticate with world ID.

Matt (25:00.204)
That's it. Okay. I think...

I was just about to say, I think a lot of these things add extra steps. I was just thinking about what it would take to one time use my website with an agent. And would literally take copy and paste the domain with the forward slash LMTXT and read this, make API request. nothing is authenticated, right? You can just make an API request. But then

Wilhelm (25:06.548)
You know what I mean?

Wilhelm (25:19.874)
Mm.

Matt (25:35.947)
But then it would have to have, yeah, it's like pretty tough because then it would have to have access to your Gmail inbox. It would have access to your email inbox to do the human request. And I've literally got bot detection on there so it can't be triggered. The API endpoints can't be triggered by scrapers and stuff. There was a point to this. That was intentional. And so it would have to have.

Wilhelm (25:42.722)
Mm-hmm.

Wilhelm (25:55.438)
by a, a, right, right, right, right, yeah. That was intentional, yeah, yeah, was made for humans.

Matt (26:03.947)
some way of proving that the agent would have to have some way of, this is a cool thought experiment, right? The agent would have to have some way of proving that it was, that it was like triggered by a human, yeah.

Wilhelm (26:08.504)
Hmm.

Wilhelm (26:12.524)
Well, yeah, yeah, or we're back to friends.fui, man, because this is basically like, if there's a trusted agent, a trusted friends agent, then they're allowed to maybe browse the availability and make a request, right? So it's like, do you know what I mean? So it's like when,

Matt (26:33.183)
Yeah, yeah, yeah.

Wilhelm (26:35.566)
As long as, because you don't really, like what you don't want is you don't want random bots to like scrape this, right? And to like, like you don't want spam. But you'd probably be happy for a friends agent to see the availability and make booking requests and all that.

Matt (26:52.863)
Yeah, I would. I would, I would, I would, I would. Like, I'm open to all of it. I'm open, I'm open, I'm open. I think...

I I feel like I saw another travel agent pitched on X recently, like today even. I just don't understand why I would want a third party travel agent. Surely if we all get personal agents, well, we already have personal agents, like you have Chad, I have Gladstone, like that thing can work out travel plans for me so well. That thing with a good search API,

Wilhelm (27:12.838)
yeah, I guess I'll just as well.

Wilhelm (27:21.806)
Hmm.

Wilhelm (27:26.659)
Yep.

Wilhelm (27:33.762)
Yeah, yeah, yeah, yeah, yeah.

Matt (27:34.954)
I need anything else. I'm not convinced that non-technical people won't also have this personal agent thing, whether it comes Claude wrapped or opening eye wrapped or like Chachipdew wrapped or something. I'm not convinced there won't be this consumer breakout, do it all consumer tool. We've had it for chat and it will come for agents. There will be something.

Wilhelm (27:48.11)
Totally.

Wilhelm (27:57.506)
Mm-hmm.

Wilhelm (28:03.798)
It's true, it's interesting. feel like, do you have a preferred?

Matt (28:04.627)
It's interesting.

Wilhelm (28:10.286)
search provider type thing for LLMs because I've seen it really just even so parallel.ai they just raised a new round I thought they were like the state of the art like run by the old like Twitter CEO seems like a very smart guy the website like looks really cool it has all these different like modes it looks really sophisticated and then when I actually tried it I was just like really not that impressed and it I actually never sure if this is a skill issue because

Matt (28:13.758)
because I think that really did keep up with what.

Matt (28:22.513)
You

Matt (28:29.965)
And then, I actually find it, I was like, I mean, not that impressive. I actually never thought it was just a know, that's like boring at home, because I feel like you've seen such a bit of success in your years, like Coca-Cola or something, and you just feel so good about it. And then you've got to use things like, like, like, anti-procedure, and things like that, think that's what I thought was kind of personal, and I think that's not there.

Wilhelm (28:37.932)
And I'm just like holding it wrong because I feel like you just fall into such a pit of success when you use like cloud code or something and you just feel spoiled by that or whatever. whereas then when you have to use these like much more specialized or maybe more niche things, like it just feels like you have to work harder to get the results or they're just like not there. But I mean, certain sites, so I'll give you two examples specifically, like jobs and like jobs listed on LinkedIn or whatever really hard. I found that to scrape and I don't.

Matt (28:56.49)
But I'll give you an example. Today I'm in the tree house and my job is to go and do something.

Wilhelm (29:07.856)
know who, like even this parallel thing seems to not find them. And then travel is the other one.

Matt (29:10.356)
That's because they hold the keys. That's because they hold the keys, right?

Wilhelm (29:17.142)
Like LinkedIn holds the keys, you mean? Or who?

Matt (29:18.858)
Yeah, like it's their data. They're not gonna give it to some random scraping API.

Wilhelm (29:23.298)
But the same is true, I guess, as my point for booking.com or these travel things, right?

Matt (29:27.934)
Yeah, these like gated, gated, like in the future, if all of this stuff does take off, like they will adapt or die. And by adapting, they'll probably allow, they'll allow some sort of authenticated agent access directly to their platform. And by die, some people will just exfiltrate all the data at some point, you know, like there'll be hacky workarounds until the point where they give in like,

Wilhelm (29:54.092)
Yes.

Matt (29:58.409)
If users want it, the platforms will eventually either monetize it or give it, you know, like, I don't know. Like it will, it will happen. I'm pretty, I'm pretty convinced of this or someone will change the business model so that it becomes irrelevant. It's also another option.

Wilhelm (30:07.82)
Yep, yep, yep.

Wilhelm (30:17.806)
So I think to link it back to what you were saying about the travel agent thing, I feel like this would be the struggle that it's so locked down behind using a personal agent to book travel. And what I would hope I would get from a dedicated travel agent agent, which has more, I don't know, access. It gets around booking.com better, or it gets around Airbnb better, or whatever. But I don't know if the one you saw does that.

Matt (30:31.25)
Hmm.

Yeah.

It's only, yeah, but it's only locked down as much as like the search API. Like I just think you're competing with so many people when you make this like external agent tool. Like this full external agent. Like I understand making an agent tool. Like I get that. And like, if you were just making a travel agent tool that could access travel data and you were going to give it as an MCP server, like I actually get that. Like I fully understand that as a.

small-ish potentially like bootstrap business model. What I don't understand is that the VC play of being like, right now we're gonna make, this is gonna be your travel agent and we're gonna raise loads and loads of money and everyone is gonna use this as their travel agent tool. Like I really don't buy that when it could be someone else is gonna make a connector. Someone else is gonna make a connector to ChatDBT and like it will do.

Wilhelm (31:21.358)
Sure.

Wilhelm (31:29.594)
yeah, totally with you on that.

Matt (31:39.421)
the same thing, you know, like, and maybe not as well initially, but if there is, where there's a will, there's a way, right? And better to live within the ecosystem. Yeah. Anyway.

Wilhelm (31:41.748)
Mm-hmm.

Wilhelm (31:49.612)
That's my motto.

Is, do you have a, or have you tried any of these like search, third party search providers, or do you like any of them?

Matt (32:01.352)
Yeah, I was my experience been a little bit different. I've actually been really quite happy with parallel. I played with it a bunch. My brother tried loads because I don't know if you know much about what he was doing. He was like researching people who might have made cash to sell them the yachts. Like that was that's what he's been up to. So that trying to work out like who's had wealth events and all this stuff.

Wilhelm (32:22.209)
cool. Nice.

Matt (32:28.412)
And he tried them all and he was like really stoked about Parallel. So I kind of just took him at his word and was playing with it. I was very impressed with like X's websites when they first came out, being able to like index the whole like web and find loads of like really, really niche information. But I feel like my agent doesn't really do that. My agent actually just does more is more akin to like general web searches. And that's kind of probably that's probably trained in

Wilhelm (32:34.466)
Mm-hmm

Wilhelm (32:45.475)
Mm-hmm.

Matt (32:57.865)
because that's trained because the like Claude at least is trained on web search tools so it's trained behavior

Wilhelm (33:05.003)
Yeah, yeah, yeah, yeah, yeah, yeah. Interesting. Yeah, I'll need to play around more. It feels like it must be a skill issue.

Matt (33:08.776)
Yeah.

Matt (33:15.014)
Yeah, I don't know, aren't they? What have you been up to, Since we last chatted?

Wilhelm (33:20.192)
It's a good question, Matt. I have no idea. Let me check. We chatted like three weeks ago or something, right? So surely I've done something.

Matt (33:26.792)
was ages ago. was ages ago. I, okay. A little bit of thing that I watched recently, which I think he's always a mine of fun ideas is, uh, know, Jeffrey Huntley.

Wilhelm (33:41.995)
yeah, yeah. Totally, yeah.

Matt (33:42.824)
the Ralph Loop guy, he did like a crazy long talk at a PI conference somewhere in Eastern Europe that came out recently. And I watched like part of it because he was linking sections of it on his Twitter. And that I would recommend, really interesting, like to, as in I would recommend watching some of it, like maybe not the whole thing, but definitely some of it. And he brings out a really good point about,

Wilhelm (33:51.875)
Mm.

Matt (34:13.19)
like, I love all these little takes on the future of work. I find it very fun. Like, we're getting rid of managers, each each like director level person is going to be responsible for like, 50 people because all of these people are going to have their own agency and they're all going to be responsible for the 50 agents, you know, like we're going to make much more of a the graph is going to get less deep and it's going to go much wider and you

Wilhelm (34:17.965)
Mm-hmm.

Wilhelm (34:28.867)
Right.

Wilhelm (34:32.236)
Yeah, yeah.

Matt (34:37.831)
You have to be like five steps away from the CEO because you need to have agency as a human to make things happen. I think there's some fun ideas there that I thought were just kind of entertaining. Like I was trying to count how many spots I was away from the CEO at Cloudflare.

Wilhelm (34:55.79)
How many are you away? How many layers?

Matt (34:58.842)
I think I'm...

Matt (35:04.763)
Maybe five?

Wilhelm (35:06.734)
Nice.

Matt (35:08.391)
It's not that far, really. I think five.

Wilhelm (35:11.618)
No, that's definitely, I think when I briefly worked at Microsoft, I think, or whatever, it's a weird sentence to say. I never felt quite like something I would say. I think I was like 10 or 12 away from Satya. was was deep, man. It was a very layered org truck. But mean, it's a massive company, obviously. That's cool.

Matt (35:19.367)
Whatever you're calling that part of your life.

Matt (35:31.974)
Yeah.

Matt (35:38.737)
Yeah, I think five, which is cool. like, regularly have chats for people who are like, chatted with someone today who's two away, which is like two hops, his manager's manager is the CEO. So like, there's good cross communication on the org chat. And that's what makes Cloudflare quite special actually as a company at the moment. And why we're having a little bit of a moment.

Wilhelm (35:41.09)
Wilhelm (35:54.188)
Hmm.

Wilhelm (35:57.977)
Mm-hmm.

Matt (36:06.2)
is that like stuff that C-suite wants to happen just happens. Like there's no like messing around. It just happens. We get on with it.

Wilhelm (36:10.231)
Yeah, yeah, yeah.

No, it seems like a special company, especially right now. Yeah, I last time we tried it was Agents Week. This week is Earnings Week, obviously. You always say that, like I'm trying to extract some kind of intel from you. I'm just trying to put you on the spot.

Matt (36:25.156)
Yeah, we can't talk about that.

Matt (36:31.619)
I don't want to say anything stupid and then like it ends up on the record. Yeah, that would suck, wouldn't it? That would like actually suck. speaking of being in jail for 20 years, have you seen, okay, because I'm just thinking of all the things that I've seen the last few weeks that like we weren't able talk about. Sam Bagman Freed, have you seen like his like what would have been his investor, his portfolio? It's insane. Like he had like.

Wilhelm (36:37.804)
and then be in jail for 20 years. You know, just like that.

Wilhelm (36:53.954)
Yeah.

Wilhelm (36:57.972)
Yeah, it's wild. The poor guy.

Matt (37:02.542)
he had like a couple of percent or something off cursor initially and this like 60 well there's $60 billion acquisition by SpaceX potentially at the end of the year or 10 billion if they don't get acquired which basically funds all the compute they would have spent like

Wilhelm (37:07.979)
Yep.

Wilhelm (37:19.214)
Yeah, I heard a rumor that it's basically a confirmed acquisition and it's just that the SpaceX IPO is in June and therefore they couldn't make it an acquisition because it would mess up the IPO paperwork.

Matt (37:26.566)
Let's make that back to you though, if it's true.

Matt (37:35.558)
Okay, so I found the numbers all a bit weird, if we want to talk about numbers, because the idea was it was like 60 billion at the end of the year, which obviously is an insane outcome for the curse of people like well done. like

Wilhelm (37:51.552)
Yeah, it's like, what is happening? I mean, GitHub was acquired for 8 billion. That's like my one reference point. I mean, that's wild.

Matt (37:59.494)
Yeah, it's like a wild outcome and just like utterly like kind of crazy and seems to be driven by their rebranding of like them being able to make composer two, which is like a rebrand of the Kimi model with a little bit of post training. We're a little bit like fine tuning at the end and some other stuff like, like, people have just kind of woken up to realize that cursor probably has the best RL training set ever. Like

Wilhelm (38:27.214)
Mm.

Matt (38:28.473)
they just have all of these traces of people playing with cursor. So outside of the big labs, they probably have the best one, which is kind of, kind of mad and like makes perfect sense for SpaceX, like XAI to want to acquire that because they don't have the best one. They have like people atting grok on Twitter, which is great, but like it's not super helpful for generating code. Yeah.

Wilhelm (38:31.82)
Yeah. Yep.

Wilhelm (38:36.768)
Outside of the big labs, yep, yep.

Wilhelm (38:46.53)
Mm-hmm.

Wilhelm (38:50.04)
That's something.

Also, just for context, Cloudflare market cap is $85 billion. And it was not that like two weeks ago. is like, SpaceX is acquiring like a cursor at $60 billion compared to Cloudflare at $85. It's kind of wild.

Matt (39:07.62)
borrowing like 60 billion compared to 2013 by.

Yeah, yeah, like, it's insane. But the numbers are also really wacky. so the idea is that if they don't get acquired, SpaceX will give them 10 billion, which basically covers their compute requirements that they're going to spend for this year.

Wilhelm (39:31.458)
Mm.

Wilhelm (39:35.34)
Yeah, wow.

Matt (39:36.344)
So this is also insane. This was the bit that I found even more insane because the idea was that if they do get acquired, then the compute they'll get for free because like, but they're starting to use the compute from now. And the idea is like, once they get acquired, but if they don't get acquired, then SpaceX will basically like give them 10 billion of which I guess they'll have to pay most of it back. Like it will be like a null and void transaction. They'll just have had access to compute for the year.

Wilhelm (39:38.528)
Yeah, yeah.

Wilhelm (39:49.176)
Right.

Wilhelm (40:00.462)
Mm-hmm.

Matt (40:05.867)
That was the way I heard it. And if that is the case, that means not only is there CapEx expenditure like 10 billions worth of compute in like less than one year, they're getting acquired potentially for like, say if it's 10 billion for the next like nine months, like maybe it's something ridiculous, like 100 billion, sorry, something ridiculous like 15 billion for the year.

Wilhelm (40:08.012)
Yeah, interesting.

Wilhelm (40:16.344)
Mm-hmm.

Wilhelm (40:22.755)
Yeah.

Matt (40:33.859)
They're getting acquired for like 4x? Like 4 years worth of compute capex?

Wilhelm (40:40.51)
Right, right, right, which is also kind of insane. I mean...

Matt (40:43.009)
Which is wacky. Like that means, yeah, I just don't understand the numbers there in terms of capex even.

Wilhelm (40:45.954)
Yeah.

Wilhelm (40:50.798)
This reminds me of something, which is something I highly recommend everyone watch. There was a clip of it already going around Twitter quite a bit, but Stripe Sessions was on this past week, which is like Stripe's annual conference. And they always invite interesting people and they also do a bit of like a state of the world economy, state of the online economy kind of report.

Matt (41:13.091)
if the world economies evolve.

Wilhelm (41:17.462)
which they can kind of uniquely do because they have like such a large, think was it over 1 % of the world GDP passes through Stripe or something wild. But.

Matt (41:19.081)
That's insane.

But the next one.

Wilhelm (41:32.078)
The specific talk that I thought was really interesting was the one with Nat Friedman and Daniel Gross, who, I mean, Nat Friedman was the GitHub CEO, then they did a bunch of investing together for a while. And then now they both work at Meta doing like, or leading their like AI efforts. And there was a clip of it going around with Nat Friedman talking about how his open claw, he instructed it to like keep him hydrated. And then it would send him pictures through his home security cameras telling him that he needs to drink water now.

Matt (41:41.101)
it was.

Matt (41:59.454)
I think that needs to be about and that's the thing that I think is important. Actually, it's really important to talk about the cover. Like, like, deeply integrated, but also I think it's about the present brand as well. It's like an hour interview thing, it's a bit of a various thing. It's all made for tinkering, and they just talk a lot about what that means,

Wilhelm (42:02.252)
and then sending a nice job picture when it saw him actually go to the fridge and have some water or something. So he's clearly deeply integrated open call stuff into his life or experimented a ton. But it's like an hour interview thing and it's just in general very interesting. It's called the golden age of tinkering and they just talk a lot about what that means and...

the compute implications. And at the end, they're kind of asked a little bit for like advice or guidance. And obviously they keep it like very high level. But since we are speaking about compute, one of them said something like, if you want to buy a new computer in the next few years, just buy it now. Don't wait.

Matt (42:24.749)
Yeah. Yeah.

Wilhelm (42:45.536)
And I'm just like, wow, that is such an eerie... That's like a... mean, we already see like all the Mac minis and whatever being sold out and stuff, right? But now we're talking about like cursor buying all this compute. We've talked previously about like being very compute constrained, but that just feels like such a warning or such a like a shift that is like much more intense than I thought.

Matt (42:54.817)
But now we're talking about classifying those computers. We're talking about being very computer-strained. But that just feels like such a more...

Matt (43:10.625)
Yeah, it's like all these things, the... People have a very knee-jerk reaction and suddenly they've realized that agents probably need some type of sandbox, or at least the appearance of a sandbox computer to do a lot of tasks. Like, we have our... I have my agent running on Gladstone.

running on my Raspberry Pi. the reason for this is it's like super easy to make an agent that runs on a computer that's always running. Because I can just write to a file system, I can do lots of stuff and that agent basically just lives in that computer and it's absolutely fine. But as we're...

Wilhelm (43:47.288)
Mm-hmm.

Matt (43:56.546)
And when you think about that and when you you try to start thinking about numbers and you're like, my god If every person every working person doing knowledge work has six of these running all the time We are like, I don't know million X saying or something ridiculous like our current compute requirements just for just for CPU let alone like a Ln inference and like this I've heard this like argument made and this argument

Wilhelm (44:18.936)
Just receive, yeah, yeah, yeah.

Matt (44:25.334)
I think this is like so much stuff, but it's like, it just doesn't make any sense at all in like a few different ways. Like you get economies of scale when you bring, when you bring most people, when like, instead of me having a computer here on my desk, what happens if we re-architect the agent to use cloud computing, like paradigms that already exist that are mega scalable. So for instance,

We re-architected it to be able to have multiple connections at once, have sandboxes that spin up, spin down. We don't need persistent connections to an agent to have it running and doing ambient work. We have schedulers in the cloud. We have isolates, a great example. An isolate can be spun up in a millisecond. You're basically running some JavaScript and it can write to...

database or some sort of workflow scheduler or something like that. these applications become much more complicated because because they become distributed systems problems, but they are like many, many powers of 10 less computer intensive because suddenly like now you can have millions of people sharing the same machine that's the same size of my Raspberry Pi because they're not all using it all the time. And so you have a lot less wasted CPU cycles. Like that's what happens when you move to the cloud and you start bringing together everyone's compute together.

Wilhelm (45:22.958)
Mm-hmm.

Matt (45:50.897)
At the moment, we have the lazy knee-jerk reaction because we have two sides of it. We have the lazy reaction, which is people need compute. Let's give them all persistent machines because we don't know what agents are going to look like in a few years. the cloud thing is going to happen. At Cloudflare especially, have the agents SDK and not pitching for that, but you can run agents on the cloud. You can run billions of them.

Wilhelm (46:15.694)
Mm.

Matt (46:19.103)
and like you can just run them, you know, and the compute requirements are way less than if I had a billion Raspberry Pis.

Wilhelm (46:25.87)
So yeah, okay, I think I follow, but okay, so you think, do you think that the compute required, like the cloud or like the CPU needs of the world will go up less than what you think Nat Friedman thinks? Or you, like, okay.

Matt (46:44.96)
They might, yeah, exactly. They might 10x, they might 100x potentially. They might even 1000x, but I don't think they will million or billion X in the next few years. And I think all of these things, these agents are very, very simple in what they need to do. You need to have some agent loop that gets triggered on some sort of trigger.

You need some way of reading and writing to external storage, potentially something like an S3 API or a database. Like they are just web applications that we've always had in the past. They're a little bit more like ambient and triggered on. And a lot of the times they don't even need a UI. Like it's not mega complicated. You just need to work out. It's just a distributed systems problem. like, initially I think Cloud Code has done us a massive disservice here by being this like agent.

Wilhelm (47:25.358)
Mm-hmm.

Matt (47:42.624)
plus tools together because now like people don't separate the sandbox from the harness. Like the harness can be somewhere else. The sandbox doesn't even have to be real, they doesn't have to really be a real sandbox. It's very rare, I think in the future. I don't know if it's very rare, but I agents will spin up sandboxes when they need them. And know that is happening, that is coming, that is already here in some cases, I think.

Wilhelm (48:05.358)
Mm-hmm.

Matt (48:11.489)
If you think that every human is going to have a thousand machines running just to service their like a card or orders or Tesco's orders or whatever, like I think that's dumb.

Wilhelm (48:19.502)
Yeah, yeah, No, that makes sense that we can make it more efficient. And I think, I feel like you probably.

talking in part about like Sunil posted this chart, right, of like what he's building in terms of like you can go up and down this like abstraction ladder based on what your agent actually needs in terms of capability. Like some agents don't need a file system, some do, some need a browser, some do, and like depending on what you need, you can get it done in like some tiny isolate or in some full-fledged operating system.

Matt (48:41.952)
Yeah, yeah, yeah, and I'm very, I'm very interested in like that you said this is like the golden age of tinkerers, like, I don't know if you were quoting, but I'm very interested in that method, that idea and methodology as well. I think there will be many more people that try and have like ambient running machines that

Wilhelm (49:03.788)
Mm-hmm.

Matt (49:11.582)
they program agents on because we have agents that can program agents and like, it's so fun to automate a bunch of workflows and like this, there's so much stuff that it's actually gonna be really hard to make a product out of it. Like you just wouldn't want to.

Wilhelm (49:24.192)
Yeah, that's interesting as well. Yeah, I agree. Like, I'm not sure. Yeah, that seems a little blurry.

Matt (49:28.8)
There's so much crap like our sleep thing, like we just did it because we could, you know, like, yes, sure. It, we got the idea from several products, but like, would you have a product just for that? Nah.

Wilhelm (49:33.302)
Yeah.

Wilhelm (49:38.894)
Like would you sell it to anyone? Like yeah, not really, yeah.

Matt (49:44.787)
But maybe you would sell the whole idea of the agent box. Like as a dev system. Like Raspberry Pi probably should ship, like if I was Raspberry Pi, I'd be trying to work out how they could ship like a reference agent pre-installed into their operating systems. Because that's super cool. Like that's like the hackable dev idea. Like you're teaching people about agents by just...

Wilhelm (49:49.342)
Yeah. Mm-hmm.

Wilhelm (50:00.972)
Mm-hmm.

Matt (50:10.622)
shipping them a pre-built one and then they can add their own connectors and all this stuff and maybe they need a bit of a hand to do that but it will come.

Wilhelm (50:18.926)
feel like I've been reluctant to use other people's agents. I'd rather they give me like a skill and then I can use my agent setup. But...

Matt (50:27.73)
Yes, but that's because a lot of agents are like pretty grim. think Pi has done really well here and like they have built the full agent loops and not just, and like they started with the full agent loops. They started it with like, like with extensions and all of this sort of stuff. But the idea is that it's very lightweight and they haven't added loads of stuff. And so if you ship an agent that's based on Pi,

Wilhelm (50:31.138)
different people.

Wilhelm (50:36.588)
Mm-hmm.

Matt (50:57.626)
like your actual extensions to py and your extra tools or whatever you're looking at like a thousand lines of code

Wilhelm (51:05.282)
Yeah, yeah, or that's quite compelling, I agree.

Matt (51:09.266)
That's what I've done, like super easy.

Wilhelm (51:12.022)
And even as a user, can just like write that or like read the prompt or whatever. And then it feels kind of customizable.

Matt (51:19.346)
Yeah, you can ask your agent to explain itself. Yeah, it feels custom. But if it's using PyAgent, it's like, Py does the heavy lifting in like, yeah, Py does the heavy lifting. Yeah.

Wilhelm (51:30.422)
It's not that custom. Yeah, yeah. I'm in the process of rebuilding Chad, making Chad V2. There's a couple of ways in which it's really broken and there's like so many ideas for improving it. I also spent a bunch of time, I guess the step one of this was cleaning up a little bit like my current setup because it was just like...

Matt (51:44.274)
Mm-hmm.

Wilhelm (51:55.662)
It worked when it was great or whatever, it just had all these... I started getting a little bit worried about security, in part because of the mytho stuff, but also just because I think our mental model about security needs to change. I don't know if you're a different mental model, but think pre-AI, the rough mental model was like... There's two major ways of getting attacked or getting hacked or whatever.

Matt (51:55.989)
I think my heart can't handle it. I kind of want to be like, I felt like I didn't love those very fast security. And a lot of are like mental and stuff. And a lot of people just think that I think like, oh, I don't know, mobile fast security, we kind of need to change. I don't know, they give it a little something, and I think it's time to be real. So I'm not going to lose my head. I guess I'm going to.

Wilhelm (52:23.764)
One is like a sort of government sponsored zero day, where it's like someone pays $10 million, finds some kind of deep flaw in your operating system or browser, and they can just pwn you. But you're not valuable enough to burn $10 million. Like there's nothing on my computer, there's nothing in my data that would be worth that. So this would be relevant if I was like some kind of spy or like a journalist or some kind of country, leader of a country or whatever.

Matt (52:41.886)
So, definitely one of the types of like, something that might just fly, like a journalist or some kind of country view, country or whatever. And that's like the long-term hopes to get back at them and do best against them. But it's not really something that you have to worry about. At least as second category of like, last-deck choice, which is like, you were fishing in packs, you were like, out of clinic, like, getting down on a a sub-clinic, and then out of clinic, and it's like,

Wilhelm (52:51.394)
And that is like a hard class of problems to attack again, to protect against, but it's not really something that you have to be that worried about. That leaves kind of the second category of like mass exploits, which is like, you're phishing attacks. You're like, clicking on a link and downloading something and then running it. And for that, usually if you just have a little bit of, understanding of how computers work, you can kind of spot these and protect yourself against them. And of course they can be very, sophisticated ones. And I wouldn't pretend that I'm like,

Matt (53:09.63)
And that's really just a couple of other questions that I think we have from Chris. And of course, they can be very, sophisticated ones, and I couldn't defend that one. I'm but I think, know, it's not long enough. I think it's like, know, it's really something that's being like, oh, okay. But the now, right, it's a problem pretty often, but not today, I think that's the perfect way, but much, much more cheaper. And...

Wilhelm (53:20.492)
worried that I could spot all of them. But it's like you have to really do something to be hacked in that way. But the problem now, right? So that was pre-AI though. But now post-AI, the problem I think is that you can get zero days just much, much, much more cheaply. And we've seen a bunch of these supply chain attacks recently. We've seen a bunch of these things recently where there's just vulnerabilities that are

Matt (53:38.311)
We see the possibility of supply chain and actually seeing the possibility of things recently. But yeah, it's just like vulnerabilities that are findable by people with ambition and they can't depend on people. I think the most security mental health is just kind called a block.

Wilhelm (53:47.818)
exploitable and findable by people with ambition and AI. And then they can just hit a lot of people. So I think the security mental model kind of needs to shift. And I started locking down my stuff a little bit. I'm still kind of in the process of it, but that was step one of rebuilding Chad. So I deleted all my secrets from GitHub and all my private data that was checked into the repo and reorganized that a little bit. I'm like tail scale maxing everything.

Matt (54:01.649)
Yeah, that's good.

Wilhelm (54:17.334)
A bunch of servers I used to run internally that are kind of on the public internet, like no longer having them on the public internet, but just having them on tail scale. I think there's a lot more I can do with tail scale. I don't know if I can use Cloudflare Funnels or whatever it's called, but tail scale is really pleasant to use.

Matt (54:18.486)
Thank

Matt (54:23.773)
Dude, you just connected two different products. There's Cloudflare Tunnels and Cloudflare Mesh.

Wilhelm (54:35.968)
right, I meant the tunnels, not the funnels. Tailscale has a thing called funnel as well. Did I? Okay.

Matt (54:39.645)
You meant the mesh. The mesh is the new one. The mesh is the one where you can put them both on the same private network.

Wilhelm (54:49.362)
okay, okay. Yeah, that's what I use a tail scale for, I guess.

Matt (54:52.763)
Yeah, that's mesh that just got announced the agents week as in like it's it's open. It's Shout out to Thomas. There's this guy genuinely insane PM and he's just he did email sending He's done all he's done some of the coolest products in cloud flight. They don't necessarily get the Recognition they deserve he's done all of them. I think he deserves a shout out. He did email sending mesh and hyperdrive They're the big ones and hyperdrive is sick

Wilhelm (54:56.908)
It's very new.

Wilhelm (55:22.626)
What's Hyperdrive?

Matt (55:22.726)
Hyperdrive is the one way from a worker you can connect to like AWS VPC or like you can peer to like any database pretty much that you have like in AWS RDS or like in GCP or Azure or something like that.

Wilhelm (55:30.221)
Okay.

Wilhelm (55:38.754)
Wait, what does peer, does that mean like you're just in the VPC? Is that what you mean by peering or does that mean something else?

Matt (55:44.666)
Yeah, yeah, yeah. So instead of exposing your, this is super nerdy, but instead of exposing your AWS RDS to the public internet, you can connect it via hyperdrive and then your workers have binding access to your database without traversing the public internet. Yeah.

Wilhelm (56:02.767)
I see. Yeah, nice, nice, nice. So it's kind of like, yeah, that's really cool.

Matt (56:08.41)
So all the really good integration products, he did email sending and email receiving as well. Which is fucking true.

Wilhelm (56:13.912)
Yeah, that's an impressive list of things to ship. So yeah, lot of security lockdown stuff. I'm really leaning into that paradigm that I think you talked about ages ago, which is like start your agent from like one root folder and then all of your, all the stuff that you work on is in the sub folders, which I had to re, like, so my problem was I used to check in that whole folder into GitHub, right? And that worked great.

Matt (56:39.704)
You should have mounts. No, but like don't they get massive? you like JSON files get huge?

Wilhelm (56:45.984)
Yeah, yeah, they get massive. Yeah. It's like some of it is in Git LFS. There's some weird. And then also I have the setup right where like I use the syncing thing to sync it between the MacBook and the Mac mini. And then also I have all this backup infrastructure. yeah, I'm now backing up some more stuff to R2 because I don't want to, like back in Feb I almost, or I was very, very close to losing all my data because Claude messed up some R sync commands.

Matt (56:51.611)
to continue to

Wilhelm (57:13.07)
Anyway, that's all not that interesting. One of the things that I need to really figure out for Chad though is the kind of like data architecture or like the layers of the data because it does so much stuff and it has so much data. Like it has all my like health data. It has all my like work projects. I'm really curious for yours. also like I think like, yeah, I'm curious. Yeah, I want to know yours. The thing I'm thinking about at the moment is like the...

Matt (57:14.395)
you

Matt (57:29.339)
Do want another mine?

Matt (57:35.564)
Yeah, here's.

Wilhelm (57:41.962)
Android could part the like LLM wiki thing, which some of the ideas I have already. And then I think there's some other really cool ideas in there as well that I need to figure out. But yeah, sorry, go on. What's yours?

Matt (57:44.468)
Yeah, that sounds cool.

Matt (57:50.748)
Yeah. Mine is quite dumb. So mine is like separated into sort of like three, I think I have more namespaces, but we'll say three for the moment. I have like system, which is the ones that I check into Git. And so this is like skills, like a pre out the box skills. It's like the, there's some prompts there, but it's basically like system memory, system files.

And then I have user, which in here goes like user generated memories, user generated skills, even user generated apps go in here like production ones under the apps directory, it can host its own apps. Then there's like a scratch directory for it to play with code. And that's where it plays with code. I think I should just call it code directory. But that's the idea of like my Git thing is like I open everything, I open...

Wilhelm (58:27.437)
Nice.

Wilhelm (58:37.71)
you

Matt (58:47.675)
it all in the same directory. I have all my repos in the same directory and I open my agent in that directory. So I have that there. There's a few other user bits. There's some user defined variables. There's a variables file in there. But yeah, there's a username space. And then there's like a runtime level namespace. And that has the sessions.

Wilhelm (59:06.925)
Mm-hmm.

Matt (59:14.683)
so like the JSON files for every session. And also it has SQLite database. So the SQLite has just the messages, so the user messages and the system messages, and the sessions obviously has the full tool calls and tool responses as well. So that's why I sort of store those in two different places, like I have both of them. then because SQLite, could do search over previous sessions.

Wilhelm (59:15.278)
Yep, nice.

Wilhelm (59:32.556)
Yeah, yeah, yeah.

Matt (59:43.482)
previous stuff and then if I want to dig down to it more, it can open the relevant JSON file and find the stuff. And like, this is all talked about in a top level skill that's there. So for instance, like the Wiki that you would do would be like, it would be user. So it's all tried to be very general purpose. So you can kind of mold it into whatever you want. And that's why Gladstone, like I have multiple different channels on Discord, all of them doing different stuff. And like there is memory.

Wilhelm (59:54.136)
Nice.

Matt (01:00:12.451)
per channel as well. So you have user channels and memory, and there's also global memory. And so what you could do is you could have a Wiki channel, or you could have like a user Wiki, and then you could write a skill to be like, my Wiki is a user Wiki, this is like how it's structured. If you wanna search over previous sessions that we talked about, you can find it in the DB. But like...

Wilhelm (01:00:14.722)
interesting. Yep.

Matt (01:00:38.841)
I think it's really important. I kind of just worked this out recently. So I had this entirely flat structure before. I think it's really important to namespace different stuff because there should be files that the LLM can't change. And so it shouldn't have permissions in my mind to change anything that's like runtime related or anything that is system related.

Wilhelm (01:00:47.64)
Yeah, I think that's right, yeah.

Wilhelm (01:00:53.558)
Yeah, I agree. Yep.

Matt (01:01:03.897)
So the runtime stuff is like this stuff gets written and it gets pruned. But the pruner is like a cron job. There's also a bunch of cron stuff, the cron files and also the other stuff like the LM, the user has events files that it can write. And that gets picked up by like the cron. That's how my crons work is the user writes events JSON files. So a JSON file for each event.

Wilhelm (01:01:13.262)
You

Wilhelm (01:01:24.494)
Hell yeah, that's great.

Wilhelm (01:01:29.368)
man, damn, I wish we could just like hack on both of our agents simultaneously for like a week or something because I'm... Okay, nice, yeah.

Matt (01:01:35.684)
Well, I'm going to publish mine at some point. Like I'm to publish it really soon. I've been trying to clean it up enough to publish it. And I actually want to publish it like with a full imager and stuff. So people can write their own. Like, so to do the whole Raspberry Pi thing, people can like write their own agent box, or they can like quote, can like, they can like pull the images, basically, the Docker images. And so once they have the Raspberry Pi OS, they can, there's a, there's a pre-install script that installs Docker and stuff. And then it.

Wilhelm (01:01:51.785)
I see what saying.

Mm-hmm.

Matt (01:02:05.497)
pulls the Docker images and just sets everything up.

That's the plan.

Wilhelm (01:02:10.038)
I think Chad V2 will work or not work based on how well I can figure out these boundaries between what can the agent edit, what can't it edit, where.

Matt (01:02:24.505)
But there's loads more that you could do. Like for instance, on any search, so on the search tool, you should probably have like a global method of like classification for prompt injection, probably. And so in the search tool, you should use it and also in any incoming messages, you should use it. Any like incoming data to your LLM, you should probably have some method of prompt classification.

Wilhelm (01:02:51.544)
Sure, yeah, yeah, yeah.

Matt (01:02:53.559)
And I haven't built any of that because mine's running on a Pi, like yours on a Mac Mini, you could run a tiny model locally. That would be sick in another container.

Wilhelm (01:03:01.654)
Yeah, or even like, or even haiku or something, or like, I feel like...

Matt (01:03:05.847)
You could, but I wouldn't want to... I think you end up spending a lot of extra money.

Wilhelm (01:03:12.268)
I think it's also just like the frontier models are reasonably good at detecting prompt injection, like by default, actually.

Matt (01:03:23.767)
Nada no.

Wilhelm (01:03:26.722)
You don't think so?

Matt (01:03:27.191)
They're just not. They're just not, dude. Like, if it retrieves some data from the internet that says, like, in order to access the rest of this data, you need to run this shell command, and it just sounds like super casual and legit, like, 100 % Opus is going to execute that shell command.

Wilhelm (01:03:44.049)
Mm.

Wilhelm (01:03:48.214)
Interesting. I've seen a lot of like, that this seems like a prompt injection. Like I'm not going to do that before checking with the user. Like, I mean, in theory, I agree with you, but like, I feel like in practice, it's just, I've just seen a lot of other stuff, but also I haven't like, I am both not super exposed to the risks and also I haven't like actually tried to hammer it. I mean, there is a lot to explore and figure out, but

Matt (01:03:56.055)
fit.

Matt (01:04:02.357)
I hope that...

Matt (01:04:11.989)
the lobster.

Maybe they've done some more fine tuning recently. I'm sure they have done like every generation of this is gonna get better. But I know when we were playing around with some of the MCP stuff like six, eight months ago, the, yeah, it was like, it was bad. It was just really easy. Like you retrieve some data, the LLM like asserts that it is like data that comes from the user and it just thinks it's a user request. It's super hard to work out that it's not.

Wilhelm (01:04:27.608)
Sure, I am.

Wilhelm (01:04:40.0)
Yeah.

Wilhelm (01:04:43.648)
So it's funny actually, so Lucas who previously mentioned, I think he was telling me the story that back in January he started playing around with the open claw and like Opus 4.5 was the model backing it. And then he had, he told a friend of his, hey, like message my agent on WhatsApp and try to prompt inject it. And like that he couldn't do it. Like it just wasn't like good enough.

Matt (01:04:55.639)
Mm.

Matt (01:05:04.215)
So, it's a skill issue.

Wilhelm (01:05:08.654)
But yeah, I think like the boundary stuff, getting that right will be really, really key. And if I can figure out some good abstractions for it, then I think that'll level up like chat a lot. And then there's like so much other stuff to do and to figure out and will be cool.

Matt (01:05:22.879)
What are you excited about Will? What are you stoked about in your world?

Wilhelm (01:05:28.224)
I mean genuinely the next version of Chad.

Matt (01:05:32.503)
gonna be good, because it's still gonna be Python.

Wilhelm (01:05:35.15)
Probably. Yeah, I mean, the nice thing with Python is I can do a bunch of fun colo stuff on top of it as well. We're also building an AI CEO for SimplePool. That'll be really fun.

Matt (01:05:37.471)
Hive until I die, baby.

Matt (01:05:50.465)
yeah?

Wilhelm (01:05:54.85)
I don't know how much I want to say about this, but it'll be really good. think it'll be like, yeah, there'll just be like a lot we can ship. I mean, everyone's just building AI CEO software factories, right? Like,

Matt (01:05:55.147)
Have you seen?

Okay.

Matt (01:06:05.791)
Yeah, I was going to say, did you see did you see co-founder like they released like co-founder to?

Wilhelm (01:06:11.852)
No, have no idea what this is.

Matt (01:06:13.495)
It's a company that Yohei from Untapped PC invested in. I always like the stuff he invests in and it's super well done. Like the branding and everything, it's super cool. I haven't tried it, but I remember seeing number one and being mega impressed with their launch and all this stuff. I'd love to test it at some point. I see the guy on Twitter.

Wilhelm (01:06:25.535)
cool, I should check this out. Yeah, it's beautiful.

Wilhelm (01:06:39.158)
Yeah, nice. That looks really cool. But yeah, so much development. I there's so much exciting stuff. There's so much. Yeah, it's wild.

Matt (01:06:41.334)
Yeah, I like.

Matt (01:06:49.846)
Good stuff. You know, like one thing I'm like mega stoked about is like artifacts is the Git hosting product that we released. That was work. That was really cool. Like it came together in like less than three weeks. Like.

Wilhelm (01:06:49.848)
What are you excited about?

Wilhelm (01:07:12.974)
Yeah, that's wild.

Matt (01:07:14.102)
fully came together and like there's still like a bunch of things that need to work like it's not in its final state like there'll be like version two of it there'll be version three of it there'll be version four of it but as far as like users are concerned like it works they can they can they can hammer it you know it like works to the level it needs to for the moment and it will get better and better and better and faster and like less limits over time and all sort of good stuff but that's super sick like you can imagine like every agent chat every agent like

A lot of the stuff that we're talking about, like getting the data structure right.

It feels very, feels very, like manual, like at the moment, and like, we're doing it on a machine that's local and all of this stuff. then, but then when you start doing stuff in the cloud, like, yeah, sure. The data structure, it does matter, right? It doesn't matter, but like what happens if you can, if you can just commit files to version storage, you don't have to worry about backups. That's number one. That's like pretty nice, right? You don't have to worry about like.

Wilhelm (01:07:56.566)
Yep.

Wilhelm (01:08:03.95)
Uh-huh.

Wilhelm (01:08:15.628)
Yeah.

Matt (01:08:19.22)
being able to recover stuff, can just roll back like a commit. You don't have to worry about like any of this stuff that we're talking about. Even things like the session files can just be, they can just be stored, like get LFS, know, like pointed to them in R2. It's just all like, so it's just a really nice abstraction. The same way that you were just checking everything into your repo. Imagine you could just actually do that. You could do the naive approach.

Wilhelm (01:08:43.458)
Mm-hmm.

So you're saying I should have just checked them. I actually was thinking about doing stuff with artifacts for this. Or you know one thing that's actually really interesting, which I think is also going to be key for Chad V2, is having a review workflow that actually makes sense. Because I do like my agent being able to modify its own runtime to give itself more features. But the problem, and I do like that in theory it would be nice if it was able to update its own instruction or system level files or whatever.

Matt (01:09:01.779)
Yeah.

Wilhelm (01:09:15.6)
So it's very easy for this to like doom loop and like get messed up in some way. So you need some kind of review workflow that makes sense. And I haven't found that, I haven't found a good solution for that yet. And at its peak, it was bad. Like, yeah, go on.

Matt (01:09:28.672)
Yeah, I think there's two angles to this. You either need very good review,

in which case you will try and catch some stuff. But in reality to me, like review is just more of the process that you're doing to build it with slightly different prompts. So I think you should be doing that anyway. But then what you actually need is a really good method of rollback. And that's what you actually need. Like you just need to have some external method of like rolling back one commit, rolling back one version, which means you need...

Wilhelm (01:09:55.69)
Interesting. Yeah.

Wilhelm (01:10:03.997)
so, yeah, yeah.

Matt (01:10:05.084)
Yeah, need external, you need to be able to containerize your solution so much so that you can store multiple versions externally and just pull one of them, pull the latest one back. I remember a talk, you might have even been there. You know Boris, Observability Boris? Yeah. He had this like mad talk where like he...

Wilhelm (01:10:21.869)
Yes.

Matt (01:10:30.228)
The thing that I always remember from the talk, is definitely not what he wanted me to remember, was that we don't write tests because we trust our engineers. And that just made me laugh a lot because he's definitely been proven wrong there. Tests are amazing for other LAMs and they're really good documentation and all this sort of good stuff. But at least I think he's been proven wrong. But then the other side of things is, and they show the feature in its entirety and everything that works and doesn't.

Wilhelm (01:10:37.613)
Mm-hmm.

Wilhelm (01:10:43.768)
Matt (01:10:59.892)
But the other side of that, that he was trying, I guess, trying to explain in that moment was he was like, we don't really need tests because we have such good observability that we can observe what's broken, like in a staging environment, um, before our customers ever see it. Like we see the broken stuff before they ever see it. And like, he was kind of taking it to the end extreme, like you a hundred percent do need tests even more in a world with a lens. Like, otherwise you are just going to ship broken stuff.

Wilhelm (01:11:14.168)
Mm-hmm. Yep.

Matt (01:11:28.617)
But he was like writing by hand, I don't ship broken stuff very often. And I check it works. And also I can see that it works because I have such good observability. So I actually think we need better observability solutions, not better code review solutions. I think that is.

Wilhelm (01:11:33.592)
Mm-hmm.

Wilhelm (01:11:41.26)
Yeah, okay. So this is actually part of what I mean by review. I don't mean like a pull request review or a code review. I mean like some really abstract notion of like, is this what you want? Does it work? And then I like my input as a human is just like, yes, this is what I want or like, yes, this does work or like I have tested it.

Matt (01:11:56.82)
Yeah, need some notion of does it, yeah, you need some notion of is this what you want? And I think tests were a really good middle ground for that because they defined the features. Now maybe less so, interestingly enough. Now maybe like a visualization, maybe during, with markdown is better and all this stuff. And tests are actually a lower level primitive that you actually want a higher level view.

Wilhelm (01:12:10.061)
Yeah.

Wilhelm (01:12:17.034)
Yeah. Yeah, exactly. Yeah.

Mm-hmm.

Matt (01:12:26.556)
And then, that's like, does it do, is it what I want? And then like, does it work? It's almost like a different style of testing. It's not like these like documentation style tests. It's like, almost like load testing. Like, can I fuzz test this? Can I like hit it with like loads of random, like for artifacts, we built a fuzzer. You know what, I'll tell you this. It was actually sick because like.

Wilhelm (01:12:35.95)
Mm-hmm.

Wilhelm (01:12:48.374)
That's very cool. I didn't know I didn't know that yet.

Matt (01:12:52.167)
The main scare thing about artifacts is we store corrupted data from a user and then the durable object ends up in such a weird state that we have to reset it and we lose some data or something. That's the main scare factor. So yeah, we built a fuzzer. It so fun. And we just hid it with loads of weird and wonderful stuff and we found a bunch of edge cases. I think that sort of thing is gonna be necessary.

Wilhelm (01:12:58.381)
Right.

Wilhelm (01:13:07.842)
I

Wilhelm (01:13:11.864)
That's really cool. That's awesome.

Wilhelm (01:13:17.357)
Yeah.

Wilhelm (01:13:21.762)
That's great. I'm going to build my own fuzzer real quick to do my own fuzz testing on artifacts. Do some really weird shallow repos with like tons of top level folders, some really deep nested, like a million folders nested.

Matt (01:13:30.342)
Shots. Yeah.

That stuff will be fine. It's like when you have like massive pack sizes and stuff, that's where I think starts going a bit weird.

Wilhelm (01:13:41.966)
Fair, fair, fair. I need to run actually. I need to head to a meeting.

Matt (01:13:45.178)
Yeah, me too. Dude, it was a pleasure.

Wilhelm (01:13:49.454)
It's very nice to catch up. Yeah, I feel like we're both in the middle of tons of things.

Matt (01:13:55.282)
It's gonna be exciting. And you need to book a trip to come visit me, because I don't think I'm gonna get to stand by just here. Jesus Christ, mate, we'll all be old and grey by then.

Wilhelm (01:13:58.766)
Yeah, I put it in there. Feb 2030. No, no. I'll need to make it happen. Yeah, yeah, No, love the site. Give my best to Juliet as well. Be great. See you guys. Peace.

Matt (01:14:11.179)
and to cheers to. Right catch you in a minute. Bye!

The Golden Age of Tinkering, Reachy Mini on Qwen+Cerebras, Compute Predictions, Fuzzing Artifacts, Flatter orgs
Broadcast by