Mark Mandel

In this episode Rich speaks with Mark Mandel from Google. Topics include: How Mark started coding, ColdFusion, the early days of Kubernetes and how things have changed for new users, Agones (a Kubernetes-based system for running dedicated game servers), the challenges in game development, secrecy in gaming and how open source is impacting it.

Kube Cuddle - Mark Mandel

Rich: Hello, and welcome to Kube Cuddle, a podcast about Kubernetes and the people who build and use it. I'm your host Rich Burroughs. Today I'm speaking with Mark Mandel. Mark is a Developer Advocate at Google. Welcome Mark.

Mark: Hey, nice, nice to be here Rich. Longtime listener, first time uh speaker.

Rich: I actually wanted to take a bit before we get into things and thank you. I don't know if I've told you this story, but we had a conversation at KubeCon San Diego that was actually one of the things that inspired me to start this podcast. Um, yeah, so I, I met you there for the first time in person.

And we were talking about your time hosting the Google Cloud Podcast, which I was a big fan of. And you mentioned at one point something that has really stuck with me, that one of the things you loved about doing that podcast was that when there was something new that you wanted to learn about, you would just find the person who is the expert and invite them on and have them tell you about it.

Mark: Yeah, Yeah.

Rich: I thought that was so smart and such a great way to learn. And I see a lot of people doing that now with streaming too, bringing on people who build tools and getting them to explain them.

And that was actually, it was there at San Diego that I got the idea to do this and that conversation was part of it. And part of it is that I was just running into so many rad people and I was like, I could just do a podcast and talk to all these people.

Mark: That that makes perfect sense. I haven't worked on the podcast for a number of years now, but there's some very talented people who have taken it over, but that was definitely a thing. Absolutely. It was always like, what am I, oh, that thing looks interesting. I'll, let's go find that person and bring them in and let them talk at us and explain how all this stuff works.

Now I actually have to do my own research, which is...

Rich: Oh man.

Mark: Terrible shame. But, uh, you know, those are the days. Those are the days.

Rich: I was such a big fan. I listened when you and Francesc were doing it. And, um, yeah. Yeah. And the thing that I really loved about it is that sometimes you all would talk about topics that weren't even that interesting to me, but you both just had this like super positive, friendly energy, and it just made me happy to listen to the two of you talk.

Mark: That makes me very happy. I'm super glad to hear it.

Rich: So, uh, usually I start off with folks talking about what their path into computing and Kubernetes was. How did you get started with computing?

Mark: Oh, how how far back do we want to go here?

Rich: Um, I mean, You can go way back if you want.

Mark: Uh, let's see. So I grew up in Australia primarily. Very privileged in that my family, we had computers. I think we would have had an old 286 back in the day. I think we had a 20 megabyte hard drive. Um, so, you know. Video games, that's usually the way in from that. Playing Bubble Bobble, Golden Axe, Commander Keen, all that good stuff.

Yeah, so always dabbled with computers in that way. It was definitely not that uh, I didn't start programming I think until into university. Uh, almost, almost segued into a life of art and design. Realized that wasn't for me. Ended up in, so this is, this actually dates me quite nicely. For those of you who aren't familiar with, say Australia, particularly universities, like we don't have majors or minors, like you would say in the U.S. You just have your specialty.

So it was the late nineties. And so I have a bachelor of multimedia because multimedia was the word. It was a brand new sort of vaguely experimental course that was run by my university. It was weird and interesting. I think for the first three months, we didn't even have computers, which is weird. But anyway, it was fine.

We did it. And it was during that, that I learned how to code and basically actually, even way back in the day, just started building web apps and that kind of stuff is where I started sorta cutting my teeth on that. And ended up doing a ton of ColdFusion.


Rich: Oh...

Mark: That was my bread and butter for a very long time.

Rich: Oh my gosh. I might have flashbacks here.

Mark: Pun intended, talking about Flash? Okay. Yep.

Rich: I worked at an internet provider in that same time period.

Mark: There we yeah.

Rich: And we uh, had web hosting and had customers who used ColdFusion. And it was not always fun at least being on the hosting end of that.

Mark: Yeah, but it was a, it was an interesting kind of niche community. I mean, at the time it, it did reasonably well. It has highs and lows, but that's where I cut my teeth on open source to begin with. So I started doing some open source projects in that. I wrote an object-relational mapper that was very popular for a long time.

Rich: Oh, wow.

Mark: There was some interop tools that I built that were very popular. At one point ColdFusion migrated from, I think it was a C++ code base to being like a J2EE app. So I wrote a bunch of tools onto these like Java interop stuff that were very popular. Uh, there was a, there was a port of Spring, like the dependency injection framework that was brought over.

I rewrote a new version of that. There was all kinds of stuff. So that's where I really cut my teeth on like open source and doing conference presenting and like being in front of audiences as well as, doing the tech side and like actually code and that kinda stuff.


Rich: And then what brought you to Google?

Mark: What brought me to Google. So actually I actually think that's actually what ultimately brought me to Google. I was very well known in that community. One of my teammates, Terry Ryan was an evangelist at Adobe at the time. I'd known him forever and ever, and he ended up joining Google.

And it was his recommendation. He reached out to me and he's like, hey, we've got a brand new team. This is like when Google Cloud is, six years ago, Google Cloud was small, it was itty-bitty. And he just reached out and he was like, hey, I've seen you do this kind of work before, advocacy relations, that kind of stuff.

Uh, In, in like previous stuff that you used to do in sort of like in the ColdFusion space. I think you might be a really interesting fit. And I was like, oh yeah, it's Google, but you know, I might as well give it a shot. That seems reasonable. There's a, there's an, there's an SMS that I wish I'd kept that I sent my wife.

That was like, it was like, hey like, as if it were ever going to happen, but you know, we could move to the Bay Area. What do you think? I was like the ever whatevs and she was like, what are we going to do with the dog? And I was like, we will bring the dog with us. And yeah, he referred me and I went through the, the ridiculous rounds of interviews that is Google. And apparently they liked me and they decided to move me and I've been here ever since.

Um, but I think actually you talking about Kubernetes. I saw Kubernetes, this was very early. I think there was a Kubernetes and Mesosphere talk at Strange Loop one year, which just passed actually, that conference. I love that conference. And that's where I think I first saw Kubernetes.

I vaguely remember that. I think I was also really sick that year as well. So that's how I got to Google and then another teammate of mine after I joined Google, Brian. Oh my god. Oh my god. One of my, one of my favorite teammates whose name I've just completely, uh, completely, forgotten his last name.

Brian Dorsey. Oh God. Yeah. So Brian, who is a

Rich: Oh, Brian. He

Mark: I'm very sorry. I forgot your last name. I have a terrible memory. I probably had been there for about three months, maybe not even that.

And he was like, I'm teaching a Kubernetes workshop tomorrow, do you want to TA? And I was like, cool, I have 12 hours to learn Kubernetes. And so I did that and the rest was history, in, in many ways.

Rich: Wow. Do you have an idea what year that would have been?

Mark: So it would have been six and a half years ago. So, um, is it now, 2021? So do the math, 2015 or something.

Rich: Yeah. That's about when I heard of the project too. I saw Kelsey speak about it when he was still working at CoreOS.

Mark: Pre 1.0 days, those were the days.

Rich: This is actually something that I wanted to chat with you about, is that I had a feeling you'd been at Google, like pretty much through the whole life of the project or most of it.

And I'm curious about what it's been to be there at Google and observe Kubernetes growing and the community growing.

Mark: That's an interesting question. It's been interesting. Uh, I would say, it's a really interesting question. Um, I think, I think the thing that I find, I don't know if fascinating is the right word, but a pattern. It, I would say it's definitely followed a pattern that I've seen in other communities where it is.

There's a bunch of people who start with it before it's 1.0, before it's new. So there's, there's very like a limited set of concepts. So I started pre 1.0, and I'm like, oh yeah, pods, deployments, services. Cool. Yeah, done. Like, That's it. Um, And those people like start with the project and, travel through it.

And over time they start looking at more and more complex applications and complex use cases and start, and the project gets bigger and all that kinda stuff. And the thing that I find interesting, and I've never really found a perfect solution for that. But as those people go through, as new people start feeding into the project, they will see those people who are already 10, 15, 20 steps ahead. And they'll see them and they'll be like, oh God, this is terrifying. Because I'm looking at everyone else who's been working with a project forever and being like, but they have seven things that all work together. Do I need seven things? Why do I need seven things? And it's not, I don't know if it's a bad thing in many ways, but it's understandable, in that, that's the natural progression for those people to keep going forward and being like, what's cool and shiny.

And like, what else can I do? Because I have institutional knowledge. And so I think the thing that I've seen is that progression, totally saw that happening. Go through where, you know, you're like, oh, I've, I've run forward with the stuff I work on. And I've seen people be like, alright, Kubernetes and Istio, and I don't know, seven other things that we can put in place and, RBAC at it's gonna [unintelligible].

I was going, oh god, what why do I need to do that? And then there was almost like, I feel like there was almost like a cliff where there was almost like a reset in many ways. Where a lot of people were like, okay, let's start just putting out introductory stuff again. Let's just start talking about basic Kubernetes.

Let's reset the expectations of what's needed and let's just be like, all right you have a pod and you have deployment and you have services and then you can go do stuff. And then worry about the rest as you need to.

Rich: That's, that's super interesting. I think I've observed some of that as well. And one of the issues that I think can come up is when those people who have been around for a long time don't remember what it's like to be starting out. Or they really, it's not even just about remembering, right?

Because starting out with Kubernetes now is so different than starting out with it in 2015 would have been.

Mark: Absolutely. There's a phrase I like to use when I talk to, like, I usually, like if there's people who are new to an industry of any industry and they're like, oh, I want to do like conference speaking, or I want to do that kinda stuff. And I always actually say to them, make whatever. If you want to talk about your experience and share your experience, being a beginner in something is literally the rarest thing that can possibly exist.

So your perspective is just so incredibly important because it's fleeting and you never get it back. And so yeah, people come to me and they're like, how should I learn Kubernetes? And I'm like, I am not the person. So I have resources that I think are good, but I've been around for so long that, yeah, I don't know.

But if you can find people who have been doing it for six months, and there are resources that have worked for them, those are the people you should definitely be asking. And those are the people we should be asking questions of to be like, how can we make this better? What's a better path for onboarding, that kind of stuff.

Because yeah, it's super hard to see the forest from the trees when you've just planted the forest.

Rich: Yeah. It seems like one of the areas that things have progressed a lot in, and I'm happy to see it, is security. Cause it seems like, it was the wild west at first, right? Like Kubernetes was just super wide open and as time has gone on, there have been a lot of tools and patterns that have come up.

Some of them in the platform, like RBAC, that, that allow people to, to have a lot more control over what's happening in the clusters.

Mark: Yeah. Yeah, that's fun. Um, so we'll touch Agones probably at some point, but like when I, when I ran into people who are like, I'm trying to make this thing talk to this resource inside my cluster. And I was like, I'm sorry, but it's time, you're going to have to learn about RBAC now. I know it's going to suck and it's not going to be fun, but security is not necessarily fun.

Rich: Yeah. Did you, when Kubernetes was starting out, did you have experience with Borg? Had you been working with it at Google?

Mark: No. So I've never done anything internally inside Google other than probably update some docs here and there. So I'm the worst. And I work with eng teams that do stuff internally and I don't know what their language is. It doesn't make any sense to me, but they do.

Rich: Totally understandable. Yeah. So you mentioned Agones, that's definitely one of the things I wanted to talk to you about. For those folks who haven't heard of it before, can you give us a bit of an overview?

Mark: Yeah, absolutely. Especially for this crowd as well. So, um, Agones is, is an operator like for Kubernetes. It has a series of custom resource definitions and some code that you run that has a controller. It basically teaches Kubernetes the lifecycle and management of what we would commonly refer to as dedicated game servers within gaming. That's that's a, in-memory like stateful in-memory workload where a bunch of people who are playing a multiplayer game, say something like Fortnite, or Overwatch, or something really fast paced they all connect to this one simulation server.

Which we call a dedicated game server for lack of a better term. Because server is completely overloaded at the best of times anyway. And its job is to run all the physics simulations, see what's going on inside the game and then distribute what's happening out to each player. And the fun thing about that apart from a bunch of stuff we do as well is, for one, like it's not like a web server which a deployment works really great for.

It's an unordered workload in that like, you can shut down a web server whenever. So that's if you've, as long as you have enough to handle the load, cool. Like you might miss a request, but it's not a big deal. But it's not like a database which stays up forever and ever, and usually has an ordered pattern.

Like you go 1, 2, 3, 4, and then if you bring one down, it'll be 4, 3, 2, 1. Game servers are like, I'm going to spin up a bunch of them, because sometimes they actually take a long time to start. Um, but then as players finish their game, then you might shut them down in any particular order.

And what makes that really entertaining is if you don't have players playing on a game server, you can shut that down. That is totally fine. If you do have players playing on a game server, you don't want to shut that down because players get mad, um, for whatever reason, I don't know why. And so how do you do things like, Hey, I've got a new game server version.

How do I roll that out, uh, in a way that's safe and atomic. How do I scale in that way? So that if I scale down, I, we, we have fleets of game servers, a big group of game servers. How do I scale that down without interrupting gameplay, all that kind of fun stuff. So it's a slightly different workload.

But yeah, essentially it's a controller and some custom resource definitions and actually some API extensions too for funsies.

Rich: Yeah. So I think this is all super interesting because as someone who's done a lot of gaming over the years I'm always curious about gaming infrastructure, right? Because like you said, there definitely are different requirements and especially when you, when you talk to people, like I've had conversations with folks who work at some of the AAA shops, on infrastructure, and the kind of scale and low-latency and things that are required in that sort of environment often are pretty different than what most people would experience running in for, at, a company that makes whatever kind of widget.

Mark: Yeah, it is. I mean, especially talking dedicated game servers, you're often looking to put your game servers somewhere very close to your players, usually within like 50 milliseconds or so. So you've got a very wide distributed geographic rollout. And there was that. But I think also the, the loads, the load graphs look a little different, I think generally. So you do a game launch and usually you're just going straight up in the air, right? You're just like everyone's showing up right now. Like if you've done your marketing right. Um, and that makes the risk really high. Cause you can do beta tests and you can do your own bot testing and stuff, but like everyone knows like production is production.

Rich: Yeah.

Mark: But yeah, your launch is where you're going to get probably the largest number of players come in. And if that goes wrong, they may not come back. Um, so not only do you have the hardest sort of like, just graph of just straight up, right.

Um, also it's possibly the time you're going to make the most amount of money and that kinda stuff. So you really have to solve problems while it's going on. And then a lot of games that have been around for a while now, like we call it games as a service. They're releasing seasons or like, new content on the regular.

So you're getting the repeat of that spike over, over and over again. And over time, like you're going to work that out, but you kind of, you, you just have to hit that every single time and make sure it works perfectly as much as possible, which is entertaining and not everyone gets it right. That's a hard problem. As we all know, distributed systems, they're not the easiest thing in the world at the best of times.

Rich: Yeah, I was talking to some folks who were working at Ubisoft. I think they were working on Rainbow Six Siege. I can't remember for sure but they basically were saying that their traffic was growing over time, which was super unusual. That usually the pattern was more like that after that initial bump, things drop off.

Mark: Yeah, I, it depends on like your sort of game as well but yeah, definitely the trend now for games will be uh, you'll be releasing content at regular intervals over time. So you'll hit that bump and then it'll, it can, it can ease off over time. You'll usually get, uh, the graph I like to draw, which I can't do here because we're on a podcast is like this big spike up. And then it drops down to maybe about halfway ish, you know, vaguely and then evens out, and then like on a slow decline. But as you do more content, or what commonly actually also gets referred to as live ops these days, which is like looking at your user base and sending them things in like, hey, you haven't been in, in a while.

Here's a box that might have some cool stuff in it and bring you back. You can start to hit bumps and get re relaunches essentially as you go through.

Rich: That's super interesting. I had Kaslin Fields on the podcast a little while ago and yeah. And you were on her show that she does on, um, Fields Tested. And you actually did a demo of Agones, like the two of you walked through it and got it set up and then actually played a game on it.

And so, um, I'll link to that in the show notes, if folks are interested in actually seeing it in action. I think that was a really good look at it.

Mark: We also did it in onesies as well, which I think is most important.

Rich: It was very impressive. You both had fantastic onesies. Um, one of the things that you and Kaslin talked about was those differences between a game server and a sort of typical workload that you would deal with at most companies.

And you've talked about some of those, but I'm wondering if we can get into a little bit more, like what the life cycle of a game server looks like.

Mark: Yeah. And some of this is going to vary which is interesting of itself. So let's, let's make the assumption that sort of we're playing something that's super physics heavy and lots of players kind of thing. So like, I, I mentioned before, like a Fortnite or an Overwatch, Rocket League, Valorant, like anything like that kind of game where there's like lots of players. Cause there's degrees in here, to different types of games. Um, usually what you'll do. Um, so I'll talk about it in Agones terms, cause it's just easier and nomenclature I'm used to. So first thing you probably will end up doing is starting up a group of them we call a fleet. Game servers themselves can take 30 seconds, a minute.

Like they're probably a container image that's several gigs in size. They're big. Probably pulling in data like map data and player data, and all kinds of fun stuff like that too. So that they have a lot of stuff in memory. So you'll have a bunch of them kind of sitting there, idle, waiting. And then there's a fun game that you'll play in terms of like, how many idle ones is the right amount.

So I'm not wasting resources versus making sure my players can get them. So that's a whole fun game too, around like, how do you autoscale that? And then also doing that over multiple uh multiple geos, which is also hilarious. So you'll have them sitting there.
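A minimal sketch of the idle-pool trade-off Mark describes, assuming a simple "keep a fixed buffer of idle servers" rule. The function name, parameters, and numbers are hypothetical illustrations, not Agones' actual autoscaling logic.

```python
def desired_fleet_size(allocated: int, buffer_size: int,
                       min_replicas: int, max_replicas: int) -> int:
    """Return the total number of game servers a fleet should run.

    Toy rule (an assumption for illustration): keep `buffer_size` idle
    servers on top of the ones players are currently using, clamped to
    the fleet's minimum and maximum size.
    """
    wanted = allocated + buffer_size
    return max(min_replicas, min(wanted, max_replicas))


# Quiet period: the minimum keeps some servers warm for the next match.
quiet = desired_fleet_size(allocated=0, buffer_size=5,
                           min_replicas=10, max_replicas=100)

# Busy period: the fleet grows with demand but never past the cap.
busy = desired_fleet_size(allocated=98, buffer_size=5,
                          min_replicas=10, max_replicas=100)
```

A larger buffer means players wait less for a server but more capacity sits idle; a smaller one saves resources at the risk of players queuing during a spike.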

Once they're sitting there, we have a thing we call allocation. Which is basically we actually used Kubernetes selectors for it, but basically go to this cluster and matching these selectors, grab me a game server out of the set. And we have some other stuff you can do. You can actually do like preferential ones. Like If you can find one that looks like this, do it this way, but if you can't then fine fall back to this kind of stuff. But at its base level, we were just like, Hey, I need a game server.
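The allocation idea above can be modelled roughly as follows. This is a toy, in-memory sketch: the `GameServer` class, the `allocate` function, and the label names are assumptions made for illustration, and the real system works through the Kubernetes API rather than a Python list.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class GameServer:
    name: str
    state: str = "Ready"
    labels: Dict[str, str] = field(default_factory=dict)


def matches(server: GameServer, selector: Dict[str, str]) -> bool:
    """True when every key/value in the selector appears in the labels."""
    return all(server.labels.get(k) == v for k, v in selector.items())


def allocate(servers: List[GameServer],
             required: Dict[str, str],
             preferred: Iterable[Dict[str, str]] = ()) -> Optional[GameServer]:
    """Pick a Ready server matching `required`, trying `preferred` first."""
    candidates = [s for s in servers
                  if s.state == "Ready" and matches(s, required)]
    # Try each preferred selector in order before falling back.
    for selector in preferred:
        for server in candidates:
            if matches(server, selector):
                server.state = "Allocated"
                return server
    # Fallback: any Ready server that satisfies the required selector.
    if candidates:
        candidates[0].state = "Allocated"
        return candidates[0]
    return None


servers = [
    GameServer("gs-1", labels={"game": "demo", "map": "desert"}),
    GameServer("gs-2", labels={"game": "demo", "map": "forest"}),
]
first = allocate(servers, {"game": "demo"}, preferred=[{"map": "forest"}])
second = allocate(servers, {"game": "demo"}, preferred=[{"map": "forest"}])
```

Here the first call gets the preferred forest server, and the second falls back to the remaining one because no preferred match is left.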

Our game servers have, has a state, essentially. So they spin up, they mark themselves as ready once they've loaded all their data and they're ready to go. Like they're good to go. We have an integrated SDK primarily because exposing HTTP and stuff like that on a game server isn't necessarily the easiest. And so the integration was easier. We just built an SDK that you can talk to and call out to rather than stuff calling in. So they mark themselves as ready through the SDK. And we see that on the game servers and we know they can be allocated.

And then once it's allocated, that's our special marker for, there's a player on this. This is important. Keep it like that. And once it's in that allocated state, if you're making an update to a fleet, and say, we're doing a rolling update, that one won't get touched until it gets shut down. If I scale my fleet down that one doesn't get touched until like it shuts down or like specifically gets deleted.
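As a rough illustration of that rule, here is a toy scale-down that removes only servers without players. The tuple layout and function name are assumptions for the sketch; it is not how the Agones controller is implemented.

```python
from typing import List, Tuple


def scale_down(servers: List[Tuple[str, str]],
               target: int) -> Tuple[List[str], List[str]]:
    """Shrink a fleet toward `target` without touching Allocated servers.

    `servers` is a list of (name, state) pairs. Returns (kept, removed).
    The fleet may stay above `target` until allocated games finish.
    """
    excess = len(servers) - target
    kept, removed = [], []
    for name, state in servers:
        if excess > 0 and state != "Allocated":
            removed.append(name)
            excess -= 1
        else:
            kept.append(name)
    return kept, removed


fleet = [("gs-1", "Allocated"), ("gs-2", "Ready"),
         ("gs-3", "Allocated"), ("gs-4", "Ready")]
kept, removed = scale_down(fleet, target=1)
```

With a target of one, both idle servers are removed but the two servers with players stay up, so the fleet temporarily runs above its target.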

That's that special magic. Um, Agones does some other special stuff. So with game servers, generally, you're connecting directly to the game server. We don't go through load balancers unless you're doing like web sockets or something else. There's no need for them. And all players are playing on the same thing.

So we do a bunch of port management stuff for you as well. And so once you're allocated, you get back your game server IP and port. So it's the node and which dynamic port we gave you in a range. Players play the game and more often than not, we follow a pretty Kubernetes model that you can escape it if you really want, where, like my game's done.

I, I call shutdown through my SDK because the game server knows it's ready. It goes away. The fleet's like, oh, you're done, I'll spin up another one, put it back in the pool. And we continue the cycle and round and round wherever we go.
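The cycle Mark walks through can be sketched as a small state machine plus a fleet that replaces finished servers. The state names are a simplified stand-in chosen for this example (only "ready" and "allocated" are named in the conversation), and the classes are hypothetical, not the Agones API.

```python
# Allowed transitions in this simplified lifecycle (an assumption).
ALLOWED = {
    "Starting": {"Ready"},
    "Ready": {"Allocated", "Shutdown"},
    "Allocated": {"Shutdown"},
    "Shutdown": set(),
}


class GameServer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = "Starting"

    def transition(self, new_state: str) -> None:
        """Move to `new_state`, rejecting transitions the model forbids."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.name}: {self.state} -> {new_state}")
        self.state = new_state


class Fleet:
    def __init__(self, replicas: int) -> None:
        self.replicas = replicas
        self.servers = []
        self._counter = 0
        self.reconcile()

    def reconcile(self) -> None:
        """Drop finished servers and start replacements to refill the pool."""
        self.servers = [s for s in self.servers if s.state != "Shutdown"]
        while len(self.servers) < self.replicas:
            self._counter += 1
            self.servers.append(GameServer(f"gs-{self._counter}"))


fleet = Fleet(replicas=2)
gs = fleet.servers[0]
gs.transition("Ready")      # data loaded, server marks itself ready
gs.transition("Allocated")  # players are on it
gs.transition("Shutdown")   # game over, server asks to shut down
fleet.reconcile()           # fleet spins up a fresh one
```

After the last call the fleet again holds two servers: the untouched original and a newly started replacement.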

Rich: Is that just like labels that are telling it

Mark: So we have,

Rich: or not.

Mark: Yeah, we have on our game server resource definition, we have a status. On that status, we have a series of states that we use. You can actually also add your own labels and annotations to game servers. And we use that as a communication mechanism too, cause the SDK has access to that. So if you need your own custom sort of lifecycle stuff, you can do that as well. So yeah, we keep track of our own state.

It's just a bit easier that way. So that you can actually query it. And if you go "kubectl get gameservers," because you can do that with custom resource definitions, we can show that as, here's your game servers, what state are they in? What's their IP. What's their first port, that kind of fun stuff.

Rich: When I heard you discussing this with Kaslin, it made me think about the old days of managing a VM life cycle, right? Like that, that especially like CI/CD is the big use case that it made me think of. I was in a shop where we did a lot of CI/CD with VMs and we had that same sort of mechanism where we'd have like that pool running, that could be just grabbed and used immediately.

So people didn't have to wait, every test for something new to get provisioned.

Mark: Yeah, I guess that makes sense. In that sort of realm, right? If you have a CI job running on something, don't shut that machine down, leave that one running but once it's done, once it's done, that's fine. You could get rid of it if you need to or scale it back.

Yeah, it's been interesting. The stuff that I work on with game servers, there's definitely overlap with other industries. We've seen some interesting stuff there, SIP and VoIP type stuff, right? If you and I are talking, we're probably talking to an individual thing. Um, I don't know enough about it, but we've definitely had a bunch of people use it for like media transcoding type stuff, which I assume makes sense. Right? Like if you're transcoding, it needs to stay up until it's done. We've been primarily focused on game servers because it just makes a nice constraint. Um, but there's definitely some application for kind of some of that stuff too.

Rich: That's cool. It's one thing I really liked about it too, is the idea that, y'all are just using Kubernetes. Right? And so these things are objects, and so you could actually do "kubectl get gameservers" and see like a list of the game servers there.

Mark: Yeah. It's co I honestly like, um, the way Kubernetes is, is extensible. It's kind of amazing to be fair. The fact that, you change things through custom resource definitions, or even API extensions. That changes the whole Kubernetes API, which then changes the kubectl tooling, which then is accessible through all the ecosystem of tooling that accesses the system through the same mechanisms and suddenly has visibility into all this kind of stuff.

And, you do the right nomenclature and it just all cascades all the way through. Yeah. So the first time I was like "kubectl get gameservers" and saw it like return. I was like, oh my, that's amazing. And just this, the power that you have from Kubernetes and being able to extend it is it's it's just, yeah, it's ridiculously impressive.

And it amazes me that for so long, so many people were building systems like this just from. They're like, I'm going to build my own scheduler. Oh man, yeah. How, I mean, it makes sense. Like you would do that, but the fact that Kubernetes exists makes some of these kinds of things much, much easier.

Now we can solve the other harder problems or focus in on where things are challenging. And that kind of stuff.

Rich: Yeah, I think that one of the advantages of it too, is that when those things get baked into the platform that, that they're known issues, right? There's no debate about how they work anymore. You don't have to reinvent the wheel. You don't have to figure out how the thing works. It's just there.

Mark: Yeah. And it's, onboarding's easier, training's easier, set of tools that can be used across platforms is easier. Yeah. It's like why people use Kubernetes? That's a thing. You know, it has its warts. It's not perfect for everything. But there's so much momentum and so much like institutional knowledge and just, expanding expertise in that area and tooling and everything. That I don't want to treat it as if every hammer, what is it? Every nail, everything looks like a nail when you have a hammer. Um, I I'm sure some of that sort of happens, but at the same time, just so useful.

Rich: Yeah. I mean, I talk to people about this a lot. I try to be, I try to not be dogmatic about tools, right? Because tools have use cases and they all have strengths and weaknesses, and there are some things that some tools are better for. But I think that, in general, if you're running distributed systems, that are containerized, that it's a good option. It's, it may not be the best one for everything.

HashiCorp's Nomad is cool. And there are people who use that for some things. There's lots of different ways to run things. But I guess what specifically about Kubernetes makes it a good option for running these game servers?

Mark: That's a good question. I think it's stuff we've already talked about. Honestly, I think more than anything else is the ecosystem of tooling and just the functionality out of the box. At the end of the day game servers are still a piece of fungible software that need all the same things that like everything else needs.

How do I aggregate my logs? How do I know my health checking is working? How do I push out new versions? How do I know the version of what software I'm running is the actual version I should be running? Like just basic stuff like that. It's I think we, I, and someone can probably check me on this, but I feel like we built like the MVP for this, uh, in probably three months and I was probably working way too hard on it, which is probably accurate at the time. But the fact that we could do that with Kubernetes, like there's no way we could have done that from scratch. Like it's just not possible. And yeah, so the ecosystem, like we can build out our own Prometheus dashboards for it and like that kind of stuff, or just general metrics.

All the kind of stuff that exists. I don't want to write how to spin up a sidecar to another, like what? No, I don't like, no, please. No, I don't want to do that. I wanna solve I don't know if more interesting, but other problems.

Rich: Yeah. It seems like the game servers are a really good use case for containers too in general. I was playing a lot of Fortnite for a while and wasn't great at it but, but I remember the thing, one of the things that I thought was super cool is, you know, in the game, you could actually spin up your own version of the map.

And just go and mess around on it, like without any other players in, in that instance. And that immediately made me think, oh wow, this would be I hope they're using containers if you're super...

Mark: I definitely don't want to speak on how Epic does anything, but it's...

Rich: I know some folks over there. I haven't asked them about it though.

Mark: Excellent. Yes, I would suggest asking them and going through their PR department. Um, it's kind of been an interesting read too. I mean, um, traditionally gaming is very Windows-centric, especially on the development side. Game servers definitely ran on Windows machines for a really long time. So it was an interesting one. We've definitely seen a shift in terms of, you know, definite acceptance of running stuff on Linux, running containers, that kind of stuff. That definitely has been a thing that was an initial challenge in the beginning. Like, first of all, many people would be like, containers, these are new and exciting and also scary.

And do they run as fast? Performance is such a big thing for dedicated game servers. Um, like especially consistent, stable CPU performance. Um, so there was some of that education type of stuff. There was all that kinda stuff. These days it's far less of a concern, or it's just, hey, you wanna run containers?

They're like, ah, yeah, we were in Kubernetes for other stuff or whatever. Right. Like that's just such a norm. But yeah, back in the day it was a thing.

Rich: Yeah, I've been, I was under the understanding for a long time that, it seemed like gaming was behind, like in terms of things like automated testing and things that were more taken for granted, at that point in, in traditional software development.

Mark: Yes. Yes. Um, I'm nodding my head. Uh, much respect and love to everyone I know who works in backend for game. And they're all a lot of very competent people. But it's, it is a different environment. So there's a few things that I think that are interesting in that. One, the lifecycle for games, especially traditionally is very different than what you would have in software development.

So you are looking to build a game. You're probably going to build it over three years, average, something like that. You're probably going to make a bunch of technical decisions at the beginning, and those are probably not going to change through the lifecycle of those three years.

So whatever new stuff is coming through that time period, it's extremely rare that a team is going to deviate from that. So that's number one. Two, uh, the games industry is the games industry. It is both wonderful and horrible. Anyone who's been in the games industry longer than five years is considered a veteran.

So there's not a lot of like age and experience. There, there is definitely that that does exist, but like burnout's a thing, crunch is a thing. So that's a whole aspect too. Um, the games industry as a whole is like extremely secretive. Um, they've slowly been opening up over time. So the sharing of knowledge about like how games get built is also something that we're slowly trying to peel the layers back on.

And one of the things that got me, I wanted to do around open source especially with, for gaming is a thing as well. And so there isn't necessarily a lot of sharing of information about how software gets built. It's a fun one. Like I've been at, I've been at talks and some of this is slowly changing.

Like you and I go to a talk at KubeCon, right? This is a thing. And we'll listen to someone talk about, I don't know, making widgets, using sprockets, whatever. And then they're like, and here's the widget GitHub and here's the sprocket GitHub, go nuts. And you're like, awesome. I'm going to go do the thing. And this has definitely changed more so in recent years, but in the last few, especially like two, three years ago, four years ago, it would be: so I'm going to talk about the concepts of making widgets and sprockets, but I'm not going to show you how to make widgets or sprockets or show you any of the code. It's all internal and proprietary and I'm sorry, you can't touch it. And you're like, cool, thanks. It's good, but it's not like the same sort of level that we're very used to.

And coming from outside in and coming into the games industry was very like, huh. Strangely enough, I do a bunch of like open-source advocacy. And so the games industry these days, and I, this is why I also wanted to build open source tools. Cause I wanted to change some of this culture and drive, drive more openness.

So yeah there's definitely some lag there, I think and to greater and lesser degrees across different game companies. But I think, I like to think actually I think it's narrowed over time.

Rich: I've definitely noticed what you're talking about. It was actually on my list of things I was going to ask you about the secrecy, because you don't tend to see at infrastructure conferences, people from the gaming companies speaking, that much, you don't tend to see them sharing a lot about what they're doing.

And it's something that I'm personally super curious about, so I'm always like, oh, I want the details.

Mark: Yeah. Tell me all the things, uh, yeah. I was at a talk, I think it was a big data talk, and they were like, so we're not going to, we're specifically not going to mention products we use. And I was like, what, tell me. What? Come on though. I want to know, tell me if they're terrible. Tell me if they're good.

Like I want to know. Yeah it's an interesting one and I think it's been really fun. I've really enjoyed. I think is probably a good term. Like trying to help bring more open source. Because there's a bunch of people who've been doing open source and gaming before I showed up. Credit to them.

But having the power of a multinational corporation doesn't hurt, um, to try and drive and help more open source in gaming is, has been really interesting. I also, I think I'm lucky in that the infrastructure side of things relies so much on open-source because you can't get away from it in our day to day jobs, that in many ways, it's actually, I think a good avenue for creating that wedge and making that space because they have to touch open source.

There's no getting away from it in an infrastructure. It's a good opportunity to help companies be engaged. Um, Cause they may already be. Also where they at least have some semblance of understanding of how that already works because yeah, it's infrastructure. You can't get away from it.

Rich: That totally makes sense. Are there companies you can mention or games that you can mention that are using Agones?

Mark: I will pull up the Agones website because that is what is allowed.

Rich: I know how that goes. Totally.

Mark: Yeah. It's been doing really well. I'm super happy about it. Um, if you look on there, there's a companies using Agones section, very important. What's on there. We co-developed it in the beginning with Ubisoft. So that's definitely the big name.

The ones sitting there. The other, uh, there's a bunch of lovely people. How do you compare two beautiful sunsets? But some I'll probably highlight more than others. Uh, Embark Studios is a company in Malmo, Sweden, who I do a bunch of work with. I also work on another project with them now called Quilkin, which is a proxy for UDP traffic for game servers, uh, which like came out in the last few months.

So it's super, super new but they're using it for their workloads. The other fun one that's on here that I really like is RollTable. Uh, if anyone's played like digital tabletop games, they have, they, they run all their stuff. I think, I haven't actually looked at it. So, uh, I need to remind myself, but I'm fairly sure they run the actual physics simulations where you can actually roll physical dice.

Like it's kind of, it's a physical tabletop. Yeah. I'm looking at it now. And it's physics enabled and you can upload, like, you have actual pieces, which is cool. So there's a bunch of them. There's a bunch I can't mention, which is a real shame. But yeah, if you played a game in the last few years, it's possible you may have landed on Agones, which is neat.

Rich: I've, I've played some Ubisoft games for sure. So, um, yeah, that's super cool. I think that it's great that you're working to get more open source into that community. It would be awesome to see it open up even more over time and get a little more transparency so that those of us who happen to be players and infrastructure nerds get...

Mark: We can really geek out.

Rich: Exactly.

Mark: Yeah. And it also just like, oh, whenever I'm like, being in the role that I am, I work on Google Cloud. I work on gaming. I talk to a bunch of customers. And like having the same conversations with customers about the stuff they're building and I'm like, you're all building the same thing. Can we please just have like, like, I mean, XKCD's like standards comic notwithstanding, can we please have just like an open thing that we all work on to make this like a reality. Um. There's so much wasted time when we could be building other cool stuff for other cool games. Let's play. Yeah, let's just, let's build a cool foundation together. I just want cool games, man.

Rich: Same. Um, being at Google, while Kubernetes was growing, as we talked about earlier, How does a company like Google balance working in these communities like you do and working with other vendors, you know, Microsoft and all the other folks that are involved. And yet you have a commercial Kubernetes offering.

So like, how do you balance working in the broader ecosystem versus having a commercial product, if that makes sense.

Mark: It does. I will speak personally and not on behalf of the company I work for. Mainly because I like my job. To be blunt.

I think that's an interesting question. And I think Kubernetes is a fascinating space in that there are so many people who are working together in the same place with the common goal that are essentially competitors, but are also allies in this sort of like stuff that they want to do.

I think, and I think actually I just spoke specifically about Agones as well, because, we do have a managed product, that's Agones. That's a thing that we do as well. It's that very open core style situation that many of us are very familiar with. It's like Kubernetes and GKE.

We have Agones and Google Cloud Game Servers. But I think it's always been important to us, especially on the Agones level and that kind of stuff to be like Kubernetes agnostic. Um, there's a, I'm probably gonna mess this up and I feel like it was Eric Schmidt that said it and um, you know, that, and I think he said it at Next one year that I really preach.

Like we don't want to like, I don't want, I don't want to lock you in to a platform. What I want to do is build an open ecosystem where we give you the best solution that you can get. And make sure that you, as a customer, get the thing that you want in the best format that we can give it to you.

And, we think that we can do the best job and if we can then that's good. And then that means that we'll get your business in that way. I mean with Agones specifically, you know, I mean, we do all our testing on GKE, like that's, cause that's what we have like available to us and that kind of stuff too.

But we get, we've had contributions from Microsoft. We have contributions from a variety of other studios who work on different platforms, run on Azure, like that's great. Like at the end of the day if the thing that we, I took away and I think many of the community members I work with who and, you know, people are like, oh, now we have it running on Azure.

And now we have it running on AWS even though I work at Google is, oh, cool. So this means this is actually like a standard. That's, that's what that says to me. This means we're getting adoption. If we weren't getting that would almost be a sign. Like, Oh, maybe we have a problem. Maybe there's a problem with a product. So yeah. Yeah.

Rich: Yeah, no that's interesting because it's, I think it's um, it would be tempting, as a company that was offering, um, uh, products based on these open source tools to try to control it and lock it down more, you know. I could see how some people would have that temptation. But in the end, I think a lot of folks wouldn't use the service, right. If they knew that they weren't, that it wasn't portable. That they weren't going to just be able to roll their own. If they want.

Mark: Yeah. And like we're talking about gaming too, and why I think also interoperability is just so important. So we were talking before about how like for game servers, especially, right. Like you may sit, you may want to sit 50 milliseconds away from your players and you may not be able to do that for all players.

But not all clouds are everywhere. Not all data centers are everywhere. Um, that kind of stuff. Maybe you want to run locally inside your, in your QA, you know, uh, team or even on your own machine. And so from both a development perspective, but also, hey, you just might need to get a colo somewhere and run some game servers there because no cloud covers a region or there isn't a pipe or whatever. And that can be the difference between millions of dollars. Being able to do that and picking a platform that enables you to do that is hugely powerful. And if we don't do that, then it's not going to unlock the whole, you know, the whole pie for lack of a better term. So that was always hugely important from day one. Just being able to follow where your players are, because ultimately you can guess, but you never a hundred percent know.

Rich: Yeah, that totally makes sense. Cool.

Based on working with uh, Agones and the gaming, is there a certain area that there's a lot of friction within Kubernetes? Are there like, pain points that the users run into a lot?

Mark: Ooh, there's fun stuff. Okay. So, um, yes. So there is, there's a lot of good stuff about Kubernetes. There's also some mismatch between what we do and like how Kubernetes works is there. There's probably two areas. I think that are probably interesting in some ways. So one, we talked about this thing called allocation.

Allocation is an imperative command. And Kubernetes by its very nature is declarative. Right? And usually you're like, give me 15 of these things. And it goes cool. I will now run 15 of them and I will make sure that I'm doing the right stuff. And you're like, cool. So we had to escape the Kubernetes ecosystem, or like the usual paradigm, with what we do with allocation. Allocation shows up as a resource, but it's actually an API extension. So if people have been around Kubernetes for a long time, the way that you used to extend Kubernetes was through API extensions. Basically you'd write a little web server that would output a specific set of JSON in a very specific way.

And that's how you would extend the Kubernetes API. Now you have custom resource definitions and it makes life a lot easier and it's great. But we switched to that for a variety of reasons. One, uh, we have a declarative API, sorry, not declarative. We have an imperative API, which is like game server, give me a response, done.

So we don't need storage. We want an answer really quickly. And it's not, there's no point in hanging onto it. And it's just, I would like one, do you have one? Yes, give me the results. Um, and just the speed of it. We don't need any of that. So I learned a lot about how Kubernetes works and wrote a simple layer that sits inside Agones that is an API extension that only responds to create commands.

So if you kubectl, you can still kubectl like dash F it like all that stuff. It all looks the same that way. But if you try and apply or delete, there's nothing to delete. Um, It just gives you back a response. So that actually works really well. Kubernetes itself has a few imperative commands, especially around pods and like port forwarding and like other such odd things exist.
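To make the create-only idea concrete, here is a minimal Python sketch. It is not Agones code: the class, method, and field names (`AllocationEndpoint`, `create`, `gameServerName`, and the two state strings) are hypothetical stand-ins chosen for illustration. It only shows the shape of an imperative endpoint that answers immediately and never stores the request, so there is nothing to apply or delete afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class GameServer:
    name: str
    state: str = "Ready"  # hypothetical states: "Ready" or "Allocated"


@dataclass
class AllocationEndpoint:
    """Toy stand-in for a create-only, imperative API endpoint."""
    fleet: list = field(default_factory=list)

    def create(self) -> dict:
        # Imperative: find one Ready server right now, claim it, and answer.
        for gs in self.fleet:
            if gs.state == "Ready":
                gs.state = "Allocated"
                return {"state": "Allocated", "gameServerName": gs.name}
        # Nothing available: still answer immediately, no reconciling later.
        return {"state": "UnAllocated", "gameServerName": None}

    def delete(self, name: str) -> None:
        # The request itself is never stored, so there is nothing to delete.
        raise NotImplementedError("allocations are not stored; only create is supported")


endpoint = AllocationEndpoint(fleet=[GameServer("gs-1"), GameServer("gs-2")])
first = endpoint.create()
second = endpoint.create()
third = endpoint.create()   # the fleet is exhausted by now
print(first, second, third)
```

Contrast this with a declarative resource, where the request would be persisted and a controller would keep working toward it over time.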

But for that, yeah, we needed something imperative. But we also wanted to take advantage of things like RBAC and like kubectl, and like that kind of fun stuff, which is just super useful to have. Um, so that's fun. We also expose that as a gRPC and REST endpoint, too, if you want, if you don't want to use kubectl with authentication, you can use like SSL cert stuff.

So that. So that works out pretty well. Game servers, themselves, the other fun thing that, uh, I've actually been talking to a few people about is the churn rate is generally much higher than a traditional pod style workload. There is probably some bleed over in like batch style processing where like you might spin up a bunch of stuff, especially like ML workloads I think. I'm not an ML person, so correct me if I'm wrong. But like maybe doing media transcoding or something where you'd like to spin up maybe a hundred pods to do a bunch of stuff and then shut them down later. But you might be running a few thousand pods or even more, depending on how big your cluster is. And they're going to spin up and they may run for half an hour and shut down.

And then while you're doing that, you've got several thousands all doing that thing. So etcd churn is like a real thing, way more than say like, I am now running a hundred web servers and they don't do anything else now.
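A rough way to put numbers on that difference is Little's law: at steady state, the number of pods created per hour is the number running concurrently divided by how long each one lives. The Python sketch below uses 3,000 pods and a 30-minute lifetime as stand-ins for "a few thousand" and "half an hour"; the one-week lifetime for the web servers is purely an assumption for comparison, and the function name is hypothetical.

```python
def pod_turnover_per_hour(concurrent_pods: int, lifetime_minutes: float) -> float:
    """Steady-state pod creations per hour (Little's law: rate = count / lifetime)."""
    if lifetime_minutes <= 0:
        raise ValueError("lifetime_minutes must be positive")
    return concurrent_pods * 60.0 / lifetime_minutes


# "A few thousand" game servers that each run for about half an hour.
game_servers = pod_turnover_per_hour(concurrent_pods=3000, lifetime_minutes=30)

# A hundred web servers that are assumed to be replaced about once a week.
web_servers = pod_turnover_per_hour(concurrent_pods=100, lifetime_minutes=7 * 24 * 60)

print(f"game servers: {game_servers:.0f} pods created per hour")
print(f"web servers:  {web_servers:.2f} pods created per hour")
```

Every one of those creations, plus the matching deletion and the status updates in between, is a write the control plane has to record, which is where the pressure on etcd comes from.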

Rich: Oh, that's a really interesting point about etcd. I hadn't even thought of that, but I definitely play a lot of games where there are instances created that are very short-lived, a minute or two.

Mark: And so we have some patterns around some of that stuff. We have it so that you can take a game server, move it back to a ready state rather than shut down completely. We're exploring some other options. We have some interesting ideas in that space. We'll see how we go. Generally. It's generally it's okay.

It really depends on your game server workload. We talked a little bit about like, if you're running like big physics workloads, you're probably running like one of those big simulation servers is probably like one of those to every one or two, even two cores. So if you have like a 32 core machine, you're probably only running 16 of them.

So your churn rate's not hugely high for your individual cluster. But the whole other side of that is I know a bunch of people who run what we usually refer to as relay servers. They're just like super lightweight things where if you and I are playing, literally their job is to take my traffic and send it up to there. And it comes back down to you. And just make sure NAT traversal and other weird networking things that happen between like home routers is not a problem. But then you can run a thousand of those per CPU core without any problem. And so the churn rate on that is much, much higher. And so again, depending on what your workload is, it can, you can hit some of those things.
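The density figures above are easy to check with a few lines of Python. This is only back-of-the-envelope arithmetic using the numbers from the conversation (a 32 core machine, one simulation server per two cores, a thousand relay servers per core); the function name is hypothetical and it ignores memory, network, and any headroom a real node would need.

```python
def servers_per_node(node_cores: int, servers_per_core: float) -> int:
    """How many game server processes fit on one node, judged by CPU alone."""
    return int(node_cores * servers_per_core)


# Heavy physics simulation: roughly one server per two cores.
simulation = servers_per_node(node_cores=32, servers_per_core=0.5)

# Lightweight relay: roughly a thousand servers per core.
relay = servers_per_node(node_cores=32, servers_per_core=1000)

print(f"simulation servers per node: {simulation}")
print(f"relay servers per node:      {relay}")
print(f"relay density is {relay // simulation}x higher")
```

With the same session length, the number of start and stop events per node scales by the same factor, which is why the relay case is the one that stresses the cluster.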

There are some ways of breaking that glass but that's loads of fun. Or the other fun trick that comes up a lot is especially actually in those big physics heavy things, what a common pattern is is you'll spin up one of them and then fork your process. So you've got just one copy in memory of all your maps and all your stuff, but you might have three of them running or five or six of them.

And each one is doing its own thing or running its own game session concurrently. So we actually just put in some stuff into Agones to make that easier, because that does escape from the traditional Kubernetes style thing, right? Like you run one thing in a container and that's it. And so we have stuff now with allocation where you can be like, oh, if you already have a game server running, that's got room. Um, There's various variety of ways of doing that, but basically got room for another session. Give me one of those, if you can, if not keep me ready one. And so you got to do some self management there, but. You get the benefit of being able to reuse your memory resources. If you want to do that sort of multi, concurrent game session thing going on as well.
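The "give me one with room, otherwise give me a Ready one" behaviour can be sketched as an ordered preference list. The Python below is an illustration under stated assumptions, not the Agones API: the `GameServer` fields (`sessions`, `max_sessions`), the state strings, and `allocate_session` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GameServer:
    name: str
    state: str             # hypothetical states: "Ready" or "Allocated"
    sessions: int = 0      # game sessions currently running in this process
    max_sessions: int = 1  # how many concurrent sessions the process can host


def allocate_session(fleet: List[GameServer]) -> Optional[GameServer]:
    """Pick a server for a new game session, preferring servers already in use."""
    # First preference: an already-allocated server with room for another session.
    for gs in fleet:
        if gs.state == "Allocated" and gs.sessions < gs.max_sessions:
            gs.sessions += 1
            return gs
    # Fallback: claim a fresh Ready server and start its first session.
    for gs in fleet:
        if gs.state == "Ready":
            gs.state = "Allocated"
            gs.sessions = 1
            return gs
    return None  # nothing available


fleet = [
    GameServer("gs-1", "Allocated", sessions=3, max_sessions=3),  # full
    GameServer("gs-2", "Allocated", sessions=1, max_sessions=3),  # has room
    GameServer("gs-3", "Ready", max_sessions=3),                  # untouched
]
picked = [allocate_session(fleet).name for _ in range(4)]
print(picked)  # ['gs-2', 'gs-2', 'gs-3', 'gs-3']
```

Filling partly used servers first is what lets the forked sessions share one in-memory copy of the maps, at the cost of the game server having to track its own session count.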

Rich: That totally make sense. And I can I think of games I've played that sort of have both use cases, where there's like a defined instance with X amount of people in it competing against each other versus like more of an open world thing where people are coming and going and...

Mark: Yeah,

actually, that's also a fun problem too. Yeah. In terms of a game session where like you play a game and then you finish, you're done, you're gone versus a big persistent world, or even just a lobby server, which is like, where, like you might just go and hang out while you're waiting for other players to show up before joining a game, where they sorta just, they live for like really long times. And then it becomes different. Yeah. We actually just added some stuff too where you can do things like, hey, if there's a game server that's got room for five people, give me that one. Otherwise maybe then go get me another one. And being able to manage that sort of player capacity type stuff as well is always fun.

So yeah, there's some interesting patterns in there.

Rich: So in this role, you know, working with these folks in the gaming community, what's it been like overall? What can you say about the folks who are like running these game servers and working on the other side of it?

Mark: Yeah, I'd just say in general, like I've met some just genuinely wonderful people across the games industry. One thing that's been wonderful about that is that the term game developer could be anyone from a programmer to a UX engineer, to a narrative writer, to a level designer, to an artist, to like concept design, to a technical, like technical environment.

The amount of cool stuff that I've been exposed to, and some wonderful people who I've been able to meet through that industry has been wonderful. So that is great. Gaming itself, especially in the last few months, has definitely had its highs and lows in terms of culture. Possibly some more lows than highs.

But at the same time, there are some just delightfully wonderful people that I've definitely met through the industry. So that's been really cool. It's also been really neat in that like, I started doing stuff in the games industry when I moved to Google. I sorta had the, I had an idea that I wanted to do it when I was going through university, and I was living in Melbourne, Australia and there were two AAA game studios and then there was one then and my friends there were like, it's great. You work 80 hour weeks and you get paid like 30% less than everyone else. And I was like, so web development seems good. Um, and. Anyway, so yeah, joining Google, so there was a gap that I wanted to fill in terms of gaming and cloud, and so sort of chased that.

But one thing I've always really enjoyed is taking the things that I really am good at and enjoy from a community I'm already established in or know a lot about, and then bringing that into the area in which I'm new. And that sort of, it's really worked very well for me throughout my career.

And yeah, coming into gaming and meeting people who understand game infrastructure, and we've talked about different like load patterns and just different tech and like that kind of stuff. I was able to bring the stuff I knew about containerization and open source and distributed systems and some of the stuff that I'd done in wider tech and then meet those people and we could meet wonderfully in the middle. And build stuff. Like we worked on like Agones and things like that. And things like Quilkin. And yeah, it's been a wonderful ride. I really can't complain too much.

Rich: That's fantastic. It really sounds like a fun gig. I'm very happy for you.

Mark: Yeah, it's been great. I really enjoy it. Google seems to like me doing it. I've been promoted a couple of times, so, yeah. Thanks.

Rich: That's awesome.

Mark: It's been good.

Rich: Cool. Well, I really appreciate you coming to chat today, Mark. It's been so nice to see you again, at least over a video camera, and to hear about what you've been working on. I'll make sure to put a lot of links in the show notes, um, to Agones and Quilkin and these other things that we've talked about. Do you have anything you want to plug? So you're @neurotic on Twitter.

Mark: I am @neurotic on Twitter. Absolutely. That's great. So please come see me there. If you have any follow up questions, my DMs are open. I'm always open to chat. Come join the Agones project. We're always looking for contributors. There's a Slack, you can go do that. Uh, there's other projects as well that we work on, open source for gaming.

So there's Quilkin that I work on as well. We have a Discord for that, which is an open source UDP proxy for game servers. Other teammates of mine work on some other projects, Open Match is another one. So it's a framework for distributed matchmaking. So if you want to have a matchmaking pool at global scale.

And then there's another project called Open Saves that other teammates of mine work on, which is like an opinionated framework for data storage across multiple providers. So if you're, yeah, so if your team is like, I need to store data, but I don't want to care about where it goes.

I just want you to make the right decision. That's a nice little project there for you as well. So we do a bunch of stuff. what is it is the organization that everything sits under.

Rich: Okay, cool. I will make sure to link to that. And for folks that are listening, I'm hoping to get this episode out before KubeCon. Mark is not going to be able to be there this year, unfortunately. I will be there though. If you're a listener and you want to say hi, probably the best place to keep track of me is on Twitter.

I'm @richburroughs there. I'm also going to be in the booth for my employer, Loft Labs, some. So I'll make sure to tweet out when that is happening. Again, thanks so much for coming on, Mark. It was a pleasure to chat.

Mark: My pleasure. Thank you so much Rich for having me and hopefully, yeah, we'll get to have a beverage of choice in physical reality at some point soon.


Rich: Kube Cuddle is created and hosted by me, Rich Burroughs. If you enjoyed the podcast, please consider telling a friend. It helps a lot. Big thanks to Emily Griffin who designed the logo. You can find her at And thanks to Monplaisir for our music. You can find more of his work at Thanks a lot for listening.
