Justin Garrison

In this episode Rich speaks with Justin Garrison from AWS. Topics include: Why Justin left his college C.S. program, writing a scheduler in Bash, running animation jobs with k8s at Disney, monitoring cloud native services, making Kubernetes TikToks and streaming Containers from the Couch, EKS, Karpenter, and ECS.

Rich: Welcome to Kube Cuddle, a podcast about Kubernetes and the people who build and use it. I'm your host, Rich Burroughs. Today I'm speaking with Justin Garrison. Justin is a senior developer advocate at AWS. Welcome.

Justin: Hey, Rich. Thanks for having me.

Rich: Oh yeah, you bet. Uh, I've actually really been looking forward to this. Uh, you and I have met several times in real life and had some good conversations. We had a nice chat in Valencia. So, uh, I'm looking forward to getting to, uh, have a conversation with you to share with the listeners.

Justin: Yeah, thanks. I've been a fan of the podcast since you started it. I've listened to every episode and love the people you have on, so I'm, I'm thrilled to be a part of such a great group.

Rich: All right. I'm, I'm definitely quoting you on that. I'm definitely like putting that all over the Kube Cuddle website. Um, so I start off asking folks, um, how they got involved with these crazy computers in the first place.

Justin: Uh, I needed money. Um, I, I mean, like, I did not have a computer growing up. I did not grow up with technology. Um, I, I managed to go to college and I had two jobs throughout college, and I needed a job that I could work on the weekends. And there was a computer lab in my dorm, um, that had people that would come and help people print and, and fix the computers whenever needed, but really just babysit the computers.

And so I signed up for that on the weekends and that was my weekend job was Saturdays and Sundays. I would babysit the lab and I would do my homework cuz I would be in there for you know, eight hours a day or so. Um, so it was a great time for me just to dedicate doing homework, and I got paid for it. And eventually I started to learn a little bit about computers.

I didn't have a computer of my own, so I had to use the lab's anyway. Um, and it wasn't until the next year that I actually got my first computer and then I started doing more things with the computer department and, and ended up working at the university for nine years, um, doing, you know, help desk and desktop support and sysadmin roles and then just kind of kept growing from that.

So, yeah, it was, I, I needed money as a college student and I was like, Oh, this computer thing looks really easy. I'll do that.

Rich: So what, what was your major when you were just working in the lab?

Justin: Uh, I didn't have a major for the first two and a half years while I was in school. Uh, I narrowed it down to the things I didn't like to do and ended up on math and physics. Uh, I just, I, I liked those things. I dropped out of a computer science degree cuz it did not make sense to me. And I was just like, you know what?

The computer thing's kind of hard, but I really like this physics thing. And so I did that.

Rich: When you say it didn't make sense, how's that?

Justin: Uh, I remember my first computer science class, and it was a C, C development class. And I, I got through it, uh, with a lot of help, but I was like, I do not understand what we're doing and why we're doing it. Um, I ended up taking, I think it was four C.S. classes, um, a couple C++ classes and an independent study, which was the most, the most fun I had, where I was designing a UI in C++, and I really liked the visual aspect of it. And the, the professor had this application that they wrote, and they're like, I need a UI for this. And I was like, cool. But it still didn't make sense to me. I couldn't visualize how, like, the program would work. And it was, you know, a decent amount of code, it wasn't a huge program, but I remember, to get an understanding of it, I'm such a visual person, that I actually went to the printer in the lab that I was working in and printed out the entire program. I printed out the program and, and laid it out in the lab, on the lab floor. And then I started taking a marker and marking it up: this calls that, and that calls this, and this is where I need something. And that's the only way that I could understand what the software was doing.

And then I was able to say, Oh, so I need to add something here, I need something like this. And I, I kind of went from there, um, because I really, I couldn't just look at a text file and say, like, Oh, I get it. Um, and so I, I dropped out of a C.S., uh, degree as, like, what I was thinking of doing. It just didn't make sense to me.

And it wasn't until my first full-time job that programming actually started to make sense, cuz I was solving problems, not, uh, fulfilling educational goals.

Rich: Yeah, absolutely. No, I think that's a really good way to put it too. And you know, I've, I've, uh, talked to a lot of people who come from lots of different backgrounds, you know, and like I was, I was a Theatre major in college and I dropped out, right. And, and I went back and started studying, um, well, taking some core classes so I could study Computer Science and got a job and was like, All right, I'll just do this for a living instead of like, go to college.

Um, but, but it's interesting because it definitely seems like there are a lot of folks in the Kubernetes community who don't come from that kind of traditional computer science background.

Justin: Yeah, and I, I didn't do a lot of development for a while. I worked help desk for seven years or so, and so my, my background comes a lot from, like, documentation's important, and making things understandable for users is super important. And that's really where I kind of, like, made a lot of career growth. Uh, the more senior I got, the less code I wrote anyway.

And the more documentation I ended up writing. And making that documentation understandable and navigable, and, and able to let people solve their own problems with something I wrote a year ago, um, was way better than, you know, spending some time on code or adding some tests to things. Because I was like, actually no, helping the user do the thing was the more important part.

Rich: Well, this is all super interesting, and it fills in some, uh, some things about, about your particular skill set that I think you have, that I actually wanna get to in a bit. Um, but, um, but what specifically about Kubernetes? Like, what got you into, like, the cloud native stuff?

Justin: I got into containers pretty early, uh, when I was solving problems at work. Um, just trying to figure out, like, how to make things more packageable. Um, the place I was working at, uh, we were a big Red Hat shop, and so we had a lot of, you know, RHEL packaging. We were writing spec files and RPMs, and I was like, you know, this isn't doing enough for what we needed.

Uh, and, and we were doing a lot of, you know, more forward thinking things like software collections. No one even talks about that anymore. But it was like this like loose way to, you know, contain your, um, your path and your, you know, dependencies. But containers really was just like, Oh, actually this is a holistic way of doing it.

This is a more holistic approach of, I need users, I need network, I need all these things that could be isolated. And so I started building that at work and just figuring out, hey, where would this fit and how does this work? And um, I remember seeing a talk about Kubernetes at, uh, Southern California Linux Expo, and this was a while ago, and they were talking about pods and containers.

I was like, I don't understand this. I don't understand why I need this. And I was going down the route of looking at Mesos and Mesosphere, and even Nomad to some degree because those were all kind of already in the area of like, I was creating VMs and running containers on them. Uh, and I was the scheduler.

I had to, had to, I literally had a spreadsheet that was like, this server has these apps on it, these are the ports they're on, when I need to update it, talk to this team. And, like, that was, that was how we did it. We were on-prem and this is what we were doing.

Rich: It's, it's so funny that you mention that, because I think Kelsey brought that up in the interview that we did, um, as well. If not, it was another discussion that I've had with him in the past. But, um, it's absolutely true, and people, you know, people I think don't necessarily get the way we were doing things, like, ten years ago.

And, and like I say, I was the scheduler too. I was, I was that, that person who knew which service ran on which hosts and had it all memorized.

Justin: Yep. Yeah. The meat scheduler. It was, uh, you were, you did the thing. And, um, I met Kelsey at, uh, HashiConf when they announced Nomad, and he was giving a demo about Nomad. And so it was great meeting him, you know, in person and talking to him. And I ended up inviting him. Um, it was right when he was switching from CoreOS to Google.

I said, Hey, can you come do a similar training day about Kubernetes at, at my work? And he, yeah, he agreed to do it. This was, this was like a prototype for, like, Kubernetes the Hard Way. Um, he had another repo that he was, like, playing with, and we had a two day training, which was awesome. I don't remember what year this was now. 2016? 2016, yeah.

Um, and, uh, and we had a great turnout for it and a lot of people started learning it and, and that really, those conversations kind of changed my mind about what Kubernetes was because I thought it was more monolithic, like Nomad was, where Nomad didn't have a flexible scheduler. It didn't have some of those like hooks to things.

It added some huge benefits in speed and, and native execution, you didn't need containers, that sort of stuff. But it wasn't really what I was looking for. And then the more I was learning about it, the more I was realizing, oh, actually the flexibility of Kubernetes does solve these problems. And so I wrote a scheduler in Bash, as a prototype of just, like, how does this work.

Rich: You, you what? Can you repeat that?

Justin: It's on GitHub. It was, it was my sort of, like, learning experience of learning the API. So if you look for bashScheduler, um, it was my first, like... I started learning Kubernetes by poking at the API. And every lunch break I basically was, like, spinning up a cluster, writing some Bash. I'm like, how does this work?

Where does this node come from? How does this piece work? And I started just, it was just, you know, randomized containers. It didn't place them really, but I wanted to learn how to extend it, what the API was like. And the language I was most familiar with was Bash. And so I was like, I'm just gonna do curl and sed. And that's all I needed. I was just like, I didn't even use jq. It was really ugly, um, at the time.
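The approach he describes can be sketched in a few lines. This is not the actual bashScheduler repo, just a hedged reconstruction of the idea: watch the API (through `kubectl proxy`) for pods that have no node assigned, pick a random node, and POST a Binding object. The grep/sed name extraction stands in for the "no jq" parsing he mentions and is deliberately naive.

```shell
#!/usr/bin/env bash
# Toy random scheduler, in the spirit of bashScheduler (illustrative, not the real repo).
# Assumes `kubectl proxy` is serving the Kubernetes API on localhost:8001.
API="${API:-http://localhost:8001}"

# Pull object names out of an API list response with grep/sed, no jq.
# Naive on purpose: relies on "name" being the first field of metadata.
extract_names() {
  grep -o '"metadata":{"name":"[^"]*"' | sed 's/.*"name":"\(.*\)"/\1/'
}

schedule_loop() {
  while true; do
    # Pods this scheduler should place: no node assigned yet.
    pending=$(curl -s "$API/api/v1/namespaces/default/pods?fieldSelector=spec.nodeName=" \
      | extract_names)
    nodes=($(curl -s "$API/api/v1/nodes" | extract_names))
    if [ ${#nodes[@]} -eq 0 ]; then sleep 5; continue; fi

    for pod in $pending; do
      # "Scheduling" here is just: pick a random node.
      node=${nodes[RANDOM % ${#nodes[@]}]}
      # Binding a pod to a node is a POST of a Binding object to the
      # pod's binding subresource.
      curl -s -X POST -H 'Content-Type: application/json' \
        -d "{\"apiVersion\":\"v1\",\"kind\":\"Binding\",\"metadata\":{\"name\":\"$pod\"},\"target\":{\"apiVersion\":\"v1\",\"kind\":\"Node\",\"name\":\"$node\"}}" \
        "$API/api/v1/namespaces/default/pods/$pod/binding" > /dev/null
      echo "bound $pod -> $node"
    done
    sleep 5
  done
}

# Run only when explicitly asked, e.g.: ./bash-scheduler.sh run
if [ "${1:-}" = "run" ]; then
  schedule_loop
fi
```

The real repo handles more edge cases; the point, as he says, is that curl, grep, and sed against the API are enough to act as a (terrible but working) scheduler.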

Rich: I, I have to say that, that "I wrote a scheduler in Bash" is a sentence I didn't ever expect to hear someone utter.

Justin: Yeah, but it was my, it was my chance to learn it and then my chance to show it to people. Cuz it ended up being, you know, it was less than a hundred lines when I first wrote it. And I was just like, Hey, this is the API, this is all you have to do. And I was working at Disney Animation at the time, and so we were scheduling render jobs.

I was like, how do we... We had a custom scheduler for those jobs, and I was like, this actually could be a better way for us to manage this stuff. And so I was showing that team, like, Hey, this is kind of how this works, this is where these things fit in. And so I just started getting more and more into Kubernetes, learning how the structure of the API was, what the benefits were, right?

The container orchestration was great, but all the hooks and the flexibility and, and CRDs (you know, at that time it was third party resources, whatever), the way to extend it, was why it was so powerful.

Rich: It's interesting that that's your use case, and that you mentioned Nomad, because I think that, like, batch jobs is probably the most popular Nomad use case, from what I've heard.

Justin: Yeah. And I was, I was heavily looking at it, uh, because that was... but again, the way that we were doing our jobs needed very specific scheduling placements. And, and Nomad didn't have that flexibility at the time. And so I was like, that's why I started with Mesos. Cause Mesos had all the flexibility, where you had their frameworks and, you know, two stage scheduling and stuff.

And I was deep into that, and I'm like, but I don't know Java. I was like, I can learn Java for this thing, but let's, let's figure it out. And then, you know, Nomad was so much simpler and so much easier to get going. And it was, it was so much faster, and I didn't need to containerize everything. A lot of benefits there.

And then once I went to Kubernetes, I was like, Oh, actually I get the other parts of it. And I still wanted native execution. I still wanted some things that, that Kubernetes couldn't do. Um, but the other areas of extension made a lot more sense to me.

Rich: That's all super interesting, and I think that, like, um, I dunno, I was a HashiCorp ambassador for a couple years. I'm a big fan of them and their tools. Um, I would say that I am probably as big of a fan of Nomad as you can be without ever actually having used Nomad. Like, I don't know that I've ever run a single workload, you know, with Nomad, but I've, I've always thought it was interesting. And, and my take on it, at least recently, you know, within the last few years, has always been that, like, I think that if you have maybe more limited use cases, and especially if you're in an environment where you're using all those other HashiCorp tools and you wanna, like, bank on those integrations that they have built into things, that, that, um, I think it definitely could make sense, you know, more sense than Kubernetes, for some use cases.

And, and, um, it seems like there are people out there who use both too. You know,

Justin: Yeah.

Rich: Nomad for some things and, and Kubernetes for some things.

Justin: And Nomad still has some super amazing features like multi-data center awareness, um, via Consul and all these things that I, I think are fascinating use cases for people that really, they want to do, that they have large batch jobs and services and native jobs and all that stuff makes perfect sense in Nomad.

And people have been bolting that kind of thing onto Kubernetes, when in a lot of cases, actually, the thing you might just want is Nomad. Um, because you get, you know, Vault, you can have Vault, you can have Consul, you can have, uh, Nomad as a scheduler, and it brings a lot of that in there. Um, I think that Kubernetes, on the other hand, because of that flexibility, because of all those hooks, because of the other areas... it's just, like, actually, what a lot of people wanted was web services.

And what people just wanted was horizontal, scalable web services. And, and they didn't really care about fast placement of jobs, because they could scale up a little sooner, right? Like, a lot of things were just, like, actually, some of those benefits, for a, a broad swath of people, it was just, like, actually, I just run a Python service and it's okay if it takes a second, two seconds to start up.

I don't care if it scheduled that fast. Just, just get it out there. And all of the ecosystem of Kubernetes really came around that. It's just like, Hey, we can make this thing really easy. And, and, it resonated with a lot of people.

Rich: Yeah. And I think that, again, you know, if you, if you sort of travel back in time a little bit to those days when you and I were like the schedulers, you know, it was like I, I was working at a Java shop and we had a bunch of Java services and I would go into this custom UI that we had and I would press a button to spin up a new copy of the service, right.

Or to restart it or something like that. So, like, Kubernetes scheduling something within a minute of it dying is, like, way faster than I could ever have done it, right?

Justin: And I, I don't know about your first time like running Docker or a Kubernetes deployment. Like my first time running Docker, I thought I broke something because I ran this, you know, "docker run." And I was like in a new weird shell. I was like, it broke my shell and I was like, Oh, like how could this be? And I'm like, wait.

It couldn't have started that fast. There's no way that this worked like that. But now I'm root, like, is this, like, some weird, you know, escalation vulnerability? Yeah, it was in the container.

And I was just like, What is going on? And I was so confused. And same thing with Kubernetes. When I ran my first Kubernetes deployment, I'm like, okay, Kubernetes deployment, you know, give me three replicas of NGINX. I'm like, oh, well, it broke, it didn't do anything. It couldn't have, you know, it returned way too fast. It couldn't have done anything. And, and then I go back and look, I'm like, Wait a minute. It says they're running. What, what's going on here? And I had to really dig deep into it. I'm like, Wait a minute, this can actually be how fast things are.

Because I'm used to cloning VMs. I'm used to, you know, taking these, like, longer, bigger processes and making them into something that I could replicate. And, and it was always like, Oh yeah, I wait, you know, a minute, wait two, wait a half hour, whatever it was. Uh, and then containers came around, and it's just like, Wait, this is, like, less than seconds. Like, I don't, this...
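The "three replicas of NGINX" moment he describes is the classic first deployment. A minimal manifest for it might look like this (names and image tag are illustrative); the key to the "it returned way too fast" confusion is that applying it only records desired state, and the controllers converge on it asynchronously:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
```

`kubectl apply -f nginx.yaml` returns as soon as the API server accepts the object; `kubectl get pods` a moment later shows the replicas already Running, which is the part that feels impossible the first time.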

Rich: Well, and, and think about, think about the feedback loop too, where, you know, a service dies on a weekend and someone gets a page and maybe they get to their laptop and can respond, you know, half an hour later, and then they spend some time digging into it and, you know, start the new one. And it's like, yeah.

Any, any kind of automation, you know, that you can have that's gonna, that's gonna schedule that job again, um, is, is likely to be faster. Unless, unless it's like a cron job that runs once an hour or something. Um, so, so you're at Disney and you're using Kubernetes and it's very early, and you're in this situation where you're, uh, I take it trying to kind of evangelize this thing, trying to convince other people inside the company that, that you all should adopt this platform.

Justin: Yeah, I, I,

Rich: And how did that go,

Justin: It went terribly. Um,

Rich: I've, I've heard a bit of this story already, so I'm not, I'm not surprised by your answers, but, um, but I, I think that there are probably a lot of people in this situation with Kubernetes and with other tools too, you know, who've been down the same road where they're trying to get folks to adopt something that they know really would help everyone, you know?

Justin: Yeah, and I think as a, as an engineer, I often get, you know, narrowly focused on, like, the technology is gonna solve the problem. And, and the problem isn't a technology problem. The problem is, uh, people and training and inertia and other tools and integrations and all of the other things on top of the one technology. And, and I focused way too much on everyone should go to Kubernetes, 'cause I was sold on it, 'cause I experienced it.

And then it was, you know, a little while of building, you know, what, what was there, but then way, way longer of I need more docs, I need more training, I need more examples, I need more tools, I need more integrations. All these things that we had already for how people were doing things, uh, those were all the gaps and, and those were the things that uh, needed help.

But also, you can't do it alone. You can't be the only champion of a new technology and a new service, and you have to have at least some level of buy-in from people, from your customers. And, and that I, I did not have. I had a couple teams that were all excited and ready to use it, but then others were, were absolutely against it, because I very much underestimated how much that might encroach on things that they had built. And the actual, like, emotional response, and the, you know, professional response. Like, I understand it now, but at the time I was way too, too lost in the technology to understand, like, this actually makes them maybe feel like they're not good enough, or the thing they built is no longer good enough.

And that I actually can replace, you know, that. Like, Hey, the thing you have been doing for the last 10 years? Don't worry about it, this thing's better. It's not a great way to talk about something, and it's not a way to, like, get buy-in or other people excited about it. And so I was very ignorant to those sorts of things, of just the human factor of what a new technology like Kubernetes can be in, in an existing organization with established norms and services and technologies.

And, and so it was, it was definitely not a, a great fit, uh, at the time.

Rich: Yeah. I mean, I'm sure that it would be much easier now. Right? And, and you know, it's become such a standard thing that like, you know, if you went into a lot of shops and said, Hey, we're gonna adopt Kubernetes, it wouldn't be a shocker to anyone. But, um, but yeah, back then, you know, it was, it was a lot more bleeding edge and, you know, when you, you mentioned Mesos earlier and, you know, there really was a conversation to be had even, you know, back then as to whether you should be using Mesos or Kubernetes.

Um, but I think that, um, you know, I've been through that same kind of, down that same road. Um, it was a different tool, but I, I once was brought in to a company specifically to implement a certain tool. And was told that everybody was on board, and arrived to find out that wasn't the case, you know? And, and I think that, um, it was, it was kind of ironic, because, uh, I ended up moving on to another opportunity, and, and by the time that happened, that was just about the time where people started to come around that, that it was the right move.

Justin: It, it takes time. I mean, the technology, the thing that you can build, uh, you can always prototype something faster than you can change someone's mind about it. And understanding that where to get the buy-in and where to spend your time is actually more on the people side of it than the technology side, that's a hard thing to learn.

Especially as an early engineer, where I'm like, Look at all these problems I can solve. And it's like, actually, no, don't worry about those. Get buy-in from one or two people at first, and then figure out what the problem is, and understand the user base a lot better, rather than just going out and saying, like, I'm sold on this, so everyone else should be too.

Rich: Yeah, I feel like people really need to see things work too, right? Like, you need to have some sort of a POC or something that really clearly shows people, like, how this is gonna help solve their problems. That, that just hearing that, or, like, reading some blog posts or whatever, isn't enough to, like, uh, really make it for, for most engineers, I think.

Justin: Yeah. And focusing in on what the next wave of problems would be is always difficult too, because you can compare two things. Here's my POC, it does half of what your existing thing does, but it focuses on these three or four things better. And someone looks at it and it's like, well, it doesn't check all the boxes.

It's not a complete solution because the other thing we have now does, you know, X, Y, and Z? Or it does something better. And it's like, actually, yes, we can add some of those things, but what's the important thing that you need and how is what you have now gonna get you to the next scaling or, you know, speed or whatever it is you're trying to do, those are the areas that are really hard to focus on and hard to sell people on because again, it's just like a checklist.

The POC is gonna fail every time. Uh, but understanding where customers are going and where users are going and how technology is advancing, those are the areas that are really important to focus on.

Rich: So I know this was a while ago, but are there, are there any specific things that you can remember that people either didn't understand about Kubernetes or like, you know, kind of pushed back on?

Justin: One of my favorite examples is monitoring and understanding what containers are doing and servers are doing, and how you're actually monitoring them. And, uh, we had tooling internally to do monitoring and it was all, you know, VM based, so things didn't change very frequently and, and containers are coming and going, right?

And we're just like, Oh, you can monitor this, you monitor that. And it's just like, well, how do I, how do I monitor one container? I was like, Well, don't worry about one container, worry about the service. They're like, No, that doesn't make sense. I need to monitor the container. And I had multiple conversations and I, you know, I was talking to vendors and people that help us do some of that stuff.

And, and one person I remember very adamantly wanted to put IPMI in every one of the containers. If you're familiar with IPMI, it's for scraping the, the paths of, like... it's a very old school way of, like, monitoring devices, right? Like, I have a switch that gives me IPMI, I have some light bulbs or whatever.

And it's like, those are the most cryptic sort of paths and numbers and things that you, you can't understand. And they were trying to say, like, Well, every container needs IPMI, I need an IPMI endpoint that I can scrape. And I was like, Wow, like, this is, this is gonna be a different shift, because these aren't devices that we install once and they last for 10 years.

This is a container that lasts for a second. And, and being able to, you know, scrape those things... Like, we had to change the conversation to, like, okay, what is the goal that you're trying to accomplish? And this tool, I understand, was what they wanted and what they knew and how integrations worked for them in their world.

But it was not the conversation we needed for this type of monitoring going forward. And so at that point, it was really about, okay, let's take what you know, let's take the scraping side of it, let's see your paths, how do you do that. Now let's look at Prometheus. Hey, cool, it's a pull based model.

There's paths and there's these variables. How do we pull that in to, like, give you these metrics similarly to what you understood, and make sure that you came along to understand the new thing? Like, hey, look, there's a lot of similarities here. We're not throwing away all of your knowledge of how this stuff works, but we're gonna change a few things to help you get to the next wave of monitoring, at different scales, or focusing on different things. I don't care if a disk is full, I care if a service is down. You know, like, those sorts of things are, were way more important. But it was hard to kind of shift some of those things forward, because they were so used to, what's the temperature of the CPU?

It's like, don't worry about it. Like, it's a cluster of machines, like, I don't need that one CPU core temperature anymore. I need to focus on the, the service level. And, and so bringing people up to those sorts of speeds, and, and different ways of thinking about it, without throwing all that stuff away.

These people are super smart. They've had years and years of experience. They've done amazing things. And we don't wanna say like, all that stuff is crap now. Like, you just throw it away, right? Like the people are important and bringing them into how do you do something in a different way with newer tooling.

Those were really hard.
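The shift he's describing, from scraping device endpoints to a pull model aimed at service health, maps onto a couple of small Prometheus fragments. These are illustrative values, not the actual config from that migration: a scrape job pulls a metrics path on an interval, and the alert fires on service availability (`up == 0`) rather than on one machine's disk or CPU temperature.

```yaml
# prometheus.yml (fragment) -- pull-based scraping: a path, a target, an interval
scrape_configs:
  - job_name: my-service        # illustrative name
    metrics_path: /metrics      # the "path" that gets scraped
    scrape_interval: 15s
    static_configs:
      - targets: ['my-service:8080']
```

```yaml
# rules.yml (separate file, wired in via rule_files) -- alert on the service
# being down, not on one container or one CPU core
groups:
  - name: service-health
    rules:
      - alert: ServiceDown
        expr: up{job="my-service"} == 0
        for: 2m
        labels:
          severity: page
```

The `up` metric is generated by Prometheus itself for every scrape target, which is what makes "is the service answering" the default question instead of "what is this one box doing."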

Rich: I've absolutely seen what you're talking about. Like I've been on teams before where I was the person who was trying to make that argument that like, we shouldn't care about you know, monitoring every instance of an app really closely and like looking at things that way. What we really care about is service health, and like for people who are used to doing it that old way, I think that is really hard to get their heads around sometimes.

Justin: And it's not to say a hard drive being full isn't a problem, right? But, like, the service being available is the thing we should focus on. In the same cluster, I was troubleshooting, or, or setting up, centralized log collection. And so I ran a container with the "yes" command. And if you're familiar with the "yes" command, it just spits out Ys, right?

It just does it as fast as possible. And there are actually, like, studies on the "yes" command being faster than a screen can print. Like, it'll print yeses faster than anything else. And yeah, it's amazing, like, how fast it can do "print a y." And so I ran this container, collecting some, you know, logs centrally. And I come back the next day and all the services are down.

I broke the entire cluster, because every one of the hard drives filled up with yeses. They were all Ys, the entire hard drive. These are bare metal servers, these are, like, 500 terabyte drives, and they just had yeses throughout the entire thing, because it printed so fast and Kubernetes failed over to the next one.

It's like, oh, that drive's dead. Hey, go to this one now. And it just stuck my pod on the next machine, filled up the hard drive. So that's important too, like, you need to be aware of those things. But the way it surfaced, when I came in, it was like, kubectl... like, where'd my API go? Like, what's going on? And, and then I had to dig into it, and it was like, Oh wow, the hard drive's full. What happened?

I'm like, Oh wow, there's, like, a terabyte file with all the Ys. This is not good. So it's important, but, like, knowing what level or what layer of the stack you kind of need to focus on is, is even more important.
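The failure mode is easy to reproduce in miniature: `yes` writes lines as fast as the consumer can read them, and a container's stdout is captured by the logging driver and written to the node's disk. A small sketch; the `docker run` lines are commented out and use the json-file driver's options, which is one way the blast radius could have been bounded:

```shell
# `yes` prints "y" lines as fast as the consumer can read them.
yes | head -n 3
# prints:
# y
# y
# y

# In a container, every stdout line lands on the node's disk via the
# logging driver -- which is how the cluster's drives filled with Ys:
#   docker run -d --name flood busybox yes

# Capping the log file bounds the damage (json-file driver options):
#   docker run -d --log-driver json-file \
#     --log-opt max-size=10m --log-opt max-file=3 busybox yes
```

In Kubernetes proper, the kubelet's container log rotation settings (`containerLogMaxSize`, `containerLogMaxFiles`) play the analogous role for container stdout.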

Rich: That's, that's super interesting.

Justin: The "yes" command...

Rich: Let's shift a little...

Justin: ...has caused more outages than benefits in my life, I'll tell you that. I have other stories, but it's always fun.

Rich: Maybe, maybe, uh, if I start a podcast about the "yes" command, um, you'll be my first guest. Um, I, I wanna shift gears a little bit and get into, um, your, uh, career as what I'll call a Kubernetes influencer. So, so you've become known for making these amazing TikToks where you explain concepts like what a container is, or, um, the ones about the Kubernetes auto scalers, you know, how they work.

Um, were, were very popular. And I think that you've done, um, at least one conference talk that was a TikTok video. And I wanted to, to get into this a little bit and, and ask you about how you got into making TikToks. So, like, what about that, uh, platform specifically appeals to you?

Justin: I, I, I always try to teach the way I like to learn, and so until something makes sense to me, I just don't have an idea of how I would, you know, you can't teach something you don't know. And

Rich: Yeah.

Justin: I learned a lot as a kid from people, like, shows like Bill Nye the Science Guy. That was, like, my jam. I love that show.

All of his, like, crazy, like, ways to represent things and show things off. And, uh, as an adult, like, I love Alton Brown, right? Like, like, his cooking style, where he shows what's going on inside of it. And even, like, Adam Savage, like, MythBusters, like, they do these very, like, extra, like, visually representative things of just, like, this is what's happening, I'm gonna show you this thing. And in my time at Disney Animation, I learned so much more about how important the visual side of things is. Like, I can show you a story in literally 10 seconds of animation. And, and I saw those over and over again. I was just like, this is amazing.

Like you have no words, no music, but I understood the entire life cycle of that story wrapped up in just that little ball bouncing or whatever it was. And so those sorts of things just amazed me at how fast people could connect to the stories, connect to the things, and learn something from it. And I started TikTok like everyone else during Covid and just like watching it.

And the more and more I started watching it, I was like, I'm learning a lot from this platform, from all the people that were like, I wish I knew this before I was 30. I wish I, you know, like all of those sorts of things. It was like, I had no idea you could do that. And a lot of them were practical, sort of what we used to call life hacks in an older generation.

I was like, Oh, these were life hacks? Like, actually it's just something that everyone should really know. And it was just common sense for a certain age group or, or certain, you know, group of society. Uh, but a lot of people still didn't know it. And that information isn't evenly distributed, and, and knowing how things work isn't something that is just universal.

And, and so I, I've been trying to do this way before TikTok. Uh, I have a video literally still on a hard drive from 2010, I believe, when I was writing for a website called How-To Geek. And I was trying to get them into doing video production. And I have a very terrible video of me explaining how hard drives work.

And it was around when SSDs were coming out and I was explaining the difference between what is a, a spindle drive and an SSD. And I had, uh, literally like a corkboard for an SSD, of like, they do wear out, like you can only pin it so many times. And I had filing cabinets and I was explaining RAID. And I literally had videos where I was taking pieces of paper and putting part of it in this drawer and part of it in the next filing cabinet in the other drawer to explain how you replicate data.

And then I can put these things back together if I lost a filing cabinet. And so that was just like 12 years ago, I was filming these videos because I was like, this is how I like to learn it. This is how I like to explain it. And, and I was doing those things in sort of medium, long-form formats, and they were kind of hard.

They were, they were hard to kind of keep the sort of, uh, inertia or at least just the engagement of just like, Hey, we're making this exciting and fun cuz you do have to go like deep into like, this is this thing that I want you to understand better.

Rich: Yeah.

Justin: On the TikTok side for, if it's a minute, I'm not gonna go deep at all.

I'm just gonna tell you something exists and I'm gonna let you go as deep as you want. You can just know that something vaguely works this way, or, I understand that, oh, this is a thing that's here. Now, what else do I wanna learn about it? And I, I've gone down plenty of rabbit holes myself, of just like, Oh wow, what's that thing? Let me open a browser. I gotta look this up. Oh, I found this. It's in Pocket now. I'm gonna read it this weekend. I found a book. And like, I can go deep on some of that stuff real quick, but just like, this is interesting. And on the TikTok side, I'm really just trying to help people be more aware of some of these things and give a general idea of, Hey, what is this thing? How does it work? Vaguely, why does it exist? Cause you have the kind of, what is that? I can tell you, Hey, here's a website, go here. I don't really care about that. You have the really deep stuff that's like, this is the code you need to write to do that thing. That's okay, but it's really hard in a minute format.

And there's that middle ground of like, Why does something work this way? Why is this important? Why do I care? And getting people to that, why do I care? That's kind of the bridge a lot of times for people to know, like, what exists and how do I do it? And I try to focus a lot of my stuff on that sort of, why does it work this way?

Or, or why is it here? Um, what problems does it solve? That's sort of like the area that I try to do. And again, because I like props, because it, I like visualizations because that's how I think about it. Uh, that's how I represent it. One of my, my first KubeCon talk was I was up on stage and we built "Kubernetes" with a spreadsheet.

I had a Google Sheet, did what I did with containers, and I had four people on stage that were my, actually I had five people: my scheduler, my controller manager, and three nodes. My nodes had balloons to run, and people told me which node to schedule things on. I was the API server, and we recreated the workflow that Kubernetes goes through when I say, Hey, give me a job. Okay, it's a Ruby job. Here you go. I'm gonna put a one here. That's it. I just put a one here. Now the controller manager needs to expand that. Now the scheduler needs to schedule it. Now the kubelet needs to pull that job. And, and just going through that workflow a couple times, all of the complexity of Kubernetes kind of makes a little more sense and you're just like, Oh, this is what it's doing. And once you understand where those pieces fit, it was a lot easier to troubleshoot things too, because it's like, Oh, this pod isn't getting assigned.

My scheduler's probably down. Right? Like I, I know immediately, like, Oh yeah, I see the pod there, it exists. My controller manager made it. Kubelet's not running it. Oh, look at the scheduler. Helping you figure out why some of those things exist and what problems they solve, um, was really where I tried to focus a lot on, like, the TikTok side of things.
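The flow Justin acts out on stage can be sketched as a plain Deployment manifest; the Ruby job, image, and names here are invented for illustration:

```yaml
# A minimal sketch of the "put a one here" step: the user only declares
# desired state (replicas: 1). The controller manager expands this into
# a Pod, the scheduler picks a node, and that node's kubelet runs it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ruby-job          # hypothetical name
spec:
  replicas: 1             # the "one" in the spreadsheet cell
  selector:
    matchLabels:
      app: ruby-job
  template:
    metadata:
      labels:
        app: ruby-job
    spec:
      containers:
      - name: ruby
        image: ruby:3.1   # illustrative image
        command: ["ruby", "-e", "puts 'hello'"]
```

If the scheduler were down, the resulting Pod would sit in Pending with no node assigned, which is exactly the troubleshooting signal he describes.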

So I just have a lot of fun with, how do I explain something in an entertaining way. It's really hard to get it to be a minute. I don't try to do the three minute ones. I try to do one minute because that's kind of this attention span area, but also, like, YouTube still limits Shorts to a minute. Um, it's a little easier.

Like I will give, I will watch almost anything for a minute. I will waste a minute anytime. Like, it doesn't matter, like I'm just like, Yeah, I'll, a minute? Sure, no problem. If it's three minutes, I'm like, I don't...

Rich: I'll take that challenge. I feel like I could make a video you wouldn't watch, but

Justin: But once I get to like three minutes or five minutes, I'm kinda like, ah, I'm busy, you know? Like I don't, I don't know if I have the time for that. Uh, it's only five minutes, but I'll watch ten one minute videos to look for my answer before I'll watch one five minute video that I know answers my question.

Right? Like, I don't know why that is, but just the psychology behind it, I'm like, Yeah, no, I'll spend more time on one minute videos. And, and so it's been challenging for me, but it's been really fun. I'm having a blast with it, just because I get to be a little more on the creative side. Like, I write code, I do a lot of development stuff.

I do a lot of technology and then I try to just explain it to everyone else. And I try to make it consumable for a lot more people that maybe don't come from the same background, that learn things the way I do. And just helping them in those ways is like, Hey, it's also okay to be weird. It's also okay to do things and teach things in an unconventional way, and that is totally fine.

And, and giving people some of that freedom. I love seeing other people do similar formats of like, Hey, I don't have your background, I don't have all of the wealth of knowledge that you have from, from your experience. So you're gonna do a different demo that might be similar. You might do a drawing, you might do a different prop, whatever it might be.

Rich: Yeah.

Justin: But it's great just seeing people take those things and run it for themselves.

Rich: Do you like have a topic first and then think of a way to explain it? Or do you just, do these ideas for the visualizations pop in your head first, or how does it work?

Justin: For most of them I have a list, I have a note, and I just, anytime someone asks me, Hey, how does this work? I write it down. And I don't, I don't care, like I don't, I don't have any clue. I may not even know what it is, right? It's like I, I don't know what that is, but I'm gonna research it. And at some point it might come up and it might be something that's like, Oh, I need to learn this.

Uh, and then I just keep that note. And then at the weirdest times, an idea for how to represent something comes into my head. Uh, I might be out at the store, like, Oh, this is it. I don't know why this is it. Um, I don't know why it works that way, but having that note and having it written down where I don't have to think about it all the time, occasionally I'll remember one.

I'm like, Oh, actually, you know what? This is kind of a perfect representation. And they don't, they're not all amazing. They're not all like, Oh, that's the best representation ever. But how it clicked in my head of like, you know what? I just did one on, um, like, uh, probes, application probes, like startup probes and liveness probes, and, and I had like squeaky ducks and like a, a rubber chicken.

And I'm like, it literally gives a sound, to me, like, whenever that health check's happening, it's like squeezing one of those rubber chickens. And I was like, I can almost hear the honk. And, and like that's the thing, it's just like, the application is like, yeah, I'm still here. Like, as long as you hear the sound, the kubelet just keeps running it, it's fine.
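The "rubber chicken" checks map onto a container spec like this rough sketch; the image, path, port, and timings are made-up placeholders:

```yaml
# startupProbe gives a slow-starting app time before liveness kicks in;
# livenessProbe is the periodic "squeeze": no honk, the kubelet restarts it.
containers:
- name: app
  image: example/app:latest    # hypothetical image
  startupProbe:
    httpGet:
      path: /healthz
      port: 8080
    failureThreshold: 30       # up to 30 * 10s to finish starting
    periodSeconds: 10
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10          # squeeze the chicken every 10 seconds
```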

And, and those things are important, but just tying those things together, just like, Oh, when I think of it, I'll make a list of three or four things, and if I get some of them, I'll get a theme. I, I did one on persistent volumes, and I just had buckets. I had buckets in my backyard. I'm like, Oh, it's a bucket.

It's just a store. Something to store anything. And I don't, it's not, it's NFS, it's S3, I don't care what it is, but it's a bucket of storage. And now I had a theme, and so I went out and I recorded I think four of 'em that were all storage based, based on these buckets. And I'm just like, Okay, cool. And I'm just gonna do it.

I'm just gonna record four of 'em. They're all part of the theme. And now that prop I can reuse and I can always rely back on the like, Hey, this is the theme I have for storage: different size buckets. And the auto scaling one, I have little plastic containers with water and a big four by four. That's my memory, my memory killer. And like those themes are just something I have now, and I just keep reusing them. Anytime I'm representing something in that space, I just fall back to like, Hey, how would this fit that analogy? Does it fit? If not, that's okay too. But in a lot of cases, I keep reusing the same sort of props. Like, Hey, I have a bunch of, uh, like play balls that I represent workloads with. Like, I bought a bag of 'em for my kids for summer and I was like, I can just reuse these. These are, these are workloads. I don't care what it is. I don't care what's inside, but this is how I represent it now. And, and so using that sort of inertia of just like, hey, now I have this garage full of props, um, that work for some of these things.

Uh, it just has been easy to kind of rely back on them. And like, if something is new or on the edge, I'm like, how would I represent secrets? How would I represent, you know, something else in this space? Cool. It'll probably be like a locker, a safe, like, physically.
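The bucket analogy for storage has a direct manifest shape; as a hedged sketch, a PersistentVolumeClaim asks for a bucket of a given size without caring what backs it (name and size are illustrative):

```yaml
# A PVC is the "bucket": the app asks for storage by size and access
# mode; whether NFS, EBS, or something else fills it is the cluster's
# concern, not the workload's.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-bucket       # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi       # the size of the bucket
```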

Rich: That's super, super interesting that you're kind of developing this visual language, you know, that you're reusing. Um, that's super cool. Uh, one of the other things that you do is, uh, this live stream called Containers from the Couch, and I, uh, had the pleasure of being on one of the episodes. Uh, I'll, I'll link to that in the show notes. We talked about vcluster, one of the open source tools that I work with a lot. Um, I think that people who stream about a bunch of different tools like that are in a really interesting position, that they have a chance to kind of get a view of maybe more of the ecosystem than the average person does. Uh, I'm wondering if that's been the case for you, and, and maybe what some of the things are that you learned about.

Justin: Yeah, absolutely. I mean, it's just like starting a podcast, right? You start a podcast so you can talk to cool people and, and maybe you wouldn't get a half hour of their time, you know, if you didn't have a podcast, but you're like, Hey, come on my podcast. And they're like, Yeah, I'll talk to you. And that's how I used to, I had a podcast back in the early 2000s and that's how I talked to cool people.

Cause I'm like, Hey, come on my podcast. And they're like, What's a podcast? And, and so you had to explain that first, but now it's a livestream, right? It's like, Hey, I have this livestream. Would you come show your project or what you're working on or something? And we're gonna livestream and record it. And, uh, I like how we formatted it. I didn't start Containers from the Couch, but I've, uh, been very involved, uh, early on. It was all about the live demos. It was all about, we wanted no slides. We want to see a terminal. Our goal is to get to a terminal within 10 minutes, and if we're not in a terminal in 10 minutes, we need to figure out, maybe we needed someone different on the, on the stream to actually show us what's going on.

Because most of our audience were people that were building and they wanted to see something in action and they wanted to see how it worked. And they love seeing when it breaks. They love seeing us troubleshoot things and in doing that live, but again, it's like livestreams are really stressful because they're live.

I can't, I can't edit things really easily. Uh, but all the best feedback I always get was like, I loved it when that broke. I loved it when you troubleshoot. Cuz then they see your thought process. They see how you're gonna go about, like, oh, I might be familiar with this, maybe I'm not. But let's just start, like, did I run the "yes" command somewhere? Did I fill up another hard drive?

Like let's start there and then work our way up to figure out where this is broken. And, and those are just a lot of fun. And so yeah, it's the show. Uh, we've kind of split it out. That's where the shorts started. Um, it was just like, early this year, I was like, I'm gonna do 10, I'm gonna do 10 shorts.

If people like 'em, I'll keep doing 'em. If nobody likes 'em, it's fine. It was a pretty low effort to kind of get 10 shorts out there. And, and so we were focusing on live streams. Um, we have a bunch of like other pre-recorded content that's in the, like midrange of like 5 to 10 minutes of content. Um, and then the livestreams are, are typically like 40 minutes to an hour.

And so we kind of have those like different formats depending on where you're at in your day, depending on what you're trying to learn, depending on how much time you have.

Rich: Yeah, it's obviously a pretty different thing to like, do a demo, you know, that's like not gonna fit a one minute format

Justin: right?

Rich: most things.

Justin: right?

Rich: Um, yeah. Are there any specific episodes you wanna like, call out to people that you think are interesting, that the listeners might wanna check out?

Justin: Oh, man.

Rich: Not to, not to leave anyone else out,

Justin: Yeah, no, I mean, there's been so many, like, just cool projects. I've focused a lot of my episodes on something similar to, um, TGIK, which, uh, Joe Beda was doing for a while, which was like

every Friday, like, let's just pick a project and learn it. Um, we've done some that I wasn't even on that I loved, uh, with, um, HashiCorp on, um, uh, Waypoint, uh, which was awesome just because, like, the HashiCorp team just came to the stream and was answering questions. I wasn't even on it, but I was in chat at the time, and so Brent and Adam were running it and it was just like, Oh, this is really cool.

Like, the development team's here. And so I love talking to people that have been building the projects cuz there's like, my mindset was like, I started it here and it kind of grew beyond that. Or I went to these other things. And so, um, some of the shows I've done for like, our projects, I really like, like the Karpenter project I'm, I'm a big fan of like, I've been working with them for a while now.

Um, so that's, that's been fun cuz of the cool demos, and I, I get to work pretty closely with that team. Um, then even like the load balancer controller. It's like the most matter-of-fact, like, that has to exist in your Kubernetes cluster, but it's kind of boring. But also it's like, wait, I can do what? Like how does that work?

And I learned from the developers the depth of functionality and configuration, of just like, I had no idea that was possible. And, and those sorts of things are really fun. We're like, Yeah, you can, you can collapse all your, you know, load balance your service endpoints into one load balancer if you wanted, right?

Depending on how many services you have. It's like, you don't need 50 load balancers, you can do one, and just, you know, the actual ALB can trigger that stuff. And I was like, Wow, I, I never knew that was possible. And, and so those sorts of things are always fun, the ones where I learn things. And, and of course, I mean, the virtual cluster one was fun because my favorite, my favorite shows are the ones that end up in a PR, the ones that end up with, like, someone committed code because this existed. And that one was one where it's just like, Hey, this thing's really cool. And you have this idea of distributions inside of vcluster.
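The load balancer consolidation Justin describes is the AWS Load Balancer Controller's IngressGroup feature: Ingresses that share a `group.name` annotation are merged onto one ALB. A rough sketch, with the service name and path invented:

```yaml
# Every Ingress carrying the same group.name is collapsed into a single
# ALB; each one contributes its own routing rules.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: service-a          # hypothetical name
  annotations:
    alb.ingress.kubernetes.io/group.name: shared-alb   # same group on every Ingress
    alb.ingress.kubernetes.io/scheme: internet-facing
spec:
  ingressClassName: alb
  rules:
  - http:
      paths:
      - path: /a
        pathType: Prefix
        backend:
          service:
            name: service-a
            port:
              number: 80
```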

Rich: Yeah.

Justin: We, we have EKS Distro, which is the Amazon managed, you know, EKS bits. Uh, people can run that if they want to add the flag. Like that is something, and we have plenty of partners. We use it for EKS Anywhere, we use it for, you know, managed EKS, all these places. Like if someone wants to test against our actual bits and they're just like, Oh, I need this version of a cluster fast. Yeah, you can use it in EKS, you can create an EKS cluster, but you also can create a virtual cluster for something that's, you know, lower effort, of just like, Oh, I just wanted to test something for 5 minutes or 10 minutes, or figure out how this thing works. That's so much faster.

Rich: Yeah. No, I, I love the virtual cluster stuff. It's been a blast to work with. And, um, I'll, I will link to that, uh, that episode we did in the show notes so people can check it out if they would like to. Um, and yeah, I just, uh, I just had a wonderful time on the stream and would definitely recommend it to other people. If you have some technology you'd like to show off, uh, hit, hit Justin up and see if he could get you on there.

Justin: Yeah. One of the hard things on it is just, the backlog of things we want to do is just so much. There's so many cool projects and there's so many places, so many things we want to feature, and, and we're trying to balance that of just like, what should we do in a stream and what should we do in a, a smaller episode?

And we're finding that like this shift to less than 10 minutes is where a lot of people want to be. And so a lot of the content that we want to show off in streams is becoming more of this shorter form video, because it just gives people an appetite of what exists, and we don't need to do an entire 60 minutes showing off, you know, a specific thing. We're just, again, trying to tell people, like, this exists, here's what it's for, here's how you use it, if you want to go try it. Like, almost everything we feature on the show has been open source and free and available for people to just go try, which has been really cool.

But that has been like a weird balance that we've been learning over the last year, with three different formats now and different places to publish the content: on Twitch, on YouTube. Um, we cross post the shorts different places, and people just consume the content in different places, and, and so it's been really, it's been a learning experience.

Cause I've never been, I don't wanna call myself a YouTuber, but I, like, I manage a YouTube channel, and I do that as a percentage of my work: making sure the content and the information is accurate and, and good for people that wanna watch it.

Rich: Yeah. I mean, like I said, I think that Kubernetes influencer fits better. So let's go with that. Um, uh, I wanted to ask you, um, if there's some things you can share without having to kill me. Um, if we could talk a little bit about, about Kubernetes specifically at AWS. Um, because I'm, I'm pretty curious about that.

You're obviously the, the big cloud provider. Um, I'm curious, like what kinds of things you see people doing there when it comes to Kubernetes? Like, um, especially the difference between a managed service like EKS versus people rolling their own Kubernetes clusters.

Justin: I was a big fan of kOps, um, back in the day, um, when, when I was at Disney. And, and it helped us in a lot of ways run clusters, and it had a lot of cool benefits, like rolling upgrades and all of these things that were kind of difficult to do and to get right in, in different ways. Like, you know, there's always like, I could do it in Terraform, I could stand it up, and then like, how do I migrate this?

How do I safely make sure this happens? And, and kOps did some of that for me. It just like built in like, Hey, you wanna rolling upgrade of your, of your nodes, we can do that. And, and so it was kind of a layer on top of Terraform to be able to do some of that stuff. I still had to manage some of the bits, control plane, whatever else. I had to manage the size and auto scaling and that sort of stuff.

And the newer version of that is more along the lines of Cluster API. Cluster API can give me, you know, a Kubernetes cluster, um, self-managed or managed in a lot of cases, right? Like I can use a Cluster API cluster and get an EKS cluster, and, and it just, you know, manages the other bits around that, or configuration and nodes and those sorts of things.

And it does a lot of the same stuff on top of what, you know, a, a raw Terraform or something might give me, um, mainly around the like upgrade, uh, cycle. And then EKS has been moving in those directions for a lot of things where it was like, originally it was the control plane, bring your own nodes. But the hard part for a lot of operators was like, I can't do upgrades easily.

I always had to manually scale down an ASG or manually scale it up, and I had to balance those things, and, and so those sorts of things were still tricky. And EKS has, has come a long way since, you know, I've been here, not, not due to me, but just, the team was going that direction anyway.

Like, managed node groups came out and, and really made it a lot easier. Just like, Oh yeah, I want a cluster, I want this many nodes, and then just manage them for me, right? It's like that's the, the goal. It's just like, get rid of all of that other stuff that I needed. Yeah, I still see it.

Yeah, it's still there. I can still pick custom AMIs, I can still pick, um, configuration, and some things are flexible in that, but I really just didn't want to have to upgrade it myself. I, I didn't want to be responsible for, like, those downtime situations. Uh, and now I see even more things, um, such as Karpenter, right?

Like Karpenter's coming out or has been out and lets you do some of that stuff and more where I'm just like, Hey, actually managing a bunch of different managed node groups is kind of hard and I, I have workloads in this cluster. Some of 'em need GPU, some of them might want ARM, some of them want really large instances and we always kind of pick that least common denominator, uh, even though nothing in Kubernetes required us to. The tooling and, and just the, the math of like, is, will this all fit, was just easier if everything was the same size. And, and now we're trying to make a lot of those things even easier, which is like, Hey, you have a Kubernetes cluster now we can also upgrade those things for you and we can handle, you know, when they're going away.

But also a variable, you know, mixed fleet, where none of your nodes are the same size. They don't have to be. Um, Karpenter just a couple weeks ago launched consolidation, which is, it'll look at the cluster over time and figure out, are any nodes not being utilized very much? Did we create a really big one when you scaled up, but a lot of those workloads are gone now? So now maybe that'll fit somewhere else, or maybe we can just replace it with a smaller node and we can save you money over time. And now it's not just about, Hey, let me scale this and upgrade it. But now it's also about, hey, let's lower the cost. Let's make sure that we're running this as efficiently as possible.
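Consolidation as it first shipped was a flag on Karpenter's v1alpha5 Provisioner; a hedged sketch with illustrative values (later Karpenter versions moved these knobs into NodePool disruption settings):

```yaml
# Provisioner with consolidation on: Karpenter will repack or replace
# underutilized nodes over time to cut cost, and can mix architectures
# and sizes instead of one least-common-denominator node type.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  consolidation:
    enabled: true                    # replace/remove underutilized nodes
  requirements:
  - key: kubernetes.io/arch
    operator: In
    values: ["amd64", "arm64"]       # mixed fleet: nodes don't have to match
  limits:
    resources:
      cpu: "1000"                    # illustrative cap on total provisioned CPU
```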

And so a lot of that process has accelerated on the EKS side, whereas with the do-it-yourself route, you still have to find another third party project or something to layer in on top to do some of those things. And so I've been really excited about that stuff, where EKS for the operators has been helping so much in those areas

that were originally really hard. And I, I mean, I, I think we've been doing a good job. If people disagree, please let me know. I'm, I'm available on Twitter. Um, I, I love to hear the complaints, because the complaints are how we grow. The complaints are the things that are just like, Hey, yeah, there's a gap here. Absolutely. And, and we have an open roadmap of every gap that we know about. Right. Our roadmap is on GitHub. It's like, if there's a gap there, please plus-one it, um, because we, we literally go off of that roadmap and say, like, Hey, this has a million plus-ones, we should do it. It makes sense to be able to do these things.

Um, and, and getting that feedback is the, the best way for us to know. We absolutely talk to customers, but there's a lot of people out there that don't talk to a TAM regularly or don't, you know, pay for enterprise support or whatever. And those are the, the customers that might be feeling some of the gaps the hardest, and making sure that we meet their needs as well has been really difficult.

Um, but I've been, I've been excited just knowing how much EKS has operationalized Kubernetes, uh, beyond just what used to exist with my own Terraform or, um, you know, kOps or Cluster API.

Rich: It's interesting that you mentioned the upgrading, 'cause I feel like that's, that's always one of the big arguments I've heard for using GKE as well. So it's, it seems like that's, that's a common pain point that a lot of people have and that they would prefer somebody else worry about so they don't have to.

Justin: And I was, I was a big user of ECS in the past. ECS is essentially versionless. You never have to upgrade your version of ECS as an agent. Like, it's just, it is something that exists. It'll orchestrate containers and it is simple and it, it allows you to just keep the cluster going. And, and I don't manage any control plane because that's all an API and it's API calls.

I manage the nodes. Uh, but as far as like the agents on them and how those things run, like those, that was a big win, um, at a previous job too, where it's just like, Oh, guess what? Like, I don't have to upgrade this right now because we don't have the time. We don't have the cycles to do it. So I, we will, you know, for new features or whatever.

Um, but there's other options that do exist. You know, Nomad. Again, you can use other tools if, if that is a big enough pain point, or you don't have the skillset available to do that, and you can't for some reason offload it to AWS to do it for you with a managed node group or Karpenter or something, then yeah, we have plenty of other tools.

There are other options that allow you to do some of that too, um, without making that like the make or break point of like, Oh, I can't, I can't do these upgrades all the time. It's like, okay, like that's fine. Like if you can't offload it to us, then let's figure out a better solution and let's figure out the right solution for you.

Rich: ECS was actually my next question on the list. So, I, if you can talk about it, I wonder if you have thoughts about, um, use cases, like why you would go with ECS as opposed to EKS.

Justin: I

Rich: Is it just mainly the upgrading stuff?

Justin: Oh, no, no. Not, not at all. Uh, I, I was at, uh, Disney Streaming after Disney Animation, and, and Disney+ was built on ECS. Um, it was what we were using, it was how we were, um, deploying services, and it scaled to Disney+ needs, um, with a team of, uh, about four of us managing infrastructure.

And I was amazed at how well ECS handled the load, everything we threw at it. Every time we were just like, Oh, we have to scale more. We didn't know what launch was gonna be like. We went from zero to 10 million users in one day, and, and that was a huge signup process and a huge load on all of our infrastructure.

And I never got paged for any of our services. The infrastructure, the ECS, just, just took it, and services ran and things rescheduled as needed. And it was an amazing win for us as a, as a service team, being able to ship that. I even recorded a re:Invent talk after I joined Amazon with someone that I used to work with,

Zach, at, uh, Disney Streaming. He and I were peers on that infrastructure productivity engineering team. And, and he talked all about how they, how we built deployments. We had our own deployment tooling, and, and it worked internally for what we needed for Disney+. And so, um, I love the, like, simplicity of it.

If you're just getting started and you're like, I don't know what I need, uh, I always tell people, start with ECS. Um, there's two tools that I, I think are fantastic. There's Copilot, which is not to be confused with all the other Copilots out there. Um, there is an AWS Copilot which will, which will just run your code on ECS.

It gives you a cluster and it gives you a service and it just runs it, and you can add on storage and all this other stuff, you can extend it. Um, but also there's a Docker Compose ECS plugin, so you can use your Docker Compose file and run it directly on ECS.

And, and so you can run it, yeah, you can run Docker Compose locally, and then you can run Docker Compose and deploy to ECS. And, and if you're getting started with containers, like, do it, just, just run with those. Like, those are great.
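The local-to-ECS path Justin mentions works off an ordinary Compose file; this minimal sketch (the image is illustrative) runs with `docker compose up` locally, and against ECS/Fargate after switching to an ECS Docker context created with `docker context create ecs`:

```yaml
# compose.yaml: the same file works locally and, via Docker's ECS
# integration, gets translated into ECS services and a load balancer.
services:
  web:
    image: nginx:alpine    # illustrative image
    ports:
      - "80:80"
```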

Like you don't, if you, you're not using Kubernetes yet and you don't know that you need it, don't use it. Uh, there's way easier ways to onboard to like, I need to deploy something to the cloud. Um, I also helped launch App Runner, uh, at AWS after I got here, which is again, it's just like, here's a container, give me a web endpoint, right?

Like, I don't care about the load balancer, I don't care about schedulers, I don't care about any of that stuff. I have a container and I just want you to run it for me, and it's gonna accept some traffic, and, and that works. And you can just run those things too. And so, like, there's a lot of simple, I don't wanna say simple, they're not simple, they are still difficult. Like, containers, development, cloud, it's difficult. I totally don't wanna just wave that away as like, Oh, everyone should understand this stuff. No. Like, it took me years and years to learn just the fundamentals of, like, what is this thing doing? Um, but it is more simple than Kubernetes.

If you're jumping in, you're like, Kubernetes, I have to do it. It's like, no, you don't. Like, you can run things at Disney+ scale without any Kubernetes. You can run things at plenty of other scales, whatever your need is, complexity-wise, for your application, your infrastructure, or your organization.

If your team is only three, four people, like, are you gonna be managing Kubernetes? Maybe, um, unless you're offloading some of that stuff, right? Like, there are things you're gonna have to do, but you could go with different solutions and offload some of the burden on the team, and just the complexity of, like, who's coordinating things. And getting rid of the spreadsheet is a goal. But at the same time, you can also pick something else that will work for you.

Rich: Yeah. No, I'm, I'm a big, uh, proponent of using the thing that works for your use case. Right. As opposed to like thinking that, you know, Kubernetes needs to be like the Maslow's Hammer, you know, that you use to like, do every single thing you have to do.

Justin: Yeah, and it was really eye opening for me going from building and running Kubernetes to running ECS and, and I was all about like, Oh, we should move this to Kubernetes. And when I got there to Disney Streaming, we had 11 months to ship Disney+. I was like, there's no way we're doing any Kubernetes.

Like we are not touching that yet. Like maybe in the future, like maybe, maybe at one day it makes sense, but right now like we need to ship and we need to make sure things are stable and we need to scale. And, and ECS handled all of that for us, which was awesome from my perspective.

Rich: Fantastic. Um, I asked for listener questions and we got one from Charles Landau. Thank you, Charles. He's @landau_charles on, um, Twitter, and I'll link to that in the show notes. Um, Charles asks, uh, what's in store for the AWS Auth ConfigMap?

Justin: For anyone not aware, EKS clusters have a ConfigMap in them which maps authentication for IAM users into the cluster. So you can say, I have an IAM user that has access to AWS, I need them to have access into this Kubernetes cluster as well. And there's different ways you can manage Kubernetes access.

So there's OIDC, there's all these other options for like, if you want, you know, how people are accessing the cluster. And the AWS Auth ConfigMap was our way to just kind of merge AWS IAM with, with Kubernetes. And it was a native Kubernetes solution, but we also know that there's some gaps there. There's some things that it could be better.

And managing that ConfigMap becomes a difficult problem, because yeah, you can lock it down with RBAC or something. Um, but it's still tricky when you're editing it, if you're deploying changes to it. Uh, it is the linchpin of, you have access to the cluster or you don't. And so if you mess that up, if you don't indent your YAML properly, you can lose access to the cluster. And that is a big concern. And it's something that allowed us to get people integrated, get Kubernetes natively integrated into AWS authentication. But at the same time, we know there's gaps and we know we want to eventually replace that.
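For context, the aws-auth ConfigMap being described looks roughly like this (the account ID, role, and user names below are placeholders, not real values). Note that `mapRoles` and `mapUsers` are YAML documents embedded as strings inside YAML, which is exactly why a bad indent can silently break access:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  # The value of mapRoles is itself a YAML string; indentation
  # mistakes inside it are not caught when the ConfigMap is applied.
  mapRoles: |
    - rolearn: arn:aws:iam::111122223333:role/eks-node-role  # placeholder
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
  mapUsers: |
    - userarn: arn:aws:iam::111122223333:user/alice  # placeholder
      username: alice
      groups:
        - system:masters
```

Because the API server only sees an opaque string, a malformed entry can lock out every mapped user, which is the failure mode Justin is warning about.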

And so, you know, we're definitely talking to people about what they would want. Ideally, authentication to the Kubernetes cluster becomes more of a native AWS service. It's a little more integrated on the IAM side, because especially at larger companies, IAM is the thing.

You don't RBAC all of AWS, but IAM is there, and we need that a little more native in Kubernetes. And so how we get that integrated into cluster access really is just a matter of, how do we integrate that into EKS APIs, and how do we get that as something that is a little safer to do.

And make sure that people understand, hey, you can give access to these things, you know, inside of the cluster. We did things like, um, IAM roles for service accounts, which is a native way to give you a service account with an annotation on it, and that gives the service account access to an IAM role inside of AWS so you can call different things. And how do we make that a little more native from the EKS cluster side, of, this isn't a service account, this is just, I need to access the API. So we have plans for it, we have some ideas on how we want to integrate it and make it safer to give that as an option for people, of, you know, let's get away from something that is YAML-specific.
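The IAM-roles-for-service-accounts annotation mentioned above is just a single line on a service account manifest; the service account name and role ARN below are hypothetical placeholders:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app          # hypothetical service account name
  namespace: default
  annotations:
    # EKS's pod identity webhook sees this annotation and injects
    # web identity credentials for the role into pods that use
    # this service account.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-app-role  # placeholder ARN
```

Pods running under this service account can then call AWS APIs as that IAM role without any long-lived credentials in the cluster.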

Like, there is no error checking on it necessarily. You can give all the recommendations of, like, put a webhook on it, put a validating, uh, you know, hook on it or something before anyone mutates it. Um, but in practice, it just doesn't happen. You can have all the great advice in the world on how to make this better, but it's still on us as AWS.

Rich: Yeah, you can't really trust people to do all the right things.

Justin: Well, and we have to own that. We own the IAM side of it, so we have to make sure that we are making that as safe as possible. And so we don't have anything that's cemented right now, of, hey, this is what it's gonna be. Uh, but we definitely hear the pain points, and we know that this was a thing that existed and it's been working for a long time.

It's been great to get people that are native Kubernetes users, who are like, oh, I love ConfigMaps, I love doing this stuff here. Um, but then, like, bringing the rest of the people on board and making it safer, um, has been a little more difficult.

Rich: Cool. Um, Justin, uh, I had such a blast talking with you, and I feel like I could talk with you for another hour, but I need to let you go. Um, I will, uh, link to a bunch of the things that we've talked about in the show notes, including your Twitter and your fantastic TikTok. Is there anything else you wanna mention real quick, uh, before we go?

Justin: No, I don't think so. Um, it's, it's been a blast talking to you. Uh, I'm just, you know, trying to have fun. I'm still, you know, engaged in the Kubernetes community and, and just, I love meeting new people whenever I go to a conference and just hear what people are doing, um, solving their own problems and, and getting themselves unblocked. I just, I love hearing that stuff. So, um, feel free, you know, you can reach out and let me know. Um, whether it's EKS or not, uh, I've been a Kubernetes user for, for long before I was, uh, at Amazon and so, um, it's just been a great community to be part of and kind of grow with.

Yeah. Thanks, Rich.

Rich: Kube Cuddle is created and hosted by me, Rich Burroughs. If you enjoyed the podcast, please consider telling a friend. It helps a lot. Big thanks to Emily Griffin who designed the logo. You can find her at daybrighten.com. And thanks to Monplaisir for our music. You can find more of his work at loyaltyfreakmusic.com. Thanks a lot for listening.
