The Cloudcast #301 - SRE and Infrastructure Operations

The Cloudcast - A podcast by Massive Studios

Brian talks with Rob Hirschfeld (@zehicle, Founder/CEO of @RackN) about the concepts of SRE (Site Reliability Engineering), the challenges of maintaining infrastructure software, emerging tools, and the next generation of operations.

Show Links:

Get a free eBook from O'Reilly Media or use promo code PC20CLOUD for a discount - 40% off print books and 50% off eBooks and videos
[DISCOUNT] Start Serverless Skills Bundle (4 courses) - (only $49 instead of $79)
[FREE] Alexa Development for Absolute Beginners
Sign up for a Datadog account and get a FREE t-shirt
RackN website
RackN and SRE
Google SRE Book

Show Notes:

Topic 1 - Welcome back to the show. Let’s start by talking about the concept of SRE (Site Reliability Engineering). Give us the basics and maybe explain how it differs from how people define DevOps.

Topic 2 - Application development has been moving faster for quite a while (agile development, etc.). But now infrastructure/operations teams have to deal with faster software - especially around updates (e.g. Kubernetes releases every 3 months). How are companies managing this?

Topic 3 - Given that this pace of operations change may not slow down, how do you think about the challenge in terms of process/operations versus technology/tools?

Topic 4 - What are some of the steps that companies take to better prepare for this type of operational model? Tools, process, skills, etc.

Topic 5 - Do you see SRE as a progression for existing infrastructure/operations people, or is it more focused on sysadmins or developers who want to get away from building applications?

Feedback?

Email: show at thecloudcast dot net
Twitter: @thecloudcastnet
