Are you passionate about building highly available, performant and scalable
web applications? We are looking for an ambitious Senior Site
Reliability Engineer to join our team of software and infrastructure
engineers. Olo is experiencing tremendous growth, and Reliability at
Scale has become our key mantra. As we enhance our platform to support
the increased demand, it must be positioned for continued stability,
reliability and resiliency...even at 10x scale! You will be challenged
with complex yet interesting problems, and your passion to succeed will
What You’ll Be Doing
- Take ownership of the entire process, from observability and SLIs/SLOs to Incident Response to postmortems and follow-up actions.
- Work to define standards and best practices and help drive those into each team.
- Help us implement and tailor our incident response tools in order to minimize outage durations.
- Brainstorm, define, and build collaborative monitoring solutions with members across multiple product teams.
- Contribute insights across teams to help us improve or re-architect existing systems to support scale, performance and extensibility.
- Constantly re-evaluate our observability tooling to improve architecture, knowledge models, user experience, performance and stability.
- Analyze and mature our processes around Incident Response, Observability, Postmortems and Predictive Monitoring.
- Maintain production services by measuring and monitoring availability, latency and overall system health.
- Influence an engineering culture of reliability, observability, and availability.
- Strive to coach and mentor engineering teams through game days, SRE boot camps and other training and feedback channels.
What We’ll Expect From You
- Strong experience with monitoring systems like Datadog, Sumo Logic, Raygun, New Relic or similar.
- Fluency in at least one Incident Management tool such as FireHydrant, OpsGenie, PagerDuty, VictorOps or similar.
- Some experience with build and deploy tools such as Jenkins, TeamCity, Octopus, CircleCI or similar.
- You've been in the trenches building highly scalable, efficient, and resilient systems.
- Prior hands-on software development experience highly desired.
- Self-starter: can take high-level direction and organize the work needed to achieve its objectives.
- Highly motivated individual with a curiosity to learn as you grow.
- Legally able to work in the U.S.
- Willing to roll up your sleeves, work hard and be scrappy!
Nice to Have
- Prior hands-on software development experience.
- Experience with Ansible, Terraform or other Infrastructure-as-Code tools.
- Solid grasp of Immutable Infrastructure concepts.
- Experience with containers and container orchestration frameworks.
- Expertise in guiding Incident Response, in terms of both process and tooling.
What's Important to Olo
- Our families come first. We know they make us who we are and they are who we live and work for every day.
- Olo is our extended family. We’re in this together, fighting for one another. We’re happy to be here. We will not let one another down.
- We learn from and fight through setbacks. We recognize and help one another with direct feedback.
- We care about you. We offer 20 days of paid time off, fully paid health, dental and vision care premiums, stock options, and a generous parental leave plan.
- We value diversity. At Olo, we know a diverse and inclusive team makes not only our products better, but also our workplace. Many groups are consistently underrepresented across the tech sector, and we are fully committed to doing our part to move the needle.
- Learn more about our culture, values, and mission: https://www.olo.com/images/culture.jpg
candidates to share any concerns or questions with Olo’s recruiting team.
you know why we give out FitBits!). Check out our culture map: https://www.olo.com/images/culture.jpg