This role is for a true web-scale SRE (not a DevOps “SRE” or someone who has only written scripts, but someone who grew up as a developer and progressed into true SRE work) with good distributed systems experience. The candidate must be able to support, troubleshoot, and fix major system incidents and develop integration solutions (Python, Django) with ICM.
Initially, the candidate will learn the OCM and process, and then progress to determining which integrations need to be built to marry the stack (a common integration, so there is no need to worry about multiple stacks).
· Good programming experience with complex, scaled web and cloud stacks and apps
· Strong SRE with a strong background
· Must be strong with Python, Java, and Linux (not a sysadmin, but a 5-6 on a scale of 10)
· Able to support a 7x12 ICM model
• 2+ years of experience troubleshooting in Unix/Linux
• UNIX/Linux systems administration background.
• Programming skills (Python, Perl, Ruby, Java/Scala, or C).
• 5+ years in a UNIX-based large-scale web operations role.
• Experience with web-based Java/J2EE architectures and JVM configuration.
• Python experience, specifically for systems automation.
Nice to have
· Azure Incident Management System (ICM) preferred
· GUI/front-end skills (for a simple presentation layer)
· Azure (AWS or GCP experience will work too)
Incident management (ICM) experience is a plus