Site Reliability Engineer - SRE Team Lead

  • Cleo
  • Salt Lake City, UT, USA
  • May 21, 2021

Job Description

Site Reliability Engineer - SRE Team Lead - Req #186
United States / Product - SRE / Full-time

Cleo is a cloud integration technology company focused on business outcomes. Every day, we ensure that each one of our 7,000+ customers' potential is realized by delivering solutions that make it easy to discover and create value through the connections and integration of enterprise applications supporting critical workflows. By providing the industry's most complete and flexible integration offerings, we are helping our clients build trusted relationships across their partner ecosystems today, while providing all the control and visibility they need to advance their business tomorrow. Simply put, Cleo ... never stops.

The Position

Cleo is looking for an SRE Team Lead that is interested in joining a dynamic, growing enterprise software company. Our products are designed to put the client's needs first, value innovation, and solve business problems with brilliant simplicity. This position requires a candidate who is self-driven, willing to learn and become productive quickly. The SRE Team Lead will serve as a leader in planning, production, and engagement with software developers and infrastructure engineers to integrate software development and delivery. We are looking for an individual who enjoys providing continuous improvement to our system and application as well as to our team of engineers.

What you will be doing

+ Continuous improvement of system and application monitoring and automation
+ Monitoring of infrastructure, systems and application availability, performance and capacity
+ Lead engagement with software developers and infrastructure engineers to integrate software development and delivery from inception to full operation, ensuring robust released software and systems
+ Identify and automate manual workarounds and process improvements
+ Monitor the availability, latency, scalability, and efficiency of all services
+ Perform periodic on-call duty as part of the SRE team
+ Experience managing and troubleshooting large AWS infrastructures
+ Background in system administration scripting (shell, bash, python, etc.)
+ Mentor direct reports to further develop their soft and hard skills
+ Evaluate industry best practices and apply them as appropriate, keeping the company apprised of trends and emerging technologies
+ Ensure the 24x7 availability and reliability of cloud infrastructure
+ Foster a sense of automation in issue resolution; everything possible should be automated, and only when automation can't resolve an issue should people get involved in the resolution
+ Lead efforts for updating production with new versions/infrastructures as they are available
+ Lead capacity planning efforts to determine changes to infrastructure that are needed to support new load and performance characteristics

Requirements

+ BE/MCA in Computer Science or Engineering
+ 5+ years' experience in Site Reliability Engineering
+ Knowledge of Amazon S3, EC2, RDS, EFS, ELB, Route 53 is needed
+ Experience in one or more of: C, C++, Java, Python, Go, Ruby, Scala, NodeJS is a must
+ Experience in Linux and Unix-like operating systems
+ Must be self-directed, flexible, and be able to prioritize and handle multiple projects simultaneously
+ Outstanding problem-solving, troubleshooting and decision-making skills required
+ Mentors others as necessary
+ Knowledge of CloudFormation, CloudWatch, CodeDeploy, DynamoDB, Lambda, SQS is a plus

Benefits

+ Competitive base salary
+ Great Healthcare + Dental + Vision
+ Unlimited PTO
+ 401k
+ Opportunity to work on large, high-impact projects
+ Ongoing training and development

Equal Opportunity Employer: Disability/Veteran
AJE Ref Number: 592571351