Sr. Site Reliability Engineer

Sage | Remote

Sorry, this job was removed at 4:34 a.m. (PST) on Saturday, August 28, 2021

Are you interested in working at a fast-growing Silicon Valley company voted Best Place to Work ten years in a row and recognized as one of the Best Workplaces for Diversity by FORTUNE?

Every business on the face of the Earth must, in some way, do bookkeeping, accounting, and financial planning to operate. At the outset, these functions may seem like mundane facts of life in running a business; however, the skill with which a company performs them can have a profound impact on its business.

Within the Medium Segment Native Cloud Solutions group at Sage, our team helps keep our public and private cloud-based infrastructure and SaaS applications highly available and scalable.

The team you will join has broad expertise in Systems Engineering, Cloud Infrastructure Management (public and private), networking, and monitoring. You will help to develop, extend and maintain our mission-critical infrastructure while ensuring reliability and performance.

This role works in partnership with cross-functional teams maintaining our existing and forthcoming technology stack.

Responsibilities:

Research and implement new technology subsystems to modernize our infrastructure, and work with various groups to maintain high uptime and meet deliverables for our business partners.
Implement automation and industry best practices to run our large-scale, rapidly growing infrastructure with minimum human intervention.
Address production issues, learn to mitigate them quickly, and find ways to prevent them.
Implement monitoring, observability, and alerting tools such as dashboards and logging systems to understand the health and availability of our infrastructure and applications.
Configure and maintain software components (e.g., operating systems, web servers, and application environments) through Python and Bash in a highly customized environment.
Participate in our team's follow-the-sun on-call rotation and improve on-call practices and procedures.

Requirements:

5+ years of professional experience managing a highly available SaaS environment as a DevOps Engineer, Systems Engineer, or SRE.
Proficiency in Bash and Python scripting; automation must be part of your DNA.
Fluency in Linux administration on either Red Hat or Debian distributions.
Application experience with web technologies such as Apache, Tomcat, HAProxy, nginx, etc.
Solid understanding of technology stack fundamentals such as TCP/IP, HTTP, TLS, etc.
Strong operational experience with monitoring infrastructure toolsets such as Zabbix, Nagios, ELK, Splunk, Prometheus, etc.
Experience with Docker and Kubernetes (K8s) is a plus.
Public cloud AWS experience (EBS, EC2, RDS, Lambda, S3, CloudFront, etc.) is a plus.
Experience with infrastructure as code (Ansible, Terraform, or CloudFormation) is a plus.
Strong analytical and troubleshooting skills.
Excellent interpersonal and communication skills, with the ability to work in a dynamic, high-growth environment.