Site Reliability Engineer

Sorry, this job was removed at 10:28 a.m. (PST) on Tuesday, March 3, 2020

Rescale is the leader in enterprise big compute and one of the fastest-growing tech companies in Silicon Valley. Our customers range from disruptive and innovative startups to well-known global automotive manufacturers. Our dynamic team is welcoming, collaborative, and diverse. Joining the Rescale team means becoming part of the next generation in big compute and of the disruption that is turning traditional HPC on its head.

 

Our stack consists of a set of Python and Java services hosted on AWS. These services are used to configure and manage on-demand, isolated HPC clusters running on a number of different infrastructure providers. Engineers and scientists expect our platform to remain highly available in order to run mission-critical simulations across various domains including automotive and aerospace. This is where you come in.

 

We are looking for a Site Reliability Engineer to help improve our service availability and scalability. The ideal candidate will be capable not only of diagnosing and fixing production issues but also of making infrastructure changes that strengthen the overall resiliency of the platform.

Key Qualifications Include:

2+ years of experience in a similar role.

Experience managing, monitoring, and optimizing the performance of cloud-native applications in production.

Demonstrable knowledge of TCP/IP, HTTP, and web security best practices.

In-depth knowledge of Linux fundamentals and internals.

Proficiency in a modern scripting language (Python, Ruby, etc.).

Experience with a configuration management tool (Salt, Ansible, Puppet, Chef, etc.).

A strong aversion to repeated manual processes.

Ability to participate in an on-call rotation (about 1 week per month).

Ideally, you will also have experience with:

At least one of: AWS, Azure, or Google Cloud Platform.

Debugging networking challenges related to hybrid, cross-region, and cross-provider deployments.

Scaling and tuning both relational and non-relational datastores.

Traditional HPC schedulers.

Container orchestration tools.

Rescale Perks:

Game Nights

Hack Days

Weekly Tech Talks

Lunches catered every Mon/Wed/Fri

Team happy hours every Friday

Rescale is an Affirmative Action, Equal Opportunity Employer. As part of our standard hiring process for new employees, employment with Rescale will be contingent upon successful completion of a comprehensive background check.


Location

33 New Montgomery St, San Francisco, 94105
