Site Reliability Engineer
Rescale is the leader in enterprise big compute and one of the fastest-growing tech companies in Silicon Valley. Our customers range from disruptive, innovative startups to well-known global automotive manufacturers. Our dynamic team is welcoming, collaborative, and diverse. Joining Rescale means becoming part of the next generation in big compute and of the disruption that is turning traditional HPC on its head.
Our stack consists of a set of Python and Java services hosted on AWS. These services are used to configure and manage on-demand, isolated HPC clusters running on a number of different infrastructure providers. Engineers and scientists expect our platform to remain highly available in order to run mission-critical simulations across various domains including automotive and aerospace. This is where you come in.
We are looking for a Site Reliability Engineer to help improve our service availability and scalability. The ideal candidate will be capable of not only diagnosing and fixing production issues but also making infrastructure changes that improve the overall resiliency of the platform.
Key Qualifications Include:
• 2+ years of experience in a similar role.
• Experience managing, monitoring, and optimizing the performance of cloud-native applications in production.
• Demonstrable knowledge of TCP/IP, HTTP, and web security best practices.
• In-depth knowledge of Linux fundamentals and internals.
• Proficiency in a modern scripting language (Python, Ruby, etc.).
• Experience with a configuration management tool (Salt, Ansible, Puppet, Chef, etc.).
• A strong aversion to repeated manual processes.
• Ability to participate in an on-call rotation (about 1 week per month).
Ideally, you will also have experience with:
• At least one of: AWS, Azure, or Google Cloud Platform.
• Debugging networking challenges related to hybrid, cross-region, and cross-provider deployments.
• Scaling and tuning both relational and non-relational datastores.
• Traditional HPC schedulers.
• Container orchestration tools.
Rescale Perks:
• Game Nights
• Hack Days
• Weekly Tech Talks
• Lunches catered every Mon/Wed/Fri
• Team happy hours every Friday
Rescale is an Affirmative Action, Equal Opportunity Employer. As part of our standard hiring process for new employees, employment with Rescale will be contingent upon successful completion of a comprehensive background check.