Head of Infrastructure and Site Reliability
Every day, scientists around the world use Benchling’s applications, platform, & analytics in their efforts to solve humanity's most pressing problems. For these scientists, Benchling is the central technology they use to conduct their research. Our customers include pharmaceutical giants, leading biotechs, and the world's most renowned research institutes.
Benchling has launched with over 250 enterprises, 1000+ institutions, and 175,000+ scientists actively using the Benchling applications and platform on a daily basis. Availability, resilience, and reliability of the system are paramount to their daily work and business. As leader of our SRE team, you are well versed in modern cloud SRE practices and not fazed by maintaining and running mission-critical systems for Fortune 500 enterprises.
In addition to SRE, this position encompasses our infrastructure and developer productivity functions. The right leader has come out of a platform-oriented development background, and understands how to develop an internal PaaS that can support a team of 100+ engineers.
WHAT YOU WILL DO
- Build and manage a team of engineers with hybrid responsibilities along the spectrum from Infrastructure development and maintenance to Technical Operations and Site Reliability Engineering, in support of a component & service ownership model that spans the broader engineering organization.
- Own end-to-end availability, reliability and performance of Benchling’s systems, from testing and staging environments, through production. These systems are mission-critical to Benchling’s customers’ scientific operations and business. Oversee the development and maintenance of systems for Build and CI/CD pipelines, observability, monitoring, and process automation for incident response and other key business and technical processes.
- Establish, continuously improve, and mature the foundational processes and culture for Site Reliability Engineering at Benchling, in collaboration with other engineering teams, and cross-functional stakeholders such as Customer Success and Customer Support. Drive the incident response process across engineering.
- Lead the team to develop tooling for infrastructure automation and developer productivity. Manage the team’s day-to-day and sprint-to-sprint work, balancing work in support of the development and product roadmaps with ongoing operational responsibilities.
- Make Benchling a great place to work by actively supporting career development, growth, and mentoring of individuals; and by fostering a fun and inclusive team where everyone feels welcome.
ABOUT YOUR TEAM
- Starting from a team of 3-4, build a team of approximately 8 engineers over a 12-month timeframe, balancing the skill sets and experience levels necessary to build out the team’s functions, and exhibiting a “bar-raiser” mindset, while preserving and building on the team’s culture and values.
ABOUT YOU
- Minimum three years of managing an engineering team, with a track record of building and managing high-performing teams with a culture of quality and continuous improvement; plus a minimum of four years as an individual contributor engineer in software development, technical operations, or site reliability engineering.
- The ideal candidate will have an engineering and/or engineering management background that contains a mix of experience from the following two categories. It is not essential that they’ve managed a team acting in both categories -- a mixture of engineering management experience in one and IC engineering in the other is sufficient.
- (1) product/platform engineering, software product development, API engineering, or service development (as service-oriented architecture or microservices architecture).
- (2) technical operations, SRE, continuous delivery or delivery engineering, cloud engineering, production engineering, or DevOps
- (nice-to-have) engineering tools development, tooling and/or automation to help developers move faster and be more productive
Required technical skills:
- Experience with and deep understanding of AWS technologies, and of the challenges and best practices of operating in an AWS environment
- Understanding of modern cloud service testing and deployment practices and infrastructure-as-code, using tools such as Terraform and Spinnaker
- Experience with and knowledge of tools and services for logging, monitoring, and observability, such as SignalFx, Nagios, LightStep, and New Relic
- Hands-on experience with software development in Python, Ruby, Java, Go, or Scala, or a similar language.
- BS in Computer Science or a related field, or equivalent experience
Preferred skills:
- Knowledge of and experience operating container orchestration systems -- Amazon ECS (Elastic Container Service) and Kubernetes (and/or Amazon EKS, Elastic Kubernetes Service), and related technologies
- Working manager-level understanding of, or previous hands-on experience with, at least one modern web application and/or web services framework, for example Python/Flask, Python/Django, Ruby/Rails, Ruby/Sinatra, or Java/Spring.
- Proficiency with, and understanding of, distributed systems, particularly from the standpoints of deployment, reliability, monitoring/observability/debuggability, and performance.
OUR VALUES
- Empower through information. We explain the “why” behind every decision, unless there are highly sensitive circumstances. We're honest about how we're doing, especially in difficult times. We believe that sharing information builds trust and enables better decision-making.
- Rely on tenacity. Hard work is one of the greatest factors in determining success and is fully under our control. We must make the most of every day by bringing the highest level of determination. Dreaming big is not enough.
- Raise the bar. Pushing ourselves and others to improve will be uncomfortable and at times result in failure. However, it's critical to our success. We're dedicated to creating a place where everyone feels challenged to improve.
- Build a lever. We choose to build tools and infrastructure that will help others make world-changing innovations. There's less glory in it, but in the words of Archimedes, "Give me a lever long enough and a fulcrum on which to place it, and I shall move the world."
PERKS AND BENEFITS
- Work with a talented yet humble team
- Competitive compensation & equity package
- 401k
- Medical, dental, and vision insurance
- Monthly health & wellness stipend
- Weekly virtual social events, and annual company retreats
- *$1,000 work-from-home stipend
In keeping with best practices and safety protocols, all Benchling employees are expected to work remotely until we are advised that it is safe for employees to resume work in their respective office locations.
*To support remote work conditions, Benchling provides each employee a one-time stipend of $1,000 upon commencing employment, and additional discounted employee purchase plans for home-office equipment.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable federal, state and local law, including but not limited to the San Francisco Fair Chance Ordinance.