Site Reliability Engineer - Production Kubernetes

| Remote
Sorry, this job was removed at 6:51 a.m. (PST) on Tuesday, April 13, 2021

About Us:

SentinelOne was formed by an elite team of cybersecurity and defense experts from IBM, Intel, Check Point, Cylance, McAfee, and Palo Alto Networks. SentinelOne is shaping the future of endpoint security through its unified, converged platform that automatically prevents, detects, and responds to threats in real time. Our unique approach is based on deep inspection of all system processes combined with innovative machine learning to quickly isolate malicious behavior, protecting devices against advanced, targeted threats.

Our company is built upon a foundation of team players with innovative problem-solving skills. We operate with the utmost integrity to represent the SentinelOne brand and support the 'good' within the cyber community. As we enter our next phase of hyper-growth, we're looking for people who will go the extra mile and join in our passion for building a bigger and better SentinelOne. If you are enthusiastic about cybersecurity and have a growth mentality, we would love to speak with you about joining our team!

Position Overview:

We are seeking a Site Reliability Engineer - Production Kubernetes whose objective is to "accelerate product delivery". This includes working hand in hand with our CloudOps, Site Reliability Engineering, and Core Engineering teams to enhance operations of our massive Kubernetes infrastructure, accelerate our multi-cloud, on-premises, and air-gapped deployments, and provide expert-level guidance on best practices for Kubernetes CI/CD automation. Our industry's best security products ensure our customers operate fearlessly regardless of where and how they operate. As part of our Site Reliability and Cloud Operations Organization, you’ll help SentinelOne continue to grow by ensuring rapid delivery of new features and stable, reliable operations for our customers, which directly impacts revenue.

The Site Reliability Engineer - Production Kubernetes will bring deep expertise in designing and supporting highly scalable, highly available infrastructure and applications in Kubernetes, as well as in promoting meshed microservice design patterns in complex working environments. You will serve as a subject matter expert on all aspects of our containerized deployments, including deployment, configuration, scaling, and upgrades, and you will provide technical leadership on Kubernetes within the team for a complex and highly available production system.

In addition, you will debug problems in production and test environments, advise developers on best practices applicable to the environment, and maintain high-volume clusters in multiple datacenters. You will develop automation that improves deployment speed and service reliability in the containerized environment. You will help mentor other team members and customers on the adoption of new technologies and design principles as well as promote DevOps culture and collaboration.

What will you do?

Apply significant real-world production expertise in the deployment, management, and monitoring of Kubernetes:

  • Co-own production Kubernetes clusters that provide the backbone for our security products and big data / ML analytics-based services. Experience with OpenShift, D2iQ, Platform9, EKS, GKE, or other Kubernetes management tools and services is a plus.
  • Collaborate with Cloud and Infrastructure Operations teams to ensure efficient operations of Kubernetes clusters in GCP, AWS, bare metal, and other deployment targets.
  • Facilitate Continuous Integration/Continuous Deployment capabilities including Canary and Blue/Green deployments. Experience with Spinnaker, Harness, Weave.works, CodeFresh, ShuttleOps, or similar is helpful.
  • Design and deploy security policies using Open Policy Agent.
  • Design and automate GitOps workflows with tools such as Jenkins, Ansible, Pulumi, or Terraform (required).
  • Extend existing observability solutions and expose metrics that track SLIs and SLOs to ensure SLAs are met.
  • Minimize and harden the attack surface of microservices and public-facing API gateways.
  • Carry out observability, capacity planning, and system and service performance analysis and tuning.
  • Debug problems in production and test environments.
  • Advise developers on best practices applicable to the environment, and maintain high-volume clusters in multiple datacenters as well as public and private clouds.
  • Develop automation that improves deployment speed and service reliability in the containerized environment.

What skills, traits, and experience will ensure your success?

  • 5+ years of experience in large-scale production Kubernetes environments, in roles focusing on design, delivery, and operations of at-scale multi-tenant infrastructure
  • 3+ years of CI/CD and infrastructure automation experience with common platforms (Jenkins, Spinnaker, GitOps) and scripting (Terraform, Pulumi, CloudFormation, Ansible, Helm, Python)
  • BA/BS in Computer Science, Information Technology or a related technical field (preferred, but not necessary)
  • Strong background in developing SRE practices and promoting a DevOps culture
  • Self-motivated individual who is interested in expanding their knowledge and skill-set while improving on processes, procedures, and approaches that benefit the organization
  • Thoughtful individual who strives to find the best possible solutions to problems and the follow-through to see them from idea to execution
  • Reliable team player, yet able to work independently, with strong interpersonal and communication skills.

Why us?

You will work on real-world problems and make an impact by protecting our customers from cyber threats. You will be joining a cutting-edge project and will be able to influence the architecture, design, and structure of our core platform. You will tackle extraordinary challenges and work with the very BEST in the industry.

  • Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
  • Unlimited PTO
  • Paid Company Holidays
  • Paid Sick Time
  • Gym membership reimbursement
  • Cell phone reimbursement
  • Numerous company-sponsored events including regular happy hours and team building events

 

SentinelOne is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

Location

444 Castro Street, Mountain View, 94041
