Trellix

Site Reliability Engineer

Reposted 13 Days Ago
In-Office
2 Locations
Senior level
The Site Reliability Engineer ensures high availability in production environments, manages incidents, troubleshoots issues, and implements monitoring solutions. Responsibilities include improving operational aspects of systems and collaborating with engineering teams to optimize services.
The summary above was generated by AI

Job Title:

Site Reliability Engineer

About Skyhigh Security:

Skyhigh Security is a dynamic, fast-paced cloud company and a leader in the security industry. Our mission is to protect the world’s data, and because of this, we live and breathe security. We value learning at our core, underpinned by openness and transparency.

Since 2011, organizations have trusted us to provide them with a complete, market-leading security platform built on a modern cloud stack. Our industry-leading suite of products radically simplifies data security through easy-to-use, cloud-based, Zero Trust solutions that are managed in a single dashboard, powered by hundreds of employees across the world. With offices in Santa Clara, Aylesbury, Paderborn, Bengaluru, Sydney, Tokyo and more, our employees are the heart and soul of our company. 

Skyhigh Security is more than a company; when you invest your career with us, we commit to investing in you. We embrace a hybrid work model, creating the flexibility and freedom you need from your work environment to reach your potential. From our employee recognition program to our ‘Blast Talks’ learning series and team celebrations (we love to have fun!), we strive to be an interactive and engaging place where you can be your authentic self.

Follow us on LinkedIn and Twitter @SkyhighSecurity.

Role Overview:

The Site Reliability Engineer at Skyhigh Security will be responsible for monitoring, maintaining, and troubleshooting operational issues in a high-availability production environment.

The SRE will also act as a bridge between the Operations, Engineering, and Product Management teams and will represent the customer's point of view to keep driving improvements to our products and uptime. SREs are responsible for managing and improving the operational aspects of systems, such as monitoring, alerting, incident response, and vendor interactions.

Only US citizens are eligible.

About the role:

  • Perform Incident Management and Change Management to maintain the continuous availability of all Cloud Infrastructure services.

  • Ensure all SRE and operating procedures are maintained and executed.

  • Maintain a 24x7 production environment with a high level of service availability, perform quality reviews, and manage operational issues.

  • Perform root cause analysis for major incidents and drive the process by involving required stakeholders.

  • Perform problem management by analyzing metrics, alarms, and dashboards to troubleshoot problem areas, and report issues to assist in performance tuning and fault finding.

  • Implement proactive monitoring, alerting, trend analysis, and self-healing solutions.

  • Explore new technologies, features, and tools to improve the platform, and automate operational tasks using Bash, Python, or another programming language.

  • Manage and maintain runbooks and standard operating procedures.

  • Manage, coordinate, and document all types of maintenance activities and outages.

  • Perform patching and upgrades for vulnerability management.

  • Work closely with other teams to turn new ideas into internal tools.

  • Understand the existing architecture and work with various Engineering teams to develop and execute strategies to provide a high-quality production service.

  • Work a flexible schedule in a 24x7 environment with rotational shifts.

About you:

  • Bachelor’s degree in computer science, electrical engineering, or a related area, with 7+ years of SRE experience in a large enterprise organization.

  • System admin experience on Linux environments.

  • Experience with end-to-end monitoring setup for infrastructure and applications.

  • Experience with Prometheus, Grafana, ELK, OpenSearch, CloudWatch, PagerDuty, and other monitoring tools.

  • Solid experience with Cloud Technologies such as AWS and OCI.

  • Good experience with tools for containerized workloads, such as Kubernetes.

  • Network knowledge (TCP/IP, UDP, DNS, load balancing) and prior network administration experience are required.

  • Experience with BGP, iBGP, NAT, TCP/IP, proxies, and cross-connects.

  • Experience with L2/L3 switching, knowledge of Juniper and Cisco routing devices.

  • Experience understanding and managing web servers (Apache, Tomcat, Nginx).

  • Ability to script or program in one or more high-level languages, such as Python or Go.

  • Experience with configuration management tools such as Salt, Puppet, or Ansible.

  • Experience with source control tools such as GitHub and SVN.

  • Experience with deployment tools such as Jenkins and Harness.

  • Experience with SQL and NoSQL databases such as Redis, Crate, and Elasticsearch.

  • Experience performing root cause analysis and writing root cause analysis documents.

  • Strong communication and analytical/problem-solving skills.

  • A systematic approach and the ability to drive problems to resolution.

  • Experience with or knowledge of GCP or Azure is good to have.

  • Experience in the security domain will be an added advantage.

  • Experience with open-source technologies such as Kafka, Hadoop, HBase, ZooKeeper, and Oozie will be an added advantage.

Company Benefits and Perks:

We believe that the best solutions are developed by teams who embrace each other's unique experiences, skills, and abilities. We work hard to create a dynamic workforce where we encourage everyone to bring their authentic selves to work every day. We offer a variety of social programs, flexible work hours and family-friendly benefits to all of our employees.

  • Retirement Plans

  • Medical, Dental and Vision Coverage

  • Paid Time Off

  • Paid Parental Leave

  • Support for Community Involvement

We're serious about our commitment to a workplace where everyone can thrive and contribute to our industry-leading products and customer support, which is why we prohibit discrimination and harassment based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation or any other legally protected status.

Top Skills

Ansible
AWS
Bash
CloudWatch
Crate
Elasticsearch
ELK
Git
Grafana
Hadoop
Harness
HBase
Jenkins
Kafka
Kubernetes
NoSQL
OCI
Oozie
OpenSearch
PagerDuty
Prometheus
Puppet
Python
Redis
Salt
SQL
SVN
ZooKeeper
