hatch I.T.

Site Reliability Engineer (SRE)

Reposted 16 Hours Ago

Remote

Hiring Remotely in USA

Mid level

The Site Reliability Engineer (SRE) will ensure system reliability and performance, automate operations, develop CI/CD pipelines, and manage cloud infrastructure.

The summary above was generated by AI

hatch I.T. is partnering with CardioOne to find a Site Reliability Engineer (SRE) to join their team. See details below:

About the Role:

CardioOne is seeking a highly skilled Site Reliability Engineer (SRE) to ensure the reliability, scalability, security, and performance of their production systems and services. The SRE will bridge the gap between software development and operations, implementing automation, monitoring, and best practices to enable rapid, reliable delivery of applications. You will report directly to the Senior Director of Engineering.

About the Company:

CardioOne partners with independent cardiologists to provide innovative solutions that improve patient outcomes and reduce costs. Their platform helps their physician partners thrive in today’s fee-for-service environment and prepare for success in value-based care. In February 2024, they partnered with WindRose Health Investors, as well as top physician services and payor executives, to expand their team and invest in their next phase of growth.

CardioOne offers a magnificent work environment, good working conditions, and competitive pay. They offer medical, dental, and vision coverage, plus a 401(k) plan with a company match, to benefits-eligible employees. They offer PTO (Personal Time Off) and sick time to full-time employees. They take pride in creating a culture of employee engagement that translates into an exemplary patient experience. Join them in their mission to positively impact US cardiology.

Responsibilities:

Ensure high availability, scalability, and performance of production systems.
Implement and maintain SLIs, SLOs, and SLAs for critical services.
Conduct capacity planning and performance tuning.
Automate infrastructure provisioning using IaC tools such as Terraform, Terragrunt, and Ansible.
Develop automation to minimize manual operations and improve deployment workflows.
Build CI/CD pipelines to support rapid and reliable deployments.
Design and maintain monitoring, logging, and alerting systems (Datadog).
Participate in on-call rotations and lead incident response efforts.
Perform root-cause analysis and develop postmortems to prevent recurring issues.
Manage cloud infrastructure (AWS, Azure) and container orchestration platforms (Kubernetes, ECS).
Optimize system architecture for reliability and fault tolerance.
Implement best practices for security, networking, and service resilience.
Work closely with development teams to design reliable microservices and distributed systems.
Advocate for SRE principles and drive operational excellence across engineering teams.
Mentor engineers on reliability practices, tooling, and automation strategies.

Qualifications:

Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
3–7 years of experience in SRE, DevOps, or Systems Engineering roles.
Strong proficiency with Linux systems and shell scripting.
Experience with cloud platforms (AWS, Azure).
Hands-on experience with Kubernetes/ECS and container technologies (Docker).
Proficiency in at least one programming language: Python or Java.
Experience with CI/CD pipelines and DevOps tooling.
Strong understanding of distributed systems, networking, and security fundamentals.
Strong analytical and problem-solving skills.
Excellent communication and cross-team collaboration.
Ability to thrive in fast-paced, high-stakes environments.
A mindset focused on continuous improvement and operational excellence.

Preferred Qualifications:

Experience with observability stacks (OpenTelemetry).
Knowledge of database management (PostgreSQL).
Experience with configuration management tools (Ansible, Chef, Puppet).
Familiarity with zero-downtime deployments and chaos engineering practices.

Top Skills

Ansible

AWS

Azure

Datadog

Docker

ECS

Java

Kubernetes

Python

Terraform

Terragrunt
