Optura

Sr. Site Reliability Engineer

Posted 3 Days Ago
In-Office
San Francisco, CA, USA
Senior level
Design, build, and operate Optura's multi-cloud, HIPAA-aware platform: run Kubernetes across cloud and customer on-prem/air-gapped environments, create unified deployment tooling (Helm/operators/GitOps), own SLOs/capacity/incident response, drive reliability, implement identity/networking/security controls, and build IaC/GitOps patterns in partnership with product and security teams.
The summary above was generated by AI

Optura is healthcare’s AI orchestration platform. We help healthcare organizations transform disconnected AI pilots into a unified, enterprise-scale program that delivers measurable value. Our platform enables teams to design, execute, and monitor intelligent agents that drive automation, insights, and action, while providing the control and observability needed to scale safely. Built for real-world complexity, Optura supports multiple model providers, integrates seamlessly with existing infrastructure, and offers both SaaS and self-hosted options. Our mission: revolutionize how healthcare deploys and operationalizes AI in production.

We’re looking for a Senior Platform Engineer to design, build, and operate the core services that power Optura’s AI Platform. In this role, you will own systems end-to-end, from model and agent orchestration to routing, reliability, and observability. You will partner closely with product and application teams to deliver secure, scalable, HIPAA-aware services. You will play a critical role in shaping the foundation that enables customers to safely deploy AI in real-world healthcare environments.

Location:

Open to remote or San Francisco Bay Area, Nashville Metro Area, or Raleigh, NC Area

What you'll do
  • Architect and own Optura's multi-cloud infrastructure across AWS, GCP, and Azure — provisioning, networking, identity, observability, and cost governance

  • Design and operate Kubernetes platforms that run consistently across our cloud environments and inside customer environments, including BYOC and on-prem (potentially air-gapped) deployments

  • Build a unified deployment framework so Optura ships the same product to SaaS, BYOC, and on-prem customers without bespoke per-customer engineering — Helm charts, operators, install/upgrade tooling, and release pipelines

  • Own SLOs, capacity planning, incident response, and postmortems across the entire infrastructure stack; set the bar for operational readiness

  • Drive reliability and performance through error budgets, chaos testing, latency optimization, and disciplined runbook quality

  • Harden the platform for regulated deployments — HIPAA controls, tenant isolation, audit logging, RBAC, KMS, and secrets rotation

  • Lead the build-out of IaC, GitOps, and progressive delivery (Terraform, Argo CD, Crossplane) as the team's standard

  • Partner with engineering and security to set opinionated guardrails: golden paths, base images, policy-as-code, and CI/CD that the rest of the org adopts by default

What we're looking for

  • 8+ years operating production infrastructure, including 3+ years in a senior SRE, platform, or staff infrastructure role

  • Deep Kubernetes expertise across managed (EKS, GKE, AKS) and self-managed/on-prem distributions — not just running it, but operating it at scale across heterogeneous environments

  • Multi-cloud fluency across AWS, GCP, and Azure, with informed opinions on when to abstract vs. embrace cloud-native primitives

  • Expertise with Terraform (or Pulumi/Crossplane) and GitOps tooling

  • Experience shipping infrastructure that runs in customer environments — packaging, install/upgrade UX, air-gapped artifacts, support escalation paths

  • Strong networking, identity, and security fundamentals: VPC design, service mesh, mTLS, OIDC, KMS, secrets management

  • Production observability ownership (Prometheus, Grafana, OpenTelemetry, distributed tracing) and on-call leadership

  • A track record of writing real code — Go, Python, or similar — to extend the platform, not just configure it

What we would like to see
  • Experience shipping HIPAA-regulated workloads, including BYOC or air-gapped customer deployments

  • Background with enterprise software delivery tooling (Replicated, Cluster API, Talos, Rancher, OpenShift)

  • Experience building internal developer platforms (Backstage, golden paths) that measurably reduced lead time for an engineering org

  • FinOps experience — driving meaningful cloud spend reductions through architecture, not just rightsizing

  • AI/ML infrastructure exposure: GPU scheduling, model-serving stacks, inference autoscaling

  • OSS contributions to infrastructure projects, or strong opinions formed running them at scale

Benefits at Optura:

We offer a competitive compensation and benefits package, including:

  • Health, dental, and vision insurance

  • Generous paid time off

  • Opportunities for professional growth and development

Equal Employment Opportunity:

At Optura.AI, we’re not just building a product; we are intentionally building the team, culture, and equity we want to see in the tech world. That starts with recognizing that innovation thrives when diverse perspectives come together. Optura is an Equal Employment Opportunity Employer, period. We actively welcome and celebrate every candidate regardless of their race, color, religion, age, marital status, sex (including pregnancy, childbirth, or related medical condition), sexual orientation, gender identity or gender expression, national origin, veteran or military status, disability (physical or mental), genetic information, or any other protected characteristic.

More than compliance, we are deeply committed to diversity and inclusion because it’s a non-negotiable part of our foundation. We believe a truly diverse and inclusive workplace is the engine for long-term professional growth and competitive business success, directly fueling our mission to innovate. As part of the Optura team, your voice will be heard, your contributions will directly matter to our trajectory, and your unique background and experiences won't just be celebrated—they will be a vital part of our success. Let's build something exceptional, together.
