Fieldguide

Senior Site Reliability Engineer

Posted 4 Days Ago

In-Office or Remote

2 Locations

190K-206K Annually

Senior level

As a Senior Site Reliability Engineer, you will ensure the reliability and scalability of production systems, improve system performance, and enhance observability through design and automation.

The summary above was generated by AI

About Us

Fieldguide is establishing a new state of trust for global commerce and capital markets by automating and streamlining the work of assurance and audit practitioners, specifically in cybersecurity, privacy, and financial audit. Put simply, we build software for the people who enable trust between businesses.

We’re based in San Francisco, CA, but built as a remote-first company that enables you to do your best work from anywhere. We're backed by top investors including Growth Equity at Goldman Sachs Alternatives, Bessemer Venture Partners, 8VC, Floodgate, Y Combinator, DNX Ventures, Global Founders Capital, Justin Kan, Elad Gil, and more.

We value diversity, in backgrounds and in experiences. We need people from all backgrounds and walks of life to help build the future of audit and advisory. Fieldguide’s team is inclusive, driven, humble, and supportive. We are deliberate and self-reflective about the kind of team and culture we are building, seeking teammates who are not only strong in their own aptitudes but also care deeply about supporting each other's growth.

As an early-stage startup employee, you’ll have the opportunity to build out the future of business trust. We make audit practitioners’ lives easier by eliminating up to 50% of their work and giving them better work-life balance. If you share our values and enthusiasm for building a great culture and product, you will find a home at Fieldguide.

About the Role

As a Senior Site Reliability Engineer (SRE) at Fieldguide, you will be responsible for ensuring the reliability, scalability, and observability of our production systems. You will apply software engineering principles to infrastructure and operations, designing systems that are resilient, highly available, and capable of scaling with rapid growth.

You’ll work closely with product and platform engineering teams to define and implement reliability standards, improve system performance, and build robust observability practices. This role is central to maintaining a high level of trust in our systems by proactively identifying risks, reducing toil through automation, and driving operational excellence.

What You’ll Do

Design and operate highly scalable, fault-tolerant systems that support production workloads across a distributed cloud environment.
Define and implement Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets to guide reliability decisions.
Build and improve observability systems (metrics, logs, tracing) to provide deep visibility into system behavior and performance.
Lead efforts to improve system reliability and performance, including capacity planning, load testing, and performance tuning.
Automate operational processes to reduce manual toil and improve system consistency and resilience.
Partner with engineering teams to design systems with reliability and scalability built in from the start.
Participate in and improve incident response, on-call practices, and post-incident reviews, focusing on root cause analysis and systemic improvements.
Drive continuous improvement of system resilience, including disaster recovery and chaos testing.
Establish best practices for monitoring, alerting, and incident management to ensure rapid detection and resolution of issues.
Advocate for a reliability-focused engineering culture, including blameless postmortems and operational excellence.

Who You Are

5+ years of experience in site reliability engineering, infrastructure, or a related software engineering discipline.
Strong experience operating and scaling distributed systems in cloud environments, with AWS preferred.
Hands-on experience building and managing observability platforms (e.g., Datadog, Prometheus, Grafana, CloudWatch).
Experience defining SLOs/SLIs and leveraging them to inform and drive engineering priorities.
Proficiency with Infrastructure as Code tooling, particularly Terraform or equivalent.
Deep understanding of system performance, reliability patterns, and distributed system failure modes.
Experience supporting production systems through on-call rotations and incident response.
Proficiency in at least one programming or scripting language used for automation and tooling.
Strong communication and collaboration skills, with the ability to work effectively across engineering and product teams.

Bonus Points

Experience implementing distributed tracing systems, such as OpenTelemetry or similar frameworks.
Experience with capacity planning and performance benchmarking at scale.
Familiarity with database performance tuning and observability across high-traffic systems.
Exposure to regulated or compliance-heavy engineering environments (e.g., SOC 2, FedRAMP, or equivalent frameworks).
Experience applying chaos engineering practices to proactively test and strengthen system resilience.

More about Fieldguide

Fieldguide is a values-based company. Our values are:

Fearless - Inspire & break down seemingly impossible walls.
Fast - Launch fast with excellence, iterate to perfection.
Lovable - Deliver happiness & 11 star experiences.
Owners - Execute & run the business with ownership.
Win-win - Create mutual value & earn trust for life.
Inclusive - Scale the best ideas with inclusive teams.

Some of our benefits include

Competitive compensation packages with meaningful ownership
Flexible PTO
401k
Wellness benefits, including a bundle of free therapy sessions
Technology & Work from Home reimbursement
Flexible work schedules

San Francisco, California, United States
