Fieldguide

Staff Site Reliability Engineer

Posted 4 Days Ago

In-Office or Remote

2 Locations

210K-247K Annually

Expert/Leader

As a Staff Site Reliability Engineer, you'll lead reliability strategies, design scalable systems, improve observability, and mentor engineers to enhance system performance and resilience.

The summary above was generated by AI

About Us

Fieldguide is establishing a new state of trust for global commerce and capital markets through automating and streamlining the work of assurance and audit practitioners specifically within cybersecurity, privacy, and financial audit. Put simply, we build software for the people who enable trust between businesses.

We’re based in San Francisco, CA, but built as a remote-first company that enables you to do your best work from anywhere. We're backed by top investors including Growth Equity at Goldman Sachs Alternatives, Bessemer Venture Partners, 8VC, Floodgate, Y Combinator, DNX Ventures, Global Founders Capital, Justin Kan, Elad Gil, and more.

We value diversity, in backgrounds and in experiences. We need people from all backgrounds and walks of life to help build the future of audit and advisory. Fieldguide’s team is inclusive, driven, humble and supportive. We are deliberate and self-reflective about the kind of team and culture that we are building, seeking teammates who are not only strong in their own aptitudes but also care deeply about supporting each other's growth.

As an early-stage startup employee, you’ll have the opportunity to build out the future of business trust. We make audit practitioners’ lives easier by eliminating up to 50% of their work and giving them better work-life balance. If you share our values and enthusiasm for building a great culture and product, you will find a home at Fieldguide.

About the Role

As a Staff Site Reliability Engineer (SRE) at Fieldguide, you will play a critical leadership role in defining and driving the reliability, scalability, and observability strategy across our platform. You will operate as a technical leader and force multiplier, influencing system design, reliability standards, and engineering practices across multiple teams.

This role goes beyond operating our internal systems. You will shape how reliability is engineered into our products from the ground up. You’ll lead cross-functional initiatives, establish best practices, and mentor engineers while ensuring our systems remain resilient, performant, and scalable as the company grows.

What You’ll Do

Lead the design and evolution of highly scalable, fault-tolerant distributed systems across our cloud infrastructure.
Define and drive adoption of SLOs, SLIs, and error budgets across engineering teams.
Architect and continuously improve observability platforms (metrics, logging, tracing).
Own reliability strategy and roadmap, proactively identifying risks and driving long-term improvements.
Lead cross-team initiatives to improve system performance, scalability, and resilience.
Establish and enforce best practices for incident response, on-call, and operational excellence.
Drive root cause analysis and systemic improvements through blameless postmortems.
Champion automation and reduction of operational toil.
Guide capacity planning, load testing, and performance optimization efforts.
Design and validate disaster recovery, failover strategies, and resilience testing.
Mentor and coach engineers to elevate reliability engineering maturity.
Partner with Staff engineers across the organization to drive meaningful change.
Partner with leadership to align business goals with reliability investments.

Who You Are

10+ years of experience in software engineering, with a focus on distributed systems and production infrastructure.
Extensive experience operating and scaling distributed systems in cloud environments, with a strong preference for AWS.
Deep expertise in system reliability, scalability, and performance engineering at scale.
Demonstrated experience implementing SLO-driven engineering practices and reliability frameworks.
Strong background building and owning observability ecosystems (e.g., Datadog, Prometheus, Grafana).
Proficiency with Infrastructure as Code tooling, particularly Terraform or equivalent.
Proven experience leading incident management, post-mortems, and production operations.
Strong software engineering fundamentals with the ability to contribute to and review complex codebases.
Track record of technical leadership and cross-functional influence across engineering and product teams.
Ability to balance tactical short-term needs with strategic long-term architectural improvements.
Excellent written and verbal communication skills, with the ability to translate complex technical concepts for diverse audiences.

Bonus Points

Experience designing or operating multi-region and globally distributed systems.
Deep expertise in distributed tracing and performance analysis across complex service architectures.
Hands-on experience with database scalability and performance tuning at scale.
Familiarity with compliance-driven engineering environments (e.g., SOC 2, FedRAMP, or similar frameworks).
Experience applying chaos engineering practices to validate and improve system resilience.
Experience building or scaling an SRE function within a high-growth organization.

More about Fieldguide

Fieldguide is a values-based company. Our values are:

Fearless - Inspire & break down seemingly impossible walls.
Fast - Launch fast with excellence, iterate to perfection.
Lovable - Deliver happiness & 11-star experiences.
Owners - Execute & run the business with ownership.
Win-win - Create mutual value & earn trust for life.
Inclusive - Scale the best ideas with inclusive teams.

Some of our benefits include

Competitive compensation packages with meaningful ownership
Flexible PTO
401k
Wellness benefits, including a bundle of free therapy sessions
Technology & Work from Home reimbursement
Flexible work schedules

San Francisco, California, United States
