Grafana Labs

Staff Software Engineer - Grafana Cloud k6 | USA | Remote

Reposted Yesterday
Remote
Hiring Remotely in United States
175K-210K Annually
Senior level
Lead and scale reliability and DevOps/SRE practices for Grafana Cloud k6: define SLIs/SLOs, incident response, observability, runbooks, and guide architecture and cross-team engineering to improve availability and operational ownership.
The summary above was generated by AI

Grafana Labs is a remote-first, open-source powerhouse. There are more than 20M users of Grafana, the open source visualization tool, around the globe, monitoring everything from beehives to climate change in the Alps. The instantly recognizable dashboards have been spotted everywhere from a NASA launch and Minecraft HQ to Wimbledon and the Tour de France. Grafana Labs also helps more than 3,000 companies -- including Bloomberg, JPMorgan Chase, and eBay -- manage their observability strategies with the Grafana LGTM Stack, which can be run fully managed with Grafana Cloud or self-managed with the Grafana Enterprise Stack, both featuring scalable metrics (Grafana Mimir), logs (Grafana Loki), and traces (Grafana Tempo).

We’re scaling fast and staying true to what makes us different: an open-source legacy, a global collaborative culture, and a passion for meaningful work. Our team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything we do.

You may not meet every requirement, and that’s okay. If this role excites you, we’d love you to raise your hand for what could be a truly career-defining opportunity.

This is a remote opportunity, and we would be interested in applicants in United States time zones.

The Opportunity

We are the team behind Grafana k6, Grafana Cloud k6, and Grafana Cloud Synthetics, used by teams globally to ensure resilient, high-performing systems.  This opportunity is with the Grafana Cloud k6 squad, who build and operate our performance testing product. Grafana Cloud k6 is built around the OSS k6 and targeted at users looking to run performance tests at scale. Our enterprise and SaaS offerings allow customers to load test their systems by running distributed tests from 15+ regions worldwide, using hundreds of thousands of virtual users sending millions of requests per second. We ingest huge volumes of data generated by k6, which can be used to view, correlate and analyze metrics from each test.

k6 is a product used by other engineers, and as such, we are looking for people enthusiastic about building high-quality tools they would want to use themselves. Due to our small teams and fast development pace, you will have a substantial and immediate impact on how the end product is architected, developed, and how the engineering team operates.

Your role will focus on establishing and scaling a cross-team culture of engineering excellence by setting standards and guiding adoption of strong DevOps/SRE practices that improve reliability, availability, and operational ownership. As this foundation matures, the role is expected to expand into broader application and product development leadership, contributing architectural and technical depth beyond operational excellence.

What will you be doing? 
  • Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
  • Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
  • Establish reliability frameworks such as SLIs/SLOs and error budgets, and use them to guide prioritization and engineering trade-offs.
  • Provide visibility into system health through clear operational metrics and reliability reporting.
  • Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.
  • Influence product and system direction through design reviews, architectural discussions, and cross-team collaboration.
  • Share knowledge through clear, high-quality documentation and technical communication—internally and, where appropriate, externally—to help teams build and operate systems more effectively.
  • As the reliability foundation matures, grow into broader application and product development leadership, contributing architectural and technical depth beyond operations.

We invest heavily in developer productivity. You can use modern AI coding assistants as part of your daily workflow (your choice of tools, within security guidelines), backed by a company-funded usage budget so you can iterate quickly without unnecessary friction.
We encourage pragmatic AI-assisted development: faster prototyping, test generation, refactors, documentation, and incident follow-ups—always paired with strong code review and quality standards.
You’ll also have access to frontier models (e.g., GPT-Codex 5/3, Claude Opus 4.6, Gemini 3 Pro).

Requirements:
  • Strong experience with DevOps/SRE practices, including operating and evolving production systems at scale
  • Strong programming background in a modern language (Python and Go are our primary languages, but prior experience with them is not required)
  • Experience designing, building, and operating large-scale distributed systems
  • Strong understanding of reliability engineering concepts (e.g. incident management, observability, and failure modes)
  • Experience with test automation, including performance and functional testing
  • Ability to influence engineering practices through clear technical communication, reviews, and collaboration
  • Strong interpersonal skills and ability to work effectively across teams
  • Familiarity with modern software engineering processes and delivery practices
  • Self-driven and comfortable operating with a high degree of autonomy and ambiguity

Bonus Points For:

  • Experience with containerized and cloud-native systems (Docker, Kubernetes, AWS)
  • Familiarity with observability tooling and platforms (e.g. the Grafana stack)
  • Experience working with Python, Go, JavaScript and/or Jsonnet 
  • Experience building or operating event-driven or asynchronous systems
  • Experience defining or applying SLIs/SLOs, error budgets, or reliability metrics
  • Interest in, or experience with, building testing frameworks or developer tooling
 

Compensation & Rewards:

In the US, the base compensation range for this role is $174,986 - $209,983. Actual compensation may vary based on level, experience, and skillset as assessed throughout the interview process. All of our roles include Restricted Stock Units (RSUs), giving every team member ownership in Grafana Labs' success. We believe in shared outcomes: RSUs help us stay aligned and invested as we scale globally.

 

*Compensation ranges are country specific. If you are applying for this role from a different location than listed above, your recruiter will discuss your specific market’s defined pay range & benefits at the beginning of the process.

Why You’ll Thrive at Grafana Labs:

  • 100% Remote, Global Culture - As a remote-only company, we bring together talent from around the world, united by a culture of collaboration and shared purpose.
  • Scaling Organization – Tackle meaningful work in a high-growth, ever-evolving environment.
  • Transparent Communication – Expect open decision-making and regular company-wide updates.
  • Innovation-Driven – Autonomy and support to ship great work and try new things.
  • Open Source Roots – Built on community-driven values that shape how we work.
  • Empowered Teams – High trust, low ego culture that values outcomes over optics.
  • Career Growth Pathways – Defined opportunities to grow and develop your career.
  • Approachable Leadership – Transparent execs who are involved, visible, and human.
  • Passionate People – Join a team of smart, supportive folks who care deeply about what they do.
  • In-Person Onboarding - We want you to thrive from day 1, so you will meet your fellow new ‘Grafanistas’ in person to learn all about what we do and how we do it.
  • Balance is Key - We operate a global annual leave policy of 30 days per annum. 3 days of your annual leave entitlement are reserved for Grafana Shutdown Days to allow the team to really disconnect. *We will comply with local legislation where applicable.

Equal Opportunity Employer: We will recruit, train, compensate and promote regardless of race, religion, color, national origin, gender, disability, age, veteran status, and all the other fascinating characteristics that make us different and unique. We believe that equality and diversity build a strong organization and we’re working hard to make sure that’s the foundation of our organization as we grow.

Grafana Labs may utilize AI tools in its recruitment process to assist in matching information provided in CVs to job postings. The recruitment team will continue to review inbound CVs manually to identify alignment with current openings.

For information about how your personal data is used once you’ve applied to a job, check out our privacy policy. 
 

Top Skills

AWS
Docker
Go
Grafana
Grafana Loki
Grafana Mimir
Grafana Tempo
JavaScript
Jsonnet
k6
Kubernetes
Python
