
Aerospike

Staff Site Reliability Engineer

Sorry, this job was removed at 06:18 p.m. (PST) on Sunday, Aug 24, 2025
Remote
Hiring Remotely in USA

Aerospike is the real-time database for mission-critical use cases and workloads, including machine learning, generative, and agentic AI. Aerospike powers millions of transactions per second with millisecond latency, at a fraction of the total cost of ownership compared to other databases.

Global leaders, including Adobe, Airtel, Barclays, Criteo, DBS Bank, Experian, Grab, HDFC Bank, PayPal, Sony Interactive Entertainment, The Trade Desk, and Wayfair, rely on Aerospike for customer 360, fraud detection, real-time bidding, profile stores, recommendation engines, and other use cases. 

At Aerospike, we dream big and deliver even bigger. Our mission is to unleash the power of the world’s real-time data with a database built for infinite scale, speed, and sustainability.

If you're ready to shape the future of data, join us.

Staff Site Reliability Engineer

As a Staff Site Reliability Engineer at Aerospike, you’ll be a technical leader within our global SRE organization, helping drive reliability, performance, and scalability across our hybrid and multi-cloud environments. You’ll bring deep operational experience and lead by example—mentoring others, designing resilient systems, and championing modern SRE practices across new and legacy platforms.

You’ll play a key role in shaping the direction of our infrastructure initiatives, from Kubernetes-based platforms like AKS and the Aerospike Kubernetes Operator to existing services in AWS and GCP. Your impact will span teams and systems as you solve complex problems, influence architecture, and foster a culture of ownership, resilience, and continuous improvement.

Key Responsibilities
  • Provide technical leadership across multiple systems and environments, proactively identifying risks, shaping architecture decisions, and improving reliability and performance at scale.
  • Lead key infrastructure efforts, including Kubernetes platform expansion (AKS and the Aerospike Kubernetes Operator, AKO) and the application of SRE principles to legacy systems and new cloud offerings.
  • Define, measure, and enforce reliability standards through SLIs/SLOs, observability tooling, and incident response frameworks.
  • Mentor and guide other SREs by leading design sessions, conducting technical deep dives, and reviewing code, configurations, and infrastructure decisions.
  • Partner with product, engineering, and cloud teams to align reliability goals with delivery objectives.
  • Lead root cause analyses and implement systemic fixes for issues spanning multiple platforms or services.
  • Drive automation-first approaches using IaC, CI/CD pipelines, and scripting to reduce toil and increase deployment confidence.
  • Influence cross-functional roadmaps, identifying areas for innovation, technical debt reduction, and long-term scalability.
  • Participate in the global on-call rotation, bringing senior-level calm and clarity during incidents and escalations.
Required Experience
  • 8+ years of experience in SRE, DevOps, or infrastructure engineering, including significant time operating production systems at scale.
  • Deep hands-on experience with at least one major public cloud (AWS, GCP, Azure), and working knowledge of the others; Azure experience is a plus.
  • Production experience with Kubernetes, including operating clusters, working with Helm and operators, and supporting microservices in real-world environments.
  • Strong proficiency with infrastructure-as-code tools such as Terraform and with CI/CD automation platforms.
  • Expertise in observability tools and practices (Datadog, Prometheus, Grafana, ELK, etc.) and in using them to define SLIs and SLOs; Datadog experience is a plus.
  • Programming and scripting ability in one or more languages (Python, Go, Bash, etc.).
  • Experience with large-scale incident response and post-incident review practices.
  • Proven ability to mentor other engineers and influence technical strategy across multiple teams.
  • Strong communication skills to articulate complex concepts to technical and non-technical stakeholders.
Preferred Skills and Qualifications
  • Hands-on experience managing and optimizing database deployments and services in production environments, ensuring high availability and performance.
  • Familiarity with Aerospike or other distributed databases is a plus.
  • Kubernetes or cloud certifications (CKA, CKS, AWS/GCP DevOps/Architect) are a plus but not required.
  • Track record of influencing architectural decisions across teams or domains.

Aerospike is an Equal Opportunity Employer. We are committed to providing an environment free from discrimination on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status, or any other basis covered by appropriate law.

Join us at Aerospike and be part of a dynamic team that is shaping the future of data management.

Salary range for California-based applicants: $145,000 - $185,000 (actual compensation will be determined based on experience, location, and other factors permitted by law).



HQ

Aerospike Mountain View, California, USA Office

2525 E Charleston Road, Mountain View, CA, United States, 94043

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attracts more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology thanks to a thriving art and music scene, excellent food and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

  • Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Google, Apple, Salesforce, Meta
  • Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
  • Funding Landscape: $50.5 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
  • Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
