Kody

Senior Site Reliability Engineer - Palo Alto, US

Posted 7 Days Ago
In-Office
Palo Alto, CA, USA
Senior level
Lead Site Reliability Engineer responsible for ensuring platform scalability and uptime on AWS. Own CI/CD and GitHub repository practices, run deployment pipelines, manage incidents and post-mortems, implement observability and logging, and coordinate technical alignment across US and international teams with bilingual communication.
The summary above was generated by AI

Senior Site Reliability Engineer (Payments Infrastructure)
Kody is seeking a Senior Site Reliability Engineer to ensure the reliability, availability, scalability, and operational excellence of our global payment platform. You will own production observability, incident response, service-level management, and cloud infrastructure reliability across mission-critical payment processing systems operating in Europe, Asia, and North America.
Responsibilities

  • Participate in a follow-the-sun production on-call rotation as a primary incident responder.
  • Diagnose, triage, mitigate, and coordinate resolution of production incidents across payment services, Kubernetes platforms, databases, messaging systems, and cloud infrastructure.
  • Define and maintain SLOs, SLIs, error budgets, alerting standards, and operational readiness processes.
  • Drive reliability improvements through automation, observability, capacity planning, performance optimization, and post-incident reviews.
  • Partner with engineering teams to improve resilience, security, and operational maturity in PCI-DSS-regulated environments.
  • Lead incident management during SEV1/SEV2 events, improve response effectiveness, and reduce MTTR.
  • Act as a key technical bridge between our US operations and international engineering hubs, using bilingual communication to keep complex technical work aligned across teams.

Requirements
  • 5+ years of experience in Site Reliability Engineering, Platform Engineering, DevOps, or Cloud Infrastructure roles supporting mission-critical production systems.
  • Strong hands-on experience with AWS, Kubernetes (EKS), Terraform, PostgreSQL, Redis, Kafka, Linux, networking, and modern observability platforms.
  • Deep understanding of distributed systems, cloud-native architectures, high availability, disaster recovery, capacity planning, and performance optimization.
  • Proven experience operating payment, banking, fintech, or other highly regulated systems with stringent security, compliance, and uptime requirements.
  • Strong knowledge of SRE principles, including SLOs, SLIs, error budgets, incident management, alert governance, and operational excellence.

Leadership & Operational Excellence

  • Demonstrates strong ownership and accountability, taking end-to-end responsibility for service reliability and customer impact.
  • Possesses a strong sense of urgency during production incidents while maintaining sound judgment and structured decision-making under pressure.
  • Applies a systematic and methodical approach to troubleshooting, root-cause analysis, and incident resolution in complex distributed environments.
  • Takes a data-driven approach, using metrics, telemetry, trends, and service-level indicators to prioritize reliability investments and operational improvements.
  • Continuously drives engineering excellence through iterative improvement, automation, standardization, and elimination of operational toil.
  • Has a proven ability to lead cross-functional incident response, coordinate stakeholders, and communicate clearly during high-severity production events.
  • Champions a culture of operational readiness, continuous learning, post-incident improvement, and blameless accountability.
  • Demonstrates strong mentoring and technical leadership skills, influencing engineering teams to build reliable, scalable, and resilient systems by design.

Benefits
  • Competitive packages aligned with California market standards
  • Lead a dynamic, innovative team at a rapidly growing company
  • Collaborative, inclusive environment where your contributions are recognized and valued

Similar Jobs

21 Days Ago
In-Office
Sunnyvale, CA, USA
170K-196K Annually
Senior level
Software • Cybersecurity
The Senior Site Reliability Engineer will ensure the reliability and performance of cloud systems, manage AWS/Azure infrastructure, optimize performance, lead incident responses, and implement security best practices.
Top Skills: AWS, Azure, Azure DevOps, Docker, GitLab CI/CD, Go, Jenkins, Kubernetes, PowerShell, Python
22 Days Ago
In-Office
San Francisco, CA, USA
210K-240K Annually
Senior level
Artificial Intelligence • Marketing Tech • Software • Big Data Analytics
The Senior Site Reliability Engineer will design and maintain scalable infrastructure, improve system reliability, manage CI/CD pipelines, and collaborate across teams for operational excellence.
Top Skills: Ansible, Argo CD, AWS, Bash, Datadog, Docker, ELK, GitHub Actions, Grafana, Kubernetes, Linux, OpenTelemetry, Prometheus, Python, Terraform
24 Minutes Ago
In-Office or Remote
San Francisco, CA, USA
113K-148K Annually
Senior level
Blockchain • Fintech • Payments • Financial Services • Cryptocurrency • Web3
Design, implement, and scale finance systems (primarily Oracle Cloud Fusion) to automate and streamline core finance processes, enable international expansion, drive AI-enabled automation, manage solution design and testing, and support finance teams for internal and SOX audits.
Top Skills: Accounting Hub, AI Tools, Apple macOS, Cash Management, FDI Reporting, Google Workspace (G Suite), Kyriba, Navan Travel and Expense, Oracle Cloud Fusion ERP, Payables, Receivables, Revenue Management, Slack, Subledger Accounting, Subscription Management, Workiva, Zip Procurement to Pay

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology thanks to a thriving art and music scene, excellent food and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

  • Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Google, Apple, Salesforce, Meta
  • Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
  • Funding Landscape: $50.5 billion in venture capital funding in 2024 (PitchBook)
  • Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
  • Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
