
Articul8 AI

Senior Site Reliability Engineer (SRE) - (Dublin, CA)

Reposted 10 Days Ago
In-Office
Dublin, CA
Senior level
The Senior Site Reliability Engineer will ensure the reliability and scalability of our Generative AI SaaS platform, implement automation, and support incident response efforts.
The summary above was generated by AI
About Us

Articul8 AI is at the forefront of Generative AI innovation, delivering cutting-edge SaaS products that transform how businesses operate. Our platform empowers organizations to leverage the power of artificial intelligence in a reliable, scalable, and secure environment.

Position Overview

We are seeking an experienced Site Reliability Engineer (SRE) to join our team and help ensure the reliability, performance, and scalability of our GenAI SaaS platform. As an SRE, you will bridge the gap between development and operations, implementing automation and best practices to maintain our service reliability objectives while supporting rapid innovation.

Key Responsibilities
  • Architect and maintain scalable, highly available infrastructure for our GenAI platform.

  • Design and implement robust monitoring, alerting, and observability solutions to proactively ensure system health and performance.

  • Automate deployment, scaling, and management of our cloud-native infrastructure, reducing toil and improving efficiency.

  • Define, measure, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to deliver outstanding service quality.

  • Participate in on-call rotations and provide rapid response to production incidents, minimizing downtime and user impact.

  • Collaborate closely with development teams to build reliable, scalable, and efficient systems for complex AI workloads.

  • Lead incident response efforts, conduct thorough post-mortems, and champion continuous improvement initiatives.

  • Optimize infrastructure for performance, scalability, and cost-effectiveness, especially for high-demand AI workloads.

  • Implement and enforce security best practices across all systems and environments.

  • Create and maintain comprehensive documentation, including runbooks and knowledge base articles, to foster a culture of shared knowledge.

Qualifications

Required
  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience

  • 8+ years of experience in DevOps, SRE, or similar roles

  • Strong experience with cloud platforms (AWS, GCP, or Azure)

  • Proficiency in at least one programming/scripting language (Python, Go, Bash, etc.)

  • Hands-on experience with infrastructure as code tools (Terraform, CloudFormation, etc.)

  • Solid background in containerization technologies (Docker, Kubernetes)

  • Proven experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, etc.)

  • Strong understanding of CI/CD pipelines and automation

  • Exceptional troubleshooting and problem-solving skills, including the ability to diagnose issues in complex systems

Preferred
  • Experience supporting AI/ML systems in production

  • Knowledge of GPU infrastructure management and optimization

  • Familiarity with distributed systems and high-performance computing

  • Experience with database systems (SQL and NoSQL)

  • Certifications in cloud platforms (AWS, GCP, Azure)

  • Experience with chaos engineering and resilience testing

  • Knowledge of security best practices and compliance requirements

Ready to shape the future of resilient software systems? Apply now and help drive the reliability of tomorrow’s AI at Articul8 AI!

Top Skills

AWS
Azure
Bash
CloudFormation
Docker
ELK Stack
GCP
Go
Grafana
Kubernetes
Prometheus
Python
Terraform

Articul8 AI Dublin, California, USA Office

4120 Dublin Blvd, Suite 250, Dublin, California, United States, 94568

Articul8 AI Santa Clara, California, USA Office

3979 Freedom Circle, Mission Towers, Suite 340, Santa Clara, CA, United States, 95054

Similar Jobs

13 Days Ago
In-Office
Costa Mesa, CA, USA
166K-220K Annually
Senior level
Aerospace • Artificial Intelligence • Hardware • Robotics • Security • Software • Defense
The Senior Site Reliability Engineer will build, deploy, and maintain critical infrastructure for Business Systems, enhancing CI/CD processes and promoting system reliability.
Top Skills: Ansible, AWS, Azure, Bash, CloudFormation, Docker, Go, Google Cloud Platform, Helm, Kubernetes, Puppet, Python, Rust, Terraform
5 Days Ago
Remote or Hybrid
United States
140K-170K Annually
Senior level
AdTech • Artificial Intelligence • Marketing Tech • Software • Analytics
The Senior Site Reliability Engineer will enhance system reliability, develop production-grade code, implement observability tools, conduct root cause analyses, and collaborate on system design for scalability.
Top Skills: ArgoCD, CI/CD, Docker, GitOps, Go, Grafana, Honeycomb, Jenkins, Kubernetes, OpenTelemetry, Prometheus, Python, Terraform
19 Days Ago
Remote or Hybrid
2 Locations
160K-180K Annually
Expert/Leader
Artificial Intelligence • Other • Security • Software • Analytics • Big Data Analytics
The Lead Site Reliability Engineer will oversee the reliability and scalability of the infrastructure, lead a team in operational execution, ensure best practices in SRE, and mentor senior engineers.
Top Skills: CI/CD, Docker, GitOps, Go, Kubernetes, Linux, Python, Terraform

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology thanks to a thriving art and music scene, excellent food and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

  • Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Google, Apple, Salesforce, Meta
  • Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
  • Funding Landscape: $50.5 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
  • Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
