Astera

Site Reliability Engineer

Posted 3 Days Ago
Hybrid
Emeryville, CA, USA
Entry level
The Site Reliability Engineer will manage digital infrastructure, ensuring access to compute resources, automating processes, and maintaining resource visibility for researchers.
About Astera:

Astera is a private foundation with a $2.5B endowment on a mission to steer science and technology toward an abundant future for all. Unlike traditional foundations, we operate like a high-velocity startup with unprecedented access to computational resources and complete freedom from funding pressures or profit motives. This allows us to focus on ambitious goals and attract incredibly creative scientists and engineers from leading academic institutions and from frontier AI labs.

Neuro-AI is our large-scale AI research program, pursuing a neuroscience-informed approach to engineering AGI. This is not yet another lab scaling LLMs in the hope of achieving general intelligence. We are integrating neuroscience, AI, and bioengineering to understand and digitally model the architecture of the human brain.

Position Summary:

We are looking for a Site Reliability Engineer to own the digital infrastructure that powers our research.

This includes compute resources that we rent from third parties, container registries, and dashboards. The main objective is to make sharing these resources easy and efficient, ensuring the infrastructure is reliable and accessible to the right people.

This role spans a broad spectrum of activities:

  • Compute Access: Ensure easy and efficient access to compute resources for our researchers.

  • Resource Visibility: Provide clear visibility into resource utilization and cluster health.

  • Auto-Scaling: Enable automatic scaling of compute resources based on demand.

  • Access Management: Ensure the right people have access to the right resources.

  • Reproducibility: Drive towards deterministic deployments and reproducible research environments.

  • Process Automation: Automate operational processes where doing so improves efficiency.

Current stack: Ansible, Kubernetes, Docker, Tailscale, Python, Grafana, Prometheus, and Talos Linux. We're not religious about any of it.

Qualifications:
  • Ownership: You are comfortable being the person accountable when the cluster is unhealthy or capacity is tight.

  • Systems Intuition: You understand how schedulers, containers, networking, storage, and hardware interact. You can reason about failure modes and design systems that degrade predictably.

  • Operational Rigor: You value observability, reproducibility, and clear operational boundaries. You leave systems in a state that other engineers can understand, operate, and debug without you.

  • Pragmatism: You can support experimental research workloads without forcing everything into a rigid "production" mold. You know when to stabilize and when to allow controlled chaos to speed up discovery.

Location & Visa:
  • This role is in-person in Emeryville, CA.

  • Visa sponsorship may be available for qualified candidates.

Top Skills

Ansible
Docker
Grafana
Kubernetes
Prometheus
Python
Tailscale
Talos Linux
