Edison Scientific

Sr. Infrastructure Engineer

Posted 2 Hours Ago
In-Office
San Francisco, CA
200K-350K Annually
Senior level
Design, build, and operate large-scale Kubernetes clusters, custom resource definitions (CRDs), and operators to orchestrate thousands of persistent AI agent workloads. Drive cluster scaling, autoscaling, scheduling, storage, networking, observability, and infrastructure-as-code (IaC). Troubleshoot distributed infrastructure, set best practices, and partner with backend, ML, and research teams to deliver a production-grade platform for long-running scientific workloads.
The summary above was generated by AI
About

Edison Scientific builds and commercializes AI agents for science. It shares FutureHouse’s mission to build an AI Scientist: scaling autonomous research, productizing it, and applying it to critical challenges such as drug development.

Role

As a Senior Infrastructure Engineer, you'll play a key role in designing, scaling, and operating the core platform infrastructure that powers autonomous scientific discovery. Your primary focus will be orchestrating our agents at scale: building and managing clusters that run thousands of persistent, stateful workloads, developing custom resource definitions (CRDs) and operators, and ensuring the reliability and efficiency of our compute layer.

Our mission is to build an AI scientist, and you'll own the infrastructure foundation it runs on. AI agents performing long-running scientific research demand far more resilient scheduling, lifecycle management, and resource orchestration than typical cloud-native workloads. This role will influence platform architecture, establish infrastructure best practices, and partner closely with backend engineers, ML engineers, and researchers to deliver a production-grade environment that lets science move faster.

At Edison Scientific, engineering at the senior level is about technical ownership and leverage: understanding how complex systems interact, making sound architectural tradeoffs, and building foundations that allow teams and science to move faster.

Responsibilities
  • Architect, implement, and operate Kubernetes clusters that support thousands of concurrent, persistent resources (agents, jobs, services) with high availability and efficient resource utilization.
  • Design and develop custom resource definitions (CRDs) and Kubernetes operators to model and manage domain-specific workloads such as AI agent lifecycles, research pipelines, and long-running compute tasks.
  • Drive the strategy for cluster scaling, node pool management, autoscaling policies, and resource quota frameworks to handle rapid workload growth.
  • Build and maintain infrastructure-as-code (Terraform, Pulumi, or similar) for reproducible, version-controlled environment management.
  • Design and implement robust scheduling, placement, and affinity strategies to optimize cost, performance, and fault tolerance for heterogeneous workloads (CPU, GPU, memory-intensive).
  • Establish and uphold best practices around observability, monitoring, alerting, and incident response for infrastructure systems (Prometheus, Grafana, Datadog, or similar).
  • Own storage and networking strategy within Kubernetes, including persistent volume management, CSI drivers, service mesh, network policies, and ingress architecture.
  • Troubleshoot complex, cross-system infrastructure issues and guide others through effective debugging and remediation in distributed environments.
  • Collaborate closely with backend, ML, and research teams to understand workload requirements and translate them into reliable infrastructure patterns.
Qualifications
  • 5+ years of professional infrastructure or platform engineering experience, with deep hands-on Kubernetes expertise in production environments.
  • Experience designing and implementing custom resource definitions (CRDs) and Kubernetes operators (using frameworks such as Kubebuilder, Operator SDK, or controller-runtime).
  • Track record of operating and scaling Kubernetes clusters supporting thousands of persistent or long-lived resources (stateful workloads, persistent pods, long-running jobs).
  • Deep understanding of Kubernetes internals (API server, etcd, scheduler, controller manager, kubelet) and how they behave at scale.
  • Expertise with cloud infrastructure (AWS EKS, GCP GKE, or Azure AKS) and associated networking, storage, and IAM primitives.
  • Proficiency in at least one systems or backend language for operator development and infrastructure tooling.
  • Hands-on experience with infrastructure-as-code tools (Terraform, Pulumi, or Crossplane) and GitOps workflows.
  • Strong working knowledge of container networking (CNI plugins, service mesh, network policies), storage (CSI, persistent volumes, StatefulSets), and security (RBAC, Pod Security Standards, secrets management).
  • Ability to operate autonomously, make sound technical judgments, and drive projects from concept through production.

Bonus points for:

  • Experience with data-intensive platforms, scientific computing, or ML/AI infrastructure.
  • Prior experience in startups or small teams, with significant architectural ownership in ambiguous environments.
  • Experience scaling systems, teams, or platforms through periods of rapid growth.

Location + Compensation

  • Collaboration is at the heart of discovery. We work on-site to stay close to the science, move faster as a team, and share the kind of energy that only happens when smart, curious people build together, in a space that we love to be in!
    • Location: San Francisco (Dogpatch)
  • At Edison Scientific, we know that titles can cover a range of experience levels. Actual base pay will depend on factors such as skills, experience, and scope of responsibility. Compensation ranges may evolve as we continue to grow. In addition to base pay, team members are eligible for equity, benefits, and other perks.
    • Compensation: $200,000-$350,000+ and equity

Top Skills

Kubernetes, CRDs, Operators, Kubebuilder, Operator SDK, controller-runtime, Terraform, Pulumi, Crossplane, GitOps, Prometheus, Grafana, Datadog, CSI Drivers, Service Mesh, CNI Plugins, StatefulSets, PersistentVolumes, RBAC, Pod Security Standards, Secrets Management, etcd, API Server, Scheduler, Controller Manager, kubelet, AWS EKS, GCP GKE, Azure AKS
HQ

Edison Scientific San Francisco, California, USA Office

San Francisco, California, United States, 94107

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms, and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology, thanks to a thriving art and music scene, excellent food, and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

  • Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Google, Apple, Salesforce, Meta
  • Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
  • Funding Landscape: $50.5 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
  • Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
