Top Infrastructure Engineer Jobs in San Francisco, CA

Reposted 2 Days Ago
In-Office
San Francisco Bay Area, CA
405K-485K Annually
Senior level
Artificial Intelligence • Natural Language Processing • Generative AI
As a Staff Infrastructure Engineer, you'll define and drive strategies for cloud-based compute clusters, ensuring secure, reliable, and scalable infrastructure. Responsibilities include lifecycle management, collaborating with cross-functional teams, and mentoring engineers.
Top Skills: AWS, Azure, GCP, Go, Kubernetes, Python, Rust, Terraform
Reposted 2 Days Ago
Hybrid
San Francisco Bay Area, CA
295K-380K Annually
Entry level
Artificial Intelligence • Machine Learning • Generative AI
The role focuses on building and maintaining infrastructure for ML training. Responsibilities include API design, improving performance, and debugging across systems.
Top Skills: Distributed Systems, GPUs, Networking, Python, PyTorch, Storage
Reposted 2 Days Ago
In-Office
San Francisco Bay Area, CA
Internship
Software
The intern will join the Design Verification Infrastructure team to develop and maintain the Verification Platform, enhancing design verification technologies through collaboration and software development in Scala and Python.
Top Skills: Chisel, CIRCT, EDA Tools, Python, Scala
Reposted 8 Days Ago
Hybrid
San Francisco Bay Area, CA
260K-330K Annually
Senior level
Artificial Intelligence • Productivity • Software
As a core engineer on the Web Infrastructure team, you will enhance Notion's web client performance and development speed by improving load times, interaction latency, and providing tooling for product engineers.
Top Skills: React, Webpack
Reposted 3 Days Ago
In-Office
San Francisco Bay Area, CA
250K-290K Annually
Senior level
Artificial Intelligence • Fintech • Machine Learning • Natural Language Processing • Payments • Software • Financial Services
The Lead Voice Infrastructure Engineer will design and operate telephony services, improve core workflows, and enhance reliability within AI-driven communication systems.
Top Skills: C, C++, Go, PSTN, Python, RTP, SIP, WebRTC
Reposted 3 Days Ago
In-Office
San Francisco Bay Area, CA
200K-350K Annually
Senior level
Artificial Intelligence
The SRE/Infrastructure Engineer will manage Terraform and Kubernetes across cloud platforms, ensuring scalable infrastructure. Responsibilities include multi-cloud deployments, observability, and creating reusable components.
Top Skills: AWS, Azure, Cloudflare, GCP, Kubernetes, Terraform
9 Days Ago
In-Office
San Francisco Bay Area, CA
188K-275K Annually
Senior level
Cloud • Information Technology • Machine Learning
Lead end-to-end technical delivery of large-scale bare-metal GPU clusters for strategic customers: facility/rack design, GPU cluster bring-up, InfiniBand/RoCE fabric validation, HPC benchmarking and remediation, operational models for BMaaS, and cross-team product feedback. Act as primary technical customer contact, run proofs-of-concept, collaborate with engineering teams, and support security-sensitive, production-ready supercomputers.
Top Skills: Ansible, Bare Metal as a Service (BMaaS), Bash, BIOS, BMC, Firmware, GB200, GPU Clusters, High-Speed Fabric, HPC, ib_write_bw, InfiniBand, Kubernetes, Linux, NCCL, NVIDIA HGX, NVLink, PXE Boot, Python, RoCE, Slurm, TCP/IP
Reposted 22 Days Ago
Remote
San Francisco Bay Area, CA
145K-200K Annually
Senior level
Fintech • Information Technology • Software
The Senior Infrastructure Engineer will improve system reliability and efficiency, develop standards and tooling, and collaborate with engineering teams to optimize workflows and cloud infrastructure.
Top Skills: Aurora PostgreSQL, AWS, CI/CD, Datadog, Docker, Kubernetes, Linux, OpenSearch, Prometheus, Python, Sumo Logic, Terraform
4 Days Ago
In-Office
San Francisco Bay Area, CA
100K-150K Annually
Senior level
Artificial Intelligence • Information Technology • Software • Consulting
Design, build, and operate scalable GPU/accelerator infrastructure for large-scale training and inference. Implement scheduling, storage, networking (RDMA/InfiniBand/NCCL), observability, fault tolerance, security, and developer tooling. Partner with ML teams for capacity planning, cost optimization, automation, and operational runbooks.
Top Skills: C++, CI/CD, DeepSpeed, FSDP, Go, GPU, InfiniBand, JAX, Kubernetes, Linux, Megatron-LM, NCCL, Python, PyTorch, Ray, Ray Train, RDMA, Slurm
4 Days Ago
In-Office
San Francisco Bay Area, CA
100K-150K Annually
Senior level
Artificial Intelligence • Information Technology • Software • Consulting
Design, build, and operate petabyte-scale data pipelines and storage for AI training and evaluation. Implement ingestion, cleaning, versioning, lineage, high-throughput loaders, labeling/active-learning workflows, privacy controls, observability, and cost/performance optimizations while collaborating with ML researchers.
Top Skills: Apache Beam, Spark, CI/CD, GPUs, Java, Kotlin, Python, Ray, Scala
4 Days Ago
In-Office
San Francisco Bay Area, CA
350K-475K Annually
Mid level
Artificial Intelligence • Information Technology
Build, operate, and maintain research infrastructure (evaluation frameworks, RL training systems, experiment tracking, visualization). Develop scalable distributed pipelines, ensure reproducibility and observability, and partner with researchers and infrastructure teams to accelerate ML research and tooling adoption.
Top Skills: JAX, Python, PyTorch, Ray, Rust, Spark
Reposted 9 Days Ago
Easy Apply
Remote or Hybrid
San Francisco Bay Area, CA
127K-249K Annually
Senior level
Big Data • Cloud • Software • Database
The Security Software Engineer will design and implement security controls for MongoDB Atlas, collaborating across engineering teams and ensuring adherence to high security standards.
Top Skills: AppArmor, C/C++, cgroups, eBPF, Go, Grafana, Java, Kubernetes, Python, Rust, seccomp, SELinux, Splunk, Terraform, VictoriaMetrics
Reposted 9 Days Ago
Easy Apply
Remote or Hybrid
San Francisco Bay Area, CA
127K-249K Annually
Senior level
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will lead security design and implementation for cloud infrastructures, mentor teams, and automate security solutions.
Top Skills: Ansible, AWS, Azure, Cloud Security Tools, CloudFormation, GCP, Go, Terraform
Reposted 2 Hours Ago
Easy Apply
Remote or Hybrid
San Francisco Bay Area, CA
200K-358K Annually
Expert/Leader
Artificial Intelligence • Cloud • Computer Vision • Hardware • Internet of Things • Software
The Staff ML Engineer will design and operate Samsara's ML platform, collaborating with teams to enhance ML features and improve safety outcomes. Responsibilities include overseeing system reliability, leading technical direction, and mentoring engineers.
Top Skills: AWS, Cloud Infrastructure, Kubernetes, Machine Learning, Ray, Spark
4 Days Ago
In-Office
San Francisco Bay Area, CA
210K-300K Annually
Senior level
Agency • Professional Services • Consulting
Architect and build backend infrastructure for an AI-driven enterprise email platform. Design scalable distributed systems ingesting millions of messages/day, implement secure, compliance-ready and on-prem-capable systems, and serve as a hands-on senior engineer driving a zero-to-one product build.
Top Skills: Backend Infrastructure, Distributed Systems, Encryption, Enterprise Email, On-Prem Deployment, Python, TypeScript
Reposted 4 Days Ago
In-Office
San Francisco Bay Area, CA
200K-280K Annually
Senior level
Software
As a Senior Backend Infrastructure Engineer, you'll develop systems that improve reliability and productivity for AI agent infrastructure, including deployment, observability, and data management.
Top Skills: AWS, CI/CD, ClickHouse, Django, Docker, FastAPI, Kubernetes, Modal, Node.js, Postgres, Python, Redis, Terraform, TypeScript
Reposted 4 Days Ago
In-Office
San Francisco Bay Area, CA
350K-500K Annually
Senior level
Software
As a Staff Frontend Infrastructure Engineer, you'll build tools and systems for frontend engineers, focusing on design systems, developer tooling, and code quality infrastructure.
Top Skills: ESLint, Next.js, Prettier, React, TypeScript, Vitest
Reposted 4 Days Ago
In-Office
San Francisco Bay Area, CA
320K-405K Annually
Senior level
Artificial Intelligence • Natural Language Processing • Generative AI
The ML Infrastructure Engineer will develop and scale AI safety systems infrastructure, optimize machine learning pipelines, and ensure reliable system performance while collaborating with research teams.
Top Skills: Airflow, AWS, GCP, JAX, Kubernetes, Python, PyTorch, Spark, TensorFlow
Reposted 4 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
147K-221K Annually
Senior level
Legal Tech • Professional Services
The Principal Cloud Infrastructure Engineer will design, analyze, and maintain cloud infrastructure, ensuring compliance with security standards and integrating application development with IT infrastructure.
Top Skills: Azure DevOps, Bicep, Function Apps, Graph API, Kubernetes, Logic Apps, Azure, Power Platform, Terraform
Reposted 4 Days Ago
In-Office
San Francisco Bay Area, CA
Mid level
Information Technology
As an ML Platform & Infrastructure Engineer, you'll design CI/CD pipelines for ML workflows, build evaluation infrastructure, and develop SDKs and tools to enhance experimentation. You'll track and visualize model performance while optimizing resources.
Top Skills: AWS, Docker, GCP, Kubernetes, Python
Reposted 4 Days Ago
In-Office
San Francisco Bay Area, CA
Senior level
Healthtech • Information Technology • Professional Services • Consulting
The Senior Cloud Infrastructure Engineer will design, deploy, and manage AWS infrastructure focusing on enterprise networking, security, and reliability while supporting cloud migrations in a HIPAA-regulated environment.
Top Skills: AWS, Cisco, CloudFormation, HIPAA, Palo Alto, SD-WAN, Terraform, VMware
Reposted 4 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
165K-200K Annually
Mid level
Artificial Intelligence • Computer Vision • Software
As an Infrastructure Engineer, you will secure, scale, and maintain core infrastructure, collaborate across teams, and optimize machine learning workflows.
Top Skills: AWS, Bash Scripting, GCP, GitHub Actions, Helm, Kubernetes, Node.js, Python, PyTorch, Spacelift, TensorFlow, Terraform
Reposted 4 Days Ago
In-Office
San Francisco Bay Area, CA
250K-270K Annually
Senior level
Hardware • Software
The Staff Infrastructure Engineer will shape technical direction, support critical systems, and enhance automation across infrastructure, ensuring reliability and best practices.
Top Skills: AWS, GCP, Linux, Terraform
Yesterday
Easy Apply
Remote
San Francisco Bay Area, CA
244K-287K Annually
Expert/Leader
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Lead product vision and multi-year strategy for developer infrastructure across the code lifecycle. Own roadmap for CI/CD, release automation, testing, deployments, and production readiness; drive migrations to simplify systems, measure quality with scorecards, partner with Engineering/SRE/Security, and integrate emerging (AI) capabilities to improve developer velocity and reliability.
Top Skills: AI-Powered Testing, Build Systems, CI/CD, Deployment Pipelines, DORA Metrics, Generative AI, Release Automation, Security, SRE, Testing Infrastructure
5 Days Ago
In-Office
San Francisco Bay Area, CA
Mid level
Computer Vision • Gaming • Sports • Esports
Lead bring-up, administration, and operations of a large GPU/AI training cluster. Serve as bridge between researchers and hardware, ensuring SLURM jobs, parallel filesystems, networking, and monitoring operate reliably. Work across provisioning, storage, VPN/access, and traditional Linux sysadmin tasks; assist with physical racking and on-site datacenter needs. Collaborate closely with a small research team in Tokyo or San Francisco.
Top Skills: Ansible, Ceph, GPU, Grafana, HPC, K8s, Kubernetes, LDAP, Linux, MAAS, NVIDIA HGX, Parallel File Systems, Prometheus, Slinky, Slurm, Tailscale, VAST, Warewulf, WEKA