Bobyard

MLOps Engineer

Posted 20 Days Ago

In-Office

San Francisco, CA

140K-175K Annually

Senior level

Design, build, and maintain production ML deployment, serving, and inference pipelines. Implement infrastructure-as-code (Terraform), CI/CD, GPU provisioning and cost optimization. Build monitoring/observability, collaborate with ML and full-stack teams, and occasionally contribute to React/Django product work.

The summary above was generated by AI

Position Overview

Bobyard builds AI systems that automate takeoffs for contractors, saving them dozens of hours per project. Delivering this reliably at scale requires production-grade ML infrastructure, deployment systems, and cloud architecture that do not break under real customer usage.

You will have very high autonomy in designing, executing, and iterating on our infrastructure. We are a startup, and we move fast. You will be the person responsible for turning research models into reliable production systems and building the foundation that allows engineering to ship quickly and safely. We look for world-class engineers who think in systems, take ownership of reliability and cost, and can go heads down to build durable infrastructure.

Responsibilities

Design and maintain ML deployment and model serving infrastructure
Build end-to-end pipelines for model packaging, inference, monitoring, and scaling
Implement infrastructure-as-code across all cloud resources (Terraform target state)
Own CI/CD pipelines, release processes, and deployment automation
Manage GPU provisioning, utilization, and cloud cost optimization
Build monitoring, alerting, and observability across services
Work closely with ML and full-stack engineering to ship production systems
Contribute to product development (React + Django) when infrastructure priorities allow

Desired Attributes

Strong PyTorch knowledge with understanding of speed and memory bottlenecks and inference optimization
Comfortable managing GPU services (AWS, GCP, etc.), model containers, versioning, and scaling
Experience owning infrastructure on a small team or at a startup
Cloud-native and pragmatic — chooses simple, reliable solutions
High ownership mindset — you don’t wait to be told what to fix
Cost-aware and disciplined about cloud spend
Full-stack capable — can ship features in React or Django when needed
Fast learner who can navigate unfamiliar systems and tools quickly
Passion for building foundational systems that enable product velocity

This is a full-time, in-person role in the SF Bay Area. Learning rate and ownership are vital factors. If you can build the infrastructure that our models and customers depend on at the speed and quality the market demands (or if you can prove that you will acquire the ability to do so fast enough), we would love to work with you.

Top Skills

AWS

CI/CD

Containers

Django

GCP

GPU Provisioning

Model Serving

Monitoring

Observability

PyTorch

React

Terraform

1800 Owens St, San Francisco, California, United States, 94158
