Bobyard

MLOps Engineer

Posted 20 Days Ago
In-Office
San Francisco, CA
140K-175K Annually
Senior level
Design, build, and maintain production ML deployment, serving, and inference pipelines. Implement infrastructure-as-code (Terraform), CI/CD, GPU provisioning and cost optimization. Build monitoring/observability, collaborate with ML and full-stack teams, and occasionally contribute to React/Django product work.
The summary above was generated by AI
Position Overview

Bobyard builds AI systems that automate takeoffs for contractors, saving them dozens of hours per project. Delivering this reliably at scale requires production-grade ML infrastructure, deployment systems, and cloud architecture that do not break under real customer usage.

You will have very high autonomy in designing, executing, and iterating on our infrastructure. We are a startup, and we move fast. You will be the person responsible for turning research models into reliable production systems and building the foundation that allows engineering to ship quickly and safely. We look for world-class engineers who think in systems, take ownership of reliability and cost, and can go heads down to build durable infrastructure.

Responsibilities
  • Design and maintain ML deployment and model serving infrastructure

  • Build end-to-end pipelines for model packaging, inference, monitoring, and scaling

  • Implement infrastructure-as-code across all cloud resources (Terraform target state)

  • Own CI/CD pipelines, release processes, and deployment automation

  • Manage GPU provisioning, utilization, and cloud cost optimization

  • Build monitoring, alerting, and observability across services

  • Work closely with ML and full-stack engineering to ship production systems

  • Contribute to product development (React + Django) when infrastructure priorities allow

Desired Attributes
  • Strong PyTorch knowledge with understanding of speed and memory bottlenecks and inference optimization

  • Comfortable managing GPU services (AWS, GCP, ...), model containers, versioning, and scaling

  • Experience owning infrastructure on a small team or at a startup

  • Cloud-native and pragmatic — chooses simple, reliable solutions

  • High ownership mindset — you don’t wait to be told what to fix

  • Cost-aware and disciplined about cloud spend

  • Full-stack capable — can ship features in React or Django when needed

  • Fast learner who can navigate unfamiliar systems and tools quickly

  • Passion for building foundational systems that enable product velocity

This is a full-time, in-person role in the SF Bay Area. Learning rate and ownership are vital factors. If you can build the infrastructure that our models and customers depend on at the speed and quality the market demands (or can prove that you will acquire the ability to do so fast enough), we would love to work with you.

Top Skills

AWS
CI/CD
Containers
Django
GCP
GPU Provisioning
Model Serving
Monitoring
Observability
PyTorch
React
Terraform
HQ

Bobyard San Francisco, California, USA Office

1800 Owens St, San Francisco, California, United States, 94158
