Mach9

ML Infrastructure Engineer

Reposted Yesterday
In-Office
San Francisco, CA, USA
160K-200K Annually
Mid level
The ML Infrastructure Engineer at Mach9 will design and maintain CI/CD pipelines for ML workflows, optimize real-time inference services, and build data management systems, collaborating closely with ML researchers.
The summary above was generated by AI
The role

At Mach9, ML infrastructure engineers build and maintain the systems that power production AI models for civil engineering and surveying. Our ML pipeline spans 10,000+ miles of labeled survey data, image segmentation networks, and 3D prediction models serving real-time inference to surveyors and engineers in the field.

This role is ideal for mid-career ML infrastructure engineers with experience building for both training and inference.

You'll build training pipelines that train deep transformer models on hundreds of terabytes of 3D point cloud and image data. You'll also architect our inference infrastructure, which serves both heavy offline detection algorithms and responsive real-time inference that integrates directly with our CAD software.

Responsibilities
  • Design and build a centralized system for versioning training data, generated datasets, and model artifacts, with full lineage tracking from raw source data through to trained model outputs.

  • Develop and maintain reliable, reproducible ML training and data generation pipelines.

  • Refactor and harden existing training and data generation scripts into composable, testable, and maintainable components.

  • Create CI/CD workflows for validating data pipelines and model training runs, including automated correctness checks and regression detection.

  • Build tooling that enables ML engineers to launch, monitor, and debug training jobs with minimal friction.

  • Optimize and scale real-time model inference services to meet latency and throughput requirements in production, including profiling, batching strategies, and resource-efficient serving.

  • Own the deployment path from trained model artifact to production endpoint, ensuring reliable rollouts, rollbacks, and monitoring.

Requirements
  • 3+ years of work experience in relevant fields.

  • Bachelor's or Master's degree in Computer Science, Engineering, or equivalent experience.

  • Strong communication skills and the ability to work closely with ML researchers and engineers to understand their workflows and translate them into robust systems.

  • Experience designing and building data versioning, artifact management, or dataset lineage systems (e.g., DVC, LakeFS, Weights & Biases, or custom solutions).

  • Hands-on experience with ML pipeline orchestration tools (e.g., Airflow, Prefect, Metaflow, or similar).

  • Experience with model serving and inference optimization — profiling latency, reducing memory footprint, or scaling serving infrastructure to meet real-time constraints.

  • Ability to read and refactor ML training code — you don't need to design model architectures, but you need to understand what training pipelines are doing well enough to make them reliable.

  • Proficiency in Python and PyTorch.

Bonus qualifications
  • Familiarity with AWS infrastructure services.

  • Experience with containerized ML workflows and GPU-accelerated training environments.

  • Experience with model optimization techniques (e.g., quantization, TensorRT, ONNX Runtime, distillation).

  • Knowledge of infrastructure-as-code tools (e.g., AWS CDK, Terraform).

  • Experience building or operating ML systems that handle large unstructured datasets (imagery, 3D data, sensor data).

HQ

Mach9 San Francisco, California, USA Office

San Francisco, CA, United States
