TinyFish

MLOps Engineer

Posted Yesterday
Remote or Hybrid
2 Locations
Senior level
Build and maintain reproducible data pipelines, experiment orchestration, CI/CD for models, Terraform-based ML infrastructure, observability, security controls, and automation to deploy and operate ML systems in production.
The summary above was generated by AI
Position Overview

As the first dedicated MLOps Engineer, you’ll own the tooling and infrastructure that make our ML engineers wildly productive, so that we can iterate efficiently on ML models, prompts, and datasets and deploy our AI systems into a predictable production environment. You’ll bridge the gap between research and DevOps by designing reproducible dataset pipelines, automated experiment workflows, and Terraform-based cloud deployments that scale.

Key Responsibilities

Dataset Management

• Design version-controlled data pipelines (feature stores, data registries) using tools such as Delta Lake and Apache Iceberg.
• Implement systems for data validation, lineage tracking, and automated quality checks (e.g., Great Expectations).

Experiment Execution & Tracking

• Build and maintain experiment orchestration with platforms like MLflow, TorchX, and Apache Airflow.
• Provide ML Engineers with templated systems and tools for easily launching training/evaluation data processing systems.
• Automate hyperparameter sweeps and A/B tests, exposing clear dashboards for results.

CI/CD for Models/Agents

• Build workflows that package, test, and promote models and agents through staging to production.
• Implement canary deployments and rollbacks for model/agent services.

Terraform Infrastructure-as-Code

• Author and maintain Terraform modules for all ML infra—networking, GPU/TPU clusters, object storage, secrets, monitoring.
• Enforce best practices for state management, workspaces, and automated plan/apply stages via CI.

Observability & Reliability

• Integrate logging, tracing, and metric collection (Prometheus, Grafana, Datadog) across data pipelines and model endpoints.
• Set SLIs/SLOs for data freshness and model latency; implement alerts and runbooks.

Security & Compliance

• Work with Security to implement IAM least-privilege, key rotation, and data-encryption policies.
• Support audit requirements (SOC 2, GDPR, HIPAA where applicable).

Minimum Qualifications
  • 5+ years of combined experience in DevOps, Data Engineering, or MLOps roles.

  • Strong Terraform skills; ability to craft reusable modules and navigate complex state.

  • Production experience with at least one cloud provider (AWS, GCP, or Azure).

  • Proficiency in Python and containerization (Docker); familiarity with Kubernetes or serverless batch systems.

  • Hands-on knowledge of ML experiment platforms (MLflow, Kubeflow, Weights & Biases, or similar).

  • Experience with workflow execution frameworks (Kubeflow, Apache Airflow).

  • Understanding of modern data-versioning/feature-store concepts and tools.

  • Solid grasp of CI/CD principles, Git workflows, and infrastructure testing.

  • Excellent communication skills—capable of partnering with Data Scientists, Software Engineers, and Security teams.

Preferred (Nice-to-Have)
  • Experience with GPU orchestration (NVIDIA DGX, Karpenter, or Ray).

  • Familiarity with IaC security scanning (Checkov, tfsec).

  • Exposure to policy-as-code (OPA/Gatekeeper).

  • Prior work in real-time streaming (Kafka, Flink) and online feature serving.

  • Contributions to open-source MLOps projects.

Reporting Structure

Reports to: Director of Infra

HQ

TinyFish Palo Alto, California, USA Office

Palo Alto, CA, United States
