Sesame (sesame.com)

Data Engineer, Machine Learning

Reposted 11 Days Ago
In-Office
San Francisco, CA, USA
170K-240K Annually
Senior level
Build and maintain production data pipelines and infrastructure to prepare multimodal conversational, voice, sensor, and telemetry data for ML training and evaluation. Implement dataset versioning, lineage, quality monitoring, and tooling to enable reproducible, discoverable datasets and cost/performance-optimized large-scale processing while enforcing data governance and privacy.
The summary above was generated by AI

About Sesame

Sesame believes in a future where computers are lifelike - with the ability to see, hear, and collaborate with us in ways that feel natural and human. With this vision, we're designing a new kind of computer, focused on making voice agents part of our daily lives. Our team brings together founders from Oculus and Ubiquity6, alongside proven leaders from Meta, Google, and Apple, with deep expertise spanning hardware and software. Join us in shaping a future where computers truly come alive.

About the Role

We're looking for a Data Engineer to build and maintain the data pipelines that feed Sesame's AI models. You'll collaborate directly with machine learning engineers and researchers — your job is to make sure they have the right data, in the right shape, at the right time to train, evaluate, and ship models.

Sesame's data is rich and complex: conversations, voice, sensor signals, and product telemetry. You'll design the systems that take raw, unstructured, multimodal data and turn it into clean, versioned, well-documented datasets that ML teams can trust and build on confidently.

This is a deeply technical, infrastructure-focused role, closer to ML engineering than to traditional data analytics. You'll be embedded with ML teams, learning their workflows and building infrastructure that accelerates the full model development lifecycle, from data collection and labeling through training and evaluation.

Responsibilities
  • Design and build production data pipelines that prepare conversational, voice, and multimodal data for model training and evaluation.

  • Partner directly with ML engineers to understand data requirements for new models and experiments, and deliver datasets that meet those needs.

  • Build and maintain infrastructure for dataset versioning, lineage tracking, and reproducibility — so any training run can be traced back to its exact data.

  • Develop data quality frameworks that catch issues before they become model quality issues: schema validation, drift detection, and coverage monitoring.

  • Optimize large-scale data processing for cost and performance across Sesame's cloud infrastructure.

  • Build tooling that makes it easy for ML engineers and researchers to discover, explore, and request data independently.

  • Define and enforce data governance and privacy standards, particularly around sensitive conversational and voice data.

  • Contribute to architecture decisions around Sesame's broader data platform as the team and data volume grow.

Required Qualifications:
  • 5+ years in data engineering, with meaningful experience supporting ML or AI teams specifically.

  • Strong SQL and Python skills — you'll use both daily.

  • Experience building and operating ETL/ELT pipelines at scale using modern data platforms and tooling.

  • Experience with workflow orchestration systems such as Airflow, Dagster, or Prefect.

  • Hands-on experience with ML data workflows: training data pipelines, dataset versioning, data labeling pipelines, or model evaluation data.

  • A solid understanding of how ML teams work — you don't need to train models; what matters is understanding what makes a good training dataset and why data quality directly affects model performance.

  • Comfort working with unstructured and semi-structured data — audio, text, JSON logs — not just clean relational tables.

  • Strong communication skills. You'll be embedded with ML engineers and need to bridge data systems and model requirements effectively.

Preferred Qualifications:
  • Vector databases, embedding storage, or feature stores.

  • Data from hardware or embedded systems: telemetry, sensors, real-time streams.

  • Distributed compute frameworks for large-scale data processing such as Ray or Spark.

  • Kubernetes and managed Kubernetes environments such as GKE or EKS.

  • Data privacy frameworks, especially around voice or conversational data.

  • Building internal tooling or self-serve data platforms.

Sesame is committed to a workplace where everyone feels valued, respected, and empowered. We welcome all qualified applicants, embracing diversity in race, gender, identity, orientation, ability, and more. We provide reasonable accommodations for applicants with disabilities. Contact [email protected] for assistance.

Full-time Employee Benefits: 

  • 401(k) max employer match: 3.5% of compensation

  • 100% employer-paid health, vision, and dental benefits for you and your dependents

  • Unlimited PTO and sick time

  • Flexible spending account with employer matching up to $1,650/year (medical FSA)

  • Guardian Employee Assistance Program (EAP)

  • Opportunity to share in the company's success with competitive stock options

Benefits do not apply to contingent/contract workers.
