ML Infrastructure Engineer

ECHO AI

Reposted 23 Hours Ago

Hybrid

San Francisco, CA, USA

180K-230K Annually

Senior level

Design, build, and optimize infrastructure for data and modeling in ML ecosystems, enabling experimentation and development of advanced neural models.

The summary above was generated by AI

Company Overview

Echo Neurotechnologies is an exciting new startup in the Brain-Computer Interface (BCI) space, driving innovation through advanced hardware engineering and AI solutions. Our mission is to deliver cutting-edge technologies that restore autonomy to people living with disabilities and improve their quality of life.

Team Culture

Join a small, dedicated team of knowledgeable and motivated professionals. Our early-stage environment offers the opportunity to take ownership of broad decisions with significant and long-lasting impact. We emphasize continuous learning and growth, fostering cross-functional collaboration where your contributions are vital to our success.

Job Summary

We are seeking a Senior Machine Learning Infrastructure Engineer to join our team. The person in this role will design, build, and scale the infrastructure that powers our massive-scale data, modeling, and analysis platforms. This person will play a critical role in shaping a high-performance, production-grade ML ecosystem that supports rapid experimentation with diverse datasets spanning neural signals, behavior, and more. This person will also have significant ownership over the ML R&D platform, working closely with domain experts to architect new cloud infrastructure, data pipelines, and modeling flows. The work will ultimately enable the development of cutting-edge models for neuroscientific discovery and neural decoding, empowering brain-computer interface technology to improve the lives of patients living with severe neurological conditions.

Key Responsibilities

Create flexible and performant ML infrastructure
- Design and build cloud infrastructure for systems ML to enable massive-scale modeling and analytics
- Support diverse model exploration, hyperparameter optimization, pretraining, fine-tuning, and evaluation processes
- Design and optimize scalable distributed training pipelines, with support for features such as model sharding, cross-GPU communication, and real-time training monitoring
- Create, operate, and maintain robust ML platforms and services across the model lifecycle
- Make informed architecture decisions that balance performance, cost, reliability, and scalability
Build diverse and scalable data platforms
- Design, build, and optimize massive-scale databases and data pipelines for scalable, flexible, and reliable data access
- Explore research-driven, tailored data solutions using existing and simulated data, comparing performance and efficiency across solutions for typical data-access patterns
- Create infrastructure and pipelines for ingesting internal and external datasets with varied shapes, formats, and associated metadata
- Design and assess custom data formats for efficient storage and slicing of high-dimensional time-series data
- Enable efficient data movement, preprocessing, and artifact management for data lineage and modeling reproducibility
Meet company standards for delivered solutions
- Establish best practices for reliability, observability, reproducibility, and operational excellence across the ML ecosystem
- Make informed and collaborative decisions with domain experts across the software & ML teams
- Foster visibility and reproducibility within the company by maintaining extensive documentation of design decisions, evaluations of viable alternatives for selected solutions, pipeline assessments, etc.
- Support ML R&D operations while preparing for eventual incorporation into product pipelines

Required Qualifications

Bachelor's degree in Computer Science, Electrical Engineering, or a related technical discipline
5+ years of industry experience in software engineering, large-scale data infrastructure, or systems ML
Extensive proficiency in Python
Familiarity with PyTorch
Experience designing, building, and maintaining high-throughput data pipelines for large and diverse datasets
Experience working with distributed-training frameworks (e.g., FSDP, DeepSpeed, Megatron-LM, Ray)
Experience building or optimizing ML training pipelines for transformers or other large neural-network models
Demonstrated ability to partner closely with research and modeling teams to productionize workflows
Excellent communication and collaboration skills to work effectively on cross-functional and interdisciplinary teams
Experience holding technical ownership of at least one successfully implemented collaborative project

Preferred Qualifications

Advanced degree (MS or PhD) in Computer Science, Electrical Engineering, or a related technical discipline
Proficiency in C++, Go, CUDA, Rust, and/or Java
Experience in data engineering and systems ML for time-series data
Deep understanding of the fundamentals of distributed systems, including scalability, fault tolerance, monitoring, observability, scheduling, performance tuning, and resource management
Experience with cloud-native environments and orchestration (Kubernetes, Docker, etc.)
Experience scaling foundation-model training infrastructure or multi-cluster computing environments

What We Offer

An opportunity to work on exciting, cutting-edge projects to transform patients’ lives in a highly collaborative work environment.
Competitive compensation, including stock options.
Comprehensive benefits package.
401(k) program with matching contributions.

Equal Opportunity Employer

Echo Neurotechnologies is an Equal Opportunity Employer (EOE). We celebrate diversity and are committed to creating an inclusive environment for all employees.

Confidentiality

All applications will be treated confidentially. Applicants may be asked to sign an NDA after the initial stages of the interview process.
