Staff ML Ops Engineer

Albert Invent

Reposted 20 Days Ago

In-Office or Remote

Hiring Remotely in Oakland, CA, USA

Senior level

As a Staff ML Ops Engineer, you'll architect high-performance systems for AI/ML, manage Kubernetes infrastructure, and develop scalable backend APIs, enabling scientific discoveries across the materials science industry.

The summary above was generated by AI

Albert’s mission is to digitalize the world of chemistry. Using data and machine learning, Albert enables R&D organizations to dramatically accelerate the invention of new materials. Our platform helps scientists and engineers build structured data foundations, digitize formulation and testing workflows, and apply AI to innovate faster, smarter, and at scale.

About the role

As our Backend & Infrastructure Engineer, you will architect and build the core systems that power everything our AI/ML team delivers—the APIs, infrastructure, and distributed systems that make intelligent capabilities possible at scale. This is a foundational role: you'll shape how AI gets built and shipped here.

We are seeking a highly motivated and talented individual with deep expertise in Python backend development, Kubernetes, and distributed systems. You'll be embedded with ML engineers and researchers, building robust systems that turn ambitious AI ideas into production realities—whether that's powering agent-based workflows, scaling inference, or enabling scientific computing pipelines. The infrastructure you build will directly enable researchers at the world's largest chemical and materials companies to leverage AI in ways that weren't possible before—accelerating discovery, enabling inverse design of novel materials, and transforming how science gets done.

What you'll do

Infrastructure & Kubernetes:

Design, deploy, and maintain Kubernetes infrastructure supporting AI/ML workloads

Manage containerized services, autoscaling, networking, and resource optimization

Backend Development:

Design and build high-performance Python APIs and services using FastAPI or similar frameworks

Architect backend systems for scalability, reliability, and low latency

Build integrations between AI/ML systems and the broader Albert platform

Distributed Systems:

Build and operate distributed systems that handle compute-intensive and high-throughput workloads

Design for fault tolerance, graceful degradation, and horizontal scalability

Implement async workflows, job queues, and task orchestration as needed

Data Infrastructure:

Architect and maintain data pipelines and storage systems supporting AI/ML workflows

Work with vector databases, caches, and other data stores as required by ML systems

Ensure efficient data access patterns for training and inference workloads

Reliability & Operations:

Implement observability including logging, metrics, tracing, and alerting

Own system reliability—troubleshoot issues, conduct post-mortems, and continuously improve

Design CI/CD pipelines and promote automation best practices

Implement infrastructure-as-code practices using Terraform, Helm, Argo CD, Pulumi, or similar tools

Collaboration:

Partner closely with ML engineers to understand requirements and deliver production-ready infrastructure

Translate ML prototypes and research code into scalable, maintainable systems

Contribute to technical decisions that shape the team's architecture

You will have

Deep expertise in Python backend development and distributed systems

Strong Kubernetes and cloud infrastructure experience

A builder's mindset—you want to create foundational systems that others build on

Genuine interest in science and technology; curiosity about how your work enables scientific discovery

A commitment to building systems that are reliable, maintainable, and scalable

Key competencies

A degree in Computer Science or a related field, plus software engineering industry experience: 7+ years with a Bachelor's, or 5+ years with a Master's or PhD

Experience supporting AI/ML teams or deploying ML systems in production

Experience with GPU workloads and scheduling

Advanced proficiency in Python including async programming and performance optimization

Deep experience with Kubernetes—cluster management, networking, autoscaling, and troubleshooting

Strong background in distributed systems and microservices architecture

Experience with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code

Proficiency in REST API development using FastAPI, Flask, or similar

Experience with containerization and CI/CD pipelines

Track record of operating production systems at scale

Preferred/Bonus Points

Familiarity with scientific computing or research environments

Background in or curiosity about chemistry, materials science, or related fields

Familiarity with data engineering tools (Airflow, Dagster, or similar)

Experience with vector databases or search infrastructure

Expertise in observability tools (Prometheus, Grafana, Datadog)

Experience with message queues and event-driven architectures (Kafka, Redis, RabbitMQ)

Contributions to open-source projects

Experience mentoring engineers

Why Albert?

We have a huge impact. Albert is a growing team with a big reach. Our platform facilitates the invention of materials for tens of thousands of companies and hundreds of thousands of applications, from coatings used on rockets to adhesives used in electric vehicles to 3D printed medical devices.

We love distributed teams. Albert’s home base is in the California Bay Area, but we have multiple offices and employees sprinkled around the globe. In fact, over 50% of our employees work outside of California! An international remote culture is in our DNA.

We care about you. Albert works hard to create a positive environment for our employees, and we think your life outside of work is important too. We work hard and we play hard.

We value diversity. Growing and maintaining our inclusive and diverse team matters to us. We are committed to being a company where our employees feel comfortable bringing their authentic selves to work and have the ability to be successful every day.

We’re always looking for humble, sharp, and creative folks to join the Albert team. If you think you might be a fit, please apply!

Oakland, CA, United States
