Data Engineer

Axiom Bio

Reposted 15 Days Ago

In-Office or Remote

Hiring Remotely in San Francisco, CA, USA

Mid level

As a Data Engineer, you will build and maintain data systems for drug safety AI tools, ensuring reliable data processing and storage for research teams.

The summary above was generated by AI

About Axiom

Axiom is building the translational intelligence layer for drug discovery: AI systems that help scientists predict human toxicity earlier, more accurately, and more mechanistically than animal studies or legacy in vitro assays.

Unexpected toxicity is one of the largest reasons drug programs fail. Today, drug discovery teams still rely on fragmented experimental data, animal studies, literature evidence, and expert judgment to decide which molecules are safe enough to advance. We believe this can be dramatically improved.

Axiom generates and curates massive multimodal datasets spanning chemical structures, primary human cell imaging, transcriptomics, proteomics, mass spectrometry, ADME, clinical outcomes, human exposure, and literature-derived toxicity evidence. To date, we've built the largest experimental-to-clinical dataset and we are just getting started. These datasets power the models and agents that help drug hunters understand toxicity risk, mechanism, and safety margin.

We are looking for a data engineer to own the systems that make this possible. You will build the pipelines, infrastructure, APIs, and tooling that turn raw chemical, biological, and clinical data into ML-ready training data and customer-ready insights.

This is a foundational role. The quality, scale, and reliability of Axiom’s data systems will directly determine how fast our science moves, how good our models become, and how much customers trust our product.

Charter

Be a founding member of the team building the first accurate AI systems for drug toxicity prediction: systems that can help replace animal studies and legacy experiments with human-relevant predictive models.

You will build the data foundation that makes Axiom’s science and product possible.

What you will do

You will own core data infrastructure across Axiom’s research, ML, lab, and product systems.

You will:

Build and maintain Axiom’s core data platform for ingesting, processing, validating, storing, and serving chemical, biological, clinical, and customer datasets.
Build Axiom’s LabOS: the software layer that connects lab protocols, compound logistics, assay execution, plate/well metadata, instrument outputs, QC checks, data processing, model inference, and customer-facing results.
Turn raw experimental outputs into clean, versioned, ML-ready datasets across high-content imaging, transcriptomics, proteomics, mass spectrometry, ADME, dose-response, and clinical outcome data.
Design simple, reliable APIs and data interfaces that let scientists, ML researchers, and product engineers access the data they need without fighting infrastructure.
Build LLM-powered systems for literature research, clinical data extraction, evidence curation, and dataset generation.
Develop distributed systems for running large-scale LLM jobs that clean, normalize, deduplicate, and structure biological and clinical data.
Scale inference pipelines for image models, graph neural networks, chemical models, and mechanistic agents.
Automate ETL from diverse sources, including lab instruments, CRO outputs, public databases, customer files, internal research tables, cloud storage, and literature-derived datasets.
Create rigorous data validation, testing, monitoring, lineage, and observability systems so Axiom can trust the datasets that drive model training, evaluation, customer delivery, and scientific decisions.
Work closely with scientists to understand messy real-world data needs and translate them into robust infrastructure.
Support customer-facing data delivery systems, including raw data transfer, processed feature exports, model predictions, compound metadata, and versioned result packages.
Build infrastructure that accelerates every team at Axiom.

What we are looking for

We are looking for someone who combines engineering taste, scientific curiosity, and extreme ownership.

You might be a great fit if:

You have built large-scale data platforms used by many internal teams or external users.
You are excited by messy, heterogeneous scientific data and want to make it clean, reliable, searchable, and useful.
You can move fluidly between backend engineering, distributed systems, ML infrastructure, data modeling, DevOps, and user-facing tooling.
You are comfortable talking to scientists, understanding their workflows, and building systems that make their work dramatically faster.
You care deeply about correctness, reproducibility, versioning, and data quality.
You have experience building AI- or LLM-powered data systems, especially for research workflows, retrieval, curation, or structured extraction.
You enjoy turning ambiguous research needs into simple, reliable infrastructure.
You want to own critical systems at an early-stage company.
You are deeply curious about biology, chemistry, drug discovery, AI, product, and business.
You could work in big tech, but you would rather build the data foundation for a company trying to change how medicines are discovered.

Technical skills we value

We do not expect every candidate to have all of these, but we are especially excited by experience with:

Python, Pandas, NumPy, Polars, PyArrow, DuckDB, SQL, and the broader Python data ecosystem.
Distributed systems and large-scale compute using Kubernetes, Slurm, Modal, Ray, Anyscale, Daft, Dask, Spark, or similar tools.
Cloud infrastructure on AWS, GCP, or Azure.
Infrastructure as code with Terraform, Pulumi, or similar tools.
CI/CD, automated testing, deployment systems, and production observability.
Data warehouses, lakehouses, object storage, and columnar formats such as Parquet.
Workflow orchestration tools such as Airflow, Dagster, Prefect, Flyte, or Argo.
LLM-powered data extraction, retrieval systems, evaluation harnesses, embeddings, and human-in-the-loop review systems.
ML inference infrastructure for image models, graph neural networks, chemical models, or large language models.
APIs, backend services, and internal tools that make complex data easy to use.
Scientific, biological, chemical, clinical, or healthcare data systems.
Petabyte-scale data processing.

The kind of person who thrives here

Axiom is not a normal company, and this is not a normal data engineering role.

We are looking for someone who wants to build the systems underneath a new kind of scientific company. The data is messy. The scale is large. The requirements change quickly. The users are brilliant and demanding. The stakes are high.

The people who thrive here:

Move with urgency.
Have exceptional engineering taste.
See what needs doing and do it.
Care deeply about reliability and correctness.
Can build fast without creating chaos.
Enjoy working with scientists and ML researchers.
Are excited by messy biological and chemical data.
Think in systems, not one-off scripts.
Want their work to multiply the output of the entire company.
Are not satisfied with incremental improvements.
Want to build a generational company.

We are looking for someone with a relentless observe-orient-decide-act loop: someone who constantly identifies bottlenecks, builds the right abstractions, and makes everyone around them faster.

San Francisco, CA, United States, 94107
