
Axiom Bio

Data Engineer

Reposted 21 Days Ago
In-Office
San Francisco, CA, USA
Mid level
As a Data Engineer, you will build and maintain data systems for drug safety AI tools, ensuring reliable data processing and storage for research teams.
The summary above was generated by AI
About Axiom

Axiom is building the translational intelligence layer for drug discovery: AI systems that help scientists predict human toxicity earlier, more accurately, and more mechanistically than animal studies or legacy in vitro assays.

Unexpected toxicity is one of the largest reasons drug programs fail. Today, drug discovery teams still rely on fragmented experimental data, animal studies, literature evidence, and expert judgment to decide which molecules are safe enough to advance. We believe this can be dramatically improved.

Axiom generates and curates massive multimodal datasets spanning chemical structures, primary human cell imaging, transcriptomics, proteomics, mass spectrometry, ADME, clinical outcomes, human exposure, and literature-derived toxicity evidence. To date, we've built the largest experimental-to-clinical dataset, and we are just getting started. These datasets power the models and agents that help drug hunters understand toxicity risk, mechanism, and safety margin.

We are looking for a data engineer to own the systems that make this possible. You will build the pipelines, infrastructure, APIs, and tooling that turn raw chemical, biological, and clinical data into ML-ready training data and customer-ready insights.

This is a foundational role. The quality, scale, and reliability of Axiom’s data systems will directly determine how fast our science moves, how good our models become, and how much customers trust our product.

Charter

Be a founding member of the team building the first accurate AI systems for drug toxicity prediction: systems that can help replace animal studies and legacy experiments with human-relevant predictive models.

You will build the data foundation that makes Axiom’s science and product possible.

What you will do

You will own core data infrastructure across Axiom’s research, ML, lab, and product systems.

You will:

  • Build and maintain Axiom’s core data platform for ingesting, processing, validating, storing, and serving chemical, biological, clinical, and customer datasets.

  • Build Axiom’s LabOS: the software layer that connects lab protocols, compound logistics, assay execution, plate/well metadata, instrument outputs, QC checks, data processing, model inference, and customer-facing results.

  • Turn raw experimental outputs into clean, versioned, ML-ready datasets across high-content imaging, transcriptomics, proteomics, mass spectrometry, ADME, dose-response, and clinical outcome data.

  • Design simple, reliable APIs and data interfaces that let scientists, ML researchers, and product engineers access the data they need without fighting infrastructure.

  • Build LLM-powered systems for literature research, clinical data extraction, evidence curation, and dataset generation.

  • Develop distributed systems for running large-scale LLM jobs that clean, normalize, deduplicate, and structure biological and clinical data.

  • Scale inference pipelines for image models, graph neural networks, chemical models, and mechanistic agents.

  • Automate ETL from diverse sources, including lab instruments, CRO outputs, public databases, customer files, internal research tables, cloud storage, and literature-derived datasets.

  • Create rigorous data validation, testing, monitoring, lineage, and observability systems so Axiom can trust the datasets that drive model training, evaluation, customer delivery, and scientific decisions.

  • Work closely with scientists to understand messy real-world data needs and translate them into robust infrastructure.

  • Support customer-facing data delivery systems, including raw data transfer, processed feature exports, model predictions, compound metadata, and versioned result packages.

  • Build infrastructure that accelerates every team at Axiom.

What we are looking for

We are looking for someone who combines engineering taste, scientific curiosity, and extreme ownership.

You might be a great fit if:

  • You have built large-scale data platforms used by many internal teams or external users.

  • You are excited by messy, heterogeneous scientific data and want to make it clean, reliable, searchable, and useful.

  • You can move fluidly between backend engineering, distributed systems, ML infrastructure, data modeling, DevOps, and user-facing tooling.

  • You are comfortable talking to scientists, understanding their workflows, and building systems that make their work dramatically faster.

  • You care deeply about correctness, reproducibility, versioning, and data quality.

  • You have experience building AI- or LLM-powered data systems, especially for research workflows, retrieval, curation, or structured extraction.

  • You enjoy turning ambiguous research needs into simple, reliable infrastructure.

  • You want to own critical systems at an early-stage company.

  • You are deeply curious about biology, chemistry, drug discovery, AI, product, and business.

  • You could work in big tech, but you would rather build the data foundation for a company trying to change how medicines are discovered.

Technical skills we value

We do not expect every candidate to have all of these, but we are especially excited by experience with:

  • Python, Pandas, NumPy, Polars, PyArrow, DuckDB, SQL, and the broader Python data ecosystem.

  • Distributed systems and large-scale compute using Kubernetes, Slurm, Modal, Ray, Anyscale, Daft, Dask, Spark, or similar tools.

  • Cloud infrastructure on AWS, GCP, or Azure.

  • Infrastructure as code with Terraform, Pulumi, or similar tools.

  • CI/CD, automated testing, deployment systems, and production observability.

  • Data warehouses, lakehouses, object storage, and columnar formats such as Parquet.

  • Workflow orchestration tools such as Airflow, Dagster, Prefect, Flyte, or Argo.

  • LLM-powered data extraction, retrieval systems, evaluation harnesses, embeddings, and human-in-the-loop review systems.

  • ML inference infrastructure for image models, graph neural networks, chemical models, or large language models.

  • APIs, backend services, and internal tools that make complex data easy to use.

  • Scientific, biological, chemical, clinical, or healthcare data systems.

  • Petabyte-scale data processing.

The kind of person who thrives here

Axiom is not a normal company, and this is not a normal data engineering role.

We are looking for someone who wants to build the systems underneath a new kind of scientific company. The data is messy. The scale is large. The requirements change quickly. The users are brilliant and demanding. The stakes are high.

The people who thrive here:

  • Move with urgency.

  • Have exceptional engineering taste.

  • See what needs doing and do it.

  • Care deeply about reliability and correctness.

  • Can build fast without creating chaos.

  • Enjoy working with scientists and ML researchers.

  • Are excited by messy biological and chemical data.

  • Think in systems, not one-off scripts.

  • Want their work to multiply the output of the entire company.

  • Are not satisfied with incremental improvements.

  • Want to build a generational company.

We are looking for someone with a relentless observe-orient-decide-act loop: someone who constantly identifies bottlenecks, builds the right abstractions, and makes everyone around them faster.

HQ

Axiom Bio San Francisco, California, USA Office

San Francisco, CA, United States, 94107

Similar Jobs

3 Days Ago
Remote or Hybrid
CA, USA
85K-120K Annually
Mid level
Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
As a Data Engineer, you'll lead data engineering projects, develop data transformations, build workflows with Apache Airflow, and ensure data quality and integrity while collaborating with stakeholders.
Top Skills: Airflow, CI/CD, dbt, Git, Python, Redshift, Snowflake
5 Days Ago
In-Office
San Jose, CA, USA
116K-174K Annually
Mid level
Artificial Intelligence • Fintech • Software
The Solutions Data Engineer will manage data integrations end-to-end, utilizing FloQast's Data Platform and Data Studio to connect source systems and transform data for reliable outputs across applications.
Top Skills: APIs, Excel, Python, SFTP, SQL
6 Days Ago
Remote or Hybrid
United States
102K-154K Annually
Mid level
Artificial Intelligence • Cloud • Computer Vision • Hardware • Internet of Things • Software
As a Data Engineer II, you will build, scale, and optimize data platforms, focusing on ETL/ELT pipelines, and collaborate with data scientists and AI engineers to support data-driven applications and insights.
Top Skills: Databricks, dbt, Python, RDS, Redshift, Snowflake, Spark, SQL

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University, UC Berkeley, leading VC firms, and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology, thanks to a thriving art and music scene, excellent food, and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

  • Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Google, Apple, Salesforce, Meta
  • Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
  • Funding Landscape: $50.5 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
  • Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
