WayStation

Data Engineer

Posted 5 Days Ago
In-Office
Redwood City, CA, USA
Senior level
Own and build the end-to-end data layer: extraction pipelines from messy supplier emails and PDFs, the unified data model, reliable observable pipelines, and ML-driven extraction evaluation. Drive extraction accuracy, data quality, lineage, monitoring, and automation to scale coverage and enable engineering and product to depend on the data layer.
The summary above was generated by AI
Data Engineer

The owner of the data layer the entire product is built on — from raw supplier email to structured system of record.

Location: Redwood City, CA (In-person, 5 days/week)

Experience: 8+ years building production data systems, including hands-on early-stage startup experience (required); document extraction / ML / NLP pipelines a strong plus

Company: Waystation AI

About Waystation AI

Waystation is building the operating system for procurement in consumer packaged goods (CPG).

Today, ingredient and packaging sourcing still runs through inboxes, PDFs, and spreadsheets. It's slow, opaque, and costly. Waystation replaces that chaos with an AI-powered procurement platform that creates structure, visibility, and leverage — without forcing suppliers into portals.

The result: real ROI. One customer saved over $200,000 in the first three months, paying for their annual contract in the first 30 days.

Waystation is led by repeat founder Ryan Caldbeck (previously founded CircleUp) and backed by Founder Collective, Homebrew, Slow Ventures, 87 Capital, Floodgate, and SuccessVP. We have paying customers, real usage, and a product that works.

The Role

Structured data isn't a feature of our product — it is the product. We take the messiest input imaginable (hundreds of thousands of disconnected supplier emails and PDFs — specs, COAs, pricing, certs) and turn it into a clean, queryable system of record shared across procurement, QA, and R&D.

You own that layer end to end. The extraction pipeline, the data model, the infrastructure the rest of engineering builds on — it's yours, not a slice of it. The quality of what every user sees, what every model trains on, and what every customer ROI claim rests on flows through what you build. No one will hold your hand. You'll have unusual access and unusual scope, and you'll be expected to use both. You'll move fast and ship scrappy — a rough system working today beats a perfect one next quarter. We don't have the resources to gold-plate, and neither do you.

What You'll Do
  • Own the extraction pipeline. Turn messy supplier emails and documents — specs, COAs, pricing, certs, multi-language, bad scans — into structured, validated data.

  • Push accuracy and prove it. Drive extraction past today's 85%+ and build the eval harness that measures it, per document type, so the number is real and not a vibe.

  • Own the data model. Unify suppliers, documents, RFPs, pricing, and certifications into one source of truth — and build for institutional memory, so every email compounds into leverage.

  • Build infrastructure others depend on. Ship reliable, observable pipelines and own data quality, lineage, and the monitoring that catches problems before customers do.

  • Treat extraction as an ML problem. Eval sets, regression testing, accuracy tracking over time — turn customer-reported errors into systematic improvements, not one-off patches.

  • Build leverage. Reach for models and agents first. Automate the long tail instead of grinding it.
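
For a concrete picture of the eval-harness and regression-testing bullets above, here is a minimal sketch, not a description of Waystation's actual system. It assumes a hand-labeled eval set where each example carries a `doc_type`, an `expected` field map, and an `extracted` field map; those key names, the function names, and the tolerance value are all hypothetical.

```python
from collections import defaultdict


def score_by_doc_type(examples):
    """Field-level precision and recall, grouped by document type.

    Each example is a dict with hypothetical keys:
    doc_type (str), expected (dict), extracted (dict).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for ex in examples:
        c = counts[ex["doc_type"]]
        expected, extracted = ex["expected"], ex["extracted"]
        for name, value in extracted.items():
            if name in expected and expected[name] == value:
                c["tp"] += 1
            else:
                c["fp"] += 1  # wrong value, or a field that should not exist
        for name, value in expected.items():
            if name not in extracted or extracted[name] != value:
                c["fn"] += 1  # missing field, or wrong value
    report = {}
    for doc_type, c in counts.items():
        predicted = c["tp"] + c["fp"]
        actual = c["tp"] + c["fn"]
        report[doc_type] = {
            "precision": c["tp"] / predicted if predicted else 0.0,
            "recall": c["tp"] / actual if actual else 0.0,
        }
    return report


def find_regressions(current, baseline, tolerance=0.01):
    """List (doc_type, metric) pairs that fell more than `tolerance` below baseline."""
    return [
        (doc_type, metric)
        for doc_type, scores in baseline.items()
        for metric, old in scores.items()
        if current.get(doc_type, {}).get(metric, 0.0) < old - tolerance
    ]
```

Running `find_regressions` against a stored baseline on every pipeline change is one way to turn a customer-reported error into a permanent test case: add the failing document to the eval set, and any later drop for that document type shows up in the report.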

What We're Looking For

We'll back the right engineer over the right résumé. We care about a defined edge, depth, and ownership — not polish.

You're a strong fit if you:

  • Have built in the chaos — required. You've done real work at an early-stage startup (seed or Series A), where there was no playbook, no infrastructure handed to you, and never enough hours. You know the difference between building from zero and maintaining someone else's system. A purely big-company background isn't a fit for this seat.

  • Move fast and stay scrappy. You ship, learn, and iterate in the open rather than polishing in private. Constraints — fewer people, less tooling, no time — energize you instead of stalling you. You find the version that works now and earn the polish later.

  • Have one superpower. There's a thing you're genuinely better at than almost anyone (data systems, extraction, or ML pipelines), and you can name it and point to results that prove it. We want a sharp edge and the slope to outgrow the job, not someone who is evenly good at everything.

  • Have real depth. 8+ years building production data systems. Deep with Python, SQL, and modern data tooling. You can architect a system as easily as you can ship a fix — and you do both at startup speed.

  • Own whole problems. You take messy things start to finish and close them without being asked. When the data is wrong, you fix the system, not the symptom.

  • Build leverage. You reach for tools, automation, and agents to scale yourself instead of grinding manually. We live in Claude Code — you should want to, too.

  • Are all in. This is a rocket ship you want to plant a flag on and ride through the messy middle — not a stepping stone. We're betting on you; we need you betting on us.

  • Have grit. You've ground at something hard for a long time, through the part where it stopped being fun and the feedback loop ran far longer than your next review. You don't flinch when the work gets ugly.

Bonus: document extraction, NLP, or ML pipelines; regulated document-heavy domains; CPG, supply chain, or procurement; multi-language data (Chinese, Spanish).

What Success Looks Like

You'll ramp fast and work toward a scorecard built on four measures:

  • Extraction accuracy. A measurable climb past existing accuracy (precision & recall) across document types — proven by the evals you built, not asserted.

  • Pipeline reliability. Data quality and uptime the product can depend on. Bad or missing data gets flagged automatically, before a customer ever sees it.

  • Coverage of the long tail. More supplier formats and document types handled cleanly. The set of things that break the pipeline keeps shrinking.

  • Leverage for the team. The data layer becomes something the rest of engineering builds on without thinking about it.
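
As an illustration of the "flagged automatically" measure above, the following is a minimal sketch under stated assumptions. The required-field set, the flag strings, and both function names are hypothetical and do not describe Waystation's schema or pipeline.

```python
# Hypothetical required fields for an extracted pricing record.
REQUIRED_FIELDS = {"supplier", "ingredient", "unit_price", "currency"}


def quality_flags(record):
    """Return a list of human-readable problems found in one extracted record."""
    flags = []
    for name in sorted(REQUIRED_FIELDS):
        if record.get(name) in (None, ""):
            flags.append(f"missing:{name}")
    price = record.get("unit_price")
    if price not in (None, ""):
        try:
            if float(price) <= 0:
                flags.append("invalid:unit_price")
        except (TypeError, ValueError):
            flags.append("invalid:unit_price")  # not parseable as a number
    return flags


def partition(records):
    """Split records into clean ones and (record, flags) pairs held for review."""
    clean, quarantined = [], []
    for record in records:
        flags = quality_flags(record)
        if flags:
            quarantined.append((record, flags))
        else:
            clean.append(record)
    return clean, quarantined
```

Only the `clean` list would reach customer-facing views in this sketch. The quarantined pairs carry their flags with them, so a monitoring job can count failures by flag and surface the formats that break most often.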

Values
  • We are reliable, credible, and authentic

  • We are solution-oriented

  • We are proud of our work, our customers, and ourselves

What We Offer
  • Competitive base salary + meaningful equity — real ownership, with upside tied to the outcomes you drive

  • Ownership of the data layer the entire product is built on, working directly with a repeat founder & CEO — a front-row seat to how an AI-native company gets built

  • A real product with real ROI — value you can measure

  • Full health, dental, and vision coverage

  • Unlimited vacation — we care about outcomes, not hours

  • An in-person team that values craft and ambition

How to Apply

Don't send a cover letter. Send two things:

  • A hard system you owned. One pipeline or data problem, taken start to finish — what was true before, what you built, what was true after.

  • Something you automated or built with AI. An eval harness, an agent, a workflow that scaled you — anywhere you replaced manual work with a system.

Short is fine. We're reading for ownership and judgment, not polish.
