AI Engineer

Product.ai

Posted An Hour Ago

In-Office

Los Angeles, CA

170K-500K Annually

Senior level

Own and evolve a production fleet of long-running AI agent automations: design runtimes, liveness checks, verification systems, token economics, model-routing, and a shared knowledge base. Build deterministic alarms, oracle-separated checkers, regression corpora, and CI/monitoring that prevent silent failures. Work directly with the founder, ship verification-first features, and replace ad-hoc review with scalable verification and escalation paths.

The summary above was generated by AI

A builder with a high technical bar whose leverage is judgment, not keystrokes - in the Office of the CEO.

Product.ai is the verified truth layer for shopping - the intelligence that tells you what's actually true about a product, including when not to buy. Profitable. Bootstrapped. No outside investors. No board. 20 people outbuilding companies 10× our size.

Strong people find us and keep finding us - they apply over months and years, because the field moves fast and the exact profile we need moves with it.

Why This Role Exists

Most of our software is now written by AI agents. So the job that matters is no longer typing the code - it's deciding what to build, designing the systems the agents run inside, and knowing how you'll prove the work is correct. You'll be a builder closer to a product engineer than a heads-down coder: your leverage is judgment and taste, not typing speed.

Your first surface is the agent harness behind the Office of the CEO - the live automation that lets 20 people move like a company many times our size. A recruiting-evaluation pipeline scoring 30+ candidates a day across open roles. A merchant-discovery pipeline landing ~1,200 merchants a day. The content, data, and ops automation underneath it all. This year these crossed a threshold: agent runs that go 1-4 hours unattended became a normal unit of work for us, and that fleet now deserves a dedicated owner.

But you won't stay boxed into one surface. This is a generalist builder's seat, working directly with the founder. The harness is where you start, not the ceiling of what you'll touch.

The System You'll Need to Model

A fleet of production automations whose failure mode is silent death. Pipelines here rarely fail loudly - they stop, and the cost accrues invisibly until someone notices days later. The real engineering problem is liveness: designing alarms and deterministic checks so that no automation in the fleet can die unnoticed.
Long-lived agent runs that go 1-4 hours unattended. They hold together not because someone watches them, but because they run on architectural law (the rules an agent run is bound by), a fuel budget, and verification built in. You design what governs a run, what it's allowed to spend, and what proves it worked.
Verification the agent cannot author. A generative model cannot reliably grade its own output, so a verifier that shares the generator's context will launder its own mistakes. The architecture is external truth anchors, regression corpora, and oracle-separated checkers - a separate judge that never sees what the builder saw. This separation is the whole game, and it's also the company's thesis: verified truth a model can't fake.
Token spend judged by what it moved, not what it cost. Every run is instrumented for the outcome it produced, and budget gets redirected toward what's working while the run is still going. We're quality-maximalist: the expensive thing is a redo cycle, never tokens.
A shared knowledge base the agents stand on - a brain of 8,600+ indexed documents your automations query to answer their own questions, governed by the same architectural law your work is. Your systems read from it, feed it, and are bound by it.
Architecture that moves weekly. We built this harness before unattended runs were even possible at scale, and the ground keeps shifting. You'll model where it's going and act without waiting for a brief.

If reading that energizes you, keep going. If it feels overwhelming or underspecified, this isn't the right fit.

What You Will Own

The agent harness for the Office of the CEO. Recruiting evaluations, merchant discovery, ops automation - the run designs, the runtime they execute in, and the architectural law that governs them. When a new automation is needed, you decide how it runs, what governs it, and what proves it correct.
Verification that scales as the work compounds. Ad-hoc human review collapses somewhere around 100-150 artifacts a day, and we're heading straight through that ceiling. You build what replaces it: regression corpora, oracle-separated checkers, sampling protocols, and escalation paths that put a human in the loop only where real judgment is needed.
A liveness check on every automation. Each system ships with a deterministic companion that proves it's alive and correct - you own the standard and the coverage. Nothing runs in production without one; nothing dies unnoticed.
Token economics and model-routing for the fleet. You decide which model runs which job and why, pointing real compute at business outcomes and adjusting the spend mid-run. You'll own falsifiable outcomes - each with an evidence test a stranger could run.
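
To make the liveness idea concrete, here is a minimal sketch of what a deterministic companion check could look like. It is an illustration only, not Product.ai's implementation: the `Heartbeat` record, its field names, the automation names, and the four-hour thresholds are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Heartbeat:
    """Last known sign of life for one automation (hypothetical record shape)."""
    automation: str
    last_success: datetime   # when the run last produced verified output
    max_silence: timedelta   # longest gap tolerated before alarming


def stale_automations(heartbeats, now):
    """Return the names of automations whose silence exceeds their limit.

    Deterministic: the same heartbeats and the same `now` always give the
    same answer, and no model is involved in deciding what is alive.
    """
    return sorted(
        hb.automation
        for hb in heartbeats
        if now - hb.last_success > hb.max_silence
    )


if __name__ == "__main__":
    # Fixed clock and made-up fleet, so the example is reproducible.
    now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    fleet = [
        Heartbeat("recruiting-evaluation", now - timedelta(minutes=30), timedelta(hours=4)),
        Heartbeat("merchant-discovery", now - timedelta(hours=9), timedelta(hours=4)),
    ]
    print(stale_automations(fleet, now))  # prints ['merchant-discovery']
```

The point of the sketch is that the alarm depends only on timestamps and a fixed threshold, so a pipeline that stops quietly still shows up in the stale list.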

Who You Are

You form working models of running systems on your own. You can read a pipeline you've never seen and sketch its failure modes the same day, notice where your model is wrong, and update fast. When a build comes out wrong, you fix the spec that produced it, not just the symptom in front of you. You write clearly, because clear writing is evidence of clear thought - and here, what you write becomes the law your agents execute.

You move between architecture and implementation without getting stuck at either altitude: designing a verification gate in the morning, shipping it alarmed and instrumented by the afternoon. You have strong, earned opinions about guardrails, evaluations, and agent behavior, and you make good calls in the gray instead of queuing questions. High agency is your resting state.

You can do this job by hand, and prove it - that depth of craft is exactly what lets you direct agents and still trust the result. You've built automations that ran in production for months, and you can say precisely how you knew when one was failing. Evaluation harnesses, CI gates that actually block, data pipelines with verification companions, agent systems with regression suites. You treat agents as leverage you verify, not autocomplete you trust - depth is what lets you trust the verdict. We care about the artifact and the reasoning behind it far more than where you built it or what's on your diploma.

Who this isn't for. This isn't the seat for everyone. It's wrong if your code is whatever the model handed you and you couldn't say why it's right - directing agents without depth of your own breaks down fast here. It's wrong if you're comfortable letting an agent grade its own work. It's wrong if you work best as a watched assistant or want a ticket queue to execute and report done. It's wrong if you think mainly in projects, timelines, and programs - the steering here happens in real time, mid-run. And it's wrong if you mostly optimize for logos on a resume. You'll be happiest if you ship the verification companion with the feature, fix the spec instead of the symptom, and want your visibility to come from registered architecture decisions and outcomes moved - not hours, meetings, or activity.

How We Evaluate

We don't run traditional systems-engineering interviews.

Written artifact. Submitted with your application. Show us a system you built and the hardest failure you personally diagnosed in it - what broke, how you found it, and what you changed. A postmortem, a monitoring design, a runbook, code, a live dashboard. This is the first filter, and writing quality plus depth of diagnosis are what we read for - a real story of a failure you owned is the one thing we have no other way to tell from a fake.
Video screen. Brief and async: 5-6 questions, about 15 minutes total, done whenever works for you.
Calls with company stakeholders. Short conversations with key members of the team.
Conversation with the founder. Chemistry and comprehension - can you model the system you just read about?
Paid work trial. One to two weeks of real work in our real environment. We watch how you ground yourself, whether you write the spec before the build, how you verify what your agents produce, and whether your self-assessment is honest.

Compensation & Ownership

Total first-year comp: $250,000-$500,000 (base + equity + profit sharing). We hire at two levels and place you by your demonstrated work, not your résumé: an Engineer level (≈$250K-$350K total / $170K-$220K base) and a Senior Engineer level (≈$400K-$500K total / $300K-$340K base). We set your level from what you've actually built - a sharp builder early in their craft and a seasoned architect are both welcome, each at the level their work earns.

Ownership is real: Profits Interest Units (Class B, $0 strike, capital-gains treatment), annual profit sharing from free cash flow, annual tender liquidity, 100% family premium coverage, and an effectively unlimited token budget steered by ROI.
