Senior Platform AI Engineer

Drata

Reposted 22 Days Ago

Hybrid

San Francisco, CA, USA

192K-260K Annually

Senior level

The Senior Platform AI Engineer will develop production infrastructure for AI features, focusing on server architecture, agent orchestration, and AI model lifecycle management. The role requires building efficient systems and collaborating with various engineering teams to optimize workflow and enhance deployment processes.

The summary above was generated by AI

Our Mission & Values:
At Drata, we help companies earn and keep the trust of their users, customers, partners, and prospects. We’re the proof layer that shows great companies deserve the trust they aim to build.

We live our values every day. Built on Trust means consistency is everything. Act with Integrity by always doing the right thing. Being Customer-Obsessed keeps the people we serve at the center of our work. Competitive Fire drives us to push ourselves harder than anyone else. Diversity brings unique perspectives that lead to better solutions. Automation First ensures we save time and money by making efficiency a priority.

Our Culture & Work Style 🚀

At Drata, we’re not just building software - we’re building a mindset. Everything we do springs from:

Be a Driver (Owner‑Operator Mentality): Own your work. Improve relentlessly. Deliver results.
Move at Drata Speed (Precision & Velocity): Fast decisions. Quick learning. Immediate impact.
Stay Mission-Driven (Customer‑Obsessed): Challenge assumptions. Deliver value. Stay hungry.

We pair that high-velocity culture with a thoughtful hybrid model because we believe flexibility and collaboration both matter. That’s why in the Bay we come together in-office Tuesday through Thursday, our high‑impact collaboration days, when teams align, strategize, and innovate. Mondays and Fridays are flexible, giving you space for focused work, balance, and autonomy.

If you thrive when you’re empowered, energized, and working with smart, mission-driven people, you’ll feel at home here.

Why Join The Drata Team?

The best way to understand the Driver’s Mindset is to see it in action. We’re an award-winning, mission-driven team of 600+ people worldwide, united by a culture that values trust, speed, and continuous growth.

See the Speed: Watch our CEO, Adam Markowitz, discuss the hyper-growth journey, from $0 to $100M ARR in just four years.
Hear the Voice of the Team: Explore our "Life at Drata" page for employee testimonials on our collaborative culture and the growth opportunities available.
Experience the Impact: See why we are consistently recognized on Fortune's Best Workplaces lists.
Connect with Us on Socials: LinkedIn - follow us for company updates, employee stories, and career news.

Job Summary:

Drata's AI Platform team builds the production infrastructure that powers AI features across our compliance platform — from MCP servers that make Drata's data available to AI agents, to LLM workflow orchestration that automates SOC 2, TPRM, and policy analysis. You'll own the systems that sit between our AI models and our customers: tool definitions that agents actually understand, deployment pipelines that handle model upgrades without breaking output quality, and orchestration layers that manage multi-step agent workflows with persistent state.

This is not a traditional infrastructure role. You'll debug prompt templates alongside Terraform modules. You'll design API schemas optimized for LLM token budgets, not just HTTP throughput. When a model upgrade changes behavior across 15 workflows, you'll assess quality impact — not just confirm the containers are healthy.

You'll work closely with our agent developers, product engineers, and an embedded SRE partner, sitting at the intersection of AI development and production reliability.

Our north star is simple: minimize the time it takes to launch a new agent in production. You're someone who asks "are we solving the right problem?" before writing the first line of code, who builds systems that make five other engineers faster, not just yourself, and who's equally proud of what they chose not to build.

What you'll do:

MCP Server Development & AI-Optimized API Design

Design and build MCP (Model Context Protocol) servers that expose Drata's platform to AI agents. This means making architectural decisions about tool granularity, naming conventions for agent disambiguation, response compression for LLM context windows, and workspace isolation for multi-tenant access. You'll own the protocol layer that determines whether agents can reliably find and use the right tools — writing semantic parameter descriptions, contextual hints, and tool schemas that optimize for model comprehension, not just developer ergonomics.

Agent Orchestration & Workflow Infrastructure

Build and operate the infrastructure for deploying multi-step agent workflows — state management across complex reasoning chains, tool routing and execution runtimes, and long-running agentic processes that persist over time. Own the orchestration layer that coordinates agent planning, tool calls, and human-in-the-loop patterns. Design systems that handle agent failure modes gracefully: retries on ambiguous tool outputs, fallback strategies when models produce unexpected results, and observability into multi-step execution traces.

LLM Operations & Model Lifecycle Management

Own the operational side of our LLM workflows: model upgrades across production pipelines (assessing behavior changes, not just version bumps), prompt versioning and A/B testing, AI workflow deployment with custom container compatibility, and output quality monitoring.
Manage token capacity planning — understanding model costs, context limits, batching strategies, and rate governance across workflows. When an AI workflow fails, you'll investigate whether it's a prompt template issue, a model behavior change, or an infrastructure problem. Making that distinction requires understanding both systems.

Production AI Infrastructure & RAG Systems

Operate and evolve our production AI stack: vector storage and indexing (designing chunking strategies and metadata schemas for retrieval quality), document parsing pipelines, multi-region deployment, and cost optimization across LLM providers. You'll make RAG architecture decisions — embedding strategies, retrieval filtering, data model coordination — where the engineering challenge is search quality, not just system uptime. Implement caching layers and token-aware request routing to manage spend as AI workloads scale.

Platform Enablement & Developer Experience

Build CI/CD patterns specific to AI workflows (reproducible deployments, SDK version compatibility, workflow rollback semantics). Own AI-specific observability — token usage dashboards, response quality metrics, agent execution traces, and cost-per-workflow tracking alongside traditional infrastructure monitoring. Enable product engineering teams to ship AI features faster by providing reliable, well-documented platform primitives.

What you'll bring:

7+ years of software engineering experience, with 2+ years building or operating AI/ML infrastructure in production. You're strong in Python (our AI services are built in Python), with TypeScript/Node.js a nice-to-have. You've worked with LLM APIs, vector databases, or AI orchestration platforms and understand the difference between "the service is up" and "the model output is good." You're comfortable across the stack: writing Terraform one day, debugging a prompt template the next, and designing an agent orchestration framework the day after.

Specifically, you bring experience in several of these areas: cloud infrastructure (AWS preferred — ECS, S3, Bedrock), container orchestration, infrastructure-as-code, CI/CD pipeline design, API design, workflow orchestration engines, and distributed systems. You've worked with at least some AI-specific tooling: LLM APIs (Claude, OpenAI, etc.), model serving frameworks (vLLM, SageMaker, etc.), vector databases, embedding pipelines, prompt management platforms, or agent frameworks.

You communicate clearly about technical tradeoffs, especially when explaining AI-specific infrastructure decisions to stakeholders who think in terms of traditional reliability engineering. You own what you see broken, not just what's assigned to you, and you can spot when an architecture decision will fail at scale and say so early, clearly, and with an alternative.

How we support you:
At Drata, our people are our strongest advantage—and we prove it with support that exceeds industry standards. Our total rewards package is designed to power your well-being, accelerate your growth, and keep your work-life balance thriving.

Explore how we invest in your Life at Drata.

Shared Success: We provide stock equity to ensure that as the company grows, you share directly in that success. Equity gives every employee a sense of ownership and the opportunity to celebrate our wins together—because your contributions don’t just support our progress; they help drive our collective success.
Health & Wellness: Up to 100% employer-paid premiums for medical, dental, and vision coverage for employees and their dependents, along with comprehensive wellness benefits and healthcare concierge services designed to support your needs beyond traditional insurance.
Financial Well-being: A comprehensive suite of financial benefits, including a 401(k) plan, company-paid life and disability insurance, tax-advantaged spending accounts, and a range of discounted voluntary offerings to help you customize and strengthen your overall financial position.
Family Support: We want to support you in life's most important moments, so we offer paid parental leave after six months of employment. Employees also receive access to Kindbody fertility and family-building benefits and dedicated leave specialists who help guide you through the entire process.
Growth & Development: Generous annual stipends for both professional and personal development, empowering you to invest in your continued growth. You’ll also have access to a wide range of internal learning opportunities, ensuring you can build new skills, deepen your expertise, and advance your career with confidence.
Time Off & Flexibility: We believe that to do your best work, you should get the time you need for rest, rejuvenation and recovery. Drata offers a flexible vacation policy, paid holidays, and other perks to recharge.

This role will receive a competitive base salary, benefits, and stock, typically in the form of Restricted Stock Units (RSUs). The applicable salary range for this role is: $192,000 - $259,800.

A variety of factors are considered when determining someone’s leveling and compensation, including a candidate’s professional background and experience. These ranges may be modified in the future, and final offer amounts may vary from the amounts listed above.
