Risk Labs Foundation

Senior LLM Systems Engineer

Posted 21 Days Ago
Remote
Hiring Remotely in USA
100K-200K Annually
Senior level
The Senior LLM Systems Engineer will enhance LLM-driven oracle systems, focusing on accuracy, performance, resilience, and operational quality, while building robust evaluation and monitoring tools for production systems.
Why This Role Exists:

We are hiring a Senior LLM Systems Engineer to own and improve the LLM-driven components of our oracle automation stack. This person will focus on the accuracy, performance, resilience, and operational quality of the systems that use models to reason about wide-ranging prediction market rules, evidence, and oracle outcomes.

This is a production systems role, not a research-only or prompt-only role. You will build the evaluations, observability, tooling, fallbacks, and feedback loops that make LLM behavior measurable and dependable in real-world conditions.

What You'll Own:
  • LLM Accuracy: improve prompts, model selection, tool usage, structured outputs, retrieval, and evaluation coverage so the system gets more decisions right over time.

  • System Performance: reduce latency, token usage, and cost while preserving decision quality and operational reliability.

  • Resilience: design validation, retries, fallbacks, uncertainty handling, and human review paths for ambiguous, adversarial, incomplete, or conflicting inputs.

  • Evaluation and Monitoring: build datasets, regression tests, dashboards, traces, and review loops that make model quality visible and prevent repeated failures.

  • Agent and Tooling Architecture: improve agent orchestration and tool use across internal services, APIs, search workflows, databases, and external data sources.

  • Production Operations: help debug live issues, investigate regressions, improve runbooks, and reduce repeated operator friction.


What Success Looks Like:
  • The oracle automation system handles a wider range of market and resolution scenarios with higher measured accuracy.

  • LLM quality is tracked through evaluations and regressions instead of judged only through manual spot checks.

  • Engineers and operators can inspect model behavior, tool usage, reasoning paths, and uncertainty when investigating outcomes.

  • Latency and cost improve without hiding quality regressions.

  • The system fails more gracefully when data is missing, tools fail, sources disagree, or cases are genuinely ambiguous.


Skills & Experience

Required
  • 3+ years of professional software engineering experience in Python, TypeScript, or similar production languages.

  • Hands-on experience building production systems that use LLMs, agents, retrieval, structured outputs, or model-powered workflows.

  • Experience designing evaluations, test datasets, regression checks, quality metrics, or manual review loops for AI systems.

  • Strong debugging ability across APIs, databases, queues, logs, model outputs, and external data sources.

  • Practical understanding of prompt engineering, tool calling, structured output validation, retrieval, and common LLM failure modes.

  • Ability to reason carefully about correctness in uncertain or adversarial environments.

  • High agency, strong ownership, and clear written communication.


Nice to Have
  • Experience with oracle systems, prediction markets, DeFi protocols, or other crypto infrastructure.

  • Experience with UMA, optimistic oracle mechanisms, Polymarket, or similar systems.

  • Experience building agentic systems that use tools, search, browser automation, APIs, or database queries.

  • Experience with LLM tracing, model monitoring, evaluation frameworks, or AI observability tools.

  • Experience optimizing model cost and latency at scale.

  • Experience with Postgres, data pipelines, queue-based systems, background jobs, or event-driven architectures.

  • Familiarity with blockchain operational constraints, especially RPC limits, indexing, event logs, finality, and chain-specific behavior.

  • Experience with GCP, Cloud Run, GitHub Actions, Terraform, or similar infrastructure.


Tech Stack

Our tech stack includes Python, TypeScript, Postgres, GCP, Cloud Run, GitHub Actions, Terraform, React, Node.js, Solidity, and LLM APIs from major model providers.


Compensation and Benefits
  • Pay packages include competitive salaries and meaningful long-term equity participation

  • Salaries for this role range from $100k to $200k (USD)

  • Payment in stablecoins or fiat

  • A culture that shows we care: take vacation when you need it, family care, and training and development (just to name a few)

  • 100% remote, which means we encourage you to create the work environment that you thrive in

  • At least two team-wide offsites a year

Our Values:
  • We value curiosity.

  • We value openness, honesty and directness.

  • We value integrity.

  • We value iterative learning.

  • We value taking smart risks.

  • We value being high agency.


Why Work at Risk Labs?

You’ll work on production AI systems that directly impact live protocol operations, where improvements to accuracy, resilience, and observability materially affect real-world outcomes.

Risk Labs is the core team behind UMA and Across, building infrastructure that pushes crypto forward. We value ownership, curiosity, thoughtful risk-taking, and direct communication.

Closing

Studies show that women and people of colour are less likely to apply unless they meet every qualification. Risk Labs is committed to building a diverse, inclusive, and authentic workplace. If you're excited about this role, even if your experience doesn't align perfectly, we encourage you to apply.

Risk Labs is an equal opportunity employer and does not discriminate based on race, religion, gender, sexual orientation, age, disability, or veteran status.

Our Team and Backers

Our global team blends deep technical expertise with diverse business perspectives. We are backed by investors including Placeholder, Blockchain Capital, Bain Capital, Coinbase, and Dragonfly.
