Apollo Research

Backend Software Engineer (Research team)

Posted 4 Days Ago
In-Office
San Francisco, CA, USA
100K-270K Annually
Senior level
Build and maintain backend tools for AGI safety research: eval libraries, orchestration for parallel agentic evaluations, LLM proxy and telemetry, CI optimizations, data warehousing, and researcher-facing tooling. Lead feature development, collaborate with researchers, and promote good software design and reliability.
The summary above was generated by AI
Application deadline: We are actively interviewing and aim to fill this role as soon as we find a suitable candidate.
 
ABOUT THE OPPORTUNITY
 
We’re looking for Backend Software Engineers who are excited to build tools for frontier AGI safety research, e.g. building and maintaining evals libraries and tools for monitoring and controlling our own LLM traffic.
 
REPRESENTATIVE PROJECTS
 
Here is a list of example projects which you might build and ship in your first 6 months.
 
- Internal tooling for efficiently running and analyzing evaluations. For example, a tool that quickly investigates thousands of agentic eval runs in parallel and surfaces interesting information automatically
- Automated evaluation pipelines to minimize the time from getting access to a new model for pre-deployment testing to analyzing the most important results and sharing them
- Orchestration tools that allow researchers to run thousands of agentic evaluations in parallel on remote machines with high security and reliability
- LLM proxy service that enables us to monitor all of our coding agent traffic in real time and identify undesired behavior automatically (in the spirit of Control)
- LLM agents and MCP tools to automate internal software engineering and research tasks, with sandboxes to prevent major failures
- CI pipeline optimisations to reduce execution time and eliminate flaky tests
- Telemetry API and instrumentation of our existing tools, allowing us to monitor usage and improve reliability
- Data warehousing pipeline and service to store thousands of eval transcripts which researchers can study and build datasets from
- Upstream improvements to the Inspect framework and ecosystem, e.g. support for evaluating modern agentic scaffolds

KEY RESPONSIBILITIES

  • Rapidly prototype and iterate on internal tools and libraries for building and running frontier language model evaluations
  • Lead the development of major features from ideation to implementation
  • Collaboratively define and shape the software roadmap and priorities
  • Establish and advocate for good software design practices, codebase health, and coding agent practices
  • Work closely with researchers to understand what challenges they face
  • Assist researchers with implementation and debugging of research code
  • Communicate clearly about technical decisions and tradeoffs

KEY REQUIREMENTS

  • You must have experience writing production-quality Python code
  • We value candidates from diverse backgrounds and recognise that candidates may demonstrate their skills in different ways.
    For example, we might be impressed if you have:
  • Led the development of a successful software tool or product over an extended period (e.g. 1 year or more)
  • Started and built the tech stack for a company, e.g. in a start-up
  • Worked your way up in a large organisation, repeatedly gaining more responsibility and influencing a large part of the codebase
  • Authored and/or maintained a popular open-source tool or library
  • Placed in a prestigious programming competition (IOI, ICPC, etc.)
  • 5+ years of professional software engineering experience
    The following would be a bonus:
  • Experience working with LLM agents or LLM evaluations
  • Infosecurity / cybersecurity experience
  • Experience working with AWS
  • Interest in AI Safety
    If you feel you don’t meet all of these criteria but think you would nonetheless be a good fit for the position, we strongly encourage you to apply. We believe that excellent candidates can come from a variety of backgrounds and are excited to give you opportunities to shine.

LOGISTICS

  • Time Allocation: Full-time
  • Location: This is an in-person role based in our London or San Francisco office. We offer flexible working hours and work-from-home arrangements.
  • Visa sponsorship: We sponsor visas in both the UK and US. Sponsorship isn't guaranteed for every role or candidate, but if we make you an offer, we'll work with you to find the right visa route.

BENEFITS

  • This role offers a market-competitive salary, equity, and competitive benefits.
  • Salary: 100k - 200k GBP (~135k - 270k USD)
  • Flexible work hours and schedule
  • Unlimited vacation
  • Unlimited sick leave
  • Up to 6 months of paid parental leave
  • Comprehensive health, dental and vision insurance
  • Retirement savings with competitive employer matching (e.g. 401(k) for US employees)
  • Lunch, dinner, and snacks are provided for all employees on workdays
  • Paid work trips, including staff retreats, business trips, and relevant conferences
  • A yearly $1,000 (USD) professional development budget

ABOUT APOLLO RESEARCH
 
The rapid rise in AI capabilities offers tremendous opportunities, but also presents significant risks. At Apollo Research, we’re primarily concerned with risks from Loss of Control, i.e. risks coming from the model itself rather than e.g. humans misusing the AI. We’re particularly concerned with deceptive alignment / scheming, a phenomenon where a model appears to be aligned but is, in fact, misaligned and capable of evading human oversight. We work on the detection of scheming (e.g. building evaluations), the science of scheming (e.g. model organisms), and scheming mitigations (e.g. anti-scheming and control). We work closely with multiple frontier AI companies, e.g. to test their models before deployment or collaborate on scheming mitigations. At Apollo, we aim for a culture that emphasizes truth-seeking, being goal-oriented, giving and receiving constructive feedback, and being friendly and helpful. If you’re interested in more details about what it’s like working at Apollo, you can find more information here.
 
ABOUT THE TEAM
 
The SWE team currently consists of Rusheb Shah, Andrei Matveiakin, Alex Kedrik, and Glen Rodgers. Beyond the SWE team, you will interact closely with the research scientists and engineers, who are the primary users of your tools. You can find our full team here.
 
Equality Statement: Apollo Research is an Equal Opportunity Employer. We value diversity and are committed to providing equal opportunities to all, regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, or sexual orientation.
 
INTERVIEW PROCESS
 
Please complete the application form with your CV. A cover letter is optional. Please also feel free to share links to relevant work samples.
 
About the interview process: Our multi-stage process includes a screening interview, a take-home test (approx. 2 hours), 3 technical interviews, and a final interview with Marius (CEO). The technical interviews will be closely related to tasks the candidate would do on the job. There are no leetcode-style general coding interviews. If you want to prepare for the interviews, we suggest working on hands-on LLM evals projects (e.g. as suggested in our starter guide), such as building LM agent evaluations in Inspect.
