Krea

Engineer, Supercomputing & Distributed Systems

Reposted 22 Days Ago
In-Office
San Francisco, CA, USA
Entry level

About Krea

At Krea, we are building next-generation AI creative tools.

We are dedicated to making AI intuitive and controllable for creatives. Our mission is to build tools that empower human creativity, not replace it.

We believe AI is a new medium that allows us to express ourselves through various formats—text, images, video, sound, and even 3D. We're building better, smarter, and more controllable tools to harness this medium.

Supercomputing / AI Infra at Krea

We build and operate the infrastructure for Krea's research and inference: distributed training, Kubernetes clusters with 1000+ GPUs, petabyte-scale data pipelines, and more. We build a lot of this from scratch, including custom distributed datastores, job orchestration systems, and streaming pipelines that replace tools like Kafka and Ray for modern AI workloads at scale.

Example projects:

Distributed data systems

  • Design multi-stage pipelines that turn petabytes of raw data into clean, annotated datasets

  • Run classification models on billions of images

  • Deploy and combine LLMs to caption massive multimedia data

GPU infrastructure

  • Manage distributed training and inference on 1000+ GPU Kubernetes clusters

  • Solve orchestration and scaling for large-scale GPU job processing

  • Scale workloads and research between clusters in multiple datacenters

Distributed training

  • Profile and optimize dataloaders streaming thousands of images per second

  • Profile and debug InfiniBand networking on huge training runs

  • Build fault tolerance systems for large-scale pretraining

  • Collaborate with researchers on evolving RL infrastructure

Applied ML pipelines

  • Find clean scenes in millions of videos using distributed shot-boundary detection

  • Customize and train models that filter billions of images by answering questions like "is this a screenshot?"

  • Build the systems that bridge raw cluster capacity and research output

Who we're looking for:

Systems people. If you've read a blog post about InfiniBand debugging or building a custom distributed database and thought "I want to do that" — this is that team.

You'll spend your time working heavily with Python, Kubernetes, PyTorch, and data tools like DuckDB and Arrow. It's OK if you don't have K8s or ML experience. The main thing we hire for is an intuition for distributed systems and a great mental model of how systems interact and function under different conditions.

Strong candidates may have experience with…

  • Python, PyArrow, DuckDB, SQL, massive relational databases, PyTorch, Pandas, NumPy…

  • Kubernetes

  • Designing and implementing large-scale ETL systems

  • Fundamental knowledge of containerization, operating systems, file systems, and networking

  • Distributed systems design

  • Distributed training systems (NCCL, InfiniBand, RDMA)

  • Streaming and event processing systems (Kafka, Pulsar, or similar)

  • PyTorch internals, custom dataloaders, and training infrastructure
