Pulse (runpulse.com)

Software Engineer, Inference

Reposted 6 Days Ago
In-Office
San Francisco, CA, USA
150K-230K Annually
Mid level
Develop and optimize low-latency inference services for OCR and multimodal models, focusing on performance engineering and model serving. Implement autoscaling and capacity planning while building performance dashboards.
The summary above was generated by AI

Overview


Pulse is tackling one of the most persistent challenges in data infrastructure: extracting accurate, structured information from complex documents at scale. Our approach to document understanding combines intelligent schema mapping with fine-tuned extraction models, and it succeeds where legacy OCR and other parsing tools consistently fail.

We are a small, fast-growing team of engineers in San Francisco powering Fortune 100 enterprises, YC startups, public investment firms, and growth-stage companies. We are backed by tier-1 investors.

What makes our tech special is our multi-stage architecture:

  • Layout understanding with specialized component detection models

  • Low-latency OCR models for targeted extraction

  • Advanced reading-order algorithms for complex structures

  • Proprietary table structure recognition and parsing

  • Fine-tuned vision-language models for charts, tables, and figures

If you are passionate about the intersection of computer vision, NLP, and data infrastructure, your work at Pulse will directly impact customers and shape the future of document intelligence.

What we are looking for

  • Able to work 5 days a week in our San Francisco office

  • Eager to learn and adapt quickly

  • Prior startup or founding experience is a plus

About the Role
Specialize in low-latency, high-throughput inference for OCR and multimodal models. Own profiling, batching, and autoscaling across single-tenant and multi-tenant environments.

Responsibilities

  • Build inference services with smart batching and caching

  • Optimize kernels, tokenization, and model graphs

  • Evaluate tradeoffs among vLLM, TensorRT-LLM, and Triton

  • Implement autoscaling and admission control with clear SLOs

  • Own performance dashboards and capacity planning

Requirements

  • 3+ years in performance engineering or ML systems

  • Strong Python, plus C++ or CUDA exposure

  • Experience with GPU profiling and model serving

Nice to have

  • Experience reducing p95 latency and cost in production ML systems

Sponsorship
Sponsorship available.

Compensation and benefits
Competitive base salary plus equity, performance-based bonus, relocation assistance for Bay Area moves, daily meal stipend, medical, vision, and dental coverage.

HQ

San Francisco, California, United States

Similar Jobs

Yesterday
In-Office
Sunnyvale, CA, USA
Senior level
Artificial Intelligence
Lead design and implementation of a globally distributed inference orchestration platform. Drive platform direction, reliability, performance, and production incident leadership. Write and review production code, make high-consequence architectural decisions, and mentor senior engineers while partnering with ML, product, infrastructure, and cloud teams.
Top Skills: C++, Certificates, CI/CD, Cloud, Go, GPU-Accelerated Workloads, Kubernetes, Kubernetes CRDs, Kubernetes Operators, Logging, ML Inference Infrastructure, Model Serving Systems, mTLS, Observability (Metrics, SLOs, TLS, Tracing)
2 Days Ago
In-Office
Palo Alto, CA, USA
135K-185K Annually
Junior
Aerospace • Other
Design, build, and optimize a high-performance, highly-available LLM inference platform. Work across the stack from distributed infrastructure (load balancing, autoscaling, batching, caching) to low-level GPU/kernel optimizations, tooling, CI/CD, SDKs, and observability to deliver reliable, low-latency inference for internal SpaceX applications.
Top Skills: Build Systems, C++, CI/CD, ClickHouse, Docker, Go, gRPC, Kubernetes, MongoDB, Monitoring, Postgres, Profiling, Python, Rust, SGLang, TensorRT-LLM, Triton, vLLM
7 Days Ago
Hybrid
San Francisco, CA, USA
Senior level
Artificial Intelligence • Cloud • Generative AI • Infrastructure as a Service (IaaS)
Design, implement, and optimize GPU kernels, kernel compiler, memory planner, and runtime for low-latency generative AI inference. Analyze performance bottlenecks across hardware and software, collaborate with infrastructure teams, and maintain production profiling, benchmarking, and validation tooling while supporting new model architectures and multi-GPU strategies.
Top Skills: Benchmarking, C++, Compiler Infrastructure, Diffusion Models, Distributed Inference, GPU Kernels, Kernel Compiler, Multi-GPU, Profiling, Python, Runtime Systems, Transformer Models

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms, and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology, thanks to a thriving art and music scene, excellent food, and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

  • Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Google, Apple, Salesforce, Meta
  • Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
  • Funding Landscape: $50.5 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
  • Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine