
Blackbird.AI

Staff Data Engineer

Posted 2 Days Ago
Remote
Hiring Remotely in United States
Senior level

Blackbird.AI helps organizations discover emergent threats and stay one step ahead of real-world harm through our AI-powered Narrative and Risk Intelligence Platform. Our commitment is to prioritize safety and security, providing the tools to identify potential risks and proactively ensure a safer environment. No matter the job or where it's located, we're all connected by a shared vision: to lead and enhance the landscape of risk intelligence.
As a Staff Data Engineer, you will play a critical role in architecting and scaling our data platform and AI/ML processing infrastructure. You'll be a technical leader responsible for our entire data ecosystem, from ingestion pipelines that process diverse data sources to the lakehouse architecture that powers our narrative analysis capabilities. You'll architect systems that seamlessly support both batch and streaming data patterns while building real-time alerting on generated insights.

You'll work at the intersection of data engineering, AI-powered data transformation, and platform engineering, making architectural decisions that will shape our ability to detect misinformation, disinformation, and narrative attacks at scale while managing costs effectively. A key aspect of this role involves building intelligent pipelines that use traditional AI and generative AI to cluster, enrich, classify, and extract insights from data as it flows through our system.

As a Staff Data Engineer, you will:

  • Design and implement scalable data platform architecture on Databricks, supporting both batch and streaming ingestion
  • Build robust, fault-tolerant data ingestion pipelines that integrate with multiple third-party APIs and data providers
  • Design and implement AI-powered enrichment stages within pipelines—applying ML clustering, generative AI summarization, classification, and entity extraction to transform raw data into actionable intelligence
  • Build analytical systems with full-text search capabilities using Elasticsearch for rapid querying and analysis of enriched data
  • Work with AI/ML researchers to implement, integrate, and scale AI processing
  • Expose data platform capabilities as APIs and other interfaces for downstream consumption by applications and services
  • Optimize data lake and lakehouse architecture for performance, cost-efficiency, and scalability
  • Design and implement data quality frameworks, monitoring, and alerting systems
  • Design efficient architectures for calling external AI APIs and managing rate limits, costs, and reliability
  • Architect solutions with cost-efficiency as a first-class concern, implementing monitoring and optimization strategies for compute and storage
  • Make critical build-vs-buy decisions and establish architectural standards for the data organization
  • Mentor engineers and elevate the team's technical capabilities through code reviews, design discussions, and knowledge sharing

Requirements
  • 8+ years of software engineering experience with 5+ years focused on data platforms or data engineering
  • Deep expertise with Databricks, Apache Spark, and data lakehouse architectures
  • Strong experience building and operating data pipelines at scale (terabytes of data or more)
  • Experience integrating AI/ML capabilities into data pipelines (clustering, LLM APIs, classification, summarization)
  • Proficiency in Python, dbt, and SQL for data processing and pipeline development
  • Experience with both batch and streaming large-scale data processing patterns
  • Strong understanding of cloud platforms (AWS, Azure)
  • Excellent communication skills and ability to mentor engineers

Preferred Qualifications:

  • Experience designing both batch and streaming/near real-time data architectures
  • Proficiency with Elasticsearch for building analytical systems with full-text search capabilities
  • Hands-on experience with LLM APIs and understanding of rate limiting and cost optimization
  • Experience with Agentic AI, context engineering, and evaluation
  • Background in trust & safety, security, or content moderation domains
  • Experience with data observability tools and building comprehensive monitoring systems
  • Prior experience at a startup or fast-paced environment
  • Experience applying agentic coding tools to day-to-day development
  • Familiarity with Databricks' Lakeflow, Agent Bricks, and vector databases

What We Value:

  • Technical Excellence: You write clean, maintainable code and make thoughtful architectural decisions
  • Pragmatism: You balance perfection with shipping and know when to optimize vs. when "good enough" is sufficient
  • Ownership: You take end-to-end responsibility for your systems and their reliability
  • Collaboration: You elevate those around you and thrive in a team environment
  • Impact Orientation: You focus on outcomes and business value, not just technical elegance
  • Learning Mindset: You stay current with evolving technologies and continuously improve your craft

We've outlined specific skills, experience, and requirements for this position, but don't stress if you don't meet every single one. Our Talent Team is dedicated to discovering exceptional individuals, and they might identify a relevant aspect of your background that suits this role or another opportunity within Blackbird.AI.
If you are passionate about the role, please apply anyway.


Benefits
  • Competitive compensation package, 401(k), and equity - everyone has a stake in our growth!
  • Comprehensive health benefits for you and your loved ones, including wellness days and monthly wellness reimbursements - an apple a day doesn't always keep the doctor away!
  • Generous vacation policy, encouraging you to take the time you need - we trust you to strike the right work/life balance!
  • A flexible work environment with opportunities to collaborate with your team in person - you can have it all!
  • Inclusion and Impact - soar to new heights!
  • Professional development stipend - never stop learning!

Top Skills

Spark
AWS
Azure
Data Lakehouse Architectures
Databricks
dbt
Elasticsearch
Python
SQL

Blackbird.AI San Francisco, California, USA Office

325 Pacific Avenue, San Francisco, CA, United States, 94111
