Satori Analytics

DevOps Engineer (GCP)

Posted 11 Days Ago
Remote
Hiring Remotely in Greece
Mid level
Own and evolve GCP-based infrastructure for an AI evaluation platform: manage Terraform, GKE, databases, CI/CD, observability, secrets, and cost/reliability. Collaborate with backend, ML, and frontend teams to make deployments repeatable, secure, and reliable.
The summary above was generated by AI

Are you passionate about AI? 🤖

At Satori Analytics, we aim to change the world one algorithm at a time by bringing clarity to global brands through Data & AI. From cloud-based ecosystems for fintech to predictive models for airlines, our cutting-edge solutions cover the entire data lifecycle—from ingestion to AI applications.

As a fast-growing scale-up, our team of 100+ tech specialists—including Data Engineers, Data Scientists, and more—delivers innovative analytics solutions across industries like FMCG, retail, manufacturing and FSI. Join us as we lead the data revolution in South-Eastern Europe and beyond!

Together with a partner company, we're looking for a DevOps / Platform Engineer to own and evolve the infrastructure that keeps an AI agent evaluation platform reliable, observable, secure, and fast to ship to. You'll work closely with backend, ML, and frontend engineers to make deploying and operating services boring, repeatable, and safe.

What Your Day Might Look Like:

  • Cloud infrastructure as code: Own and extend our Terraform estate across multiple GCP environments (base, core, obs, dev, test, prod), including GKE clusters, Cloud SQL (Postgres/MySQL), networking, buckets, and IAM. Drive the in-progress "Neo" platform rollout and the cutover/retirement of legacy infrastructure.
  • Kubernetes & containers: Manage workloads on GKE, maintain Dockerfiles and Helm-style application configs for ~10 backend services, and tune autoscaling, resource limits, and pod disruption budgets.
  • CI/CD: Maintain and improve our GitHub Actions pipelines, including PR checks (Python/JS lint, type-check, tests), Terraform prechecks, image builds and pushes, auto-deploy, and DB-migration labelling/gating. Reduce build times and flakiness, and make deploys self-service for product teams.
  • Data & messaging infrastructure: Operate Postgres, Redis, and Celery-based async workers; manage Alembic migrations, queue health, and backpressure for long-running simulation jobs.
  • Observability: Own our monitoring stack — Grafana dashboards, ClickHouse, Langfuse (LLM tracing), and Celery queue metrics. Build alerting and SLOs so we catch issues before customers do.
  • Security & secrets: Manage secret distribution, least-privilege IAM, and remediation tracking. Partner with engineering teams to address findings from our security assessment process.
  • Cost & reliability: Keep an eye on cloud and LLM-proxy (LiteLLM) spend, right-size resources, and improve the resilience of the simulation and evaluation pipelines.

You'll work with:

  • Cloud: Google Cloud Platform (GKE, Cloud SQL, GCS, IAM); some AWS / IBM footprint
  • IaC: Terraform (>= 1.14), multi-environment root modules
  • Containers/orchestration: Docker, docker compose (local), Kubernetes / GKE
  • CI/CD: GitHub Actions
  • Backend: Python 3.13+ (managed with uv), Celery, FastAPI-style HTTP APIs; Node/Express services
  • Data: PostgreSQL, MySQL, Redis, ClickHouse
  • Observability: Grafana, Langfuse, custom Celery metrics
  • LLM infra: LiteLLM proxy

Requirements

Your Superpowers 🚀

  • 3+ years in DevOps / SRE / Platform Engineering, or strong backend experience with heavy infra ownership.
  • Solid hands-on Terraform (modules, state, multi-environment) and cloud experience (GCP preferred; AWS/Azure transferable).
  • Production Kubernetes experience: deployments, services, autoscaling, debugging pods, rollouts/rollbacks.
  • Strong Docker fundamentals and comfort writing/optimising Dockerfiles.
  • CI/CD pipeline design and maintenance (GitHub Actions, or equivalent like GitLab CI / CircleCI).
  • Comfortable scripting and reading code in Python and/or Bash; able to navigate a polyglot monorepo.
  • Operational experience with relational databases and managed database services (migrations, backups, performance).
  • A reliability mindset: monitoring, alerting, incident response, and writing runbooks.

Bonus points for:

  • Experience operating Celery / distributed task queues and Redis at scale.
  • Familiarity with LLM/AI infrastructure (model proxies, GPU scheduling, token/cost management).
  • Observability tooling depth (Grafana, Prometheus, ClickHouse, OpenTelemetry, Langfuse or similar tracing).
  • Security/compliance experience (IAM hardening, secret management, vulnerability remediation).
  • Cost-optimisation experience for cloud + third-party API spend.
  • Experience supporting a monorepo with multiple language ecosystems and editable/internal package dependencies.

Benefits

Perks on Perks

  • Competitive salary.
  • Training budget to level up your skills from top tech partners like Microsoft, AWS, Salesforce, and Databricks – whether it’s certifications or courses, we’ve got you covered.
  • Private insurance, top-tier tech gear, and the chance to work with a stellar crew.

Ready to create some data magic with us? Hit that apply button and let’s get started. ✨
