Senior Site Reliability Engineer (Remote)

Sorry, this job was removed at 4:23 a.m. (PST) on Tuesday, April 27, 2021

Labelbox’s mission is to build the best products for humans to advance artificial intelligence. Real breakthroughs in AI depend on the quality of the training data. Our training data platform enables organizations to improve their machine learning models far more quickly and accurately. We are determined to build software that is more open, easier to use, and singularly focused on getting our customers to performant ML faster.


Current Labelbox customers are transforming industries within insurance, retail, manufacturing/robotics, healthcare, and beyond. Our platform is used by Fortune 500 enterprises including Allstate, John Deere, Bayer, Warner Brothers and leading AI-focused companies including FLIR Systems and Caption Health. We are backed by leading investors including Andreessen Horowitz, B Capital, Gradient Ventures (Google's AI-focused fund), and Kleiner Perkins.

Your Day to Day

  • Executing on an infrastructure roadmap, collaborating with team members across engineering, product, and design
  • Maintaining and improving our monitoring and alerting for both our SaaS and on-premises offerings
  • Managing log and metrics collection using tools such as Elastic Stack and Datadog
  • Maintaining and improving visibility into infrastructure and software services by incorporating logging and metrics into concise, useful, and informative dashboards
  • Enabling development teams to monitor, analyze, and manage their services
  • Building out tools or frameworks to improve the overall development experience
  • Identifying and measuring key performance metrics for our infrastructure and defining service-level objectives (SLOs)
  • Participating in our on-call rotation

About You

  • 4+ years of relevant experience in an SRE or DevOps role
  • Experience with modern Linux systems and running services in production
  • Experience managing infrastructure in a major public cloud (AWS, GCP, Azure)
  • Experience with Kubernetes or other container orchestration systems
  • Experience with CI/CD tools and technologies such as Codefresh, Jenkins, and TeamCity
  • Experience with and an understanding of complex distributed systems
  • Experience using statistical analysis and/or data science to extract meaningful insights from complex data
  • Experience working under Agile / Scrum methodologies

Bonus

  • Experience with automation tools and technologies such as shell scripting, Terraform, and Helm
  • Experience deploying, maintaining, and automating services in on-premises environments
  • Coding skills in languages such as Java or Golang
  • Experience with database technologies such as PostgreSQL, MySQL, or other RDBMS
  • Experience with other open source technologies such as Redis, Elasticsearch, and RabbitMQ
  • Experience with SOC 2, FedRAMP, HIPAA, and other compliance-related programs
  • Experience managing multiple Kubernetes clusters / clusters spanning multiple cloud providers
  • Advanced knowledge of infrastructure management in GCP



Location

510 Treat Ave., San Francisco, CA 94110
