Database Reliability Engineer
Udemy is the leading global marketplace for teaching and learning, connecting students everywhere to the world’s best online classes. We are looking for a Senior Datastore Reliability Engineer to join our Datastore Infrastructure Engineering Team. With a commitment to innovation, we embrace automation and agile culture, love technical challenges and are eager to adopt new technologies and tools.
We are responsible for all aspects of MySQL, Redis Enterprise, Memcached, Cassandra and RabbitMQ datastores and proxies like ProxySQL and MCRouter across environments including production. Our primary tools are Terraform, Ansible, Atlantis, Datadog, Percona Management Tools, GitHub and Python.
We value teamwork, good humor, a strong sense of ownership, technological curiosity, and a desire to learn. You will work with a wide range of relational and NoSQL data sources, improving the reliability, consistency and performance of our growing datastore ecosystem.
Here's what you will be doing:
- Analyze, improve and automate datastore maintenance flows, backup and recovery procedures, capacity management, and access monitoring
- Proactively respond to production infrastructure alerts and warnings, mitigate production issues as they arise and transform incident lessons into automation, documentation and monitoring
- Work with SRE and feature teams to review and deploy changes to the production environment, and advise on datastore availability and scalability policies and best practices
- Develop and enhance datastore production environment monitoring, observability and management capabilities using existing and new tools and platforms
- Manage replication and failover topology for MySQL, Memcached, Redis and RabbitMQ
- Perform code reviews and answer datastore related infrastructure questions
- Participate in the on-call rotation
We’re excited about you because you have:
- Passion for performance, observability, availability and scalability
- Expert-level knowledge, administration skills and hands-on experience with one or more of the following datastores: MySQL, Redis, Cassandra, RabbitMQ, Memcached
- Solid software engineering skills with proficiency in at least one programming language such as Python (preferred), Java or Go
- Familiarity with infrastructure automation and configuration management tools such as Terraform, Ansible or Puppet
- Experience with automated testing and continuous integration when dealing with infrastructure as code (e.g. Molecule, Atlantis)
- Experience with containers and container orchestrators such as Kubernetes or Mesos Marathon
- Good understanding of Linux/Unix fundamentals and debugging skills
- Experience designing, building and operating distributed data storage systems with hundreds of nodes
- 5+ years of experience managing large-scale database systems in cloud (AWS preferred) and/or hybrid environments
We believe anyone can build the life they imagine through online learning. Today, more than 50 million students around the world are advancing their careers and passions by exploring and mastering new skills on Udemy, and expert instructors are able to share their knowledge with the world. Through our global marketplace and our solutions for businesses and governments, we connect people everywhere with the skills they need for success in work and life. We’re a close-knit bunch that enjoys problem-solving and collaboration, and we share a serious belief in the power of learning and teaching to change lives. Udemy’s culture encourages innovation, creativity, passion, and teamwork. We also celebrate our milestones and support each other every day.
Founded in 2010, Udemy is privately owned and headquartered in San Francisco’s SOMA neighborhood with offices in Denver (Colorado), Dublin (Ireland), Ankara (Turkey), Gurugram (India), and São Paulo (Brazil).