Site Reliability Engineering - Senior Observability Engineer

Remote
Sorry, this job was removed at 4:00 a.m. (PST) on Friday, March 19, 2021

Summary

This position involves critical duties and responsibilities that must continue to be performed during crisis situations and contingency operations, which may necessitate extended hours of work. In this position, your responsibilities will include system design, configuration, deployment, and operation of Observability systems and tools. These systems include monitoring of services and infrastructure, log collection and analytics, and application performance monitoring (APM). Together, these systems and tools serve as a critical part of Addepar's Cloud infrastructure services. The ideal candidate will have strong cloud platform, Linux systems, and automation experience, and knowledge of running workloads at scale.


Responsibilities

  • Deliver customer-centric service and monitoring solutions using agile and sound risk management processes
  • On-board applications to Enterprise Observability tools and services across Addepar to enable monitoring and alerting with best practices and standard Application Performance Monitoring (APM) tools
  • Write high-quality infrastructure-as-code that automates the provisioning, deployment, scaling, and monitoring of Addepar's Platform as a Service infrastructure
  • Establish customized monitoring dashboards, thresholds, and alerting to fully enable Addepar application and support teams
  • Verify monitoring tool configurations to ensure proper monitoring is in place in production
  • Maintain, develop, and report on metrics related to Critical Incident Response Team activities for monthly business and flash reporting
  • Serve as a subject matter expert on Application Performance Monitoring (APM), logging, and other observability & visualization tools
  • Provide expert consultation and training services to our application development and platform support partners
  • Understand complex application frameworks and flows to assist with developing and implementing custom monitoring solutions
  • Architect reliable ELK logging clusters, Sensu deployments, and OpenTelemetry-compliant distributed tracing solutions
  • Work with a team of experienced engineers to test your ideas and understand the system, and mentor junior team members
  • Build and maintain successful relationships with existing and prospective members, ensuring end-to-end observability of Addepar's platform
  • Excellent problem-solving and critical thinking skills, and ability to function and communicate under pressure
  • Participate in and lead efforts involving incident response and root cause analysis for Site Reliability Engineering
  • On-call rotation support
  • Demonstrate outstanding communication, flexibility, teamwork and leadership
  • Participate in management- and executive-level debriefs, presenting and speaking to KPIs, metrics, and uptime performance data.


Knowledge & Skills

  • 10+ years of experience in software engineering.
  • Developing pipelines using CI/CD tools: GitHub Actions and/or Jenkins is a plus.
  • Linux administration.
  • Public cloud providers: AWS, Azure.
  • Docker
  • Kubernetes
  • Programming experience in Java, C++, Python, or Go, or deep experience managing large-scale software and distributed systems and environments.
  • An understanding of, and experience with, web application development.
  • A solid foundation in computer science, with competencies in data structures, algorithms, and software design practices.
  • Understand database design, caching, scalability, and network fundamentals.
  • 5+ years of experience with Docker, Kubernetes, Sensu, Prometheus, or other CNCF software is a big plus.
  • An understanding of, and experience with, incident alerting platforms such as PagerDuty and Blameless
  • BS or MS degree in Computer Science or a related technical field, or equivalent industry experience.
  • An understanding of, and experience with, product/project management and issue tracking systems such as Jira, Smartsheet, and Aha



Addepar is a wealth management platform that specializes in data aggregation, analytics and reporting for even the most complex investment portfolios. Founded in 2009 by Joe Lonsdale, who currently serves as an active Chairman of its Board of Directors and General Partner at 8VC, the company's platform aggregates portfolio, market and client data all in one place. It provides asset owners and advisors a clearer financial picture at every level, allowing them to make more informed and timely investment decisions. Addepar works with hundreds of leading financial advisors, family offices and large financial institutions that manage data for over $2 trillion of assets on the company's platform. In 2020, Addepar was named as a Forbes Fintech 50 company and honored as a member of the CB Insights Fintech 250. Addepar is headquartered in Silicon Valley and has offices in New York City and Salt Lake City. All brokerage services offered through Acervus Securities Inc., member FINRA / SIPC.


Addepar is proud to be an equal opportunity employer. We seek to bring together diverse ideas, experiences, skill sets, perspectives, backgrounds, and identities to drive innovative solutions. We commit to promoting a welcoming environment where inclusion and belonging are held as a shared responsibility.

 

In order to ensure the health and safety of all Addepeeps and our prospective candidates, we have instituted a virtual interview and onboarding experience.


Location

787 Castro St., Mountain View, CA 94041
