Zayo

Principal Site Reliability Engineer, Network Observability

Sorry, this job was removed at 08:19 p.m. (PST) on Friday, Jul 11, 2025

Remote or Hybrid

Hiring Remotely in United States

115K-164K Annually

Company Description

Zayo provides mission-critical bandwidth to the world’s most impactful companies, fueling the innovations that are transforming our society. Zayo’s 141,000-mile network in North America and Europe includes extensive metro connectivity to thousands of buildings and data centers. Zayo’s communications infrastructure solutions include dark fiber, private data networks, wavelengths, Ethernet, and dedicated Internet access. Zayo serves wireless and wireline carriers, media, tech, content, finance, healthcare and other large enterprises.

Do you dream in highly scalable systems, thrive in fast-paced environments, and enjoy tackling complex technical challenges? Are you passionate about diving into the details and building the most accurate and durable network observability systems? If so, then join our team as a Principal Site Reliability Engineer, Network Observability!

We're looking for a talented Principal Site Reliability Engineer, Network Observability to play a critical role in ensuring the uptime, performance, and scalability of our network with a focus on our network observability systems.

Responsibilities:

Automation: Work with the NOC and software engineering teams to discover processes around network observability that can be automated, and then create a technical plan to implement both the technical and process changes.
Monitoring and Alerting: Work with the network observability team to design and implement effective monitoring and alerting to proactively identify and address issues.
Incident Management: Own the incident lifecycle, from leading root cause analysis and resolution to implementing preventative measures to avoid future occurrences. Focus on chronic and big-picture issues that may have complex resolutions spanning departments, processes, and technical elements.
Reliability Engineering: Proactively identify and mitigate potential system risks, focusing on automation, monitoring, and tooling to ensure high service availability.
Scalability and Performance: Design and implement solutions to ensure our infrastructure can handle ever-growing demands while maintaining optimal application performance and providing the best possible detail on service degradation and outages to the NOC. Maintain a laser focus on reducing the mean time it takes for the NOC to correctly diagnose issues, and automate troubleshooting and information collection.
Collaboration: Work closely with developers, product managers, and engineers to translate business needs into robust and reliable technical solutions. Become the beacon for best practices and efficient processes throughout the organization.

Qualifications:

Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
Minimum of twelve (12) years of experience in a Senior Network Engineer, Senior Site Reliability Engineer or related role.
Strong understanding of system administration, Linux, and proficiency in scripting languages (Python and various shells).
Previous experience working both in a NOC and in an upper-level network engineering role.
Exceptionally strong working knowledge of networking concepts, application protocols, and network services, especially TCP/IP, BGP, DNS, TLS, and HTTP/S.
Expert at developing automation tools for monitoring, alerting, and deployment to ensure efficient and reliable operations.
Expert at designing and implementing monitoring systems at scale.
Experience with various monitoring platforms such as SevOne, Assure1, Prometheus, and Nagios, and various vendor EMS/NMS systems.
Previous work in large-scale distributed production environments.
Experience with a variety of cloud platforms and tools (AWS, Google, etc.).
Experience with a variety of monitoring and alerting tools (Grafana, Cacti, etc.).
Proven leadership skills, with the ability to mentor and inspire others.
Excellent problem-solving, analytical, and critical thinking skills.
A passion for automation and building efficient systems.
Extensive experience working in a highly automated environment.

Preferred Experience:

Experience working with various vendor APIs (or NETCONF), including Nokia, Juniper, Fujitsu, Infinera, Cisco, and Ciena.
Experience with various network orchestration platforms such as Ciena Blue Planet MDSO, Cisco NSO, Nokia NSP, or others.
Experience automating network troubleshooting.

Estimated Base Salary Range: $114,900 - $164,200 USD/annually.

The base pay range shown is a guideline and reasonable estimate for this role. It takes into account the wide variety of factors that are considered in making compensation decisions. Actual compensation offered may vary from the posted range based upon geographic location, work experience, skill level, certifications, and other business and organizational needs. Non-sales roles may be eligible to participate in a discretionary annual incentive plan. Sales roles may be eligible to participate in a sales incentive plan.

Additionally, this position may be eligible for certain benefits, such as health insurance, life insurance, disability, retirement plans, and paid time off.

The posting will be active for a minimum of 3 days. The active posting will continue to extend by 3 days until the position is filled.

Benefits, Rewards & Wellness

Excellent Health, Dental & Vision Insurance
Retirement 401(k) Savings Plan
Generous paid time off policy including paid parental leave

Zayo provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, provincial or local laws.

This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation and training.
