Available Locations: Amsterdam or Remote Netherlands; Lisbon or Remote Portugal; London or Remote UK; Munich or Remote Germany
About the Department
Production Engineering is responsible for the world's most reliable, observable, performant, and safe network ecosystem. Our customers rely on our products and systems to safely modify, troubleshoot, and release products without external impact.
Our external customers rely on us to provide seamless and predictable incident, traffic, and policy management, resulting in the fastest and safest network services in the world.
We are accountable for the overall performance of internal- and external-facing services, guiding our product teams to optimal configurations and maximum efficiency. From the moment a packet enters the Cloudflare ecosystem, we know exactly what its expected purpose and behavior are, and we are capable of detecting and exposing anomalous behavior.
The Cloudflare network makes it possible to solve challenges at massive scale and efficiency which would be impossible for almost any other organization.
About the role
We are looking for an Engineering Manager to join Cloudflare's Observability team, which is in charge of our internal Metrics and Alerting platform. You will lead a team of passionate, talented engineers who are building one of the largest metrics pipelines in the world, processing over 2 billion time series across hundreds of different locations. You will play an active role in shaping our strategy and working with our customers to build the best developer experience. You will change the way people build applications.
You bring a passion for meeting business needs by building innovative technical solutions. You excel at understanding how big-picture goals inform technical details. You thrive in a fast-paced, iterative engineering environment and have experience delivering scalable distributed systems. Most importantly, you have a track record of earning the respect of past teams as both a technical leader and a manager.
Examples of desirable skills, knowledge and experience
Experience leading a team and working across multiple teams to deliver results
Comfortable managing backend-focused teams
Solid foundation in computer science and software engineering with strong competencies in software design and building distributed systems
Excel at planning, creating teams and overseeing execution to meet commitments and deliver with predictability
Demonstrated track record of managing a team, including hiring, onboarding, and professional development. You inspire your team to reach higher. You're as good at explaining "why" as you are "how"
Experience implementing tools, processes, internal instrumentation, and methodologies, and resolving blockers
Comfortable managing teams/projects with tight deadlines and short release cycles
Operating knowledge of Prometheus, Thanos, Alertmanager and related infrastructure
Bonus Points
Understanding of server hardware, performance expectations and limitations, and failure domains
Deep Linux/UNIX systems knowledge
Managing contributions to large open-source projects