Senior Site Reliability Engineer - Platform
PagerDuty
1. Zoom / on-site pair programming and tasks
We are interviewing and onboarding 100% virtually at this time. PagerDuty is focused on inclusion and employee well-being by building a culture that isn’t location specific and gives equal opportunity to everyone—regardless of where you are working. Unless your job requirements make it necessary to be in a company office, you may choose to work in-office, remotely, or hybrid.
The SRE Platform team is a high-impact, exciting team responsible for delivering the core infrastructure that powers hundreds of microservices for all of PagerDuty engineering. We build solutions that accelerate developer productivity, improve reliability, and help PagerDuty scale for today and tomorrow. If you’re passionate about Kubernetes and building solutions that developers will love, come join our SRE Platform team!
How You Contribute To Our Vision: Key Responsibilities
- You deploy, configure, monitor and optimize highly available Kubernetes clusters on AWS/EKS
- You help maintain the overall health of the platform including triaging and troubleshooting production issues, monitoring system capacity, and working with other technical teams to ensure adherence to compliance and security best practices
- You partner with Engineering stakeholders to design and deliver a reliable, scalable, secure, and performant platform
- You continuously strive to improve the developer experience: full lifecycle support (creation, development, deployment, retirement), observability, flexible connectivity, and monitoring
- You stay current on technical trends in order to suggest innovative tools and approaches to interesting problems
- You share your expertise with the entire Engineering organization
- You participate in a 24/7 on-call rotation. And yes, we use PagerDuty to manage our on-call schedules
About You: Skills and Attributes
- You have solved multiple problems by writing code to automate your way out of them and have a passion for replacing manual processes with your code
- You have been responsible for running critical services that multiple customers depend upon. You understand the importance and impact that operational optimization can have on a product and the positive ripple effects that it can have across an entire organization
- You are empathetic: You take others’ opinions into account and clearly communicate your thoughts to reach solutions quickly
- You consider it important to understand and appreciate your customers, and enjoy seeing your work improve the work of others
- You love to teach and take vicarious pleasure from the success of those you have taught or mentored
- Knowledge of a dynamic language like Ruby or Python
- Experience working on cloud-native infrastructure (e.g. AWS, GCP, Azure)
- Experience managing multiple Kubernetes clusters in a production environment
- Experience deploying web applications on Kubernetes (e.g. Helm, ArgoCD)
- Experience with infrastructure as code (Terraform or CloudFormation)
- Experience with monitoring, observability, and logging platforms (e.g. Datadog, New Relic, Sumo Logic, Splunk)
- Knowledge of configuration management systems like Ansible, Chef or Puppet
- Experience automating releases and working with continuous integration/delivery systems and related tools (e.g. Jenkins, CircleCI, Travis CI, Buildkite)
Not sure if you qualify?
Apply anyway! We extend opportunities to a broad array of candidates, including those with diverse workplace experiences and backgrounds. Whether you're new to the corporate world, returning to work after a gap in employment, or simply looking to transition or take the next step in your career path, we are excited to connect with you.
We are dedicated to providing a culture where our people are happy, enabled and inspired to do their best. One of the ways we do this is by developing a comprehensive total rewards approach that supports employees and their loved ones. As a global organization, our programs are competitive with industry standards and aligned with local laws and regulations.
Your package may include:
- Competitive salary and company equity
- Comprehensive benefits package from day one
- ESPP (Employee Stock Purchase Program)
- Retirement or pension plan
- Paid parental leave - up to 22 weeks for a pregnant parent, up to 12 weeks for a non-pregnant parent (some countries have longer leave standards and we comply with local laws)
- Generous paid vacation time
- Paid holidays and sick leave
- Paid employee volunteer time - 20 hours per year
- Bi-annual company-wide hack weeks
- Mental wellness programs
- Dutonian Wellness Days - scheduled company-wide paid days off in addition to PTO and scheduled holidays
- HibernationDuty - a week each year when everyone at PagerDuty, with the exception of a small coverage crew, is asked to take a much-needed break to truly disconnect and recharge
PagerDuty, Inc. (NYSE:PD) is a global leader in digital operations management, serving over 14,000 customers and 850,000 users worldwide, including 65% of the Fortune 100.
For the teams who build and run digital systems, PagerDuty is the best way to manage the urgent, mission-critical work that is essential to keeping digital services always on. We make it easy to handle any unplanned task, event, or opportunity, right away.
We are led by CEO Jennifer Tejada. Women make up 50% of our board of directors, 45% of our managers are from underrepresented groups, and we are a proud member of the Pledge 1% Movement, committed to donating 1% Equity, 1% Employee time, and 1% Product to accelerate change in our communities. We are Great Place to Work-certified™ and our product is top rated in its category on TrustRadius.
From how we build our teams to who sits in the boardroom, we hope you can see yourself at PagerDuty.
PagerDuty is committed to creating a diverse environment and is an equal opportunity employer. PagerDuty does not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, parental status, veteran status, or disability status.
PagerDuty is committed to providing reasonable accommodations for qualified individuals with disabilities in our job application process. Should you require accommodation, please email firstname.lastname@example.org and we will work with you to meet your accessibility needs.
PagerDuty verifies work authorization in accordance with the requirements of your local jurisdiction.