Site Reliability Engineer - Remote
Litmus
1. General technical questions
2. Take-home code challenge
3. Discussion, on-site programming session, meet & greet with the team
Programming Languages Mentioned
Ruby, SQL, C#
Who is Litmus?
Litmus’s email creation, testing, and analytics platform empowers marketers, designers, and agencies to confidently deliver quality, brand-aligned communications that delight and engage consumers. Over the last 10+ years, we’ve built a reputation as thought leaders and one of the most trusted solutions in the industry, and it shines through our customers – all 250,000 of them, representing the world’s top brands in tech, banking & finance, retail, media, and more.
And with our newest acquisition of Kickdynamic, an AI-driven content automation solution, we are now leaders in email marketing personalization.
How do we do this? It all starts with our people, and a core belief that the talent we seek should be a culture add rather than a “culture fit”.
What would I do at Litmus?
We’re looking for a remote Site Reliability Engineer to join our SRE team, working on automating, improving, and managing our infrastructure. Our SRE team is a small-but-mighty, remote team; if you have experience with AWS and Terraform, even if you're new to SRE, we want to hear from you!
A typical day for one of our SREs might include:
- Improving Terraform module functionality to better support the needs of other software engineering teams
- Assisting software engineering teams with defining their infrastructure in Terraform
- Supporting software engineering teams with Incident Response
- Writing or improving Packer templates and Ansible roles
- Crafting software solutions to eliminate toil, for ourselves or other engineering teams
- Writing Postmortems for incidents
- Implementing and monitoring observability metrics for services, creating and tuning alerts that center around SLOs and customer impact
- Planning and executing major migrations and upgrades
- Participating in an on-call rotation for responding to our rare customer-impacting production incidents
What’s it like to work in Site Reliability Engineering at Litmus?
You’d work alongside a team of smart, curious people working on challenging problems. Within SRE we work closely with our developer teams, which are populated with some amazing people. Both across the larger engineering organization and locally on our team, we are thoughtful about our work and our culture, and we work together to enable each other and our engineering peers to do our best work.
Our ecosystem is broad, but centers around Ruby on Rails and .NET on the development side of the house, backed by database technologies like MySQL and Postgres running in AWS RDS and increasingly leveraging Aurora. Our SRE team primarily uses Bash and Go, and drives automation with a toolchain that includes Terraform and Ansible.
What can I expect in the first 60 days?
In your first week, you will:
- Receive your work computer, credentials, and anything else you might need to get started
- Set up your new laptop the way you like it and help contribute any improvements you spot back to our onboarding documentation
- Be introduced to the broader SRE team, including a specific partner who will help you get oriented and give you the context you need to succeed
- Walk through the infrastructure as well as the various tools and software we use
- Connect via regular stand-ups and retrospectives with SRE
In your first month, you will:
- Become familiar and comfortable with our processes
- Meet the other development teams and work alongside other members of SRE on small projects that will enable you to make the necessary connections for the work
- Learn about our long-term goals that help guide the day-to-day decisions about prioritization that we make individually
- Deploy impactful improvements to our infrastructure
- Experience wins. You’ll be set up with some smaller pieces of work to help you find your feet and grow your confidence in how our environment works
After your first couple of months, you will:
- Start to feel more at home, with a good idea of what the most pressing work is, and where you can have the most impact for your peers inside and outside of SRE, as well as our customers
What are we looking for in a candidate?
- You have a reasonable proficiency with the core AWS platform (IAM, Compute, Route53, RDS, Lambda, and S3 in particular). The majority of our infrastructure is in AWS, and it will be the focus of most of your work. Most of our services are straightforward EC2 stacks, so you don’t need to be a wizard, but you will want to have a degree of familiarity.
- You are proficient at writing Terraform, particularly modules, and at working with the AWS provider. We make heavy use of Terraform, and very little happens without touching it, so this is a core competency and you will need to know how to work with it.
- You have experience with some configuration management tooling (such as Chef, Ansible, Puppet, or Salt). We use Ansible, which isn’t difficult to pick up if you’re already familiar with the space.
- You are motivated to stamp out toil. This is a small team, and it's critical that we are always looking for places to automate away repetitive tasks.
- You have experience with at least one programming language, preferably Go, .NET or Ruby. This one is not a hard requirement, but software engineering experience is a definite plus.
- In whatever automation or programming languages you know, you write clean, thoughtful code. The better we are at this, the more enjoyable it is for everyone.
- You have an appreciation for communicating and collaborating asynchronously. We work with developers in many different time zones, and often this requires patience and planning.
- You have the initiative and drive to manage your own time and workload. We work together to define our priorities as a team, but your day-to-day work and prioritization will be managed by you.
- You have a security mindset with an urge to ensure the safety and security of the data of our customers and peers.
Why should I choose Litmus?
- We offer everything you’d expect from a profitable company that’s been going strong for 10+ years, including a great salary and stock options, comprehensive health care benefits, and a generous retirement plan match
- You’ll receive 28 days of paid vacation—on top of team retreats and public holidays
- A platform for good: Affinity Groups, a culture of Diversity, Equity & Inclusion, and volunteer days—creating belonging for all is in our DNA both inside and outside of work #bebeyondlitmus
- Remote-friendly culture. No matter where you are, you’ll feel connected to the team
- Over half of our employees work remotely in the U.S. and UK, and the remote work experience is just as exciting, entertaining (!), and engaging
- We take family seriously and offer flexible schedules and generous parental leave programs
- We give you great tools and tech to do your best work: hardware, software, and home and office setups
Not sure if you meet all the requirements? Please apply! We know there is no job description that can measure a person’s attitude, aptitude, or amplitude (the ability to turn it up a notch) and highly encourage you to apply.
Our approach is shaped by a strong respect for each individual. This applies to every aspect of employment, from equitable wages, work-life balance, and the freedom to be your whole self to equal opportunities for growth and development at Litmus. We believe wholeheartedly that the more inclusive we are, the better our work will be.