Software Engineer - Observability Infrastructure

Stripe


Please mention No Whiteboard if you apply!
I'm a one-man team looking to improve tech interviews, and could use any support! 😄

Interview Process

1. Programming/debugging phone screen
2. On-site with your own laptop/setup and full access to the internet. Interviews include systems design, a 45-minute practical coding question, an API integration exercise, debugging, and a conversation with the hiring manager about team alignment.


Who we are

About Stripe

Stripe is a financial infrastructure platform for businesses. Millions of companies - from the world’s largest enterprises to the most ambitious startups - use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone's reach while doing the most important work of your career.

About the team

Stripe’s infrastructure powers businesses all over the world. We process payments, run marketplaces, detect fraud, help entrepreneurs start an internet business from anywhere in the world, build world-class developer-friendly APIs, and more. If you’re an infrastructure engineer here, you’ll get to build the systems that power our products.

The success of every single API request we process is critical to everyone involved! We can’t go down because our users’ businesses depend on us.

What you’ll do

You’ll be on a team that maintains a product we provide to the rest of engineering, like storage or message queueing. You’ll make decisions with a significant impact on Stripe. There is a lot of work to do to make Stripe engineers’ work easier and our platform even more reliable than it is today, and we’d love for you to be part of it. We’re close to the people using our systems, so we constantly get feedback that we can use to make them better. The team will help all of engineering—from the CTO to our interns—by identifying, creating and automating engineering practices, processes and software that will be leveraged by the whole organization to improve reliability.

You’ll work with other infrastructure engineers as well as product engineers who use the systems you’re building.


  • Develop the core interfaces and infrastructure used by all of Stripe’s engineering teams
  • Design automated fault detection infrastructure and systems that run in 24x7 mode with yearly downtime measured in minutes
  • Scale the observability infrastructure to support hundreds of terabytes of logs and hundreds of billions of metric data points daily
  • Debug issues and solve distributed systems challenges across services and levels of the stack
  • Build best-in-class developer tooling for people using your infrastructure

Projects you could work on:

We have a ton of important work to do, which is why we’re hiring! Our projects are of course changing all the time, but here are a few, including ones we’ve done in the past, so you can get an idea of the types of work we do. Technologies we use include: haproxy, nginx, consul, jenkins, signalfx, statsd, kafka, rabbitmq, storm, and many others.

  • Plan and implement multi-region availability for our distributed job queuing infrastructure! All of our systems can sustain losing machines, and making our systems even more resistant to failure is a big theme for us. If you like thinking about distributed systems, you might find a good home here!
  • Write easy-to-use and reliable client libraries for our Kafka or database systems. You’ll write abstractions and provide reasonable defaults around timeouts and error handling for a complex system.
  • Move us to another region without incurring downtime.
  • Build fantastic code review tools! If you love helping developers be more effective at their jobs, we have a ton of interesting projects in this area. Related projects: you could help us have better reproducible builds with Bazel and build great developer environments.
  • We have a bunch of projects around deploying and running code: help us instantly roll back bad deploys so that we can recover quickly, and build infrastructure that lets us scale up our API workers in seconds in response to high API load.
  • We need to scale our databases to handle 10x the load they can today. You could help us shard them more effectively, upgrade our database engines, and build great tools for developers so they can understand their slow queries more easily. A lot of our database projects are open source.

Who you are

We're looking for someone who meets the minimum requirements to be considered for the role. If you meet most of these requirements, you are encouraged to apply. 

Minimum requirements

    • Think about systems – their edge cases, failure modes and life cycles
    • Are comfortable operating infrastructure systems at scale
    • Wear every 9 of uptime as a badge of honor
    • Can debug complex problems across the whole stack
    • Focus on the needs of your users
    • Are able to write high quality code in a programming language (e.g. Ruby, Scala, Go)
    • Thrive in a high autonomy environment surrounded by unsolved problems
    • Have worked with data pipelines that move large sets of data quickly
    • Have managed an on-premises logging installation (e.g. Splunk, ELK), a time series metric database (e.g. Prometheus, InfluxDB, M3DB), or distributed tracing infrastructure
    • Are familiar with writing eBPF filters and debugging performance problems
