Staff Software Engineer, Machine Learning Infrastructure
Stripe
1. Programming/debugging phone screen. 2. On-site with your own laptop/setup and full internet access. Interviews include systems design, a 45-minute practical coding question, an API-integration exercise, debugging, and a conversation with the hiring manager about team alignment.
Who we are
Stripe is a financial infrastructure platform for businesses. Millions of companies—from the world’s largest enterprises to the most ambitious startups—use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone’s reach while doing the most important work of your career.
About the team
The Machine Learning Infrastructure organization provides the infrastructure and support to run machine learning workflows and ship them to production, the tooling and operational capacity to accelerate the use of these workflows, and opinionated technical guidance that steers our users onto successful paths.
What you’ll do
You will work closely with machine learning engineers, data scientists, and platform infrastructure teams to build the powerful, flexible, and user-friendly systems that substantially increase MLOps velocity across the company.
- Create a long-term technical vision for the org, and identify paths to deliver value in shorter-term phases
- Build powerful, flexible, and user-friendly infrastructure that powers all of ML at Stripe
- Design and build fast, reliable services for ML feature engineering, model training, and model serving, and scale that infrastructure across multiple regions
- Create services and libraries that enable ML engineers at Stripe to seamlessly transition from experimentation to production across Stripe’s systems
- Pair with product teams and ML engineers to develop easy-to-use infrastructure for production ML models
- Collaborate with stakeholders across the organization, including dependency engineering teams, product, design, infrastructure, and operations
Who you are
We’re looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.
- Over 10 years of experience building software applications in large-scale distributed systems
- Over 4 years of experience building machine learning infrastructure
- A strong sense of curiosity and a desire to both learn and share knowledge with your peers. We like to work in a collaborative environment and hope you do too.
- A solid engineering background and experience with infrastructure and/or distributed systems. You’ll work mostly in Python, Java, and Scala, but we care more about your general engineering skills than your knowledge of a specific language.
- Familiarity with the full life cycle of software development, from design and implementation to testing and deployment.
- Experience building and maintaining high-availability, low-latency systems, especially with respect to reliability, testing, and observability.
- A sense of pragmatism: you know when to aim for the ideal solution and when to adjust course.
- Experience optimizing the end-to-end performance of distributed systems.
- Experience designing and implementing data processing systems using the lambda architecture.
- Experience debugging and optimizing large scale data pipelines using Apache Spark.
- Experience training and shipping machine learning models to production to solve critical business problems.
Please mention No Whiteboard if you apply!