Join the team building Caffeine.ai, the world's first platform for the "self-writing internet." We are on a mission to revolutionize how software is created by enabling anyone to build full-stack, on-chain applications through natural language. Our team is a cross-functional group of engineers dedicated to making this vision a reality, building the infrastructure that powers a new era where creating on the web is as simple as a conversation.
About the Role
As an experienced Senior Site Reliability Engineer, you will be a cornerstone of the caffeine.ai application's success. Your primary focus will be ensuring the rock-solid availability, performance, and scalability of our user-facing products and the complex microservices that power them. You will work with our engineering, infrastructure, and security teams to bake reliability and operability into the product from the start.
This is a hybrid role based in our Zurich office, requiring at least 3 days in the office per week.
Responsibilities:
- Select, design, build, deploy, and maintain the services required to ensure the high availability of the caffeine.ai application.
- Implement observability tools to ensure visibility into service stability and performance.
- Identify opportunities to automate or improve processes and then implement that automation.
- Participate in design and code reviews, identifying reliability and operability risks early and proposing mitigations.
- Coordinate incident response across multiple teams, clearly understanding and communicating what is going on, what the next steps are, and who is responsible for what.
- Operate, troubleshoot, and deploy software to Unix systems.
- Participate in an on-call rotation for production services, with primary responsibility for coordinating incident response. On-call work is compensated with generous time off.
Requirements:
- Proven experience in a Site Reliability Engineering role with a focus on product and microservices architectures.
- Expertise in observability and monitoring of applications and services using tools such as Prometheus/Grafana and ELK logging.
- Experience designing and writing moderate-sized applications. We primarily use Rust, but experience with C++ or another systems language is also valuable.
- Experience with automation and scripting in languages such as Python, Perl, or Shell.
- A systematic and methodical approach to troubleshooting issues across the entire stack (hardware, software, application, network).
- Solid understanding of Internet networking protocols (TCP/IP, TLS, DNS, HTTP/S).
- Experience with CI/CD pipelines.
- Strong communication and interpersonal skills.
About DFINITY and the Internet Computer:
Join our team of over 250 talented individuals, including world-renowned cryptographers, distributed systems engineers, programming language experts, and industry leaders, who are shaping the future of the internet and web3. DFINITY was founded in 2016 by entrepreneur and crypto theoretician Dominic Williams.
All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.