Aceolution

Senior Site Reliability Engineer

England

Posted 1 day ago



Lead/Senior Site Reliability Engineer – Observability

We are seeking an experienced Lead/Senior Site Reliability Engineer (Observability) to join a high-performing engineering team that designs, builds, and operates the large-scale observability platforms underpinning mission-critical cloud services. The role involves architecting highly scalable monitoring, logging, alerting, and telemetry systems in close collaboration with software engineering, platform, and infrastructure teams.

About the Role

This opportunity is ideal for engineers who thrive on solving complex infrastructure challenges, working with large-scale distributed systems, and building resilient cloud platforms.

Key Responsibilities

Design, implement, and maintain scalable and highly available observability platforms.
Build and operate enterprise-scale monitoring, logging, and alerting solutions.
Design and optimise Prometheus-based monitoring architectures for large-scale environments.
Deploy and manage high-performance Elasticsearch clusters for centralised log storage and analytics.
Build and maintain high-throughput event streaming pipelines using Kafka.
Develop self-service APIs, libraries, and tools that enable engineering teams to manage their own observability.
Automate infrastructure deployment with Terraform, following Infrastructure as Code (IaC) practices.
Partner with engineering teams to enhance system reliability, monitoring capabilities, and operational excellence.
Troubleshoot production issues, conduct root-cause analysis, and implement long-term preventive solutions.
Participate in an on-call rotation to diagnose and address production disruptions.
Drive automation, operational excellence, and continuous improvement across cloud platforms.


Required Skills & Experience

At least 5 years of experience designing, deploying, and operating medium-to-large-scale distributed systems in Linux environments (e.g. Debian, Ubuntu).
At least 2 years of programming experience in one or more of the following:
- Go
- Python
- Ruby
- Scala
- Bash
Deep understanding of Site Reliability Engineering (SRE) principles and best practices.
Experience building and supporting highly available cloud infrastructure.
Strong analytical, troubleshooting, and problem-solving skills.
Demonstrated ability to collaborate effectively in cross-functional environments.

Technical Skills (Preferred Expertise)

Familiarity with several of the following:

SRE & Observability (Monitoring, Logging, Alerting)
Prometheus, Thanos, Cortex, Grafana, Graphite
ELK Stack components (Elasticsearch, Logstash, Kibana)
Kafka for real-time event streaming
Terraform & Infrastructure as Code (IaC)
Ansible for configuration management
Consul for service mesh and service discovery
Snowflake for data warehousing
Linux administration
DevOps practices (CI/CD pipelines, GitOps, etc.)


Preferred Qualifications

Hands-on experience with large-scale observability platforms.
Strong background in distributed systems and cloud-native infrastructure.
Ability to process high-volume monitoring, logging, and telemetry data.
Dedication to automation, scalability, and operational excellence.
Adaptability to thrive in a remote-first, collaborative environment.

Eligibility

This role requires unrestricted work eligibility in the United Kingdom (no employer sponsorship required).

Why Join Us?

Collaborate with high-performing engineers on cutting-edge cloud infrastructure.
Address complex technical challenges while contributing to scalable, automated, and resilient platforms.
Work in a remote-first environment with flexible, mission-aligned opportunities.


Skills

Site Reliability Engineering

Observability

Prometheus

Elasticsearch

Kafka

Terraform

Ansible

Linux Administration

Infrastructure as Code

DevOps Practices

Monitoring

Logging

Alerting

Telemetry

Distributed Systems

Cloud Infrastructure

Location

England, United Kingdom