Manchester Digital

Site Reliability Engineer

Manchester

Posted 24 days ago

Site Reliability Engineer

As a Site Reliability Engineer, you will enhance system reliability, observability, and performance through a strong engineering approach and assist with incident resolution and best practices.

You will leverage your software engineering skills — with a focus on system reliability and observability — to monitor the health, performance, and availability of critical systems, directly impacting operational efficiency.

Using your expertise, you will implement solutions that enhance reliability: instrumenting services with tools such as OpenTelemetry, improving logging practices, and developing features for maintainability. You will also help engineer tools and automation for effective service management.

Collaboration is central to this role. You will work across multiple functions to integrate reliability and observability best practices into the software development lifecycle. By supporting governance standards set by central teams, you will foster a culture where these principles are integral to development. Your contributions will ensure our systems meet user demands and enhance overall service performance.

Hybrid work policy: This role is eligible for inclusion in the company’s hybrid working from home policy.

Preferred Skills & Experience

Excellent knowledge of Site Reliability Engineering (SRE) principles, including:
- Creation and management of Service Level Indicators (SLIs)
- Setting Service Level Objectives (SLOs) for reliability and improved customer satisfaction.
Proven experience with observability tools, including:
- Splunk, New Relic, Grafana, and PagerDuty.
Strong proficiency in programming languages like:
- Python, Golang (Go), and JavaScript.
Proficiency in modern software development techniques and software development lifecycle (SDLC) methodologies.
Experience with Infrastructure as Code (IaC) automation and orchestration, such as:
- Ansible and Terraform.
Prior industry experience working in large-scale, 24/7 enterprise environments where system uptime and stability are critical to business success.
Keen interest in industry trends, particularly Platform Engineering.
Strong proficiency in shell scripting for automation and system management tasks.

Key Responsibilities

Write and contribute to code that enhances the reliability and observability of services, including:
- Telemetry introduction
- Development of operational APIs
- Creation of tooling essential for observability.

Develop and maintain tools that facilitate effective system management to ensure operational efficiency and resilience.
Integrate automation and orchestration platforms to:
- Automate manual activities
- Reduce operational toil.
Build sophisticated dashboards using telemetry data and tools like:
- Grafana, Splunk, and New Relic.
Maintain and administer existing monitoring and analytics toolsets.
Mentor colleagues on new technologies or best practices.
Actively participate in:
- Live incident resolution
- Post-mortem analysis
Provide remediation strategies to mitigate root causes and improve system health, preventing future issues.
Drive initiatives to enhance system reliability and observability, contributing to a culture of continuous improvement.
Collaborate with central SRE and Observability teams to:
- Establish and uphold standards for reliability and observability.
- Assist other teams in adhering to these best practices.
Work closely with IT Operations, providing and supporting critical tooling to enhance business value.

Skills

Site Reliability Engineering

Service Level Indicators

Service Level Objectives

Observability Tools

Splunk

New Relic

Grafana

PagerDuty

Python

Golang

JavaScript

Infrastructure as Code

Ansible

Terraform

Shell Scripting

Automation

Location

Manchester, England, United Kingdom