Insight International (UK) Ltd

Cloud Engineering Lead

Sheffield

Posted 2 days ago

About this role

The Chaos Engineering Tech Lead will lead the development and execution of chaos engineering capabilities for the Treasury Technology team, with the objective of improving platform resilience, recoverability, and operational stability across critical services. This role will provide technical leadership to define chaos engineering practices, design and govern experiments, and drive remediation of resilience gaps across the Treasury technology estate.

In this role, you will:

Define and lead the chaos engineering strategy, roadmap, and operating model for Treasury Technology.
Establish best practices, guardrails, and standards for safe and effective chaos engineering across the technology estate.
Design, review, and execute chaos experiments to validate resilience across infrastructure, platforms, applications, and service dependencies.
Identify resilience weaknesses, single points of failure, recovery gaps, and operational risks before they lead to production incidents.
Drive remediation by working closely with engineering and platform teams to resolve issues identified through experiments.
Ensure chaos experiments are measurable, controlled, and aligned to service resilience objectives and business criticality.
Define resilience metrics, reporting, and evidence to track maturity and demonstrate improvement over time.
Embed resilience-by-design principles into engineering practices, delivery processes, and operational readiness, including running game days to rehearse failure scenarios, validate runbooks/alerting/on-call readiness, and strengthen a resilience culture across teams.
Partner with architecture, SRE, DevOps, infrastructure, security, and support teams to align resilience activities with wider engineering priorities.
Coach and guide engineers in resilience thinking, experiment design, and chaos engineering practices to build team capability.
Communicate technical findings, resilience risks, and improvement priorities clearly to senior stakeholders and decision makers.

To be successful in this role, you should meet the following requirements:

University degree (or above) in Computer Science, Software Engineering, or a related discipline.
Excellent written and spoken communication skills in English.
Strong experience in a technical leadership role within platform engineering, site reliability engineering, cloud engineering, resilience engineering, or a related discipline.
Proven track record in designing and implementing chaos engineering, fault injection, or resilience testing practices in complex enterprise environments.
Deep hands-on knowledge of Kubernetes, including deployment behaviour, failure handling, scaling, networking, and troubleshooting.
Strong experience with GCP and cloud-native platforms, including operational and resilience considerations.
Strong understanding of distributed systems, failure modes, system recovery, and service resilience patterns.
Experience working across modern technology stacks, including microservices, APIs, platform services, and cloud-based applications.
Strong understanding of observability, including metrics, logging, tracing, alerting, and service health monitoring.
Experience with automation, CI/CD pipelines, and engineering tooling to support scalable resilience practices.
Strong problem-solving capability, with the ability to diagnose detailed technical issues and drive practical solutions.
Strong stakeholder management and communication skills, with the ability to influence across engineering teams and senior management.
Demonstrated resilience mindset, with a strong focus on proactive risk reduction and continuous improvement.

Preferred Qualifications

Experience in financial services or another highly regulated environment.
Experience working on business-critical platforms where stability, recoverability, and availability are essential.
Familiarity with SRE principles, incident management, disaster recovery, and operational resilience practices.
Experience introducing new engineering capabilities and driving adoption across multiple teams.
Knowledge of the technology stack used within Treasury platforms would be advantageous.

Skills

Chaos Engineering
Resilience Engineering
Kubernetes
GCP
Cloud Engineering
Distributed Systems
Observability
Automation
CI/CD
Problem-Solving
Stakeholder Management
Communication
Risk Reduction
Incident Management
Disaster Recovery
Microservices

Location

Sheffield, England, United Kingdom