Falcon Smart IT (FalconSmartIT)

Site Reliability Engineer

Brighton

Posted 8 days ago

Job Location: Hove, UK (hybrid, 3 days in office)
Job Type: FTE

About the Role

The Site Reliability Engineer (SRE) will drive the modernization of IT operations by implementing observability practices and automating toil. The position demands deep expertise in SRE principles, modern observability tools, and automation to ensure scalability, reliability, and efficiency across IT systems. The ideal candidate is a strategic thinker with hands-on experience who can lead modernization initiatives while fostering a culture of reliability and innovation.

Primary Responsibilities

Work closely with the Product Engineering team to strategize IT operations modernization, enhancing observability and reducing toil.
Architect and deploy observability platforms to effectively monitor system health, performance, and reliability.
Propose and implement AI-driven alerting and proactive anomaly detection strategies to reduce MTTD (Mean Time to Detect) and MTTR (Mean Time to Resolve).
Develop and enforce SRE best practices, including:
- Service Level Objectives (SLOs)
- Service Level Indicators (SLIs)
- Error Budgets
Create an AIOps roadmap to improve operational efficiency.
Lead efforts to automate repetitive tasks (toil) using:
- Scripting
- Orchestration tools
- AI/ML-based solutions
Drive toil automation initiatives, including:
- Automated incident responses
- Self-healing automation for autonomous operations
Collaborate with cross-functional teams to ensure systems are scalable, resilient, and maintainable.
Drive incident management and root cause analysis processes through automation, aiming for continuous improvement toward autonomous operations.
Partner with engineering, architecture, and product teams to promote shift-left engineering practices and ensure reliability.
Mentor and guide teams on adopting SRE principles and tools.
Advocate for a culture of reliability, automation, and continuous improvement company-wide.

Key Skills

Strong expertise in Site Reliability Engineering (SRE) principles
Advanced knowledge of observability tools (Dynatrace and Datadog preferred)
Proficiency in automation and scripting (Python and Ansible preferred)
Strong experience with cloud platforms (AWS and Azure preferred)
Solid understanding of:
- Containerization (e.g., Docker)
- Orchestration (e.g., Kubernetes)
Proficiency in cloud-native distributed systems and microservices architecture
Exposure to AI/ML techniques for:
- Predictive analytics
- Automated problem resolution
Familiarity with CI/CD pipelines and automated release/deployment engineering solutions
Nice-to-have:
- Experience with chaos engineering tools (e.g., Gremlin, Chaos Monkey)
- Knowledge of automation frameworks for resilience tracking
Ability to:
- Manage and prioritize multiple projects in a fast-paced environment
- Communicate effectively across teams
- Solve complex problems with analytical thinking and adaptability
- Balance engineering excellence with business priorities (strategic mindset)

Preferred Qualifications

12+ years of experience in IT operations, SRE, or DevOps roles
Proven track record of implementing observability and automation solutions in large-scale environments
Certifications in:
- Cloud platforms
- Observability tools
- SRE-related areas

Skills

Site Reliability Engineering

Observability

Automation

Scripting

Python

Ansible

AWS

Azure

Containerization

Docker

Kubernetes

Microservices

AI/ML

CI/CD

Chaos Engineering

Problem Solving

Location

Brighton, England, United Kingdom