Era4

Site Reliability Engineer

Bristol
Posted 3 days ago

Automation Engineer – AI Platform Operations

Role Summary

Era4 develops, owns, and operates AI infrastructure across the UK, powered by renewable energy, through the reinvention of legacy industrial sites into modern data centre facilities. This Automation Engineer role builds an AI Platform Operations function from scratch at the intersection of Site Reliability Engineering (SRE) and AI-driven workflow automation, with no legacy constraints.

Candidates must be based in Bristol or the surrounding area due to onsite requirements. Contractor options are available if interim support is preferred (day rate to be provided in the application).


Key Responsibilities

Runbook Automation & Agent Development

  • Build agentic, executable workflows capable of triaging, diagnosing, and autonomously remediating known failure patterns (with appropriate safeguards).
  • Design LLM-backed agents targeting:
    • Observability stacks
    • ITSM platforms
    • Infrastructure APIs (DCIM, IPAM, hypervisor layers)
  • Develop auditable, high-control automations for client interaction workflows and higher-risk platform actions (e.g., infrastructure orchestration).

Operational Tooling & Self-Service Enablement

  • Engineer self-service tooling for engineers and service desk analysts:
    • CLI utilities
    • ChatOps integrations (Slack/Teams bots)
    • Status dashboards
    • Self-service automation hooks
  • Reduce manual support dependency between DevSecOps and operations teams via automation.
  • Maintain a version-controlled, peer-reviewed library of:
    • Automation assets
    • Agent prompts
    • Runbook-as-code artefacts

Event & Alert Intelligence

  • Implement automation layers around monitoring and event management:
    • Alert suppression logic
    • Event enrichment pipelines
    • Correlation rules
    • Alert-to-ticket integrations
  • Optimise observability tooling (Prometheus, Mimir, Grafana) to improve signal-to-noise ratios.
  • Design event correlation and deduplication logic to mitigate alert storms and enhance incident context.

Continuous Improvement & Knowledge Capture

  • Identify operational pain points for automation and prioritise a toil reduction backlog.
  • Use post-incident reviews to extract insights for future automation and runbook updates.
  • Contribute to evolving Era4’s operational standards, tool architecture, and agent framework.

Requirements

Essential Experience & Skills

  • Operations lead experience in SRE, Senior Platform, or DevOps environments, including:
    • Hands-on on-call operations and incident management.
    • Transitioning narrative runbooks to executable workflows/codified decision trees.
  • ITIL-aligned understanding of incident/change management and ITSM tooling (ServiceNow, Halo, Jira Service Management)...
  • Technical expertise:
    • Python for scripting, automation, and API integrations.
    • Proficiency with observability platforms: Prometheus, Grafana, or Mimir.
    • API integration experience with ITSM platforms via REST/GraphQL.
    • Familiarity with event-driven architectures: message queues, webhook patterns.
    • Strong grasp of GPU infrastructure management (performance metrics, signal automation).
    • Infrastructure-as-Code (IaC) experience (e.g., Kubernetes, Terraform).
    • An API-first approach is mandatory, including experience integrating agents with:
      • DCIM (Data Centre Infrastructure Management)
      • IPAM processes
      • Hyperconverged control planes

Nice-to-Have Advantages

  • Exposure to data centre or colocation operational environments, especially high-trust compute/high-density GPU workloads.
  • Expertise in ChatOps: building Slack/Teams automation bots for operations.
  • Knowledge of:
    • DCIM platforms and thermal/power telemetry pipelines.
    • OpenTelemetry, distributed tracing, or log aggregation (Loki, ELK, Splunk).
  • Experience contributing to open-source observability/automation projects.
  • Background in startup/scale-up toolchain engineering (building systems from scratch).

Why Join Era4?

You’ll play a critical role in shaping a next-generation AI infrastructure company, with:

  • High visibility with leadership.
  • Real operational autonomy over system-wide tooling.
  • Direct impact on operational excellence at scale.
  • A mission-driven purpose: enabling 21st-century compute for healthcare, finance, and public good.


Diversity & Inclusion

Era4 is an equal opportunity employer. We’re committed to fostering an inclusive culture where everyone’s contributions are valued.

All identities and pronouns welcome.

Skills

Python
Automation
API Integration
Data Processing
Observability
Monitoring
ITSM
Event-Driven Architectures
Message Queues
Webhook Automation
Infrastructure-as-Code
Cloud-Native
GPU Infrastructure
ChatOps
OpenTelemetry
Distributed Tracing

Location

Bristol, England, United Kingdom
