HSBC Global Services Limited

MLOps Engineer (LLM/GenAI)

Sheffield
Posted 9 days ago

MLOps Engineer (LLM/GenAI) – HSBC

If you’re looking for a career that will help you stand out, join HSBC and fulfil your potential – whether you want a career that could take you to the top, or an exciting new direction. We offer opportunities, support, and rewards that will take you further.

We’re one of the largest banking and financial services organisations in the world, with a network covering over 50 countries and territories. We aim to be where the growth is, enabling businesses to thrive and economies to prosper, and ultimately helping people fulfil their hopes and realise their ambitions.


About the Role

We are seeking an MLOps Engineer (LLM/GenAI) to engineer production-grade infrastructure for modern AI:

  • Hosting LLMs and speech/embedding models
  • Pushing inference performance on real hardware
  • Building repeatable fine-tuning pipelines to ship domain-adapted models into production

If you enjoy tackling hard performance problems, platform engineering, and seeing your work widely used across a global organisation, this role is built for you.

As an HSBC employee in the UK, you’ll have access to tailored professional development opportunities along with a competitive pay and benefits package, including:

  • Private healthcare for all UK-based employees
  • Enhanced maternity and adoption pay, plus support when you return to work
  • A contributory pension scheme with a generous employer contribution


Key Responsibilities

  • Design, build, and operate scalable model hosting platforms for LLMs, embeddings, and speech-to-text (STT)/text-to-speech (TTS) across heterogeneous hardware
  • Optimise inference for latency, throughput, and cost, using techniques such as quantisation (INT4, FP8, GPTQ, AWQ), KV-cache optimisation, and dynamic/continuous batching
  • Evaluate and integrate inference frameworks (e.g., vLLM, TensorRT-LLM, SGLang) to maximise performance on target hardware
  • Own inference health/performance monitoring (latency, throughput, time to first token (TTFT), memory, availability) and troubleshoot bottlenecks/deployment issues
  • Build end-to-end fine-tuning pipelines (data preparation → distributed training → validation) and integrate fine-tuned models into the hosting/inference stack

Requirements

To be successful in this role, you should have the following skills:

  • Extensive experience in building AI platforms, covering:
    • Model hosting and inference optimisation
    • Fine-tuning pipelines (with LLM experience strongly preferred)
  • Strong Python and CUDA engineering skills, with a solid understanding of:
    • GPU/CPU architecture
    • High-performance computing (HPC) fundamentals
  • Deep expertise in inference optimisation, including:
    • KV-cache optimisation
    • Batching strategies
    • Quantisation techniques (INT4, FP8, GPTQ, AWQ)
    • Operator optimisation
    • Framework integration (vLLM, TensorRT-LLM, SGLang)
  • Production hosting experience with:
    • Containerisation (Docker)
    • Orchestration (Kubernetes)
    • Cloud platforms (AWS, GCP, Azure)
  • End-to-end fine-tuning expertise, including:
    • Data preparation
    • Distributed training
    • Hyperparameter tuning
    • Low-rank adaptation techniques (LoRA, QLoRA) and related tooling (Hugging Face, Accelerate)
    • Benchmarking, monitoring, and troubleshooting

Commitment to Diversity & Inclusion

HSBC is committed to creating diverse and inclusive workplaces.

Everyone deserves equal opportunities, whatever their gender, ethnicity, disability, religion, sexual orientation, socio-economic background, or age. We take pride in being a Disability Confident Leader and will offer an interview to candidates with disabilities or long-term conditions, and to neurodivergent candidates, who meet the minimum criteria for the role.

If you require accommodations during the recruitment process, please contact our Recruitment Helpdesk at: hsbc.recruitment@hsbc.com.

Skills

MLOps
LLM Hosting
Inference Optimisation
Python
CUDA
GPU Architecture
Docker
Kubernetes
AWS
GCP
Azure
Fine-tuning Pipelines
vLLM
TensorRT-LLM
SGLang
Distributed Training

Location

Sheffield, England, United Kingdom
