ConnexAI

Machine Learning Engineer

Manchester

Posted 24 days ago

Machine Learning Engineer

Build Low Latency Conversational AI Systems

We are building real-time conversational AI systems on top of large language models, speech AI, and agentic workflows. Our platform combines ASR, LLMs, and TTS into production-grade AI systems used globally across enterprise environments where latency, reliability, and scalability matter.

We are hiring a Machine Learning Engineer to build low-latency production systems for our LLM team. The role centres on writing scalable code that lets real-time conversational AI perform reliably under heavy production workloads.

You’ll work closely with our LLM and speech teams to solve challenges around inference speed, concurrency, request handling, GPU performance, distributed systems, and real-time response streaming.

What you’ll do

Build and optimise low-latency LLM systems for real-time conversational AI
Write production-grade Python code focused on performance, scalability, and reliability
Design systems capable of handling large volumes of concurrent real-time requests
Solve engineering challenges around batching, request scheduling, queue management, streaming responses, and distributed workloads
Improve inference speed, GPU memory usage, and overall system responsiveness
Deploy and optimise open-source LLMs using tooling such as vLLM, TensorRT-LLM, Triton, SGLang, CUDA, or similar technologies
Build scalable orchestration layers and ML pipelines around LLM systems, including RAG and agentic workflows
Develop backend inference services and APIs for production AI systems
Productionise new model capabilities and features for real-world customer use cases

What we’re looking for

Strong experience writing production-grade software for machine learning systems
Strong Python engineering skills
Experience building low-latency or highly concurrent systems
Strong problem-solving ability and enjoyment of building systems from the ground up
Experience with distributed systems, parallel workloads, and performance optimisation
Experience working with inference tooling such as vLLM, TensorRT, Triton, CUDA, ONNX, or similar technologies
Experience building scalable backend services or ML systems used in production
Understanding of real-time systems and performance-focused engineering
Strong communication skills and ability to work closely with engineers and researchers

Why this role?

You’ll work on designing and building low-latency conversational AI systems capable of serving large volumes of concurrent real-time requests. The role focuses on solving difficult engineering challenges around inference speed, reliability, concurrency, GPU performance, and scalable production AI systems.

Skills

Python
Machine Learning
Low-Latency Systems
Concurrency
Distributed Systems
Performance Optimisation
Inference Tooling
Backend Services
Real-Time Systems
Scalability
GPU Performance
Request Handling
Streaming Responses
Queue Management
Batching
Production Systems

Location

Manchester, England, United Kingdom