ConnexAI
Machine Learning Engineer

Build Low-Latency Conversational AI Systems
We are building real-time conversational AI systems on top of large language models, speech AI, and agentic workflows. Our platform combines ASR, LLMs, and TTS into production-grade AI systems used globally across enterprise environments where latency, reliability, and scalability matter.
We are hiring a Machine Learning Engineer to build low-latency production systems for our LLM team. This role is centred around writing scalable code that enables real-time conversational AI to perform reliably under heavy production workloads.
You’ll work closely with our LLM and speech teams to solve challenges around inference speed, concurrency, request handling, GPU performance, distributed systems, and real-time response streaming.
What you’ll do
- Build and optimise low-latency LLM systems for real-time conversational AI
- Write production-grade Python code focused on performance, scalability, and reliability
- Design systems capable of handling large volumes of concurrent real-time requests
- Solve engineering challenges around batching, request scheduling, queue management, streaming responses, and distributed workloads
- Improve inference speed, GPU memory usage, and overall system responsiveness
- Deploy and optimise open-source LLMs using tooling such as vLLM, TensorRT-LLM, Triton, SGLang, CUDA, or similar technologies
- Build scalable orchestration layers and ML pipelines around LLM systems, including RAG and agentic workflows
- Develop backend inference services and APIs for production AI systems
- Productionise new model capabilities and features for real-world customer use cases
What we’re looking for
- Strong experience writing production-grade software for machine learning systems
- Strong Python engineering skills
- Experience building low-latency or highly concurrent systems
- Strong problem-solving ability and enjoyment of building systems from the ground up
- Experience with distributed systems, parallel workloads, and performance optimisation
- Experience working with inference tooling such as vLLM, TensorRT, Triton, CUDA, ONNX, or similar technologies
- Experience building scalable backend services or ML systems used in production
- Understanding of real-time systems and performance-focused engineering
- Strong communication skills and ability to work closely with engineers and researchers


Why this role?
You’ll design and build low-latency conversational AI systems capable of serving large volumes of concurrent real-time requests. The role focuses on solving difficult engineering challenges around inference speed, reliability, concurrency, GPU performance, and scalable production AI systems.
Skills