Kallikor

AI/ML Engineer

London

Posted about 2 months ago

Production Engineering Lead – Domain-Specific Language Model (DSLM) & Project Genome

At Kallikor, we're building the future of supply chain intelligence through AI-powered simulation digital twins. We create living digital representations of real-world operations—warehouses, distribution networks, and global logistics—that help organisations make better decisions faster.

We're at an inflection point: moving from AI-assisted tools to domain-specific AI that understands supply chains as deeply as our best engineers do. You'll be instrumental in building our first domain-specific language model (DSLM) and the foundation for Project Genome, an ambitious initiative to capture and synthesise the world’s supply chain knowledge into actionable intelligence.

About the Role

This is a production engineering role first. You’ll build robust Python systems that happen to train and serve LLMs—not the other way around. We need someone who:

Writes production-quality code
Debugs complex distributed systems
Thinks about reliability
Treats ML/LLMs as engineering tools, not monolithic black boxes

You’ll work across our entire AI stack, building FastAPI services, training pipelines, inference endpoints, and integrating everything into our existing Python backend. The ML is important—but engineering discipline is what makes it production-ready.

Learn more at kallikor.ai.

Your Opportunity

Build Production AI Systems

Design and implement full-stack systems (from FastAPI endpoints to inference services)
Own the architecture, not just model weights
Ship incrementally with production-grade reliability

Train & Deploy Our DSLM

Fine-tune models with Unsloth/Axolotl, and build the infrastructure around them
Develop data pipelines, evaluation frameworks, and deployment systems
Hit <200ms latency targets through engineering, not just by chasing bigger GPUs

Integrate ML Into Our Backend

Extend our FastAPI, PydanticAI, FastMCP and Memgraph services with ML capabilities
Ensure clean abstractions, proper error handling, and observability
Avoid treating ML as a separate "service": it should be a native part of our backend

Shape Project Genome’s Foundation

Work with our Principal Engineer to architect supply chain data ingestion
Design data pipelines, graph database structures, and incremental learning strategies
Focus on systems design as much as ML (data pipelines ≥ model size)

Mentor Through Code Review & Pairing

Raise the bar on code quality, testing, and production practices
Teach mid/junior engineers how to build ML systems that don’t fall over

Why You’re Made for This

You’re a strong production Python engineer who:
- Writes clean, maintainable, tested code
- Understands async/await, knows when a generator beats a list, and profiles bottlenecks
- Builds FastAPI services for production traffic
- Gives and receives code review feedback calmly, without drama
You’ve integrated LLMs in production and dealt with:
- Streaming responses, rate limits, retries, intelligent caching
- Prompt engineering, context management, error handling, cost control
You’ve trained or fine-tuned models and understand:
- Data quality, training workflows, evaluation metrics, overfitting
- Debugging why a model isn’t learning as expected
You think like a systems engineer:
- Design for failure, add instrumentation, consider edge cases
- Know that "works on my laptop" ≠ production-ready (monitoring, logging, alerting > demo)
- Favour graceful degradation over brittle solutions
You navigate the ML landscape pragmatically:
- Know enough about transformers/attention to make informed trade-offs
- Ship simple heuristics if they beat complex models
- Advocate for tevree realism in production
You balance velocity + quality:
- Ship incrementally but refactor proactively
- Write tests that matter and leave the codebase better than you found it
You communicate trade-offs clearly:
- Can explain why we’re choosing LoRA over full fine-tuning
- Justify Fireworks vs. self-hosting or 7B vs. 70B models
- Help the team make technology decisions confidently

What We’re Looking For

Must Have (Critical)

✔ 5+ years building production Python systems (backend services, APIs, data processing)
✔ Strong software engineering fundamentals: design patterns, testing, debugging, profiling
✔ Experience integrating LLMs in production: OpenAI/Anthropic APIs, prompt engineering, streaming, rate limits, and frameworks like PydanticAI
✔ Understanding of ML training workflows (even as a practitioner, not a researcher: build the tools, not the math)
✔ Docker, CI/CD, production deployment experience
✔ Can read and understand PyTorch code (you don’t need to write novel architectures)

Nice to Have (Bonus)

🔥 Fine-tuning experience: LoRA, full fine-tuning, QLoRA
🔥 Distributed training basics: DeepSpeed, FSDP
🔥 Graph databases: Memgraph, Neo4j
🔥 Supply chain/logistics domain knowledge (helpful but not mandatory)
🔥 Agent frameworks experience: LangChain, PydanticAI

What You’ll Work With

Stack & Tools

Backend Stack: Python, FastAPI, PydanticAI, FastMCP, Memgraph, PostgreSQL
ML Stack: PyTorch, Unsloth/Axolotl (training), vLLM (inference), Weights & Biases
Models Used: Qwen 2.5, Llama 3.1, GPT-4 (OpenAI), Claude (Anthropic)
Infrastructure: AWS (flexible deployments), Docker, Kubernetes, GPU acceleration
Team: Your Principal Engineer (architectural partner), a Mid-level Data/ML Engineer (data pipeline partner) and a Junior AI Engineer (whom you’ll mentor)

Example Projects You’ll Own

🛠 Build a FastAPI service that:

Handles streaming LLM responses
Implements error handling + retries
Optimises for network latency
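As a rough illustration of the retry handling this project calls for, here is a minimal sketch using only the standard library. It assumes the upstream LLM client exposes an async generator of text chunks; `TransientError`, `stream_with_retries` and `flaky_stream` are hypothetical names, not part of any real client. In a FastAPI service, the generator returned by `stream_with_retries` could be handed to `StreamingResponse`.

```python
import asyncio
from typing import AsyncIterator, Callable


class TransientError(Exception):
    """Hypothetical error type for retryable upstream failures (e.g. rate limits)."""


async def stream_with_retries(
    open_stream: Callable[[], AsyncIterator[str]],
    max_attempts: int = 3,
    base_delay: float = 0.1,
) -> AsyncIterator[str]:
    """Yield chunks from an upstream stream, retrying with exponential backoff.

    Retries only happen before the first chunk is delivered: once the client
    has partial output, restarting would duplicate text, so we re-raise.
    """
    for attempt in range(1, max_attempts + 1):
        delivered = False
        try:
            async for chunk in open_stream():
                delivered = True
                yield chunk
            return  # upstream finished cleanly
        except TransientError:
            if delivered or attempt == max_attempts:
                raise
            # Backoff schedule: base_delay, 2 * base_delay, 4 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


calls = 0


async def flaky_stream() -> AsyncIterator[str]:
    """Hypothetical upstream that fails before any output on its first call."""
    global calls
    calls += 1
    if calls == 1:
        raise TransientError("rate limited")
    for token in ["Hello", ", ", "world"]:
        yield token


async def main() -> str:
    chunks = [c async for c in stream_with_retries(flaky_stream, base_delay=0.01)]
    return "".join(chunks)


result = asyncio.run(main())
print(result)  # prints "Hello, world"
```

The main design choice is to retry only before the first chunk reaches the client; after that point, the error is surfaced rather than silently restarting the stream.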

🔬 Create a training pipeline that:

Processes production logs
Validates data quality
Triggers fine-tuning runs automatically
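A pipeline like this might gate automatic fine-tuning on simple data-quality checks. The sketch below assumes, purely for illustration, that production logs are JSONL lines with `prompt` and `response` fields; the schema, thresholds and function names are all hypothetical, and deduplication here is exact-match only.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ValidationReport:
    """Clean examples plus a count of rejected lines per reason."""
    kept: list[dict] = field(default_factory=list)
    rejected: dict[str, int] = field(default_factory=dict)

    def reject(self, reason: str) -> None:
        self.rejected[reason] = self.rejected.get(reason, 0) + 1


def validate_log_lines(lines, min_chars: int = 1, max_chars: int = 8000) -> ValidationReport:
    """Turn JSONL log lines into prompt/response examples (hypothetical schema)."""
    report = ValidationReport()
    seen: set[str] = set()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            report.reject("malformed_json")
            continue
        if not isinstance(record, dict):
            report.reject("not_an_object")
            continue
        prompt, response = record.get("prompt"), record.get("response")
        if not isinstance(prompt, str) or not isinstance(response, str):
            report.reject("missing_field")
            continue
        # Drop empty or suspiciously long responses.
        if not (min_chars <= len(response.strip()) <= max_chars):
            report.reject("bad_length")
            continue
        # Exact-duplicate removal on the (prompt, response) pair.
        key = hashlib.sha256(f"{prompt}\x00{response}".encode()).hexdigest()
        if key in seen:
            report.reject("duplicate")
            continue
        seen.add(key)
        report.kept.append({"prompt": prompt, "response": response})
    return report


def should_trigger_finetune(report: ValidationReport, min_examples: int, max_reject_rate: float) -> bool:
    """Only trigger a run when there is enough data and the reject rate is acceptable."""
    total = len(report.kept) + sum(report.rejected.values())
    if total == 0:
        return False
    reject_rate = 1 - len(report.kept) / total
    return len(report.kept) >= min_examples and reject_rate <= max_reject_rate


# Toy log lines for illustration only.
logs = [
    '{"prompt": "Reorder point for SKU A?", "response": "Roughly 120 units."}',
    '{"prompt": "Reorder point for SKU A?", "response": "Roughly 120 units."}',
    "not json",
    '{"prompt": "Lead time for SKU B?", "response": ""}',
    '{"prompt": "Safety stock for SKU C?", "response": "About 40 units."}',
]
report = validate_log_lines(logs)
print(len(report.kept), report.rejected)
# 2 {'duplicate': 1, 'malformed_json': 1, 'bad_length': 1}
```

Counting rejections per reason, rather than silently dropping lines, makes it easier to spot when an upstream logging change starts degrading the training data.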

🚀 Deploy a 7B model with vLLM that:

Beats GPT-4 latency
Maintains quality on our domain
Hits <200ms targets

🔗 Design Project Genome’s ingestion architecture:

Process papers, documentation, operational data
Scale data pipelines efficiently
Ensure incremental learning

📊 Implement evaluation frameworks that:

Catch model regressions before production
Validate training improvements
Enable A/B testing in deployment
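One simple way to catch regressions before production is to compare a candidate model's evaluation metrics against the current baseline, with a per-metric tolerance. This is a minimal sketch; the metric names, tolerances and values are hypothetical, and a real framework would also need statistical significance checks before trusting small differences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRule:
    name: str
    higher_is_better: bool
    tolerance: float  # allowed worsening, in the metric's own units


def find_regressions(
    baseline: dict[str, float], candidate: dict[str, float], rules: list[MetricRule]
) -> list[str]:
    """Describe each metric where the candidate is worse than baseline beyond tolerance."""
    failures = []
    for rule in rules:
        base, cand = baseline[rule.name], candidate[rule.name]
        # A positive value always means "got worse", whatever the metric's direction.
        worsening = (base - cand) if rule.higher_is_better else (cand - base)
        if worsening > rule.tolerance:
            failures.append(f"{rule.name}: {base} -> {cand}")
    return failures


rules = [
    MetricRule("exact_match", higher_is_better=True, tolerance=0.01),
    MetricRule("p95_latency_ms", higher_is_better=False, tolerance=10.0),
]
baseline = {"exact_match": 0.82, "p95_latency_ms": 180.0}
candidate = {"exact_match": 0.84, "p95_latency_ms": 215.0}
print(find_regressions(baseline, candidate, rules))
# ['p95_latency_ms: 180.0 -> 215.0']
```

In CI, a non-empty result could block the deployment step, so a candidate that improves quality but breaks the latency budget is caught before it reaches users.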

About Us

Kallikor fosters an environment where people can excel and belong. We believe:
✅ Healthy culture drives success
✅ Inclusion fuels innovation
✅ Diverse perspectives strengthen results

We commit to zero discrimination—all employees are valued for their contributions.

Skills

Python

FastAPI

LLM Integration

PyTorch

Docker

CI/CD

Model Fine-tuning

vLLM

PydanticAI

System Design

Distributed Systems

Prompt Engineering

API Development

Data Pipelines

Performance Profiling

Monitoring

Location

London, England, United Kingdom