
Braintrust

Senior Software Engineer (Python) - Agent Evaluation - Freelance/Remote 100+ openings

United Kingdom
Posted 3 days ago

Senior Python Engineer (AI Coding Agent Evaluation) – Experienced Only


Location: North America, South America, Asia, Europe

Language Requirement: CV must be submitted in English; B2-level or higher English proficiency is required.


About Mindrift

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focusing on testing, evaluating, and improving AI systems in a non-permanent arrangement.


The Opportunity

We are building a dataset to evaluate AI coding agents—measuring how effectively models tackle real-world developer tasks.

What You’ll Do:

  • Design challenging tasks and evaluation criteria for AI agents:
    • Build realistic virtual environments (tech stacks, codebases, ticketing systems, documentation).
    • Craft tasks from intermediate development states, defining "solved" objectives.
    • Write diagnostic test cases and check them for bias: they should accept every valid solution while rejecting invalid ones.
    • Iterate tests from QA feedback, refining tasks for fairness and robustness.
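
As a rough illustration of tests that accept every valid solution while rejecting invalid ones, here is a minimal sketch in Python. The task (remove duplicates while keeping first-seen order), the candidate functions, and the `passes` helper are all hypothetical and not taken from the project. The idea is that the check looks only at observable behaviour, so two differently written correct solutions both pass, while a plausible but wrong one fails.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical task: remove duplicates from a list, keeping first-seen order.


def dedupe_loop(items: Sequence) -> List:
    """Valid solution A: explicit loop with a 'seen' set."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def dedupe_dict(items: Sequence) -> List:
    """Valid solution B: relies on dicts preserving insertion order."""
    return list(dict.fromkeys(items))


def dedupe_sorted_set(items: Sequence) -> List:
    """Invalid solution: removes duplicates but loses the original order."""
    return sorted(set(items))


# Behavioural cases: (input, expected output). They say nothing about
# how the result is computed, only what it must be.
CASES: List[Tuple[list, list]] = [
    ([3, 1, 3, 2, 1], [3, 1, 2]),
    ([], []),
    (["b", "a", "b"], ["b", "a"]),
]


def passes(candidate: Callable[[Sequence], List]) -> bool:
    """Return True if the candidate satisfies every behavioural case."""
    return all(candidate(list(inp)) == expected for inp, expected in CASES)
```

In this sketch, `passes(dedupe_loop)` and `passes(dedupe_dict)` are both true, while `passes(dedupe_sorted_set)` is false because the order-sensitive cases catch it. A real task would need far more cases and would typically run under pytest rather than a hand-written helper.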


What This Is NOT:

  • Data labeling tasks.
  • General prompt engineering.
  • Writing comprehensive code—you focus on guiding and evaluating the AI agent’s output.

Requirements

Must meet all criteria to be considered:

  • 5+ years of professional Python experience.
  • Expertise in FastAPI, pytest, and async/await.
  • Hands-on with Docker, PostgreSQL, and CI/CD.
  • Proven ability to write and maintain automated tests (not just run them).
  • Full-stack experience with React and TypeScript (a plus).
  • B2-level+ English proficiency.
  • 30+ hours/week availability.

Why This Is Hard

Models already handle simple coding tasks well. The challenge is to design environments in which subtle differences expose deficiencies in an AI agent's solution, with requirements that can be met by more than one valid approach.


How It Works

  1. Apply → Qualification(s) → Join project → Execute tasks → Get paid.

Process

The process moves quickly and typically includes:

  • CV review + 30-min Virtual Project Introduction.
  • Platform registration + Identity verification.
  • 35-min Technical Assessment.
  • Background check (free for candidates).
  • Customised onboarding/training.
  • Start paid work!

Additional Requirements

  • Agree to identity verification during onboarding.
  • Pass technical assessment.
  • Join/participate in Discord (project updates).
  • Clear background check required.
  • Reliable internet + remote communication proficiency.

Skills

Python
FastAPI
pytest
Async/Await
Docker
PostgreSQL
CI/CD
React
TypeScript
Automated Testing
