ApprovalMax

ML Engineer

Chișinău

Posted 2 days ago


Software Engineers – Data & Accuracy (OCR, Embeddings, Recommender Systems)

About ApprovalMax

ApprovalMax is a fast-growing B2B SaaS company that helps businesses automate their approval workflows and financial controls. With a global team spanning the UK, Europe, North America, Australia, and South Africa, we build software that matters—scaling quickly to serve 20,000+ subscribers.

The Role: Domain-Driven Accuracy O&S Engineer

Our Capture product automates financial approvals by extracting structured data from thousands of invoices, bills, and procurement orders monthly using an OCR pipeline. Your mission: drastically improve the zero-touch rate—the percentage of documents requiring no manual correction—by merging forensic data science, model engineering, and production-scale pipeline fixes.
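To make the metric concrete, here is a minimal sketch of how a zero-touch rate could be computed. It assumes a hypothetical `Document` record with a `manual_corrections` count; neither name comes from ApprovalMax's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical record for one processed document (illustrative only)."""
    doc_id: str
    manual_corrections: int  # number of fields a user had to fix by hand


def zero_touch_rate(documents: list[Document]) -> float:
    """Return the fraction of documents that needed no manual correction."""
    if not documents:
        return 0.0
    untouched = sum(1 for d in documents if d.manual_corrections == 0)
    return untouched / len(documents)


docs = [
    Document("inv-001", 0),
    Document("inv-002", 2),
    Document("bill-003", 0),
    Document("po-004", 1),
]
print(f"Zero-touch rate: {zero_touch_rate(docs):.0%}")  # 2 of 4 documents are untouched
```

In practice the definition of "manual correction" (any edited field, or only edits to key fields) would need to be agreed first, since it changes the number directly.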

Your Impact Areas

Your work splits into:

Accuracy Investigation & Measurement (~70%) – Systematically diagnosing root causes of errors at scale.
ML Engineering & Model Development (~30%, scaling to 50%) – Building and deploying AI systems to correct OCR, matching, and logic errors.

Core Challenges (Initiative Backlog)

We’re focused on four systemic error categories:

Entity Matching (~50% of fixable errors)
- Fields extracted correctly but misaligned with accounts, suppliers, or tax codes.
- Plans: Embedding-based similarity search, recommender systems, consensus-based deduplication.
Pipeline Logic (~25%)
- Tax misclassification, rounding errors, or spurious adjustments from deterministic logic.
- Plans: Forensic tracing and design of rule-based fixes.
OCR Extraction (~25%)
- False positives, phantom line items, misread foreign characters.
- Plans: Add a correction layer using LLMs, alternative OCRs, or document correction models (HuggingFace).
User Overrides
- Domain ownership decisions (unfixable via automated systems).

Remote work is only available to candidates based in the UK, Serbia, or Moldova.

Key Responsibilities

Accuracy Investigation & Measurement (~70% of Time)

Forensics at Scale
- Investigate root causes across hundreds of thousands of documents simultaneously.
- Compare production data and identify statistical outliers that explain broad failure patterns.
Own the Accuracy Framework
- Monitor post-deployment impact of fixes, track false positives, verify uplift metrics independently.
- Detect subtle bias and validate results with stratified sampling.
Document Tooling Augmentation
- Improve the current LLM+SQL hybrid error-analysis methodology using analytical domain expertise.
Post-Processing Pipeline Logic
- Diagnose post-processing pipeline logic issues (tax misclassification, rounding, adjustments), either by tracing the C# source or by querying raw document history with SQL.
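The stratified sampling mentioned above can be sketched as follows. This is a minimal example under assumed data: the `type` field and the document records are hypothetical, and the fixed per-stratum quota is one possible design among several (proportional allocation is another).

```python
import random
from collections import defaultdict


def stratified_sample(records, key, per_stratum, seed=0):
    """Draw up to `per_stratum` records from each group defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the audit sample is reproducible
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for group in sorted(strata):  # sorted for a deterministic group order
        members = strata[group]
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical documents tagged with a document type; invoices dominate.
documents = (
    [{"id": i, "type": "invoice"} for i in range(90)]
    + [{"id": 100 + i, "type": "bill"} for i in range(8)]
    + [{"id": 200 + i, "type": "purchase_order"} for i in range(2)]
)

sample = stratified_sample(documents, key=lambda d: d["type"], per_stratum=5)
counts = defaultdict(int)
for doc in sample:
    counts[doc["type"]] += 1
print(dict(counts))
```

Sampling per stratum keeps rare document types visible in the validation set; a simple random sample of the same size would mostly contain invoices.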

ML Engineering & Model Development (~30%, Scaling to 50%)

Embeddings & Impact
- Deploy supplier/vendor matching using sentence-transformers, HuggingFace, and a REST-served retrieval layer.
- Evaluate retrieval accuracy using pgvector, FAISS, or a self-hosted vector database optimised for financial domains.
OCR Correction Layer
- Detect and validate OCR errors using LLM-provided confidence scores and/or structured output parsers.
- Evaluate LLM grounding, MLP-driven corrected extraction, or document-processing tools (e.g., Docling, PDF Europe indices).
MLOps Orchestration
- Ship optimised Airflow + MLflow pipelines for evaluation, experiment tracking, and production rollouts.
Recommendations from Past User Behaviour
- Build datasets from manual override history to train pattern-based, explainable recommendations.
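One common way to evaluate retrieval accuracy for supplier matching is recall@k: the share of queries whose correct match appears among the top k retrieved candidates. The sketch below is illustrative only; the supplier strings, candidate IDs, and ground-truth mapping are invented, and the posting does not state which metric the team actually uses.

```python
def recall_at_k(ranked_results, expected, k):
    """Fraction of queries whose expected match appears in the top-k results."""
    if not expected:
        return 0.0
    hits = sum(
        1
        for query, truth in expected.items()
        if truth in ranked_results.get(query, [])[:k]
    )
    return hits / len(expected)


# Hypothetical retrieval output: extracted supplier string -> ranked candidate IDs.
ranked = {
    "ACME LTD.": ["sup-1", "sup-7", "sup-3"],
    "Globex Gmbh": ["sup-9", "sup-2", "sup-4"],
    "Initech": ["sup-5", "sup-6", "sup-3"],
}
# Hypothetical ground truth, e.g. taken from confirmed manual corrections.
truth = {"ACME LTD.": "sup-1", "Globex Gmbh": "sup-2", "Initech": "sup-3"}

print(recall_at_k(ranked, truth, k=1))
print(recall_at_k(ranked, truth, k=3))
```

Comparing recall@1 with recall@3 shows how often the right supplier is retrieved but not ranked first, which separates retrieval problems from ranking problems.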

Collaboration Scope

Working with the Capture Team
- Standup ownership, sprint compromises where needed.
Machine Learning Team
- Final model architecture decisions, deployment commitments.
Backend Systems (C# Team)
- You design and implement; they bridge with downstream teams.

Essential Skills (Experience Requirements)

Data Investigation & Measurement

Data Love and Truth
- Proven rigor in building data pipelines (e.g., fraud analysis, payment reconciliation, invoice matching).
Query Tooling
- Hands-on PostgreSQL / BigQuery experience, coupled with Python data analysis (Pandas, NumPy).

ML Architecture (Structured Docs)

Natural Text to Accuracy
- Experience with end-to-end structured banking doc analysis, manual inspection of OCR engines, or bridging between LLM-based extraction and quantifiable validation systems.
Modeling from Domain Tools
- Proven delivery using OCRL, PeerProp housing (FAISS/PR-logic), ophysical constraints by context (e.g., financial-specific taxed units).

LLM Integration

LLM+DS as a Combo Plan
- Not notebook-based hacking; instead, structured prompt iteration and self-audits (e.g., structured output via Pydantic).
Confident Autonomy
- Can evaluate alternative training objectives and discuss side effects before aligning on an approach.

ML Infrastructure

Deployment Hour CTF
- Productionised models at medium scale (on-premises or cloud, e.g., Azure Pipelines rather than notebook-based workflows).

Nice to Have Proficiencies

Premier Finance Domain Knowledge
- Understanding of invoices, bill visions, or Quicken per domestic accounts.
OCR APIs
- Experience with Azure/Google Cloud, AWS Textract, or open-source models (Tesseract).
Document Understanding Models
- Experience with model families such as LayoutLMv2 or Donut.
Corollary Inquiry Tooling
- Know or build inspection pipelines (e.g., Dapcon, YaDL, csv-correction-hub).

What We Offer

Fast-moving global startup: 20,000+ subscribers, 150+ employees across 5+ regions.
Compensation is reviewed quarterly based on the impact metrics you own.
26 days paid time off + 1-day “celebrate ability” leave bonus.
Office setup reimbursement & hybrid bag upgrades.
Recognition programme (financial rewards) tied to service years & engineered success.

Skills

SQL

Python

Machine Learning

OCR

LLM Integration

MLOps

Vector Similarity Search

Data Investigation

Pydantic

Pandas

Scikit-learn

C# Code Analysis

Embedding-based Matching

Document Layout Analysis

Airflow

Azure ML Pipelines

Location

Chișinău, Moldova