Amazon

Applied Scientist, Edge AI and Science

Cambridge
Posted 23 days ago

Description

Amazon Devices is an inventive research and development company that designs and engineers high-profile devices like the Kindle family of products, Fire Tablets, Fire TV, Health & Wellness, Amazon Echo, and Astro products.

This is an exciting opportunity to bring generative AI to Amazon's consumer products, both on-device at the edge and in the cloud. Our compression platform delivers 20x to 100x neural network compression, but using it well still takes weeks of hands-on learning and expert intuition. The Edge AI Model Studio team exists to change that. We become the expert users so partner teams don't have to: we turn compression science into reliable production workflows, and we package the results into a library of compression-ready student architectures that partners can run on their own. Our north star is simple. Training-to-deployment should feel like pushing a button, not a month-long science project.

We are looking for an Applied Scientist to join Model Studio and help compress the next generation of models for edge and cloud deployment across modalities, including large language models, vision-language models, speech and audio models, and omni models that reason jointly over text, audio, and video. You will apply and extend state-of-the-art compression recipes to real models, define the benchmarks and evaluation methodology that make trade-offs explicit, and build the reference implementations that let other teams deploy compressed models without our help. You will work backwards from deployment constraints such as memory, latency, throughput, power, and cost, which differ across edge and cloud targets, partnering closely with fellow scientists, platform and compiler engineers, hardware architects, and product teams. The role sits on two frontiers at once. Compressing a model effectively and healing it back to quality means staying current not just with the latest compression techniques, but with the rapidly evolving model architectures themselves, and understanding deeply how each one works inside.

You will take ownership of project-level delivery, apply advanced compression across a wide range of real models, and have room to grow your scope and technical influence.

Key job responsibilities

Apply and extend compression recipes (knowledge distillation, structured pruning, post-training quantization, and quantization-aware training, including low-bit and mixed-precision) to assigned models, achieving 20x to 100x compression while preserving model quality.
Design and run healing recipes (fine-tuning and distillation that recover accuracy lost to compression), iterating on data mixes, objectives, and training settings until the compressed model meets its quality bar.
Track emerging model architectures and dissect how they work internally, so you can choose where to compress, anticipate where accuracy will break, and design recovery strategies grounded in the model's actual structure.
Build a library of compression-ready model entries: reference implementations, compression recipes, model cards, and benchmark results that partner teams can run self-service to produce deployment-ready artifacts for edge and cloud targets.
Define the datasets, benchmarks, and KPIs that matter for your models, and build evaluation methodology that makes accuracy, latency, memory, and cost trade-offs explicit.
Run fast feasibility gates on new model families and modalities before committing to long efforts, and pivot early when a candidate does not clear the bar.
Capture platform friction as high-signal feedback: minimal reproductions and tracked fix requests that help platform and compression-science partners root-cause issues, so partner teams never rediscover the same blockers.
Write reproducible, testable, well-documented code that meets the SDE I bar, so your recipes and results can be reproduced and built on by others.
Collaborate with Applied Scientists, platform and compiler engineers, hardware architects, and partner teams; mentor interns and help newer teammates ramp up.
Where appropriate and not precluded by business considerations, publish and present on Amazon's behalf at top ML venues such as NeurIPS, ICLR, and MLSys.

A day in the life

You pick up a vision-language model whose vision tower needs to fit tight memory, latency, and cost budgets for deployment. You configure a quantization-aware training run at the team's target compression ratio, then check the compressed checkpoint against a visual reasoning benchmark and find it recovers only part of the baseline accuracy. You design a healing run to close the gap, tuning the data mix and training objective to fine-tune the compressed model back toward the teacher's quality. The next checkpoint clears most of the gap but still lands short, so rather than assume the recipe is at fault, you dig into the evaluation harness and discover a benchmark filter is misaligned, deflating the score. You fix the filter, re-run, and confirm the healed model lands where the science predicts. You then package the work as a reusable model entry (recipe, model card, benchmark numbers, and a reference implementation a partner team can run on their own) and file a minimal reproduction of the harness bug so no one rediscovers it.

A typical week mixes hands-on compression and evaluation with design discussions alongside fellow scientists and platform engineers. You run a fast feasibility gate on a new model family before committing to a long effort, profile a compressed model to confirm a real throughput gain, and turn a recurring friction point into a reusable pattern. You work in a small, fast-moving team where every recipe you harden compounds across future models and every partner you unblock ships faster.

About The Team

We compress frontier models 20x to 100x and put them in the hands of millions of customers, everywhere from your pocket to the cloud: the device in your hand, the Echo on your counter, and the services behind them. The models the industry shipped last month, we are shrinking this month, across language, vision, speech, and omni. That is the job: take the best models in the world and make them small enough, fast enough, and cheap enough to run everywhere, without giving up the intelligence that makes them worth running.

Edge AI Model Studio is the team that makes it real. We are the expert users of a compression platform that most of Amazon cannot yet wield, and our mission is to change that, turning weeks of expert intuition into recipes anyone can run. We are small, we move fast, and we own our work end to end: a result counts only when it ships with a recipe, benchmarks, and an artifact a partner team can run without us. Every recipe we crack compounds across every model that follows. If you want your science in real products at real scale, and you want to put the frontier of generative AI in the hands of millions of customers, come build it with us.

Basic Qualifications

Master's degree, or a PhD and experience in CS, CE, ML or related field
Experience programming in Java, C++, Python or related language
Experience in patents or publications at top-tier peer-reviewed conferences or journals
Experience in state-of-the-art deep learning model architecture design, deep learning training and optimization, and model pruning

Preferred Qualifications

Experience with multimodal and omni models: vision-language models, audio-language or speech models, or omni architectures that jointly process text, audio, and video.
Experience with neural network compression techniques (quantization, knowledge distillation, structured pruning, low-rank factorization) for resource-constrained deployment.
Familiarity with mixed-precision training and inference (FP16, BF16, FP8, INT8, INT4) and low-bit quantization.
Experience with edge deployment, model compilation, or inference optimization, and an understanding of hardware-aware trade-offs.
Experience with large-scale ML systems, including profiling, debugging, and reasoning about system performance.

Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your experience and skills. We value your passion to discover, invent, simplify and build. Protecting your privacy and the security of your data is a longstanding top priority for Amazon. Please consult our Privacy Notice (https://www.amazon.jobs/en/privacy_page) to know more about how we collect, use and transfer the personal data of our candidates.

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

Company - Evi Technologies Limited

Job ID: A10442934

Skills

Deep Learning
Model Compression
Quantization
Knowledge Distillation
Structured Pruning
Model Evaluation
Python
C++
Java
Machine Learning
Neural Networks
Benchmarking
Data Analysis
Model Architecture
Performance Optimization
Collaboration

Location

Cambridge, England, United Kingdom
