Lloyds Banking Group

ML/AI Engineer

In-Office
Manchester, Greater Manchester, England
Mid level
The ML/AI Engineer will build and maintain scalable systems for the machine learning lifecycle, focusing on Kubernetes orchestration, CI/CD automation, model deployment, and ensuring reliability and efficiency. Responsibilities include automating deployment, monitoring, and maintaining models in production environments.
The summary above was generated by AI

End Date

Friday 27 March 2026

Salary Range

£72,702 - £80,780

We support flexible working.

Flexible Working Options

Hybrid Working, Job Share

Job Description

JOB TITLE: ML/AI Engineer

SALARY: £70,929 - £85,000 per annum

LOCATION: Manchester

HOURS: Full-time – 35 hours

WORKING PATTERN: Our work style is hybrid, which involves spending at least two days per week, or 40% of our time, at our Manchester office.

About this opportunity…

This is an exciting opportunity for a hands-on ML/AI Engineer to join our Data & AI Engineering team. You’ll build, automate, and maintain scalable systems that support the full machine learning lifecycle. You will lead Kubernetes orchestration, CI/CD automation (including Harness), GPU optimisation, and large‑scale model deployment, owning the path from code commit to reliable, monitored production services.

This is a unique opportunity to shape the future of AI by embedding fairness, transparency, and accountability at the heart of innovation. You’ll join us at an exciting time as we move into the next phase of our transformation. We’re looking for curious, passionate engineers who thrive on innovation and want to make a real impact.

About us…

We’re on an exciting journey and there couldn’t be a better time to join us. The investments we’re making in our people, data, and technology are leading to innovative projects, fresh possibilities, and countless new ways for our people to work, learn, and thrive.

What you’ll do…

  • Compose, build, and operate production‑grade Kubernetes clusters for high‑volume model inference and scheduled training jobs.

  • Configure autoscaling, resource quotas, GPU/CPU node pools, service mesh, Helm charts, and custom operators to meet reliability and efficiency targets.

  • Implement GitOps workflows for environment configuration and application releases.

  • Build CI/CD pipelines in Harness (or equivalent) to automate build, test, model packaging, and deployment across environments (dev / pre‑prod / prod).

  • Enable progressive delivery (blue/green, canary) and rollback strategies, integrating quality gates, unit/integration tests, and model‑evaluation checks.

  • Standardise pipelines for continuous training (CT) and continuous monitoring (CM) to keep models fresh and safe in production.

  • Deploy and tune GPU‑backed inference services (e.g., A100), optimise CUDA environments, and leverage TensorRT where appropriate.

  • Operate scalable serving frameworks (NVIDIA Triton, TorchServe) with attention to latency, efficiency, resilience, and cost.

  • Implement end‑to‑end observability for models and pipelines: drift, data quality, fairness signals, latency, GPU utilisation, error budgets, and SLOs/SLIs via Prometheus, Grafana, and Dynatrace.

  • Establish actionable alerting and runbooks for on‑call operations; drive incident reviews and reliability improvements.

  • Operate a model registry (e.g., MLflow) with experiment tracking, versioning, lineage, and environment‑specific artefacts.

  • Enforce audit readiness: model cards, reproducible builds, provenance, and controlled promotion between stages.

What you’ll need…

  • Strong Python for automation, tooling, and service development.

  • Deep expertise in Kubernetes, Docker, Helm, operators, node‑pool management, and autoscaling.

  • CI/CD expertise, with hands‑on experience building multi‑stage pipelines in Harness (or similar); experience with GitOps, artefact repositories, and environment promotion.

  • Practical experience with CUDA, TensorRT, Triton, TorchServe, and GPU scheduling/optimisation.

  • Proficiency in Prometheus, Grafana, and Dynatrace, including defining SLIs/SLOs and alert thresholds for ML systems.

  • Experience operating MLflow (or equivalent) for experiment tracking, model bundling, and deployments.

  • Expert use of Git, branching models, protected merges, and code‑review workflows.

It would be great if you had any of the following…

  • Experience with GCP (e.g., GKE, Cloud Run, Pub/Sub, BigQuery) and Vertex AI (Endpoints, Pipelines, Model Monitoring, Feature Store).

  • Experience building hooks for prompt/version management, offline/online evaluation, and human‑in‑the‑loop workflows (e.g., RLHF) to enable continuous improvement.

  • Familiarity with Model Context Protocol (MCP) for tool interoperability, plus Google ADK, LangGraph/LangChain for agent orchestration and multi‑agent patterns.

  • Experience with Ray, Kubeflow, or similar frameworks.

  • Experience embedding controls, audit evidence, and governance in regulated environments.

  • Experience with GPU efficiency, autoscaling strategies, and workload right‑sizing.

About working for us…

Our focus is to ensure we're inclusive every day, building an organisation that reflects modern society and celebrates diversity in all its forms. We want our people to feel that they belong and can be their best, regardless of background, identity or culture. And it’s why we especially welcome applications from under-represented groups. We’re disability confident. So if you’d like reasonable adjustments to be made to our recruitment processes, just let us know.

We also offer a wide-ranging benefits package, which includes…

  • A generous pension contribution of up to 15%

  • An annual bonus award, subject to Group performance

  • Share schemes including free shares

  • Benefits you can adapt to your lifestyle, such as discounted shopping

  • 30 days’ holiday, with bank holidays on top

  • A range of wellbeing initiatives and generous parental leave policies

Want to do amazing work that’s interesting and makes a difference to millions of people? Join our journey!

At Lloyds Banking Group, we're driven by a clear purpose: to help Britain prosper. Across the Group, our colleagues are focused on making a difference to customers, businesses and communities. With us you'll have a key role to play in shaping the financial services of the future, whilst the scale and reach of our Group means you'll have many opportunities to learn, grow and develop.

We keep your data safe. So, we'll only ever ask you to provide confidential or sensitive information once you have formally been invited along to an interview or accepted a verbal offer to join us, which is when we run our background checks. We'll always explain what we need and why, with any request coming from a trusted Lloyds Banking Group person.

We're focused on creating a values-led culture and are committed to building a workforce which reflects the diversity of the customers and communities we serve. Together we’re building a truly inclusive workplace where all of our colleagues have the opportunity to make a real difference.

Top Skills

CI/CD
CUDA
Docker
Dynatrace
Grafana
Harness
Helm
Kubernetes
MLflow
Prometheus
Python
TensorRT
TorchServe
Triton
