DeepL

Senior Research Data Engineer

Posted 14 Days Ago
In-Office
Berlin
Senior level
Join the Foundation Model Training team to design and build data pipelines, manage unstructured data, and support AI language technology projects.
The summary above was generated by AI
Meet DeepL

DeepL is a global AI product and research company focused on building secure, intelligent solutions to complex business problems. Over 200,000 business customers and millions of individuals across 228 global markets today trust DeepL's Language AI platform for human-like translation, improved writing and real-time voice translation. Building on a history of innovation, quality and security, DeepL continues to expand its offerings beyond the field of Language, including DeepL Agent - an autonomous AI assistant designed to transform the way businesses and knowledge workers get work done. Founded in 2017 by CEO Jarek Kutylowski, DeepL now has over 1,000 passionate employees and is supported by world-renowned investors including Benchmark, IVP, and Index Ventures.

Our goal is to become the global leader in trusted, intelligent AI technology, building products that drive better communication, foster connections, and create a meaningful impact. To achieve this, we need talented people like you to join our journey. If you’re ready to shape the future of AI and grow your career in a fast-moving, purpose-driven environment, DeepL is your next destination.

What sets us apart

What sets us apart is our blend of cutting-edge AI technology, meaningful work, and a culture where people truly thrive. We’re a team of innovators, researchers, and creators driven by a shared purpose to unlock human potential by making work simpler, smarter, and more connected.

When we share what it’s like to work at DeepL, the reactions are overwhelmingly positive. This might be because of our technology that helps millions of people and businesses communicate and work better every day, or because of the trust, curiosity, and care that shape our culture.

What we know for sure is this: being part of DeepL means joining a team dedicated to innovation, growth, and well-being. Discover more about life at DeepL on LinkedIn, Instagram, and our Blog.

Meet the team behind this journey

DeepL is renowned for its AI products - from language and translation, to enterprise agents. At the core of these products are custom-built algorithms and models that are trained using data. The quality and volume of data are key factors in our success.

You will join our Foundation Model Training team. As a cross-functional team of research scientists and data engineers specialising in machine learning, we develop foundation models and manage the pre-training corpora and associated data preparation pipelines. We work with unstructured data on a petabyte scale. This is a fast-paced and highly competitive field where we face challenging problems at the frontier of research and engineering.

Your responsibilities
  • Work on ambitious frontier research projects as part of a foundation model training research team consisting of research scientists and research data engineers.

  • Architect, design and build scalable data pipelines from the ground up, e.g. for downloading and preparing multimodal unstructured data for training.

  • Build on top of a modern tech stack incl. Kubernetes, Dask, Ray, etc., and make extensive use of actively developed open-source solutions, debugging low-level issues where needed and potentially submitting fixes upstream.

  • Deploy complex Python data solutions to cloud infrastructure, incl. AWS and company data centers (on-prem), where you will own the operation of data processing at massive scale.

  • Go beyond “Big Data” and ETL. You will engineer and operate large-scale data solutions for real-world unstructured data incl. text, code, image and audio modalities.

  • Collaborate with stakeholders, research scientists, other research data engineers and data tooling and platform teams.

  • Raise the standard for excellence and act as owner and champion for the quality and availability of our foundation model training data.

  • Ensure mission-critical reliability of data pipeline jobs, maintain high quality code with documentation and provide a great data product user experience.

Qualities we look for
  • Degree in a scientific or technical field.

  • Previous work experience as a Data Engineer or in a similar data- and engineering-centric role in a scaled-up tech company with a focus on large-scale unstructured data.

  • Extensive experience with large-scale data engineering, writing high-quality Python code and leveraging the full Python data ecosystem in cloud deployments.

  • Exploratory data analysis, data cleaning, data validation, ideally ML feature engineering for text and other unstructured data.

  • Developing, testing and deploying data pipelines and infrastructure.

  • End-to-end ownership of data solution development, operations, testing and quality assurance, ideally responsibility for data products.

  • Experience with distributed computing and parallelization frameworks for Python such as Dask, Celery and Ray, in combination with infrastructure-as-code, ideally Kubernetes on AWS.

  • Excellent communication and collaboration skills in an English-speaking organization.

Ideally, you have domain-specific experiences:

  • LLM training data preparation.

  • NLP, text classification, model-based/GPU workflows.

  • Dynamic workflow orchestration frameworks like Argo Workflows.

  • Linguistics expertise or speaking multiple languages.

  • Fluent in an additional high-performance programming language like C++, Go or Rust.

What we offer
  • Diverse and internationally distributed team: joining our team means becoming part of a large, global community with people of more than 90 nationalities. We're more than just colleagues; we're a group of professionals with a shared mission to connect diverse cultures. Our global presence is growing–we've doubled in size nearly every year, with our employees based in the UK, Germany, the Netherlands, Poland, the US, and Japan, and we continue to expand our network.

  • Open communication, regular feedback: as a language-focused company, we value clear, honest communication. We value smooth collaboration, direct and actionable feedback, and believe that leading with empathy and a growth mindset makes us better together.

  • Hybrid work, flexible hours: we offer a hybrid work schedule, with team members coming into the office twice a week. This allows you to engage directly with your team and experience the unique energy of our workspace, while still enjoying the flexibility and comfort of working from home. With flexible working hours and trust in your productivity, we are in sync with your team’s general locations and time zones to foster effective and seamless collaboration.

  • Regular in-person team events: we bond over vibrant events that are as unique as our team, from local team and business unit gatherings, to new-joiner onboardings, to company-wide events that bring us all together–literally.

  • Monthly full-day hacking sessions: every month, we have Hack Fridays, where you can spend your time diving into a project you're passionate about and get the opportunity to work with other teams–we value your initiatives, impact, and creativity.

  • 30 days of annual leave: we value your peace of mind. With 30 days off (excluding public holidays) and access to mental health resources, we make sure you're as strong mentally as you are professionally.

  • Competitive benefits: just as our team spans the globe, so does our benefits package. We've crafted it to reflect the diversity of our team and tailored it to align with your unique location, to ensure you feel supported every step of the way.

If this role and our mission resonate with you, but you're hesitant because you don't check all the boxes, don't let that hold you back. At DeepL, it's all about the value you bring and the growth we can foster together. Go ahead, apply—let's discover your potential together. We can't wait to meet you!

We are an equal opportunity employer

You are welcome at DeepL for who you are—we appreciate authenticity here. Our product is for everyone, and so is our workplace. The more voices we have represented and amplified in our business, the more we will all succeed, contribute, and think forward! So bring us your personal experience, your perspectives, and your background. It’s in our diversity that we will find the power to break down language barriers in the world.

Top Skills

Argo Workflows
AWS
Celery
Dask
Kubernetes
Python
Ray
