Pallon

Infrastructure Engineer (GPU Cluster)

Posted 17 Days Ago
Remote
2 Locations
Senior level
As a Senior Infrastructure Engineer at Pallon, you will architect, manage, and optimize infrastructure solutions, including GPU clusters and cloud environments, while collaborating with engineering teams to ensure system performance and availability.
The summary above was generated by AI

💧 About Pallon

At Pallon, a spin-off from ETH Zurich, we’re creating AI that automatically detects defects in sewer inspection videos and advises cities on when & how to fix them. By providing more precise, objective data, we aim to fix wastewater leaks, reduce CO2 emissions, and prevent urban flooding. Our mission is to make cities more sustainable and resilient.

💪 Your Role

  • Own All Infrastructure Decisions: Be the architect and decision-maker for our entire infrastructure, from our high-density compute cluster to our cloud environment.

  • Architect and Manage a Cutting-Edge GPU Cluster: Design and build our powerful GPU cluster, optimizing it for maximum deep learning and computer vision performance.

  • Partner with Engineering Teams: Work directly with the computer vision and platform teams to solve their infrastructure challenges.

  • Maintain High Availability, Security, and Performance: Keep our production systems and data pipelines highly available, secure, and performant.

  • Roll Up Your Sleeves and Get Things Done: From strategic planning to replacing faulty NVMe drives, troubleshooting Kubernetes pod eviction issues, and configuring custom systemd units, you'll handle it all.

🎯 You will be successful if:

  • You have a proven track record (5+ years) of not just implementing, but architecting and deciding on infrastructure solutions, ideally in startup environments.

  • You possess a deep understanding of all levels of the stack, including Linux system administration, cloud infrastructure (container orchestration, infrastructure-as-code), and hardware (server architecture, networking, storage systems).

  • You are comfortable with scripting and programming in various languages.

  • You have a university degree in Computer Science or a related field.

  • You are highly independent and excel at prioritizing your own work, seeking help when needed.

  • You communicate clearly and effectively with engineering teams, translating their needs into practical infrastructure solutions.

  • You are a quick and eager learner, ready to adapt to new technologies and challenges.

  • [Bonus] You have experience with high-performance computing (HPC) environments or machine learning infrastructure.

  • [Bonus] You have experience with data engineering and ETL pipelines.

🚀 Our Tech Stack

Note: we do not require experience in these exact technologies.

  • HPC Cluster: Linux, NVIDIA GPUs, Slurm, InfiniBand

  • Cloud: Google Cloud Platform, Kubernetes, Docker, GitLab CI/CD

  • Data Analytics: dbt, BigQuery, Metabase

😎 Benefits & Team Culture

As a part of Pallon, you will:

  • Contribute to a positive impact on society and the environment.

  • Develop a novel product that changes a whole industry.

  • Be part of a motivated, smart, fun, and supportive team of software engineers and AI researchers.

  • Own a part of Pallon and have a part in our success with our Employee Stock Option Plan (ESOP).

  • Work for the Underworld, not the Devil: exploring sewers virtually and in real life during our Pallon offsites.

  • Work from home or enjoy access to our beautiful office space located in Zürich.

Inclusion statement

At Pallon, we highly value equality of opportunity and inclusivity, and we particularly encourage women and candidates from under-represented backgrounds to apply, even if you don’t meet 100% of the requirements.

Top Skills

BigQuery
dbt
Docker
GitLab CI/CD
Google Cloud Platform
InfiniBand
Kubernetes
Linux
Metabase
NVIDIA GPUs
Slurm
