Core Responsibilities
- Automate infrastructure provisioning, configuration, and maintenance using Terraform, Ansible, and Python.
- Build, enhance, and maintain CI/CD pipelines using Jenkins, GitHub Actions, or AWS CodePipeline for continuous delivery and consistency across environments.
- Implement and optimize monitoring solutions using Datadog, Prometheus, Grafana, and ELK/EFK stacks to ensure high service reliability.
- Develop alerting strategies and escalation paths aligned to service-level objectives (SLOs) and key performance indicators (KPIs).
- Build custom scripts and automation for patching, validation, and system health checks.
- Partner with U.S. SREs and Engineering teams on environment management, change control, and incident response improvements.
- Analyze logs and performance metrics to identify stability issues, optimize cloud costs, and drive continuous improvement.
- Maintain detailed runbooks, SOPs, and documentation supporting operational readiness and knowledge transfer.
- Contribute to open-source or internal tooling that enhances automation, monitoring, or observability capabilities.
- Conduct periodic reliability reviews, performance tests, and failover simulations to validate readiness.
- Support adoption of infrastructure-as-code, immutable environments, and container orchestration (Docker/Kubernetes).
- Promote DevOps and SRE best practices across the engineering organization.
Tools & Technologies
AWS (EC2, S3, Lambda, CloudWatch, IAM, RDS, ECS/EKS), Terraform, Ansible, Python, Bash, Jenkins, GitHub Actions, Docker, Kubernetes, Prometheus, Grafana, ELK/EFK, Loki, Jira, Confluence.
Qualifications
- 5–7 years in SRE, DevOps, or Infrastructure Engineering.
- Bachelor’s degree in computer science or a related field preferred, or equivalent experience.
- Experience supporting U.S. healthcare or other regulated SaaS systems (HIPAA, SOC 2, ISO 27001).
- Strong scripting and automation skills (Ansible, Jenkins, Python, Bash, Terraform, CloudFormation).
- Understanding of CI/CD, networking, and secure cloud architecture.
- Proven collaboration with U.S. teams across time zones; clear written and spoken English.
- Familiarity with EHR, HL7/FHIR, or state/federal public health systems preferred.
- Knowledge of data privacy frameworks (HIPAA, HITRUST, GDPR) and ITIL-based change/incident management.
Work Model
- Aligns with U.S. Eastern hours for daily collaboration, stand-ups, and sprint planning.
- Documents work thoroughly to ensure audit readiness and operational transparency.
- Works closely with U.S. SRE leadership on automation priorities, sprint goals, and production readiness activities.
Soft Skills
- Analytical problem-solver with attention to detail.
- Self-driven, collaborative, and process-oriented.
- Excellent communication and time management across distributed teams.
- Passionate about automation, reliability, and continuous improvement.
Example Contributions
- Created an automated patching pipeline for pre-production validation of security updates.
- Designed Grafana dashboards that reduced alert noise by 40%.
- Built Python scripts that automate AWS cleanup, cutting cloud spend by 15%.
- Implemented environment consistency checks that improved deployment success rates.
- Introduced CI/CD optimizations that reduced release time by 25%.

