Senior SRE

Relocation | Abu Dhabi

Location: Abu Dhabi, UAE (relocation required)
Company: https://www.presight.ai/
Employment Type: Full-time, on-site only
Salary Range: $8,000-9,000 (30-34k AED) net, based on interview results

About Presight:

Presight is an ADX-listed public company whose majority shareholder is Abu Dhabi-based G42. It is the
region’s leading big data analytics company powered by GenAI. It combines big data, analytics, and AI expertise
to serve every sector, at every scale, and to create business and positive societal impact. Presight excels at
all-source data interpretation to support insight-driven decision-making that shapes policy and creates safer,
healthier, happier, and more sustainable societies. Today, through its range of GenAI-driven products and
solutions, Presight is bringing App

Position Overview:

We are seeking a meticulous Site Reliability Engineer to support the Presight delivery model, which empowers
product and technology teams to develop and deliver high-quality products, improve platform infrastructure, and
strengthen the reliability of products and solutions.
You will play a key role in defining and establishing the delivery model used to develop cutting-edge,
next-gen analytics solutions and services at Presight.

Key Responsibilities:

- Manage the infrastructure required to run our solutions, deployed to public cloud or private (air-gapped) cloud.
- Analyze service performance, identify bottlenecks, and provide measurable improvement plans.
- Maintain the environment’s health by continuously monitoring technical and business metrics, configuring alerts for potential issues, and proactively addressing risks to prevent disruptions.
- Deploy application updates with minimal disruption to services.
- Identify, evaluate, and conduct proof-of-concepts for new technologies.
- Contribute to the knowledge base.
- Review and modify CI/CD principles and service maturity iteratively, striving for continuous improvement.

Requirements:

- 5+ years of experience in managing Kubernetes clusters
- 5+ years of experience in configuring and using monitoring/observability platforms
- Familiarity with at least one type of database
- 5+ years in an SRE/DevOps/Sysadmin/Platform Engineer role

Mandatory skills:

- Strong background in Linux/Unix Administration
- Solid hands-on experience deploying and operating Kubernetes or OpenShift clusters
- Experience configuring and maintaining monitoring and observability solutions
- Ability to troubleshoot and resolve complex production issues efficiently, including performing root cause analysis and restoring services quickly during high-pressure incidents or critical outages
- Experience in backing up and restoring various systems
- Working with project managers and solution architects as a subject matter expert
- Implementing basic network security (e.g. configuring VPCs, firewalls/security groups, etc.)
- Understanding the dependencies of various GPU cards and upgrading container images as needed to ensure compatibility
- Deploying and operating products from third-party providers
- Creating releases together with the development team and deploying release packages to all required environments

Nice to Have:

- Good understanding of typical system architecture and interaction between its components
- Experience automating tasks using infrastructure-as-code tools, e.g. Ansible, Terraform
- Thorough understanding of a company's systems, including auxiliary components like caching systems (e.g., Redis, Memcached) and message queues (e.g., RabbitMQ, Kafka)
- Good understanding of databases, e.g. Postgres, Elasticsearch, ClickHouse
- Basic scripting
- Working knowledge of OAuth 2.0, OpenID/OpenID Connect, SAML 2.0, Kerberos, LDAP

Contact for questions and CV: @ant1kdream