Data Reliability Engineer

Exciting data SRE role

Your new company

A leading Big 4 bank is seeking a data SRE to work on its enterprise data warehouse.


Your new role

As a Site Reliability Engineer (SRE) with expertise in Data Engineering and PySpark, you will play a crucial role in ensuring the reliability, scalability, and performance of our data infrastructure and applications. Your key responsibilities will include:
  • Monitoring and maintaining the health, performance, and availability of data processing systems and infrastructure.
  • Collaborating with data engineers, software developers, and stakeholders to ensure seamless integration and deployment of data solutions.
  • Automating and optimizing system reliability and efficiency through scripting and tooling.
  • Troubleshooting and resolving issues related to data processing, infrastructure, and application performance.
  • Implementing best practices for data security, retention, backup, and recovery.
  • Providing production support, including smoke checks, incident management, and change control.
  • Conducting root cause analysis to identify and implement corrective actions for recurring issues.
  • Maintaining documentation, runbooks, and troubleshooting guides for data systems and processes.
  • Supporting data projects such as system migrations, upgrades, and expansions.
  • Participating in on-call rotations, including weekend and holiday support as needed.


What you'll need to succeed

  • Bachelor's degree in Computer Science, Data Science, or a related field.
  • Experience: Proven track record as an SRE or in a similar role within Data Engineering, particularly in managing Spark platforms.
  • Technical Expertise: Proficiency in PySpark and related big data technologies (e.g., Spark, Hadoop, Hive). Strong understanding of data pipelines built using PySpark. Debugging expertise for Spark issues at both platform and application levels. Experience with data processing orchestration and scheduling tools. Good knowledge of the Spark ecosystem and distributed computing principles. Strong Linux, networking, CPU, memory, and storage fundamentals.
  • Soft Skills: Excellent problem-solving skills with a keen eye for detail. Strong communication and collaboration abilities. Ability to work efficiently under pressure in a fast-paced environment.


What you need to do now


If you're interested in this role, click 'apply now' to forward an up-to-date copy of your CV, or call us now.

If this job isn't quite right for you, but you are looking for a new position, please contact us for a confidential discussion on your career.



LHS 297508

Summary

Job Type: Permanent
Industry: Banking & Financial Services
Location: NSW - Sydney CBD
Specialism: Technology
Ref: 2924981

Talk to a consultant

Talk to Marcus Castle, the specialist consultant managing this position, located in Sydney City
Level 13, Chifley Tower, 2 Chifley Square

Telephone: 0292492285
