Sr. Site Reliability Engineer #9161

Sr. Site Reliability Engineer. Tempe, Austin, or Dallas

Come join a growing bank at the heart of the innovation, technology, green tech, and life sciences space. We continue to expand our global footprint, and our banking technology is at the core of everything we do. As a Senior Site Reliability Engineer, you will be responsible for the performance, reliability, and availability of critical applications for the Company.

Skills and requirements:

Be part of the team that owns the availability, performance, and reliability of customer deployments
Drive adherence to SLAs through monitoring, alerting, and scaling
Deploy, maintain, support, and troubleshoot critical, large-scale customer infrastructure deployments in private and public cloud
Dive deep into issues and outages to establish root causes and communicate them to your business partners
Design and document automated procedures
Partner with the Security team to ensure confidentiality, integrity and availability of customer data and deployments

The ideal candidate will have experience and qualifications for planning and managing operations infrastructure, including:

Experience planning and executing site deployments (AWS, private cloud).
Expertise automating system administration tasks with scripting tools (Python or shell preferred).
Aptitude for analyzing and troubleshooting operating system, networking, configuration, and performance problems.
Fundamental understanding of Internet networking protocols: TCP/IP, TLS, DNS, HTTP, SMTP.
Ability to install, configure and maintain Linux hosts and popular open source applications such as Nginx, Apache HTTPd, Apache Tomcat, Postfix, and MySQL server.
Experience with monitoring and automation tools such as Ansible, Splunk, Zabbix, etc.
Ability to communicate clearly with both technical and non-technical staff.
Familiarity with system hardening and server security best practices.

Qualifications:

A bachelor’s degree is required, preferably in Computer Science, Software Engineering, or other related engineering discipline.
AWS certification with 3+ years of extensive hands-on experience in AWS Cloud Operations, plus experience designing and implementing complex distributed applications and infrastructure.
5+ years of real-world deployment experience in core infrastructure technologies, including compute (Windows), storage (SAN/NAS), networking, databases (Oracle and/or SQL), security, and management.
2+ years of recent hands-on experience deploying cloud solutions such as AWS.
Understanding of performance and availability requirements, and the ability to work with Software Engineering to define deployment, configuration, and monitoring requirements.
Experience maintaining complex systems in a cloud environment
Ability to create meaningful metrics and alerting for service health monitoring
Reducing manual effort through automation with scripting or programming languages
Skilled with configuration management and automation frameworks
Proficiency driving root cause analyses to meaningful improvements
Leading troubleshooting efforts with production/non-production systems.
Participating as part of a 24x7 on call rotation
Experience working in a high-growth environment
Hands-on Kubernetes skills are nice to have
Cybersecurity experience (e.g. Infrastructure, application, system, or compliance) is nice to have
A strong working knowledge of Linux variants is nice to have
