Site Reliability Engineer II - 9901

Product Support Engineer II, Tempe, Arizona or Austin, Texas


Description

The Company is a growing bank in an Innovation economy. As a member of the Production Engineering Team, you will be responsible for supporting mission-critical applications and building end-to-end observability. As a Site Reliability Engineer II, you will collaborate with other Site Reliability Engineers across the Risk Technology vertical of the Corporate Systems & Data Organization and provide 24x7 application support, enabling our clients to have access to highly available, resilient and performant applications.

Would you like to use your Site Reliability Engineering skills, and do you have a passion for building the instrumentation needed to identify issues before clients find them in production? Are you familiar with best practices for application, compute and services performance monitoring? Do you want to play a key role in improving client experience through “always available” systems architectures?

If you fit the above description, you might be the person we are looking for! We are a group of smart people who are passionate about modern tools and technologies, and we believe that best-in-class site reliability engineering is critical to the success of The Company and its customers.

 

Responsibilities:

  • Help build end-to-end Observability for Risk Technology Applications
  • Create and manage Alarms and Dashboards using AppDynamics and Splunk
  • When Alerts are triggered, troubleshoot the application and help restore service
  • Contribute to technical documentation such as Runbooks and Standard Operating Procedures (SOPs) for use by engineers and other team members
  • Participate in blameless root-cause analyses for all incidents, learn from mistakes and develop actionable monitoring alerts
  • Solve problems related to the operation of mission-critical services and build automation to proactively detect issues
  • Participate in incident calls with a strong sense of urgency to triage and restore service
  • Participate in release calls and help in the deployment, validation and troubleshooting of any unforeseen issues
  • Take part in the on-call rotation for application support

 

Technical Skills: 

  • Bachelor’s Degree in Computer Science, Engineering or a related technical discipline recommended
  • Minimum of 3-5 years of hands-on experience in a technical role developing or supporting applications for large corporations
  • Extensive experience building Telemetry using tools like AppDynamics & Splunk for both proactive and reactive monitoring
  • Demonstrable skills in scripting languages (e.g., Bash, PowerShell) and in programming languages, preferably JavaScript or Python
  • Experience with Linux (RHEL/CentOS) System Administration, including Microsoft Active Directory and LDAP integration
  • Experience in eliminating toil by automating mundane tasks
  • Experience with DevOps tools such as Jenkins, Maven, GitLab, SonarQube for on-premise applications or SaaS vendor products
  • Experience in supporting OFSAA KYC, Fircosoft, SAS AML and Enterprise Fraud Systems is a plus
  • A team player capable of high performance and flexibility in a dynamic working environment
  • Effective oral and written communication skills as well as positive, client-focused interpersonal skills and attitude  
  • Experience in Incident & Problem Management processes with good exposure to troubleshooting
  • Knowledge of Web development, JEE & Enterprise Technologies (JMS, JDBC) preferred
  • Hands-on experience with RDBMS architecture and performance tuning of databases such as Oracle and SQL Server
  • Strong organizational, Incident Management and Problem Management skills

 

 
