Manager, Site Reliability Engineering - Commerce Platforms

Genuine Parts Company

Full Time

Atlanta, GA 30339

Job description

Company Background:

Genuine Parts Company (“GPC” or the “Company”), founded in 1928 and based in Atlanta, Georgia, is a leading specialty distributor engaged in the distribution of automotive and industrial replacement parts and value-added services. The Company operates a global portfolio of businesses with more than 10,000 locations across the world. GPC has approximately 50,000 global employees. The Company has operations in the United States, Canada, Mexico, Australia, New Zealand, Indonesia, Singapore, France, the U.K., Germany, Poland, the Netherlands, Belgium, Spain and China.

Position Purpose:

We are seeking a highly motivated, experienced Manager, Site Reliability Engineering to join the world's leading distributor of automotive and industrial replacement parts and value-added services, which operates 5,500+ locations and services more than 20,000 locations in the U.S. and Canada. This role will report to the Director, Platform Engineering.

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, distributed, fault-tolerant systems. SRE ensures that GPC's services, both our internally critical systems and our externally visible ones, have reliability and uptime appropriate to users' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.

The SRE team is responsible for driving service uptime and quality in 24x7 environments. The team will enhance observability, troubleshoot applications, support cloud-based transformations, develop tools and automation, and influence the design of the future platform architecture. Additionally, the team will develop support standards for all applications and adhere to those standards to provide the necessary level of production SLOs, SLIs, and SLAs.

This role supports multiple technology areas and requires partnerships with teams from multiple locations, skill sets, and backgrounds. As such, we are seeking a leader with strong communication skills in addition to a solid foundation of technical skills, analytical abilities, and end-to-end troubleshooting techniques.

Responsibilities:

Lead a team of Software/Systems Engineers on projects for users and be directly responsible for uptime.
Own end-to-end availability and performance of key services; track process efficiency and service availability using established Key Performance Indicators (KPIs)
Proactively detect, monitor, and alert on any stability or reliability issues, and manage incidents appropriately within defined SLAs
Lead incident resolution and problem management
Direct and manage escalation and resolution calls with members from various teams
Communicate progress and resolution to appropriate stakeholders and leadership
Conduct post-incident reviews, document findings, and act on learnings
Review incident trends, identify recurring issues, perform root cause analysis, and build automation to prevent problem recurrence. Automate responses to all non-exceptional service conditions
Lead by example, mentor the team and establish credibility through quality technical execution.
Manage and optimize on-call rotations across continents, using a follow-the-sun model.
Recommend application changes to improve application performance, reliability, and cost to operate
Work with Engineering to transition applications from one platform to another
Review existing processes and recommend changes or institute new processes as necessary, including observability, alerting, operations, engineering, and system tuning
Generate high-quality documentation, including platform-to-application architectures and common patterns, runbooks, SOPs, and knowledge base articles
Manage a highly technical employee base and ensure we maintain a high bar for performance and culture

Location:

GPC has two work locations to choose from: the Duluth office or the Atlanta office.
We offer a Flexible Work Policy that permits eligible employees to work remotely.

Desired Qualifications & Experiences:

10+ years of relevant work experience in software engineering & technology
At least 5 years’ experience in an SRE or very similar leadership role
Deep expertise in the mindset, processes, and tools needed to put SRE principles into practice
Cloud services experience with Google Cloud, Azure, or AWS
Experience with high throughput / low latency / highly available microservice based architecture
Proficiency in infrastructure, network, database, operating system, or security troubleshooting and remediation
Architecture-level knowledge of Windows, Linux, and infrastructure systems
Experience with production deployment, monitoring and operational support for enterprise-class applications
Experience working with Continuous Integration/Continuous Deployment (CI/CD) tools
Experience in performance diagnostics, capacity planning, performance architecture design, performance tuning, and performance monitoring
A strong mix of software engineering and operations support skills
Eager to learn new technologies and platform patterns
Strong customer service orientation with a focus on managing and exceeding customer expectations
Degree in Computer Science or Engineering fields, or equivalent experience
