Western Digital’s High Performance Computing and Cloud Computing environments are key to bringing new storage solutions to market. As a Sr. Sys Admin in the High Performance Computing (HPC) and Cloud Computing Infrastructure team, you will be at the heart of Western Digital’s engineering and product development process, delivering the HPC and Cloud Computing infrastructure that empowers engineering teams to develop new storage technologies and deliver high-quality products to market quickly.

The sheer diversity of Western Digital’s products (solid state solutions and hard disk drives for consumer and data center markets, S3-compatible data center archival systems and more) requires that a variety of development applications and HPC computing solutions be available to engineering teams worldwide. Western Digital’s use of Cloud Computing is a key capability for delivering HPC, Big Data computing and rapid, at-scale deployments of optimized infrastructure solutions worldwide. This position will ensure Western Digital’s success by partnering with the worldwide engineering teams to deliver the right scalable solutions and computing infrastructure for physics-based modeling, CFD, CAE, FEA, and EDA applications. This large catalogue of engineering applications represents the many engineering and development disciplines here at Western Digital.

Our computing solutions are provided from a true hybrid enterprise IT environment, scaling from on-premise clusters to large clusters in colocation data centers to hyperscale computing solutions (a.k.a. the “Cloud”). We manage computing with both GPU- and CPU-based clusters and extend to thousands of processor cores to meet the many demands of our engineering teams.

Utilizing Cloud Computing, Western Digital has unique opportunities for rapid, dynamically scalable computing infrastructure to meet needs ranging from Big Data analytics to supporting the Western Digital MyCloud environment.
Proper use of Cloud Computing allows teams to dramatically change and improve workflows while simultaneously decreasing IT TCO for Western Digital.

Position Responsibilities:
- Be part of the Western Digital IT team that deploys and supports world-class High Performance Computing (HPC) solutions (including applications as needed) for the diverse, worldwide team of Western Digital development engineers, with a strong emphasis on Linux/Unix computing solutions. The HPC solutions are deployed on-premise, in colocation centers and in the Cloud, and include both CPU- and GPU-based computing.
- Be an SME/Lead for UNIX/Linux operations worldwide.
- Be an SME/Lead for engineering application deployment support for Western Digital’s development teams worldwide.
- As part of IT, bring best-in-class expertise in deploying scalable solutions that align with corporate best practices, economies of scale, security and governance.
- Be an advocate for Open Source HPC and Linux solutions deployment where appropriate and be knowledgeable of relevant Open Source HPC and engineering simulation solutions.
- Maintain a collaborative, team-based cross-functional management approach that fosters engagement, consensus and collaboration.
- Work with IT Leadership to support solution development, culture creation and operational efficiencies in alignment with long-term planning goals.

Qualifications:
- BS/BA preferred, or equivalent experience.
- 10+ years of relevant industry experience.
- Experience with the installation, management and use of software such as compilers, scientific applications, batch schedulers, job resource managers, and application license and utilization managers.
- Experience with the installation, configuration, management, and use of high performance computers such as Linux clusters, large SMPs and large-scale solutions with thousands of computing cores.
- 3+ years of direct experience with Amazon Web Services and/or Microsoft Azure environments.
- 10+ years of relevant experience in enterprise-level Linux/UNIX systems administration, including provisioning, configuration, troubleshooting, and monitoring (Nagios, Zenoss, SNMP, Ansible).
- Excellent understanding of current methodologies in high performance operations and large-scale implementations.
- Must possess strong documentation skills and be able to work amid rapid change at a fast pace.
- Proven ability to influence and/or lead high-performing, geographically dispersed teams.
- Excellent analytical, problem solving, and troubleshooting skills to manage complex process and technology issues.