DevOps Engineer

Job ID: 103450
Job date: 2017-08-29
End Date:

Company: University of Chicago

Country:

Role: Technician


[Click Here to Access the Original Job Post]

Job Description:
We're looking for a technical problem solver to join our bioinformatics team and build out and refine the automation methods for our large-scale, data-intensive systems and computational pipelines. You will join a team of bioinformaticians building data analysis pipelines and running them at scale, and you will work across multiple functional areas to support the automation and operation of these activities. You will have the opportunity to work with a range of automation tools across the stack and use the latest technologies. You will join a team of innovative engineers and scientists who will keep you challenged in our demanding environment as we support cancer researchers across the world.

This role focuses on the Genomic Data Commons, which by its nature lies at the intersection of cutting-edge research and production systems, both in terms of the bioinformatics and the computer science principles being utilized. The Genomic Data Commons is currently the world's largest collection of harmonized cancer genomics data. Developing a deep technical and quantitative understanding of the system, software, and security architecture will be critical to success in this role.

You will focus on system availability, performance, resource management, and capacity monitoring, along with installation, configuration, and operations procedures. You will be given broadly defined goals and expected to work collaboratively across functional teams to determine best methods and technologies for achieving objectives. Your work will be informed by quantitative models for understanding and improving the overall performance of the system and by working with engineers to understand pain points. You will build out proof of concept environments and report on design outcomes to inform rapid technology advancement.

Key responsibilities include:

  • Automation Frameworks - Build out and maintain automation frameworks across the systems, software, data management, and security aspects of a complex platform spanning on-premises and public cloud environments, using a mix of best practices and custom solutions
  • Production Support - Triage, research, communicate, and address production incidents
  • Production Monitoring - Wrangle disparate system monitoring assets and develop common analytics to inform optimization, define benchmarks and confidence intervals, and forecast to proactively mitigate production incidents
  • Build Monitoring - Troubleshoot source code management and deployment issues and participate in continuous delivery objectives
  • Security Automation - Assist with the automation of our security and compliance procedures.
  • Stay abreast of existing and emerging technologies, including public cloud offerings from Amazon Web Services, Microsoft Azure, and Google Cloud.
  • Other duties as assigned.
This at-will position is wholly or partially funded by extramural funds (e.g., grant, gift, endowment) which are renewed under provisions set by the grantor. Employment will be contingent upon the continued receipt of these extramural funds and satisfactory job performance.

Education:

  • Bachelor's degree in computer science, mathematics, statistics, or engineering required.
  • Advanced degree in computer science, mathematics, statistics, engineering, or a relevant quantitative field preferred.
Experience:

  • Minimum two (2) years' experience developing infrastructure, configuration, and/or deployment automation required.
  • Hands-on scripting experience (Bash, Python, or other dynamic language) required.
  • Unix/Linux programming or system administration experience required.
  • Experience with AWS (EC2/S3/Glacier) preferred.
  • Internal cloud (OpenStack) experience preferred.
  • Experience with configuration management utilities (Chef, Puppet, Ansible) preferred.
  • Experience with F5 or other load balancing technologies preferred.
  • Experience with source control and build systems (SVN, Git, Jenkins, etc.) preferred.
  • Experience with virtualization and container technologies (Hyper-V, Docker) preferred.
  • Experience with log aggregation tools (ELK stack, Splunk) preferred.
  • Experience with security frameworks (FISMA, NIST, FIPS) preferred.
  • Experience working in an agile environment preferred.
Competencies:

  • Ability to lead across a collaborative team environment required.
  • Ability and willingness to acquire new programming languages, statistical and computational methods, and background in research area required.
  • Ability to prioritize and manage workload to meet critical project milestones and deadlines required.
  • Confidentiality related to sensitive matters such as strategic initiatives, trade secrets, quiet periods, and scientific discoveries yet to be put in the public domain required.
  • Ability to take a broad plan and break it into incremental tasks and oversee the completion of each task required.
  • Ability to join a team accustomed to minimal supervision and oversight and ensure accountability for deliverables and outcomes required.
  • Ability to persuade others to adopt new structures or systems in order to meet objectives required.


Requirements:

Skills:


Additional Info:
The Center for Data Intensive Science at the University of Chicago is a research center pioneering translational data science to advance biology, medicine, and environmental research.

Data-driven research approaches require interdisciplinary innovation in computing technology, algorithms, and statistical models, and the growing volume of data available necessitates advances in the sophistication of these methods. Our work centers on developing instruments to integrate commons of complex data with cloud computing technology. We architect large-scale commons of research data, computing resources, applications, tools, and services. Our guiding principles center on open data, open-source software, and open infrastructure. Through this approach, we can more effectively use data at scale to study and pursue scientific inquiry in the areas of biology, medicine, healthcare, and the environment.

We are leaders in data sharing, democratizing access for the broader research community and accelerating discovery. Our leadership emerged with the launch of the first open-source cloud-based computational research platform recognized as an NIH Trusted Partner, achieving rigorous data quality and data management service requirements. Today we offer over seven petabytes of rich research data through the following data commons platforms:

- NCI Genomic Data Commons

- Bionimbus Protected Data Cloud

- Blood Profiling Atlas for Cancer

- OCC Environmental Data Commons

- Open Science Data Cloud

We are based in Chicago, but our work engages collaborators from across the world.
