Job Description:
A Lead Data Engineer in the Center for Data Intensive Science leads data engineering activities for one or more programming projects under the direction of the Lead Software Architect. Responsibilities include requirements, design, implementation, deployment/delivery, and support for data services.
- Leads data services activities and manages dependencies in collaboration with other functional areas, including systems, bioinformatics, quality assurance, and security. Partners with project management and project leads.
- Leads team efforts related to software development for data services and oversees the work of other technical team members.
- Provides technical oversight and develops standards, guidelines, and processes for relevant applications. Reviews the design and code development of key architectural components.
- Escalates key decisions to senior management and stakeholders and consistently communicates pertinent information as needed, using good judgment on what matters to escalate.
- Serves on an architecture council, contributing to the technical leadership of the Center. Contributes to broader decisions on project and infrastructure needs, including the evaluation of server technologies, languages, platforms, and frameworks.
- Develops technical diagrams, project plans, timelines, and resource allocations. Works within the framework of the project management methodology used for the project, generally agile.
- Mentors other engineers on the team working in the relevant area(s).
- Works with cloud computing infrastructure, primarily based on OpenStack, to design, develop, maintain, and evaluate software applications that meet business and technical requirements.
- Works in Linux-based systems in Python and C/C++, with some Java, Ruby, and various web programming.
- Oversees code testing where appropriate and ensures standards are met. Uses analytics and metrics to drive system optimization and operational efficiencies.
- Works with users, collaborators, and technical staff to resolve problems and respond to feedback regarding potential improvements and enhancements. Ensures appropriate documentation.
- Serves as a liaison with internal and external collaborators on one or more research projects. Technical scope may include the full stack, from systems to algorithms to user interfaces. Research projects may span management, sharing, and provenance of large data sets; resource allocation and scheduling for cloud computing; large-scale pipelining of next-generation sequence analysis; transfer programs/protocols for high-speed networks; and resource visualization.
- Performs other duties as assigned.
This at-will position is wholly or partially funded by extramural funds (e.g., grant, gift, endowment), which are renewed under provisions set by the grantor. Employment will be contingent upon the continued receipt of these extramural funds and satisfactory job performance.
Education:
- Master's degree in computer science, mathematics, statistics, engineering, or a related quantitative field required; OR a Bachelor's degree and two (2) years of relevant experience required; OR six (6) years of relevant experience in lieu of a degree.
- PhD in mathematics, computer science, engineering, or a related field preferred.
- Minimum five (5) years of relevant programming experience required.
- Strong experience and proficiency in Python, C/C++, Java, or Ruby required.
- Experience with the full software development life cycle required.
- Experience leading and coordinating software development projects required.
- Experience with data engineering required.
- Unix/Linux experience required.
- Version control experience required.
- Experience building software in a microservice environment preferred.
- Advanced experience with relational databases preferred.
- Experience with DevOps methodologies preferred.
- High-performance and cloud computing experience preferred.
- Unix/Linux programming or system administration experience preferred.
- UX/UI experience preferred.
- Experience with genomics preferred.
- Experience creating development specifications, use cases, and other development related documentation preferred.
- Project management experience preferred.
- Ability to prioritize and manage workload to meet critical project milestones and deadlines required.
- Ability and willingness to acquire new programming languages, statistical and computational methods, and background in the relevant research area required.
- Ability to lead a collaborative team environment required.
- Ability to communicate technical concepts to non-technical staff required.
- Knowledge of software development best practices required.
- Discretion regarding sensitive matters such as strategic initiatives, trade secrets, quiet periods, and scientific discoveries not yet in the public domain required.
Requirements:
- M.Sc. or higher degree in Computer Science
- M.Sc. or higher degree in Statistics
- Master's degree in Engineering
Skills :
- Cloud Computing
- Genomics
- HPC
- Linux/Unix
- Programming in C
- Programming in C++
- Programming in Python
- Programming Skills
- Programming in Java
- Software Design
- Version Control
Additional Info:
About the Unit
The Center for Data Intensive Science at the University of Chicago is a research center pioneering translational data science to advance biology, medicine, and environmental research. Data-driven research approaches require interdisciplinary innovation in computing technology, algorithms, and statistical models, and the growing volume of available data necessitates advances in the sophistication of these methods. Our work centers on developing instruments to integrate commons of complex data with cloud computing technology. We architect large-scale commons of research data, computing resources, applications, tools, and services. Our guiding principles center on open data, open-source software, and open infrastructure. Through this approach, we can more effectively use data at scale to study and pursue scientific inquiry in the areas of biology, medicine, healthcare, and the environment. We are leaders in data sharing, democratizing access for the broader research community and accelerating discovery. Our leadership emerged with the launch of the first open-source, cloud-based computational research platform recognized as an NIH Trusted Partner, meeting rigorous data quality and data management service requirements. Today we offer over seven petabytes of rich research data through the following data commons platforms:
- NCI Genomic Data Commons
- Bionimbus Protected Data Cloud
- Blood Profiling Atlas for Cancer
- OCC Environmental Data Commons
- Open Science Data Cloud
We are based in Chicago, but our work engages collaborators from across the world.