Data Engineer – Texas A&M Genomics & Bioinformatics Service

Job ID: 19107
Job date: 2017-01-19


[Click Here to Access the Original Job Post]

Job Description:
We are seeking an experienced developer to join our server team in building scalable data systems for genomics. A significant challenge will be working on our high-volume genetic variant store, which involves transport, transformation, and query algorithms, federated query, and scaling up into the big data range. Additional challenges will involve integrating large content sources into the Syapse product. You should have hands-on experience with NoSQL and map-reduce style data systems, and be comfortable taking on all aspects of handling data sets of a terabyte and larger. With or without a bioinformatics background, you should be excited to provide a robust, scalable data platform that will enable the next generation of clinical genomics practice. The position can fit a range of experience levels and degrees.

Key Responsibilities:

-Participate in design and development of Syapse’s genomic data stores.

-Contribute to physical architecture and deployment of these systems in AWS.

-Take ownership of challenges ranging from large-scale data transport and querying, through security and availability, to testing and performance tuning.

-Collaborate on integration projects that join the genomics stores with RDF data via a federated query facility.

-Use data mining techniques to analyze large-scale genomics data.

Requirements:

-Degree in computer science or related field required.

-Five or more years of experience building large-scale applications in a distributed environment.

-Experience building systems around big data components, e.g. Bigtable, Hadoop, or data warehousing.

-Hands-on experience working in a cloud data center environment is a must; AWS a strong plus.

-Comfortable designing for parameters such as scalability, latency, throughput, reliability, availability, consistency, and security.

-Knowledge of a variety of data systems, spanning at least SQL, NoSQL, and map-reduce.

-Knowledge of distributed systems, and experience tuning query or algorithm performance in a distributed setting.

-Fluency in Python, Java, SQL, and Unix.

-Familiarity with real-world data mining techniques a must; hands-on implementation experience a strong plus.

-Ability to work independently, collaborate closely, and bring other engineers up to speed on your work rapidly and thoroughly.

Education Requirements:

Technical Bachelor


Areas:

Computational Biology

Additional Info:
IMPORTANT: Please remember to attach a Word document resume when applying!
