Data Scientist and Machine Learning Engineer – Texas A&M Genomics & Bioinformatics Service

Job ID: 417558012
Job date: 2017-08-11

Job Description:
We're looking for engineers and data scientists who are passionate about subjects like information retrieval, distributed computing, artificial intelligence and natural language processing.

Job Details:

Building cutting-edge Artificial Intelligence and Machine Learning applications.
Developing Recommendation Engines; Web and Text Data Mining using Natural Language Understanding models; applying Network and Graph Analysis algorithms and tools; training Deep Neural Networks.
End-to-end design and implementation of data analytics systems, including data gathering, requirements engineering and specification, and conceptualization of technical solutions based on business needs.
Working closely with other Data Scientists and Data Engineers to identify opportunities for designing and implementing internet-scale Data Mining solutions.
Development of ETL pipelines for large, complex datasets; processing of structured and unstructured data using Spark, Hive, Kafka, Flume, Oozie, etc.
Prototyping and implementation of massively scaled Data Analytics solutions using big data tools (Hadoop, Spark, Hive/Impala, SQL, H2O, Python and R).
Working with Cloud Platforms (AWS, MS Azure and Google Compute Engine).

Basic Qualifications:

MS degree in Computer Science or a relevant quantitative discipline such as statistics, operations research, bioinformatics, mathematics or physics.
1 year of work or educational experience in Machine Learning and Artificial Intelligence.
1 year of relevant experience in data analysis fields (statistics/data science).
Experience with one or more general-purpose programming languages, including but not limited to Java, C/C++, Python, Scala or R.
Fluent spoken and written German and/or English.

Preferred Qualifications:

MS in Computer Science, Artificial Intelligence, Machine Learning or related technical fields.
Experience with one or more of the following: Natural Language Processing and Understanding, Classification, Pattern Recognition and Recommendation Systems.
Experience dealing with large amounts of data, e.g., social network data, scientific data, sensor data, etc.
Applied machine learning experience on large datasets.
Proven programming experience in at least one object-oriented language, such as Java, Scala or C++.

Experience/knowledge in some of the following technology areas is an advantage:

Hadoop, HDFS, Hive, HBase, Cassandra, Kafka
YARN, MapReduce, Spark
Cloud Platforms (AWS, MS Azure and Google Compute Engine)
Spark MLlib, H2O, Python ML and Data Science libraries (NumPy, SciPy, Pandas, IPython, scikit-learn, Theano, TensorFlow, NLTK)

Requirements:

M.Sc. or higher degree in Computer Science

Areas:

Bioinformatics

Additional Info:
We are a growing Machine Intelligence (Machine Learning and Artificial Intelligence) startup, founded in 2015, with 20+ Engineers and Data Scientists.

At Qimia, we develop next-generation technologies that change how humans use, interact with, explore and gain insight from data. We are an active part of the AI revolution currently taking place in tech, working with Big Data and Machine Learning technologies to develop the intelligent systems of the future.

We develop Data Analysis and Machine Learning solutions in-house and advise our clients on their projects. At Qimia, we work with major clients from different industries, introducing analytics and AI solutions into their businesses. Most of our clients are German and international blue-chip companies located in Germany.

Your prospects at Qimia:

A stimulating and challenging work atmosphere awaits you, with flat hierarchies and experienced, helpful colleagues. Here at Qimia, we put a strong focus on comprehensive training and education for our Engineers/Data Scientists.

The topics we cover in our training are:

Hadoop DevOps training: installation and configuration of Hadoop distributions (Cloudera and Hortonworks); development using the most important Hadoop frameworks (MapReduce, Spark, Hive, HBase, Oozie, etc.)
Big Data Science: Python Machine Learning libraries (NumPy, SciPy, Pandas, IPython, scikit-learn, Theano, TensorFlow, NLTK), Spark for Data Mining and Machine Learning (Spark SQL, Spark MLlib, PySpark, GraphFrames and H2O)
Deep Neural Networks: Feed-Forward neural nets, Convolutional neural nets, Recurrent neural nets, development of production-ready TensorFlow and DL4J solutions.
Data Science and Machine Learning essentials: Time Series and Sequential Data processing, Supervised and Unsupervised Machine Learning, Classification, Logistic Regression and Random Forest, Support Vector Machines, K-Nearest Neighbors, Naive Bayes and Gradient Boosting
Web and Text Mining: Natural Language Processing and Information Retrieval, categorization and automatic tag/keyword extraction, Document Classification and Clustering, Entity Recognition, tf-idf, n-grams, word2vec, gensim, etc.

In our training, we work intensively on current and previous Kaggle competitions in the areas of Deep Learning, Predictive Analysis, Recommendation Engines and Natural Language Understanding.