Gather user requirements and construct UML diagrams for requirements modelling
Gather and process raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, etc.)
Design, develop & document data pipeline and analysis programs using Hadoop and related
ecosystem tools such as Pig, Hive, Spark and Storm
Design, develop & document predictive models utilizing tools available in the Hadoop cluster ecosystem
Design, develop & document data ingestion, data preparation, data cleansing and data
standardization rules to prepare datasets for analysis, and ensure the processes are executed in
an optimized and timely manner
Design, develop & document methods to transform unstructured datasets such as text, audio and
video into structured attributes
Design, develop & document data processing workflows and governance rules using Python and Oozie
Design, develop & document RESTful Web APIs and Web Applications for productization of data
pipelines and processing workflows using Python and AngularJS
Conduct requirements gathering to understand customer needs and the as-is data ecosystem
Work with subject matter experts to translate domain knowledge into data processing pipelines
and data products
Design, develop & document data products such as web-based visualization dashboards
Design, deploy, manage & document data processing infrastructure both on-site and on-cloud
Design and develop automated unit test scripts for developed software
3+ years' experience in software development projects or ETL / data warehousing projects
Experience in the system development lifecycle, either professionally or as a hobby
Sufficient programming knowledge to clean and scrub noisy datasets
Self-driven and able to take own initiative to learn and explore
Capable of picking up new technologies and practices in a rapid manner
Solid foundation in mathematical and algorithmic thinking
Strong background and experience in statistics is a plus
Background in UML modeling is a plus
Coaching and self-paced training materials will be provided.
Join a high-energy team, which includes several Open Source contributors working towards
transforming the local IT industry through Open Source technologies
Opportunity to work with multiple high-demand Open Source technologies and their ecosystems
in the Big Data market