Hadoop, Hive, Teradata, Spark Streaming, Kafka, NiFi, Python, QlikView
Technical Data Analyst
╋ Migration of a big data lake to a cloud-based big data solution: statistical analysis of the relevant sources and datasets in use; worked with data engineers to prioritise the components required for real-time streaming (NiFi, Kafka brokers and Spark Streaming).
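A real-time streaming job of this kind typically aggregates Kafka events in fixed micro-batch windows. Since running Spark requires a cluster, here is a minimal plain-Python illustration of the windowed-aggregation idea; the event shape and cell identifiers are hypothetical:

```python
from collections import Counter
from datetime import datetime, timedelta

def tumbling_window_counts(events, window_seconds=60):
    """Group (timestamp, cell_id) events into fixed windows, counting per cell.

    Mirrors the per-window aggregation a Spark Streaming job would run
    over a Kafka topic of network events, done here in plain Python.
    """
    counts = {}
    for ts, cell_id in events:
        # Align the timestamp to the start of its tumbling window.
        window_start = ts - timedelta(seconds=ts.timestamp() % window_seconds)
        counts.setdefault(window_start, Counter())[cell_id] += 1
    return counts

t0 = datetime(2019, 1, 1, 12, 0, 0)
events = [
    (t0, "cell-A"),
    (t0 + timedelta(seconds=30), "cell-A"),
    (t0 + timedelta(seconds=90), "cell-B"),  # falls in the next window
]
result = tumbling_window_counts(events)
```

In the actual pipeline, Spark Structured Streaming performs the same grouping with its built-in event-time windowing over a Kafka source.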
╋ Python web-scraping POC extracting location-specific events, aimed at anticipating network congestion.
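The core of such a scraper is extracting event entries from fetched HTML. A minimal sketch using only the standard library, with a hardcoded snippet standing in for a fetched page (the markup and class name are hypothetical):

```python
from html.parser import HTMLParser

class EventParser(HTMLParser):
    """Collect the text of <li class="event"> items from a listings page."""
    def __init__(self):
        super().__init__()
        self._in_event = False
        self.events = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "event") in attrs:
            self._in_event = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_event = False

    def handle_data(self, data):
        if self._in_event and data.strip():
            self.events.append(data.strip())

# In the POC the HTML would come from an HTTP fetch of an events site;
# a literal snippet keeps the example self-contained.
html = '<ul><li class="event">Stadium concert, Melbourne</li><li>ad</li></ul>'
parser = EventParser()
parser.feed(html)
```

Scraped events would then be geocoded and joined against cell-site locations to flag likely congestion hotspots.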
╋ Automation of the data pipeline: extraction from Internet sources, retention of raw files, and ingestion into HDFS. Identification of useful datasets in the data lake to establish network speed and reliability metrics.
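The retain-then-ingest step can be sketched as a small Python function; the directory names and HDFS target path are illustrative, and the `hdfs dfs -put` call is kept as a dry run so the sketch stays self-contained:

```python
import subprocess
from pathlib import Path

def run_pipeline(payload: bytes, name: str, raw_dir: Path, hdfs_dir: str,
                 dry_run: bool = True):
    """Retain a fetched payload as a raw local file, then push it to HDFS.

    `hdfs_dir` and the `hdfs dfs -put` invocation are illustrative; the
    real pipeline scheduled this step per source.
    """
    raw_dir.mkdir(parents=True, exist_ok=True)
    raw_path = raw_dir / name
    raw_path.write_bytes(payload)               # retention of the raw file
    cmd = ["hdfs", "dfs", "-put", "-f", str(raw_path), hdfs_dir]
    if not dry_run:
        subprocess.run(cmd, check=True)         # ingestion into HDFS
    return raw_path, cmd

raw_path, cmd = run_pipeline(b"speed,latency\n", "metrics.csv",
                             Path("raw_files"), "/data/raw")
```

Keeping the raw file before ingestion means any downstream parsing change can be replayed from source without re-fetching.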
Worked in Telstra's operations and security department to improve the mobile and fixed network infrastructure. Telstra collects hundreds of terabytes of data per day on its infrastructure, from its own equipment and from some external data sources. The objectives were to enrich the data lake to support data analysis, monitor the customer network experience, and proactively anticipate major infrastructure outages through pattern detection.
╋ Project team of 20: data engineers, data scientists and business stakeholders.
╋ Identified valuable and critical datasets that could be cross-referenced to improve analytical models of Telstra network performance and add value to infrastructure analysis.
╋ External data: identified and ingested relevant new data sources to enrich existing data models (e.g. a machine-learning solution for fault prediction).
╋ Data dictionary, data modelling, and business and technical analysis of the main existing data sources in the big data lake.