Hadoop, Hive, Teradata, Spark Streaming, Kafka, NiFi, Python, QlikView
Technical Data Analyst
╋ Big Data Lake migration to a Big Data cloud solution: statistical analysis of the relevant sources and datasets; worked with Data Engineers to prioritise the components needed for real-time streaming flows (NiFi, Kafka brokers and Spark Streaming).
╋ Python web-scraping POC extracting location-based events in order to anticipate congestion on the network.
╋ Automation of the data pipeline: extraction from internet sources, curation of raw files and ingestion into HDFS. Identification of valuable datasets in the Data Lake to build network speed and reliability metrics.
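The extract-curate-ingest pipeline above can be sketched roughly as follows. The source URL, the CSV column names and the HDFS target directory are all hypothetical placeholders, since the actual feeds are not named here; the HDFS step assumes the standard `hdfs dfs -put` CLI is available on the path.

```python
import csv
import io
import subprocess
from urllib.request import urlopen

# Hypothetical source URL -- the real event feeds are not named in the text.
SOURCE_URL = "https://example.com/events.csv"


def extract(url: str) -> str:
    """Download raw CSV content from an internet source."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def curate(raw_csv: str) -> list:
    """Curation step: drop malformed rows, keep only events with a location.

    Assumes hypothetical 'location' and 'start_time' columns.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        if row.get("location") and row.get("start_time"):
            rows.append(row)
    return rows


def ingest_to_hdfs(local_path: str, hdfs_dir: str) -> None:
    """Ingestion step: push the curated file into HDFS via the standard CLI."""
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir],
                   check=True)
```

In practice a scheduler (cron, NiFi or similar) would chain these three steps and write the curated file to a landing directory before the HDFS put.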
We worked in Telstra's Operations & Security Department with the goal of improving the network infrastructure (mobile and fixed). Telstra collects hundreds of terabytes of data per day about its infrastructure, from both internal systems and external data sources. The objectives were to enrich the Data Lake to support data insight analysis, monitor the customer network experience and, through pattern detection, proactively anticipate the main outages that can occur on the infrastructure.
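The outage anticipation through pattern detection mentioned above can be illustrated with a minimal sketch: a rolling-window check that flags metric samples deviating sharply from recent history. The window size and threshold are illustrative assumptions, not values from the project.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(samples, window=5, threshold=3.0):
    """Flag indices whose value deviates strongly from the recent window.

    A sample is anomalous when it sits more than `threshold` standard
    deviations away from the mean of the previous `window` samples.
    """
    history = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(samples):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                flagged.append(i)
        history.append(value)
    return flagged


# A latency spike stands out against an otherwise stable series:
# detect_anomalies([10, 11, 10, 12, 11, 10, 50, 11]) -> [6]
```

A production detector over hundreds of terabytes per day would run as a streaming job (e.g. on Spark Streaming) rather than in-memory Python, but the windowed-deviation idea is the same.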
╋ 20-person project team: data engineers, data scientists and business stakeholders
╋ Identified the valuable and critical datasets that can be cross-referenced to improve the analysis models of Telstra network performance and create value for infrastructure analysis
╋ External data: identified and ingested new relevant data sources to enrich the existing data models (for example, a machine-learning solution for outage prediction)
╋ Data dictionary, data modeling and technical business analysis of the main data sources in the Big Data Lake.