
Telecom Customer 360

Case Study

About

This project focuses on building a repeatable, low-code transformation pipeline for TelcoPrime’s flagship "Customer 360" initiative. By leveraging Microsoft Fabric Dataflow Gen2, the solution ingests raw customer data and applies rigorous validation and feature engineering logic to produce a clean, analytics-ready Customer_Silver table.

Challenge

Managing diverse, high-volume telecom data required robust orchestration to resolve critical data quality issues that undermined the "Customer 360" initiative.

  • Fragmented Formats: Unvalidated, large-scale tables had to be reconciled with fragmented CSV files lacking schema enforcement.

  • Address Inconsistencies: Variations in street suffixes (e.g., "Street" vs. "St") caused campaign mail failures.

  • State Fragmentation: Regional data was recorded with a mix of abbreviations, full names, and inconsistent casing.

  • Contaminated Fields: Phone fields were corrupted with non-numeric characters and invalid "bad_data" entries.
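
The issues above can be surfaced with a simple profiling pass. The sketch below uses plain Python over invented sample rows (not TelcoPrime data); the field names are illustrative, and the actual profiling in the project happens inside Dataflow Gen2.

```python
import re

# Hypothetical raw Bronze-layer rows illustrating the quality issues above
# (all values are invented examples for this sketch).
rows = [
    {"state": "California", "phone": "(555) 123-4567", "address": "12 Main Street Apt 4"},
    {"state": "tx",         "phone": "bad_data",       "address": "98 Oak St"},
    {"state": "Calif.",     "phone": "555-0000",       "address": "7 Elm Avenue"},
]

def profile(rows):
    """Count the quality problems described in the Challenge section."""
    issues = {"nonstandard_state": 0, "bad_phone": 0}
    for r in rows:
        # A clean state value is exactly two uppercase letters.
        if not re.fullmatch(r"[A-Z]{2}", r["state"]):
            issues["nonstandard_state"] += 1
        # A clean phone is exactly 10 digits once punctuation is stripped.
        digits = re.sub(r"\D", "", r["phone"])
        if len(digits) != 10:
            issues["bad_phone"] += 1
    return issues

print(profile(rows))  # → {'nonstandard_state': 3, 'bad_phone': 2}
```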

 

Objective

  • Eliminate manual intervention by building an automated, repeatable pipeline in Microsoft Fabric.

  • Automate data standardization of all customer attributes using centralized reference lookup tables within Dataflow Gen2.

  • Ensure reliable data availability for marketing and analytics teams by implementing a quality flagging system.

Solution

A modular Microsoft Fabric Dataflow Gen2 architecture was implemented to automate ingestion and complex transformations across disparate telecom datasets.

  • Built Dataflow Gen2 transformation logic to harmonize raw tables with external CSV reference files.

  • Utilized Dataflow Gen2 lookup tables to resolve geographic inconsistencies and enforce data standards.

  • Applied sanitization rules to strip non-numeric characters and parse complex address strings into structured columns.

  • Engineered a custom quality flag to programmatically distinguish validated records from those requiring investigation.

  • Established a repeatable Dataflow Gen2 pipeline to ensure a reliable foundation for marketing and analytics.
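
The quality-flag idea can be sketched as a single function over a record. This is a plain-Python stand-in for the Dataflow Gen2 custom column; the field names and the exact set of checks are assumptions for illustration.

```python
import re

def quality_flag(record):
    """Return 'validated' when every check passes, else 'needs_review'.

    Sketch of the flagging rule; the production version is a Dataflow Gen2
    custom column, and these field names are illustrative.
    """
    checks = [
        re.fullmatch(r"[A-Z]{2}", record.get("state") or "") is not None,
        len(re.sub(r"\D", "", record.get("phone") or "")) == 10,
        bool((record.get("address_street") or "").strip()),
    ]
    return "validated" if all(checks) else "needs_review"

print(quality_flag({"state": "CA", "phone": "5551234567",
                    "address_street": "Main St"}))   # → validated
print(quality_flag({"state": "Calif.", "phone": "bad_data",
                    "address_street": ""}))          # → needs_review
```

Keeping the flag as a derived column, rather than filtering bad rows out, preserves the failing records for investigation while letting downstream consumers select only validated data.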

Business Impact

Reduced data preparation time and eliminated manual cleansing steps for the engineering team. Enabled standardized, high-integrity data ingestion that eliminated marketing waste from undeliverable mail. Provided a scalable "Silver" layer foundation that significantly improved the accuracy of predictive churn models and downstream business reporting.


Lookup Table for Normalization

The Bronze layer contained fragmented regional data, with the State column using a mix of full names ("California"), inconsistent casing ("tx", "FL"), and abbreviations ("Calif."). Additionally, the Phone field was contaminated with non-numeric strings like "bad_data" and inconsistent formatting.

To resolve this, I implemented a Dataflow Gen2 lookup strategy that programmatically joins the customer table with a validated state_lookup file. This process maps every variation, regardless of casing or length, to a single authoritative two-letter code, so every record resolves to a standard geographic identifier.
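
The lookup logic amounts to joining on a normalized key. In this minimal Python sketch, a dictionary stands in for the state_lookup reference file (the entries shown are invented examples); in Dataflow Gen2 the same effect is achieved with a merge against the lookup table.

```python
# A dictionary stands in for the state_lookup reference file, keyed on a
# normalized (trimmed, casefolded) form of each known variant.
STATE_LOOKUP = {
    "california": "CA", "calif.": "CA", "ca": "CA",
    "texas": "TX", "tx": "TX",
    "florida": "FL", "fl": "FL",
}

def normalize_state(raw):
    """Map any recorded variant to its two-letter code, or None if unknown."""
    return STATE_LOOKUP.get((raw or "").strip().casefold())

print([normalize_state(s) for s in ["California", "tx", "FL", "Calif."]])
# → ['CA', 'TX', 'FL', 'CA']
```

Normalizing the key before the lookup (trim plus casefold) is what lets one reference row absorb "CA", "ca", and " Ca " without enumerating every casing variant.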


Data Transformation Workflow

The transformation phase used the Dataflow Gen2 engine to convert raw customer data into a high-integrity, analytics-ready Silver table through a modular Power Query workflow. I began with geographic normalization, executing a Left Outer Join against a state lookup table to map fragmented inputs (e.g., "Texas", "tx") to an authoritative two-letter standard. To resolve address inconsistencies, I implemented parsing logic that isolates street names and unit numbers, cross-referencing them against a suffix lookup for naming uniformity. Data sanitization was further refined by stripping non-numeric characters from phone fields and engineering validation logic that nullified entries with incorrect lengths. This automated process produced a streamlined Customer_Silver table with sanitized attributes and standardized geographic features.
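
The sanitization and parsing steps can be sketched in plain Python. The phone rule follows the description above exactly (strip non-digits, null out wrong lengths); the address split assumes, purely for illustration, that units are marked "Apt" or "Unit", since the real parsing rules live in Power Query and may differ.

```python
import re

def clean_phone(raw):
    """Strip non-numeric characters; null out values that aren't 10 digits."""
    digits = re.sub(r"\D", "", raw or "")
    return digits if len(digits) == 10 else None

def split_address(raw):
    """Split a raw address string into (street, unit).

    Assumes (for illustration) that units are marked 'Apt' or 'Unit';
    the production parsing rules in Power Query may differ.
    """
    m = re.search(r"\b(Apt|Unit)\b\s*\S+$", raw or "", flags=re.IGNORECASE)
    if m:
        return raw[: m.start()].strip(), m.group(0)
    return (raw or "").strip(), None

print(clean_phone("(555) 123-4567"))          # → 5551234567
print(clean_phone("bad_data"))                # → None
print(split_address("12 Main Street Apt 4"))  # → ('12 Main Street', 'Apt 4')
```

Nulling out invalid phone numbers, rather than keeping a partially cleaned string, is what keeps the Silver column trustworthy: a non-null Phone_clean value is guaranteed to be dialable.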


The Customer_Silver Table

The screenshot below showcases the final Customer_Silver table within the Microsoft Fabric Lakehouse. This view confirms the successful execution of the transformation logic:

  • Geographic Uniformity: Every state entry is now a standardized two-letter code (e.g., "CA").

  • Address Structure: Raw strings are correctly parsed into Address_Street and Address_unit.

  • Sanitized Phone Numbers: The Phone_clean column contains only 10-digit numeric strings, with all incorrect and invalid entries removed.
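
The three properties above are also mechanically checkable. The sketch below runs plain-Python assertions over hypothetical Customer_Silver rows (the values are invented); a similar validation could be run as a notebook cell against the Lakehouse table.

```python
import re

# Hypothetical Customer_Silver rows (invented values) used to sanity-check
# the three properties listed above.
silver = [
    {"State": "CA", "Address_Street": "12 Main Street",
     "Address_unit": "Apt 4", "Phone_clean": "5551234567"},
    {"State": "TX", "Address_Street": "98 Oak Street",
     "Address_unit": None, "Phone_clean": None},  # invalid phone was nulled
]

def check_silver(rows):
    """Assert the Silver-layer invariants described above."""
    for r in rows:
        assert re.fullmatch(r"[A-Z]{2}", r["State"])        # geographic uniformity
        assert r["Address_Street"]                          # parsed street present
        p = r["Phone_clean"]
        assert p is None or re.fullmatch(r"\d{10}", p)      # 10-digit or null
    return True

print(check_silver(silver))  # → True
```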
