
Green Grid Solutions: The Smart Pricing Case Study

About

Green Grid Solutions launched a pilot program to validate a dynamic electricity pricing model that charges higher rates during peak usage hours to drive energy conservation. To support this business goal, the solution leveraged Databricks to process smart-meter data at scale, ensure accurate billing, and enforce customer data privacy. Raw meter readings and reference files were ingested via Databricks Volumes and transformed using PySpark to produce compliant, analytics-ready billing data.

Challenge

  • Data Quality Issues: Smart-meter readings included invalid values and spikes, risking inaccurate customer billing


  • Complex Pricing Logic: Peak and non-peak rate calculations based on timestamps were error-prone when handled manually


  • Data Privacy Risk: Presence of customer email addresses in source data posed compliance and regulatory concerns


  • Scalability Limitations: Lack of a centralized processing approach made it difficult to apply consistent rules as data volumes increased

Solution

  • Multi-Format Data Processing: Processed raw CSV and JSON files within a Databricks PySpark notebook using Volumes as the storage layer


  • Data Quality Validation: Implemented data quality rules to filter out invalid meter readings and isolate anomalous records for exclusion from billing


  • Peak Pricing Logic: Applied time-based business logic in PySpark to accurately calculate peak and non-peak electricity charges


  • Data Privacy Enforcement: Enforced data privacy by hashing customer email addresses using SHA-256 before persisting analytical results


  • Analytics-Ready Storage: Stored the cleansed, secured, and enriched output as Delta tables for reliable querying and analysis

Business Impact

  • Pricing Model Validation: Enabled the Head of Product to confidently validate the dynamic pricing model by accurately differentiating peak and non-peak energy usage


  • Customer Trust Protection: Prevented customer overbilling and trust erosion by filtering phantom spikes and invalid meter readings from billing calculations


  • Regulatory Compliance: Met Chief Legal Officer requirements by ensuring no raw customer email addresses were exposed in analytics outputs


  • Risk Mitigation: Eliminated the risk of regulatory violations and pilot shutdown through enforced data anonymization at the Silver layer


  • Strategic Decision Support: Delivered reliable, compliant insights that supported a clear go/no-go decision on scaling the smart pricing program beyond the pilot

This code shows schema-enforced ingestion of smart-meter JSON data using PySpark, ensuring correct data types and preventing schema drift.
The data is read directly from Databricks Volumes with multiline support for reliable parsing.
Early schema validation establishes a consistent foundation for downstream quality checks and billing logic.
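
A minimal sketch of that ingestion step, assuming a hypothetical Volume path and illustrative field names (meter_id, customer_email, timestamp, kwh):

from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

# Explicit schema: enforces correct types and prevents schema drift
meter_schema = StructType([
    StructField("meter_id", StringType(), True),
    StructField("customer_email", StringType(), True),
    StructField("timestamp", TimestampType(), True),
    StructField("kwh", DoubleType(), True),
])

# Read multiline JSON directly from a Databricks Volume (path is illustrative)
raw_readings = (
    spark.read
        .schema(meter_schema)
        .option("multiLine", "true")
        .json("/Volumes/green_grid/raw/meter_readings/")
)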

This code applies a PySpark filter to identify invalid meter readings based on consumption thresholds.
Records failing the quality rules are isolated into a quarantine dataset to prevent downstream billing errors.
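
A sketch of that validation split, reusing the raw_readings DataFrame from the ingestion sketch above; the 0-100 kWh thresholds follow the quality rule described next:

from pyspark.sql.functions import col

# Quality rule: consumption must be positive and below the spike threshold
valid_condition = (col("kwh") > 0) & (col("kwh") < 100)

clean_readings = raw_readings.filter(valid_condition)
# Anomalous records (zero/negative values, phantom spikes, missing kWh) are isolated
quarantine_readings = raw_readings.filter(~valid_condition | col("kwh").isNull())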

This code enforces the quality rule by filtering meter readings to retain only records with kWh values greater than 0 and less than 100.
Invalid records are diverted to the Quarantine_Readings table, ensuring only clean data is used for further processing.
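
A sketch of the quarantine step, persisting the invalid records from the split above; the table name follows the caption, while the write mode is an assumption:

# Divert invalid readings so they remain auditable but never reach billing
(quarantine_readings.write
    .format("delta")
    .mode("append")  # append is an assumption; the pilot may overwrite instead
    .saveAsTable("Quarantine_Readings"))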

Here we enforce data privacy by hashing customer email addresses using SHA-256. The raw email column is removed, ensuring no PII is stored in the analytics layer.
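
A sketch of the hashing step, assuming the clean_readings DataFrame and customer_email column from the sketches above:

from pyspark.sql.functions import sha2, col

# Replace the raw email with an irreversible SHA-256 digest, then drop the PII column
secured_readings = (
    clean_readings
        .withColumn("customer_email_hash", sha2(col("customer_email"), 256))
        .drop("customer_email")
)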

This snippet enforces the peak-hour pricing rule by doubling the base rate for readings between 6 PM and 9 PM.
The final bill amount is then calculated as kWh multiplied by the applicable rate, producing accurate time-based billing results.
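
A sketch of the pricing rule; the base rate value is illustrative, not a real tariff, and the hour window 18-20 covers readings timestamped between 6 PM and 9 PM:

from pyspark.sql.functions import hour, when, col, lit

BASE_RATE = 0.15  # illustrative base rate per kWh (assumption)

billed = (
    secured_readings
        # Readings in the 6 PM-9 PM peak window are billed at double the base rate
        .withColumn(
            "rate",
            when(hour(col("timestamp")).between(18, 20), lit(BASE_RATE * 2))
            .otherwise(lit(BASE_RATE)),
        )
        # Final bill: consumption multiplied by the applicable time-based rate
        .withColumn("bill_amount", col("kwh") * col("rate"))
)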

This code writes the final, validated billing dataset to the Silver_SmartBill Delta table.
The overwrite operation ensures the analytics layer always contains the latest clean, compliant, and fully calculated billing data.
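
A sketch of that final write, using the table name stated above:

# Overwrite so the Silver layer always holds the latest clean, compliant billing data
(billed.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("Silver_SmartBill"))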
