In 2014, Duke Energy established an Analytics Competency Center in the Grid Solutions organization with the charter to apply analytics to grid data in support of business decision-making. The new team soon had some signal successes:
• multiple siloed data sets were integrated and analyzed, providing significant new insights into outages’ effects on customer satisfaction and spotlighting ‘pockets of poor performance’,
• predictive models integrated into storm effect prediction tools avoided tens of millions of dollars of costs during and after severe weather events, and
• classification models identified many cases of energy theft and failing meters, and collected millions of dollars in back- and forward-billing.
"The streaming smart grid data will be a game changer for utility analytics"
These successes showed the value of moving from the well-understood business intelligence [BI] and data mining efforts to predictive modeling. The team began to shift the focus of analytics within Duke Energy from 'what happened?' to 'what will happen?'
All of the efforts listed above dealt with data sets of many millions of records. For these initiatives and many others, we pulled data from conventional relational database systems, such as Customer Information Systems [CIS], Outage Management Systems [OMS], and Enterprise Data Warehouses [EDW], and performed quantitative analysis with well-known statistical tools like SAS, R, and Python.
The Big Data Challenge
Flush with early success, the data science team was excited to wade hip-deep into the smart grid data streams. The new smart grid deployments promised access to more frequent data, which could conceivably yield better results from more robust models, developed and tested faster.
In the theft detection use case, for example, meter read volumes jump from 12 reads per year with old-fashioned meters to roughly 35,000 reads per year on a smart meter. The larger data set allows for more sophisticated modeling and rapid hypothesis testing.
The new streaming usage data presented obvious applied analytics use cases for load disaggregation, energy efficiency marketing, customer segmentation, distribution planning, and a host of others. We foresaw a steady march into the broad sunlit uplands of predictive modeling and, further, into artificial intelligence and machine learning.
But then we hit a snag: namely, for each million customers equipped with 15-minute interval smart meters, 35 billion usage records are generated every year. Duke Energy has 7.5 million retail electric customers. Based on the Smart Meter roll-out strategy, that record count would exceed 1.4 trillion by 2020. That is really, truly Big Data.
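The arithmetic behind those figures is straightforward; a quick back-of-the-envelope sketch, using the customer counts cited above:

```python
# Back-of-the-envelope data volumes for 15-minute interval smart meter reads.

READS_PER_DAY = 24 * 4                 # one read every 15 minutes
READS_PER_YEAR = READS_PER_DAY * 365   # = 35,040, vs. 12/year for monthly-read meters

# Per million customers: about 35 billion usage records every year.
reads_per_million_customers = READS_PER_YEAR * 1_000_000
print(f"{reads_per_million_customers:,}")  # 35,040,000,000

# A full 7.5-million-customer deployment would add over 260 billion rows per year;
# the 1.4 trillion figure is the cumulative count as the roll-out progresses.
full_deployment_per_year = READS_PER_YEAR * 7_500_000
print(f"{full_deployment_per_year:,}")     # 262,800,000,000
```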
The team’s first attempts to become familiar with this data were daunting. Using standard RDBMS tools, initial data mining efforts in the Smart Meter usage tables [containing tens of billions of rows of data] managed to bring the EDW to its knees. [Your humble correspondent sheepishly took the call from Production Support. Now I know what the dog that catches the car feels like.]
And usage data is not the only data stream that flows from Smart Meters: they may also be configured to deliver other useful data such as voltage, volt-ampere reactive [VAR], meter events, and temperature.
It was apparent that handling data streams at the scale smart grid devices produce would require a new data platform and a new toolkit; the old ones simply could not cut it.
A Steep Learning Curve
Duke Energy evaluated several big data platforms and ultimately selected and implemented a distribution of Hadoop. Hadoop is a framework that enables distributed processing of enormous data sets across clusters of processing nodes. Hadoop has evolved into not just a data platform, but an entire ecosystem with tools for node configuration, data management, querying, and modeling.
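Hadoop’s original processing model, MapReduce, is what makes that distribution possible: each record is mapped to a key/value pair, pairs are shuffled by key across nodes, and a reduce step aggregates each group. A toy, single-process sketch of the idea follows; the meter IDs and watt-hour values are invented for illustration, and a real cluster would run the map and reduce phases in parallel on many nodes:

```python
from collections import defaultdict

# Toy single-process illustration of the MapReduce pattern that Hadoop
# distributes across a cluster. Data is invented for the example.
records = [
    ("meter_001", 300), ("meter_002", 200),   # (meter_id, watt-hours)
    ("meter_001", 350), ("meter_002", 225),
]

# Map phase: emit (key, value) pairs -- here an identity map over the records.
mapped = [(meter_id, wh) for meter_id, wh in records]

# Shuffle: group all values by key, as the framework does between phases.
grouped = defaultdict(list)
for key, value in mapped:
    grouped[key].append(value)

# Reduce phase: aggregate each group -- total usage per meter.
totals = {key: sum(values) for key, values in grouped.items()}
print(totals)  # {'meter_001': 650, 'meter_002': 425}
```

The same map/shuffle/reduce shape underlies the higher-level tools in the ecosystem — a Hive query or a Spark aggregation compiles down to this kind of keyed, distributed aggregation.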
Our IT partners had to staff up to configure and support the new platform and admin tools. And our data scientists had to contend with learning the new ecosystem, engaging on a voyage of discovery with an entirely new portfolio of oddly named and spelled tools like Sqoop, Hive, Pig, HBase, Kafka, Chukwa, Tez, ORC, Parquet, Mahout, Oozie, Ambari, Scala, Spark, Flume, NiFi, and so on. These tools are often superseded at a rapid clip by other emerging tools entering the ecosystem; keeping up requires constant effort.
Soon we were organizing training, seeking new big data analytics recruits, and developing new onboarding material, all in addition to delivering on our existing commitments for analytics work. A new job family, Data Engineer, was even created to handle the data collection, movement, and curation needed for analytics work.
We learned that the old ways of data mining and modeling could not work in the brave new world of big data. And through significant effort and investment, we learned [and continue to learn] the new ways and tools that can.
Crossing the Frontier
We can begin to see that we are approaching those broad sunlit uplands. Building on the predictive modeling that delivered such impressive benefits, we are now tackling artificial intelligence and machine learning, enabled by streaming data from smart grid devices. These data streams and new tools can be used for previously infeasible analytics initiatives, such as asset failure prediction and load disaggregation. In addition, the torrents of data deliver much more detail that can be used to enhance the earlier predictive models:
• reliability analysis will improve as the smart meters report their own power outage and restoration events, providing more accurate Estimated Time of Restoration [ETR] calculations, much greater visibility into momentary interruptions, and a truer picture of the customer experience;
• more accurate protective device and outage duration data can be integrated iteratively to enhance the storm effect predictor tools’ outputs; and
• theft detection algorithms will become more robust as models are developed to leverage the high-frequency usage reads and meter events, such as tamper and inversion notifications.
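To make the theft detection point concrete: with 35,000 reads per year, even a simple statistical screen against a meter’s own history becomes feasible. The sketch below is purely illustrative — it is not Duke Energy’s production algorithm, and the readings, threshold, and meter behavior are all invented for the example:

```python
from statistics import mean, stdev

# Illustrative screen only: flag a meter whose recent usage falls far below
# its own historical pattern -- a crude proxy for tamper/bypass detection.
# All values here are invented; a production model would use many more
# features (meter events, neighborhood comparisons, seasonality).

def usage_drop_zscore(history, recent):
    """z-score of the recent average against the meter's own history."""
    mu, sigma = mean(history), stdev(history)
    return (mean(recent) - mu) / sigma

history = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2]  # typical daily kWh for this meter
recent = [1.2, 1.0, 1.1]                  # sudden, sustained drop

z = usage_drop_zscore(history, recent)
if z < -3:  # far below the meter's own norm: worth a field investigation
    print(f"flag for review (z = {z:.1f})")
```

High-frequency reads are what make per-meter baselines like this meaningful; with 12 reads a year there is simply not enough history to separate theft from a vacation or a new heat pump.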
The streaming smart grid data will be a game changer for utility analytics. However, making the jump from databases and desktop modeling tools to streaming data on a distributed processing architecture with new modeling packages requires an investment of effort and resources that should not be underestimated.