Identification and characterization of irregular consumptions of load data

Journal of Modern Power Systems and Clean Energy, Feb 2017

Historical information on substation loadings helps in sizing photovoltaic (PV) generation and energy storage for peak shaving and distribution system upgrade deferral. A method based on consumption data is proposed to separate unusual consumptions and to form clusters of similar regular consumptions. The method optimally partitions the load pattern data into core points and border points, which lie in high- and low-density regions, respectively. The local outlier factor, which requires neither a fixed probability distribution of the data nor statistical measures, ranks the unusual consumptions among the border points only, which are a few percent of the complete data. The method finds the optimal or close-to-optimal number of clusters of similarly shaped load patterns in order to detect regular peak and valley load demands on different days. Furthermore, features of unusual consumptions in the load pattern data are identified and characterized on the border points only. The effectiveness of the proposed method and characterization is tested on two practical distribution systems.

Desh Deepak SHARMA, S. N. SINGH, Jeremy LIN, Elham FORUZAN

Affiliations: Indian Institute of Technology Kanpur, Kanpur, India; PJM Interconnection, Audubon, PA, USA; Department of Electrical Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA

Keywords: Density based clustering; Irregular consumption; Local outlier factor; Peak demand; Valley demand

1 Introduction

During the last few decades, there has been a major shift from the vertically integrated monopolistic system to the open power market system.
The restructuring of the electricity supply industry has created many new challenges in providing secure, stable and economical electric power to end users [1–3]. Electricity prices vary significantly during the day due to demand variations. To overcome peaking problems, demand response programs have been proposed under smart grid initiatives [4–6]. Under a demand response scheme, customers reduce their electrical load demand during the peak-price period by rescheduling it to low-price periods [4–9]. Peak clipping, valley filling and load shifting are key tools of demand response [9]. Power operators are concerned about irregular electricity consumption behavior in their decision-making process. In load profile data, abnormal consumptions may arise from measurement errors, undetected consumption, illegal electricity connections, improperly installed equipment, etc. [10–14]. Clustering of load profiles helps in developing methodologies for evaluating energy losses (technical and commercial) [10–13]. For peak shaving and distribution system upgrades, it is essential to know the changes in loading at the substations, i.e., the consumption behavior of customers. At peak load, the power losses in different feeders and transformers must be estimated [15], which allows a fair calculation of network pricing. Data mining and artificial intelligence techniques such as support vector machines [11] and fuzzy clustering [12] have been explored for identifying irregularities in energy consumption. Abnormal consumption can also be identified by comparing a load profile with a standard or average load profile [12, 16]. Extensive experimental testing was carried out in [17] to select parameter values such as the sensitivity threshold for detecting anomalous events, the maximum cluster radius for the nearest neighbor cluster method, and the parameter used for fuzzy rule extraction from the identified clusters.
Various methods of classifying electrical consumption data have been discussed in the literature [14, 18–28]. These methods can facilitate the development of different demand response strategies and improve grid reliability. Representative load patterns (RLPs) are obtained for different customers, and the customers are clustered on the basis of their RLPs [26, 27]. Customers in the same cluster share a similar load pattern, so the typical load profile (TLP) of each customer in a group is the centroid of that cluster [27]. For classifying electrical load profile data on the basis of similar consumption behavior, classical k-means [23, 27], fuzzy c-means [23, 27], hierarchical clustering, self-organizing feature maps (SOFM) [23, 27], principal component analysis (PCA) [23], curvilinear component analysis (CCA) [23], ant colony clustering (ACC) [28], support vector clustering (SVC) [26], etc. have been suggested. Cluster validity is assessed with measures such as the clustering dispersion indicator, the Davies-Bouldin indicator and the stability index [23, 27]. A data object is characterized by a set of similarity or dissimilarity measures described by a distance function, and various clustering algorithms use such a distance function to separate data objects into clusters. The major clustering methods applied to data classification are partition based, hierarchical (agglomerative and divisive), neural network based, density based, grid based and model based methods [29–31]. Partitioning algorithms (k-means, fuzzy c-means, etc.) applied to load data require the number of clusters as input. In hierarchical clustering, a dendrogram is built from the leaves up to the root (agglomerative approach) or from the root down to the leaves (divisive approach), with a merge or divide operation in each iteration.
A termination criterion is required to stop the iterations [23]. In ant colony clustering (ACC), either a specified number of clusters is required as input or the number of clusters is determined in a post-processing phase. In the iterative process of ACC, the initialization phase requires the number of clusters and the number of ants, and a stopping criterion must be defined for the ending phase [28]. In support vector clustering (SVC), the final clusters are obtained in a post-processing phase, which is computationally intensive [26]. The ISODATA algorithm, which includes temperature dependency and outlier filtering, is proposed in [32] for customer classification. For load profile classification, a Gaussian mixture model is used in [33] to assign labels only to the most recurrent load profiles. An inter-cluster behavior classification model and an intra-cluster consumption volume prediction model are constructed in [34] using an agglomerative hierarchical clustering algorithm. Density based clustering does not require random initialization of any parameter. Therefore, once the global parameters are set heuristically, similar results are obtained in every run, and the consistency of the algorithm is preserved. Several variants of density based algorithms are available in the literature; density based spatial clustering of applications with noise (DBSCAN) is one of the most popular in data mining [29–31]. Most clustering algorithms require an iterative control strategy to optimize an objective function, together with random initialization of some parameters, so their results vary from run to run. Selecting an appropriate number of clusters is another tedious task in implementing these algorithms. The difficulty in implementing DBSCAN lies in selecting its global parameters, whereas k-means and fuzzy c-means rely on an iterative control scheme.
Statistical methods are generally used to identify outliers, and these methods assume a fixed probability distribution of the data. However, real-world data do not necessarily follow any fixed distribution. Furthermore, existing irregular consumption detection methods work on the whole load pattern data set. Outlier detection approaches based on k-means and fuzzy c-means find the deviation of a data object from the cluster centroid. The main problem with existing density based clustering algorithms is that intrinsic cluster structures cannot be detected with global density parameters; different local densities have to be revealed, through further partitioning, to find local clusters in the data space [29, 35]. Motivated by these facts, this paper proposes a new method for clustering, identifying unusual electricity consumptions and quantifying them according to the nature of their irregularity. The proposed method uses the Local Outlier Factor (LOF) [36, 37] to rank unusual consumptions based on neighborhood densities, i.e., the k-nearest neighbors (k-NNs) of these consumptions in the load pattern data. The clustering results are compared with those of k-means and fuzzy c-means, using the Davies-Bouldin index and the Silhouette coefficient for cluster validation. The major contributions of this paper are as follows:

1) A method is proposed to obtain global density parameters that give an optimal partition of a data set into high- and low-density regions. The points in the low-density regions, known as border points, form a small part of the whole load data and are used to find irregular loading on distribution substations. Hence, the computational effort for identifying irregular demand is greatly reduced.
2) Micro clusters are obtained to reveal local clusters, so further partitioning of the data set is avoided. The core points of the load data help in analyzing the occurrence of peaks and valleys in load patterns.
3) Furthermore, an approach to characterize and quantify the different features of unusual consumptions using a feature irregularity factor (FIF) is introduced and applied only to the border points of the load data. It identifies irregularities in unusual consumptions based on different irregularity features. The approach is scalable, since further irregular features of unusual loadings on substations can be identified and added when deciding the FIF of different unusual consumptions.

The suitability of the proposed method is demonstrated on two practical distribution systems.

2 Clustering methods

2.1 k-means

The classical k-means algorithm is a partition based clustering algorithm that separates a set of n data objects into k clusters based on similarity features. Given a set of n observations, each a d-dimensional real vector, the set is partitioned into k sets (k < n) such that an objective function is minimized. Each set represents a cluster of data [29, 30].

2.2 Fuzzy c-means

In fuzzy c-means clustering, each data object is assigned to different clusters with different degrees of membership, so the membership of a data object is shared among clusters. The algorithm seeks the best partition of the whole data set by minimizing an objective function [29, 30].

2.3 Density based clustering

This algorithm separates high-density and low-density regions. A data point belongs to a cluster if its neighborhood density is high enough. Clusters take arbitrary shapes as they absorb all data points in the neighborhood, and different clusters may have different densities. The classical density based spatial clustering of applications with noise (DBSCAN) forms clusters such that each data point in a cluster has at least a minimum number of points (N_minpts) in its neighborhood of a given radius (r_eps); that is, the cardinality of the neighborhood has to exceed a threshold [29–31].
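The density-based split described above, combined with the Euclidean distance matrix and the scaling step used later in the paper, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that a point is a core point when at least N_minpts other points lie within radius r_eps (conventions differ on whether the point itself is counted). Following this paper rather than classical DBSCAN, it labels every non-core point a border point. The helper names and the power-of-ten scaling rule are hypothetical.

```python
import math

def distance_matrix(profiles, scale=1.0):
    """Euclidean distance matrix e_ij between load profiles, divided by a scaling factor."""
    n = len(profiles)
    return [[math.dist(profiles[i], profiles[j]) / scale for j in range(n)]
            for i in range(n)]

def power_of_ten_scale(dist, target=10.0):
    """Hypothetical rule: smallest 10**p (p >= 0) that brings the largest entry to <= target."""
    largest = max(max(row) for row in dist)
    p = 0
    while largest / 10 ** p > target:
        p += 1
    return 10 ** p

def split_core_border(dist, r_eps, n_minpts):
    """Core: at least n_minpts other points within r_eps (assumed convention); others are border."""
    core, border = [], []
    for i, row in enumerate(dist):
        neighbours = sum(1 for j, d in enumerate(row) if j != i and d <= r_eps)
        (core if neighbours >= n_minpts else border).append(i)
    return core, border

# Four tightly grouped two-hour "profiles" and one distant one (toy data).
profiles = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
core, border = split_core_border(distance_matrix(profiles), r_eps=1.0, n_minpts=2)
print(core, border)  # [0, 1, 2, 3] [4]
```

With real hourly data in kW, one would pass `scale=power_of_ten_scale(...)` computed on the unscaled matrix before choosing r_eps.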
3 Local outlier factor (LOF)

LOF is a density-based outlier detection method [36, 37] that compares the local density of a data object with the local densities of its neighbors. An outlier is defined based on the density of the data objects in its neighborhood: the density of each object is compared with the densities of its k-NNs, and the local density of an outlier is low relative to that of its neighboring objects. In this approach, each data object is assigned an outlying factor according to the nature of its anomaly. A high LOF indicates a large difference between the density of the object and that of its k-NNs. An LOF approximately equal to 1 indicates that the object lies in a dense region and is not an outlier [36, 37].

3.1 k-distance

The k-distance is the distance between the object under consideration and its k-th nearest neighbor. Let D be the whole data set, x ∈ D, z ∈ D the k-th nearest neighbor of x, and L_dist(x, z) the distance from x to z. The k-distance of x is

$$L_{dist,k}(x) = L_{dist}(x, z) \quad (1)$$

If D_x ⊂ D is the set of the k closest objects to x, then L_dist(x, o) ≤ L_dist,k(x) for every o ∈ D_x. The Euclidean distance is used as the distance measure.

3.2 k-distance neighborhood of x

The k-distance neighborhood of object x consists of its k nearest neighbors, i.e., the objects whose distances from x are less than or equal to the k-distance of x:

$$N_k(x) = \left\{ o \in D_x \mid L_{dist}(x, o) \le L_{dist,k}(x) \right\}, \quad D_x \subset D \quad (2)$$

3.3 Reachability distance of x with respect to z

The reachability distance is an asymmetric measure used to find the density of the k-nearest neighborhood of an object.
The reachability distance of an object x with respect to object z is

$$L^{reach}_{dist,k}(x, z) = \max\left\{ L_{dist,k}(z),\; L_{dist}(x, z) \right\} \quad (3)$$

In effect, x is treated as lying no closer to z than the boundary of the k-distance neighborhood of z. If x is far from z, the reachability distance is simply the distance L_dist(x, z); if x is very close to z, it is the k-distance of z, L_dist,k(z).

3.4 Local reachability density of x

The local reachability density of x represents the density of its neighborhood and is defined as the reciprocal of the average reachability distance of x with respect to the objects in its k-distance neighborhood. If |N_k(x)| is the number of objects in the k-distance neighborhood of x, then

$$R_{lrd,k}(x) = \frac{|N_k(x)|}{\sum_{z \in N_k(x)} L^{reach}_{dist,k}(x, z)} \quad (4)$$

3.5 Local outlier factor of x

The local outlier factor of x is the average ratio of the local reachability densities of the objects in the k-distance neighborhood of x to the local reachability density of x itself:

$$LOF_k(x) = \frac{1}{|N_k(x)|} \sum_{z \in N_k(x)} \frac{R_{lrd,k}(z)}{R_{lrd,k}(x)} \quad (5)$$

The results depend on the positive integer k: a higher value of k gives more stable results but increases the computational burden.

4 Outlier detection methods and problem assessments

"An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism." Abnormalities, discordants, deviants, irregularities and anomalies are other terms used for outliers. Different basic models, such as extreme value analysis, probabilistic and statistical models, linear models, proximity-based models, information-theoretic models and high-dimensional outlier detection models, are used to detect outliers, depending on the type of data available.
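Returning to the definitions of Section 3, equations (1) to (5) translate almost line for line into code. The following is a minimal, unoptimized Python sketch (quadratic work per query, no caching), assuming Euclidean distance and no duplicate points, since duplicates can make a reachability sum zero. The function names are hypothetical.

```python
import math

def k_distance(data, i, k):
    """(1): distance from data[i] to its k-th nearest neighbour."""
    return sorted(math.dist(data[i], q) for j, q in enumerate(data) if j != i)[k - 1]

def k_neighbourhood(data, i, k):
    """(2): all objects no farther from data[i] than its k-distance (ties included)."""
    kd = k_distance(data, i, k)
    return [j for j, q in enumerate(data) if j != i and math.dist(data[i], q) <= kd]

def reach_dist(data, i, j, k):
    """(3): max(k-distance of data[j], distance between data[i] and data[j])."""
    return max(k_distance(data, j, k), math.dist(data[i], data[j]))

def lrd(data, i, k):
    """(4): reciprocal of the mean reachability distance from data[i] to its neighbours."""
    nbrs = k_neighbourhood(data, i, k)
    return len(nbrs) / sum(reach_dist(data, i, j, k) for j in nbrs)

def lof(data, i, k):
    """(5): mean ratio of the neighbours' lrd to the lrd of data[i]."""
    nbrs = k_neighbourhood(data, i, k)
    return sum(lrd(data, j, k) for j in nbrs) / (len(nbrs) * lrd(data, i, k))

data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(lof(data, 0, k=2))            # 1.0: inside the dense group
print(round(lof(data, 4, k=2), 1))  # 13.2: the isolated point
```

As the text notes, an LOF near 1 marks a point inside a dense region, while the isolated point scores far above 1.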
Each of these approaches has advantages and drawbacks in detecting outliers [38]. The objective of outlier detection is to identify data objects that are markedly different from, or inconsistent with, the normal data. The advantages and disadvantages of clustering based, nearest-neighbor based, classification based and spectral anomaly detection techniques are discussed in [35]. Computational complexity has been shown to be a major issue, and most anomaly detection techniques are computationally expensive [35, 38]. These techniques also work on the whole data set when detecting anomalies.

In this paper, a method is proposed for an optimal partition of load data into core points and border points; irregular consumptions are part of the border points. The two global parameters (r_eps,o, N_minpts,o) are selected according to (6). A data point with LOF less than 1.0 is part of a cluster. Requiring the lowest LOF among the border points to be greater than, but nearly equal to, 1.0 ensures that all less dense data points and outliers are included in the border points. With this set of global parameters, all high-density points are separated from the less dense points [36, 37]; clustering is performed on the high-density points only, and LOFs are computed only for the less dense points. The following equation is formulated for a sub-optimal partition of the whole load data into high- and low-density regions:

$$(r_{eps,o}, N_{minpts,o}) = \left\{ (r_{eps}, N_{minpts}) \;\middle|\; \min f_{LOF},\; f_{LOF} = LOF(l_{lbr}),\; LOF(l_b) > 1.0,\; l_b, l_{lbr} \in B,\; B \subset X \right\} \quad (6)$$

where B is the set of border points l_b; X is the complete load pattern data; l_lbr is the border point with the lowest LOF in B; l ∈ X is a data point (a load profile); and l_c ∈ X is a core point in the load data.
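Rule (6) is a constrained search over the two global parameters. A minimal Python sketch follows, assuming a finite grid of candidate (r_eps, N_minpts) pairs and a hypothetical callable `border_lofs` that, for one pair, returns the LOF values of the resulting border points (for example by combining a core/border split with an LOF routine). The LOF numbers in the demo are made up for illustration.

```python
import math

def select_global_params(candidates, border_lofs):
    """Sketch of (6): keep pairs whose border LOFs all exceed 1.0 and return the pair
    whose lowest-ranked border point l_lbr has the smallest LOF (f_LOF)."""
    best, best_f = None, math.inf
    for r_eps, n_minpts in candidates:
        lofs = border_lofs(r_eps, n_minpts)
        if not lofs:
            continue                      # no border points: nothing to rank
        f_lof = min(lofs)                 # LOF of l_lbr; all other border LOFs are larger
        if 1.0 < f_lof < best_f:
            best, best_f = (r_eps, n_minpts), f_lof
    return best, best_f

# Made-up border-point LOFs for three candidate pairs, for illustration only.
table = {(1, 5): [1.30, 2.00], (2, 5): [1.05, 3.00], (3, 5): [0.90, 2.50]}
print(select_global_params(table, lambda r, n: table[(r, n)]))  # ((2, 5), 1.05)
```

The pair (3, 5) is rejected because one of its border points has LOF below 1.0, i.e., a point that belongs inside a cluster was left on the border.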
5 Proposed method for identification of unusual consumptions and clustering

The proposed method adopts the basic concept of density based clustering: it uses the core points for clustering and the border points for outlier identification. LOFs are computed only for the border points, so every border point is quantified by an LOF according to its outlying nature. The method assumes no particular distribution of the data when isolating irregular consumptions and assigning each one a degree of irregularity (its LOF). The LOF is computed only on the border points, which are a few percent of the whole load pattern data. No iterative control scheme is required to obtain the optimal or close-to-optimal clustering results reported on a practical system [39, 40]. Although the method can find optimal clusters, the clusters in each zone are chosen so as to yield distinguishable peaks and valleys for peak load clipping and load shifting. Heuristically, clusterings that produce distinguishable peaks and valleys were found to be optimal or close to optimal.

5.1 Distance matrix

The Euclidean distance is used to measure the closeness of data objects (load profiles). The distance between two n-dimensional data objects l_i and l_j is

$$L_{dist}(l_i, l_j) = \sqrt{\sum_{k=1}^{n} \left( l_{ik} - l_{jk} \right)^2} \quad (7)$$

$$e_{ij} = L_{dist}(l_i, l_j) \quad (8)$$

The distance matrix, which represents the closeness of the data objects, is a square N × N matrix, where N is the number of data objects. Its diagonal elements e_11, e_22, …, e_NN are always zero. If required, the distance matrix is scaled by dividing all of its elements by a scaling factor.

5.2 Solution to obtain global parameters

The parameters (r_eps,o, N_minpts,o) for a sub-optimal partition of load data can be obtained as follows:
Set r_eps and N_minpts arbitrarily to generate the set B, and then tune them to (r_eps,o, N_minpts,o) so that (6) is satisfied.

5.3 Generation of small clusters

A small cluster is formed from an arbitrarily selected root core point and the core points that are directly density reachable from it at depth one. A small cluster C_sc is therefore formed according to the following theorem [31].

Theorem: If x_i is a core point and x_i ∈ C_sc, then x_j ∈ C_sc if x_j is a core point that is directly density reachable from x_i.

5.4 Operation of merging small clusters

Two or more small clusters are merged into a single cluster if the maximum deviation of their averages at any dimension is less than a threshold. Consider a set of small clusters C^1_sc, C^2_sc, …, C^m_sc of given n-dimensional data, and let {v_1, v_2, …, v_m} be the set of their averages (centroids). The maximum deviation of two averages at any dimension is defined as

$$h_{ij} = \max_{q=1:n} \left| v_{iq} - v_{jq} \right| \quad (9)$$

The maximum and minimum values of h_ij among all small clusters, h_max and h_min, are obtained as

$$h_{max} = \max_{i,j=1:m,\; i \ne j} h_{ij} \quad (10)$$

$$h_{min} = \min_{i,j=1:m,\; i \ne j} h_{ij} \quad (11)$$

Suppose K is the number of clusters {c_1, c_2, …, c_K} after merging the small clusters. The threshold h = h_1 can be set such that all small clusters are merged into a single cluster, i.e., K = 1:

$$h_1 = \left\{ h_{min} < h < h_{max} \mid K = 1 \right\} \quad (12)$$

The threshold h = h_m can be set such that no small cluster is merged; in this case, the number of clusters K equals the number of small clusters m. Clearly, the number of clusters after merging cannot exceed the number of small clusters:

$$K \le m \quad (13)$$

$$h_m = \left\{ h_{min} < h < h_{max} \mid K = m \right\} \quad (14)$$

For K clusters, h_K can be set as

$$h_K = \left\{ h_1 < h < h_m \mid 1 \le K \le m \right\} \quad (15)$$

The value of h for which the optimal number of clusters K_o is found, h^o_K, is defined as follows:
$$h^o_K = \left\{ h_1 < h < h_m \mid K = K_o \right\} \quad (16)$$

5.5 Assigning non-outliers to clusters

Border points with LOF approximately equal to 1.0 lie close to a homogeneous dense region and may become part of a cluster through the density-reachable and density-connected concepts. Points with higher LOF values differ greatly in density from their k-nearest neighbors and are therefore considered outliers [36, 37]. A limiting value U_LOF is set for the LOF to define the set of outliers X_U among the border points B:

$$X_U = \left\{ l_b \in B \mid LOF(l_b) > U_{LOF} \right\}$$

X_U is thus the set of unusual consumptions. Given the set of clusters {c_1, c_2, …, c_K}, a border point that is not designated as an outlier is assigned to the cluster containing the largest number of its k-nearest neighbors:

$$l_b \in c_{i^*}, \quad i^* = \arg\max_{i=1:K} N^{c_i}_{kNN}(l_b) \quad (17)$$

where N^{c_i}_kNN(l_b) is the number of k-nearest neighbors of point l_b in cluster c_i [30].

5.6 Proposed method

5) Repeat step 4 to obtain further small clusters until all remaining core points have been visited.
6) Merge the small clusters into clusters, varying the threshold h to obtain the optimal number of clusters.
7) Compute the LOF of each border point and apply a limiting value to isolate the ranked outliers from the border points.
8) Merge the non-outlier border points into the clusters.

6 Proposed characterization of unusual consumptions

Electricity consumptions that differ from regular consumptions are analyzed next. Different types of peak demand, sudden large changes and zero demand are examples of irregular consumption. These irregular consumption behaviors are defined in the following subsections, on the set X_U only.
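Sections 5.3 to 5.5 above can be sketched in Python as three small functions. The sketch assumes that points are tuples, that the core and border index lists come from a prior core/border split, and that the LOF values are precomputed. Two further assumptions are labeled in the comments: small clusters with h_ij < h are merged transitively (union-find), and the k-NN vote of (17) counts only the clustered core points. All names are hypothetical.

```python
import math
from collections import Counter

def small_clusters(points, core, r_eps):
    """5.3: a root core point plus the unvisited core points within r_eps of it (depth one)."""
    unvisited, clusters = list(core), []
    while unvisited:
        root = unvisited.pop(0)
        members = [root] + [j for j in unvisited if math.dist(points[root], points[j]) <= r_eps]
        unvisited = [j for j in unvisited if j not in members]
        clusters.append(members)
    return clusters

def merge_clusters(points, clusters, h):
    """5.4: merge small clusters with h_ij = max_q |v_iq - v_jq| < h (assumed transitive)."""
    cents = [[sum(c) / len(m) for c in zip(*(points[i] for i in m))] for m in clusters]
    parent = list(range(len(clusters)))
    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if max(abs(a - b) for a, b in zip(cents[i], cents[j])) < h:
                parent[find(i)] = find(j)
    merged = {}
    for i, members in enumerate(clusters):
        merged.setdefault(find(i), []).extend(members)
    return list(merged.values())

def assign_border(points, clusters, border, lof, u_lof, k):
    """5.5: X_U = border points with LOF > u_lof; others join the cluster holding most k-NNs."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    clustered = list(label)                    # assumption: vote among clustered core points only
    outliers = [b for b in border if lof[b] > u_lof]
    for b in border:
        if b not in outliers:
            nn = sorted(clustered, key=lambda i: math.dist(points[b], points[i]))[:k]
            label[b] = Counter(label[i] for i in nn).most_common(1)[0][0]
    return label, outliers

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (2, 2), (50, 50)]
clusters = merge_clusters(points, small_clusters(points, range(6), r_eps=1.0), h=2.0)
label, outliers = assign_border(points, clusters, [6, 7], {6: 1.1, 7: 5.0}, u_lof=2.0, k=3)
print(len(clusters), label[6], outliers)  # 2 0 [7]
```

Raising h (for example to 20 in this toy set) merges the two small clusters into one, which mirrors how sweeping h between h_m and h_1 trades off the number of clusters K.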
6.1 Irregular peak unusual consumption

The irregular peak unusual consumption U_irpeak is defined as

$$U_{irpeak} = \left\{ l_b \in X_U \mid \exists t : \Delta d^t_{irpeak} > \Delta d_{ref,a} \right\} \quad (18)$$

$$\Delta d^t_{irpeak} = d_t(l_b) - d_{ref}, \qquad \Delta d_{ref,a} = d_{peak,a} - d_{ref}$$

where d_t(l_b) is the demand of a load data point (a load profile) l_b at time interval t ∈ T; d_ref is the reference demand, and demand above d_ref is termed peak demand; d_peak,a is an acceptable peak demand in the system; and Δd_ref,a is a predefined acceptable increase in demand above d_ref used to decide irregular consumption.

6.2 Broadest peak demand

The broadest peak demand U_bpeak is an unusual consumption whose demand exceeds d_ref over a set of consecutive time intervals s_peak, where n_peak is the cardinality of s_peak:

$$U_{bpeak} = \left\{ l \in X_U \mid d_t(l) - d_{ref} > 0,\; t \in s_{peak} \right\} \quad (19)$$

6.3 Sudden large gain unusual consumption

The sudden large gain unusual consumption U_sgain is an increase in demand larger than d^g_a, the acceptable gain in demand, at any time interval t ∈ T:

$$U_{sgain} = \left\{ l \in X_U \mid \exists t : \Delta d^t_{gain} > d^g_a \right\}, \qquad \Delta d^t_{gain} = d_t(l) - d_{t-1}(l) \quad (20)$$

6.4 Nearly zero demand unusual consumption

The nearly zero demand unusual consumption U_zero is a demand that falls to zero at some time interval t ∈ T or over a duration of several time intervals:

$$U_{zero} = \left\{ l \in X_U \mid d_t(l) = 0,\; t \in s_{zero} \right\} \quad (21)$$

where s_zero is the set of time intervals in which the demand is zero and n_zero is the cardinality of s_zero.

Based on these definitions, the feature vector of unusual consumptions is defined as

$$Y_U = \left( \Delta d^t_{irpeak},\; n_{peak},\; \Delta d^t_{gain},\; \Delta d^t_{drop},\; n_{zero} \right) \quad (22)$$

$$\Delta d^t_{irpeak} > \Delta d_{ref,a}, \qquad \Delta d^t_{gain} > d^g_a, \qquad \Delta d^t_{drop} > d^d_a \quad (23)$$

where Δd^t_drop is the sudden drop in demand and d^d_a is the acceptable drop. To identify the degree of irregularity in unusual consumptions, a feature irregularity factor I_FIF is introduced:

$$I_{FIF} = \lVert Y_U \rVert \quad (24)$$

Each feature of the different unusual consumptions in Y_U is normalized by min-max or z-score normalization.
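The feature vector (22) and I_FIF (24) for one daily profile of hourly demands can be sketched in Python as below. The following choices are assumptions, not the authors' stated implementation: the irregular-peak feature is the largest excess max_t d_t − d_ref (floored at zero); n_peak is the longest run of consecutive hours above d_ref; n_zero counts all zero-demand hours; the sudden drop is d_{t−1} − d_t, by analogy with the gain; normalization is min-max with caller-supplied bounds, clipped to [0, 1]; and the norm is Euclidean. The function names and the demo profile are hypothetical.

```python
import math

def irregularity_features(profile, d_ref):
    """Y_U row (22): (peak excess, n_peak, largest gain, largest drop, n_zero)."""
    peak_excess = max(max(profile) - d_ref, 0)
    n_peak = run = 0
    for d in profile:                          # longest run of consecutive hours above d_ref
        run = run + 1 if d > d_ref else 0
        n_peak = max(n_peak, run)
    steps = [b - a for a, b in zip(profile, profile[1:])]
    gain = max([s for s in steps if s > 0], default=0)   # max_t d_t - d_{t-1}
    drop = max([-s for s in steps if s < 0], default=0)  # assumed: max_t d_{t-1} - d_t
    n_zero = sum(1 for d in profile if d == 0)
    return (peak_excess, n_peak, gain, drop, n_zero)

def ifif(features, mins, maxs):
    """(24): Euclidean norm of the min-max normalized features, each clipped to [0, 1]."""
    normed = [min(max((x - lo) / (hi - lo), 0.0), 1.0) if hi > lo else 0.0
              for x, lo, hi in zip(features, mins, maxs)]
    return math.sqrt(sum(z * z for z in normed))

# Hypothetical day with a 325 A reference: demand above it from 10:00 to 17:00.
day = [300] * 10 + [330] * 8 + [300] * 6
print(irregularity_features(day, d_ref=325))  # (5, 8, 30, 30, 0)
```

Clipping at the lower bound means a feature that stays within its acceptable limit contributes nothing to I_FIF, so the norm is driven by the features that actually exceed their limits.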
An unusual consumption may exhibit more than one unusual characteristic; the most dominant ones can be identified from its row in Y_U. Limiting values in I_FIF relate directly to real unusual behaviors of outliers: once the limiting values are decided, the feature vector, and hence the I_FIF, of an unusual consumption are determined.

7 Case studies

The proposed method for identifying unusual consumptions and for clustering in peak-valley analysis is tested on two practical systems. Regular peaks and valleys are identified from the clustering results of the proposed approach in order to distinguish irregular peaks in the load pattern data, and the proposed characterization of unusual consumptions is also carried out. The 365 days of the year are numbered consecutively, from day 1 (Jan 01) to day 365 (Dec 31). To validate the clustering of load pattern data, two popular indices are used: the Davies-Bouldin index (DBI) and the Silhouette coefficient (SC). The Davies-Bouldin criterion is based on the ratio of within-cluster to between-cluster distances [25, 27, 30, 31]. The Silhouette coefficient combines two measures: cohesion, which measures how close the objects in a cluster are, and separation, which measures whether the clusters are well separated [30, 31].

7.1 Case study-1

The effectiveness of the proposed method is tested on a practical system of 20 zones [39, 40]. The data are the hourly loads (in kW) of a US utility with 20 zones for the year 2007. In most of the zones, the electricity consumption is in the range of thousands of kW, so the distance matrix must be scaled down. For each zone, the distance matrix is divided by a suitable divisor (a scaling factor such as 10^3, 10^4, etc.) so that its elements are of the order of 10.
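For reference, textbook forms of the two validation indices named at the start of this section (DBI and SC) can be sketched in Python as follows. The paper does not state which variant it uses, so this sketch assumes Euclidean distance, the mean silhouette over all points (singletons scored 0), and the Davies-Bouldin index with each cluster's scatter taken as the mean distance of its members to the centroid. At least two clusters are required, and the helper names are hypothetical.

```python
import math

def _mean_dist(points, labels, i, c):
    """Mean distance from points[i] to the other members of cluster c."""
    ds = [math.dist(points[i], q) for j, q in enumerate(points) if labels[j] == c and j != i]
    return sum(ds) / len(ds)

def silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b); values near 1 mean compact, well-separated clusters."""
    labels = list(labels)
    scores = []
    for i in range(len(points)):
        own = labels[i]
        if labels.count(own) == 1:
            scores.append(0.0)                  # singleton cluster: s(i) = 0 by convention
            continue
        a = _mean_dist(points, labels, i, own)  # cohesion
        b = min(_mean_dist(points, labels, i, c) for c in set(labels) if c != own)  # separation
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def davies_bouldin(points, labels):
    """Mean over clusters of max_j (S_i + S_j) / d(v_i, v_j); lower is better."""
    groups = {}
    for p, c in zip(points, labels):
        groups.setdefault(c, []).append(p)
    cents = {c: [sum(x) / len(g) for x in zip(*g)] for c, g in groups.items()}
    scatter = {c: sum(math.dist(p, cents[c]) for p in g) / len(g) for c, g in groups.items()}
    return sum(max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                   for j in groups if j != i)
               for i in groups) / len(groups)

pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
lab = [0, 0, 1, 1]
print(round(silhouette(pts, lab), 3), davies_bouldin(pts, lab))  # 0.9 0.1
```

Selecting the number of clusters then amounts to running the clustering for several K and keeping the K with the lowest DBI or the highest SC.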
The notations used in Table 1 are as follows: z_i is the zone ID; min^f_D and min^k_D are the minimum DBI values with fuzzy c-means and k-means, respectively; max^f_S and max^k_S are the maximum Silhouette coefficients with fuzzy c-means and k-means, respectively; N^{f,D}_o and N^{k,D}_o are the optimal numbers of clusters according to DBI, and N^{f,S}_o and N^{k,S}_o those according to the Silhouette coefficient, using fuzzy c-means and k-means, respectively.

7.1.1 Results with fuzzy c-means and k-means

The k-means and fuzzy c-means clustering algorithms are applied to the load pattern data of each zone with different numbers of clusters. The optimal number of clusters for each zone is identified with the Davies-Bouldin index and the Silhouette coefficient, and the results for all 20 zones are shown in Table 1.

7.1.2 Results with DBSCAN

Various combinations of N_minpts and r_eps were tried with DBSCAN, but no combination of these global parameters was found that produces clusters. Results for Zone-1 with N_minpts = 5 and different values of r_eps are given in Table 2. No clusters are formed over the complete set of days, so further partitioning of the load data is needed to find regular and irregular consumptions.

7.1.3 Results with proposed method

Let u_U be the percentage of the data used for unusual consumption detection, defined as

$$u_U = \frac{C_B}{C_X} \times 100 \quad (25)$$

where C_B and C_X are the cardinalities of the set of border points B and of the whole load data X, respectively. Using (6), the values of N_minpts,o and r_eps,o for an optimal partition of the load data of the different zones are obtained as shown in Table 3, and the LOFs are calculated only for the border points (u_U percent of the data), which greatly reduces the computational work. With the proposed method, the irregular consumptions in each zone are identified and ranked by LOF according to their irregularity, from low to high anomalous levels. For Zones 1, 4 and 5, six irregular consumption days with their LOFs are shown in Table 4.
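As a worked instance of (25), using the counts reported for Case study-2 below (27 border points among the 365 daily load patterns of 2013):

$$u_U = \frac{C_B}{C_X} \times 100 = \frac{27}{365} \times 100 \approx 7.40\,\%$$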
In most zones other than Zone-4, the highest LOF is close to 2.0, so the unusualness of electricity consumption in these zones is not large. Zone-4 has the most varied unusual consumptions (Fig. 2). In Zone-4, on days 152, 153, 350 and 351 (i.e., June 01, June 02, Dec 16 and Dec 17, 2007), the LOFs exceed 3.0, showing that on these days the electrical load consumption deviates substantially from normal consumption. In each zone, a limiting value of LOF can be set to isolate the outliers so that utilities can extract the required information from them. The irregular consumptions of Zones 4 and 5 are shown in Figs. 3 and 4, respectively. Table 5 lists the irregularity features obtained only on the border points of Zone-4, which has the most varied unusual consumptions. For min-max normalization, the minimum values of the sudden-drop and sudden-gain features are both set to 100 kW. A demand of 1100 kW is heuristically assumed to be the acceptable demand for deciding irregular peak unusual consumptions. In 2007, no day in this zone had zero electricity demand. FIFs are calculated for the unusual consumptions, ranking them as {350, 351, 152, 153, 37, 26, 36, 42, 103} based on their irregularity features.

The type and occurrence of regular peaks and valleys are detected from the clustering results in the different zones. The peaks and valleys that represent demand response opportunities are shown for Zones 4 and 5 only, in Table 6 and Figs. 5 and 6. Morning peaks (mp), evening peaks (ep) and valleys (v) are identified. The clustering results show that 2 to 3 clusters are sufficient for peak-valley assessment in the different zones, and these numbers are optimal or close to optimal. The notations used in Table 7 are as follows: min^p_D is the minimum or close-to-minimum DBI value and max^p_S is the maximum or close-to-maximum Silhouette coefficient with the proposed method; N^p_o is the optimal or close-to-optimal number of clusters with the proposed method.
Table 7
$\min_p^D$: 0.6677 0.6316 0.5677 0.6587 0.6339 0.6644 0.6923 0.6568 0.8738 0.6098
$\max_p^S$: 0.7396 0.7600 0.7602 0.7726 0.7571 0.7551 0.7513 0.7647 0.6961 0.7121
$N_o^p$:
7.2 Case study-2
The Indian Institute of Technology Kanpur (IITK) distribution system is supplied from the Panki power grid via 33 kV lines. One 10 MVA and two 5 MVA, 33 kV/11 kV transformers are installed in the main substation [41]. The 10 MVA transformer (Tr-3) of the main substation serves the major demand of IITK. Unusual consumptions, along with regular ones, are identified and analyzed in the 2013 hourly load data of this 10 MVA, 33/11 kV transformer. The optimal clustering has two clusters and is validated with silhouette coefficients of 0.7865, 0.7832 and 0.7901 from k-means, fuzzy c-means and the proposed method, respectively. Clustering results and unusual consumptions are shown in Figs. 7 and 8, respectively. The irregular consumptions ranked with LOF are shown in Table 8. The global parameters are set as $N_{minpts,o} = 20$ and $r_{eps,o} = 5$ according to (6). The method identifies 27 border points, which is 7.40% of all 365 load patterns of 2013. Unusual characteristics are identified, with the proposed characterization approach, only on these border points. For these consumptions, the IFIF are calculated with the limiting values $d_{ref} = 325$ A, $d_{peak,a} = 375$ A, $d_{ag} = 150$ A and $d_{ad} = 150$ A. Day 198 reaches 392 A at 20:00, an irregular peak demand. Day 218 has the broadest peak demand, staying above 325 A for a maximum of 8 consecutive hours (from 10:00 to 17:00). On Day 249, the demand drops sharply from 346 A to 0 A between 12:00 and 13:00, the largest drop in the load pattern data. On Day 250, the demand rises sharply from 0 A to 317 A between 11:00 and 12:00. On Day 272, the demand remains zero for 13 hours, from 11:00 to 23:00. Each column of Table 9 is normalized with the min-max normalization method.
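The irregularity features and the min-max normalization described for Case study-2 can be sketched as below. This is an illustrative sketch only, under these assumptions:

- The feature names are hypothetical.
- Features are computed from a 24-value hourly profile in amperes.
- Values below a column's floor are clipped to zero.
- The IFIF is the unweighted sum of the normalized features, since its defining equation is not reproduced here.

The floors are those stated for Case study-2: 375 A for the irregular peak, 150 A for sudden drop and gain, and zero for $n_{peak}$ and $n_{zero}$. Each column's maximum is the upper bound, and $d_{ref} = 325$ A sets the broadest-peak threshold.

```python
from typing import Dict, Sequence

# Limiting values quoted for Case study-2, in amperes.
D_REF, D_PEAK_A, D_AG, D_AD = 325.0, 375.0, 150.0, 150.0

# Lower bound of each feature column for min-max normalization (hypothetical keys).
FLOOR = {"peak": D_PEAK_A, "npeak": 0.0, "drop": D_AD, "gain": D_AG, "nzero": 0.0}


def features(day: Sequence[float]) -> Dict[str, float]:
    """Irregularity features of one 24-value hourly profile."""
    steps = [b - a for a, b in zip(day, day[1:])]  # hour-to-hour changes
    longest = run = 0
    for v in day:  # longest run of consecutive hours above D_REF
        run = run + 1 if v > D_REF else 0
        longest = max(longest, run)
    return {
        "peak": max(day),                          # irregular if above D_PEAK_A
        "npeak": longest,                          # width of the broadest peak (h)
        "drop": max([-s for s in steps] + [0.0]),  # largest sudden drop
        "gain": max(steps + [0.0]),                # largest sudden gain
        "nzero": sum(1 for v in day if v == 0),    # hours with zero demand
    }


def ifif(days: Dict[int, Sequence[float]]) -> Dict[int, float]:
    """Assumed IFIF: unweighted sum of min-max normalized features.
    Floor = FLOOR[name], ceiling = column maximum over the unusual days;
    values below the floor contribute zero."""
    feats = {d: features(p) for d, p in days.items()}
    score = {d: 0.0 for d in days}
    for name, floor in FLOOR.items():
        top = max(f[name] for f in feats.values())
        if top <= floor:
            continue  # no day exceeds the floor for this feature
        for d, f in feats.items():
            score[d] += max(f[name] - floor, 0.0) / (top - floor)
    return score
```

A synthetic day that is zero from 11:00 to 23:00, like Day 272, gives 13 zero hours. A day above 325 A from 10:00 to 17:00, like Day 218, gives $n_{peak} = 8$. The days can then be ranked with `sorted(score, key=score.get, reverse=True)`.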
For normalization, the minimum values are set to 150 A for the sudden drop and sudden gain features, zero for $n_{peak}$ and $n_{zero}$, and 375 A for the irregular peak. The maximum value of each column serves as its upper bound. The unusual consumptions are thus compared with one another and their IFIF are calculated. The IFIF is composed of the irregularity features present in an unusual consumption, so the dominating features and those with less effect can be identified.
The ranking of unusual consumptions with IFIF
Fig. 9 Percentage of load data used for identification of broadest peak demand (with proposed approach; approach with k-means)
In earlier work [41], the authors analyzed the 2013 loading of the 10 MVA transformer at the 33 kV/11 kV substation of IITK and identified a critical load profile using the k-means algorithm on the complete load pattern data. This profile determines the possible size of energy storage, without PV generation, for peak shaving. The broadest peak demand, defined in (20), is essentially a critical load profile and helps in deciding the size of energy storage for peak shaving. To decide the critical load profile, the approach proposed in this paper works on only 7.40% of the load pattern data, as shown in Fig. 9. The profile of Day 218 shows the broadest peak demand.
9 Conclusion
In this paper, the proposed method obtains the unusual consumptions using the local outlier factor (LOF) on only a few percent of the whole load pattern data. Different unusual loadings, and the occurrence and type of peak-valley demand on substations, are identified. The features of the unusual consumptions are analyzed with the proposed characterization on only the border points of two practical test systems. Test results show that the proposed method is effective in finding irregular consumptions such as different types of unusual peak demand, sudden large changes and zero demand.
Regular peaks and valleys are identified from the clustering results of the proposed approach in order to distinguish irregular peaks in the load pattern data. To validate the clustering of load pattern data, two popular indices are used: the Davies-Bouldin index (DBI) and the silhouette coefficient (SC).
Acknowledgements
This work is supported by the Department of Science and Technology (DST), New Delhi, India (No. DST/EE/2014127). D.D. Sharma also acknowledges MJP Rohilkhand University, Bareilly, UP, for granting leave to pursue a PhD at IIT Kanpur. The views presented in this paper do not necessarily represent those of the PJM Interconnection, USA.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Desh Deepak SHARMA pursued the Ph.D. degree in Electrical Engineering at the Indian Institute of Technology Kanpur, Kanpur, under the QIP scheme. He is employed as an Associate Professor in the Department of Electrical Engineering, M.J.P. Rohilkhand University, Bareilly, India. His research interests include demand side management, application of data mining techniques in load profiling and distribution systems, distributed control schemes for distributed generation and energy storage, and cyber-security issues in distributed control systems.
S. N. SINGH received the M.Tech. and Ph.D. degrees from the Indian Institute of Technology Kanpur, Kanpur, India, in 1989 and 1995, respectively. He is presently a Professor in the Department of Electrical Engineering, Indian Institute of Technology Kanpur, Kanpur.
His research interests include power system restructuring, power system optimization and control, voltage security and stability analysis, power system planning, and ANN applications to power system problems. Prof. Singh is a fellow of IE (India), a fellow of IETE (India), and a fellow of IET (UK).
Jeremy LIN received his M.S.E.E. in power and energy systems from the University of Illinois at Urbana-Champaign in 1999, an M.B.A. in finance from Villanova University in 2010, and a Ph.D. from Drexel University. For more than a decade, he has worked in the electric power industry in various technical assignments. These include positions at Mid-America Interconnected Network (MAIN) Inc. and GE Energy/Energy Consulting, and currently at PJM Interconnection, USA. His work has included power system security operation and engineering support, power flow analysis, and calculation of available transfer capability. His recent assignments have included economic valuation and reliability analysis of major transmission projects in PJM and neighboring regions. He also leads research projects at PJM funded by DOE and other research consortia.
Elham FORUZAN received the master's degree in electrical engineering from the University of Tehran. She is currently a dual degree student at the University of Nebraska-Lincoln, pursuing a Ph.D. in electrical and computer engineering and a master's degree in computer science. Her research interests include smart grid, artificial intelligence and machine learning, multi-agent systems, and power system cyber security.
References
[1] -operations/energy.aspx
[2] -new-primer-nation-selectricity-markets
[3] Ni YX, Zhong J, Liu HM (2005) Deregulation of power systems in Asia: special considerations in developing countries. In: Proceedings of the 2005 IEEE Power Engineering Society general meeting, vol 3, San Francisco, CA, USA, 12-16 Jun 2005, pp 2876-2881
[4] U.S. Department of Energy (2006) Benefits of demand response in electricity markets and recommendations for achieving them: a report to the United States Congress. Washington, DC, USA. Pursuant to Section 1252 of the Energy Policy Act of 2005
[5] Albadi MH, El-Saadany EF (2008) A summary of demand response in electricity markets. Elect Power Syst Res 78(11):1989-1996
[6] Saele H, Grande OS (2011) Demand response from household customers: experiences from a pilot study in Norway. IEEE Trans Smart Grid 2(1):102-109
[7] Mathieu JL, Price PN, Kiliccote S et al (2011) Quantifying changes in building electricity use, with application to demand response. IEEE Trans Smart Grid 2(3):507-518
[8] Huang D, Billinton R (2012) Effects of load sector demand side management applications in generating adequacy assessment. IEEE Trans Power Syst 27(1):335-343
[9] Logenthiran T, Srinivasan D, Shun TZ (2012) Demand side management in smart grid using heuristic optimization. IEEE Trans Smart Grid 3(3):1244-1252
[10] Nizar AH, Dong ZY, Zhang P (2008) Detection rules for non technical losses analysis in power utilities. In: Proceedings of the 2008 IEEE Power and Energy Society general meeting: conversion and delivery of electrical energy in the 21st century, Pittsburgh, PA, USA, 20-24 Jul 2008, 8 pp
[11] Nagi J, Yap KS, Tiong SK et al (2010) Nontechnical loss detection for metered customers in power utility using support vector machines. IEEE Trans Power Deliv 25(2):1162-1171
[12] Dos-Angelos EW, Saavedra OR, Cortés OAC et al (2011) Detection and identification of abnormalities in customer consumptions in power distribution systems. IEEE Trans Power Deliv 26(5):2436-2442
[13] Depuru SSSR, Wang LF, Devabhaktuni V (2012) Enhanced encoding technique for identifying abnormal energy usage pattern. In: Proceedings of the North American power symposium (NAPS'12), Champaign, IL, USA, 9-11 Sept 2012, 6 pp
[14] Willis HL, Schauer AE, Northcote-Green JED et al (1983) Forecasting distribution system loads using curve shape clustering. IEEE Trans Power Appl Syst 102(4):893-901
[15] Grigoras G, Cartina G, Bobric EC (2010) An improved fuzzy method for energy losses evaluation in distribution networks. In: Proceedings of the 15th IEEE Mediterranean electrotechnical conference (MELECON'10), Valletta, Malta, 25-28 Apr 2010, pp 131-135
[16] Zhou G, Zhao W, Lü XJ et al (2014) A novel load profiling method for detecting abnormalities of electricity customer. In: Proceedings of the 2014 IEEE Power and Energy Society general meeting, Washington, DC, USA, 27-31 Jul 2014, 5 pp
[17] Wijayasekara D, Linda O, Manic M et al (2014) Mining building energy management system data using fuzzy anomaly detection and linguistic descriptions. IEEE Trans Ind Inf 10(3):1829-1840
[18] Chen CS, Kang MS, Hwang JC et al (2000) Synthesis of power system load profiles by class load study. Elect Power Energy Syst 22(5):325-330
[19] Chicco G, Napoli R, Postolache P et al (2003) Customer characterization options for improving the tariff offer. IEEE Trans Power Syst 18(1):381-387
[20] Gerbec D, Gasperic S, Smon I et al (2005) Allocation of the load profiles to consumers using probabilistic neural networks. IEEE Trans Power Syst 20(2):548-555
[21] Espinoza M, Joye C, Belmans R et al (2005) Short-term load forecasting, profile identification, and customer segmentation: a methodology based on periodic time series. IEEE Trans Power Syst 20(3):1622-1630
[22] Nizar AH, Dong ZY, Zhao JH (2006) Load profiling and data mining techniques in electricity deregulated market. In: Proceedings of the 2006 IEEE Power Engineering Society general meeting, Montreal, Canada, 18-22 Jun 2006, 7 pp
[23] Chicco G, Napoli R, Piglione F (2006) Comparisons among clustering techniques for electricity customer classification. IEEE Trans Power Syst 21(2):933-940
[24] Verdu SV, Garcia MO, Senabre C et al (2006) Classification, filtering, and identification of electrical customer load patterns through the use of self-organizing maps. IEEE Trans Power Syst 21(4):1672-1682
[25] Tsekouras GJ, Kotoulas PB, Tsirekis CD et al (2008) A pattern recognition methodology for evaluation of load profiles and typical days of large electricity customers. Elect Power Syst Res 78(9):1494-1510
[26] Chicco G, Ilie I-S (2009) Support vector clustering of electrical load pattern data. IEEE Trans Power Syst 24(3):1619-1628
[27] Zhang T, Zhang G, Lu J et al (2012) A new index and classification approach for load pattern analysis of large electricity customers. IEEE Trans Power Syst 27(1):153-160
[28] Chicco G, Ionel O-M, Porumb R (2013) Electrical load pattern grouping based on centroid model with ant colony clustering. IEEE Trans Power Syst 28(2):1706-1715
[29] Jain A, Murty M, Flynn P (1999) Data clustering: a review. ACM Comput Surv 31(3):264-323
[30] Xu R, Wunsch D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645-678
[31] Patwary MMA, Palsetia D, Agarwal A et al (2012) A new scalable parallel DBSCAN algorithm using the disjoint-set data structure. In: Proceedings of the international conference on high performance computing, networking, storage and analysis (SC'12), Salt Lake City, UT, USA, 10-16 Nov 2012, 11 pp
[32] Mutanen A, Ruska M, Repo S et al (2011) Customer classification and load profiling method for distribution systems. IEEE Trans Power Deliv 26(3):1755-1763
[33] Stephen B, Mutanen AJ, Galloway S et al (2014) Enhanced load profiling for residential network customers. IEEE Trans Power Deliv 29(1):88-95
[34] Hsiao YH (2015) Household electricity demand forecast based on context information and user daily schedule analysis from meter data. IEEE Trans Ind Inf 11(1):33-43
[35] Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):1-58
[36] Breunig MM, Kriegel HP, Ng RT et al (2000) LOF: identifying density-based local outliers. ACM SIGMOD Rec 29(2):93-104
[37] Schubert E, Zimek A, Kriegel HP (2014) Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Min Knowl Discov 28(1):190-237
[38] Aggarwal CC (2013) Outlier analysis. Springer, New York, NY, USA
[39] Global energy forecasting competition 2012 - load forecasting: a hierarchical load forecasting problem: backcasting and forecasting hourly loads (in kW) for a US utility with 20 zones. Kaggle, San Francisco, CA, USA
[40] Hong T, Pinson P, Fan S (2014) Global energy forecasting competition 2012. Int J Forecast 30(2):357-363
[41] Sharma DD, Singh SN, Rajpurohit BS et al (2015) Critical load profile estimation for sizing of energy storage system. In: Proceedings of the 2015 IEEE Power and Energy Society general meeting, Denver, CO, USA, 26-30 Jul 2015, 5 pp


Desh Deepak SHARMA, S. N. SINGH, Jeremy LIN, Elham FORUZAN. Identification and characterization of irregular consumptions of load data, Journal of Modern Power Systems and Clean Energy, 2017, 465-477, DOI: 10.1007/s40565-017-0268-1