Labeling Communities Using Structural Properties

June 1, 2017 | Autor: M. Saravanan | Categoria: Machine Learning, Graph Theory, Group Behavior, Consumer Behavior, Mobile Communication, Instant Messaging, Social Web, Structural Properties, Social Network Analysis, University Student, Social Network, Instant Messaging, Social Web, Structural Properties, Social Network Analysis, University Student, Social Network

Share Embed

Denunciar este link

Descrição do Produto

2010 International Conference on Advances in Social Networks Analysis and Mining

LABELING COMMUNITIES USING STRUCTURAL PROPERTIES Karishma Surana and D. Suganthi

M. Saravanan and G.Prasad

Ericsson India Private Limited Ericsson R &D Chennai, India {[email protected], [email protected]}

Dept. of Information Technology Sri Venkateswara College of Engineering Chennai, India {[email protected], [email protected]}

millions of nodes and a few billion edges, our intention is to split the network into smaller groups in order of magnitude, but still it facilitates an improved performance on targeted advertisements. Graph mining and social network analysis together could be used to identify closely knit groups in social networks [2]. In this paper, we propose an interesting approach for combining the applications of both SNA and graph theories for breaking the community into smaller sub-units with further analysis in order to improve group targeting and retention.

Abstract—Mobile Social Network Analysis is the mapping and measuring of interactions and flows between people, groups, and organizations based on the usage of their mobile communication services. Social Network Analysis and Mining has been highly influenced by the online social web sites, telecom consumer data and instant messaging systems, and has widely analyzed the presence of dense communities using graph theory and machine learning techniques. Community mining is one of the recent major directions in social network analysis. In this paper we find the communities in the network based on a modularity factor. Then we propose a graph theory based algorithm for further split of communities resulting in smaller sized and closely knit sub-units, to understand consumer behavior in a comprehensive manner. These sub-units are then analyzed and labeled based on their group behavior pattern. In this paper we measure and analyze the uniqueness of the structural properties for each small unit, it is another quick way to assign suitable labels for each distinct group. The effectiveness of the employed algorithms was evaluated on a huge telecom database in three different stages of our work.

The main contribution of our research is the use of mobile phone call graphs (networks) obtained from the Call Detail Records (CDRs) for testing and evaluating community detection algorithms at a finer level and identifies the structural properties of the communities. It helps the operators to offer the right incentives to the groups, adopt the right marketing strategies, and use the resources appropriately. We believe the call graphs behave similar to other social networks such as those of email and expect that our results may be used in a more general setting. Graph theoretic information from subgraphs can allow service providers to better understand the underlying behavior of users in order to design incentives to increase subscriber satisfaction level and prevent/reduce churn.

Keywords—Mobile Social Network Analysis, Community Identification, Articulation Point, Structural Property, Usage and Spend Behavior

I. INTRODUCTION

The proposed approach is sub-divided into three major sequences with two additional steps of finding consumer behavior based on mobile usage and identifying the unique structural properties of the detached groups. Few experiments on telecom networks have been undertaken to determine the network parameters like degree distributions, connectivity’s and cliques [5, 6] and to discover and characterize a broad set of structural properties of networks [7]. However, to the best of our knowledge, this is first attempt to label the sub-units with distinct group name related to characteristics of telecom subscribers based on the structural properties of the sub-units. We have used the dataset from one of the largest telecom operators in the world. In this study, we report the findings on various structural properties of these massive telecom subgraphs.

A social network is a group of people who are connected through their use of communication services. The effectiveness of the social network analysis is concerned with the analysis of the influence of members in a community within a social network on a product purchase and service usage [1]. Mobile Social Network analysis (MSNA) is an upcoming and interesting area of concern in telecom industries since it not only helps in exploring the information regarding the social network of subscribers but also helps the operators to focus on their business analytics. MSNA is being used to give a solution to some of the telecom problems such as to improve churn prediction, expedite the campaigns for up-selling and cross-selling activities, and overall for customer satisfaction and retention [2]. The structure of customer communication networks provides a natural way to understand customers’ relationships and to a certain extent the behavior of groups of highly connected customers. There are many related works which have been presented in Social Network Analysis (SNA) like finding of communities, alpha users within the communities, hub member and non-hub members’ identification [3, 4]. Faced initially with a mobile social network comprising hundreds of

978-0-7695-4138-9/10 $26.00 © 2010 IEEE DOI 10.1109/ASONAM.2010.49

Our first and foremost task is to find out the communities from a given huge telecom network using modularity as factor of initial splitting. The modularity of a partition is a scalar value between -1 and +1 that measures the density of links inside communities as compared to links between communities [8]. Hence we applied a community identification algorithm to segment the social network of a particular time slot into various communities and then analyze the overall 217

behavior of the highly connected component. Secondly, with the help of graph theory algorithm, we split the evolved community into small closely knitted sub-units. This is an essential step because the community identification algorithm does not produce smaller densely connected communities to understand their demographic segments. In order to understand consumer behavior comprehensively we need to take small dense communities instead of huge communities with sparse connections. It is also a basic logic that closely knit customers are of the same demographic segments. In the final step, we analyze the behavior pattern of the groups formed and also find the structural properties of the sub-units. We also believe that the structural properties of the sub-units would be an indicative of a particular characteristic group of customers [7]. We conduct additional experiments to highlight our thoughts to reveal the finer structure of the sub-units. The comprehensive flow of the execution of our research work is given below:

Figure 2 Flow chart for identifying the demographics of the group

Fig. 2 illustrates few rules in the form of flow chart in order to give a better understanding of the classification.

Step 1: Community Identification Communities are formed by considering the gain in modularity when two or more communities, which are initially nodes, merge. This is a recursive process which is continually applied on the network till further merging results in no more gain in modularity.

Step 4: Finding the structural properties of the group The structural properties which would be indicative of specific behavior are measured for each group Step 5: Mapping structural properties to usage behavior As proposed earlier the structural findings and the usage behavior are mapped from the groups formed in the training sets. This can be further made automated for identifying groups of customers using the structural properties itself instead of analyzing the usage behavior to simplify the process. In the rest of this paper we motivate our approach, describe the methods selected for community detection, applying graph theoretic algorithms to split the evolved community into smaller subgraphs and present results on applying our approach to a large real-world data in the telecommunication industry, and finally discuss our findings.

Figure 1 Shows connectivity of a particular community

II. RELATED WORK

Step 2: Finding sub-units in a community Applying the graph theory approach of splitting on the communities already detected using community identification algorithms, helps us to further get closely connected components as seen in Fig 1. The figure shows a community which could be split in to four sub groups. Finding out the articulation point in a network is one of the graph theoretical ways of splitting network. The articulation point forms the demarcation point where the network is split into groups to eliminate the weakly linked groups. Step 3: Finding usage behavior of the groups Finding common usage behavior predominant in that group

which

Identifying community structure is an important issue in network science and has attracted the attention of researchers in many fields [9]. Social Network analysis for telecom network is not much explored from a structural point of view. Community detection involves finding nodes that are closely and sparsely knit. The closely knit nodes fall in a single community with more similar characteristics. There are several interesting techniques of community detection available to segment the total network. One such algorithm is the CNM algorithm [10] that helps in finding the communities using a hierarchical agglomeration algorithm for detecting community structure which is faster than many competing algorithms. Its running time on a network with n vertices and m edges is O (m d log n) where d is the depth of the dendrogram describing the community structure. This fast community identification algorithms optimizing the modularity of the partition in a greedy way [11], a method that even improved, does not allow to analyze more than a few millions nodes [12].

is

218

the communities based on the measurement of structural properties.

Recently another new algorithm has been proposed to find communities based on modularity factor where the modularity is found and then the communities are further split using simulated annealing [12]. Simulated annealing(SA) is a stochastic optimization technique that enables one to find ’low cost’ configurations without getting trapped in ’high cost’ local minima. This method is more ‘transparent’ than those relying on heuristic procedures. Furthermore, SA enables to carry out an exhaustive search and to minimize the problem of finding sub-optimal partitions. In this method one does not need to specify a priori of the number of modules; rather, the number of modules is an outcome of the algorithm. Moreover, it is claimed that this algorithm works extremely fast, which allows to analyze systems of unprecedented sizes [12]. Another new approach based on self-organizing map has been proposed for community detection in both weighted and bipartite networks [13]. In the recent times, there is a lot of interest in studying the community structure in graphs for obtaining finer details. Most of the studies address properties of the graph including its size, density, distance, in and out degree distributions, small-world phenomenon, clustering coefficient, connectivity status, etc. Email graphs and online interactions have been studied in the context of explaining and analyzing friendship and demonstrating the small-world and navigability properties of these graphs [14, 15]. In telecom domain, Abello et. al. [5] first experimented with the call graph of landline phones made on 1-day consisting of approximately 53 million nodes and 170 million edges. The generated call graph showed that the degree sequence was not quite a perfect power law and hence they introduced a unique class of random graphs with a power law degree consequence to capture the distribution. The TreasureHunt model [7] is an attempt to study a broad set of parameters that reveal various structural properties of mobile call graphs. It determines how the structure of these call graphs evolve over time. Uncovering such community structure not only helps in understanding the topological structure of large-scale networks, but also helps revealing the functionality of each component [9]. In this paper, we further explore the structural properties and try to label the community based on the uniqueness. In terms of group labeling, there are many works presented where the roles of each module/community have been defined using their structural properties [2]. It hypothesizes that the role of a node can be determined, to a great extent, by its within-module degree and its participation coefficient, which define how the node is positioned in its own module and with respect to other modules. It finds various hub roles and also enables one to identify more ‘subtle’ roles, such as non-hub connectors, which play important structural roles in spite of their small number of connections.

III. COMMUNITY IDENTIFICATION Areas of current interest to the scientific community can usefully be represented as networks. Examples include the Internet and the world-wide web, social networks, citation networks, telecom networks, food webs, and biochemical networks, etc. Each of these networks consists of a set of nodes or vertices representing, for instance, computers or routers on the Internet or people in a social network, connected together by links or edges, representing data connections between computers, friendships between people, and so forth. One network feature that has been emphasized in recent work is community structure, the gathering of vertices into groups such that there is a higher density of edges within groups than between them. The problem of detecting such communities within networks has been studied and led to publishing of many algorithms, within which CNM and fast unfolding algorithm are being popularly used. A. Algorithms There are many community detection algorithms produced based on modularity factor. Two major algorithms which we would implement are CNM and Fast Unfolding algorithm in our community identification process. In CNM algorithm [11], first each node is taken as one community and adjacency matrix is drawn for that particular telecom network. If Avw is the adjacency matrix and (cv,cw) indicates the connectivity is equal to 1 if the vertices are connected else 0, i.e,

is the number of edges in the graph and m= vertex v belongs to community cv. then the fraction of the edges making one community can be given as

(1) This value is directly proportional to the probability of two edges merging in one community. But the value should be taken care of since if equal to 1 this implies that all the nodes belong to same community. The main parameter which is taken into consideration while splitting is the modularity. If the degree kv of a vertex v is defined to be the number of edges incident upon it, the modularity (Q) is defined by,

The difficulty in obtaining temporal information about every node/edge in an evolving real-world graph [13] makes us to take snapshots of the graph at various points in time and use these snapshots to make inferences about the evolutionary process. Using this approach, an extensive study has been conducted to understand the structural properties of different snapshots of the world-wide web graph in [15]. In our paper, the same approach is used in telecom domain to study the call graphs for different time slots. In addition to this, we also label

(2) Thus while merging each time, the rows and columns of the corresponding communities are blended. Alternate way for this

219

implementing CNM (un-weighted) algorithm was 2918, with an average size of 21 members in each community and modularity of 0.6368 and the time taken was approximately 7 minutes. But on the other hand, Fast Unfolding algorithm generated 2339 communities on same number of users having average size of 26 and modularity of 0.8073, which is higher than that of CNM algorithm but the time taken, was lesser and measured in seconds. If we take weighted graph into consideration, taking the cost of call as the weight then the number of communities formed was same as un-weighted and so the average size but the modularity differed widely. From the above statistics, we chose to implement fast unfolding algorithm in our work for community detection from telecom networks.

the paper [10] suggests is storing the difference in values of the modularity that would result from the conglomeration of communities, choosing the largest of them, and performing the corresponding amalgamation. Thus resulting in lesser memory storage and computation cost. Another algorithm is the fast unfolding of large communities [12]. Fast unfolding algorithm is an alternate way of finding communities using modularity. The community formation algorithm is divided into two phases. In the first phase, each node has its individual community. Then the modularity is found with all its neighbors and the change in modularity value is evaluated. If there is a positive gain, the communities of nodes are merged into one. This procedure is applied to all nodes in the network. This first phase stops when a local maxima of the modularity is attained, i.e. when no individual move can improve the modularity. Part of the algorithm’s efficiency results from the fact that the gain in modularity ΔQ obtained by moving an isolated node i into a community (C) can be easily calculated. If

TABLE I STATISTICAL INFORMATION WHILE IMPLEMENTING TWO ALGORITHMS Algori thm

No. of users

No. of commu nities

CNM

61316

2918

Fast Unfold ing (U.W)

61316

Fast unfoldi ng (W)

61316

is the sum of

the weights of the links inside C, is the sum of the weights of the links incident to nodes in C, ki is the sum of the weights of the links incident to node i, ki, in is the sum of the weights of the links from i to nodes in C and m is the sum of the weights of all the links in the network then, ∆Q is calculated as,

Max Size

Min Size

Avg. Size

Time Taken (min.s ec)

Modularity Score

12647

2

21.01

0:7:41

0.6368

2339

2055

2

26.0

0:0:7

0.8073

2339

7788

2

26.0

0:0:1

0.6514

IV. STRUCTURAL PROPERTIES OF A COMMUNITY

(3)

Structural properties of a network play a vital role as they may be indicative of a particular characteristic behavior. Thus we take a community and find out the properties of that particular community. In this paper, we have taken some of the measures which are indicative of the main characteristics of a community [2]. We take into account the centrality measures of each node and generalize it to the entire community.

In second phase, each community is taken as node and the process is repeated, until all the weights are summed. Links between nodes of the same community lead to self-loops for this community in the new network. Once this second phase is completed, it is then possible to reapply the first phase of the algorithm to the resulting weighted network and to iterate. These two phases are iteratively performed unless stabilized value is reached.

A. Implementation

B. Data

We found the structural properties of the community of size 3099 nodes. Fig. 3 shows the Degree Distribution of the community which follows the power law distribution. The community has a maximum degree 22 and minimum of 1. Fig. 4 plots the z-score of each node; two nodes with the same z-score will probably play different roles if one of them is connected to several nodes in other modules while the other is not. It is found that many nodes have z-score greater than 2.5, which means, there are many hub-nodes in the given community. Splitting of the community would be a right choice. Fig. 5 shows the betweenness centrality of the same community. The community has 217 nodes having the

The network approximation procedure is applied on a node level to a telecom network of approximately 1.2 million nodes and 10 million edges, with few constraints. Firstly, the self loops should be removed. Secondly, the connectivity between the nodes represents the voice connectivity only. Thirdly, the foreign network nodes should be eliminated. C. Implementation The two algorithms discussed were implemented; the analytical information has been tabulated in Table I. The table gives insight to the number of communities generated for given number of nodes along with maximum, minimum and average size of the communities formed. As it can be seen, for a sample of 61316 users, the number of communities formed while

220

values in the range 1,00,000- 5,00,000 and 14 nodes with the betweenness value ranging between 6,00,000- 10,00,000. Hence many nodes act as a bridge between groups, and the removal of these nodes would lead in the formation of many small groups. Similarly Fig. 6 shows the closeness centrality. There are 2388 nodes having closeness centrality 6722197221, and hence showing that they are sparsely connected to each other. This statistics reaches to the conclusion that splitting of the community would lead to many closely knit groups.

Figure 6 Closeness centrality, with the node id as x-axis and closeness centraity as y-axis

V. SPLITTING COMMUNITIES THROUGH GRAPH THEORY APPROCH A Social network could also be viewed as a graph. In a network graph, each node is referred to as vertex and the link between two nodes is referred to as an edge. The communities could be taken as a graph and further split into smaller groups or sub units using some graph theory approach. Applying graph theory approach of splitting, on the communities which are detected using community identification algorithms, helps us to further get closely connected components. There are different approaches used for splitting the graph into sub-graphs. One of the approaches is to find the articulation point [6] and split the graph based on the articulation point so that the community is further broken down to sub groups. We go on in finding out the articulation point since articulation points in a network are those which are critical to communication: for an articulation point, all paths between certain nodes have to pass through this point. This point in the graph, on its removal causes increase in number of bi-connected components. This algorithm uses depth first search to find out the cut vertices. We use depth first algorithm for traversing since, DFS takes less memory, therefore is less likely to fail and it explores down the tree first which is required for finding out the articulation point .One more advantage is that the running time of the algorithm is O(|E|),where E is number of edges.

Figure 3 A log based graph of degree distribution of the community with value of degree centrality as x-axis and number of nodes as y-axis.It follows a power tail.The given community had highest of 22 links

Figure 4 z score of the community with z-score of a node as x-axis and f(x) as the y-axis.

A.

Articulation point graph splitting In a graph G = (V, E), v is an articulation point if: •

removal of v in G results in a disconnected graph

•

If v is an articulation point, then there exist distinct vertices w and x such that v is in every path from w to x.

Finding articulation points can be done by using DepthFirst Search. In a DFS tree of an undirected graph, a node ‘u’ is an articulation point if, for every child ‘v’ of ‘u’, there is no back edge from ‘v’ to a node higher in the DFS tree than ‘u’. That is, every node in the decedent tree of ‘u’ has no way to

Figure 5 Betweenness centrality with the node id as x-axis and betweeness value on y-axis.

221

visit other nodes in the graph without passing through the node ‘u’, which is the articulation point. Thus, for each node in DFS traversal, we calculate dfsnum(v) and LOW(v). The definition of dfsnum(v) is indicative of whether node is visited or not, LOW(v) is the lowest dfsnum of any vertex that is either in the DFS sub-tree rooted at v or connected to a vertex in that sub-tree by a back edge. Then, in DFS, if there are no more nodes to visit, we back up and update the values of LOW as we return from each recursive call. The node ‘x’ indicates node(s) that is (are) connected to ‘v’.

(a)

Figure 7 (a) Shows the community before applying articulation point (b) Shows the two large groups formed by removing the edges between the articulation point

VI. COMMUNITY MINING BASED ON CONSUMER BEHAVIOR PATTERN

Input: A mapped graph, G = (V, E) Output: Cut vertex components

(articulation

point),

bi-connected

Community mining is finding pattern in a community. Getting knowledge about pattern would give various kinds of information about what kind of nodes are present in a particular community, their behavioral characteristics which in turn help in business analytics. The study of groups helps firms and organizations improve their marketing strategies by understanding issues such as how marketers can adapt and improve their marketing campaigns and marketing strategies to more effectively reach the consumer, the psychology of how the group is influenced by other groups.

Procedure: 1. 2.

3.

4.

5.

Initialize the value of dfsnum as -1 indicating node is not yet been discovered. For all the edges (v, x), if the dfsnum of x is -1 then increment the dfsnum of v and also numChildren of v. Push the edge (v, x) to the stack and perform this recursively. The low value is calculated for v as min(v.low, x.low). The definition of LOW(v) is the lowest dfsnum of any vertex that is either in the DFS sub tree rooted at v or connected to a vertex in that sub tree by a back edge. If the articulation point is the root and there are more than 2 children, retrieve all edges in a bi-connected component.

A. Behavior pattern There are various parameters that can be taken into consideration for finding the behavior pattern of a particular group. The classification can be based on usage pattern, spent pattern or a location-wise discrimination.

If the x.low is lesser than v.dfsnum then v is the articulation point separating x and retrieves all the biconnected edges.

6.

If x.dfslevel < v.dfslevel-1 then x is at lower level than of v’s parent, equivalent to (v, x) is a back edge.

7.

The low value of v is the minimum of v.low and x.dfsnum.

(b)

According to usage pattern, the three broad denominations like youth, corporate and homebound are considered in our work. Group of youth can be portrayed as having a high frequency of SMS throughout the day, along with call frequency and usage high in the evening and having good level of reciprocity in messaging services as well as voice service. Group of corporate customers can be found having comparatively less SMS communications with Call frequency high during office hours. Group of homebound customers are characterized as having call duration more in the morning and evening, with least frequency of SMS.

B. Results For the purpose of evaluating the algorithm, a community of well connected customers was taken. This community consisted of 1015 vertices with 2193 edges. After applying articulation point algorithm 39 groups were formed with minimum of 4 edges and maximum of 1571edges. There are other communities which split into two major subgroups. Fig.7 (a) shows a community before applying the articulation point graph splitting. The algorithm when implemented splits the community into two groups, which is shown in Fig 7 (b).

222

B. Demonstration of structural characteristics for labeling Table III tabulates the various structural properties of the subgroups which have been labeled earlier using their behavior pattern as already discussed in section VI. From the table we infer that the corporate group has a higher participation coefficient than the homebound, implying that the corporate groups interact to the outside world proportionately. Homebound customers reciprocate in voice calls more than teenagers. Degree centrality of home bound is more. Closeness centrality values of teenagers are approximately between 0.10.2, which is lower than homebound which in turn ranges between 0.4-0.5. Homebound customers are closely knit to each other in the group. There is more number of influential users in teenagers than in corporate groups by seeing betweenness values. The corporate group has the highest zscore, ascertaining that nodes in the group have a higher in-out degree.

Figure 8 Shows the usage behavior of the group in different time slots the red bar denotes the duration of call, blue bar indicates the voice frequency, green bar indicates SMS count and yellow denotes GPRS frequency.

TABLE II RULES FOR IDENTIFYING DEMOGRAPHICS AND LABELING

In order to get a practical feel, we have listed few rules that can be used to label a group to one of the three categories (Corporate, Homebound, Teenagers). Fig. 8 gives the time slots on the basis of which the groups are labeled. Table II lists few rules for finding out the demographics using the usage pattern. For example, if the group makes a less frequent but long duration calls at night, with a high usage of messaging service are labeled as YOUTH.

If Voice usage >30 Voice frequency >=1 Messaging Service>10 at night then label=”YOUTH” If Voice usage >20 Voice frequency>10 Messaging Service>40 at evenings then label=”YOUTH” If Voice usage >30 at night Voice frequency30 at office hours then label=”YOUTH”

If Voice usage>20 in evening Voice frequency=15 in office Messaging Service=15 in evening then label=”CORPORATE”

223

TABLE III SHOWING THE STRUCTURAL PROPERTIES AND CLASSIFICATION Id 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

No of edges 30 83 29 44 21 20 15 50 3046 13 24 16 1747 43 19

Participation coefficient 0.9979 0.9269 0.8995 0.8928 0.0712 0.7037 0.9877 0.9960 0.9942 0.1346 0.8687 0.8892 0.9752 0.9937 0.9886

Z score 6.56E-07 2.32E-06 5.96E-07 8.64E-07 1.19E-06 -3.43E-07 1.79E-07 -6.85E-07 7.24E-04 5.36E-07 -5.36E-07 -4.77E-07 1.27E-04 5.96E-07 -2.38E-07

Degree centrality 0.0736 0.1042 0.2821 0.0766 0.3333 0.3611 0.3571 0.0839 0.0011 0.2444 0.1978 0.1970 0.0051 0.0847 0.2545

Closeness centrality 0.2497 0.3498 0.4589 0.2413 0.5256 0.5270 0.5179 0.2442 0.0617 0.4472 0.3836 0.3860 0.1548 0.2226 0.4845

Betweenness centrality 0.1151 0.0596 0.1189 0.1112 0.1194 0.1429 0.1726 0.1149 0.0075 0.1667 0.1474 0.1727 0.0077 0.1393 0.1293

Clustering coefficient 0.0679 0.3360 0.3410 0.2182 0.4133 0.4333 0.3917 0.1664 0.0968 0.0767 0.1500 0.1750 0.1546 0.1869 0.3203

Reciprocity

Class label

0.3158 0.5060 0.4828 0.2727 0.5714 0.7000 0.6667 0.4400 0.3237 0.3077 0.5000 0.3750 0.3858 0.5116 0.5263

Corporate Corporate Corporate Corporate Home bound Home bound Home bound Homebound Teenager/Homebound Teenager Teenagers Teenager Teenagers Teenager Others

VII. CONCLUSION [6]

From the above implementation, we came up finding out one of the methodology for splitting of communities. We also found the behavior of these groups along with the structural properties. Few structural properties were unique to each group. For example, the reciprocity was high in youth group than the other groups in terms of Message service. Our approach is empirically driven and is yet to be shown to produce consistent output analytically. We leave further investigation into the topological structure and its impact on the consistency as future work.

[9]

REFERENCES

[10]

[1]

[2]

[3] [4]

[5]

[7]

[8]

Shaun Doyle, “Social network analysis in the Telco sector — Marketing applications” Journal of Database Marketing & Customer Strategy Management (2008) 15, 130 – 134. doi: 10.1057/dbm.2008.8 Lei Tang and Huan Liu, “Graph Mining Applications to Social Network Analysis”, Managing and Mining Graph Data, Springer US, PP 487-513, 2010 Kleinberg, J. “Authoritative sources in a hyperlinkedenvironment,” Journal of the ACM (JACM), vol. 46, no. 5, pp.604–632, 1999. Sen Wu; Lingyu Wu; Haiyan Jin; and Shujuan Gu; Customer Ranking Authority-Hub Algorithm for Mobile Communications in China, Fourth International Conference on Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. pp. 559 - 563 Abello, J., Pardalos, P., and Resende, M. “On maximum clique problems in very large graphs”. In DIMACS Series, 50, American Mathematical Society (1999), pp. 119-130.

[11]

[12]

[13]

[14] [15]

224

William Aiello, Fan chung, L.L. “A random graph model for massive graphs”, In Proceedings of the thirty-second annual ACM symposium on Theory of computing (2000), pp. 171-180. Amit A. Nanavati, Rahul Singh, Dipanjan Chakraborty, Koustuv Dasgupta (Member, IEEE), Sougata Mukherjea, Gautam Das, Siva Gurumurthy, Anupam Joshi (Senior Member, IEEE). “Analyzing the Structure and Evolution of Massive Telecom Graphs”. In IEEE Transactions on Knowledge and Data Engineering, 2007. Michelle Girvan and M. E. J. Newman, "Community structure in social and biological networks," in Proceedings of the National Academy of Sciences (USA) 99 (2002), pp. 7821--7826. Li, Z., Wang, R.S., and Chen, L. “Extracting community structure of complex networks by self-organizing maps”, in Proceedings of the third international symposium on optimization and systems biology (OSB’09), pp.48-56, September 20-22, 2009. Clauset, A., Newman, M.E.J., and Moore, C. “Finding community structure in very large networks”. Phys. Rev. E 70, 066111 (2004) M. E. J. Newman "Modularity and community structure in networks", physics/0602124 = Proceedings of the National Academy of Sciences (USA) 103 (2006): 87577--8582 Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre, “Fast unfolding of communities in large networks” , J. Stat. Mech. (2008), pp.1-12 Wakita, K., and Tsurumi, T. “Finding community structure in a megascale social networking service”. in Proceedings of IADIS international conference on WWW/Internet 2007, pp.153-162 Kumar, R., Novak, J., Raghavan, P., and Tomkins, A. “ Structure and evolution of blogspace”, CACM 47 (12), pp.35-39, 2004. Dodds, P.S., Muhumad, R., and Watts, D.J. “An experimental study of search in global social networks”. Science, 301, pp.827-829, 2003.

Lihat lebih banyak...

Labeling Communities Using Structural Properties

Descrição do Produto

Comentários