import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
%matplotlib inline
import networkx as nx
Network Science - UDD
Network metrics
Cristian Candia-Castro Vallejos, Ph.D.
Yessica Herrera-Guzmán, Ph.D.
- [1] Data Science Institute (IDS), Universidad del Desarrollo,Chile
- [2] Northwestern Institute on Complex Systems, Kellogg School of Management, Northwestern Unviersity, USA
- [3] Center for Complex Network Research (CCNR), Northwestern Unviersity, USA
= "edgelist.txt" file_path
= nx.Graph() g
with open(file_path, "r") as file:
for line in file:
# Split each line to extract the node pairs
= line.strip().split()
nodes if len(nodes) == 2:
0], nodes[1]) g.add_edge(nodes[
# Fruchterman-Reingold force-directed layout algorithm ~ default
= plt.figure(figsize=(6,6))
fig =40)
nx.draw(g, node_size# if you need labels to check nodes:
# nx.draw(g, node_size=40, with_labels=True)
Three basic properties
= len(g)
N = g.size()
L = list(dict(g.degree()).values())
degrees = min(degrees)
kmin = max(degrees) kmax
print("Number of nodes: ", N)
print("Number of links: ", L)
print('-------')
print("Average degree: ", 2*L/N) #Formula vista en clases (qué sucedía con las redes reales?)
print("Average degree (alternative estimation)", np.mean(degrees))
print('-------')
print("Minimum degree: ", kmin)
print("Maximum degree: ", kmax)
Number of nodes: 197
Number of links: 1651
-------
Average degree: 16.761421319796955
Average degree (alternative estimation) 16.761421319796955
-------
Minimum degree: 1
Maximum degree: 43
Degree distribution
= np.logspace(np.log10(kmin), np.log10(kmax), num=20)
bin_edges
# remember to make the histogram!
= np.histogram(degrees, bins=bin_edges, density=True) density, _
= plt.figure(figsize=(4,4))
fig
= np.log10(bin_edges) # log scale
log_be = 10**((log_be[1:] + log_be[:-1])/2) # (X_i+X_(i+1))/2
x
='o', linestyle='none')
plt.loglog(x, density, marker# don't forget axis labels and right size!
r"degree $k$", fontsize=16)
plt.xlabel(r"$P(k)$", fontsize=16)
plt.ylabel(
plt.show()
# Check binning
= plt.figure(figsize=(4,4))
fig
=bin_edges[1:] - bin_edges[:-1]
a
='blue', lw=2)
plt.plot(a, color
plt.show()
Path length
nx.average_shortest_path_length(g)
NetworkXError: Graph is not connected.
Oh! Grapht is not connected!
How do we measure the average path length, then?
# Explore number of connected components
= list(nx.connected_components(g)) components
= max(components, key=len) largest_component
# This is a subgraph of the largest connected component
= g.subgraph(largest_component) lc_graph
nx.average_shortest_path_length(lc_graph)
1.696969696969697
nx.diameter(lc_graph)
2
Quick quizz:
How are the average shorthest path and diameter related?
What do they measure drom the network structure?
How would the diameter change for a random network of N nodes, when changed to a tree-like structure?
Compute the size of the largest connected component: number of nodes and edges respect to the entire network.
Clustering coefficient
= nx.clustering(g) clustering
= list(clustering.values()) clustering_values
=20, alpha=0.7)
plt.hist(clustering_values, bins"Clustering Coefficient")
plt.xlabel("Frequency")
plt.ylabel("Clustering Coefficient Distribution")
plt.title(True)
plt.grid(
# Show the histogram
plt.show()
Note that nx.clustering(g) provides the clustering for each node. So you can explore what is the clustering coefficient of specific nodes, for example:
= '87'
node_id = nx.clustering(g, node_id)
cc_node print(f"Clustering coefficient for Node {node_id}: {cc_node}")
Clustering coefficient for Node 87: 0.16666666666666666
Now we know the distribution of the clustering coefficient, let’s compute the average clustering coefficient:
= nx.average_clustering(g)
average_cc print(f"The average clustering coefficient of this network is {average_cc}")
The average clustering coefficient of this network is 0.1682917533972737
Lastly, let’s compute network density:
= nx.density(g)
density print(f"The density of this network is {density}")
The density of this network is 0.08551745571324977
Quick quizz:
What is the meaning of a high clustering coefficient?
What is the meaning of a low clustering coefficient?
How is the density of a network related to clustering coefficient?
How does the average clustering coefficient varies by each network model: random, BA, and WS?
What insights could you obtain about a network by only knowing that its average clustering coefficient is 0.46 and it has a 30% of random edges?
Lastly, describe what is the usefulness of the three network metrics for network analysis.
Network properties of different networks
Erdos-Renyi random graph
= 200 num_nodes
# Probability of edge creation (adjust and have fun learning!)
= 0.2 probability
= nx.erdos_renyi_graph(num_nodes, probability) er
= plt.figure(figsize=(6,6))
fig =40) nx.draw(er, node_size
Watts-Strogatz small-world graph
= 200 num_nodes
# Number of nearest neighbors to connect
= 4 k
# Probability of rewiring each edge
= 0.1 p
= nx.watts_strogatz_graph(num_nodes, k, p) ws
= plt.figure(figsize=(6,6))
fig =40) nx.draw(ws, node_size
Barabási-Albert graph
= 200 num_nodes
# Number of edges to attach from a new node to existing nodes ~ Hubs!
= 3 m
= nx.barabasi_albert_graph(num_nodes, m) ab
= plt.figure(figsize=(6,6))
fig =40) nx.draw(ab, node_size
Moving on:
Compute the network properties studied here for each of these networks.
Describe whether the three central metrics can be characteristic of each model.
Discuss the limitations of the metrics and what other metrics could be computed to understand the network and nodes’ relationships in more detail.