Benchmark for Network Sampling
implemented by me
This archive contains the code for running a network sampling benchmark. It is the companion material for my paper with Luca Rossi, “Benchmarking API Costs of Network Sampling Strategies”.
The library depends on the igraph Python package. Make sure you install it first: http://igraph.org/python/
To use the code, simply place the Python file into your working directory or in your path and import it. For instance, if you want to extract a directed sample using BFS exploration with a budget of 10000 seconds for an API giving 50 edges per page and one query every 5 seconds, this is the minimal code to do it:
import igraph
import network_sampling as ns

G = igraph.Graph.Read_Edgelist("path/to/network/file.ext", directed=True)
api_policy = ns.APIPolicy(edges_page_size=50, seconds_per_call=5)
api_engine = ns.APIEngine(G, api_policy)
G_smpl = ns.bfs(api_engine, seed=0, budget=10000, directed=True)
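To get a feel for what these numbers mean, the short calculation below works out an upper bound on the sample size. It is only a back-of-the-envelope sketch: it assumes every API call consumes exactly seconds_per_call of the budget and returns at most one full page of edges, which may not match how the library accounts for costs in every case.

```python
# Values taken from the example above.
budget_seconds = 10000
seconds_per_call = 5
edges_page_size = 50

# Assumption: each call costs exactly `seconds_per_call` seconds of budget.
max_calls = budget_seconds // seconds_per_call

# Assumption: each call returns at most one full page of edges.
max_edges = max_calls * edges_page_size

print(max_calls, max_edges)  # 2000 100000
```

Under these assumptions the sampler can issue at most 2000 queries and collect at most 100000 edges. Real samples will usually be smaller, since many nodes have fewer than 50 edges.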
You can easily define new API policies in your code.
The library is designed to be extended. You can add new attributes to the API policy object (e.g. to support querying node attributes) and implement new network sampling methods. To be a valid sampling method, your function must:
– Start the exploration from a given seed;
– End the exploration when no further move is possible given the budget;
– Use the request_neighbors function of the APIEngine object to explore the graph;
– Return the Graph sample as a directed igraph object if the “directed” flag was set to True.
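The requirements above can be sketched with a depth-first sampler. This is an illustration only: ToyEngine is a hypothetical stand-in written for this example, not the library's APIEngine, and the signature and return value of its request_neighbors method, as well as its cost model (one call per page of neighbors), are assumptions. Check the library source for the real interface before adapting it. To keep the sketch free of dependencies it returns a plain edge list rather than an igraph object.

```python
class ToyEngine:
    """Hypothetical stand-in for the library's APIEngine (NOT its real interface)."""

    def __init__(self, adjacency, edges_page_size, seconds_per_call):
        self.adjacency = adjacency            # dict: node -> list of out-neighbors
        self.edges_page_size = edges_page_size
        self.seconds_per_call = seconds_per_call

    def request_neighbors(self, node, budget):
        """Return (neighbors, remaining_budget), or (None, budget) if unaffordable.

        Assumed cost model: one call per page of neighbors, at least one call.
        """
        neighbors = self.adjacency.get(node, [])
        pages = max(1, -(-len(neighbors) // self.edges_page_size))  # ceiling division
        cost = pages * self.seconds_per_call
        if cost > budget:
            return None, budget
        return list(neighbors), budget - cost


def dfs_sample(engine, seed, budget, directed=True):
    """Depth-first sampling sketch that follows the four requirements above."""
    edges = []
    visited = {seed}
    stack = [seed]                            # start the exploration from the seed
    while stack:                              # ends when no further move is possible
        node = stack.pop()
        neighbors, remaining = engine.request_neighbors(node, budget)
        if neighbors is None:
            continue                          # too expensive; try the other nodes
        budget = remaining
        for neighbor in neighbors:
            edges.append((node, neighbor))
            if neighbor not in visited:
                visited.add(neighbor)
                stack.append(neighbor)
    if not directed:
        # Collapse (a, b) and (b, a) into a single undirected edge.
        edges = sorted({tuple(sorted(edge)) for edge in edges})
    # With python-igraph installed, the sample could be turned into a graph with
    # igraph.Graph.TupleList(edges, directed=directed).
    return edges


toy = ToyEngine({0: [1, 2], 1: [3], 2: [3], 3: []}, edges_page_size=50, seconds_per_call=5)
small_sample = dfs_sample(toy, seed=0, budget=10)    # budget covers two calls
full_sample = dfs_sample(toy, seed=0, budget=100)    # budget covers the whole graph
print(small_sample)
print(full_sample)
```

Skipping an unaffordable node instead of stopping at once lets the sampler spend leftover budget on cheaper nodes still on the stack, which is one way to read the "no further move is possible" rule.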
Sample networks containing the synthetic inputs used in the paper are available here for testing your methods.