Multiplex Link Prediction (MAGMA)
This page now hosts the data and code for the newer version of our multiplex link predictor. For the old MAGMA framework, scroll down.
This is the data and code to reproduce the results of the paper “Fast Multiplex Graph Association Rules for Link Prediction”.
Simply run all the Python scripts in order to generate all the data underlying the figures, tables, and statistics included in the paper. No script requires setting any parameters. We also provide *.gp files that, if run via Gnuplot, will reconstruct the figures as they appear in the paper.
We tested the code on a conda 22.9.0 environment running Python 3.10.6, with 32 CPUs and 192GB of memory (although every dataset but Pardus requires at most 32GB). You will need to install the following dependencies:
- numpy 1.23.4
- scipy 1.9.3
- scikit-learn 1.1.2
- pandas 1.5.1
- graph_tool 2.45
- orange3 3.33.0
- networkx 2.8.7
- tensorflow 2.10.0
- gensim 4.2.0
- simanneal 0.5.0
Everything is available and installable via conda-forge (see the example environment setup after this list). You’re also going to need:
- Infomap version 1.9.0, available at https://github.com/mapequation/infomap/archive/refs/tags/v1.9.0.zip. I’m including my binary, but it likely won’t work on your system unless you can spin up a virtual machine compatible with mine (Intel Xeon W-11955M running Ubuntu 22.04, gcc version 11.3.0).
- A working Java installation. Everything has been tested on openjdk version 11.0.16.
- CUDA drivers version 11.8.
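For instance, assuming a fresh environment (note that graph_tool is published on conda-forge as “graph-tool”; adjust names if your channel differs), something along these lines should set up the Python side:
conda create -n magma -c conda-forge python=3.10.6 numpy=1.23.4 scipy=1.9.3 scikit-learn=1.1.2 pandas=1.5.1 graph-tool=2.45 orange3=3.33.0 networkx=2.8.7 tensorflow=2.10.0 gensim=4.2.0 simanneal=0.5.0
conda activate magma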
The package contains the following:
- “data” folder: this is all the data we used for the paper. Each dataset has its own folder. For the aarhus, phys, and celegans datasets, the folder contains one file group per fold used in the k-fold validation. The fold is identified by the number in the file name. Each file group includes the following (a minimal loading sketch follows this list):
- [network_name][fold_id]: the tab-separated edge list of the training set, one edge per line, three columns: source node, target node, layer. Note that the aarhus network is undirected.
- [network_name][fold_id]_nodeattrs: the information about the node labels, one node per line in two tab-separated columns: node id and node label.
- [network_name][fold_id].gspan: the training set in gspan format, starting with the node list (“v nodeid nodelabel”) followed by the edge list (“e src_node_id trg_node_id layer”).
- [network_name][fold_id]_test: the tab-separated edge list of the test set, one edge per line, four columns: source node, target node, layer, fold. Note that the aarhus network is undirected.
- There are two versions of each fold: if the network name contains “_sl”, the files refer to a single-layer projection of the network.
- The Aarhus network contains the additional “*_complete” files, which contain the complete version of the network, without a training-test split.
- The Pardus network contains the same types of files, but since we perform a temporal split rather than a k-fold cross validation, there is a single training file set.
- The Pardus network test set is also split in “oldold” and “oldnew” versions to distinguish the test sets used for these two different tasks — see paper for more details.
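As a minimal sketch of reading one fold with pandas (the exact file names below, e.g. “aarhus0”, are an assumption following the naming scheme above):
import pandas as pd
# training edges: source node, target node, layer (tab-separated)
train = pd.read_csv("data/aarhus/aarhus0", sep="\t", names=["src", "trg", "layer"])
# test edges carry a fourth column with the fold id
test = pd.read_csv("data/aarhus/aarhus0_test", sep="\t", names=["src", "trg", "layer", "fold"])
# node attributes: node id and node label
attrs = pd.read_csv("data/aarhus/aarhus0_nodeattrs", sep="\t", names=["node", "label"])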
- “libraries” folder: this is a set of code files needed to run our experiments. It includes:
- “debacco” subfolder: the code necessary to run the debacco predictor. Code from https://github.com/cdebacco/MultiTensor. Since we use an old version that we patched to work with Python 3, and we have not tested the new version, we repackage it here for ease of reproducibility**.
- “MELL” subfolder: the code necessary to run the MELL predictor. Code from https://github.com/ryutamatsuno/MELL, included for ease of reproducibility**.
- “infomap” binary: the code to run the Infomap algorithm version 1.9.0. This likely won’t work on your system, but it is useful to know where to place the binary (and how to name it) to make everything work.
- “magma.py”: the python package including the old version of our link prediction framework.
- “mlayerlinkpred.py”: the python package providing our implementation of all multilayer link prediction methods we compare with.
- “moss_new.jar”: the JAR of the new version of our multilayer link predictor. It should work out of the box if you have the correct Java version installed.
- “moss_old.jar”: the JAR of the old version of our multilayer link predictor. It should work out of the box if you have the correct Java version installed.
- “node2vec.py”: the python package providing the Node2Vec implementation. Code from https://github.com/aditya-grover/node2vec.
- *.py: These are the Python scripts that run all of our experiments. The output of each script provides the data for the figures referred to in the script’s filename (and in the filenames it produces). No parameters are needed. The following command will work, taking hours to days to finish depending on your hardware:
for i in *.py; do python3 $i; done;
Note that the scripts make heavy use of the multiprocessing library, which might cause incompatibilities across different OSes. All scripts are tested and work fine on Ubuntu 22.04.
- *.gp: These are Gnuplot scripts that take the outputs of the Python scripts and generate the figures (provided you have Gnuplot installed on your system). No parameters are needed. Once the previous command has finished, run the following to get all the figures:
for i in *.gp; do gnuplot $i; done;
** Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
Multiplex Graph Association Rules (old MAGMA framework)
This is the data and code necessary to reproduce the results of the paper “Multiplex Graph Association Rules for Link Prediction”.
The archive contains two folders: “code” and “data”.
Code folder:
There are several important dependencies you’ll need to satisfy in order to run the code (see the expected folder layout after this list):
- To run MAGMA, you’ll need to download moss.jar from http://www.borgelt.net/moss.html
- To run Jalili, you’ll need to have in the running folder the infomap binary, which you can compile from https://www.mapequation.org/code.html
- To run De Bacco, you’ll need to download the code from https://github.com/cdebacco/MultiTensor and unzip the archive in a subfolder named “debacco”.
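Assuming you follow the steps above, the code folder should end up looking roughly like this (entries other than the ones named above are illustrative):
code/
  00_test.py
  moss.jar      (downloaded from borgelt.net)
  infomap       (binary compiled from mapequation.org, placed in the running folder)
  debacco/      (unzipped MultiTensor code)
  ...           (the remaining library and script files)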
The Python dependencies are as follows:
- You need BOTH Python 2 AND Python 3. Python 2 is only necessary to run De Bacco. If you translate De Bacco’s code to Python 3 (and make the proper adjustments to the library, which invokes the code via the subprocess module), you’ll only need Python 3.
- You need the following Python libraries (a quick import check follows this list):
- Orange
- Numpy
- Scipy
- Scikit-Learn
- Pandas
- Networkx
- Graph-tool
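Once installed, a quick way to verify that everything imports (Orange is the import name of the Orange library; the others use their usual module names):
python3 -c "import Orange, numpy, scipy, sklearn, pandas, networkx, graph_tool"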
An example of how to run all algorithms is in the script “00_test.py” in the code folder.
MAGMA can take a long time to run, especially in the rule scoring phase. For this reason, you can perform the rule scoring in batches: run it a first time using the rule_bounds parameters to skip the scoring phase, then re-run it multiple times, setting the “run” parameter to False and using the rule_bounds parameters to restrict scoring to an interval of rule IDs.
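As a purely hypothetical sketch of this workflow (the entry point name and its exact signature below are placeholders, so check magma.py for the real ones; only the “run” and “rule_bounds” parameters come from the description above):
import magma
# hypothetical first pass: an empty rule_bounds interval to skip the scoring phase
magma.run_experiment("aarhus", rule_bounds=(0, 0))
# then score the mined rules in batches of 1000 rule ids, with run=False
for lo in range(0, 5000, 1000):
    magma.run_experiment("aarhus", run=False, rule_bounds=(lo, lo + 1000))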
Data folder:
It contains one subfolder per dataset. Each subfolder contains a file with the edgelist and one with the node attributes of each network. The edgelist has three tab-separated columns: source node, target node, layer. Note that the Aarhus network is undirected, so there is no distinction between source and target nodes in its edges. The node attribute file has two tab-separated columns: node id and node label. The node attribute file is necessary to run MAGMA.
The pardus folder also contains a third file: pardus_newedges. This is the edgelist (in the same format as above) containing the new edges appearing at the second observation. The other datasets are not evolving, thus if you want to reproduce the experimental results you’ll have to divide them into train-test sets using ten-fold cross validation (thus generating your own “x_newedges” files, where “x” is the name of the network file used as input).
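For example, a minimal sketch of such a split with scikit-learn (the input path and the output naming are assumptions; the fold protocol used in the paper may differ in its details):
import pandas as pd
from sklearn.model_selection import KFold
edges = pd.read_csv("data/aarhus/aarhus", sep="\t", names=["src", "trg", "layer"])
kf = KFold(n_splits=10, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(edges)):
    # the held-out edges play the role of the "x_newedges" file described above
    edges.iloc[train_idx].to_csv("aarhus_train%d" % fold, sep="\t", header=False, index=False)
    edges.iloc[test_idx].to_csv("aarhus_newedges%d" % fold, sep="\t", header=False, index=False)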