Intolerance is Attractive
This is the data and code necessary to reproduce the results in the paper “Intolerance is Attractive: Low Tolerance Feedback Loops in Social Media Flagging Systems”.
It consists of a collection of Python scripts and an input data table containing the popularity distribution of news source pages in an Italian portion of Facebook. The other data file (“sources_factual_bias.csv”) records data from mediabiasfactcheck.com about the trustworthiness of almost 2,000 news websites.
The Python scripts require the following libraries: numpy 1.19.5, scipy 1.6.0, pandas 1.2.4, networkx 2.5.1, and sklearn 0.24.1.
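If you want to pin these exact versions, one way to set up the environment is with pip (note that the library imported as sklearn is distributed on PyPI as scikit-learn):

pip install numpy==1.19.5 scipy==1.6.0 pandas==1.2.4 networkx==2.5.1 scikit-learn==0.24.1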
The scripts should be run in order of their identifiers, although scripts 02, 03, and 04 can be run in any order, since they are independent variants of the same model. Scripts 02 to 04 take a fair bit of time to complete (around 50 seconds per run on a Xeon E-2286M), so you are strongly encouraged to massively parallelize their runs. This can be done fairly easily with xargs in a terminal:
awk 'BEGIN{for(run=1;run<=30;run++){for(phi=0.1;phi<=0.9;phi+=0.1){print 0.08,phi,run}}}' | xargs -n 3 -P 32 python3 02_bipolar_model.py
will run 270 simulations in a bit under 8 minutes, assuming you have 32 CPUs and enough memory to run 32 models in parallel (around 128 GB of RAM should suffice).
More specific instructions:
01_create_synth_data.py
creates all the data the model needs to run the simulation, placing it in a “synth_data” folder that it creates. You must pass a single parameter, an integer that identifies the specific batch of synthetic data. To reproduce the results in the paper, you will need to create at least 30 independent batches of data; it is suggested that half of them have even identifiers and half odd. After running this script, you have the outputs needed to reproduce Figures 1 and 2 from the paper.
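For example, a minimal way to create the 30 batches used in the paper, assuming (as described above) that the batch identifier is the script’s only argument; identifiers 1 to 30 give 15 odd and 15 even batches:

for i in $(seq 1 30); do python3 01_create_synth_data.py "$i"; done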
01_create_synth_data_rnd.py
works the same way as the other 01 script, but creates random networks rather than using realistic topologies. It is used to test the effect of the network topology on the results.
02_bipolar_model.py
will run the simulations for the Bipolar model. It requires three parameters: rho, phi, and run_id. rho is the shareability parameter and should be set to 0.08 to reproduce the results of the paper. phi is the tolerance parameter and should be set between 0.1 and 0.9, in 0.1 increments, to reproduce the results of the paper. run_id is the identifier of a synthetic data batch created by the 01 script. The script generates the flags resulting from the news item cascade and writes them to the “results_bipolar/rho_phi/run_id.csv” file. The output file has one line per source and two columns: the polarity of the source and the number of flags it received.
Scripts 03 and 04 work the same as 02, but require an additional parameter to be specified before run_id: delta, which controls how different phi_l is from phi_r. Script 03 implements the Relative model (for which phi_l = phi_r * delta), while script 04 implements the Subtraction model (for which phi_l = phi_r - delta). Their output files have the same format as script 02’s, but are placed at the “results_modelname/delta/rho_phi/run_id.csv” path. Example invocations for both are sketched below.
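For illustration, a single Bipolar run and a sketch of a delta-model run; 03_relative_model.py is a hypothetical filename standing in for the actual 03 script, and the delta value of 0.2 is an arbitrary example:

python3 02_bipolar_model.py 0.08 0.5 1
python3 03_relative_model.py 0.08 0.5 0.2 1  # hypothetical filename; args: rho phi delta run_id

The xargs pattern from above extends naturally to the four-parameter models:

awk 'BEGIN{for(run=1;run<=30;run++){for(phi=0.1;phi<=0.9;phi+=0.1){print 0.08,phi,0.2,run}}}' | xargs -n 4 -P 32 python3 03_relative_model.py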
05_kde.py
will generate the Kernel Density Estimation of all the flags generated across all runs of a model for a specific rho-phi combination. You need to specify at least three parameters: model name, rho, and phi. If the model name is not “bipolar”, then it is either “relative” or “subtract”, and you will also have to specify the value of delta as the last (fourth) parameter. The results are placed in “results_modelname/sources_rho_phi_kde.csv” or in “results_modelname/delta/sources_rho_phi_kde.csv”, depending on whether the model requires delta. They have two columns: a value of polarity and the KDE of the flags for that polarity. The script also prints to standard output the location of the most prominent left and right peaks: the line contains the values of rho, phi, (optionally) delta, then x and y for the left and right peak, respectively. If you run the proper parameter combinations, after running this script you have the output needed to reproduce Figures 3, 4, S1, S2 and Table 1 from the main paper.
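For instance, a sweep over the phi values used in the paper might look like this (a sketch; the delta of 0.2 in the second loop is an arbitrary example value):

for phi in 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9; do python3 05_kde.py bipolar 0.08 $phi; done
for phi in 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9; do python3 05_kde.py relative 0.08 $phi 0.2; done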
06_gradient_descent.py
will perform a gradient descent for each source to find its local minimum in the flag distribution. It requires the same input parameters as script 05 and writes its output to the same path, except that the file has a “gradient_move_” rather than a “sources_” prefix. The output has three columns: polarity value, density of sources before the gradient descent, and density of sources after the gradient descent. If you run the proper parameter combinations, after running this script you have the output needed to reproduce Figures 5 and S3 from the main paper.
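Its invocation mirrors script 05; for example (the delta of 0.2 is again purely illustrative):

python3 06_gradient_descent.py bipolar 0.08 0.5
python3 06_gradient_descent.py subtract 0.08 0.5 0.2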
To reproduce Figure 6 from the main paper, you will have to manually run the chosen model (either 03 or 04), find the delta causing the maximum shift in the average source polarity (which you can obtain after running scripts 05 and 06), set that as the new phi_l, and re-run the model, this time modifying line 23 so that it affects right users rather than left users. Repeat until you reach a cyclic phi_l-phi_r equilibrium.
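In outline, one iteration of this loop might look like the following sketch, where 04_subtraction_model.py is a hypothetical filename for the actual 04 script and DELTA_STAR stands for the best delta found in the 05/06 outputs:

python3 04_subtraction_model.py 0.08 0.5 "$DELTA_STAR" 1   # hypothetical filename; args: rho phi delta run_id
python3 05_kde.py subtract 0.08 0.5 "$DELTA_STAR"          # locate the new flag peaks
python3 06_gradient_descent.py subtract 0.08 0.5 "$DELTA_STAR"
# set phi_l to the delta maximizing the polarity shift, edit line 23 of the
# model script so it targets right users instead of left users, then repeat
# until phi_l and phi_r settle into a cycle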