07 July 2016

Building Data-Driven Development

A few weeks ago I had the honor to speak at my group’s “Global Empowerment Meeting” about my research on data science and economic development. I’m linking here the YouTube video of my talk and my transcript for those who want to follow it. The transcript is not 100% accurate given some last-minute edits (and the fact that I’m a horrible presenter 🙂), but it should be good enough. Enjoy!


We think that the big question of this decade is about data. Data are the building blocks of our modern society. We think that in development we are not currently using enough of these blocks: we are not exploiting data nearly as much as we should. And we want to fix that.

Many of the fastest growing companies in the world, and definitely the ones that are shaping the progress of humanity, are data-intensive companies. Here at CID we just want to add the entire world to the party.

So how do we do it? To fix the data problem the development literature has, we focus on understanding what the global knowledge building looks like. And we inspect three floors: how does knowledge flow between countries? What lessons can we learn inside these countries? What are the policy implications?

To answer these questions, we were helped by two big data players. The quantity and quality of the data they collect represent a revolution in the economic development literature. You heard them speaking at the event: they are MasterCard – through their Center for Inclusive Growth – and Telefonica.

Let’s start with MasterCard. They help us with the first question: how does knowledge flow between countries? Credit card data can answer that. Some of you might have a corporate-issued credit card in your wallet right now. And you are here, offering your knowledge and assimilating the knowledge offered by the people sitting at the table with you. The movements of these cards are movements of brains, ideas and knowledge.

When you aggregate this at the global level you can draw the map of international knowledge exchange. When you have a map, you have a way to know where you are and where you want to be. The map doesn’t tell you why you are where you are. That’s why CID builds something better than a map.

We are developing a method to tell why people are traveling. And reasons are different for different countries: equity in foreign establishments like the UK, trade partnerships like Saudi Arabia, foreign greenfield investments like Taiwan.

Using this map, it is easy to envision where you want to go. You can find countries that have a profile similar to yours and copy their best practices. For Kenya, Taiwan seems to be the best choice. You can see that, if investments drive more knowledge into a country, then you should attract investments. And we have preliminary results to suggest whom to attract: the people carrying the knowledge you can use.

The Product Space helps here. If you want to attract knowledge, you want to attract the one you can more easily use. The one connected to what you already know. Nobody likes to build cathedrals in a desert. More than having a cool knowledge building, you want your knowledge to be useful. And used.

There are other things you can do with international traveler flows. Like tourism. Tourism is a great export: for many countries it is the first export. See this big portion of the exports of Zimbabwe or Spain? For them tourism would look like this.

Tourism is hard to pin down. But it is easier with our data partners. We can know when, where and which foreigners spend their money in a country. You cannot paint pictures as accurate as these without the unique dataset MasterCard has.

Let’s go to our second question: what lessons can we learn from knowledge flows inside a country? Telefonica data is helping us answer this question. Here we focus on a test country: Colombia. We use anonymized call metadata to paint the knowledge map of Colombia, and we discover that the country has its own knowledge departments. You can see them here, where each square is a municipality, connecting to the ones it talks to. These departments correlate only slightly with the actual political boundaries. But they matter so much more.
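To give a feel for how call links can be turned into “knowledge departments”, here is a minimal sketch. The talk does not say which clustering method was used, so this is only a crude stand-in: it keeps the links with at least `min_calls` calls and groups municipalities that remain connected. The function name, the threshold and the municipality labels are all made up for illustration.

```python
def knowledge_clusters(call_counts, min_calls):
    """Group municipalities connected by strong call links (union-find).

    call_counts maps (municipality_a, municipality_b) -> number of calls.
    This is an illustrative stand-in, not the method used in the research.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), calls in call_counts.items():
        root_a, root_b = find(a), find(b)
        # Only strong links merge two groups; weak links are ignored.
        if calls >= min_calls and root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical call counts between five made-up municipalities.
calls = {
    ("M1", "M2"): 500,
    ("M2", "M3"): 450,
    ("M3", "M4"): 5,
    ("M4", "M5"): 700,
}
departments = knowledge_clusters(calls, min_calls=100)
```

With these toy numbers the weak M3–M4 link is dropped, so the five municipalities split into two groups regardless of any administrative border.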

In fact, we asked if these boundaries could explain the growth in wages inside the country. And they seem to be able to do it, in surprisingly different ways. If you are a poor municipality in a rich state in Colombia, we see your wage growth penalized. You are on a path of divergence.

However, if you are a poor municipality and you talk to rich ones, we have evidence to show that you are on a path of convergence: you grow faster than you would be expected to. Our preliminary results seem to suggest that being in a rich knowledge state matters.

So, how do you use this data and knowledge? To do so you have to drill down to the city level. We look not only at communication links, but also at mobility ones. We ask if a city like Bogota is really a city, or different cities in the same metropolitan area. With the data you can draw four different “mobility districts”, with a lot of movements inside them, and not so many across them.

The mobility districts matter, because combining mobility and economic activities we can map the potential of a neighborhood, answering the question: if I live here, how productive can I be? A lot in the green areas, not so much in the red ones.

With this data you can reshape urban mobility. You know where the entrance barriers to productivity are, and you can destroy them. You remodel your city to include in its productive structure people that are currently isolated by commuting time and cost. These people have valuable skills and knowhow, but they are relegated to the informal sector.

So, MasterCard data told us how knowledge flows between countries. Telefonica data showed the lessons we can learn inside a country. We are left with the last question: what are the policy implications?

So far we have mapped the landscape of knowledge, at different levels. But to hike through it you need a lot of equipment. And governments provide part of that equipment. Some in better ways than others.

To discover the policy implications, we unleashed a data collector program on the Web. We wanted to know what the structure of the government in the US looks like. Our program returned a picture of the hierarchical organization of government functions. We know how each state structures its own version of this hierarchy. And we know how all those connections fit together in the union, state by state. We are discovering that the way a state government is shaped seems to be the result of two main ingredients: where a state is and what its productive structure looks like.

We want to establish that the way a state expresses its government on the Web reflects the way it actually performs its functions. We seem to find a positive answer: for instance, having your environmental agencies talk with each other seems to work well to improve your environmental indicators, as recorded by the EPA. Wiring organizations together when we see positive feedback, and rethinking them when we see negative feedback, is a direct consequence of this Web investigation.

I hope I was able to communicate to you the enthusiasm CID discovered in the usage of big data. Zooming out to gaze at the big picture, we start to realize what the knowledge building looks like. As the building grows, so does our understanding of the world, development and growth. And here’s the punchline of CID: the building of knowledge grows with data, but the shape it takes is up to what we make of this data. We chose to shape this building with larger doors, so that it can be used to ensure a more inclusive world.


By the way, the other presentations of my session were great, and we had a nice panel after that. You can check out the presentations on the official Center for International Development YouTube channel. I’m embedding the panel’s video below:


15 April 2013

Aid 2.0

After the era of large multinational empires (British, Spanish, Portuguese, French), the number of sovereign states exploded. The international community realized that many states were being left behind in their development efforts. A new problem, international development, was created and nobody really had a clue about how to solve it. Eventually, the effort started by international organizations such as the UN or the World Bank culminated in the Millennium Development Goals (MDGs): a set of general objectives that humanity decided to achieve. The MDGs are obviously very noble. Nobody can argue against eradicating hunger or promoting gender equality. The real problem is that the logic that produced them is quite flawed. Some thousands of people met around 2000 and decided that those eight points were the most important global issues. That was probably even true, but what about particular countries, where none of the eight MDGs is crucial, but a ninth is? More importantly: why the hell am I talking about this?

I am talking about this because, not surprisingly, network science can provide a useful perspective on this topic. And it did, in a paper that I co-authored with Ricardo Hausmann and César Hidalgo, at the Center for International Development in Boston. In the paper we explain that the logic behind the MDGs is a classical top-down, or strictly hierarchical, one: there are a few centers where all information is collected and these centers direct all efforts towards the most important problems. This implies that (see the above picture):

  1. The information generated at the bottom level passes through several steps to get to the top, in a perverted telephone game where some information is lost and some noise is introduced;
  2. If some organization at the bottom level wants to coordinate with somebody else at the same level, it has to pass through several levels even before starting, instead of just creating a direct link.
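The second point can be made concrete with a toy example. The sketch below counts how many hops a message needs to travel between two bottom-level organizations in a strict hierarchy, compared with the single hop of a direct link. The hierarchy, the office names and the helper functions are all hypothetical; they are not taken from the paper.

```python
def path_to_root(node, parent):
    """Return the chain of nodes from `node` up to the top of the hierarchy."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def hops_between(a, b, parent):
    """Number of links a message crosses going from a to b through the hierarchy."""
    depth_from_b = {n: d for d, n in enumerate(path_to_root(b, parent))}
    for depth_from_a, n in enumerate(path_to_root(a, parent)):
        if n in depth_from_b:
            # First shared ancestor: go up from a, then down to b.
            return depth_from_a + depth_from_b[n]
    raise ValueError("nodes are not in the same hierarchy")


# Hypothetical three-level hierarchy: child -> parent.
parent = {
    "field office 1": "regional office X",
    "field office 2": "regional office Y",
    "regional office X": "headquarters",
    "regional office Y": "headquarters",
}

hierarchy_hops = hops_between("field office 1", "field office 2", parent)
direct_link_hops = 1  # what a direct link between the two offices would cost
```

Even in this tiny tree the two field offices are four hops apart, and every extra level in the hierarchy adds two more.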

In this world, if all funds for health are allocated to fighting HIV and child mortality, countries that do not have these problems but face, say, a cholera or a malaria epidemic are doomed to be left behind.

What is really necessary is a mechanism with which aid organizations can self-organize, by focusing on the issues they are related to and on the places where they are really needed, without broad and inefficient programs. In this world, a small world, everybody can establish a weak link to connect to anybody else, instead of relying on a cumbersome hierarchy. In an editorial in the Financial Times, Ricardo Hausmann used the Encyclopedia Britannica as a metaphor for representing the top-down approach of the MDGs, against the Wikipedia of a self-organized and distributed system.

The question now is: is it really possible to enable the self-organization of international aid? Or: how do we know what country is related to what development issue, and which organization has expertise on it? Well, it is not an easy question to answer, but in our paper we try to address it. In the paper we describe a system, based on web crawling (i.e. systematically downloading web pages), that captures the number of times each aid organization mentions an issue or a country in its public documents. That is no different from what Google does with the entire web: creating a global knowledge index that is at your fingertips.
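To make the counting step concrete, here is a minimal sketch of how mentions could be tallied once the pages have been downloaded. It is not the system from the paper: the function name, the whole-word case-insensitive matching, and the sample organizations and texts are all assumptions made for illustration.

```python
import re


def count_mentions(documents, terms):
    """Count whole-word, case-insensitive occurrences of each term in a list of texts."""
    counts = {term: 0 for term in terms}
    for text in documents:
        lowered = text.lower()
        for term in terms:
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            counts[term] += len(re.findall(pattern, lowered))
    return counts


# Hypothetical page texts, standing in for crawled public documents.
pages = {
    "Org A": [
        "Our malaria program in Kenya expanded.",
        "Kenya and Uganda: malaria prevention.",
    ],
    "Org B": ["Microenterprise loans in Peru."],
}
terms = ["malaria", "microenterprise", "Kenya", "Peru"]

# One row per organization, one column per issue or country.
mention_matrix = {org: count_mentions(docs, terms) for org, docs in pages.items()}
```

The resulting organization-by-term table is the kind of raw material from which relatedness between organizations, countries and issues can then be derived.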

Using this strategy, we can create network maps, like the one above, to understand what the current structure of aid development is. We are also able to match aid organizations, developing countries and development issues according to how closely they are related to each other. The number of possible combinations is still quite high, so to actually use our results it is necessary to create a nice visualization tool. And that’s another thing we did: the Aid Explorer (developed and designed by yours truly).

In the Aid Explorer you can compare organizations, countries and issues and see if they are coordinating as they should. For example, you can check which issues are related to Nordic Fund. Apparently, Microenterprise is a top priority. So, you can check how Nordic Fund relates to countries, according to how they are related to Microenterprise. That’s a good positive correlation! It means that Nordic Fund indeed relates most to the countries that are most related to Microenterprise. If we had found a negative correlation, that would have been bad, because it would have meant that Nordic Fund relates to the wrong countries. A general picture over all issues (or over all countries) of Nordic Fund can also be generated. Summing up these general pictures, we can generate rankings of organizations, countries and issues: the more high relevance and high correlation we observe together, the better.
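The correlation check can be sketched in a few lines. I am assuming a plain Pearson correlation here; the post does not say which measure the Aid Explorer uses, and the relatedness scores and country labels below are invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical relatedness scores, one entry per country.
org_to_country = {"Country A": 0.9, "Country B": 0.6, "Country C": 0.2, "Country D": 0.1}
issue_to_country = {"Country A": 0.8, "Country B": 0.7, "Country C": 0.3, "Country D": 0.2}

countries = sorted(org_to_country)
alignment = pearson(
    [org_to_country[c] for c in countries],
    [issue_to_country[c] for c in countries],
)
```

A value close to +1 means the organization is most related to the countries where its priority issue matters most; a value close to -1 would be the “wrong countries” case described above.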

Hopefully, this is the first step toward an ever more powerful Aid Explorer, one that can help organizations get the maximum bang for their buck and help countries get more visibility for their specific issues, without being overlooked by the international community because they are not acting in line with the MDG agenda.
