Michele Coscia

2020
Coscia, M., et al., 2020. The Node Vector Distance Problem in Complex Networks. ACM Computing Surveys, 53(6).

We describe a problem in complex networks we call the Node Vector Distance (NVD) problem, and we survey algorithms currently able to address it. Complex networks are a useful tool to map a non-trivial set of relationships among connected entities, or nodes. An agent (e.g., a disease) can occupy multiple nodes at the same time and can spread through the edges. The node vector distance problem is to estimate the distance traveled by the agent between two moments in time. This is closely related to the Optimal Transportation Problem (OTP), which has received attention in fields such as computer vision. OTP solutions can be used to solve the node vector distance problem, but they are not the only valid approaches. Here, we examine four classes of solutions, showing their differences and similarities both on synthetic networks and on real-world network data. The NVD problem has much wider applicability than computer vision, being related to problems in economics, epidemiology, viral marketing, and sociology, to name a few. We show how solutions to the NVD problem have a wide range of applications, and we provide a roadmap to general and computationally tractable solutions. We have implemented all methods presented in this article in a publicly available open source library, which can be used for result replication.
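To make the OTP view of the NVD problem concrete, consider the simplest possible network: a path graph whose nodes are joined in a line by unit-length edges. Under that assumption the optimal transport cost between two node vectors of equal total mass has a closed form (the sum of absolute cumulative differences), so no solver is needed. This is a hedged illustration, not one of the methods in the library mentioned above; the function name `path_graph_emd` is hypothetical, and general networks would need a proper optimal-transport solver over shortest-path distances.

```python
from itertools import accumulate


def path_graph_emd(source, target):
    """Optimal transport cost between two node vectors on a path graph.

    Assumes nodes 0..n-1 sit on a line joined by unit-length edges and that
    both vectors carry the same total mass. The net surplus of mass to the
    left of edge (i, i+1) must cross that edge, so the minimal total cost is
    the sum of the absolute cumulative differences between the two vectors.
    """
    if len(source) != len(target):
        raise ValueError("vectors must cover the same nodes")
    if abs(sum(source) - sum(target)) > 1e-9:
        raise ValueError("vectors must carry equal total mass")
    diffs = [s - t for s, t in zip(source, target)]
    # One prefix sum per edge; the final prefix sum is ~0 and has no edge.
    crossing = list(accumulate(diffs))[:-1]
    return sum(abs(x) for x in crossing)


# The agent sits entirely on node 0 and later entirely on node 3.
print(path_graph_emd([1, 0, 0, 0], [0, 0, 0, 1]))  # 3
```

On a path graph every unit of mass has only one route, which is why the cumulative-difference shortcut is exact there; on networks with cycles, mass can take alternative routes and this shortcut no longer applies.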

Knowledge Diffusion in the Network of International Business Travel
Coscia, M., Neffke, F. & Hausmann, R., 2020. Knowledge Diffusion in the Network of International Business Travel. Nature Human Behaviour, 4(10).

We use aggregated and anonymized information based on international expenditures through corporate payment cards to map the network of global business travel. We combine this network with information on the industrial composition and export baskets of national economies. The business travel network helps to predict which economic activities will grow in a country, which new activities will develop and which old activities will be abandoned. In statistical terms, business travel has the most substantial impact among a range of bilateral relationships between countries, such as trade, foreign direct investments and migration. Moreover, our analysis suggests that this impact is causal: business travel from countries specializing in a specific industry causes growth in that economic activity in the destination country. Our interpretation of this is that business travel helps to diffuse knowledge, and we use our estimates to assess which countries contribute or benefit the most from the diffusion of knowledge through global business travel.


2018
Popularity Spikes Hurt Future Chances For Viral Propagation of Protomemes
Coscia, M., 2018. Popularity Spikes Hurt Future Chances For Viral Propagation of Protomemes. Communications of the ACM, 61(1), pp. 70-77.
A meme is a concept introduced by Dawkins [12] as the equivalent in cultural studies of a gene in biology. A meme is a cultural unit, perhaps a joke, musical tune, or behavior, that can replicate in people's minds, spreading from person to person. During the replication process, memes can mutate and compete with each other for attention, because people's consciousness has finite capacity. The viral spreading of memes causes behavioral change, both for the better, as when the "ALS Bucket Challenge" meme caused a cascade of humanitarian donations, and for the worse, as when researchers proved that obesity [7] and smoking [8] are socially transmittable diseases. A better theory of meme spreading could help prevent outbreaks of bad behaviors and favor positive ones.
Coscia, M., 2018. Using arborescences to estimate hierarchicalness in directed complex networks. PLoS ONE, 13(1).
Complex networks are a useful tool for understanding complex systems. One of the emerging properties of such systems is their tendency to form hierarchies: networks can be organized in levels, with nodes in each level exerting control on the ones beneath them. In this paper, we focus on the problem of estimating how hierarchical a directed network is. We propose a structural argument: a network has a strong top-down organization if we need to delete only a few edges to reduce it to a perfect hierarchy, that is, an arborescence. In an arborescence, all edges point away from the root and there are no horizontal connections, both characteristics we desire in our idealization of what a perfect hierarchy requires. We test our arborescence score in synthetic and real-world directed networks against the current state of the art in hierarchy detection: agony, flow hierarchy and global reaching centrality. These tests highlight that our arborescence score is intuitive and can be visualized; it is better able to distinguish between networks with and without a hierarchical structure; it agrees the most with the literature about the hierarchy of well-studied complex systems; and it is not just a score, but provides an overall scheme of the underlying hierarchy of any directed complex network.
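The definition of a perfect hierarchy used above (all edges pointing away from a single root, no horizontal connections) can be checked mechanically. The sketch below assumes edges are given as (parent, child) pairs over a known node set, and the helper name `is_arborescence` is hypothetical. It only tests whether a network already is an arborescence; it does not compute the paper's arborescence score, which measures how many edges must be deleted to reach one.

```python
from collections import deque


def is_arborescence(nodes, edges):
    """Return True if the directed graph (nodes, edges) is an arborescence.

    Assumes every edge endpoint appears in `nodes`. Requires exactly one
    root with no incoming edges, exactly one incoming edge for every other
    node, and every node reachable from the root. Under the in-degree
    constraints, reachability also rules out cycles.
    """
    nodes = set(nodes)
    in_degree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        in_degree[child] += 1
        children[parent].append(child)
    roots = [n for n in nodes if in_degree[n] == 0]
    if len(roots) != 1:
        return False
    if any(d != 1 for n, d in in_degree.items() if n != roots[0]):
        return False
    # Breadth-first search from the root must reach every node.
    seen = {roots[0]}
    queue = deque(roots)
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen == nodes


print(is_arborescence("abc", [("a", "b"), ("a", "c")]))              # True
print(is_arborescence("abc", [("a", "b"), ("a", "c"), ("b", "c")]))  # False
```

In the second call the horizontal edge ("b", "c") gives node c two parents, which is exactly the kind of edge the abstract's approach would need to delete.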
Birds of a feather scam together: Trustworthiness homophily in a business network
Barone, M. & Coscia, M., 2018. Birds of a feather scam together: Trustworthiness homophily in a business network. Social Networks, 54 (July 2018), pp. 228-237.
Estimating the trustworthiness of a set of actors when all the available information is provided by the actors themselves is a hard problem. When two actors have conflicting reports about each other, how do we establish which of the two (if any) deserves our trust? In this paper, we model this scenario as a network problem: actors are nodes in a network and their reports about each other are the edges of the network. To estimate their trustworthiness levels, we develop an iterative framework which looks at all the reports about each connected actor pair to define its trustworthiness balance. We apply this framework to a customer/supplier business network. We show that our trustworthiness score is a significant predictor of the likelihood that a business will pay a fine if audited. We show that the market network is characterized by homophily: businesses tend to connect to partners with similar trustworthiness levels. This suggests that the topology of the network influences the behavior of the actors composing it, indicating that market regulatory efforts should take network theory into account to prevent further degeneration and failures.
Kosack, S., et al., 2018. Functional structures of US state governments. Proceedings of the National Academy of Sciences of the United States of America.

Governments in modern societies undertake an array of complex functions that shape politics and economics, individual and group behavior, and the natural, social, and built environment. How are governments structured to execute these diverse responsibilities? How do those structures vary, and what explains the differences? To examine these longstanding questions, we develop a technique for mapping the Internet “footprint” of government with network science methods. We use this approach to describe and analyze the diversity in functional scale and structure among the 50 US state governments as reflected in the webpages and links they have created online: 32.5 million webpages and 110 million hyperlinks among 47,631 agencies. We first verify that this extensive online footprint systematically reflects known characteristics: 50 hierarchically organized networks of state agencies that scale with population and are specialized around easily identifiable functions in accordance with legal mandates. We also find that the footprint reflects extensive diversity among these state functional hierarchies. We hypothesize that this variation should reflect, among other factors, state income, economic structure, ideology, and location. We find that government structures are most strongly associated with state economic structures, with location and income playing more limited roles. Voters’ recent ideological preferences about the proper roles and extent of government are not significantly associated with the scale and structure of their state governments as reflected online. We conclude that the online footprint of governments offers a broad and comprehensive window on how they are structured that can help deepen understanding of those structures.

Visualizations and datasets are available on the project website.
Mapping the International Health Aid Community using Web Data
Coscia, M., et al., 2018. Mapping the International Health Aid Community using Web Data. EPJ Data Science, 7:12.
International aid is a complex system: it involves different issues, countries, and donors. In this paper, we use web crawling to collect information about the activities of international aid organizations on different health-related topics and network analysis to depict this complex system of relationships among organizations. By systematically collecting co-occurrences of issues, countries, and organization names from more than a hundred websites, we are able to construct multilayer networks describing, for instance, which issues are related to each other according to which organizations. Our results show that there is a surprising amount of homophily among organizations: organizations of the same type (multilateral, bilateral, private donors, etc.) tend to be co-cited in groups. We also create a taxonomy of issues that are generally mentioned together. Finally, we perform simulations, showing that messages originating from different organizations in the international aid community can have a different reach.
2017
Coscia, M. & Neffke, F., 2017. Network Backboning with Noisy Data. 2017 IEEE 33rd International Conference on Data Engineering (ICDE), (May), pp. 425-436.
Networks are powerful instruments to study complex phenomena, but they become hard to analyze in data that contain noise. Network backbones provide a tool to extract the latent structure from noisy networks by pruning non-salient edges. We describe a new approach to extract such backbones. We assume that edge weights are drawn from a binomial distribution, and estimate the error-variance in edge weights using a Bayesian framework. Our approach uses a more realistic null model for the edge weight creation process than prior work. In particular, it simultaneously considers the propensity of nodes to send and receive connections, whereas previous approaches only considered nodes as emitters of edges. We test our model with real-world networks of different types (flows, stocks, co-occurrences, directed, undirected) and show that our Noise-Corrected approach returns backbones that outperform other approaches on a number of criteria. Our approach is scalable, able to deal with networks with millions of edges.
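For intuition on a null model that "simultaneously considers the propensity of nodes to send and receive connections", the simplified sketch below assumes a directed weighted edge list stored as a dict. It computes the expected weight of each edge from its source's total outgoing weight and its target's total incoming weight, plus the ratio of observed to expected weight. It deliberately leaves out the binomial error-variance estimate and the pruning step, so it is not the Noise-Corrected backbone itself; the name `expected_weights` is hypothetical.

```python
from collections import defaultdict


def expected_weights(edges):
    """Compare observed edge weights with a null-model expectation.

    edges maps (source, target) to a positive observed weight. The expected
    weight of (i, j) is out_strength(i) * in_strength(j) / total_weight, so it
    reflects both how much i sends and how much j receives. Returns a dict
    mapping each edge to (observed, expected, observed / expected).
    """
    if any(w <= 0 for w in edges.values()):
        raise ValueError("this sketch expects strictly positive weights")
    out_strength = defaultdict(float)
    in_strength = defaultdict(float)
    for (i, j), w in edges.items():
        out_strength[i] += w
        in_strength[j] += w
    total = sum(edges.values())
    result = {}
    for (i, j), w in edges.items():
        expected = out_strength[i] * in_strength[j] / total
        result[(i, j)] = (w, expected, w / expected)
    return result


weights = {("a", "b"): 6, ("a", "c"): 2, ("b", "c"): 2}
for edge, (obs, exp, ratio) in expected_weights(weights).items():
    print(edge, obs, round(exp, 2), round(ratio, 2))
```

A null model that looked only at the source's out-strength would treat every target of node a alike; including the target's in-strength is what lets the ratio distinguish the heavy a-to-b edge from the a-to-c edge.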
Coscia, M., Cheston, T. & Hausmann, R., 2017. Institutions vs. Social Interactions in Driving Economic Convergence: Evidence from Colombia.

Are regions poor because they have bad institutions or are they poor because they are disconnected from the social channels through which technology diffuses? This paper tests institutional and technological theories of economic convergence by looking at income convergence across Colombian municipalities. We use formal employment and wage data to estimate growth of income per capita at the municipal level. In Colombia, municipalities are organized into 32 departamentos or states. We use cellphone metadata to cluster municipalities into 32 communication clusters, defined as a set of municipalities that are densely connected through phone calls. We show that these two forms of grouping municipalities are very different. We study the effect on municipal income growth of the characteristics of both the state and the communication cluster to which the municipality belongs. We find that belonging to a richer communication cluster accelerates convergence, while belonging to a richer state does not. This result is robust to controlling for state fixed effects when studying the impact of communication clusters and vice versa. The results point to the importance of social interactions rather than formal institutions in the growth process.

 

2016
Coscia, M., Hausmann, R. & Neffke, F., 2016. Exploring the Uncharted Export: An Analysis of Tourism-Related Foreign Expenditure with International Spend Data.

Tourism is one of the most important economic activities in the world: for many countries it represents the single largest product in their export basket. However, it is a difficult product to chart: "exporters" of tourism do not ship it abroad, but welcome importers inside the country. Current research uses social accounting matrices and general equilibrium models, but the standard industry classifications they use make it hard to identify which domestic industries cater to foreign visitors. In this paper, we use open source data together with anonymized and aggregated transaction data describing the spending behavior of foreigners in two countries, Colombia and the Netherlands. With these data, we are able to describe what constitutes the tourism sector and to map the most attractive destinations for visitors. In particular, we find that countries may exhibit different geographical patterns of tourism (concentration versus decentralization); we show the importance of distance, a country's reported wealth and cultural affinity in informing tourism; and we show the potential of combining open source data with anonymized and aggregated transaction data on foreign spend patterns to gain insight into the evolution of tourism from one year to the next.

2015
Coscia, M., Neffke, F. & Lora, E., 2015. Report on the Poblacion Flotante of Bogota.

In this document we describe the size of the Poblacion Flotante of Bogota (D.C.). The Poblacion Flotante is composed of people who live outside Bogota (D.C.) but rely on the city to perform their jobs. We estimate the impact of the Poblacion Flotante using a new data source provided by telecommunications operators in Colombia, which enables us to estimate how many people commute daily from every municipality of Colombia to a specific area of Bogota (D.C.). We estimate that the Poblacion Flotante could represent a 5.4% increase in Bogota (D.C.)'s population. During weekdays, the commuters tend to visit the city center more.

Evidence That Calls-Based and Mobility Networks Are Isomorphic
Coscia, M. & Hausmann, R., 2015. Evidence That Calls-Based and Mobility Networks Are Isomorphic. PLOS One, 10.

Social relations involve both face-to-face interaction as well as telecommunications. We can observe the geography of phone calls and of the mobility of cell phones in space. These two phenomena can be described as networks of connections between different points in space. We use a dataset that includes billions of phone calls made in Colombia during a six-month period. We draw the two networks and find that the call-based network resembles a higher order aggregation of the mobility network and that both are isomorphic except for a higher spatial decay coefficient of the mobility network relative to the call-based network: when we discount distance effects on the call connections with the same decay observed for mobility connections, the two networks are virtually indistinguishable.

2014
Pennacchioli, D., et al., 2014. The retail market as a complex system. EPJ Data Science, 3(33).

The aim of this paper is to introduce the complex system perspective into retail market analysis. Currently, understanding the retail market means searching for local patterns at the micro level, involving the segmentation, separation and profiling of diverse groups of consumers. In other contexts, however, markets are modelled as complex systems. Such a strategy is able to uncover emerging regularities and patterns that make markets more predictable, e.g., making it possible to predict how much a country’s GDP will grow. Rather than isolating actors in homogeneous groups, this strategy requires considering the system as a whole, as the emerging pattern can be detected only as a result of the interaction between its self-organizing parts. This assumption also holds in the retail market: each customer can be seen as an independent unit maximizing its own utility function. As a consequence, the global behaviour of the retail market naturally emerges, enabling a novel description of its properties, complementary to the local pattern approach. Such a task demands a data-driven empirical framework. In this paper, we analyse a unique transaction database, recording the micro-purchases of a million customers observed for several years in the stores of a national supermarket chain. We show the emergence of the fundamental pattern of this complex system, connecting the products’ volumes of sales with the customers’ volumes of purchases. This pattern has a number of applications, of which we provide three. Because it enables us to evaluate the sophistication of the needs that a customer has and that a product satisfies, we apply this pattern to uncover the customers’ hierarchy of needs, to suggest the next product a customer could be interested in buying, and to predict in which shop she is likely to buy it.

 

 

2013
The Atlas of Economic Complexity: Mapping Paths to Prosperity
Hausmann, R., et al., 2013. The Atlas of Economic Complexity: Mapping Paths to Prosperity, 2nd ed., Cambridge: MIT Press.

From the foreword:

It has been two years since we published the first edition of The Atlas of Economic Complexity. "The Atlas," as we have come to refer to it, has helped extend the availability of tools and methods that can be used to study the productive structure of countries and its evolution.

Many things have happened since the first edition of The Atlas was released at CID's Global Empowerment Meeting, on October 27, 2011. The new edition has sharpened the theory and empirical evidence of how knowhow affects income and growth and how knowhow itself grows over time. In this edition, we also update our numbers to 2010, thus adding two more years of data and extending our projections. We also undertook a major overhaul of the data. Sebastián Bustos and Muhammed Yildirim went back to the original sources and created a new dataset that significantly improves on the one used for the 2011 edition. They developed a new technique to clean the data, reducing inconsistencies and the problems caused by misreporting. The new dataset provides a more accurate estimate of the complexity of each country and each product. With this improved dataset, our results are even stronger.

All in all, the new version of The Atlas provides a more accurate picture of each country’s economy, its "adjacent possible" and its future growth potential.

For up-to-date datasets and new visualizations, visit atlas.cid.harvard.edu.

 

Coscia, M., Hausmann, R. & Hidalgo, C.A., 2013. The Structure and Dynamics of International Development Assistance. Journal of Globalization and Development, 3(2), pp. 1-42.

We study the structure of international aid coordination by creating and analyzing a tripartite network of donor organizations, recipient countries and development issues using web-based information. We develop a measure of coordination and find that it is moderate, achieving about 60% of its theoretical maximum. Many countries are strongly connected to organizations that are related to the issues that are salient there. Nevertheless, we identify many countries that are poorly served, issues that are inadequately attended to, and organizations that focus on the wrong combination of places and issues. Our approach may be used to improve decentralized coordination.

2011
The Atlas of Economic Complexity: Mapping Paths to Prosperity
Hausmann, R., et al., 2011. The Atlas of Economic Complexity: Mapping Paths to Prosperity.

Over the past two centuries, mankind has accomplished what used to be unthinkable. When we look back at our long list of achievements, it is easy to focus on the most audacious of them, such as our conquest of the skies and the moon. Our lives, however, have been made easier and more prosperous by a large number of more modest, yet crucially important feats. Think of electric bulbs, telephones, cars, personal computers, antibiotics, TVs, refrigerators, watches and water heaters. Think of the many innovations that benefit us despite our minimal awareness of them, such as advances in port management, electric power distribution, agrochemicals and water purification. This progress was possible because we got smarter. During the past two centuries, the amount of productive knowledge we hold expanded dramatically. This was not, however, an individual phenomenon. It was a collective phenomenon. As individuals we are not much more capable than our ancestors, but as societies we have developed the ability to make all that we have mentioned – and much, much more.

For up-to-date datasets and new visualizations, visit atlas.cid.harvard.edu.
