The Node Vector Distance Problem in Complex Networks

We describe a problem in complex networks we call the Node Vector Distance (NVD) problem, and we survey algorithms currently able to address it. Complex networks are a useful tool to map a non-trivial set of relationships among connected entities, or nodes. An agent (e.g., a disease) can occupy multiple nodes at the same time and can spread through the edges. The node vector distance problem is to estimate the distance traveled by the agent between two moments in time. This is closely related to the Optimal Transportation Problem (OTP), which has received attention in fields such as computer vision. OTP solutions can be used to solve the node vector distance problem, but they are not the only valid approaches. Here, we examine four classes of solutions, showing their differences and similarities on both synthetic networks and real-world network data. The NVD problem has a much wider applicability than computer vision, being related to problems in economics, epidemiology, viral marketing, and sociology, to name a few. We show how solutions to the NVD problem have a wide range of applications, and we provide a roadmap to general and computationally tractable solutions. We have implemented all methods presented in this article in a publicly available open source library, which can be used for result replication.
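
As a rough illustration of the OTP route to the NVD problem, the sketch below treats the agent's presence at two moments as two integer-valued node vectors and computes the cheapest way to move mass from one to the other, with hop counts as the ground cost. It is a minimal pure-Python sketch, not the code of the library mentioned above. The function names (`bfs_distances`, `min_cost_flow`, `node_vector_distance`), the integer masses, and the unweighted, undirected toy graph are all assumptions made for this example.

```python
from collections import deque

INF = float("inf")


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph given as an adjacency dict."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def min_cost_flow(n, edges, s, t, required):
    """Successive shortest paths (Bellman-Ford); returns the minimum total cost."""
    graph = [[] for _ in range(n)]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])      # forward arc
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])   # residual arc
    flow = total_cost = 0
    while flow < required:
        dist = [INF] * n
        prev = [None] * n
        dist[s] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == INF:
                    continue
                for k, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, k)
                        changed = True
            if not changed:
                break
        if dist[t] == INF:
            raise ValueError("mass cannot reach its destination (disconnected graph)")
        # Find the bottleneck capacity along the cheapest path, then push flow.
        push = required - flow
        v = t
        while v != s:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = t
        while v != s:
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        flow += push
        total_cost += push * dist[t]
    return total_cost


def node_vector_distance(adj, p, q):
    """Cheapest (mass x hops) plan turning node vector p into node vector q.

    Assumption: p and q map nodes to non-negative integers with equal totals.
    """
    sources = [u for u in p if p[u] > 0]
    sinks = [v for v in q if q[v] > 0]
    total = sum(p[u] for u in sources)
    if total != sum(q[v] for v in sinks):
        raise ValueError("p and q must carry the same total mass")
    if total == 0:
        return 0
    s, t = 0, len(sources) + len(sinks) + 1
    edges = []
    for i, u in enumerate(sources, start=1):
        edges.append((s, i, p[u], 0))
        hops = bfs_distances(adj, u)
        for j, v in enumerate(sinks, start=len(sources) + 1):
            if v in hops:
                edges.append((i, j, total, hops[v]))
    for j, v in enumerate(sinks, start=len(sources) + 1):
        edges.append((j, t, q[v], 0))
    return min_cost_flow(t + 1, edges, s, t, total)


# Toy path graph a - b - c - d (hypothetical data).
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
before = {"a": 2, "b": 1}
after = {"c": 1, "d": 2}
distance = node_vector_distance(adj, before, after)
print(distance)  # 7
```

On this path graph the result can be checked by hand: the cumulative mass differences along the path are 2, 3, and 2, which sum to 7. For large networks a dedicated linear-programming or network-simplex solver would be a more practical choice than this didactic routine.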

Knowledge Diffusion in the Network of International Business Travel

We use aggregated and anonymized information based on international expenditures through corporate payment cards to map the network of global business travel. We combine this network with information on the industrial composition and export baskets of national economies. The business travel network helps to predict which economic activities will grow in a country, which new activities will develop and which old activities will be abandoned. In statistical terms, business travel has the most substantial impact among a range of bilateral relationships between countries, such as trade, foreign direct investments and migration. Moreover, our analysis suggests that this impact is causal: business travel from countries specializing in a specific industry causes growth in that economic activity in the destination country. Our interpretation of this is that business travel helps to diffuse knowledge, and we use our estimates to assess which countries contribute or benefit the most from the diffusion of knowledge through global business travel.

Popularity Spikes Hurt Future Chances For Viral Propagation of Protomemes

A meme is a concept introduced by Dawkins¹² as an equivalent in cultural studies of a gene in biology. A meme is a cultural unit, perhaps a joke, musical tune, or behavior, that can replicate in people’s minds, spreading from person to person. During the replication process, memes can mutate and compete with each other for attention, because people’s consciousness has finite capacity. Meme viral spreading causes behavioral change, for the better, as when, say, the “ALS Bucket Challenge” meme caused a cascade of humanitarian donations,^a and for the worse, as when researchers proved obesity⁷ and smoking⁸ are socially transmittable diseases. A better theory of meme spreading could help prevent an outbreak of bad behaviors and favor positive ones.

Birds of a feather scam together: Trustworthiness homophily in a business network

Estimating the trustworthiness of a set of actors when all the available information is provided by the actors themselves is a hard problem. When two actors have conflicting reports about each other, how do we establish which of the two (if any) deserves our trust? In this paper, we model this scenario as a network problem: actors are nodes in a network and their reports about each other are the edges of the network. To estimate their trustworthiness levels, we develop an iterative framework which looks at all the reports about each connected actor pair to define its trustworthiness balance. We apply this framework to a customer/supplier business network. We show that our trustworthiness score is a significant predictor of the likelihood a business will pay a fine if audited. We show that the market network is characterized by homophily: businesses tend to connect to partners with similar trustworthiness degrees. This suggests that the topology of the network influences the behavior of the actors composing it, indicating that market regulatory efforts should take into account network theory to prevent further degeneration and failures.

Network Backboning with Noisy Data

Networks are powerful instruments to study complex phenomena, but they become hard to analyze in data that contain noise. Network backbones provide a tool to extract the latent structure from noisy networks by pruning non-salient edges. We describe a new approach to extract such backbones. We assume that edge weights are drawn from a binomial distribution, and estimate the error-variance in edge weights using a Bayesian framework. Our approach uses a more realistic null model for the edge weight creation process than prior work. In particular, it simultaneously considers the propensity of nodes to send and receive connections, whereas previous approaches only considered nodes as emitters of edges. We test our model with real world networks of different types (flows, stocks, cooccurrences, directed, undirected) and show that our Noise-Corrected approach returns backbones that outperform other approaches on a number of criteria. Our approach is scalable, able to deal with networks with millions of edges.
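
To make the idea of a null model that accounts for both sending and receiving propensities more concrete, here is a simplified sketch. It is not the Noise-Corrected method itself: it replaces the Bayesian variance estimate with a plain plug-in binomial variance, and the function names (`null_model_scores`, `backbone`), the z-score form, and the threshold value are assumptions chosen only for illustration. It also assumes positive weights and one entry per directed node pair.

```python
from collections import defaultdict
from math import sqrt


def null_model_scores(edges):
    """Score each directed edge against a send/receive null model.

    edges: list of (source, target, weight), one entry per node pair.
    Returns {(source, target): (expected_weight, z_score)}.
    """
    out_strength = defaultdict(float)
    in_strength = defaultdict(float)
    total = 0.0
    for u, v, w in edges:
        out_strength[u] += w
        in_strength[v] += w
        total += w

    scores = {}
    for u, v, w in edges:
        # Chance that one unit of weight leaves u AND lands on v under the null model.
        prob = out_strength[u] * in_strength[v] / total ** 2
        expected = total * prob
        # Plug-in binomial variance (a simplification, not a Bayesian estimate).
        variance = total * prob * (1.0 - prob)
        z = (w - expected) / sqrt(variance) if variance > 0 else 0.0
        scores[(u, v)] = (expected, z)
    return scores


def backbone(edges, threshold=2.0):
    """Keep the edges whose weight exceeds the null expectation by `threshold` deviations."""
    scores = null_model_scores(edges)
    return [(u, v, w) for u, v, w in edges if scores[(u, v)][1] > threshold]


# Hypothetical toy network.
edges = [("a", "b", 8), ("a", "c", 2), ("b", "c", 6), ("c", "a", 4)]
scores = null_model_scores(edges)
kept = backbone(edges, threshold=2.0)
```

In this toy network the edge from `a` to `c` is pruned: both endpoints are strong senders or receivers, so a weight of 2 falls below the 4 units the null model expects. The lighter edge from `c` to `a` is kept because it far exceeds its expectation of 0.8.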

Functional structures of US state governments

Governments in modern societies undertake an array of complex functions that shape politics and economics, individual and group behavior, and the natural, social, and built environment. How are governments structured to execute these diverse responsibilities? How do those structures vary, and what explains the differences? To examine these longstanding questions, we develop a technique for mapping the Internet “footprint” of government with network science methods. We use this approach to describe and analyze the diversity in functional scale and structure among the 50 US state governments reflected in the webpages and links they have created online: 32.5 million webpages and 110 million hyperlinks among 47,631 agencies. We first verify that this extensive online footprint systematically reflects known characteristics: 50 hierarchically organized networks of state agencies that scale with population and are specialized around easily identifiable functions in accordance with legal mandates. We also find that the footprint reflects extensive diversity among these state functional hierarchies. We hypothesize that this variation should reflect, among other factors, state income, economic structure, ideology, and location. We find that government structures are most strongly associated with state economic structures, with location and income playing more limited roles. Voters’ recent ideological preferences about the proper roles and extent of government are not significantly associated with the scale and structure of their state governments as reflected online. We conclude that the online footprint of governments offers a broad and comprehensive window on how they are structured that can help deepen understanding of those structures.

Visualizations and datasets are available on the project website.

Mapping the International Health Aid Community using Web Data

International aid is a complex system: it involves different issues, countries, and donors. In this paper, we use web crawling to collect information about the activities of international aid organizations on different health-related topics and network analysis to depict this complex system of relationships among organizations. By systematically collecting co-occurrences of issues, countries, and organization names from more than a hundred websites, we are able to construct multilayer networks describing, for instance, which issues are related to each other according to which organizations. Our results show that there is a surprising amount of homophily among organizations: organizations of the same type (multilateral, bilateral, private donors, etc.) tend to be co-cited in groups. We also create a taxonomy of issues that are generally mentioned together. Finally, we perform simulations, showing that messages originating from different organizations in the international aid community can have a different reach.

Institutions vs. Social Interactions in Driving Economic Convergence: Evidence from Colombia

Are regions poor because they have bad institutions or are they poor because they are disconnected from the social channels through which technology diffuses? This paper tests institutional and technological theories of economic convergence by looking at income convergence across Colombian municipalities. We use formal employment and wage data to estimate growth of income per capita at the municipal level. In Colombia, municipalities are organized into 32 departamentos or states. We use cellphone metadata to cluster municipalities into 32 communication clusters, defined as a set of municipalities that are densely connected through phone calls. We show that these two forms of grouping municipalities are very different. We study the effect on municipal income growth of the characteristics of both the state and the communication cluster to which the municipality belongs. We find that belonging to a richer communication cluster accelerates convergence, while belonging to a richer state does not. This result is robust to controlling for state fixed effects when studying the impact of communication clusters and vice versa. The results point to the importance of social interactions rather than formal institutions in the growth process.

Exploring the Uncharted Export: An Analysis of Tourism-Related Foreign Expenditure with International Spend Data

Tourism is one of the most important economic activities in the world: for many countries it represents the single largest product in their export basket. However, it is a difficult product to chart: “exporters” of tourism do not ship it abroad, but instead welcome importers inside the country. Current research uses social accounting matrices and general equilibrium models, but the standard industry classifications they use make it hard to identify which domestic industries cater to foreign visitors. In this paper, we use open source data together with anonymized and aggregated transaction data that give us insight into the spending behavior of foreigners inside two countries, Colombia and the Netherlands. With these data, we are able to describe what constitutes the tourism sector and to map the most attractive destinations for visitors. In particular, we find that countries can exhibit different geographical patterns of tourism (concentration versus decentralization); we show the importance of distance, a country’s reported wealth, and cultural affinity in shaping tourism; and we show the potential of combining open source data with anonymized and aggregated transaction data on foreign spending patterns to gain insight into how tourism evolves from one year to the next.

Report on the Poblacion Flotante of Bogota

In this document we describe the size of the Poblacion Flotante of Bogota (D.C.). The Poblacion Flotante is composed of people who live outside Bogota (D.C.) but rely on the city for their jobs. We estimate the impact of the Poblacion Flotante using a new data source provided by telecommunications operators in Colombia, which enables us to estimate how many people commute daily from every municipality of Colombia to a specific area of Bogota (D.C.). We estimate that the Poblacion Flotante could represent a 5.4% increase in the population of Bogota (D.C.). During weekdays, commuters tend to visit the city center more.