Inventing modern invention: The professionalization of technological progress in the US

Over the course of the mid-19th and early 20th century, the US transformed from an agricultural economy to the frontier in technology. To study this transition, we digitize half a million pages of patent yearbooks that describe inventors, organizations and technologies on over 1.6M patents. We combine this with demographic information from US census records and information on corporate research from large-scale repeated surveys of industrial research labs. Our data reveal that in the early 1920s a new system of innovation — based on teamwork and engineers — started to rapidly replace the existing craftsmanship-based invention that had dominated innovation in the 19th century. We argue that this new system relied on an organizational innovation: industrial research labs. These labs supported high-skill teamwork, replacing the collaborations within families with professional ties in firms and industrial research labs. The systemic shift in innovation had far-reaching consequences: it changed the division of labor in invention, led to an explosion of novelty and teamwork, and reshaped the geography of innovation in the US.

For a deeper dive into the research and visuals, explore this analysis by the Complexity Science Hub.

Bridging the short-term and long-term dynamics of economic structural change

Economic development hinges on structural change, that is, transformations in what an economy produces. The field of economic complexity has investigated this process through two related but distinct branches: one studying how economies diversify, the other how the complexity of an economy is reflected in its output. However, a formal connection between these approaches, and their relationship to classic accounts of structural transformation (for example, from agriculture to manufacturing), remains unclear. Here we introduce a simple dynamical model that links these perspectives through one core idea: economies diversify preferentially into activities related to those they already do. Studying this model yields three main results: It generates quantities resembling economic complexity metrics, suggests these metrics summarize long-term structural change rather than directly infer an economy’s complexity, and reproduces stylized facts of development. Our framework formally connects the field’s conceptual strands, bridges short and long timescales of change, and adds granularity to classic descriptions of development.

The Coherence of US Cities

Diversified economies are critical for cities to sustain their growth and development, but they are also costly because diversification often requires expanding a city’s capability base. We analyze how cities manage this trade-off by measuring the coherence of the economic activities they support, defined as the technological distance between randomly sampled productive units in a city. We use this framework to study how the US urban system developed over almost two centuries, from 1850 to today. To do so, we rely on historical census data, covering over 600M individual records to describe the economic activities of cities between 1850 and 1940, as well as 8 million patent records and detailed occupational and industrial profiles of cities for more recent decades. Despite massive shifts in the economic geography of the United States over this 170-year period, average coherence in its urban system remains unchanged. Moreover, across different time periods, datasets, and relatedness measures, coherence falls with city size at the exact same rate, pointing to constraints to diversification that are governed by a city’s size in universal ways.
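
One way to read the coherence definition above is as an expected pairwise distance. The sketch below assumes two productive units are drawn at random, with replacement, in proportion to each activity's share of the city, and averages their technological distance; lower values mean a more coherent city under this reading. The function name, the toy distance matrix and the two example cities are hypothetical and are not taken from the paper.

```python
def expected_pairwise_distance(units_by_activity, distance):
    """Mean distance between two units drawn at random (with replacement).

    units_by_activity: dict mapping activity -> number of units in the city
    distance: nested dict, distance[a][b] = technological distance (assumed given)
    """
    total = sum(units_by_activity.values())
    shares = {a: n / total for a, n in units_by_activity.items()}
    return sum(
        shares[a] * shares[b] * distance[a][b]
        for a in shares
        for b in shares
    )


# Hypothetical toy distances between three activities (symmetric, zero diagonal).
distance = {
    "steel": {"steel": 0.0, "autos": 0.2, "film": 0.9},
    "autos": {"steel": 0.2, "autos": 0.0, "film": 0.8},
    "film": {"steel": 0.9, "autos": 0.8, "film": 0.0},
}

specialized_city = {"steel": 50, "autos": 50, "film": 0}
diversified_city = {"steel": 40, "autos": 30, "film": 30}

d_specialized = expected_pairwise_distance(specialized_city, distance)
d_diversified = expected_pairwise_distance(diversified_city, distance)
print(d_specialized, d_diversified)
```

In this toy example the diversified city has the larger expected distance, which illustrates the trade-off the abstract describes: adding unrelated activities lowers coherence.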

From Products to Capabilities: Constructing a Genotypic Product Space

Economic development is a path-dependent process in which countries accumulate capabilities that allow them to move into more complex products and industries. Inspired by a theory of capabilities that explains which countries produce which products, these diversification dynamics have been studied in great detail in the literature on economic complexity analysis. However, so far, these capabilities have remained latent and inference is drawn from product spaces that reflect economic outcomes: which products are often exported in tandem. Borrowing a metaphor from biology, such analysis remains phenotypic in nature. In this paper we develop a methodology that allows economic complexity analysis to use capabilities directly. To do so, we interpret the capability requirements of industries as a genetic code that shows how capabilities map onto products. We apply this framework to construct a genotypic product space and to infer countries’ capability bases. These constructs can be used to determine which capabilities a country would still need to acquire if it were to diversify into a given industry. We show that this information is valuable not just for predicting future diversification paths and advancing our understanding of economic development, but also for designing more concrete policy interventions that go beyond targeting products by identifying the underlying capability requirements.

Eight Decades of Changes in Occupational Tasks, Computerization and the Gender Pay Gap

We build a new longitudinal dataset of job tasks and technologies by transforming the U.S. Dictionary of Occupational Titles (DOT, 1939-1991) and four books documenting occupational use of tools and technologies in the 1940s into a database akin to, and comparable with, its digital successor, the O*NET (1998-today). After creating a single occupational classification stretching between 1939 and 2019, we connect all DOT waves and the decennial O*NET databases into a single dataset, and we connect these with the U.S. Decennial Census data at the level of 585 occupational groups. We use the new dataset to study how technology changed the gender pay gap in the United States since the 1940s. We find that computerization had two counteracting effects on the pay gap: it simultaneously reduced it by attracting more women into better-paying occupations, and increased it through higher returns to computer use among men. The first effect closed the pay gap by 3.3 pp, but the second increased it by 5.8 pp, leading to a net widening of the pay gap.

The impact of return migration on employment and wages in Mexican cities

How does return migration from the US to Mexico affect local workers? Return migrants increase the local labor supply, potentially hurting local workers. However, having been exposed to a more advanced U.S. economy, they may also carry human capital that benefits non-migrants. Using an instrument based on involuntary return migration, we find that, whereas workers who share returnees’ occupations experience a fall in wages, workers in other occupations see their wages rise. These effects are, however, transitory and restricted to the city-industry receiving the returnees. In contrast, returnees permanently alter a city’s long-run industrial composition, by raising employment levels in the local industries that hire them.

Evaluating the Principle of Relatedness: Estimation, Drivers and Implications for Policy

A growing body of research documents that the size and growth of an industry in a place depend on how much related activity is found there. This fact is commonly referred to as the “principle of relatedness.” However, there is no consensus on why we observe the principle of relatedness, how best to determine which industries are related or how this empirical regularity can help inform local industrial policy. We perform a structured search over tens of thousands of specifications to identify robust – in terms of out-of-sample predictions – ways to determine how well industries fit the local economies of US cities. To do so, we use data that allow us to derive relatedness from observing which industries co-occur in the portfolios of establishments, firms, cities and countries. Different portfolios yield different relatedness matrices, each of which helps predict the size and growth of local industries. However, our specification search not only identifies ways to improve the performance of such predictions, but also reveals new facts about the principle of relatedness and important trade-offs between predictive performance and interpretability of relatedness patterns. We use these insights to deepen our theoretical understanding of what underlies path-dependent development in cities and expand existing policy frameworks that rely on inter-industry relatedness analysis.
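
As a rough illustration of deriving relatedness from co-occurrence, the sketch below counts how often two industries appear in the same portfolio. It is a minimal starting point under stated assumptions, not the specification the paper selects: the portfolios are hypothetical, and raw counts are not normalized for industry size, which practical relatedness measures usually correct for.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(portfolios):
    """Count, for every industry pair, the portfolios that contain both."""
    counts = Counter()
    for industries in portfolios:
        # Sort so each unordered pair is always stored under the same key.
        for a, b in combinations(sorted(set(industries)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical portfolios; each could stand for a firm, a city or a country.
portfolios = [
    {"steel", "autos"},
    {"steel", "autos", "machinery"},
    {"film", "software"},
    {"autos", "machinery"},
]

counts = cooccurrence_counts(portfolios)
for pair, n in counts.most_common():
    print(pair, n)
```

Swapping the unit that defines a portfolio (establishment, firm, city or country) changes the resulting matrix, which is the source of the differences between relatedness matrices that the abstract mentions.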

What Can the Millions of Random Treatments in Nonexperimental Data Reveal About Causes?

We propose a new method to estimate causal effects from nonexperimental data. Each pair of sample units is first associated with a stochastic ‘treatment’ (differences in factors between units) and an effect (a resultant outcome difference). It is then proposed that all pairs can be combined to provide more accurate estimates of causal effects in nonexperimental data, given a statistical model relating combinatorial properties of treatments to the accuracy and unbiasedness of their effects. The article introduces one such model and a Bayesian approach to combine the O(n^2) pairwise observations typically available in nonexperimental data. This also leads to an interpretation of nonexperimental datasets as incomplete, or noisy, versions of ideal factorial experimental designs. This approach to causal effect estimation has several advantages: (1) it expands the number of observations, converting thousands of individuals into millions of observational treatments; (2) starting with treatments closest to the experimental ideal, it identifies noncausal variables that can be ignored in the future, making estimation easier in each subsequent iteration while departing minimally from experiment-like conditions; (3) it recovers individual causal effects in heterogeneous populations. We evaluate the method in simulations and the National Supported Work (NSW) program, an intensively studied program whose effects are known from randomized field experiments. We demonstrate that the proposed approach recovers causal effects in common NSW samples, as well as in arbitrary subpopulations and an order-of-magnitude larger supersample with the entire national program data, outperforming statistical, econometric and machine learning estimators in all cases. As a tool, the approach also allows researchers to represent and visualize possible causes, and heterogeneous subpopulations, in their samples.
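
The pairing step described above can be sketched as follows. This only shows how n units turn into n(n-1)/2 pairwise observations, each with a factor difference and an outcome difference; it does not implement the article's statistical model or its Bayesian combination of pairs. The field names and the toy records are hypothetical.

```python
from itertools import combinations


def pairwise_treatments(units):
    """Turn each pair of units into a (treatment, effect) observation."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(units), 2):
        # Treatment: difference in every factor between the two units.
        treatment = {k: b["factors"][k] - a["factors"][k] for k in a["factors"]}
        # Effect: the corresponding difference in outcomes.
        effect = b["outcome"] - a["outcome"]
        pairs.append({"pair": (i, j), "treatment": treatment, "effect": effect})
    return pairs


# Hypothetical records: two factors and one outcome per unit.
units = [
    {"factors": {"training": 0, "age": 25}, "outcome": 4000},
    {"factors": {"training": 1, "age": 25}, "outcome": 5500},
    {"factors": {"training": 1, "age": 31}, "outcome": 6000},
]

pairs = pairwise_treatments(units)

# Pairs that differ in exactly one factor resemble a one-factor experiment.
single_factor = [
    p for p in pairs if sum(v != 0 for v in p["treatment"].values()) == 1
]
print(len(pairs), [p["pair"] for p in single_factor])
```

Filtering for pairs that differ in a single factor is one simple interpretation of "treatments closest to the experimental ideal"; the article's own criterion may differ.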

The Economic Geography of the War in Ukraine

The war in Ukraine has been raging for a month now, not only causing human suffering on a massive scale, but also sending economic tremors that are felt far beyond the country’s borders. Since the collapse of the Soviet Union, Ukraine’s economy has been pulled between its strong historical ties with the Russian economy and the opportunities in forging new ties with the European Union (EU). With the help of Metroverse, an online tool for analyzing the local economies of over a thousand cities worldwide, and of the data that power this tool, we analyze the evolving economic relations between Ukraine, Russia and the West and weigh the consequences of their disruption.

The Node Vector Distance Problem in Complex Networks

We describe a problem in complex networks we call the Node Vector Distance (NVD) problem, and we survey algorithms currently able to address it. Complex networks are a useful tool to map a non-trivial set of relationships among connected entities, or nodes. An agent—e.g., a disease—can occupy multiple nodes at the same time and can spread through the edges. The node vector distance problem is to estimate the distance traveled by the agent between two moments in time. This is closely related to the Optimal Transportation Problem (OTP), which has received attention in fields such as computer vision. OTP solutions can be used to solve the node vector distance problem, but they are not the only valid approaches. Here, we examine four classes of solutions, showing their differences and similarities both on synthetic networks and real world network data. The NVD problem has a much wider applicability than computer vision, being related to problems in economics, epidemiology, viral marketing, and sociology, to cite a few. We show how solutions to the NVD problem have a wide range of applications, and we provide a roadmap to general and computationally tractable solutions. We have implemented all methods presented in this article in a publicly available open source library, which can be used for result replication.
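
To make the optimal-transport view of the problem concrete, the sketch below handles the simplest possible case: a network that is a single path (chain) of nodes with unit-length edges. In that special case the minimum transport cost has a closed form, the sum of absolute running surpluses across each edge. This is an illustration under that assumption only; it is not the article's library and does not cover general networks or the other classes of solutions the article surveys. The function name and example vectors are hypothetical.

```python
from itertools import accumulate


def path_graph_transport_distance(before, after):
    """Minimum total distance mass must travel on a unit-edge path graph.

    before, after: lists giving the agent's weight on nodes 0..n-1 at two
    moments in time; both must carry the same total weight.
    """
    if len(before) != len(after):
        raise ValueError("vectors must cover the same nodes")
    if abs(sum(before) - sum(after)) > 1e-9:
        raise ValueError("total weight must be conserved")
    # The running surplus after node k is the mass that has to cross
    # the edge between node k and node k + 1.
    surplus = accumulate(b - a for b, a in zip(before, after))
    return sum(abs(s) for s in surplus)


# All weight moves from one end of a 4-node path to the other: 3 steps.
end_to_end = path_graph_transport_distance([1, 0, 0, 0], [0, 0, 0, 1])

# Two half-units each shift two nodes to the right: 2 * 0.5 * 2 = 2.
shifted = path_graph_transport_distance([0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5])

print(end_to_end, shifted)
```

On general networks the same quantity requires solving a transport problem over shortest-path distances, which is where the computational tractability concerns raised in the abstract come in.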