Harvard Business School Strategy Unit Working Paper on The Value of Open Source Software by Manuel Hoffmann, Frank Nagle, Yanuo Zhou
The key highlights from the paper are:
We estimate the supply-side value of widely-used OSS is $4.15 billion, but that the demand-side value is much larger at $8.8 trillion
…
Further, 96% of the demand-side value is created by only 5% of OSS developers
Why studying the value of FOSS is important
The parallels between shared grazing lands and shared digital infrastructure are palpable – the availability of communal grass to feed cattle, and in turn feed people, was critical to the agrarian economy, and the ability to not have to recreate code that someone else has already written is critical to the modern economy
Ammunition for FOSS advocates
Other recent studies have come to similar conclusions showing that open source software (OSS) appears in 96% of codebases (Synopsys 2023), and that some commercial software consists of up to 99.9% freely available OSS (Musseau et al., 2022)
With data from the United States the resulting estimates show a value of $2 billion for the OSS Apache Web Server in 2012 (Greenstein and Nagle, 2014) and a combined value of $4.5 billion for Apache and the increasingly popular OSS web server nginx in 2018 (Murciano-Goroff, et al., 2021)
We find a value ranging from $1.22 billion to $6.22 billion if we were to decide as a society to recreate all widely used OSS on the supply side. However, considering the actual usage of OSS leads to a demand-side value that is orders of magnitude larger and ranges from $2.59 trillion to $13.18 trillion, if each firm who used an OSS package had to recreate it from scratch (e.g., the concept of OSS did not exist). … However, as for any project, the evidence is not complete and we argue that we underestimate the value since our data, e.g., does not include operating systems, which are a substantial omitted category of OSS
Not-so-great assumptions
Here, we do not incorporate consumption externalities, i.e., we do not allow a benefit to arise for the general public when a package has been created and we further make sure that each firm is only replacing a package they use once, since a replaced package can be used within a firm as a club good (e.g., see Cornes and Sandler, 1996).
For large firms, there will be overhead coordination costs associated with building and maintaining a club good (an internal package). This potentially means that the demand-side figure of $8.8 trillion is a lower bound.
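As a rough illustration of the accounting described in the quotes above, here is a minimal Python sketch. It assumes the supply side counts each package's replacement cost once, while the demand side counts it once per firm that uses the package, however many times that firm uses it. The package names, firms, and dollar amounts are made up for illustration and are not from the paper, and the paper's actual estimation is more involved than this.

```python
# Hypothetical replacement cost per package in dollars (toy numbers, not from the paper).
replacement_cost = {"pkg_a": 100_000, "pkg_b": 250_000}

# Hypothetical usage records as (firm, package) pairs.
# A repeated pair means the firm uses the package in more than one place.
usage = [
    ("firm_1", "pkg_a"),
    ("firm_1", "pkg_a"),
    ("firm_2", "pkg_a"),
    ("firm_2", "pkg_b"),
    ("firm_3", "pkg_b"),
]


def supply_side_value(costs):
    """Cost of recreating every package exactly once for society as a whole."""
    return sum(costs.values())


def demand_side_value(costs, usage_records):
    """Cost if every firm had to recreate each package it uses.

    Each (firm, package) pair is counted once, since a rebuilt package
    can be shared inside the firm as a club good.
    """
    unique_pairs = set(usage_records)
    return sum(costs[package] for _firm, package in unique_pairs)


supply = supply_side_value(replacement_cost)
demand = demand_side_value(replacement_cost, usage)
print(supply, demand)
```

Even in this toy setup the demand-side number is larger than the supply-side one, simply because popular packages are counted once per firm rather than once overall.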
In this calculation, we implicitly do not incorporate any production externalities since we assume that there is no spillover knowledge from one package to the next that would lower the cost of programming.
This too we know to be false. Packaging and project management add considerable overhead to a software project, and spillover knowledge definitely reduces the cost of programming as a developer becomes comfortable with those aspects of a project over time. This potentially means that the supply-side figure of $4.15 billion is an upper bound.
At the repository level, we quantified each developer’s proportional work contribution by calculating their share of commits to the total number of commits for a repository
Commits aren’t the best indicator of a developer’s work contribution to a FOSS project. For example, a project may use squash merges, which collapse a large feature branch containing tens or hundreds of commits into a single commit. Lines of code aren’t a great indicator either: a complicated bug that takes tens of hours to debug might be fixed by changing a single line. There are no clean or easy indicators of a developer’s work contribution, so such assumptions are inevitable.
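The commit-share measure from the quote above, and the way squash merges can distort it, can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions: the function name, author names, and commit counts are hypothetical, and the paper may handle details (bots, co-authors, merge commits) differently.

```python
from collections import Counter


def commit_shares(commit_authors):
    """Return each author's share of the commits in one repository.

    commit_authors is a list with one author name per commit.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {author: n / total for author, n in counts.items()}


# Hypothetical history: alice commits directly, bob works on a feature branch.
without_squash = ["alice"] * 10 + ["bob"] * 10
# Same work, but bob's ten-commit branch is squash-merged into one commit.
with_squash = ["alice"] * 10 + ["bob"]

print(commit_shares(without_squash))
print(commit_shares(with_squash))
```

In this toy history the same work gives bob half of the commits without squashing, but only one commit in eleven with it, which is the distortion described above.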
Unexpected (to me) findings
We find that OSS packages created in Go have the highest value with $803 million in value that would have to be created from scratch if the OSS packages did not exist. Go is closely followed by JavaScript and Java with $758 million and $658 million, respectively. The value of C and Typescript is $406 million and $317 million, respectively, while Python has the lowest value of the top languages with around $55 million
Potential growth areas for FOSS
The industry with the highest usage value of around $43 billion is “Professional, Scientific, and Technical Services.” “Retail Trade” as well as “Administrative and Support and Waste Management and Remediation Services” make up another large part of the demand-side externally facing value of OSS with $36 billion and $35 billion, respectively. In contrast, industries that constitute just a small portion of the value are “Mining, Quarrying, and Oil and Gas Extraction”, “Utilities”, “Agriculture, Forestry, Fishing, and Hunting.” The latter industries are classical non-service sector industries and as such software is expected to play less of a role there.
I’ve been singing this song for a while, but now I have the evidence to back it up: we should be advocating for people to apply computing to their domains instead of expecting them to abandon their domains to become generic software developers.
Giants in the FOSS ecosystem
Indeed, the last five percent of programmers, or 3,000 programmers, generate over 93% of the supply side value. Similarly, Panel B shows – when accounting for usage – that those last five percent generate over 96% of the demand side value.
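To make this kind of concentration figure concrete, here is a minimal Python sketch of how the share of value held by the top 5% of developers could be computed once a value has been attributed to each developer. The function name and the toy values are my own assumptions and are not the paper's data or code.

```python
import math


def top_share(values, percent=5):
    """Fraction of total value generated by the top `percent` of contributors."""
    ranked = sorted(values, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    # Always count at least one contributor, rounding the group size up.
    k = max(1, math.ceil(len(ranked) * percent / 100))
    return sum(ranked[:k]) / total


# Toy example: 20 developers, one of whom accounts for most of the value.
toy_values = [81] + [1] * 19
print(top_share(toy_values))
```

With 20 developers the top 5% is a single person, so in this toy data one developer holds 81 of the 100 units of value.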