Papers We Love (FOSS Edition)

Papers We Love is a repository of academic computer science papers and a community that loves reading them, but I’m co-opting the term to also refer to FOSS-related papers. I just read the Harvard Business School Strategy Unit Working Paper on The Value of Open Source Software and wanted to share a few personal highlights from it. A number of references within the paper also look interesting, and I want to add highlights from those papers to this topic in the future.

I would love it if the community also shared papers they love in this topic.

Harvard Business School Strategy Unit Working Paper on The Value of Open Source Software by Manuel Hoffmann, Frank Nagle, Yanuo Zhou

The key highlights from the paper are:

We estimate the supply-side value of widely-used OSS is $4.15 billion, but that the demand-side value is much larger at $8.8 trillion

Further, 96% of the demand-side value is created by only 5% of OSS developers

Why studying the value of FOSS is important

The parallels between shared grazing lands and shared digital infrastructure are palpable – the availability of communal grass to feed cattle, and in turn feed people, was critical to the agrarian economy, and the ability to not have to recreate code that someone else has already written is critical to the modern economy

Ammunition for FOSS advocates

Other recent studies have come to similar conclusions showing that open source software (OSS) appears in 96% of codebases (Synopsys 2023), and that some commercial software consists of up to 99.9% freely available OSS (Musseau et al., 2022)

With data from the United States the resulting estimates show a value of $2 billion for the OSS Apache Web Server in 2012 (Greenstein and Nagle, 2014) and a combined value of $4.5 billion for Apache and the increasingly popular OSS web server nginx in 2018 (Murciano-Goroff, et al., 2021)

We find a value ranging from $1.22 billion to $6.22 billion if we were to decide as a society to recreate all widely used OSS on the supply side. However, considering the actual usage of OSS leads to a demand-side value that is orders of magnitude larger and ranges from $2.59 trillion to $13.18 trillion, if each firm who used an OSS package had to recreate it from scratch (e.g., the concept of OSS did not exist). … However, as for any project, the evidence is not complete and we argue that we underestimate the value since our data, e.g., does not include operating systems, which are a substantial omitted category of OSS
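To get a feel for "orders of magnitude larger", here is a small Python sketch that divides the demand-side figures by the supply-side figures quoted in this post. Pairing the low bound with the low bound and the high bound with the high bound is my own assumption for illustration, not something the paper states, and `demand_to_supply_ratio` is a hypothetical helper name.

```python
import math

# Figures quoted in this post, in US dollars: (supply side, demand side).
# Pairing low with low and high with high is an assumption for illustration.
estimates = {
    "low": (1.22e9, 2.59e12),
    "point": (4.15e9, 8.8e12),
    "high": (6.22e9, 13.18e12),
}


def demand_to_supply_ratio(supply, demand):
    """How many times larger the demand-side value is than the supply-side value."""
    return demand / supply


for label, (supply, demand) in estimates.items():
    ratio = demand_to_supply_ratio(supply, demand)
    # log10 of the ratio gives the gap in orders of magnitude
    print(f"{label}: {ratio:,.0f}x (about 10^{math.log10(ratio):.1f})")
```

Under that pairing, all three ratios come out at roughly 2,100x, i.e. a gap of a bit more than three orders of magnitude.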

Not so great assumptions

Here, we do not incorporate consumption externalities, i.e., we do not allow a benefit to arise for the general public when a package has been created and we further make sure that each firm is only replacing a package they use once, since a replaced package can be used within a firm as a club good (e.g., see Cornes and Sandler, 1996).

For large firms, there will be overhead coordination costs associated with building and maintaining a club good (an internal package). This potentially means that the demand-side figure of $8.8 trillion is a lower bound.

In this calculation, we implicitly do not incorporate any production externalities since we assume that there is no spillover knowledge from one package to the next that would lower the cost of programming.

This, too, we know to be false. Packaging and project management add considerable overhead to a software project. Spillover knowledge definitely reduces the cost of programming, as the developer becomes comfortable with those aspects of a project over time. This potentially means that the supply-side figure of $4.15 billion is an upper bound.

At the repository level, we quantified each developer’s proportional work contribution by calculating their share of commits to the total number of commits for a repository

Commits aren’t the best indicator of a developer’s work contribution to a FOSS project. For example, a project might squash-merge a large feature branch that contained tens or hundreds of commits into a single commit. Lines of code aren’t a great indicator either: a complicated bug that requires tens of hours of debugging might be fixed by changing a single line. There are no clean or easy indicators to quantify a developer’s work contribution, so such assumptions are inevitable.
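To make the squash-merge concern concrete, here is a minimal Python sketch of the commit-share measure described in the quote. The author names and commit counts are made up, and `commit_shares` is a hypothetical helper, not the paper's code.

```python
from collections import Counter


def commit_shares(commit_authors):
    """Return each author's share of the total commits in one repository."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    return {author: n / total for author, n in counts.items()}


# Hypothetical history: alice lands 3 small commits, bob's feature branch has 12.
# With a regular merge, all 12 of bob's commits stay in the history.
merge_history = ["alice"] * 3 + ["bob"] * 12
# With a squash merge, bob's 12 commits collapse into a single commit.
squash_history = ["alice"] * 3 + ["bob"] * 1

print(commit_shares(merge_history))
print(commit_shares(squash_history))
```

The same work gives bob 80% of the commits under a regular merge and 25% under a squash merge, so the measured contribution depends on the project's merge policy as much as on the work itself.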

Unexpected (to me) findings

We find that OSS packages created in Go have the highest value with $803 million in value that would have to be created from scratch if the OSS packages did not exist. Go is closely followed by JavaScript and Java with $758 million and $658 million, respectively. The value of C and Typescript is $406 million and $317 million, respectively, while Python has the lowest value of the top languages with around $55 million

Potential growth areas for FOSS

The industry with the highest usage value of around $43 billion is “Professional, Scientific, and Technical Services.” “Retail Trade” as well as “Administrative and Support and Waste Management and Remediation Services” make up another large part of the demand-side externally facing value of OSS with $36 billion and $35 billion, respectively. In contrast, industries that constitute just a small portion of the value are “Mining, Quarrying, and Oil and Gas Extraction”, “Utilities”, “Agriculture, Forestry, Fishing, and Hunting.” The latter industries are classical non-service sector industries and as such software is expected to play less of a role there.

I’ve been singing this song for a while, but now I have the evidence to back it up: we should be advocating for people to apply computing to their domains instead of expecting them to abandon their domains to become generic software developers.

Giants in the FOSS ecosystem

Indeed, the last five percent of programmers, or 3,000 programmers, generate over 93% of the supply side value. Similarly, Panel B shows – when accounting for usage – that those last five percent generate over 96% of the demand side value.
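A concentration figure like this can be computed as the share of total value produced by the top-ranked fraction of contributors. The sketch below is one plausible way to do it in Python, not the paper's method; `top_share` is a hypothetical helper and the per-developer values are invented to show a similarly skewed distribution.

```python
def top_share(values, top_fraction=0.05):
    """Share of total value produced by the top `top_fraction` of contributors."""
    if not values:
        return 0.0
    ranked = sorted(values, reverse=True)
    # Always count at least one contributor in the top group
    k = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical data: 100 developers, 5 heavy contributors and 95 occasional ones
values = [1000] * 5 + [4] * 95

print(f"Top 5% share of value: {top_share(values):.1%}")
```

With this made-up distribution the top five developers account for about 93% of the total, which is the kind of long-tailed shape the quote describes.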

“Do Software Developers Understand Open Source Licenses?” published in 2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC). See IEEE link and PDF link shared by one of the authors.

The key highlight from the paper is:

The 375 respondents to the survey, who were largely developers, gave answers consistent with those of a legal expert’s opinion in 62% of 42 cases. Although developers clearly understood cases involving one license, they struggled when multiple licenses were involved

Survey

seven hypothetical software development scenarios, some of which included multiple license combination

I highly recommend checking out the survey questions first, before looking at the legal answers to them in Table III of the paper: https://www.cs.ubc.ca/labs/spl/projects/softwarelicensing/resources/UBC_SPL_software_licensing_survey.pdf

Key observations

  1. Developers cope well with single licenses even in complex scenarios
  2. Developers have difficulty interpreting which actions are allowed in scenarios where more than one open source license is in use.
  3. Developers understand technical decisions will impact open source license use
  4. Developers recognize that there are interactions between open source licenses, but those interactions were not always correctly interpreted
  5. Questions that arise about the use of multiple open source licenses are situationally dependent.
  6. A number of developers lack knowledge of the details of open source licenses.

Survey underestimates the problem

In particular, we observed that the large majority of the participants (85.3%) had chosen a software project’s license before, which might be an instance of self-selection bias.
