Better define success #20
Comments
From internal comms strategy doc:
Address w/ this. |
One in four Open Source maintainers burn out. 44% of 58% = 25% |
So, OSS is a good measure of who can suffer the best? 🗡️ From your last post, I'm not sure I can fully agree with one of your conclusions:
Unfortunately, there is no specific data I can point to to back up my feelings :D. However, if we were in a buy-only world, I do think that: a. prices for software would be much, much higher than they are today. I bet there would be other ramifications on the price of "Buy/Build" if "Borrow" were taken off the table entirely but I haven't thought through all of them yet. I felt like your conclusions were based on what the cost of software is right now in a world where "Borrow" keeps the price of software lower than it might otherwise be. Just as you point out (correctly) that the authors of that HBS paper weren't considering the "Buy" option in their comparison, I think some more consideration needs to be given to the effect on the price of software when "Borrow" is taken off the table. Sorry I didn't get to stay for the full discussion earlier today. Thanks for this, though! |
Thanks for weighing in @stackedsax!
Fair enough, and the considerations you point out are good ones (prices being passed on, training costs). Perhaps a more nuanced approach would increase the number above the $177 million reported in the appendix for a naive goods market approach, though I'm not sure it would approach the $8.8 trillion for the naive labor market approach. In any case, the thing in the working paper I found most suggestive was the part about how few developers there are who produce the bulk of Open Source software. Rather than engage in speculative thought experiments (which, to be clear, I've also certainly done), I'm intrigued at the idea of determining who the developers actually are and figuring out which of them need how much money to be sustainable. The suggestion in "The Value of Open Source" is that there are only a few thousand of them(!). This seems quite tractable and much more productive. |
I think I've got a lot of data to help with that question, would be happy to help investigate. I also did some similar research back in 2018 and found that just 262 people maintained the majority of the whole of the rubygems community! https://youtu.be/hW4wUpoBHr8?feature=shared&t=708 |
Yay @andrew! 😄 99.95 to 0.05 is quite a dramatic figure. On the one hand, it's alarming, to think that so many depend on so few. In another light, it can almost be seen as encouraging, because it's a much more tractable problem to fund 262 people (in the few-years-ago Ruby case) vs. an amorphous unknown group that we think would need "trillions of dollars." How do you think we might best go about investigating this? The sort of thing I think would be valuable to be able to say is:
Thoughts on how to fill in those blanks? 🤔 (3) and (5) might want geo variation. |
Focusing on What I did with rubygems was total up the number of downloads across all gems, then find the most-downloaded gems that made up the majority of the downloads. The long tail of projects that don't get any downloads is very long. If we take this approach, we can do a breakdown by software ecosystem, because things like download counts aren't really comparable between, say, NPM and CRAN (the R package manager). The top 1% of R package downloads will look negligible compared directly to NPM, but those packages are still very impactful within their community (scientific and statistical software). Steps for each ecosystem:
Notes: I suspect we'd see some relation between the number of top maintainers and the total size of the ecosystem (more for NPM than for CRAN, for example). Not all ecosystems have download counts, but there are other comparable metrics that can be used for most large ecosystems. I'm focusing primarily on software projects delivered via package managers, as that's what I have the most data for now. Questions/options/further investigations:
Once we have that list, moving on to the next points: do we want to try to guesstimate the amount of maintenance work a given project takes given its current activity levels, or just treat them all the same? For example, https://www.npmjs.com/package/left-pad probably doesn't need as much ongoing maintenance work as something like https://github.com/pytorch/pytorch |
I like the thought that a known quantity of 262 developers will get to slice up a trillion dollars for themselves 😄 |
/me considering how to make it 263. 🤔 |
It seems like we could use this for "2. MM% percent of their labor is voluntary and sponsorable." Do the professional and semi-professional maintainers count for voluntary and sponsorable, though? 🤔 (slide source, numbers source) cc: @karasowles |
Folks let's do this. Let's come up with a methodology and refresh it annually. "Fair share." Definitive, credible, simple to apply, "Here's what your company should be paying and here's how you can pay it." I've updated the issue description along these lines. @andrew Your notes for (1) seem like a reasonable place to start.
I think we might need to look at the distribution curve for each ecosystem. Might be closer to 80/20 for some, closer to 99/1 for others. Can we get little sparklines for each eco? Also what is the definitive list of ecos we are working with? The answer is not jumping out at me on https://ecosyste.ms/.
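One way to compare curves across ecosystems is to compute what share of downloads the top slice of packages holds. A minimal Python sketch, assuming per-package download counts are already in hand; `share_of_top` and the sample numbers are made up for illustration and are not real registry data:

```python
import math

def share_of_top(downloads, top_fraction):
    """Fraction of all downloads held by the top `top_fraction` of packages."""
    ordered = sorted(downloads, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0
    # Always include at least one package, even for tiny fractions.
    k = max(1, math.ceil(len(ordered) * top_fraction))
    return sum(ordered[:k]) / total

# Hypothetical download counts for a four-package "ecosystem".
sample = [700, 200, 60, 40]
for fraction in (0.25, 0.5):
    print(f"top {fraction:.0%} of packages -> {share_of_top(sample, fraction):.0%} of downloads")
```

Running the same function per registry at a few fractions (say 1%, 10%, 20%) would give a rough picture of whether an ecosystem sits closer to 80/20 or 99/1.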
What is everyone's thinking on person vs. project? I tend to think funding flow should focus on company to project, and each project should be responsible for flow from there to people. It's the projects that companies need to exist, and people need to be free to move in and out of projects. I know a guy who was making $500+/mo on GHS from the Clojure community even though he had moved on from Clojure. He ended up turning off his Sponsors account. From the POV of any companies depending on those libraries, it would've been better for funding to have been kept in place to incentivize a new maintainer to step up.
Good call. Hash the email addresses and use that as an ID? |
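A minimal Python sketch of the hashed-email idea, assuming SHA-256 over a trimmed, lower-cased address; `maintainer_id` is a hypothetical helper, not something from the ecosyste.ms codebase:

```python
import hashlib

def maintainer_id(email: str) -> str:
    """Return a stable pseudonymous ID derived from an email address."""
    # Normalize so that case and stray whitespace don't split one person in two.
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Two spellings of the same address collapse to one ID.
emails = ["Jane@Example.com ", "jane@example.com", "bob@example.com"]
unique_ids = {maintainer_id(e) for e in emails}
print(len(unique_ids))
```

One caveat on the design: an unsalted hash of an email can be reversed by anyone who guesses the address, so this gives a join key across data sets rather than real anonymity. It also won't merge a person who uses different addresses in different places.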
We can start with the package ecosystems listed here: https://packages.ecosyste.ms/ |
There it is! Knew it had to be somewhere. :-) This seems like quite a comprehensive set of ecosystems to address. |
First off I'll see if I can recreate similar data for rubygems that I did back then and in theory a script for one ecosystem should work for all of them then. |
I published a CTA for people to join us here. Will promote tomorrow, lmk if you have early feedback. 🙏 |
Quick first pass at the key rubygems maintainers: Sum downloads for a registry:
find 80% of total downloads:
Start from most downloaded package, sum downloads, keep fetching packages until target is reached:
For rubygems.org:
unique maintainers:
For 80% of downloads:
For 99% of downloads:
Generate a csv of downloads and maintainers of the top 10,000 most downloaded ruby gems:
csv: https://gist.github.com/andrew/815014222ccf1825b37defc004454446 Downloads chart of csv: Notes:
|
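The steps described in this comment (sum downloads, set a target share, walk packages from most downloaded until the target is reached, collect unique maintainers) can be sketched in Python as follows. This is an illustration under assumptions, not the script that was actually run: `key_maintainers`, the tuple layout, and the sample data are all hypothetical.

```python
from typing import List, Set, Tuple

# (package name, total downloads, maintainer logins)
Package = Tuple[str, int, List[str]]

def key_maintainers(packages: List[Package], share: float = 0.8) -> Tuple[int, Set[str]]:
    """Return (package count, unique maintainers) covering `share` of downloads."""
    total = sum(downloads for _, downloads, _ in packages)
    target = total * share
    running = 0
    count = 0
    maintainers: Set[str] = set()
    # Walk from the most downloaded package down until the target is reached.
    for _, downloads, owners in sorted(packages, key=lambda p: p[1], reverse=True):
        if running >= target:
            break
        running += downloads
        count += 1
        maintainers.update(owners)
    return count, maintainers

# Made-up sample registry.
sample = [
    ("a", 700, ["alice"]),
    ("b", 200, ["alice", "bob"]),
    ("c", 60, ["carol"]),
    ("d", 40, ["dave"]),
]
count, people = key_maintainers(sample, share=0.8)
print(count, sorted(people))
```

Using a set for maintainers means someone who owns several top packages in the same registry is counted once.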
Other ecosystems in https://packages.ecosyste.ms for which I've got both download and maintainer data, and can run the same analysis:
for other ecosystems we can get some form of maintainer data from the source repositories and look at using other popularity metrics (dependents, stars, forks, docker downloads etc) |
Very back-of-the-napkin maths: extrapolating the rubygems numbers to all the other ecosystems for which I have maintainer data would give around 7,100 maintainers of the top 1% of open source packages. |
Good analysis @andrew. To get a fairer picture I think it would be important to also look at commit data. Another approach, albeit frozen in time, would be to look at the repositories that were archived by github. https://archiveprogram.github.com/ The full list of those repos can be found here > https://archiveprogram.github.com/assets/img/archive-repos.txt |
I've also set up a daily cron to sum up the total downloads for each registry, visible on the homepage: https://packages.ecosyste.ms and will be available in the API shortly too: https://packages.ecosyste.ms/api/v1/registries/ With this it should be possible to generate most of this analysis just using the packages api in future. |
I'm also indexing commit data from github, gitlab, codeberg etc over here when we're ready for it: https://commits.ecosyste.ms Not all packages have a strong reference to a source repository, so we'll need to fill in the gaps in certain places. |
I'd not seen this before. Somehow 4 projects I created are in there?! |
I guess a congrats is in order ;). You are forever memorialized into the future (or at least for 1,000 years). |
lol any chance being on that list might be enough evidence to get your own wikipedia page? :-p |
A quick summary of ecosystems that I have quick access to download and maintainer data for:
Note: excluding hackage and bioconductor for now as they are much smaller and don't have the same kind of 80/20 skew. So around 4,000 developers are responsible for packages that make up 80% of downloads for those 9 registries of 4.2m packages, totalling 870 billion downloads, the 0.1% if you will! |
I'm curious how things change for the 90th percentile and for the 95th percentile. |
@jonathan-s 90%:
|
and 95%:
Note: Over 1 trillion downloads! |
Gotta ask ... 99th percentile, for completeness? |
Only because it's you @chadwhitacre
Note: something went wrong with cran, a stray null that I didn't handle, so I just left that row out for now (it's late in the UK). One extra note: there may be some maintainers who are in the top 1% in multiple ecosystems, but they are currently treated as different people in these calculations (if there are, they are incredibly productive people!) |
If you were then also to include contributors who have each contributed at least 1% of all commits, the ballpark is that the number of people would increase by about an order of magnitude. (Checking sentry and django as two examples: for both projects, ~20 people have each contributed at least 1% of all commits.) |
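The 1%-of-commits cutoff could be checked per repository with something like the following Python sketch. It assumes you already have one author identifier per commit (for example from `git log`); `significant_contributors` and the sample history are hypothetical.

```python
from collections import Counter

def significant_contributors(commit_authors, min_share=0.01):
    """Authors whose commits make up at least `min_share` of all commits."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(author for author, n in counts.items() if n / total >= min_share)

# Made-up history: 200 commits across four authors.
history = ["alice"] * 150 + ["bob"] * 44 + ["carol"] * 5 + ["dave"] * 1
print(significant_contributors(history))
```

Here `dave` falls below the cutoff with 0.5% of commits, while `carol` clears it with 2.5%.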
(Cross-posting thoughts from DMs at @chadwhitacre's request). Funding 5000 is a really great and approachable problem definition. A few follow-on thoughts, offered in a yes-and spirit. From a narrative perspective, I suspect "sustainability/fair share" frame might encourage zero-sum thinking. "Our slice of the pie". From my perspective, the pie isn't fixed. Rather, we're looking at a broken feedback loop and a resulting ecological desert.
There's probably some upper limit to this positive ecological feedback, defined by the rate at which a market can absorb new products, but I doubt we've reached that yet. Rather, we seem to be limited at the level of fundamental research / new low-cost enablers due to lack of non-speculative funding. Low-cost enablers are what open source is all about. Let's terraform the ecological desert :) One more thought: the graph of active maintainers clearly follows a power law distribution. This is no surprise! Approximate power laws are inevitable in all evolving networks. So, there will be a very few startups that make exponentially more money, a very few maintainers that make exponentially more open source impact, etc. However, the exponent matters a lot! Even small changes in exponent make enormous differences in qualitative behaviour. Another way we might frame the problem is to "fatten the curve". Change the exponent so that the number of high-performing open source contributors is not 5k but 50k, 500k... This is what expanding the carrying capacity of the open source ecosystem could do. Change the exponent. |
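To make the point about the exponent concrete, here is a toy Python sketch. It assumes impact follows a pure Zipf-style power law, where the contributor at rank r has weight r^(-exponent); that model, the function name `heads_needed`, and the population size are illustrative assumptions, not measurements of any real ecosystem.

```python
def heads_needed(n, exponent, share=0.8):
    """How many top-ranked contributors cover `share` of total impact,
    when the contributor at rank r has weight r ** -exponent."""
    weights = [rank ** -exponent for rank in range(1, n + 1)]
    target = share * sum(weights)
    running = 0.0
    for count, weight in enumerate(weights, start=1):
        running += weight
        if running >= target:
            return count
    return n

# Same population size, different exponents.
for exponent in (2.0, 1.5, 1.0):
    print(exponent, heads_needed(1000, exponent))
```

In this toy model, lowering the exponent spreads the same 80% of impact across many more people, which is one way to read "fatten the curve".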
|
Thanks for jumping in, @gordonbrander! :-)
I'm thinking of this in terms of designing institution(s) to manage Open Source as a common pool resource, lots more to say about that under #14. As to the exponent—if I'm reading you right, I would call this a problem of income inequality (cf.), and bucket it with:
|
@chadwhitacre you've got a localhost link there btw ;) |
lolsob, fixed |
In other words, asking "How much money do we need to cover the expenses of existing open source maintainers?" is a little misleading. As Gordon says, if we start allocating more resources to open technologies, more companies and individuals will begin producing open technologies (I'm one of the people waiting in the queue; today, I can't become an open source maintainer because I already know there is no money for me). For example, here is an alternative question: How much of our technologies/software could become open source and still provide some utility? What can be the maximum share of open source in the pie? Since I learned about open source software, my dream has always been to live in a free and open source world where anyone can use, modify, and contribute to any existing technology or software. That world would be an exciting world to live in, boosting tech innovation, closing the gap between the nations, and potentially addressing crucial social issues. We are expected to spend 1 trillion dollars on software in 2024, while 99% of the money will go to companies producing closed tech. So, instead of focusing on existing open source initiatives, here is a challenging question: what must we do to align the incentives for companies to produce their tech as open source? In other words, how can we keep that 1 trillion dollars on the table but get open tech in return for this deal? There are a few reasons why it is crucial to ask this question:
Overall, I love having these conversations. At some point, should we consider having a dedicated online call to get into details? Edit: Visualization always helps 📊 |
I see that as what we are working with FSL / DOSP / Software Commons. This aligns with OSI's bylaws:
I think what we want to come up with is a methodology for calculating a "fair share" (or whatever framing we land on) that is repeatable, both so that it can be independently reproduced, but also so that we can repeat it annually to adjust the corporate fair share over time. If we craft the methodology properly, it should account for changes to the balance between open and proprietary over time.
We will get you there some day, @coni2k! :-) |
Indeed! 💯 And having these conversations is a crucial part of the process.
I will follow the "fair share" part of the conversation. On that topic, I only want to point out that we should consider the possibility of the "OSS production can/should be profitable" scenario and not fix ourselves to only cover the base cost/salaries. We might see "profitability" as the next step in the conversation, but it may also change how we calculate the initial "fair share."
Allow me to improve my "success" definition then: can we turn the entire pie open source (as previously mentioned, not overnight but over a period):
Put differently, can we have a state that allows tech companies to maximize their profits and software/tech freedom simultaneously? Today, these two aspects contradict each other, not allowing us to reach "maximum utility." From this perspective:
It might be handy to have a separate discussion on the importance of maximizing software/tech freedom at a macro level, on the startup ecosystem, the pace of innovation, competition, technological independence, etc. Briefly, every friction we add to these transactions has a ripple effect on all these aspects. Hence, having a permissionless digital economy would be my ultimate state. And, no doubt, the inconvenience/limitation the FSL adds on the consumer side is insignificant compared to a fully proprietary product. Still, FSL is more of an answer to the existing open source initiatives to limit the Free-riding rather than offering a solid incentive for existing proprietary software companies. I will derail the conversation from defining "success" and expand on how I see the problem and the solution in creating incentives for the existing tech companies to start producing open tech. Scenario 1: Here is a private goods scenario with proprietary software:
Scenario 2: Here is the open source version without a social contract:
Both scenarios cannot maximize the utility:
The question is, can we find an alternative scenario to address these two aspects and maximize the utility (maximizing revenue and software freedom simultaneously)? In other words, since open source software is categorically a public good rather than a private good, can we have a new social contract around it and establish a "public goods transaction" by imitating the "private goods transaction"? Scenario 3: Here is an alternative scenario with "public goods transaction":
There are significant challenges with this new scenario. I will only mention some, but I will not get into details for the sake of simplicity:
However, with this scenario:
In short, the way I see it, if we want to see a transition towards it, we need to start treating open tech production like a business activity, establish an economic model (public goods transaction) around these new goods, and ensure companies won't lose revenue by producing open tech. Other solutions will either be some form of charity or introduce some form of limitation. My post became longer than expected, but I hope not an exhausting one. Thank you for the discussion ✌ |
Coming here from #9 and #16. I did some napkin math years ago that put a company's fair share at $2,000/dev/yr. Last month The Value of Open Source Software came out. It feels so, so fluffy to me. It did get me thinking though, that if there are only a few thousand developers producing all of Open Source, then we should be able to get pretty specific about what their needs are to achieve sustainability:
Let's figure out how to fund the few thousand developers currently out there today. That will be the basis for growing the maintainer community beyond its current state.
The question here is: what does success look like? How much money per year do we need to fund the few thousand devs that are currently maintaining Open Source? What is each company's fair share?