Putting a value on Open Source communities
Is bigger necessarily better?
Dear Friends,
After Abel, it’s my turn to share some thoughts on how we approach investments in Open Source Software companies. This month, we’re detailing how we calculate a community score for an open source opportunity.
Oh, and after you’ve read that, feel free to check out the new version of our website: http://commit.fund
When I was a student, I really enjoyed debating. With my fellow team members, we would compete against other universities in a mock British Parliament style by debating “motions”.
I remember one in particular: “this House believes that bigger is better”.
I recently thought about it when working on the value of communities.
What makes a community?
Many things in the world depend on communities to thrive.
There are communities of fans, whether it’s fans of Taylor Swift (the so-called Swifties, we’ll get back to them later) or fans of Real Madrid football club (we’ll talk about them too). There are communities of volunteers without whom charities would not operate, alumni of a university, etc. Closer to our investment focus, and what prompted this article, are communities of open source developers who are building the vast majority of the software we use today.
Communities are usually defined by utility and belonging[1], not just a common interest. That’s why you can talk about the community around a football club but not the community of “football fans”.
As a result, communities have a form of homogeneity: they’re federated by a common vision, interest or set of values. But they are also made up of members with very different engagement levels and types of contribution.
There are football fans who keep trying to convince anybody who will listen that Real Madrid is the best club ever and, at the other end of the spectrum, Swifties who would never admit to being one. In most cases, a fan who promotes heavily is a greater asset than a silent member, as they’re more likely to convince their friends to join the community.
Communities are extremely valuable, both for members individually and for the organization at the center. Typically, individual fans who buy jerseys or attend concerts provide direct cash to the organization, but they also act as promoters, distributors, R&D contributors, etc.
The value of communities was illustrated last year by Taylor Swift’s perceived ability to influence the Presidential Election, given the sheer size of her fan community. Closer to us, in 2019, IBM’s $34 billion acquisition of Red Hat, the largest software acquisition at that time, included a 70% premium, not just for Red Hat’s $3.4B in annual revenues, but for its community leadership and influence in the open-source ecosystem.
But how do you quantify this value? Is the community around Real Madrid football club more valuable than Barcelona’s for example? Is it better to have a large volume of followers on Instagram or a smaller number of people who buy the product and talk about their experience? Do you prefer a community which is growing fast or one that is stable but super engaged?
Quantity is not the only factor and even if size is an important aspect, bigger is not necessarily better.
How to divide a community (and conquer its inner workings)?
As mentioned above, there are various things that members of a community can bring:
Product development contribution: for a charity, this could be volunteers who organize the distribution of food or do fundraising.
Product improvement or support: this can be a user who points out bugs/issues with the product so that they can be fixed.
Direct revenue: In many cases, the community represents the first target customers, e.g. cricket fans who buy the jersey of their favorite team/player, or a season ticket for the stadium.
Distribution/Marketing/Communication: In turn, members of a community with a strong sense of belonging will be great advocates: for example, Swifties (told you they’d be back) who convince their friends to attend the concert, buy the t-shirt or simply listen to songs on repeat on Spotify. This can be done directly by word of mouth or at scale on social media.
Awareness / influence (if so many people are talking about it, it must be good): Passive followers of a brand on social media do provide value. E.g. Rafael Nadal still has a very large base of followers on social media (surely if he has more followers than Djokovic, he’s the GOAT!)
In addition, communities are not static: they grow or shrink, and members can start out passive but then decide to contribute in a more meaningful way. Volunteers in a charity can later get involved in fundraising, allowing the charity to raise more money and reach more people, who one day might also become more involved, etc. Communities therefore have the potential to create an attractive snowball effect, and the pace at which they grow is perhaps more important than their size.
GitLab’s strategy creates a “virtuous cycle where more contributions lead to more features, leading to more users and more contributions”
GitLab’s SEC filing 2024[2]
So what about Open Source communities?
“Open source is about more than just code. It’s about building a community of people who share a common vision and work together to make it a reality”
Mitchell Baker, Executive Chairwoman at Mozilla Foundation
At Red River West, we have a very data-driven culture and a special interest in Open Source software.
Open source is a decentralized and collaborative approach to software development, where the source code is made publicly accessible, allowing a global community of contributors to view, modify, and distribute it. Open Source powers 65%[3] of software used by businesses, 70%[4] of smartphones and even NASA’s rovers on Mars[5]. And we think it represents an exciting investment opportunity. That’s why we’re leveraging our data-driven investment style and tech platform to develop proprietary scores that assess open source opportunities from different angles: maturity, momentum, team, potential to develop in the US, etc.
For Open Source opportunities, it was obvious we had to develop a new model to assess community value. Indeed, in the age of Copilot, Cursor, etc., code production is becoming a commodity. This progressively shifts the value of a project toward its community, and no sector embodies that phenomenon better than Open Source Software.
Our Community Score aims at quantifying the value of a community around an open source project, in order to benchmark it against its competitors and better assess an investment opportunity. It can also be used as a performance indicator after the investment is made.
We’ve seen initiatives looking at the growth of contributors (high/low) vs growth of users (high/low)[6]. I know founders who divide their community into only two groups: those who look at the code and those who look at the binary[7].
After lengthy studies and tests, we chose to rely on a concentric circles model for the community. At the center, the contributors closest to the product are likely to be relatively few but provide extremely meaningful contributions, whereas at the outer layer, passive followers each contribute little: their strength is in their number.
After months of studying various Open Source communities, we’ve defined these layers for the first version of our community score. For each, we’re looking at their size, growth and engagement.
For a complete view of the type of metrics, it’s worth considering the Linux Foundation’s CHAOSS project which defines many metrics for community health.
Level 1: Code contributors[8]
What they bring: the code itself, i.e. new product features or bug fixes!
Where to find them: GitHub[9] obviously has the largest market share, but the same applies to alternative source code management platforms like Bitbucket and GitLab.
Important secondary metrics: quality of PRs[10], growth, retention, share of contributors outside the core team, profile of contributors
Typical number: 1 to 100
Level 2: Content creators
What they bring: These people write blog posts about their experience, share use cases, run meet-ups, etc. They amplify the product’s reach and reduce the support burden: good documentation and active forums maintained by the community can significantly lower the company’s support costs.
Where to find them: blog platforms (dev.to for example), social media (mostly YouTube), personal blogs, etc.
Important secondary metrics: follower base of each content creator.
Typical number: 1-100
Level 3: Issue creators
What they bring: An issue on GitHub is a feature used by developers to track tasks, bugs, enhancements, or discussions related to a project. The most important ones for us are feature requests and bug reports, as they make the product perform better and help the core team build the roadmap. This allows open source software to follow the latest trends and customer expectations better than closed source companies, which have to rely on customer surveys or founder intuition.
Where to find them: GitHub or other equivalent platforms
Important secondary metrics: type of issue, number of comments for each issue.
Typical number: 10-1,000
Level 4: Commentators
What they bring: Most open source communities display a high level of chatter: people complaining about the product or, conversely, praising it, others discussing the product roadmap, asking questions, debating the merits of competing solutions, etc.
Where to find them: Hacker News, Reddit, Substack, Discord, Discourse
Important secondary metrics: volume of comments, engagement and sentiment analysis.
Typical number: 100-10,000
Level 5: Followers (social media)
What they bring: Individuals who follow the project’s social media accounts, or occasionally like a model on Hugging Face or star a repository on GitHub. Even if they’re rather passive (liking a comment or following an account on X doesn’t mean much), they bring awareness or a form of validation. They can also play the role of “amplifiers” of news related to the open source project.
Where to find them: GitHub, Hugging Face and, of course, all mainstream social platforms, but mostly X (Mastodon and Bluesky, despite being open source themselves, don’t carry significant weight)
Important secondary metrics: growth, engagement levels
Typical number: 1,000-100,000
Calculating the score
For each of the different layers, we harvest relevant data sources in order to calculate a score. Previous studies[11] have tried to estimate the replacement cost of the community’s contributions – essentially, how much would it cost to reproduce the open-source output with paid labor. This only works for the first level (and even that would require arbitrary assumptions on the time and skill necessary for each contribution) and it’s a static metric.
We prefer to use a relative scoring method considering a combination of size and growth and other secondary factors listed above. We combine these metrics using a proprietary weight allocation algorithm in order to have a harmonized view for each level.
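To make the idea of a relative score concrete, here is a minimal Python sketch. It is not the proprietary algorithm: the function names, the choice of percentile ranks and the weights are all assumptions made for illustration. It ranks a project against a peer set (which includes the project itself) on size and growth for one layer, then takes a weighted average of the two ranks.

```python
from bisect import bisect_right


def percentile_rank(value, peer_values):
    """Share of peer values that are <= value, between 0 and 1."""
    ordered = sorted(peer_values)
    return bisect_right(ordered, value) / len(ordered)


def layer_score(project, peers, weights):
    """Weighted average of percentile ranks for a single layer.

    `weights` maps a metric name to its (hypothetical) weight.
    """
    total_weight = sum(weights.values())
    weighted = sum(
        w * percentile_rank(project[metric], [p[metric] for p in peers])
        for metric, w in weights.items()
    )
    return weighted / total_weight


# Made-up figures for the "issue creators" layer of four projects.
peers = [
    {"size": 120, "growth": 0.10},
    {"size": 400, "growth": 0.40},  # the project being scored
    {"size": 60, "growth": 0.05},
    {"size": 900, "growth": 0.02},
]
weights = {"size": 0.4, "growth": 0.6}  # illustrative only

score = layer_score(peers[1], peers, weights)
print(round(score, 2))
```

Because the ranks are relative, the same raw numbers can score very differently depending on the peer group chosen for the benchmark.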
For most of these metrics, the larger the number, the better… but not for all! If the first layer (code contributors) is too fragmented, with hundreds or thousands of contributors, this is likely to cause governance issues and a risk of forks[12] of the project. As investors, we tend to prefer a tight and engaged group of people for that first layer rather than a large one[13]. Some companies push it even further and clearly say that they’re not looking for code contributions from outside their core team (e.g. Sentry’s and Sonar’s CTOs told me they work this way), but they still listen carefully to what the community says.
As we noted above, the numbers vary a lot from one layer to the next: having 50 code contributors is great, but having only 50 followers on X is insignificant. Therefore, the final step in calculating a Community Score is to normalize the sub-components and combine each dimension into a single score. It’s not trivial and we tested different options; we use a weighting mechanism that we validated empirically[14]. The details go a bit beyond the scope of this article (and we like to keep our formula to ourselves, as a bit of our secret sauce).
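For readers who want a feel for the normalization step, here is a rough Python sketch under stated assumptions. The ranges come from the “Typical number” lines in this article; the log-scale mapping, the layer names and the weights are hypothetical and are not the actual formula. The sketch only looks at size (not growth or engagement) and does not model the preference for a tight contributor group.

```python
import math

# Typical ranges per layer, taken from the "Typical number" lines above.
TYPICAL_RANGE = {
    "code_contributors": (1, 100),
    "content_creators": (1, 100),
    "issue_creators": (10, 1_000),
    "commentators": (100, 10_000),
    "followers": (1_000, 100_000),
}

# Hypothetical weights, for illustration only (they sum to 1).
WEIGHTS = {
    "code_contributors": 0.30,
    "content_creators": 0.20,
    "issue_creators": 0.20,
    "commentators": 0.15,
    "followers": 0.15,
}


def normalize(count, low, high):
    """Map a raw count to [0, 1] on a log scale, clamped to [low, high]."""
    if count <= low:
        return 0.0
    if count >= high:
        return 1.0
    return math.log(count / low) / math.log(high / low)


def community_score(counts):
    """Weighted sum of the normalized layer sizes."""
    return sum(
        WEIGHTS[layer] * normalize(counts[layer], *TYPICAL_RANGE[layer])
        for layer in WEIGHTS
    )


# The same raw number means very different things in different layers.
print(round(normalize(50, *TYPICAL_RANGE["code_contributors"]), 2))
print(normalize(50, *TYPICAL_RANGE["followers"]))
```

With this mapping, 50 code contributors land near the top of their layer while 50 followers score zero, which is the effect the normalization is meant to capture.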
So, what does it look like?
As an illustration, we used this score to rank Open Source AI projects in Europe in our latest Newsletter [link]. See below companies ranked by Momentum Score then by Community score. We already knew that Hugging Face’s community was exceptional, but their community score is actually through the roof!
Seeing how engaged the community behind Zama is came as a surprise to me, given how technical the subject is: Zama is a startup working on Fully Homomorphic Encryption.
This is the very first iteration of our Community Score. We have many ideas to refine it and more data that we’d like to include, but what we have learned from it is already super useful for us!
Back to football fans
The approach we used for Open Source can be replicated in other sectors.
So let’s wrap this up with a bit of fun and apply the reasoning above to communities around football clubs. I picked three of them: Barcelona, Paris Saint-Germain and Real Madrid[15]. We can define layers using the same concentric circles.
Football clubs have become assets that attract private equity investors, and their performance is not easy to predict, so the value of their community is critical.
In this case, applying our community score shows that Real Madrid is ahead of the other two for every circle. (Barcelona is negatively affected by a smaller stadium, which I’ve been told is a temporary situation; PSG is negatively impacted by a decrease in social media followers.)
You could then use these scores to track the value over time and assess to what extent it is correlated with results on the field or presence of star players! I leave that to specialists!
THIS HOUSE BELIEVES THAT…
Given the sheer number of open source projects[16], analyzing them at an early stage can’t be done without data. Community is one of the scores we’re looking at and, following the approach above, we developed a first version, which we’ll improve over time.
As we saw, each member of a community contributes in a very different way, and dynamics are often more important than size, so a “bigger is better” approach definitely doesn’t make sense.
So generally speaking, the House’s motion is rejected!
But now that we know how to value a community, shall we move on to debate another question? In light of the emergence of DeepSeek, Mistral, Kimi, Qwen, etc.:
“This house believes that Open Source communities are the future of Artificial Intelligence”
What do you think?
OH
PS: If you were wondering, I’m neither a fan of Taylor Swift nor of Real Madrid. I’d be hard pressed to name more than two of her songs or more than two Real Madrid players but, “from a distance”, I do find these communities fascinating!
[1] Read Sarah Drinkwater’s blog posts for more details
[2] https://s204.q4cdn.com/984476563/files/doc_financials/2024/q4/10K-2024.pdf
[3] Source: Nauta Capital 2021
[4] https://gs.statcounter.com/os-market-share/mobile/worldwide
[5] https://www.jpl.nasa.gov/news/mapping-the-red-planet-with-the-power-of-open-science/
[6] Nadia Eghbal, “Working in Public: The Making and Maintenance of Open Source Software”
[7] Binary is the compiled file which can be used directly on a computer
[8] In practice, not all PRs are created equal, and we distinguish between PRs that bring real improvement and code “cleaning”, but for simplicity, I’ve considered only one layer for this article.
[9] By the way, GitHub is a great example of a company whose acquisition value (paid by Microsoft) was largely driven by its community
[10] PR: Pull Request, a mechanism used by developers to propose changes to a codebase and request that these changes be reviewed and merged into the main branch of the repository.
[11] Carol Robbins et al. (2021), “A First Look at Open-Source Software Investment in the United States”
[12] A fork is the creation of an independent version of software from the source code of an existing project, in order to develop it autonomously
[13] This happened to LangChain, which struggled to expand following its initial momentum as the number of first-level members was too big.
[14] With a plan to validate this very iteration of the community score against a larger dataset
[15] We’ve also calculated it for Olympique de Marseille but as a French person, I know that comparing the two football clubs is a very dangerous idea.
[16] About 90m Open Source projects are started every year on GitHub alone!








