Add first draft for blog post on fork/upstream relationship #886
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,116 @@ | ||||||
--- | ||||||
slug: "forks-upstream-relationship" | ||||||
title: The dynamic relationship of forks with their upstream repository | ||||||
date: '2024-12-08' | ||||||
tags: | ||||||
- community | ||||||
- meetings | ||||||
- open science | ||||||
- maintenance | ||||||
- open-source | ||||||
- carpentries | ||||||
- coworking | ||||||
--- | ||||||
|
||||||
## Context setting | ||||||
|
||||||
rOpenSci organizes [monthly co-working hours](/coworking/) on a variety of topics. | ||||||
But the constant is the quality of the discussion that ensues and the renewed energy that comes from it. | ||||||
|
||||||
The November coworking session welcomed Stefanie Butland from [Openscapes](https://www.openscapes.org/) as a co-host. | ||||||
She shared Openscapes' approach of ["Forking as a worldview"](https://rladiesrome.org/talks/2024/meetup/11182024/index.html) | ||||||
|
||||||
and invited participants to share their experience using forks to build on existing projects. | ||||||
|
||||||
|
||||||
This blog post summarizes some of the insights that emerged during the subsequent discussion. | ||||||
|
||||||
## The different types of forks | ||||||
Review comment: It might be worth having a very brief explanation of what a fork is for those who might not know, or who may have heard the term but are a bit hazy on its meaning. |
||||||
|
||||||
We identified two different types of forks, which take place in different contexts, | ||||||
and will have different relationships with their upstream repository in the long run: | ||||||
|
||||||
1. Forks can be used as a starting point for a new diverging project, rather than starting from scratch. | ||||||
In this situation, the fork author does not intend to sync with the upstream repository after the initial fork. | ||||||
|
||||||
This type of fork may therefore miss out on new features developed in upstream. | ||||||
This type of fork will often happen when you copy a template for a personal website (example shared by Will Gearty), | ||||||
|
||||||
or when you re-use an existing project of yours as a boilerplate to build your new project. | ||||||
|
||||||
2. Forks can also be used to customize or tweak the original project, | ||||||
|
||||||
in a way that upstream cannot or doesn't want to address. | ||||||
In this type of fork, we usually try not to diverge too much from upstream to benefit from new features, | ||||||
and we regularly sync with them. | ||||||
A major downside is that syncing will often come with gnarly git conflicts. | ||||||
An example of this is various organizations forking the Carpentries workbench to customize the look and feel of their internal Carpentries-style training materials. | ||||||
Review comment: TODO: Add link to Carpentries-style lessons listing
Review comment: First sentence of 2. should have similar structure to 1. where the main contrast is we don't want to diverge, and then say upstream source does not want to incorporate our adaptations.
Review comment: I also think it would be fantastic to link to your own forked version of this so folks can see what it looks like in action. ... haha I see that below.
Review comment: would be great to give a sense of how many orgs have done this. I thought it was pretty impressive. Shows the value of the open sharing of the original project |
||||||
This is also what typically happens when open-source projects are packaged in Operating System distributions, such as Debian. | ||||||
Review comment: Is this important to say? My non-software-dev mind doesn't get it. Does it fit the big picture we're trying to convey?
Review comment: I'm on Linux, so I immediately thought, "Oh right, that's exactly what's going on there", but I can appreciate that it's not a common situation. |
||||||
The Debian package maintainers often tweak some settings or make minor code modifications to improve integration with the rest of the system. | ||||||
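
For this second type of fork, the regular sync with upstream can be scripted. The block below is a minimal sketch and not a workflow endorsed by any of the projects mentioned here: the helper names `sync_commands` and `sync_fork`, the remote name `upstream`, the branch name `main`, and the example URL are all assumptions for illustration. By default it only prints the git commands instead of running them.

```python
import subprocess


def sync_commands(upstream_url, branch="main", remote="upstream"):
    """Return the git commands for a basic fork sync (hypothetical helper)."""
    return [
        # Register the original repository as a second remote.
        # This step fails if the remote already exists, so it is only needed once.
        ["git", "remote", "add", remote, upstream_url],
        # Download the latest upstream history without touching local files.
        ["git", "fetch", remote],
        # Merge upstream changes into the current branch; conflicts surface here.
        ["git", "merge", f"{remote}/{branch}"],
    ]


def sync_fork(upstream_url, branch="main", dry_run=True):
    """Print the sync commands, or run them when dry_run is False."""
    for command in sync_commands(upstream_url, branch):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Example URL only; replace it with the repository you forked from.
    sync_fork("https://example.org/upstream/project.git")
```

The merge step is where the conflicts mentioned above show up, which is why the rest of this post focuses on making that step less painful.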
|
||||||
## Facilitating the cross-pollination between forks and upstream | ||||||
|
||||||
|
||||||
In this section, we focus on the second type of forks, | ||||||
where the fork author wants to stay in sync with the upstream repository. | ||||||
We mentioned that conflicts were a major downside of this approach. | ||||||
|
||||||
What are the strategies to minimize the conflicts? | ||||||
And why is it in the interest of everyone (the fork owners, but also upstream maintainers) to minimize them? | ||||||
|
||||||
|
||||||
### Simplifying syncs and reducing git conflicts {#simplifying-syncs} | ||||||
Review comment: I think this section could benefit from code examples or illustrations. There are links to PRs but something in the post itself may also help. |
||||||
|
||||||
Going back to the example of organizations forking the Carpentries workbench, | ||||||
|
||||||
Hugo has [forked the workbench to customize Epiverse-TRACE lessons with their own branding](https://github.com/epiverse-trace/varnish/pull/7). | ||||||
But the upstream repository regularly receives updates with bug fixes or new features. | ||||||
|
||||||
The first few months were painful because of the conflicts that arose when syncing with upstream. | ||||||
But a couple of changes in upstream have greatly reduced the probability of conflicts. | ||||||
|
||||||
In a nutshell, upstream can make it easier for forks to stay in sync by reducing the number of files that a fork needs to modify, | ||||||
and by isolating these changes in specific locations. | ||||||
|
||||||
In practice, this can for example be done by using configuration files and variables. | ||||||
This serves the double purpose of making the code more DRY (Don't Repeat Yourself) and of separating changes. | ||||||
|
||||||
In the example of the Carpentries workbench, [a pull request added a Sass variable to change the font](https://github.com/carpentries/varnish/pull/151), | ||||||
Review comment: Not sure it's necessary to mention that it was the Carpentries example (so far that's the only one) nor that it was a PR. From what I understand, the important part was a change to make forking easier... unless I've misunderstood?
|
||||||
removing the need to change it in multiple locations. | ||||||
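
To illustrate the idea outside of Sass, here is a minimal Python sketch. It is not the actual workbench code: the template, the setting names (`font_family`, `code_font`) and the font values are made up for the example. Upstream references the font through a single variable, so the fork only has to maintain a one-line override instead of editing every rule.

```python
from string import Template

# Hypothetical stylesheet template maintained by upstream: the font is
# referenced through a variable instead of being hard-coded in each rule.
STYLESHEET = Template(
    "body { font-family: $font_family; }\n"
    "h1, h2 { font-family: $font_family; }\n"
    "code { font-family: $code_font; }\n"
)

# Defaults maintained by upstream (illustrative values).
UPSTREAM_SETTINGS = {"font_family": "Georgia, serif", "code_font": "monospace"}

# The only thing the fork needs to maintain: a small set of overrides.
FORK_OVERRIDES = {"font_family": "Helvetica, sans-serif"}


def render_stylesheet(overrides=None):
    """Render the stylesheet with upstream defaults plus optional fork overrides."""
    settings = {**UPSTREAM_SETTINGS, **(overrides or {})}
    return STYLESHEET.substitute(settings)


if __name__ == "__main__":
    print(render_stylesheet(FORK_OVERRIDES))
```

With this layout, upstream can add or change rules in the template without ever touching the line the fork modified, so the two sets of changes rarely collide.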
|
||||||
But from the fork owner perspective, some techniques can also make conflict resolution smoother. | ||||||
|
||||||
A major source of conflicts is auto-generated changes or files. | ||||||
In the workbench example, these are the concatenated and minified CSS files. | ||||||
They tend to concentrate conflicts because they gather changes from multiple files, | ||||||
and they are minified on a single line, which means git cannot distinguish what was changed. | ||||||
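
The effect of minification on conflicts can be shown with a toy model. The sketch below is a deliberate simplification and not how git's merge is implemented: `clashing_lines` is a hypothetical helper that assumes all three versions have the same number of lines, and `minify` is a naive stand-in for a real CSS minifier. It still captures the key point: two unrelated edits live on different lines in the readable source, but on the same single line in the minified file.

```python
def clashing_lines(base, ours, theirs):
    """Return indices of lines that both sides changed, in different ways.

    Simplified model of a line-based merge: it assumes the three versions
    have the same number of lines (edits only, no insertions or deletions).
    """
    rows = zip(base.splitlines(), ours.splitlines(), theirs.splitlines())
    return [i for i, (b, o, t) in enumerate(rows) if o != b and t != b and o != t]


def minify(css):
    """Naive minifier: strip each line and join everything on a single line."""
    return "".join(line.strip() for line in css.splitlines())


base = "body {\n  color: black;\n}\nh1 {\n  margin: 0;\n}\n"
ours = base.replace("color: black", "color: navy")   # fork: branding tweak
theirs = base.replace("margin: 0", "margin: 1em")    # upstream: unrelated fix

# Readable sources: the two edits are on different lines, so nothing clashes.
print(clashing_lines(base, ours, theirs))  # prints []

# Minified bundle: both edits land on the same (only) line, so they clash.
print(clashing_lines(minify(base), minify(ours), minify(theirs)))  # prints [0]
```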
|
||||||
But you can automate the conflict resolution for these cases. | ||||||
The key is to make automated changes in a separate commit. | ||||||
Review comment: can you link to an example commit? So ppl can see and copy approach, rather than having to imagine how to do it |
||||||
When this commit conflicts, you can re-run the script that generates the file, | ||||||
and commit the result, without having to think about what should be kept or not. | ||||||
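
As a rough sketch of this "regenerate instead of resolving by hand" approach, the Python block below rebuilds a bundle from its sources and overwrites whatever is currently in the generated file, conflict markers included. The function name `build_bundle`, the file layout and the naive minification are assumptions for the example; in a real project you would re-run the project's own build script.

```python
import tempfile
from pathlib import Path


def build_bundle(source_dir, bundle_path):
    """Concatenate and naively minify every .css file found in source_dir."""
    parts = []
    for css_file in sorted(Path(source_dir).glob("*.css")):
        parts.extend(line.strip() for line in css_file.read_text().splitlines())
    # Overwrite the bundle entirely: its previous content is never inspected.
    Path(bundle_path).write_text("".join(parts))


with tempfile.TemporaryDirectory() as tmp:
    # Sources after the merge; any conflicts here were resolved by hand.
    src = Path(tmp, "src")
    src.mkdir()
    (src / "a.css").write_text("body {\n  color: navy;\n}\n")
    (src / "b.css").write_text("h1 {\n  margin: 1em;\n}\n")

    # Generated bundle left in a conflicted state by the merge.
    bundle = Path(tmp, "bundle.min.css")
    bundle.write_text(
        "<<<<<<< HEAD\nbody{color:navy}\n=======\nbody{color:black}\n>>>>>>> upstream/main\n"
    )

    build_bundle(src, bundle)
    regenerated = bundle.read_text()

print("<<<<<<<" in regenerated)  # prints False
```

After regenerating the file, you would stage it with `git add` and complete the merge as usual. Only the hand-written sources ever need a human decision.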
|
||||||
Alternatively, upstream can sometimes stop tracking auto-generated files from git, | ||||||
and instead render them on the fly at build time. | ||||||
Review comment (on lines +103 to +104): TODO: Propose this in varnish and link from here |
||||||
|
||||||
### Contributing back to upstream | ||||||
|
||||||
The previous sections highlighted how much work maintaining a fork is. | ||||||
|
||||||
Is it possible to integrate the features into upstream and avoid having to maintain a fork? | ||||||
|
||||||
|
||||||
Forks can be a good way to pilot new features or approaches. | ||||||
Review comment: Which can be then integrated back into the upstream? |
||||||
Note that this is not always possible as upstream may have a restricted scope, or limited resources for maintenance. | ||||||
|
||||||
Forks are sometimes intended from the start as short-lived. | ||||||
They exist only until a pull request is merged: the user needs the change right away and cannot wait for the review, especially if upstream is slow to review PRs. | ||||||
Review comment: needs some clarification |
||||||
In this case, the fork is already contributing back to upstream. | ||||||
Review comment: How, if it's not merged? |
||||||
|
||||||
But in other cases, can forks that started as a way to customize a project and have since accumulated new features contribute back to upstream? | ||||||
Forks can indeed be a great way to pilot new features, and to show the value of these features to upstream. | ||||||
|
||||||
When contributing back, each pull request should focus on a single feature or bug fix, rather than pushing all changes the fork has accumulated. | ||||||
It can also build confidence to show that you've been using this feature for a while without issues. | ||||||
|
||||||
### It takes two to tango | ||||||
|
||||||
Getting forks to contribute back is only possible if they can stay in touch with upstream. | ||||||
Review comment: maybe add a comment on value of communicating your intent up front with the upstream source? "I'd like to contribute back to upstream. Does the project have the capacity for this (merging PRs in a timely way)? How could that work for you?" |
||||||
Indeed, forks that have diverged too much from upstream will have a hard time contributing back, and making nice palatable pull requests. | ||||||
In other words, upstream repositories should follow the steps outlined in the ["Simplifying syncs and reducing git conflicts"](#simplifying-syncs) section to ensure forks can more easily stay in sync. | ||||||
|
||||||
|
||||||
This highlights the symbiotic relationship that can exist between forks and upstream repositories, when both parties are willing to make the effort. | ||||||
|
||||||
## Conclusion | ||||||
|
||||||
While sometimes presented as a fracture, forks are an integral part of the open-source ecosystem. | ||||||
They offer a way to build on existing projects, to add extra features without increasing the maintenance burden of the upstream repository, and to pilot new features. | ||||||
They can be hard to maintain, but it is also possible for the fork and upstream maintainers to work together to make the process smoother. | ||||||
Review comment: Maybe at bottom have a call to action asking people to consider what projects they are planning that could be forked. Code, lessons, documentation. Do they envision diverging or contributing back? With this info in the post how might they approach it. |
Review comment: Determine authoring policy: everyone in the comm call or only whoever gives feedback on this draft?
Review comment: I think it's fine to keep it to those who contribute/review this post specifically. I don't have a strong opinion on this, however, except that I think people can only be authors if they've reviewed the post and somehow "approved" their authorship 😁