Add first draft for blog post on fork/upstream relationship #886

Draft
wants to merge 21 commits into
base: main
Changes from 7 commits
4 changes: 2 additions & 2 deletions content/author/hugo-gruson/_index.md
@@ -1,7 +1,7 @@
---
name: Hugo Gruson
-twitter: grusonh
-link: https://www.normalesup.org/~hgruson/
+mastodon: https://mastodon.social/@grusonh
+link: https://hugogruson.fr/
github: Bisaloo
gitlab: Bisaloo
orcid: 0000-0002-4094-1476
116 changes: 116 additions & 0 deletions content/blog/2024-12-01-forks-upstream-relationship/index.md
@@ -0,0 +1,116 @@
---
slug: "forks-upstream-relationship"
title: The dynamic relationship of forks with their upstream repository
date: '2024-12-08'
Member Author

Determine authoring policy: everyone in the comm call or only whoever gives feedback on this draft?

Member

I think it's fine to keep it to those who contribute/review this post specifically. I don't have a strong opinion on this, however, except that I think people can only be authors if they've reviewed the post and somehow "approved" their authorship 😁

tags:
- community
- meetings
- open science
- maintenance
- open-source
- carpentries
- coworking
---

## Context setting

rOpenSci organizes [monthly co-working hours](/coworking/) on a variety of topics.
The topic changes, but the constant is the quality of the discussion that ensues and the renewed energy that comes from it.

The November coworking session welcomed Stefanie Butland from [Openscapes](https://www.openscapes.org/) as a co-host.
She shared Openscapes' approach of ["Forking as a worldview"](https://rladiesrome.org/talks/2024/meetup/11182024/index.html)
and invited participants to share their experience using forks to build on existing projects.

This blog post summarizes some of the insights that emerged during the subsequent discussion.

## The different types of forks
Member

It might be worth having a very brief explanation of what a fork is for those who might not know or who, even though they may have heard the term, may be a bit hazy on its meaning.


We identified two different types of forks, which take place in different contexts
and have different relationships with their upstream repository in the long run:

1. Forks can be used as a starting point for a new diverging project, rather than starting from scratch.
In this situation, the fork author does not intend to sync with the upstream repository after the initial fork.
This type of fork may therefore miss out on new features developed in upstream.
This type of fork often happens when you copy a template for a personal website (example shared by Will Gearty),
or when you re-use an existing project of yours as a boilerplate to build your new project.

2. Forks can also be used to customize or tweak the original project,
in a way that upstream cannot or doesn't want to address.
In this type of fork, we usually try not to diverge too much from upstream, so that we can benefit from new features,
and we regularly sync with it.
A major downside is that syncing will often come with gnarly git conflicts.
An example of this is various organizations forking the Carpentries workbench to customize the look and feel of their internal Carpentries-style training materials.
Member Author

TODO: Add link to Carpentries-style lessons listing

Member

First sentence of 2. should have similar structure to 1. where the main contrast is we don't want to diverge, and then say upstream source does not want to incorporate our adaptations.

Member

I also think it would be fantastic to link to your own forked version of this so folks can see what it looks like in action. ... haha I see that below.

Member

Suggested change
An example of this is various organizations forking the Carpentries workbench to customize the look and feel or their internal Carpentries-style training materials.
An example of this is various organizations forking the Carpentries workbench to customize the look and feel of their internal Carpentries-style training materials.

Member

would be great to give a sense of how many orgs have done this. I thought it was pretty impressive. Shows the value of the open sharing of the original project

This is also what typically happens when open-source projects are packaged in Operating System distributions, such as Debian.
Member

Is this important to say? My non-software-dev mind doesn't get it. Does it fit the big picture we're trying to convey?

Member

I'm on Linux, so I thought immediately, "Oh right, that's exactly what's going on there", but I can appreciate that it's not a common situation.

The Debian package maintainers often tweak some settings or make minor code modifications to improve integration with the rest of the system.

## Facilitating the cross-pollination between forks and upstream

In this section, we focus on the second type of forks,
where the fork author wants to stay in sync with the upstream repository.
We mentioned that conflicts were a major downside of this approach.
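
To make the cost of staying in sync concrete, here is a minimal sketch of what a sync looks like on the command line. It is not taken from the workbench: the two repositories are throwaway stand-ins created in a temporary directory, and the file names (`style.css`, `BRANDING.md`) are invented for illustration. On a real fork, you would typically run `git remote add upstream <url>` once instead of renaming `origin`.

```shell
set -eu
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
work=$(mktemp -d)

# Stand-in for the upstream repository (-b needs Git 2.28 or later).
git init -q -b main "$work/upstream"
cd "$work/upstream"
echo "body { color: black; }" > style.css
git add style.css
git commit -q -m "Initial styles"

# Stand-in for the fork: a clone that carries one customization commit.
git clone -q "$work/upstream" "$work/fork"
cd "$work/fork"
git remote rename origin upstream
echo "Our branding" > BRANDING.md
git add BRANDING.md
git commit -q -m "Add our branding"

# Meanwhile, upstream receives a new feature.
cd "$work/upstream"
echo "a { color: blue; }" >> style.css
git commit -q -am "Style links"

# The sync itself: fetch upstream, then merge it into the fork.
cd "$work/fork"
git fetch -q upstream
git merge -q --no-edit upstream/main
```

Here the merge is clean because the fork and upstream touched different files. Conflicts appear as soon as both sides edit the same lines.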

What are the strategies to minimize the conflicts?
And why is it in the interest of everyone (the fork owners, but also upstream maintainers) to minimize them?

### Simplifying syncs and reducing git conflicts {#simplifying-syncs}
Member Author

I think this section could benefit from code examples or illustrations. There are links to PRs but something in the post itself may also help.


Going back to the example of organizations forking the Carpentries workbench,
Hugo has [forked the workbench to customize Epiverse-TRACE lessons with their own branding](https://github.com/epiverse-trace/varnish/pull/7).
But the upstream repository regularly receives updates with bug fixes or new features.

The first few months were painful because of the conflicts that arose when syncing with upstream.
But a couple of changes in upstream have greatly reduced the probability of conflicts.

In a nutshell, upstream can make it easier for forks to stay in sync by reducing the number of files that a fork needs to modify,
and by isolating these changes in specific locations.

In practice, this can be done, for example, by using configuration files and variables.
This serves the double purpose of making the code more DRY (don't repeat yourself) and of isolating the changes.
In the example of the Carpentries workbench, [a pull request added a Sass variable to change the font](https://github.com/carpentries/varnish/pull/151),
Member

Not sure it's necessary to mention that it was the Carpentries example (so far that's the only one) nor that it was a PR. From what I understand, the important part was a change to make forking easier... unless I've misunderstood?

Suggested change
In the example of the Carpentries workbench, [a pull request added a Sass variable to change the font](https://github.com/carpentries/varnish/pull/151),
In this example, the upstream repository added [Sass variable to change the font](https://github.com/carpentries/varnish/pull/151),

removing the need to change it in multiple locations.

But from the fork owner's perspective, some techniques can also make conflict resolution smoother.
A major source of conflicts is auto-generated changes or files.
In the workbench example, these are the concatenated and minified CSS files.
They tend to concentrate conflicts because they gather changes from multiple files,
and they are minified onto a single line, which means git cannot tell which part of the line changed.

But you can automate the conflict resolution for these cases.
The key is to make automated changes in a separate commit.
Member

can you link to an example commit? So ppl can see and copy approach, rather than having to imagine how to do it

When this commit conflicts, you can re-run the script that generates the file,
and commit the result, without having to think about what should be kept or not.
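
As an illustration, here is a minimal, self-contained sketch of this workflow. It does not reproduce the workbench build: `build.sh`, the `src/` and `dist/` layout, and the CSS contents are hypothetical stand-ins, and the `main` branch of the same repository plays the role of upstream. The point is only that hand edits and regenerated output live in separate commits on both sides.

```shell
set -eu
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
repo=$(mktemp -d)
cd "$repo"
git init -q -b main .   # -b needs Git 2.28 or later

mkdir src dist
echo "body { color: black; }" > src/base.css
echo "a { color: blue; }" > src/links.css
# Hypothetical build step: concatenate the sources onto a single line.
cat > build.sh <<'EOF'
cat src/*.css | tr -d '\n' > dist/site.min.css
EOF
sh build.sh
git add .
git commit -q -m "Sources and generated CSS"

# Fork side: the hand edit and the regenerated file go in separate commits.
git checkout -q -b fork
echo "a { color: purple; }" > src/links.css
git commit -q -am "Use our brand colour for links"
sh build.sh
git commit -q -am "Regenerate minified CSS"

# Upstream side (played by main): same discipline.
git checkout -q main
echo "body { color: #222; }" > src/base.css
git commit -q -am "Soften body text colour"
sh build.sh
git commit -q -am "Regenerate minified CSS"

# Sync: the sources merge cleanly, only the generated file conflicts.
git checkout -q fork
if ! git merge -q --no-edit main; then
  sh build.sh                  # regenerate from the merged sources
  git add dist/site.min.css    # mark the conflict as resolved
  git commit -q --no-edit      # conclude the merge
fi
```

The conflict in `dist/site.min.css` is never read or edited by hand: the regenerated file replaces it, and the merged sources decide what it contains.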

Alternatively, upstream can sometimes stop tracking auto-generated files in git,
and instead render them on the fly at build time.
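
A sketch of what that could look like, under the same hypothetical layout (a `dist/site.min.css` file generated from `src/`). This is not something the workbench currently does; it only shows the git side of the change.

```shell
set -eu
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
repo=$(mktemp -d)
cd "$repo"
git init -q -b main .   # -b needs Git 2.28 or later

mkdir src dist
echo "body { color: black; }" > src/base.css
cat src/*.css | tr -d '\n' > dist/site.min.css
git add .
git commit -q -m "Sources and generated CSS"

# Stop tracking the generated file (it stays on disk) and ignore it from now on.
git rm -q --cached dist/site.min.css
echo "dist/" > .gitignore
git add .gitignore
git commit -q -m "Stop tracking generated CSS"

# The file is now produced at build time and no longer shows up in git.
cat src/*.css | tr -d '\n' > dist/site.min.css
git status --porcelain
```

Once the generated file is out of the history, forks can no longer conflict on it, at the cost of requiring the build step wherever the file is needed.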
Member Author

TODO: Propose this in varnish and link from here


### Contributing back to upstream

The previous sections highlighted how much work maintaining a fork is.
Is it possible to integrate the features into upstream and avoid having to maintain a fork?

Forks can be a good way to pilot new features or approaches.
Member

Which can be then integrated back into the upstream?

Note that this is not always possible as upstream may have a restricted scope, or limited resources for maintenance.

Forks are sometimes intended from the start as short-lived.
They exist only until the corresponding pull request is merged upstream: the user cannot wait for the merge, especially if upstream is slow to review PRs, and uses the fork in the meantime.
Member

Suggested change
They only exist until the PR is merged when the user cannot wait for the PR to be merged, especially if upstream is slow to review PRs.
They only exist until the PR to upstream is merged when the user cannot wait for the PR to be merged, especially if upstream is slow to review PRs.

Member

needs some clarification

In this case, the fork is already contributing back to upstream.
Member

How, if it's not merged?


But in other cases, can forks that started as a way to customize a project, and then accumulated new features, contribute back to upstream?
Forks can indeed be a great way to pilot new features, and to show the value of these features to upstream.

When contributing back, each pull request should focus on a single feature or bug fix, rather than pushing all changes the fork has accumulated.
It can also build confidence to show that you've been using this feature for a while without issues.
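
One way to prepare such a focused pull request is to start a branch from upstream and bring over only the commit for that feature. The sketch below assumes the feature lives in a single commit on the fork; the repositories, the file names, and the `print-stylesheet` branch name are all invented for illustration.

```shell
set -eu
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
work=$(mktemp -d)

# Stand-in for the upstream repository (-b needs Git 2.28 or later).
git init -q -b main "$work/upstream"
cd "$work/upstream"
echo "# Project" > README.md
git add README.md
git commit -q -m "Initial commit"

# Stand-in for the fork, with one fork-only change and one shareable feature.
git clone -q "$work/upstream" "$work/fork"
cd "$work/fork"
git remote rename origin upstream
echo "Our logo" > branding.txt
git add branding.txt
git commit -q -m "Add our branding"
echo "print-friendly styles" > print.css
git add print.css
git commit -q -m "Add print stylesheet"
feature=$(git rev-parse HEAD)

# Branch from upstream and carry over only the feature commit.
git fetch -q upstream
git checkout -q -b print-stylesheet upstream/main
git cherry-pick "$feature" > /dev/null
```

The resulting branch differs from upstream by exactly one commit, which keeps the pull request small and easy to review. The branding commit stays on the fork.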

### It takes two to tango

Getting forks to contribute back is only possible if they can stay in touch with upstream.
Member

Suggested change
Getting forks to contribute back is only possible is they can stay in touch with upstream.
Getting forks to contribute back is only possible if they can stay in touch with upstream.

Member

maybe add a comment on value of communicating your intent up front with the upstream source? "I'd like to contribute back to upstream. Does the project have the capacity for this (merging PRs in a timely way). How could that work for you?"

Indeed, forks that have diverged too much from upstream will have a hard time contributing back and making nice, palatable pull requests.
In other words, upstream repositories should follow the steps outlined in the ["Simplifying syncs and reducing git conflicts"](#simplifying-syncs) section to ensure forks can more easily stay in sync.

This highlights the symbiotic relationship that can exist between forks and upstream repositories, when both parties are willing to make the effort.

## Conclusion

While sometimes presented as a fracture, forks are an integral part of the open-source ecosystem.
They offer a way to build on existing projects, to add extra features without increasing the maintenance burden of the upstream repository, and to pilot new features.
They can be hard to maintain, but it is also possible for the fork and upstream maintainers to work together to make the process smoother.
Member

Maybe at bottom have a call to action asking people to consider what projects they are planning that could be forked. Code, lessons, documentation. Do they envision diverging or contributing back? With this info in the post how might they approach it.