One of the coolest features of git, and of code hosting services in general (e.g. GitHub, Bitbucket, etc.), is the concept of forking a repository of code. The idea is that you can go to an open source project (e.g. https://github.com/mui/material-ui) and copy the entire project with one button (Fork) or one command (git clone git@github.com:mui/material-ui.git).
After forking a project, you are free to modify and expand upon it to your own ends:
- Maybe you use it as a starting point for a new project of your own
- Or perhaps instead you use it to learn best practices around a new language, including project structure and architecture
- Or possibly you just want to keep a copy of a codebase in case the original is removed
As long as you follow the LICENSE.md of a project, you are generally free to do what you wish, since allowing almost “free use” of code is core to the idea behind open source software.
But there’s a core piece of forking or cloning a project, specific to version control software like git, that makes this feature controversial: a fork copies over not just the code, but the contributions and their authors as well.
What’s controversial about that? Well, you can imagine an open source project to be a bit like a community: people have decided they agree with the project’s mission and have chosen to contribute to it, potentially over many years, putting a lot of time and effort into keeping the project great. Those contributions (and, to an extent, the discussions around them) are recorded in the project’s git history through its commits. These commits are a way for the code authors to say a few things:
- I’m showing that I’m the author of the code I contributed
- I’m (to an extent) signing off on the project’s mission
- I’m becoming part of the shared history of a project
But when a new user forks or clones a project, these contributions and their history are copied over as well. And this makes sense: why wouldn’t we want to give credit to the authors who have added to the codebase, and to be able to understand how the project evolved? Well, consider the following:
I’ve forked the material-ui repo and added a new commit in the fork: “WE HATE BROCCOLI”. Because projects are a bit like a community, and my commit appears alongside those of the project’s other authors, does this imply that the other authors accept “hating broccoli” as part of the project’s mission and community?
Of course not: the other authors have no idea about my forked version of the project, and even if they did, I’m sure they would not agree with what I wrote. But they also have no real say in the matter: there’s no way (from what I understand) for them to strip their usernames or authorship from my fork.
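Why can’t authorship simply be detached? A rough sketch helps: in git, the author line is part of the content that gets hashed to produce a commit’s ID, and each commit’s content also includes its parent’s ID. The Python below is a minimal sketch assuming git’s default SHA-1 object format and a simplified commit (no signature, author doubling as committer, a fixed +0000 timezone); the helper names, people, emails, and timestamps are all hypothetical. It shows that changing the author of one commit changes that commit’s ID and, through the parent link, the ID of every later commit, so removing an author means rewriting history, which the original authors cannot do to someone else’s fork.

```python
from __future__ import annotations

import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 object ID git computes for an object of this type and body."""
    # Git hashes a header "<type> <size-in-bytes>\0" followed by the raw body.
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: str | None, author: str, timestamp: int, message: str) -> bytes:
    """Build a simplified commit body (hypothetical helper: author doubles as committer, UTC only)."""
    lines = [f"tree {tree}"]
    if parent is not None:
        # The parent's ID is part of the hashed content, chaining commits together.
        lines.append(f"parent {parent}")
    # The author's name and email are hashed too, so they are baked into the commit ID.
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


def build_history(commits: list[tuple[str, int, str]], tree: str) -> list[str]:
    """Hash a linear list of (author, timestamp, message) commits; return their IDs in order."""
    ids: list[str] = []
    parent = None
    for author, timestamp, message in commits:
        parent = git_object_id("commit", commit_body(tree, parent, author, timestamp, message))
        ids.append(parent)
    return ids


# Hypothetical history: an original contributor's commit, then the fork's broccoli commit.
EMPTY_TREE = git_object_id("tree", b"")
original = [
    ("Original Author <author@example.com>", 1700000000, "Fix button styles"),
    ("Fork Owner <fork@example.com>", 1700003600, "WE HATE BROCCOLI"),
]
# The same history with the original author's name swapped out of the first commit.
stripped = [("Anonymous <anon@example.com>", 1700000000, "Fix button styles"), original[1]]

before = build_history(original, EMPTY_TREE)
after = build_history(stripped, EMPTY_TREE)

# Every ID differs, including the later commit whose own content was untouched.
print([b == a for b, a in zip(before, after)])  # → [False, False]
```

Note that the second commit was not edited at all, yet its ID still changed, because it records the first commit’s ID as its parent. That is why stripping an author from an existing history produces an entirely new set of commits rather than a quiet edit.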
In most cases, this won’t matter: no one will ever find or use my fork. But what if a fork or clone of a project becomes larger than the original? Imagine my fork became the largest project dedicated to hating broccoli on the web, and people started looking through its git history and saying things like “Oh, I didn’t realize [author from original repo] hates broccoli, that makes me think of them in a different light.” What recourse would that author have? From what I can tell, none.
Should we consider the right of these authors to disavow projects if they wish? If we do, would that break the core concepts of git and open source? Food for thought.