Since 2012, LLVM has relied on its self-hosted Phabricator instance (on Google Cloud Platform) for code review, but now it's making a transition to GitHub pull requests. In this post, I'll share my perspective on this switch, highlighting GitHub offers significant benefits in some areas while having major drawbacks in the review process.
I may update this article as the process stabilizes further.
Transition to GitHub pull requests
The move to GitHub pull requests has been a topic of discussion over the past few years. Several lengthy threads on the subject have emerged:
- November 2019 Enable Contributions Through Pull-request For LLVM
- January 2020 Phabricator -> GitHub PRs?
- November 2020 Notes from GitHub Pull Requests round table
- August 2021 Phabricator Creator Pulling the Plug
- June 2022 Update on GitHub pull requests
- November 2022 Pull Request Progress Update
This transition could very well be the most contentious infrastructure change in LLVM's history. If you have the patience to delve into the discussions within the mentioned threads, you'll come across numerous remarks critiquing GitHub pull requests as being subpar. I'd like to express my gratitude to Joerg Sonnenberger for sharing a post by Gregory Szorc titled Problems with Pull Requests and How to Fix Them, which made a great analysis comparing several code review tools.
Nevertheless, a decision has been made, and the targeted transition date was set for September 1, 2023. The negotiation during the decision-making process could have been handled more strategically to mitigate some confusion.
On September 1, 2023 (or 2nd on some parts of the world), the pull request lockdown was completely removed.
Accessibility
In general, I believe that the majority of contributors consider GitHub to offer better accessibility. However, I have heard quite a few opinions suggesting that GitHub's code review capability is significantly worse than that of Phabricator. I will delve into this topic further later in this post.
Having contributed patches to more than 200 projects, many of which are one-off and even trivial, I genuinely appreciate it when a project uses GitHub pull requests. This is because I am already familiar with the system. On the other hand, if a project relies on a self-hosted code review website, I might find it less convenient as I'm not particularly keen on registering a username on a website I may never visit again. Even worse, I might need to invest time in getting acquainted with the system if I haven't encountered similar instances before.
The same argument applies to LLVM's self-hosted Phabricator instance. Many contributors have not used Phabricator before and would consider both the website and the command-line tool (https://github.com/phacility/arcanist), to be challenging to use. In general, I believe that GitHub is more contributor-friendly but perhaps not as reviewer-friendly.
Automation
GitHub provides GitHub Apps and GitHub Actions to extend its functionality. With these, we can automate pull request labelling, testing, code analysis, code coverage, and potentially even a merge queue in the future. Some fantastic tooling is available for free and widely used by other open source projects, making it easy to emulate how other projects have set up automation.
Phabricator can also handle automation, but there are far fewer resources available for it. LLVM's self-hosted Phabricator instance, for instance, relies on https://github.com/google/llvm-premerge-checks and Buildkite.
Patch subscription
The llvm-project repository is vast. With a code frequency of 100+ commits every day, it's practically impossible for anyone to monitor every new commit. Nonetheless, many people wish to stay informed about changes to specific components, making patch subscription essential.
One way to achieve this is through mailing lists, such as llvm-commits. This list contains emails about new pull requests, edits, GitHub actions, labelling, resolved issues, and more, making it quite noisy.
The other method is to utilize the code review tool, formerly Phabricator and now GitHub. With Phabricator, users can set up fairly complex subscription rules known as Herald. When a patch title, description, affected files, or the acting user matches certain criteria, you can take actions like adding yourself as a reviewer/subscriber or sending a one-off email.
GitHub, however, is less flexible in this regard. Individual users can choose to watch all pull requests, but they can't do so selectively. Interestingly, users can watch issues with a specific label but not pull requests with a specific label.
To enable component-based subscription, the llvm organization on
GitHub has created multiple pr-subscribers-*
teams, which
users can freely join them. ( Note:
A maintainer is supposed to accept every join request. Unfortunately,
GitHub only displays the "pending request" information on the
https://github.com/orgs/llvm/teams/pr-subscribers-*
page
and not on any other page. It's not reasonable to expect a maintainer to
routinely check the
https://github.com/orgs/llvm/teams/pr-subscribers-*
pages
for pending join requests. So if they miss the email notification that
says would like to join "LLVM"
, the request may remain
pending indefinitely. )
Then we use label-based
subscription. A GitHub action is set up to label every pull
request, and these labels are used to notify
pr-subscribers-*
teams. For example, a pull request
affecting clang/lib/Driver/XX
will receive the labels
clang
and clang:driver
, and the
pr-subscribers-clang
and
pr-subscribers-clang:driver
teams will be notified.
.github/CODEOWNERS
Previously, we
added pr-subscribers-*
teams to
.github/CODEOWNERS
. Due to GitHub's CODEOWNERS
mechanism, the pr-subscribers-clang
team would be added as
a reviewer when a pull request affecting clang/xx
was
created. pr-subscribers-clang
members would receive an
email notification about the pull request with a full diff.
However, a complication arose when a member of the
pr-subscribers-*
team approved a change. It resulted in a
message saying
$user approved these changes on behalf of llvm/pr-subscribers-xx
,
which could be misleading if the user did not wish to assume such
authority. In addition, the team was automatically removoed as a team
reviewer, adding to the confusion. This use case wasn't in line with
GitHub's intended functionality, and there was a risk that GitHub's
future changes might disrupt our workflow.
Email filtering
Filtering is another crucial aspect of managing notifications. GitHub supports numerous notification reasons.
I am still in the process to make a curated list of Gmail filters. Here are some filters I currently use:
- Apply label "Mention":
from:([email protected]) to:([email protected])
- Apply label "Team mention":
from:([email protected]) to:([email protected])
- Apply label "Review request":
to:([email protected])
- Mark as read:
to:([email protected])
- Skip Inbox, Apply label "llvm/issues":
from:([email protected]) subject:(llvm-project "Issue #")
Code review
Patch evolution
In Phabricator, the body a differential (Phabricator's term for a
patch) contains is a patch file. The patch file is based on a specific
commit, but Phabricator is not required to know the base commit. A
stable identifier,
Differential Revision: https://reviews.llvm.org/Dxxxxx
, in
the commit message connects a patch file to a differential. When you
amend a patch, Phabricator recognizes that the differential has evolved
from patch X to patch Y. The user interface allows for comparisons
between any two revisions associated with a differential. Additionally,
review comments are confidently associated with the source line.
On the other hand, GitHub structures the concept of pull requests around branches and enforces a branch-centric workflow. A pull request centers on the difference (commits) between the base branch and the feature branch. GitHub does not employ a stable identifier for commit tracking. If commits are rebased, reordered, or combined, GitHub can easily become confused.
When you force-push a branch after a rebase, the user interface displays
a line such as "force-pushed the BB branch from X to Y". Clicking
the "compare" button in GitHub presents X..Y
, which
includes unrelated commits. Ideally, GitHub would show the difference
between the two patch files, as Phabricator does, but it only displays
the difference between the two head commits. These unrelated in-between
commits might be acceptable for projects with lower commit frequency but
can be challenging for a project with a code frequency of 100+ commits
every day.
The fidelity of preserving inline comments after a force push has always been a weakness. In the past, there was a notorious "lost inline comment" problem. Nowadays, the situation has improved, but some users still report that inline comments may occasionally become misplaced.
Due to the difficulties in comparing revisions and the lack of confidence in preserving inline comments, some recommendations suggest adopting less flexible and less preferred workflows, which involve only appending new commits and discouraging rebases. This approach sometimes results in a cluttered commit history, with commit messages like "fix typo," "add test," and "fix ci."
In a large repository, avoiding rebases may not be realistic due to other commits frequently modifying nearly identical lines. When working with both the latest main branch and the pull request branch, switching between branches results in numerous rebuilds.
Commit message
GitHub's repository setting allows three options for pull requests: "Allow merge commits", "Allow squash merging", and "Allow rebase merging".
The default effect is quite impactful.
- Many projects utilize merge commits when a linear history is preferred, even for just a single commit.
- When "Squash and merge" is selected, the default commit message becomes a concatenation of all commits, often resulting in a message filled with "fix typo," "add test," and "fix ci".
In 2022, GitHub finally introduced an option called "Pull request title and description" for the "Allow squash merging" option. This new option mitigates some problems.
Patch series
I am not well-versed in reviewing patch series on GitHub, but this is a widely acknowledged pain point. Numerous projects are exploring ways to alleviate this issue.
Miscellaneous
Pros
- Fetching a pull request is more straightforward.
git fetch origin pull/xx/head:local_branch
. Phabricator allows the less-known method:curl -L 'https://reviews.llvm.org/Dxxxxx?download=1' | patch -p1
@mention
applies to every user on GitHub, which is convenient when you need to seek for a user's opinions. You cannot expect every user to have an account on a self-hosted instance.gh
is more powerful thanarcanist
.
Cons
- Fetching a pull request at an old revision can be challenging.
- The narrow display can be somewhat inconvenient.
- The loss of side-by-side diff.
- Viewing diff and conversations requires switching between tabs.
- Collapsed comments are challenging to locate and often require multiple clicks, makeing it difficult for reviewers to confirm whether a comment has been resolved.
- One cannot comment on a source line a few lines out of the affected lines.
- The inability to search for lines in nearby context when they are not displayed.
I've noticed that my review productivity has decreased, a sentiment shared by many others. It's disheartening that alternative solutions haven't been thoroughly considered. However, we must move forward and adapt to the new workflow while striving to mitigate the loss in productivity.
I hope that future iterations of GitHub will incorporate some ideas from Pull Request feature requests for GitHub #56635.
I've voiced numerous concerns regarding GitHub pull requests, and for that, I apologize. It's essential to acknowledge that GitHub contributes significantly to open source projects in many positive ways. My intention in sharing these concerns is to express a genuine hope that GitHub pull requests can be enhanced to better support large projects.
I would also like to express my gratitude to Phorge, a community-driven fork of Phabricator, for their dedication and contributions, even as LLVM decided to migrate to the free but proprietary solution provided by GitHub. Phorge's commitment to providing alternatives and nurturing an open-source community for organizations that favor self-hosted solutions is truly commendable.
Future of reviews.llvm.org
On September 5, 2023, I added a red banner to reviews.llvm.org to discourage new patches.
Transitioning existing differentials to GitHub pull requests could potentially cause disruption. Resurrecting old patches is a practice that people regularly engage in. As per the current schedule, new differentials will be disallowed on October 1, 2023.
It is anticipated that at some point next year reviews.llvm.org
will become a read-only website. To preserve /Dxxxxx
pages (referenced by many commits), we can utilize phab-archive
from Mercurial.
As activities on Phabricator wind down, maintenance should become more lightweight.
In the past two weeks, there have been different IP addresses
crawling /source/llvm-github/
pages. It very like a botnet
as adjacent files are visited by IP addresses from very different
autonomous systems. Such a visit will cause Phabricator to spawn a
process like
git log --skip=0 -n 30 --pretty=format:%H:%P 988a16af929ece9453622ea256911cdfdf079d47 -- llvm/lib/Demangle/ItaniumDemangle.cpp
that takes a few seconds (llvm-project is huge). I have redirected some
pages to https://github.com/llvm/llvm-project/.
On September 8, 2023, I resized the database disk to 850GB to fix a full disk issue. Hopefully, we won't need to resize the disk again!