Open-source software increasingly occupies key positions in the software supply chain across many domains. More and more programmers take pride in contributing to well-known open-source projects and treat their contributions, and the titles they earn in open-source communities, as a prominent part of their resumes.
In this context, HR teams at technology-driven companies need to know how to find talent in open-source communities. If the company's business depends on open-source software, or if the company participates directly in open-source communities, HR also needs to understand the basic workflow by which open-source projects are developed.
Over the last three years, I have met many HR people who asked me how to solve these two problems. Today, most open-source projects host their code on GitHub. This article answers both questions based on what I have seen, starting from the capabilities GitHub offers and the common patterns of project collaboration on GitHub.
Finding Talent
Since GitHub doesn't have an explicit endorsement mechanism like LinkedIn's, HR can't simply filter by tags to find candidates who fit the requirements. The most effective approach is to go to the relevant repositories and look for talent among their active contributors.
The first step is to locate the appropriate repositories.
If the business team has identified relevant open-source software, such as Storm, Flink, or Spark Streaming, it is easy to find the corresponding repository with a Google search for "GitHub" plus the project name, restricting the results to the github.com domain.
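If you want to script this step, the query is easy to assemble. The sketch below is a minimal illustration, assuming Google's site: operator (which restricts results to a single domain); the function name google_repo_search_url is hypothetical, and typing site:github.com Flink into the search box by hand does exactly the same thing.

```python
from urllib.parse import urlencode


def google_repo_search_url(project: str) -> str:
    """Build a Google search URL that looks for `project` only on github.com.

    site: is Google's operator for restricting results to one domain.
    """
    query = f"site:github.com {project}"
    return "https://www.google.com/search?" + urlencode({"q": query})


# Projects named in the text; each line is a ready-to-open search URL.
for name in ["Storm", "Flink", "Spark Streaming"]:
    print(google_repo_search_url(name))
```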
Otherwise, if the business team provides only vague keywords, you need to discover the relevant repositories first. In general, it is better to go back to the business team and ask for candidate repositories. Even a single one is enough: starting from a reliable repository, you can follow the topic tags it has assigned to itself and click through to discover similar repositories.
For example, to recruit storage experts, if you know that FoundationDB is a relevant project, you can find a batch of related repositories through its tags: key-value-store, transactional, and distributed-database. From their contributors, you can then identify a group of experts.
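Topic tags can also be queried through GitHub's REST search API, which supports a topic: qualifier. The following is a rough sketch rather than a complete tool: the helper names are hypothetical, each tag is searched separately (several topic: qualifiers in one query must all match), and unauthenticated calls to the live API are heavily rate-limited, so the network call is left commented out.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://api.github.com/search/repositories"


def topic_search_url(topic: str, per_page: int = 30) -> str:
    """URL for GitHub's repository search filtered by one topic tag, most-starred first."""
    params = {"q": f"topic:{topic}", "sort": "stars", "order": "desc", "per_page": per_page}
    return API + "?" + urlencode(params)


def fetch_repo_names(url: str) -> list:
    """Call the live search API and return "owner/name" for each hit (needs network access)."""
    req = Request(url, headers={"Accept": "application/vnd.github+json"})
    with urlopen(req) as resp:
        data = json.load(resp)
    return [item["full_name"] for item in data["items"]]


# Tags taken from the FoundationDB example in the text.
for tag in ["key-value-store", "transactional", "distributed-database"]:
    print(topic_search_url(tag))
    # print(fetch_repo_names(topic_search_url(tag)))  # uncomment to query the live API
```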
If you need more precise or more complex filtering, you will need a data analysis team or an open-source community operations team to build analysis tools. For example, using ClickHouse's public GitHub events dataset, you can find the projects most closely related to Apache Flink by measuring how many committers they share with it.
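The underlying idea can be sketched in a few lines, independent of ClickHouse: rank other repositories by the number of committers they share with the target. This is a minimal sketch, assuming the events have already been exported as (actor, repository) pairs, for instance push events from such a dataset; the related_repos name and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def related_repos(events, target, top_n=5):
    """Rank repositories by how many committers they share with `target`.

    `events` is an iterable of (actor_login, repo_name) pairs.
    """
    committers = defaultdict(set)
    for actor, repo in events:
        committers[repo].add(actor)

    target_people = committers.get(target, set())
    overlap = Counter()
    for repo, people in committers.items():
        if repo == target:
            continue
        shared = len(people & target_people)
        if shared:
            overlap[repo] = shared
    return overlap.most_common(top_n)


# Hypothetical sample rows, not real data from the dataset.
events = [
    ("alice", "apache/flink"), ("bob", "apache/flink"), ("carol", "apache/flink"),
    ("alice", "apache/kafka"), ("bob", "apache/kafka"),
    ("carol", "apache/spark"),
    ("dave", "some/other-repo"),
]
print(related_repos(events, "apache/flink"))
```

In a real setup the grouping would run inside ClickHouse itself; the Python version only shows the logic.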
The second step is to discover active contributors from the repository's Contributors page.
For example, TiDB's Contributors page shows that @tiancaiamao, @coocood, @zimulala, and @crazycs520 have the most commits. GitHub's Contributors page shows at most 100 contributors. Although there are other ways to list everyone who has committed code, the top 100 are usually enough for selecting candidates.
The default ranking counts commits over the project's entire history. HR may care more about developers who are still active on the project, which you can filter by dragging the mouse over a time range on the timeline chart at the top of the Contributors page. Restricting the range to 2021 through the present shows that @coocood, whose focus has shifted, no longer appears in the top 100. And although I rank fourth, my individual curve shows that, unlike the top three, I have made no new commits in recent months.
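If you pull commit data yourself (for example through GitHub's REST list-commits endpoint, which accepts a since parameter), the same recency filter is a simple count. A minimal sketch, assuming the input is a list of (login, date) pairs; the function name and the data are hypothetical, not TiDB's actual history.

```python
from collections import Counter
from datetime import date


def top_recent_contributors(commits, since, top_n=3):
    """Count commits per author on or after `since` and return the most active.

    `commits` is an iterable of (login, commit_date) pairs.
    """
    counts = Counter(login for login, day in commits if day >= since)
    return counts.most_common(top_n)


# Hypothetical commit history.
commits = [
    ("dev-a", date(2019, 5, 1)), ("dev-a", date(2019, 6, 1)), ("dev-a", date(2019, 7, 1)),
    ("dev-b", date(2021, 3, 1)), ("dev-b", date(2022, 1, 10)),
    ("dev-c", date(2022, 2, 2)),
]
print(top_recent_contributors(commits, since=date(2021, 1, 1)))
```

Here dev-a, the most prolific author over the whole history, drops out once the window starts in 2021, which is the same effect the timeline selection produces on the Contributors page.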
The third step is to fully evaluate the selected candidates and find their contact information.
Now, we have found a group of qualified candidates. It's time to do some simple background research and find the candidates' contact information.
You can open a candidate's GitHub profile by clicking their avatar or username on the Contributors page, or by visiting https://github.com/ followed by the username.
For example, suppose you want to hire a C++ engineer familiar with distributed systems development, and you find @PragmaTwice through projects like Kvrocks or OneFlow. Let's check @PragmaTwice's profile.
- His bio reads "Open to graduate job opportunities!", which shows that he is a new graduate looking for a job.
- His profile is carefully put together (at least it is not blank). The business team leader can review it for further filtering.
- The green squares on his contribution graph show that he has been active on GitHub over the past year.
- Further down the page, the activity overview shows that 67% of his contributions are commits, code reviews and pull requests are around 15%, and issues are only 5%. This is a typical individual contributor who mostly pushes commits directly to repositories, so we cannot tell whether he has enough teamwork experience.
- Finally, look at the pinned projects; only projects the user has contributed to can be listed here. @PragmaTwice has been involved in developing two well-known projects, OneFlow and Kvrocks. His own projects, protopuf and proxinject, have also earned a decent number of stars. (Don't be misled by well-known projects: most open-source projects have very few stars. Open Collective's threshold is 100 stars, which is already a fairly effective filter.) You can click into these projects to check his actual participation and analyze further.
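When comparing many candidates, the two signals above (the mix of activity types and a star threshold for notable projects) can be tabulated with a short sketch. It assumes you have copied the counts and star numbers by hand or from the API; the function names, sample values, and repository names are hypothetical, and the 100-star default simply follows the Open Collective threshold mentioned in the text.

```python
def activity_shares(counts):
    """Convert raw activity counts into rounded percentage shares."""
    total = sum(counts.values())
    if total == 0:
        return {kind: 0 for kind in counts}
    return {kind: round(100 * n / total) for kind, n in counts.items()}


def notable_projects(pinned, min_stars=100):
    """Keep (name, stars) pinned repositories at or above the star threshold."""
    return [name for name, stars in pinned if stars >= min_stars]


# Hypothetical numbers for one candidate.
counts = {"commits": 60, "code_review": 20, "pull_requests": 15, "issues": 5}
pinned = [("example/big-project", 5000), ("example/side-project", 120), ("example/toy", 3)]

print(activity_shares(counts))
print(notable_projects(pinned))
```

GitHub already shows these percentages on each profile; the point of the sketch is only to put several candidates side by side.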
These are a few perspectives to consider. Evaluating a candidate is not a rigid formula; it often requires heuristically discovering the candidate's highlights. Generally speaking, you can form some judgments by looking at the projects the candidate has worked on and the specific tasks they have completed. Since a GitHub identity is only one facet of a candidate, I recommend that HR treat it as one way of uncovering strengths rather than inferring weaknesses from GitHub activity.
Let me share a few other common evaluation tools and perspectives. To see a candidate's full activity history, you can check their profile page and filter year by year, but it is easier to list all years at once by entering the username on the github-contributions site.
Take my own history as an example: I started getting involved in open-source communities in late 2017. After joining Pinduoduo.com in May 2020, I worked mostly on closed-source applications. After I joined PingCAP in early 2021, I started working on open-source software again, and my activity gradually recovered and eventually surpassed its earlier level. Of course, HR has no way of knowing such detailed personal job changes. Still, you can form hypotheses from changes in activity and verify them to outline the candidate's work experience.
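Such hypotheses can be made a little more systematic by summing contributions per year and flagging sharp changes. This is a minimal sketch: the input is assumed to be a mapping from dates to contribution counts, the 0.5 ratio is an arbitrary illustrative threshold, the function names are hypothetical, and the sample numbers are invented (loosely shaped like the pattern just described, not my real counts).

```python
from collections import defaultdict
from datetime import date


def yearly_totals(daily):
    """Sum a {date: contribution_count} mapping into {year: total}, sorted by year."""
    totals = defaultdict(int)
    for day, n in daily.items():
        totals[day.year] += n
    return dict(sorted(totals.items()))


def notable_changes(totals, ratio=0.5):
    """Flag years whose total fell below `ratio` x, or rose above 1/`ratio` x, the previous year."""
    years = sorted(totals)
    changes = []
    for prev, cur in zip(years, years[1:]):
        before, after = totals[prev], totals[cur]
        if after < before * ratio:
            changes.append((cur, "drop"))
        elif before < after * ratio:
            changes.append((cur, "rise"))
    return changes


# Hypothetical daily data, one entry per year for brevity.
daily = {date(2018, 3, 1): 400, date(2019, 6, 1): 500, date(2020, 7, 1): 100, date(2021, 8, 1): 600}
totals = yearly_totals(daily)
print(totals)
print(notable_changes(totals))
```

A flagged year is a prompt for a question in the interview, not a conclusion.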
Another issue is judging the quality of a candidate's contributions.
If you want to see a GitHub user's activity across all repositories, you can filter it with GitHub's search page.
This page offers rich search functionality. Besides exploring it directly in the UI, you can find more filters in the GitHub search documentation, so I won't go into detail here.
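The same kind of query can also be run against GitHub's REST search API, whose /search/issues endpoint covers both issues and pull requests. A minimal sketch, assuming the author:, reviewed-by:, is:pr, and is:issue qualifiers from GitHub's search syntax; the helper names are hypothetical, and the live call is left commented out because unauthenticated search requests are rate-limited.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SEARCH_ISSUES = "https://api.github.com/search/issues"


def user_activity_queries(login):
    """Search queries covering a user's activity across all public repositories."""
    return {
        "prs_authored": f"is:pr author:{login}",
        "issues_opened": f"is:issue author:{login}",
        "prs_reviewed": f"is:pr reviewed-by:{login}",
    }


def search_url(query, per_page=50):
    """URL for the issues/PR search endpoint with the given query string."""
    return SEARCH_ISSUES + "?" + urlencode({"q": query, "per_page": per_page})


def count_results(query):
    """Ask the live API how many items match (needs network access)."""
    req = Request(search_url(query, per_page=1), headers={"Accept": "application/vnd.github+json"})
    with urlopen(req) as resp:
        return json.load(resp)["total_count"]


for name, q in user_activity_queries("tisonkun").items():
    print(name, search_url(q))
    # print(name, count_results(q))  # uncomment to query the live API
```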
If you want to look specifically at a GitHub user's involvement in a particular repository, you can use links like the following, replacing the username accordingly.
- https://github.com/apache/pulsar/commits?author=tisonkun
- https://github.com/apache/pulsar/issues/created_by/tisonkun
- https://github.com/apache/pulsar/pulls/created_by/tisonkun
These links show the user's commits, issues, and PRs in that repository. Again, richer filters are available in the UI and in the GitHub search documentation, most notably the involves and reviewed-by qualifiers, which show the issues the user participated in and the patches the user reviewed, respectively.
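To avoid editing URLs by hand for every candidate, the links above, plus repository-scoped versions of the involves and reviewed-by filters, can be generated. This sketch simply reproduces the URL patterns listed in the text; the function name is hypothetical, and the q= parameter on a repository's issues and pulls pages corresponds to the search box you see in the UI.

```python
from urllib.parse import quote, urlencode


def repo_activity_links(repo, login):
    """Links for one user's activity in one repository given as "owner/name"."""
    base = f"https://github.com/{repo}"
    user = quote(login)
    return {
        "commits": f"{base}/commits?author={user}",
        "issues_created": f"{base}/issues/created_by/{user}",
        "prs_created": f"{base}/pulls/created_by/{user}",
        # The search qualifiers mentioned in the text, applied to this repository.
        "involved_in": f"{base}/issues?" + urlencode({"q": f"involves:{login}"}),
        "reviewed": f"{base}/pulls?" + urlencode({"q": f"is:pr reviewed-by:{login}"}),
    }


for kind, url in repo_activity_links("apache/pulsar", "tisonkun").items():
    print(kind, url)
```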
Depending on HR's familiarity with the field, these specific activities can help judge candidates further. For example, if a user's only commit to a repository is a typo fix, their involvement there is minimal, and the project's reputation says little about the candidate.
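A rough first-pass screen for this can be automated. The sketch below is a heuristic under stated assumptions, not a quality metric: it assumes you have each commit's message and its added/deleted line counts (GitHub's single-commit API reports these under stats), and the keyword list, five-line cutoff, function names, and sample commits are all hypothetical, illustrative choices.

```python
# Hypothetical keyword list and size cutoff; tune them for the field you hire in.
TRIVIAL_WORDS = ("typo", "spelling", "whitespace", "formatting")


def looks_trivial(message, additions, deletions, max_lines=5):
    """True for a tiny diff whose message suggests a cosmetic fix."""
    small = additions + deletions <= max_lines
    cosmetic = any(word in message.lower() for word in TRIVIAL_WORDS)
    return small and cosmetic


def substantive_share(commits):
    """Fraction of (message, additions, deletions) commits that do not look trivial."""
    if not commits:
        return 0.0
    substantive = sum(1 for c in commits if not looks_trivial(*c))
    return substantive / len(commits)


commits = [
    ("Fix typo in README", 1, 1),
    ("Implement snapshot reads for the storage layer", 420, 35),
    ("docs: fix spelling", 2, 2),
]
print(substantive_share(commits))
```

A human still needs to read the commits that matter; the heuristic only helps decide which ones to open first.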
Finally, HR needs to find the candidate's contact information to send an offer.
Some users (including me) show their personal email and Twitter account directly on their profile page, so you can easily send them invitations.
In addition, users like @PragmaTwice who are actively seeking employment often display their contact information prominently. His profile page includes this code block:
echo -n "My email address: " && echo QkVzzAyYQ0kMoVEH0mihz7zDbk6aalkDYvfnW1OaccM= | openssl enc -d -base64 | openssl enc -d -aes-128-cbc -iv 205731624 -K 230549126 2>/dev/null
Running the command prints:
My email address: twice@apache.org
This is a typical way for programmers to introduce themselves: it shows off their hacker spirit and also avoids harassment from bots that simply crawl personal information to send emails.
For users who don't list a public email address, you'll need to dig a little deeper. I do this by cloning a repository the candidate has committed to and using the git log command to view the author information on the candidate's commits. For example, one of my commits reads:
commit 9053519c0b81b765919aad9a9695910580586ea1 (origin/main, origin/HEAD)
Author: tison <wander4096@gmail.com>
Date: Sun Aug 21 23:07:49 2022 +0800
post over-communication
Signed-off-by: tison <wander4096@gmail.com>
My email address is included.
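With many commits, it is quicker to extract identities programmatically (on the command line, git log --author=tison --format='%an <%ae>' | sort -u gives a similar list). The sketch below parses plain git log output, assuming the default format with Author: lines and Signed-off-by: or Co-authored-by: trailers; the function name is hypothetical, and the sample is the commit quoted above.

```python
import re

# Matches "Author: name <email>" and trailers such as "Signed-off-by: name <email>".
IDENT = re.compile(r"^\s*(Author|Signed-off-by|Co-authored-by):\s*(.+?)\s*<([^>]+)>", re.M)


def emails_by_name(log_text):
    """Collect every email address attached to each name in plain git log output."""
    found = {}
    for _role, name, email in IDENT.findall(log_text):
        found.setdefault(name, set()).add(email)
    return found


sample = """commit 9053519c0b81b765919aad9a9695910580586ea1 (origin/main, origin/HEAD)
Author: tison <wander4096@gmail.com>
Date:   Sun Aug 21 23:07:49 2022 +0800

    post over-communication

    Signed-off-by: tison <wander4096@gmail.com>
"""
print(emails_by_name(sample))  # {'tison': {'wander4096@gmail.com'}}
```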
Since the author name and email recorded in git log are not necessarily related to the GitHub username, it takes some insight to match a candidate to the author of a specific commit.
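One aid for this correlation: GitHub's REST list-commits response includes both the raw git author (commit.author.email) and, when GitHub can link that email to an account, the user object (author.login). In addition, GitHub's no-reply addresses embed the login directly. The sketch below combines the two, assuming commit objects shaped like that API response; the function names and sample records are hypothetical.

```python
import re

# GitHub no-reply addresses: "ID+login@users.noreply.github.com" (newer)
# or "login@users.noreply.github.com" (older).
NOREPLY = re.compile(r"^(?:\d+\+)?([A-Za-z0-9-]+)@users\.noreply\.github\.com$")


def login_from_noreply(email):
    """Return the login embedded in a GitHub no-reply address, else None."""
    m = NOREPLY.match(email)
    return m.group(1) if m else None


def email_to_login(api_commits):
    """Map git author emails to GitHub logins from list-commits style objects."""
    mapping = {}
    for c in api_commits:
        email = c["commit"]["author"]["email"]
        user = c.get("author")  # None when GitHub cannot link the email to an account
        login = user["login"] if user else login_from_noreply(email)
        if login:
            mapping[email] = login
    return mapping


# Hypothetical records shaped like the API response.
sample = [
    {"commit": {"author": {"email": "dev@example.com"}}, "author": {"login": "dev-login"}},
    {"commit": {"author": {"email": "12345+someone@users.noreply.github.com"}}, "author": None},
    {"commit": {"author": {"email": "unknown@example.com"}}, "author": None},
]
print(email_to_login(sample))
```

Emails that map to nothing (like the last record) still need the manual detective work described above.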
Understanding Project Management and Development Workflow
(The original section is too long to translate here. Please read it at https://www.tisonkun.org/2022/08/22/github-for-hrs/ with the help of Google Translate.)