Python developers are the most giving
GitHub, as you probably know, has quickly become the main (or, at least, the most well-known) repository for open source software. It currently hosts millions of code repositories, written in hundreds of programming languages and supported by several million users. The sheer volume of activity on GitHub reflects the growing popularity of open source software and the commitment of many people to working together to improve code across many different programming languages.
But I wondered recently whether some programming languages tend to generate more open source contributions than others. Thanks to Google BigQuery, I was able to poke around the raw GitHub Archive data myself to look into this further. Specifically, to try to quantify this, I looked at the average number of pull requests opened per GitHub repository, by programming language. I thought that would be a good (but certainly not perfect) proxy for measuring the number of contributions (or attempted contributions) to a code base by someone other than the repository owner.
First, let me present the results.
Image credit: ITworld/Phil Johnson
Now, here's my methodology
As mentioned, I queried the GitHub Archive using Google BigQuery. At the time, the archive covered (roughly) GitHub activity from March 11, 2011 through March 14, 2014.
First, I queried the number of (non-forked) repositories per programming language using the following:
SELECT repository_language, COUNT(DISTINCT repository_url) AS cnt
FROM [githubarchive:github.timeline]
WHERE repository_fork == "false"
GROUP BY repository_language;
This gave me results for 150 programming languages covering over 4.1 million repositories (I ignored repositories with no programming language specified), for an average of 27,473 repositories per language.
I then queried the number of pull requests opened per programming language using the following:
SELECT repository_language, COUNT(*) AS cnt
FROM [githubarchive:github.timeline]
WHERE repository_fork == "false"
AND type = "PullRequestEvent" AND payload_action = "opened"
GROUP BY repository_language;
Again ignoring repositories without a programming language specified, this gave me a total of just under 2.8 million pull requests across the 150 programming languages, for an average of 20,567 pull requests per language.
Overall, for the 4.1 million repositories with a programming language specified, the average number of pull requests opened per repository was .67.
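For readers who want to reproduce the last step, here is a minimal sketch in Python of how the two query results could be combined into pull requests per repository. It assumes each result set has been loaded into a dictionary keyed by language; the function name `pulls_per_repo` is hypothetical, and the sample counts are invented purely for illustration (they are not the actual query output).

```python
def pulls_per_repo(repo_counts, pull_counts):
    """Combine per-language repository and pull request counts.

    Returns a (ratios, overall) tuple: ratios maps each language to its
    pull requests per repository, and overall is the same ratio computed
    across all included languages.
    """
    ratios = {}
    for language, repos in repo_counts.items():
        # Skip rows with no language specified (as the article does)
        # and guard against division by zero.
        if not language or repos == 0:
            continue
        ratios[language] = pull_counts.get(language, 0) / repos

    total_repos = sum(repo_counts[lang] for lang in ratios)
    total_pulls = sum(pull_counts.get(lang, 0) for lang in ratios)
    overall = total_pulls / total_repos if total_repos else 0.0
    return ratios, overall


# Invented sample counts, NOT real GitHub Archive data.
sample_repos = {"Python": 1000, "Java": 2000, None: 500}
sample_pulls = {"Python": 940, "Java": 1160, None: 300}

ratios, overall = pulls_per_repo(sample_repos, sample_pulls)

# Print languages from most to fewest pull requests per repository.
for language, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{language}: {ratio:.2f}")
print(f"Overall: {overall:.2f}")
```

Note that the overall figure here divides total pull requests by total repositories, so it is weighted by repository count rather than being a simple mean of the per-language ratios.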
From the results, then, we see that Python repositories generate the most pull requests, on average, with .94 per repository. It's interesting that, by this measure, the Python community is the most giving, even though the language currently ranks only eighth in popularity in the latest TIOBE rankings. So, the Python community may be smaller than, say, the Java community (#2 on the TIOBE list), but this suggests it's a tighter group.
After Python, we see that PHP (.83), CoffeeScript (.78), JavaScript (.74) and C++ (.74) also generate an above-average number of code contributions. Of these languages, C++ has the highest TIOBE ranking (#4). The top 3 languages on the current TIOBE list all scored below average on the number of pull requests: C (.65 pull requests/repository, #1 on TIOBE), Java (.58, #2) and Objective-C (.41, #3).
Does this really mean that the Python community is more helpful than other language communities? Not necessarily, of course. Using the number of GitHub pull requests as a proxy for measuring outside (non-repository owner) contributions is far from perfect. Pull requests can come from outsiders forking and updating a repository, or from other project owners working on the same project but using a shared repository model. Maybe Python developers are simply more likely to use shared repositories and pull requests as a development methodology.
Still, I think the results are interesting and suggest that, if you want to choose a programming language for a project that has a large, active and helpful community of developers behind it, you could do a lot worse than Python.