Crawling the top 15,000 Drupal websites: 2016 edition
This is an update to last year's blog post.
The source was the same as last time: the top 1,000,000 websites from Alexa, dumped on the 4th of January 2016.
Out of those websites, I was able to recognize 217,000 as running one of five of the most popular CMSs. I used a PHP library which detects CMSs. It's my project on GitHub, so feel free to contribute!
Why did I do this? I was curious about the market shares of the top CMSs. But my main reason was to look into Drupal versions, to see how many sites are kept updated and how many are still vulnerable to the Drupalgeddon bug.
CMS Market shares
This shouldn't come as a surprise, but people STILL really love WordPress! Its install base has increased massively from last year. The sad news is that Drupal installs actually decreased a bit, from 14.5k to 13.3k. This year I had vBulletin and Liferay as newcomers in the chart (Liferay is hard to see, I know, but the number is 267).
Most popular Drupal versions
My script recognized 13,279 different Drupal websites running 60 different versions. Here are the 5 most popular versions.
From this we can gather that people are really eager to update to the latest 7.x version, which is great! I was expecting far more older versions. You can also see that there are still quite a lot of 6.x sites around.
Latest Drupal versions
The 7.41 version dominates, which is great.
Vulnerable to Drupalgeddon
More than a year later, are sites still vulnerable to the Drupalgeddon bug?
Yep, around 10% still, that's a thousand websites!
Please note: the Drupal version is not the most reliable way of determining vulnerability. You can patch Drupal against the Drupalgeddon bug without updating the version number, so some older Drupal websites could still be protected against it.
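As a rough illustration of a version-only check, here is a minimal Python sketch. It is not the script used for this post. It relies on the fact that Drupalgeddon (SA-CORE-2014-005) was fixed in Drupal 7.32; the function name is my own, and as noted above a patched site can still report an older version, so a hit only means "possibly vulnerable".

```python
import re

# Drupalgeddon (SA-CORE-2014-005) affects Drupal 7.x releases before 7.32.
FIRST_FIXED_7X_MINOR = 32


def is_drupalgeddon_candidate(version: str) -> bool:
    """Return True if `version` is a plain 7.x release older than 7.32.

    Hypothetical helper: it only looks at the reported version number, so
    manually patched sites are still flagged, and anything that is not a
    plain "7.N" string (6.x, 8.x, dev builds) returns False.
    """
    match = re.fullmatch(r"7\.(\d+)", version.strip())
    if match is None:
        return False
    return int(match.group(1)) < FIRST_FIXED_7X_MINOR
```

With this, "7.31" is flagged while "7.32", "7.41" and "6.37" are not.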
Other findings
Only one website was running 5.x, but five were running 4.x!
Really interestingly, no 8.x versions were found. I was expecting at least a couple of them, honestly. I know that the file structure has changed a bit, so I made that change to my script, but still no matches. Let's see what the situation is in a year.
There were quite a lot of websites, over 3,000, where I couldn't determine the Drupal version. This is mostly because their CHANGELOG.txt was not crawlable. I could still recognize them as Drupal though, usually from headers or meta tags.
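To give an idea of what recognizing Drupal from headers or meta tags can look like, here is a small Python sketch. It is an assumption on my part about which signals are useful, not a description of what the PHP library checks: it looks at the `X-Generator` and `X-Drupal-Cache` response headers and the generator meta tag that Drupal 7 sites commonly emit. The function name and the regular expression are hypothetical.

```python
import re
from typing import Mapping

# Matches <meta name="Generator" content="..."> with name before content.
# Assumption: other attribute orders are simply ignored in this sketch.
META_GENERATOR = re.compile(
    r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']([^"\']*)["\']',
    re.IGNORECASE,
)


def looks_like_drupal(headers: Mapping[str, str], html: str) -> bool:
    """Guess whether a response came from Drupal, without a version."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "drupal" in lowered.get("x-generator", "").lower():
        return True
    if "x-drupal-cache" in lowered:
        return True
    match = META_GENERATOR.search(html)
    return match is not None and "drupal" in match.group(1).lower()
```

Sites can strip these headers and tags, so a negative result here does not prove a site is not Drupal.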
What's the most popular Drupal website? Just after I did my blog post last year, Weather.com was relaunched as a Drupal website and some people speculated that it would be the most popular one. And sure enough, it is. The top five are:
It took my home server a week to crawl through all the websites and a day to determine the Drupal versions.
CSV Download
Last time people were interested in a downloadable file of the top Drupal websites, so here they are:
Comments
"no 8.x versions were found" ... isn't www.drupal.org running on D8?
At least mine are Drupal 8 versions.
Nope, they updated it to Drupal 7 a while ago. https://www.drupal.org/CHANGELOG.txt
Really interesting analysis and thanks for sharing it.
I am curious about something else. Would the market share change significantly if you consider only the top 10000 sites, or the top 100K?
I've been running some stats on your data, and the Drupal share of total sites (that's the only thing I can get from the CSV file) does change: it actually doubles if you compare the top 1000 sites against the top 100K (0.9% v 1.73%).
My hypothesis would be that Drupal could have a bigger share when considering only the more "important" sites. I would appreciate your help in either confirming or discarding it.
Thanks.
Hi,
Interesting there isn't any site on D8...
Could you share the version of Drupal detected in the CSV?
Sorry, for security reasons I don't want to share the versions. They are pretty easy to determine, though, usually from CHANGELOG.txt.
I'm seeing a JS error on the page and the charts aren't loading?
Thanks for reporting this! Syntaxhighlighter module does that from time to time, it's a pain.
Hello Kristian,
A very interesting post. Thanks for sharing.
I noticed that my website didn't come up on the list. Any ideas? The website is hosted on Pantheon.
http://paulbooker.co.uk/CHANGELOG.txt
Best, Paul
Hey,
I checked my database, looks like your website wasn't in the top 1 million websites!
Hello Kristian,
Thanks for sharing this work.
I am working on something very similar for my college project. I was able to get a working script for detecting whether a site uses Drupal or not.
Can you help by telling how you automated finding the version number?
Hey,
Thanks for your comment!
The easiest way to find the version number is to look for the CHANGELOG.txt file.
In Drupal 6 / 7, it's at /CHANGELOG.txt, line 2, so for example http://vaiste.com/CHANGELOG.txt
In Drupal 8, it's at /core/CHANGELOG.txt, line 1, so for example http://druid.fi/core/CHANGELOG.txt
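The lookup described above could be sketched in Python roughly like this. It is a minimal sketch, not my actual crawler: the function names are made up for illustration, it assumes the version line looks like "Drupal 7.41, 2015-10-21", and it scans the first few lines rather than exactly line 1 or 2 to tolerate small differences.

```python
import re
import urllib.request
from typing import Optional

# Drupal 6/7 keep the file in the web root, Drupal 8 under /core.
CHANGELOG_PATHS = ("/CHANGELOG.txt", "/core/CHANGELOG.txt")

# Assumed line format: "Drupal 7.41, 2015-10-21" or "Drupal 8.0.0, 2015-11-19".
VERSION_LINE = re.compile(r"^Drupal (\d+\.\d+(?:\.\d+)?)\b")


def parse_drupal_version(changelog_text: str) -> Optional[str]:
    """Return the first release number found near the top of the file."""
    for line in changelog_text.splitlines()[:5]:
        match = VERSION_LINE.match(line.strip())
        if match:
            return match.group(1)
    return None


def fetch_drupal_version(base_url: str, timeout: float = 10.0) -> Optional[str]:
    """Try each known CHANGELOG.txt location and parse the first hit."""
    for path in CHANGELOG_PATHS:
        try:
            with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=timeout) as response:
                text = response.read().decode("utf-8", errors="replace")
        except OSError:
            # Covers HTTP errors, DNS failures and timeouts: try the next path.
            continue
        version = parse_drupal_version(text)
        if version is not None:
            return version
    return None
```

Calling `fetch_drupal_version("http://vaiste.com")` would request the two paths in order; sites that block or remove CHANGELOG.txt simply return `None`.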
Weather.com good
Hello Kristian,
Any plans to generate another list of top Drupal websites? Would be interesting to see where Drupal is 5 years later.
Best, Paul