Crawling the top 15,000 Drupal websites: 2016 edition
Kristian Polso • January 18, 2016
This is an update to last year's blog post.
The source was the same as last time: 1,000,000 websites from Alexa, dumped on 4 January 2016.
Out of those websites, I was able to recognize 217,000 as running one of five of the most popular CMSs. I used a PHP library which detects CMSs. It's my project on GitHub, so feel free to contribute!
Why did I do this? I was curious about the market shares of the top CMSs. But my main reason was to look into Drupal versions, to see how many sites are kept updated and how many are still vulnerable to the Drupalgeddon bug.
CMS Market shares
This shouldn't come as a surprise, but people STILL really love WordPress! Its install base has increased massively from last year. The sad news is that Drupal installs actually decreased a bit, from 14.5k to 13.3k. This year I had vBulletin and Liferay as newcomers in the chart (Liferay is hard to see, I know, but the number is 267).
Most popular Drupal versions
My script recognized 13,279 different Drupal websites running 60 different versions. Here are the 5 most popular versions.
From this we can gather that people are really eager to update to the latest 7.x version, which is great! I was expecting far more sites on older versions. You can also see that there are still quite a lot of 6.x sites around.
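A ranking like this boils down to a simple tally. Here is a rough Python sketch of how such a count could be done; it is not the script used for this post, and the function name `top_versions` and the sample version list are made up for illustration.

```python
from collections import Counter


def top_versions(versions, n=5):
    """Return the n most common version strings with their counts."""
    return Counter(versions).most_common(n)


# Hypothetical sample, NOT the crawl results from this post.
sample = ["7.41", "7.41", "7.41", "6.37", "6.37", "7.39"]

print(len(set(sample)))         # number of distinct versions seen
print(top_versions(sample, 2))  # the two most common versions
```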
Latest Drupal versions
The 7.41 version dominates, which is great.
Vulnerable to Drupalgeddon
Well over a year later, are sites still vulnerable to the Drupalgeddon bug?
Yep, around 10% still, that's a thousand websites!
Please note: the Drupal version number is not the best way of determining vulnerability. You can patch your Drupal against the Drupalgeddon bug without updating the version number, so some older Drupal websites could still be protected against the bug.
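For illustration, a version-based check could look roughly like the Python sketch below. It assumes that Drupalgeddon (SA-CORE-2014-005) affects Drupal 7.x releases before 7.32 and treats everything else as not affected. The function name `looks_vulnerable` is hypothetical, and as noted above, a patched site would still be flagged by a check like this.

```python
import re

# Assumption: the fix for Drupalgeddon (SA-CORE-2014-005) shipped in 7.32,
# so only 7.x versions below that are flagged.
FIRST_FIXED_7X_MINOR = 32


def looks_vulnerable(version: str) -> bool:
    """Return True if the reported version predates the Drupalgeddon fix.

    This only looks at the version number; a manually patched site on an
    old version will still be reported as vulnerable.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        # Unknown or dev-style version strings: make no claim.
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return major == 7 and minor < FIRST_FIXED_7X_MINOR
```

With this rule, `looks_vulnerable("7.31")` is true, while `"7.32"`, `"7.41"` and `"6.37"` are not flagged.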
Other findings
Only one website was running 5.x, but five were running 4.x!
Really interestingly, no 8.x versions were found. I was expecting at least a couple of them, honestly. I know that the file structure has changed a bit, so I made that change to my script, but still no matches. Let's see what the situation is in a year.
There were quite a lot of websites, over 3,000, where I couldn't determine the Drupal version. This is mostly because they did not have their CHANGELOG.txt crawlable. I could recognize them as Drupal though, usually due to headers or meta tags.
What's the most popular Drupal website? Just after I did my blog post last year, Weather.com was relaunched as a Drupal website and some people speculated that it would be the most popular Drupal website. And sure enough, it is. The top five are:
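The two kinds of checks mentioned here can be sketched in Python as follows. This is a simplified illustration, not the PHP library's actual logic. It assumes that CHANGELOG.txt lists the newest release first in the form `Drupal 7.41, 2015-10-15`, and that a Drupal site may send an `X-Generator` or `X-Drupal-Cache` header or include a generator meta tag. The function names are made up for this example.

```python
import re
from typing import Mapping, Optional

# First "Drupal X.Y" line in the changelog, e.g. "Drupal 7.41, 2015-10-15".
CHANGELOG_VERSION = re.compile(r"^Drupal (\d+\.\d+)\b", re.MULTILINE)

# e.g. <meta name="Generator" content="Drupal 7 (http://drupal.org)" />
# Simplification: only matches when name= comes before content=.
META_GENERATOR = re.compile(
    r"<meta[^>]+name=[\"']generator[\"'][^>]+content=[\"']Drupal\b",
    re.IGNORECASE,
)


def version_from_changelog(changelog_text: str) -> Optional[str]:
    """Return the first version found in CHANGELOG.txt, or None."""
    match = CHANGELOG_VERSION.search(changelog_text)
    return match.group(1) if match else None


def looks_like_drupal(headers: Mapping[str, str], html: str) -> bool:
    """Guess whether a response comes from Drupal, without the changelog."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("x-generator", "").startswith("Drupal"):
        return True
    if "x-drupal-cache" in lowered:
        return True
    return META_GENERATOR.search(html) is not None
```

A site that blocks CHANGELOG.txt but still sends `X-Generator: Drupal 7 (http://drupal.org)` would be recognized as Drupal by `looks_like_drupal`, while `version_from_changelog` has nothing to work with, which matches the "Drupal, version unknown" group above.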
It took my home server a week to crawl through all the websites and a day to determine the Drupal versions.
CSV Download
Last time people were interested in a downloadable file of the top Drupal websites, so here they are: