Crawling the top 15,000 Drupal websites: 2016 edition

This is an update to last year's blog post.

The source was the same as last time: the top 1,000,000 websites from Alexa, dumped on 4 January 2016.

Of those websites, I was able to identify 217,000 as running one of the five most popular CMSs. I used a PHP library for detecting CMSs; it's my own project on GitHub, so feel free to contribute!

Why did I do this? I was curious about the market shares of the top CMSs, but my main reason was to look into Drupal versions: to see how many sites are kept up to date and how many are still vulnerable to the Drupalgeddon bug.

CMS Market shares

This shouldn't come as a surprise, but people STILL really love WordPress! Its install base has grown massively since last year. The sad news is that Drupal installs have actually decreased a bit, from 14.5k to 13.3k. This year vBulletin and Liferay are newcomers in the chart (Liferay is hard to see, I know, but the number is 267).

 

Most popular Drupal versions

My script recognized 13,279 different Drupal websites running 60 different versions. Here are the 5 most popular versions.

From this we can see that people are really eager to update to the latest 7.x release, which is great! I was expecting far more sites on older versions. You can also see that there are still quite a lot of 6.x sites around.
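A top-five list like this boils down to a frequency count over the detected versions. Here is a minimal Python sketch, assuming a hypothetical `detected` dict that maps each domain to its version string (or `None` when no version could be found). The function names are illustrative and not part of the CMS detection library mentioned above.

```python
from collections import Counter


def top_versions(detected, n=5):
    """Return the n most common exact versions as [(version, count), ...]."""
    # Sites without a detected version (None or "") are skipped.
    counts = Counter(version for version in detected.values() if version)
    return counts.most_common(n)


def count_branches(detected):
    """Group detected versions by major branch, e.g. '6.37' -> '6.x'."""
    return Counter(
        version.split(".")[0] + ".x"
        for version in detected.values()
        if version
    )
```

Counting exact versions and major branches separately makes it easy to report both the dominant release and how many sites are still on a legacy branch such as 6.x.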

Latest Drupal versions

 

Version 7.41 dominates, which is great.

Vulnerable to Drupalgeddon

Well over a year later, are sites still vulnerable to the Drupalgeddon bug?

 

Yep, still around 10%. That's about a thousand websites!

Please note: the Drupal version number is not a reliable way of determining vulnerability. You can patch Drupal against the Drupalgeddon bug without the version number changing, so some sites reporting older versions could still be protected.
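For reference, a version-only check looks roughly like this. It is a minimal sketch assuming the affected range from the Drupalgeddon advisory (SA-CORE-2014-005): Drupal 7.x releases before 7.32, with Drupal 6 and 8 treated as unaffected. The function names are hypothetical, and because of the patching caveat above, a `True` result only means "possibly vulnerable".

```python
import re


def maybe_vulnerable_to_drupalgeddon(version):
    """Version-only heuristic for SA-CORE-2014-005 ("Drupalgeddon").

    Returns True for Drupal 7.x below 7.32, False for other versions, and None
    when the version string cannot be parsed. True means "possibly vulnerable":
    a patched 7.31 site still reports 7.31.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return None
    major, minor = int(match.group(1)), int(match.group(2))
    return major == 7 and minor < 32


def possibly_vulnerable_share(versions):
    """Share of parseable versions that fall in the affected range."""
    verdicts = [maybe_vulnerable_to_drupalgeddon(v) for v in versions]
    known = [v for v in verdicts if v is not None]
    return sum(known) / len(known) if known else 0.0
```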

Other findings

Only one website was running 5.x, but five were running 4.x!

Interestingly, no 8.x versions were found. Honestly, I was expecting at least a couple. I knew the file structure had changed a bit in Drupal 8, so I updated my script accordingly, but still no matches. Let's see what the situation is in a year.

For quite a lot of websites, over 3,000, I couldn't determine the Drupal version. This is mostly because their CHANGELOG.txt was not publicly accessible. I could still recognize them as Drupal, usually from HTTP headers or meta tags.
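A header and meta tag check might look like the sketch below. The signatures are assumptions based on commonly observed Drupal behavior: an `X-Generator` header or `<meta name="Generator">` tag mentioning Drupal, the `X-Drupal-Cache` / `X-Drupal-Dynamic-Cache` headers, and the fixed `Expires: Sun, 19 Nov 1978 05:00:00 GMT` header. They are not the exact rules used by the CMS detection library mentioned above, the function name is hypothetical, and any site can strip these signals.

```python
import re

# Assumed fingerprints; none is guaranteed, and a site can remove any of them.
DRUPAL_EXPIRES = "Sun, 19 Nov 1978 05:00:00 GMT"
CACHE_HEADERS = ("x-drupal-cache", "x-drupal-dynamic-cache")
GENERATOR = re.compile(r"\bDrupal\b\s*(\d+)?", re.IGNORECASE)
# Only matches the attribute order name="..." content="..." (as Drupal emits it).
META_GENERATOR = re.compile(
    r"""<meta\s+name=["']generator["']\s+content=["']([^"']*)["']""",
    re.IGNORECASE,
)


def drupal_fingerprint(headers, html):
    """Return (looks_like_drupal, major_version_or_None) from headers and page HTML."""
    lowered = {name.lower(): value for name, value in headers.items()}

    # 1. Generator hints can also reveal the major version ("Drupal 7 (...)").
    candidates = [lowered.get("x-generator", "")] + META_GENERATOR.findall(html)
    for text in candidates:
        match = GENERATOR.search(text)
        if match:
            return True, match.group(1)

    # 2. Weaker signals: Drupal-specific cache headers or the fixed Expires date.
    if any(name in lowered for name in CACHE_HEADERS):
        return True, None
    if lowered.get("expires") == DRUPAL_EXPIRES:
        return True, None
    return False, None
```

A site identified only through step 2 lands in the "Drupal, version unknown" bucket, which is why a CHANGELOG.txt lookup is still needed for the version itself.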

What's the most popular Drupal website? Just after last year's blog post, Weather.com relaunched on Drupal, and some people speculated that it would become the most popular Drupal website. Sure enough, it is. The top five are:

  1. Weather.com
  2. Taboola.com
  3. Nih.gov
  4. Independent.co.uk

It took my home server a week to crawl through all the websites and a day to determine the Drupal versions.

CSV Download

Last time, people were interested in a downloadable list of the top Drupal websites, so here it is:

/sites/polso.info/files/alexa-drupal-2016-01-18.csv

Comments

DA
Mon, 01/18/2016 - 15:10

"no 8.x versions were found" ... isn't www.drupal.org running on D8?
At least mine are Drupal 8 versions.

Really interesting analysis and thanks for sharing it.

I am curious about something else. Would the market share change significantly if you consider only the top 10,000 sites, or the top 100K?

I've been running some stats on your data, and the Drupal share of total sites (that's the only thing I can get from the CSV file) does change; it actually doubles if you consider the top 1,000 sites against the top 100K (0.9% vs. 1.73%).

My hypothesis is that Drupal could have a bigger share among the more "important" sites. I would appreciate your help in either confirming or refuting it.

Thanks.
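The share discussed in this comment is a simple ratio: Drupal sites ranked within the top N, divided by N. A minimal sketch, assuming the Alexa ranks of the Drupal sites have already been extracted from the CSV (its exact column layout isn't documented here) and that every site in the top N was crawled; the function name is hypothetical.

```python
def drupal_share_in_top(drupal_ranks, top_n):
    """Fraction of the top_n Alexa sites that were detected as Drupal.

    drupal_ranks: Alexa ranks (1 = most popular) of the sites detected as Drupal.
    Assumes every site in the top_n was crawled, so top_n is the denominator.
    """
    if top_n <= 0:
        raise ValueError("top_n must be positive")
    hits = sum(1 for rank in drupal_ranks if 1 <= rank <= top_n)
    return hits / top_n
```

Comparing `drupal_share_in_top(ranks, 1_000)` with `drupal_share_in_top(ranks, 100_000)` gives the two percentages being compared. Sites where detection failed count as non-Drupal, so both figures are best read as lower bounds.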

Vlad
Tue, 01/19/2016 - 11:54

Hi,
Interesting that there isn't a single site on D8...
Could you include the detected Drupal versions in the CSV?

Kristian Polso
Wed, 01/20/2016 - 20:37

Sorry, for security reasons I don't want to share the versions, although they are pretty easy to determine, usually from CHANGELOG.txt.

Damien McKenna
Thu, 01/21/2016 - 23:26

I'm seeing a JS error on the page and the charts aren't loading?

Kristian Polso
Wed, 02/03/2016 - 20:16

Thanks for reporting this! The Syntaxhighlighter module does that from time to time; it's a pain.

Kristian Polso
Wed, 02/03/2016 - 20:17

Hey,

I checked my database, and it looks like your website wasn't in the top 1 million websites!

David
Wed, 06/29/2016 - 23:21

Hello Kristian,

Thanks for sharing this work.

I am working on something very similar for my college project. I have a working script that detects whether a site uses Drupal.

Could you tell me how you automated finding the version number?

Kristian Polso
Thu, 06/30/2016 - 09:54

Hey,

Thanks for your comment!

The easiest way to find the version number is to look for the CHANGELOG.txt file.

In Drupal 6 / 7, it's at /CHANGELOG.txt, line 2, so for example http://vaiste.com/CHANGELOG.txt
In Drupal 8, it's at /core/CHANGELOG.txt, line 1, so for example http://druid.fi/core/CHANGELOG.txt
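Putting those two locations together, a version lookup could look like the following. This is a minimal sketch, not the script used for this post: it assumes the newest entry of the changelog is a line of the form `Drupal 7.41, <date>`, and it searches the first few kilobytes for the first such line instead of relying on a fixed line number, so the Drupal 6/7 layout (line 2) and the Drupal 8 layout (line 1) are handled the same way. The function and constant names are hypothetical.

```python
import http.client
import re
import urllib.request

# Drupal 6/7 keep CHANGELOG.txt in the web root; Drupal 8 moved it under /core/.
CHANGELOG_PATHS = ("/CHANGELOG.txt", "/core/CHANGELOG.txt")

# Assumed format of a release heading, e.g. "Drupal 7.41, <date>".
VERSION_LINE = re.compile(r"^Drupal (\d+\.\d+(?:\.\d+)?)", re.MULTILINE)


def version_from_changelog(text):
    """Return the first 'Drupal X.Y[.Z]' version that starts a line, or None."""
    match = VERSION_LINE.search(text)
    return match.group(1) if match else None


def fetch_drupal_version(base_url, timeout=10):
    """Try each changelog location in turn and return the first version found."""
    for path in CHANGELOG_PATHS:
        url = base_url.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                # The newest release is listed first, so a small prefix is enough.
                head = response.read(4096).decode("utf-8", errors="replace")
        except (OSError, ValueError, http.client.HTTPException):
            continue  # missing file, HTTP error, timeout or malformed URL
        version = version_from_changelog(head)
        if version:
            return version
    return None
```

For example, `fetch_drupal_version("http://vaiste.com")` would try `/CHANGELOG.txt` first, then `/core/CHANGELOG.txt`. Sites that block or remove the file return `None`, which corresponds to the "version unknown" group described in the post.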
