At scale, everything breaks.

I don’t fancy myself a developer or a sysadmin. I pay people far more talented than me to wear those titles. However, I do consider myself a steadfast hacker, CEO, customer support agent, biz dev, janitor, bookkeeper, etc. In short: Entrepreneur.

Our managed WordPress hosting service, Pagely, will be 2 years old in September. We have bootstrapped this bad boy from 1 server to 26. We have gone from our first 100 customers, whom I knew by name, to thousands that include universities, SV-based startups, Fortune 500s, state and local agencies, and everything in between. Through all of that there is 1 indisputable fact: At scale, everything breaks. (Heard first from my good friend Joe Stump.)

We are on the last stretch of a system upgrade we started 4 months ago. Entirely new server stack, new load balancers, new SANs, new provisioning and maintenance code. All of it happening behind the scenes. I use the analogy of the Boston Big Dig: we are building a new system under the existing one and trying not to disrupt the day-to-day operation of our customers and business. Like any major construction project, there are sometimes road closures and detours. It has not gone as smoothly as we would like. Seemingly minor performance issues on 1 site become system-wide bottlenecks at scale.

Example:

WordPress is a great platform that is easy to develop on. 15,000 or so plugins have been contributed. Pick a plugin, any plugin, and you have a 15-25% chance of getting one that is a MySQL life suck. In the context of 1 site, the fact that it does not join on an index, or for that matter does not even have an index defined, appears to affect only 1 site owner and a handful of readers. Now move up the stack. In the context of page.ly, where that common plugin may be used across thousands of WordPress sites, you can watch a handful of MySQL slaves work breathlessly to process these queries. Now it affects everyone.
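To make the missing-index point concrete, here is a minimal sketch. It is not taken from any real plugin: the `plugin_hits` table, its columns, and the index name are all made up for illustration, and it uses Python's built-in SQLite as a stand-in for MySQL so it runs anywhere. The query planner's output differs between the two databases, but the idea carries over: without an index the database has to walk the whole table, and with one it can jump straight to the matching rows.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


# Hypothetical table a stats plugin might create (assumption, not a real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plugin_hits (id INTEGER PRIMARY KEY, post_id INTEGER, hit_at TEXT)"
)
conn.executemany(
    "INSERT INTO plugin_hits (post_id, hit_at) VALUES (?, ?)",
    [(i % 500, "2011-07-01") for i in range(10000)],
)

sql = "SELECT COUNT(*) FROM plugin_hits WHERE post_id = ?"

# No index on post_id yet: the planner has to scan the table.
plan_before = query_plan(conn, sql, (42,))

# Add the index the plugin author forgot.
conn.execute("CREATE INDEX idx_plugin_hits_post_id ON plugin_hits (post_id)")
plan_after = query_plan(conn, sql, (42,))

print("without index:", plan_before)
print("with index:   ", plan_after)
```

On one small site the difference between the two plans is barely measurable. Multiply the first plan by every page view on thousands of sites sharing the same database servers and it is no longer small.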

You can’t see the forest for the trees

We can add: you cannot see the performance drain of your plugin until you see it used across a few thousand sites. We know which popular plugins bring MySQL to its knees, and we know which commercial/premium WordPress themes break TinyMCE and which ones users complain about most. Stats like this are just the icing on a data stack of WordPress usage we see, and have to deal with.

Aside: Stats/Analytics plugins are the worst offenders. Services like Cloudflare and Google’s new product just mask the performance issue with caching.

We look forward to sharing some of the data we see with the community and helping authors to improve their plugins/themes for the greater good.

Moral of the Story:

Shit breaks sometimes. It breaks more often at scale when the small issues are magnified exponentially.

Gratuitous Pagely 3.0 Update:

The move to the new database master/slave setup is complete. All sites are now using HyperDB, and further MySQL traffic shaping happens via our load balancers. This week we have been syncing a few million files from our old disks to the new ones, getting us closer to the switchover. Thank you again for your patience.

2 Comments

  1. Rob Sandie

    Great work making the move!!

    Which plugin is the worst in terms of performance? That would be interesting to see.

    1. Mark Brown

      +1. I also would LOVE to see which plug-ins break at scale. Info on themes would be interesting too.