Our first “Office Hours” with CTO Joshua Eichorn

In an effort to give you the opportunity for deeper insight into some of the more commonly asked technical pre-sales questions, we’re trying something new. We held our first-ever “office hours” call the other day and had a casual “on air” conversation with our CTO Joshua Eichorn, during which he addressed a handful of topics. This is an experiment in transparency and in providing answers via conversational audio vs. written text.

Joshua answered the following three questions:

  1. What does your high availability scheme look like?
  2. How do you handle backups?
  3. Will moving to Pagely speed up my site, and if so, how much will it help?

Here is the audio capture of that call:

And here is the transcription of the call:

Sean: All right. For anyone tuning in, we had an office hours scheduled today. We had one fellow that was on the line. But it looks like we had some audio difficulty and lost him there. We’re going to try this as an experiment and record some questions here. We’ve got Joshua Eichorn, he’s our CTO, on the line. He’s gonna basically answer a few of these questions that I’ve encountered on a lot of the calls I’ve had with folks. We’ll go ahead and get started here. I’ll feed these questions to Josh, one at a time. This is an experiment. We’ll see how this works for answering y’all’s questions. So Josh, yeah. Thanks for taking the time. The first one we’ve got here. What does your high availability scheme look like?

Josh: Alright. So when we’re doing high availability you have to think of two pieces. There’s the web component and there’s the database component. For the web servers, the high availability plans start at two servers. So the way that works is, you’ll end up with two web servers, each one in a different availability zone. The way availability zones work is, with AWS you have regions like Virginia on the East Coast or Oregon on the West Coast. And inside of each region there’s always at least three different availability zones.

Those zones are separate buildings with separate power but still close enough that you have high-speed networking between them. You end up with two web servers, one in each availability zone. One is the master server. That means all of the changes that you make to the file system are uploaded there. So if you’re uploading in wp-admin it goes to that server, or if you’re using SFTP to update your theme, or any of the WordPress admin stuff, that happens on that server. And then both servers respond to your standard requests from users. So if the primary server goes down you won’t be able to do a new post, but your traffic would still be up. If the secondary server goes down then both sets of functionality are still up.
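
To make those two failure cases concrete, here is a minimal Python sketch of the web tier as Josh describes it. The `WebNode` class, the node names, and the zone labels are illustrative assumptions, not Pagely's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class WebNode:
    name: str          # hypothetical label for the server
    zone: str          # availability zone the node runs in
    is_primary: bool   # the primary ("master") node takes all file-system writes
    healthy: bool = True


def can_serve_visitors(nodes):
    """Visitor requests can be answered by any healthy node."""
    return any(node.healthy for node in nodes)


def can_accept_writes(nodes):
    """Uploads, theme updates, and admin changes need the primary node."""
    return any(node.healthy and node.is_primary for node in nodes)


# A two-node pair, one node per availability zone.
primary = WebNode("web-1", "zone-a", is_primary=True)
secondary = WebNode("web-2", "zone-b", is_primary=False)
pair = [primary, secondary]
```

Setting `primary.healthy = False` leaves `can_serve_visitors(pair)` true while `can_accept_writes(pair)` becomes false, which matches the "no new posts, but traffic stays up" case. Marking only the secondary unhealthy leaves both checks true.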

The second part of it is on the database side. So our default setup with that is to use the Amazon RDS server in multi-availability zone mode. The way that works is you have a primary database server and a secondary shadow server. And each of those runs in a different availability zone and your data’s in each availability zone as well. Then if the primary crashes or when you do an update, we do a fail-over from the primary to the secondary database server. That fail-over normally takes 15 or 20 seconds.

We also now offer the option of Amazon Aurora databases. They’re a very similar concept to the MySQL RDS servers. The biggest difference there is that the back-up server is always active. So we can do a fail-over a little bit quicker in that case. Normally a fail-over takes less than ten seconds in Amazon Aurora. In all cases, for both the web servers and the database, the big thing for availability is the pair of servers running in different availability zones. So even if there’s a major outage at Amazon, you would still be up.
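
A fail-over window of 10 to 20 seconds means anything talking to the database will see a short burst of connection errors. One common way to ride that out is to retry for a bounded period. The sketch below is a generic pattern, not something described on the call: the function name, the 30-second default, and the use of `ConnectionError` as the failure signal are all assumptions.

```python
import time


def call_with_retry(operation, max_wait_seconds=30.0, delay_seconds=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Retry `operation` until it succeeds or `max_wait_seconds` has elapsed.

    `operation` is any callable that raises ConnectionError while the
    database is unreachable (an assumption made for this sketch).
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + max_wait_seconds
    while True:
        try:
            return operation()
        except ConnectionError:
            # Give up if another delay would take us past the deadline.
            if clock() + delay_seconds > deadline:
                raise
            sleep(delay_seconds)
```

With the defaults, an outage of 15 to 20 seconds is absorbed by the retries, while an outage longer than 30 seconds still surfaces as an error.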

Sean: Gotcha. A follow-up question to that I’ve heard asked a couple times. Let’s say they start on a VPS-3 with us, which is the load-balanced pair, if they wanna throw another web node in that rotation, is there downtime associated with that move? Or how does that work? How do they go about adding new web nodes?

Josh: Right. So, yeah. There’s no downtime to add a new web node. The way it works, we add the additional web node. And then sync all the data from the primary node. Then once it’s fully set up, we can add it to the rotation. I guess I didn’t mention this earlier. The rotations between the web servers can be managed in two ways. One is using Route 53’s DNS Management to do the allocation between the servers. So the nicest thing about that is that it allows for complex routing, in the sense that one server can take 20% of the traffic and the other could take 80%. Which is very handy if you’re in the middle of an upgrade.

Let’s say you’re getting a huge spike of traffic. We could upgrade just the secondary server to two or three times its normal size and send the majority of traffic to that server. That would allow us to do the upgrade without any downtime and then we can still do that split traffic. The only downside of that is if you’re not using Route 53 DNS, either on your own account or the one we provide, and you don’t want to use a subdomain on your website. Then there’s a limited number of other DNS providers you could use that will work correctly with the set-up.

Actually, that downside’s with every solution we have. The other option is you use elastic load balancers. That is an additional piece of hardware sitting between your customers and the web servers. Really the downside of that is there’s less flexibility in how they can route the traffic. It’s always routed the same to all the servers. Also, there’s some SSL limitations when you’re using an elastic load balancer. You can’t run multiple SSL certs on it using SSL SNI. So if you have to run a bunch of SSL certs, the only option is to use the DNS-based load balancing.
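
The 20%/80% split mentioned for DNS-based routing follows from simple proportional weighting: each server's share is its weight divided by the sum of all weights, which is also how Route 53 weighted records behave. The Python sketch below only simulates that arithmetic. The server names and weight values are hypothetical, and no DNS API is called.

```python
import random


def traffic_shares(weights):
    """Return the expected fraction of requests each server receives."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one server needs a positive weight")
    return {server: weight / total for server, weight in weights.items()}


def pick_server(weights, rng=random):
    """Pick one server at random, in proportion to its weight."""
    servers = list(weights)
    return rng.choices(servers, weights=[weights[s] for s in servers], k=1)[0]


# Hypothetical weights during an upgrade: the enlarged secondary takes most traffic.
weights = {"primary": 20, "secondary": 80}
shares = traffic_shares(weights)
```

Calling `pick_server(weights)` repeatedly sends roughly four out of five requests to the secondary. Shifting traffic back after the upgrade is just a matter of changing the two numbers.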

Sean: The irony there is that the elastic load balancer is not as flexible as the non-elastic load balancer. Another corollary question here. If they start on a VPS-1 and they wanna move to a load-balanced configuration, is it possible to make that transition where there is no downtime?

Josh: That is correct. There wouldn’t be any downtime. All we would do is setup the second server and then make the DNS changes. While the DNS is updating all the traffic would still just be going to the one server. Assuming that one server isn’t overloaded when making the change, then you won’t see any downtime.

Sean: We’re going to go on to the next question. How do you handle backups?

Josh: Our backup process is pretty simple. Every 24 hours we run a backup. In that case, we’re gonna back up all of your files including your images into a standard tar file. Then we upload that to Amazon S3. We also do a MySQL dump of your database and upload that to S3. In both of those cases, we can also upload it to your S3 bucket. We keep the backups around for 14 days. If you’re using your own S3 bucket, you can use whatever retention policy you want. S3 makes it really easy to just say expire these files after x amount of time. Or even if you want to keep them long term you can tell them to switch over to Amazon Glacier after a month. Which makes sense if you want to keep backups for six months, a year, or something like that.
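
As a rough illustration of the file half of that process, here is a minimal Python sketch that writes a dated tar archive and works out which archives fall outside a 14-day window. The file naming scheme and function names are assumptions made for the example. The database dump and the S3 upload are left out because they depend on external tools, and on S3 itself the expiry would normally be handled by a bucket lifecycle rule rather than by code like this.

```python
import tarfile
from datetime import date, timedelta
from pathlib import Path


def archive_site(site_dir, backup_dir, today):
    """Write site_dir into a dated tar file and return the archive path."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    archive_path = backup_dir / f"files-{today.isoformat()}.tar"
    with tarfile.open(archive_path, "w") as archive:
        archive.add(site_dir, arcname="site")
    return archive_path


def expired_backups(backup_dir, today, keep_days=14):
    """Return archives whose date stamp is more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for path in sorted(Path(backup_dir).glob("files-*.tar")):
        # Recover the YYYY-MM-DD stamp from the hypothetical file name.
        stamp = path.name[len("files-"):-len(".tar")]
        if date.fromisoformat(stamp) < cutoff:
            expired.append(path)
    return expired
```

Running `archive_site` once a day and deleting whatever `expired_backups` returns keeps a rolling 14-day set of archives.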

Sean: Cool. Then the format of that is not some proprietary Pagely format. Those are just straight up…

Josh: Those are standard MySQL dump files and tar files. If you want to use them to recreate your environment on your dev, you can do that. There’s no special tool required.
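
Because the file backups are plain tar archives, any standard tooling can read them. As a small demonstration, the Python sketch below lists the contents of a tar archive using only the standard library. The sample archive and the path inside it are made up for the example and stand in for a real backup.

```python
import io
import tarfile


def list_backup_files(archive_bytes):
    """Return the member names stored in a tar archive held in memory."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as archive:
        return archive.getnames()


def build_sample_archive():
    """Create a tiny in-memory tar file standing in for a real backup."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        payload = b"<?php // placeholder theme file\n"
        info = tarfile.TarInfo(name="wp-content/themes/example/functions.php")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


sample = build_sample_archive()
names = list_backup_files(sample)
```

The same `tarfile.open` call works on a downloaded backup file if you pass its path instead of an in-memory buffer.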

Sean: Let’s do just one more question here. Again, this is kind of an experiment. So we’ll keep it short and sweet this time. But the question I commonly get asked, will moving to Pagely speed up my site? And if so, how can I tell how much? What do you say to that one?

Josh: Right. That’s like any complex question, it depends. So there’s a couple different components to speed. When you’re looking at WordPress, the first component is, how long does it take to generate the page in PHP. So if you were using a monitoring tool that shows you a waterfall of your downloads, like Pingdom offers a free one. Then you would see that as a time to first byte on the very first request or on any subsequent AJAX request. That’s your server time. That’s the biggest thing that upgrading to Pagely will change.
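
If you would rather measure time to first byte yourself than read it off a waterfall chart, a few lines of Python are enough for a rough number. This is a minimal sketch using only the standard library. The function name is an assumption, and because the connection is opened inside the timed section, the result includes DNS lookup, TCP connect, and TLS setup as well as server time.

```python
import http.client
import time
from urllib.parse import urlsplit


def time_to_first_byte(url, timeout=10.0):
    """Return seconds from sending a GET request until the response starts arriving."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    start = time.perf_counter()
    try:
        conn.request("GET", path)       # connects lazily, then sends the request
        response = conn.getresponse()   # returns once the status line and headers are read
        response.read(1)                # pull the first byte of the body, if any
        return time.perf_counter() - start
    finally:
        conn.close()
```

Calling `time_to_first_byte("https://www.example.com/")` a handful of times before and after a move gives a like-for-like comparison, as long as the page is not being served from a cache on one side only.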

We have a very fast PHP set-up. So if the pages are served dynamically, they should be faster than your current set-up. And then on top of that, we have a very fast full page caching layer that generally takes most traffic to your site. That will be much, much faster than any set-up without caching or any set-up that uses something like W3 Total Cache for caching. The second part is that might not be the reason that your site is slow. It might be loading all sorts of front end assets, CSS, JavaScript. Commonly, slowness is actually from third party content. Ad networks on your site, all the various widgets, from Twitter or Facebook, are often what make your page load slowly.

So if it’s assets under your control, switching to Pagely, we might speed them up. We provide a CDN with all of our plans and that can speed things up. Also, our support staff can help you do further optimizations. But if your slowness is coming from third party sites, then obviously switching to Pagely won’t change that. It’ll still load the same third parties that are slow. Really the only thing you can do in those cases is stop using those third parties.
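
One quick way to see which side of that line a slow site falls on is to total up request time by host. The Python sketch below takes rows of (URL, duration) such as you might copy from a waterfall report and returns the fraction spent on hosts other than your own. The sample rows are invented, summing durations ignores parallel downloads, and a CDN hostname you control would need to be counted as first-party, so treat the result as a rough indicator only.

```python
from urllib.parse import urlsplit


def third_party_share(requests, site_host):
    """Fraction of total request time spent on hosts other than site_host.

    `requests` is a list of (url, duration_ms) pairs.
    """
    total = sum(duration for _, duration in requests)
    if total == 0:
        return 0.0
    third_party = sum(
        duration
        for url, duration in requests
        if urlsplit(url).hostname != site_host
    )
    return third_party / total


# Hypothetical waterfall rows for a site hosted at www.example.com.
waterfall = [
    ("https://www.example.com/", 400),
    ("https://www.example.com/wp-content/themes/example/style.css", 100),
    ("https://ads.example.net/tag.js", 900),
    ("https://widgets.example.org/embed.js", 600),
]
share = third_party_share(waterfall, "www.example.com")
```

In this made-up example three quarters of the request time goes to third parties, so a faster host would only touch the remaining quarter.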

Sean: Gotcha. I think that’s a critical lesson here. We account for a certain chunk of the pie in terms of the speed of your site. We only affect the chunk which we account for. I think you nailed it. That’s a critical thing for folks to understand.

Josh: From the web hosting platform versus third parties or from downloading assets to the client, it really depends on what type of site you have and what optimizations you’ve already done.

Sean: While we’re on this topic. Can you just comment on the option for folks where they have a global presence and they wanna run a cache node that physically puts the site closer to some of their audience?

Josh: Right, so that’s the other option that we offer. We do caching of that HTML content in cache nodes throughout the world. We offer cache nodes on the East Coast of the US, the West Coast, in Ireland, in Europe, and in Singapore, in Asia. That gives you a smart cache node that knows about WordPress. That can be matched with the CDN so that content is always being downloaded from close to the user to lower those network latencies. Our only requirement, in that case, is you have to be using a CNAME for your site or our Route 53 Managed DNS.

Sean: In the case of a cache node, I’m assuming it works the same way as it does with VPS, where the writes only occur on the main server?

Josh: Right. In the case of a cache node, your site is still served from just one location. That could be from a shared plan at Pagely or it could be from a VPS at Pagely. Like I said, we have this full page caching layer. That’s what’s running out on the geographically dispersed edge cache nodes. So that covers any page that can be served from cache, which for most sites is most traffic except for admin traffic. If you’re a commerce site, that’s all your customers who haven’t added anything to their cart yet. It can depend on how much of your traffic is logged in versus logged out. For all the traffic that can be cached, it’s being served directly from that edge.

Sean: It almost sounds like the same way that a CDN pushes the static assets closer to the edge, this is almost a way to run WordPress closer to the edge in a sense.

Josh: Right, that’s correct. You can think of this actually as a specialized CDN that knows about WordPress. Our cache nodes know about WooCommerce so if you add something to your cart, we can’t cache anymore. And we know about hundreds of WordPress plugins, how they interact with the cache node, and how to do the right thing. Even if that plugin isn’t doing the sorts of things that would make a standard CDN work correctly.
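
The kind of decision a WordPress-aware cache has to make can be sketched in a few lines of Python. The rules below are common conventions (the `wordpress_logged_in_` cookie prefix set by WordPress and the `woocommerce_items_in_cart` cookie set by WooCommerce), not the actual rule set running on Pagely's cache nodes, which is not described on the call. The function name and signature are assumptions.

```python
def is_cacheable(method, path, cookie_names):
    """Decide whether an edge cache may answer this request from cache."""
    if method not in ("GET", "HEAD"):
        return False  # form posts and other writes must reach the origin
    if path.startswith("/wp-admin") or path.startswith("/wp-login.php"):
        return False  # admin traffic is never served from cache
    for name in cookie_names:
        if name.startswith("wordpress_logged_in_"):
            return False  # logged-in users may see personalized pages
        if name == "woocommerce_items_in_cart":
            return False  # a shopper with something in the cart
    return True
```

An anonymous visitor requesting `/` gets a cached page, while the same visitor carrying a `woocommerce_items_in_cart` cookie is passed through to the origin server.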

Sean: Cool. Well Josh, thanks for your time. If you’re listening and you have questions, go ahead and submit those to us. We’re gonna see about making this recurring, almost like a radio show I guess, where you can submit your questions ahead of time and we’ll answer them. We’ll have Josh be able to give you some real in-depth audio-based answers and we’re just gonna see how this goes. So thank you Josh, and thank you, anyone who was listening. We’ll talk to you soon.

The theme of our next office hours session will be different workflow options for staging and versioning of code & content in WordPress. We’ll be digging into Git-based workflows, using Capistrano in conjunction with Git, plugin options for staging content, and doing near real-time backups. If you have a question you’d like to see covered or you’d like to participate on the live call, leave a comment below.

And let us know what you think of this office hours initiative. Was it useful? What other topics would you like us to address on future calls?