May 08, 2008

Scotty, I need more power!

We haven't been shy about acknowledging our shared hosting performance issues this past year. It all goes back to our core storage architecture, a SAN-attached NFS server. The short version is that there's too much reading and writing happening too fast for this architecture to provide exceptional performance 24/7/365. Backups, which need to inspect every file on the storage system for changes, are taking more than 12 hours to complete now, degrading performance along the way.
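To illustrate why a backup that must inspect every file gets slower as the file count grows, here is a minimal Python sketch of a change scan. It is not the backup software in use here; the function name `files_changed_since` and the modification-time comparison are assumptions made purely for illustration.

```python
import os


def files_changed_since(root, since_epoch):
    """Return paths under `root` modified after `since_epoch` (seconds).

    Hypothetical helper: every file costs one stat() call, whether or not
    it changed, so the scan time grows with the total number of files.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime > since_epoch:
                    changed.append(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return changed
```

On network storage such as NFS, each of those metadata lookups can mean a round trip to the storage server, which is one reason a full scan competes with regular site traffic.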

While there can be many reasons for perceived slowness, we know our storage system performance is less than perfect from time to time, and we have seen the solution, in our mind's eye (and engineering whiteboards): Clustered storage. We'll be able to add machines and add storage and add read/write capacity simultaneously, in a single, seamless, migration-free step. We've been working on it for about a year now. We have a ways to go still, and I'll post an update in June on the progress.

In the meantime, we have known that we really need to do something about this -- and today we have. We've deployed additional storage hardware -- not as a permanent fix, and not even as a step towards the new system, but as an 'overflow' system to which we can move a handful of busy accounts.

One of the first we've moved is the webmail system inside OnSite.  If you use it regularly, I think you'll find it considerably faster today than it was a few days ago. In fact, all of OnSite should feel faster now due to software upgrades we deployed last week.

We are still ironing out some quirks with this 'overflow' hardware, but next week we should be able to migrate a small number of additional sites to this system. If you've previously contacted us about performance issues with complex, template-driven sites, let us know by emailing feedback at modwest dot com that you're interested in having your files moved to the new system. We'll handle everything, and you won't have to do anything differently, but the move process can cause a site to be unavailable for up to an hour or so, depending on the amount of data and number of files.

This isn't a permanent fix, but it should help mitigate the less-than-stellar filesystem access times we've occasionally experienced of late. And again, the longer-term fix will solve the NFS performance problem once and for all.

So, short term, we sent our request for more power to the engine room, and we've managed to eke out just a bit more. If only we had a dilithium crystal... more news soon.

-JM










April 26, 2008

Email Delays Resolved

Mail on the Modwest shared hosting system is flowing again now. Nothing was lost, but throughout the day yesterday (Friday), incoming messages could have been delayed several hours, webmail didn't work for most customers, and about half the mailboxes we host were totally unavailable for several hours last night.  A blow-by-blow is available on our offsite status page.

At the high-water mark for yesterday's problems, close to 200,000 messages were hung up in the queue, awaiting delivery to customer mailboxes. That included messages coming in to our support team, so it wasn't until late last night that a bazillion support request emails came in. Needless to say, we're a little behind.

The problem, as best as we could tell, was some sort of deadlock issue with our Cyrus IMAP software. The resulting behavior was that all the 'stuff the message in a mailbox' processes believed they needed to wait for access to do so. All of them. I can sort of imagine them all politely saying "no please, I insist, you first" to each other, with no deliveries happening at all.

Anyway, the situation is resolved now; not necessarily in a permanent way, but everything is working at the moment and we're monitoring servers closely. Sorry about the trouble.

-JM

April 25, 2008

E-Mail Delays

We are currently working on an issue that has caused email delivery delays today. For updates, see the Modwest System Monitor at http://status.modwest.com/.

April 20, 2008

On the Tree of (Hardware) Woe

     "Contemplate this on the Tree of Woe..."

This is a line from Conan the Barbarian uttered by the villain Thulsa Doom after capturing Conan and beating him to within an inch of his life. Looking down upon the fallen warrior, Doom (played by Vader-esque James Earl Jones) then turns to his henchman Rexor and issues a directive:

      

"Crucify him."

In response, Conan collapses in exhausted anguish.

Last week wasn't quite that bad for our hardware team, but it was a rough one. We had two important managed server customers suffer catastrophic hardware issues which required hours -- even days -- of downtime to fully repair. While dissimilar, both problems were storage-related.

First, a mini-primer on storage redundancy and fault tolerance:

There are two basic ways to provide storage redundancy in a standalone dedicated server: software and hardware.

  • The benefit of the software strategy is lower hardware cost, which we can then pass along to the customer.
  • The hardware solution is better at detecting and handling drive failures but costs more to deploy.

Neither strategy is immune to fault, as we've been painfully reminded over the past 10 days. Both affected customers themselves host dozens of their own customers on these servers, and so the interruptions were particularly undesirable for them.

In the first case, the server featured hardware-based redundancy: a RAID controller made by 3Ware. It just so happens that the particular driver for this controller has a rare bug when installed on servers running a certain Linux kernel version. The bug can cause generalized data corruption (!), and in this case, we discovered that various system configuration files in /etc were getting periodically scrambled, removed, and relocated!

This is a live server providing business-critical functionality to our customer and his customers, and yet the only fix was to reinstall the operating system with an updated driver to ensure data integrity. This required an overnight reinstall and data-restoration procedure, but when the server was back up and running, a few configurations and software versions were different, and thus we had to work through dozens of small web application and email glitches before everything was ship-shape.

That would have been challenging enough, but around the same time, another managed server suffered a hard drive failure. This machine utilized the software approach to storage redundancy, and while the drive failure was indeed detected, what wasn't detected was that the other hard drive was on its last legs and could also have failed at any moment. The server was in an absolutely precarious state when it finally alerted us to the issue.

Ordinarily, when one hard drive in a mirrored pair fails, the procedure is to shut down, replace the failed drive, reboot, and instruct the system to re-mirror everything. In this case though, the server's remaining drive was in such bad shape that we suspected it wouldn't make it through the reboot, and that all current data on the machine would be lost.

The challenge was therefore to ensure that we had a snapshot of the most current data before beginning the surgery. But try making a fresh backup of 30+GB of data off a damaged hard drive that could fail at any moment; it's a slow, slow process, and took close to 18 hours to complete. It was only upon that completion that we could proceed with a full reinstall, reconfiguration, and restoration from backup (which took much of the next day).
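For a rough sense of scale, the figures above work out to a very low average transfer rate. Here is a quick sketch of the arithmetic, assuming exactly 30 GB (the post says 30+), decimal units (1 GB = 1000 MB), and a full 18 hours; the function name is hypothetical.

```python
def average_throughput_mb_per_s(gigabytes, hours):
    """Average transfer rate in MB/s, assuming 1 GB = 1000 MB."""
    megabytes = gigabytes * 1000
    seconds = hours * 3600
    return megabytes / seconds


# Assumed inputs: 30 GB copied in 18 hours.
rate = average_throughput_mb_per_s(30, 18)
print(f"{rate:.2f} MB/s")  # prints 0.46 MB/s
```

Since the actual amount of data was somewhat more than 30 GB, the real average was a little higher, but still far below what a healthy drive typically sustains.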

I'm happy to report that as of Friday afternoon, thanks to long hours of work by our best hardware guys, both servers are (to the best of our knowledge) repaired and fully functioning. 

Our managed servers as a rule boast the highest availability of all our services, with many of them enjoying near-100% historical uptime. But the fact remains that hardware components, and especially moving parts such as hard drives, simply wear out and break. We do what we can to ensure that when they do, repair and recovery are relatively painless, but this past week presented a 'perfect storm' of hardware problems.

By the way, Conan's friend Subotai rescued him from the Tree of Woe, and Conan returned heroically to vanquish the enemy.

-JM



March 24, 2008

PHP Versions at Modwest

We've had a few requests recently about upgrading PHP on the Modwest shared system. We specialize in PHP hosting -- we have substantial expertise in this arena -- but as of March 2008, we're not running the latest stable release of PHP on our shared system. We're running 4.4.6 as the default PHP version for new accounts, and the PHP development team currently recommends version 5.2.5.

There are a few technical barriers to upgrading the shared system, some of which I've hinted at in previous posts. The Modwest shared system is a centralized SAN-attached storage architecture, and in its current incarnation, there is only one technical server environment possible for all customers. This environment, despite periodic security and stability updates, is in many ways the same as it was in 2001 when we built it.  That is going to change in 2008.

In the meantime, here are the options available to customers on the shared system who would like to (or need to) utilize a more recent version of PHP.

  • You can switch a site to PHP 5.0.4 in OnSite under 'PHP Configuration'.
  • PHP 5.2.0 is also available on the shared system, but it's incomplete. Some new extensions simply cannot be installed in our current environment. More info about these first two options is in the FAQ.
  • Both VPS plans and managed servers provide PHP 5.2.

But the long-term solution is to offer our customers on the shared system an array of choices with regard to their hosting environment. That's what we have planned. Customers whose web applications require the latest PHP version, extensions, and libraries will eventually be able to configure their accounts to accommodate these requirements. Customers whose web apps are running just fine as is won't need to change a thing.

We intend to continue being an excellent hosting choice for PHP developers. We may be slightly behind the curve currently, but we've got some great things in the works. If you'd like to beta-test the new environment this summer, just let us know by writing to feedback at modwest dot com.

-JM







March 17, 2008

Internet Successes at Modwest

Modwest hosts websites for thousands of customers in some fifty countries, and I know we host customer sites selling high-def televisions, customizable Belgian chocolates, South American jewelry, and virtually everything in between. We also host gossip sites, faith-based organizations, and business-to-business consulting sites. (This hosting thing is actually a fascinating business to be in.)

Overseeing Support & Operations at Modwest, I most commonly hear about the challenges our customers are facing, and not so much about the successes. But when we do hear from a customer who has overcome a technical challenge, or unveiled their new e-commerce application to the world, or sold 50 televisions in a day, or won a web design award, it's really cool.

The purpose of this brief message is to ask Modwest customers for your success stories. What we have in mind is a periodic post that highlights a specific customer. Upgrading your site? Just registered your 1,000th user? Just got your grant funding renewed based on website success? Sold your millionth widget? Or just happy to be hosting at Modwest? :) Let us know, and we may share your story with the whole wide world right here on the Modwest Blog.

Email us at feedback at modwest dot com or fill out our contact form to get in touch.

-JM

February 21, 2008

End of Forwarding to Comcast

Spam, spam, spam. The never-ending battle continues.

Some may remember how we stopped allowing email forwarding to AOL last year. Spam that originates elsewhere, passes through Modwest, arrives at AOL, and is then reported to AOL by the recipient as being spam generates a complaint against Modwest as the source of the spam -- even though we were merely a conduit, forwarding mail as per our customer's preferences.

We now know that the same situation affects email auto-forwarded to Comcast, which has a similar policy. In the past month, we've asked Comcast to please resume accepting mail from Modwest customers at least a dozen times. They're usually responsive, but it's becoming an almost daily occurrence. At the moment I write this, nearly 1,000 messages are stuck here at Modwest because Comcast refuses to accept them. We hope to push them through to their final destinations at Comcast by the end of the day, but we need Comcast's cooperation to make that happen.

Therefore, because Comcast's anti-spam policies are regularly causing significant email delays (hours or days) for any Modwest customer who attempts to communicate with any Comcast customer, we cannot allow automatic forwarding to Comcast any longer. We'll soon be getting in touch with customers who rely on Comcast forwarding about their options.

(Incidentally, Yahoo is also blocking most Modwest customer mail at the moment, and we're working on clearing things up with them. Update Feb 24: Yahoo is still blocking us, and not responding to requests for information and resolution.

Update Feb 27: Yahoo has not been terribly communicative but has let the vast majority of deferred messages through, finally.)

We always try to avoid removing features that customers have been using, and we may rescind this policy if other options become available, but we'll be preventing auto-forwarding to Comcast addresses starting next week.

-JM


February 06, 2008

Filesystem Performance: Not out of the woods, yet

As we've previously explained (here and here and here), Modwest has been challenged periodically to coax more performance out of the centralized storage architecture of our shared web hosting system. When system loads start to reach the point at which performance suffers, we address the issue by adjusting NFS and backup configurations, identifying resource-intensive sites and figuring out how to make them less so, and publishing tips for site owners about how to make their sites less storage-dependent.

Well, the issue has returned:

[Graph: Load]
(The week-long gap at the end of January was the result of an operating system upgrade on the monitoring server which sort of deconfigured the graphing subsystem. Oops!)

As you can see from the graph (click it for a larger version), we're approaching 52-week highs again. Monday of this week was especially rough in the middle of the day.

We've pretty much run out of 'big stuff that needs fixing' on the current architecture, so now we make small changes, each of which could provide some incremental gain in performance via reduced utilization of the central storage system. In addition to finding a few super-frequent cron jobs (which rarely need super-frequency), one of the actions we've taken today is FTP rate-limiting. That means that if you're on a mega-fast uplink you might not see your full potential in FTP upload speed.
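The post doesn't say how the FTP rate limit is implemented, and FTP servers commonly expose this as a configuration setting. Purely as an illustration of the general idea, here is a minimal token-bucket sketch in Python; the class name, parameters, and numbers are all hypothetical.

```python
class TokenBucket:
    """Illustrative rate limiter: allows short bursts up to `capacity_bytes`,
    then holds the long-run average to `rate_bytes_per_sec`."""

    def __init__(self, rate_bytes_per_sec, capacity_bytes):
        self.rate = rate_bytes_per_sec
        self.capacity = capacity_bytes
        self.tokens = capacity_bytes  # start with a full bucket
        self.last = 0.0               # timestamp of the previous check

    def allow(self, nbytes, now):
        """Return True if `nbytes` may be sent at time `now` (seconds)."""
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


# Hypothetical limit: 100 bytes/second with a 200-byte burst allowance.
bucket = TokenBucket(rate_bytes_per_sec=100, capacity_bytes=200)
```

A client on a fast uplink can use up the burst allowance immediately, after which its upload is held to the configured average rate. That matches the effect described above, whatever the actual mechanism.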

Of course what really needs to happen is a re-thinking of our centralized storage architecture. I'm happy to say we started that thought process more than a year ago. We already run a load balanced cluster of web servers, and a load balanced cluster of mail servers. Why not storage?

It's a hard problem, but a solvable problem to which we've been devoting a lot of engineering time over the past year. We're within three months or so of offering access to a "re-imagined" storage architecture that will not only address the issues we're currently experiencing, but will also open the door to some interesting features you won't find elsewhere.

-JM

P.S. I promise I'll publish more details about the new system soon, but as always, let us know of any questions by commenting below or otherwise contacting us.

February 02, 2008

Email changes: Attachment Size, Spam Filtering

This past week we made a couple of improvements to our email systems that we think will be better for everyone.

First, we changed the maximum email attachment size to twenty megabytes. This is more in line with other large email providers and will help some of our graphic designer and photographer customers send and receive the large files associated with their trade. Remote recipients might not be able to receive 20MB attachments sent through our system (since their email provider might have a different policy), but incoming attachments should come through just fine as long as they can make it through our dangerous content filters.

Second, all newly created mailboxes will have their spam filter turned on, and anything detected as spam will be placed in the Spam/ subfolder of the mailbox. Note that messages in the Spam/ and Trash/ folders are auto-deleted after a period of time.

We're committed to offering maximum control over your incoming messages and had to think hard about enabling the filter by default. But two things gave us confidence: a relatively commonly expressed concern from new customers that since transferring to Modwest, they seem to receive more spam, and the (directly related) discovery that the majority of our customers had not chosen to enable spam filtering at all. So this change should help with that phenomenon. Existing mailboxes are not affected by this change.

Questions? Comment below, or contact our support team.

-JM




January 16, 2008

Database Server Maintenance - Jan 17

As announced in OnSite and on our public status page, we'll be taking our MySQL 5 server (db1.modwest.com) offline for up to thirty minutes tomorrow (Thursday) night around 7PM Mountain Time for system upgrades. If your site utilizes this server and you're a programmer (or have one nearby), you may want to consider making sure your web applications handle the maintenance window gracefully. (I'll post a technical comment below for one way to do so.)
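As a minimal sketch of what "handling the window gracefully" might look like (not necessarily the approach in the comment mentioned above), an application can retry the connection a few times and then show a friendly maintenance message instead of a raw error. This example is in Python with no real database driver: `connect` stands in for whatever connection function your application uses, and every name here is hypothetical.

```python
import time


class DatabaseUnavailable(Exception):
    """Raised when the database cannot be reached after all retries."""


def connect_with_retry(connect, attempts=3, delay_seconds=2.0, sleep=time.sleep):
    """Call `connect()` up to `attempts` times, pausing between failures."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds)
    raise DatabaseUnavailable("database is unreachable") from last_error


def render_page(connect, sleep=time.sleep):
    """Return (status, body); fall back to a maintenance notice on failure."""
    try:
        conn = connect_with_retry(connect, sleep=sleep)
    except DatabaseUnavailable:
        return 503, "We are performing scheduled maintenance. Please try again shortly."
    return 200, f"Connected: {conn}"
```

The short retry covers brief restarts, while the 503 response tells visitors (and search engines) that the outage is temporary rather than exposing a database error page.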

Speaking of MySQL, the company that produces this excellent open-source database software was just acquired by Sun Microsystems, an enterprise hardware and software company. Only time will tell, but we anticipate this will result in some powerful improvements to the already-great MySQL software.

-JM

