June 10, 2008

On Mountains, Rivers, and Work

Sometimes when we are fixing servers and dealing with issues all the time, we forget that Modwest is based in this awesome place.

Tonight, June 10th, 2008, there's a snow advisory for Missoula, but recently it was mild (50s and 60s) with the occasional scattered shower. My wife and I took most of last week off to see the Belt Mountains and Little Belt Mountains, a short drive away.

First stop was Hanging Valley. I'm a huge fan of the various national parks in Utah, and so when some reviewers compared this 12-mile hike to the absolutely surreal Bryce Canyon, I had to see it. Not quite the same, but certainly worth the hike and gorgeous:

20080603vacation_021_3

Late that day, we resupplied in Townsend, Montana and carried on to a large (and utterly deserted) Forest Service campground just south of Mt. Baldy, which was mostly invisible due to swirling blizzards near the summit.

The next day, after a hearty breakfast at  Dori's in White Sulphur Springs, we headed north, back into the mountains. On a high plateau, we briefly encountered one of the locals:

Dsc00224

After a harrowing drive down a crater-filled, nightmare-switchbacked forest road, we finally arrived at the Logging Creek Campground -- also totally deserted -- and, upon starting dinner, quickly made friends with the neighbors.

20080603vacation_093

Logging Creek, perhaps despite the name, was idyllic, perfectly clear, ice-cold, flower-lined, and I spent much of a sunny afternoon enjoying it. (I even brought a gallon home to brew beer with.)

Here's another shot of the campsite:

20080603vacation_080

As we descended from the mountains the next day and into Belt, Montana, how could we resist this sign for the Harvest Moon Brewery:

20080603vacation_102

It was only 11 in the morning, and the workers were in the midst of bottling, but one of the brewers took the time to give us a tour of the brewery and pour us fresh samples of everything on tap. Delicious!

Next stop was Great Falls, a (relative) metropolis in Montana, and the home of my wife's parents, Jim and Helen. We didn't tell them we were in town, and so the visit was something like a surprise party (which included a freshly-filled growler from the brewery). After a few hours socializing, we headed west over the Continental Divide, through the briefly-famous Lincoln, Montana, and back to Missoula.


It's easy to be consumed, even overwhelmed, by the technical details of your job. Those details are, in fact, infinite. So it's important to, from time to time, look around, see what's nearby, enjoy it... and relax. Here at Modwest we are surrounded by an outdoor paradise of mountains, rivers, and wildlife, and so there are ample opportunities to get outta town and enjoy it. But even a local park, a creek, a shrub(!) offer these opportunities. Soak them in, enjoy.

And don't worry, we'll keep improving and fixing server stuff as needed.

-JM




May 08, 2008

Scotty, I need more power!

We haven't been shy about acknowledging our shared hosting performance issues this past year. It all goes back to our core storage architecture, a SAN-attached NFS server. The short version is that there's too much reading and writing happening too fast for this architecture to provide exceptional performance 24/7/365. Backups, which need to inspect every file on the storage system for changes, are taking more than 12 hours to complete now, degrading performance along the way.
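To see why a backup that "inspects every file" gets slower as the file count grows, here is a minimal sketch of a change scan. It is not our actual backup software; the function name `find_changed_files` and the use of modification times are assumptions made for illustration. The point is that every file costs at least one metadata lookup, even when nothing has changed.

```python
import os

def find_changed_files(root, last_backup_time):
    """Return paths under root modified after last_backup_time (epoch seconds).

    Hypothetical helper: a real backup tool may compare sizes, checksums,
    or inode data as well as modification times.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # One stat call per file, changed or not. On network
                # storage each of these is a round trip to the server.
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime > last_backup_time:
                changed.append(path)
    return changed
```

The work done is proportional to the total number of files, not the number of changed files, which is why the scan itself can load the storage system.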

While there can be many reasons for perceived slowness, we know our storage system performance is less than perfect from time to time, and we have seen the solution, in our mind's eye (and engineering whiteboards): Clustered storage. We'll be able to add machines and add storage and add read/write capacity simultaneously, in a single, seamless, migration-free step. We've been working on it for about a year now. We have a ways to go still, and I'll post an update in June on the progress.

In the meantime, we have known that we really need to do something about this -- and today we have. We've deployed additional storage hardware -- not as a permanent fix, and not even as a step towards the new system, but as an 'overflow' system to which we can move a handful of busy accounts.

One of the first we've moved is the webmail system inside OnSite.  If you use it regularly, I think you'll find it considerably faster today than it was a few days ago. In fact, all of OnSite should feel faster now due to software upgrades we deployed last week.

We are still ironing out some quirks with this 'overflow' hardware, but next week we should be able to migrate a small number of additional sites to this system. If you've previously contacted us about performance issues with complex, template-driven sites and you're interested in having your files moved to the new system, let us know by emailing feedback at modwest dot com. We'll handle everything, and you won't have to do anything differently, but the move can make a site unavailable for up to an hour or so, depending on the amount of data and number of files.

This isn't a permanent fix, but it should help mitigate the less-than-stellar filesystem access times we've occasionally experienced of late. And again, the longer-term fix will solve the NFS performance problem once and for all.

So, short term, we sent our request for more power to the engine room, and we've managed to eke out just a bit more. If only we had a dilithium crystal... more news soon.

-JM










April 26, 2008

Email Delays Resolved

Mail on the Modwest shared hosting system is flowing again now. Nothing was lost, but throughout the day yesterday (Friday), incoming messages could have been delayed several hours, webmail didn't work for most customers, and about half the mailboxes we host were totally unavailable for several hours last night.  A blow-by-blow is available on our offsite status page.

At the high-water mark for yesterday's problems, close to 200,000 messages were hung up in the queue, awaiting delivery to customer mailboxes. That included messages coming in to our support team, so it wasn't until late last night that a bazillion support request emails came in. Needless to say, we're a little behind.

The problem, as best as we could tell, was some sort of deadlock issue with our Cyrus IMAP software. The resulting behavior was that all the 'stuff the message in a mailbox' processes believed they needed to wait for access to do so. All of them. I can sort of imagine them all politely saying "no please, I insist, you first" to each other, with no deliveries happening at all.
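For the curious, the "you first" standoff can be shown in miniature. This is a generic illustration of a circular wait, not a reconstruction of what Cyrus was doing (we never pinned that down); the two locks and the function name `simulate_standoff` are hypothetical.

```python
import threading

def simulate_standoff():
    """Show two parties each holding the lock the other one wants."""
    lock_a = threading.Lock()  # hypothetical resource held by process 1
    lock_b = threading.Lock()  # hypothetical resource held by process 2

    # Each process has already taken its own lock.
    lock_a.acquire()
    lock_b.acquire()

    # Each now asks for the other's lock. The requests are non-blocking
    # so this demo returns instead of waiting forever.
    p1_got_b = lock_b.acquire(blocking=False)
    p2_got_a = lock_a.acquire(blocking=False)

    lock_a.release()
    lock_b.release()
    return p1_got_b, p2_got_a
```

Both requests fail, so `simulate_standoff()` returns `(False, False)`. With ordinary blocking requests, both parties would simply wait on each other indefinitely, which matches the "no deliveries happening at all" symptom.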

Anyway, the situation is resolved now; not necessarily in a permanent way, but everything is working at the moment and we're monitoring servers closely. Sorry about the trouble.

-JM

April 25, 2008

E-Mail Delays

We are currently working on an issue that has caused email delivery delays today. For updates, see the Modwest System Monitor at http://status.modwest.com/.

April 20, 2008

On the Tree of (Hardware) Woe

     "Contemplate this on the Tree of Woe..."

This is a line from Conan the Barbarian uttered by the villain Thulsa Doom after capturing Conan and beating him to within an inch of his life. Looking down upon the fallen warrior, Doom (played by Vader-esque James Earl Jones) then turns to his henchman Rexor and issues a directive:

      

"Crucify him."

In response, Conan collapses in exhausted anguish.

Last week wasn't quite that bad for our hardware team, but it was a rough one. We had two important managed server customers suffer catastrophic hardware issues which required hours -- even days -- of downtime to fully repair. While dissimilar, both problems were storage-related.

First, a mini-primer on storage redundancy and fault tolerance:

There are two basic ways to provide storage redundancy in a standalone dedicated server: software and hardware.

  • The benefit of the software strategy is lower hardware cost, which we can then pass along to the customer.
  • The hardware solution is better at detecting and handling drive failures but costs more to deploy.

Neither strategy is immune to fault, as we've been painfully reminded over the past 10 days. Both affected customers themselves host dozens of their own customers on these servers, and so the interruptions were particularly undesirable for them.

In the first case, the server featured hardware-based redundancy, a RAID controller made by 3Ware. It just so happens that the particular driver for this controller has a rare bug when installed on servers running a certain Linux kernel version. The bug can cause generalized data corruption (!), and in this case, we discovered that various system configuration files in /etc were getting periodically scrambled, removed, and relocated!

This is a live server providing business-critical functionality to our customer and his customers, and yet the only fix was to re-install the operating system running an updated driver to ensure data integrity. This required an overnight re-install and data-restoration procedure, but when the server was back up and running, a few configurations and software versions were different, and thus we had to work through dozens of small web application and email glitches before everything was ship-shape.

That would have been challenging enough, but around the same time, another managed server suffered a hard drive failure. This machine used the software approach to storage redundancy, and while the drive failure was indeed detected, what wasn't detected was that the other hard drive was on its last legs and could also have failed at any moment. The server was in an absolutely precarious state when it finally alerted us to the issue.

Ordinarily, when one hard drive in a mirrored pair fails, the procedure is to shut down, replace the failed drive, reboot, and instruct the system to re-mirror everything. In this case though, the server's remaining drive was in such bad shape that we suspected it wouldn't make it through the reboot, and that all current data on the machine would be lost.

The challenge was therefore to ensure that we had a snapshot of the most current data before beginning the surgery. But try making a fresh backup of 30+GB of data off a damaged hard drive that could fail at any moment; it's a slow, slow process, and took close to 18 hours to complete. It was only upon that completion that we could proceed with a full reinstall, reconfiguration, and restoration from backup (which took much of the next day).

I'm happy to report that as of Friday afternoon, thanks to long hours of work by our best hardware guys, both servers are (to the best of our knowledge) repaired and fully functioning. 

Our managed servers as a rule boast the highest availability of all our services, with many of them enjoying near-100% historical uptime. But the fact remains that hardware components, and especially moving parts such as hard drives, simply wear out and break. We do what we can to ensure that when they do, repair and recovery are relatively painless, but this past week presented a 'perfect storm' of hardware problems.

By the way, Conan's friend Subotai rescued him from the Tree of Woe, and Conan returned heroically to vanquish the enemy.

-JM



March 24, 2008

PHP Versions at Modwest

We've had a few requests recently about upgrading PHP on the Modwest shared system. We specialize in PHP hosting -- we have substantial expertise in this arena -- but as of March 2008, we're not running the latest stable release of PHP on our shared system. We're running 4.4.6 as the default PHP version for new accounts, and the PHP development team currently recommends version 5.2.5.

There are a few technical barriers to upgrading the shared system, some of which I've hinted at in previous posts. The Modwest shared system is a centralized SAN-attached storage architecture, and in its current incarnation, there is only one technical server environment possible for all customers. This environment, despite periodic security and stability updates, is in many ways the same as it was in 2001 when we built it.  That is going to change in 2008.

In the meantime, here are the options available to customers on the shared system who would like to (or need to) utilize a more recent version of PHP.

  • You can switch a site to PHP 5.0.4 in OnSite under 'PHP Configuration'.
  • PHP 5.2.0 is also available on the shared system, but it's incomplete. Some new extensions simply cannot be installed in our current environment. More info about these first two options is in the FAQ.
  • Both VPS plans and managed servers provide PHP 5.2.

But the long term solution is to offer our customers on the shared system an array of choices with regards to their hosting environment. That's what we have planned. Customers whose web applications require the latest PHP version, extensions, and libraries, will eventually be able to configure their accounts to accommodate these requirements. Customers whose web apps are running just fine as is won't need to change a thing.

We intend to continue being an excellent hosting choice for PHP developers. We may be slightly behind the curve currently, but we've got some great things in the works. If you'd like to beta-test the new environment this summer, just let us know by writing to feedback at modwest dot com.

-JM







March 17, 2008

Internet Successes at Modwest

Modwest hosts websites for thousands of customers in some fifty countries, and I know we host customer sites selling high-def televisions, customizable Belgian chocolates, South American jewelry, and virtually everything in between. We also host gossip sites, faith-based organizations, and business-to-business consulting sites. (This hosting thing is actually a fascinating business to be in.)

Overseeing Support & Operations at Modwest, I most commonly hear about the challenges our customers are facing, and not so much about the successes. But when we do hear from a customer who has overcome a technical challenge, or unveiled their new e-commerce application to the world, or sold 50 televisions in a day, or won a web design award, it's really cool.

The purpose of this brief message is to ask Modwest customers for your success stories. What we have in mind is a periodic post that highlights a specific customer. Upgrading your site? Just registered your 1,000th user? Just got your grant funding renewed based on website success? Sold your millionth widget? Or just happy to be hosting at Modwest? :) Let us know, and we may share your story with the whole wide world right here on the Modwest Blog.

Email us at feedback at modwest dot com or fill out our contact form to get in touch.

-JM

February 21, 2008

End of Forwarding to Comcast

Spam, spam, spam. The never-ending battle continues.

Some may remember how we stopped allowing email forwarding to AOL last year. Spam that originates elsewhere, passes through Modwest, arrives at AOL, and is then reported to AOL by the recipient as being spam generates a complaint against Modwest as the source of the spam -- even though we were merely a conduit, forwarding mail as per our customer's preferences.

We now know that the same situation affects email auto-forwarded to Comcast, which has a similar policy. In the past month, we've asked Comcast to please resume accepting mail from Modwest customers at least a dozen times. They're usually responsive, but it's becoming an almost daily occurrence. At the moment I write this, nearly 1,000 messages are stuck here at Modwest because Comcast refuses to accept them. We hope to push them through to their final destinations at Comcast by the end of the day, but we need Comcast's cooperation to make that happen.

Therefore, because Comcast's anti-spam policies are regularly causing significant email delays (hours or days) for any Modwest customer who attempts to communicate with any Comcast customer, we cannot allow automatic forwarding to Comcast any longer. We'll soon be getting in touch with customers who rely on Comcast forwarding about their options.

(Incidentally, Yahoo is also blocking most Modwest customer mail at the moment, and we're working on clearing things up with them.)

Update Feb 24: Yahoo is still blocking us, and not responding to requests for information and resolution.

Update Feb 27: Yahoo has not been terribly communicative but has finally let the vast majority of deferred messages through.

We always try to avoid removing features that customers have been using, and we may rescind this policy if other options become available, but we'll be preventing auto-forwarding to Comcast addresses starting next week.
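In practice, a policy like this comes down to checking the destination domain when a forward is set up. Here is a minimal sketch of that check. The function name `forwarding_allowed` and the domain list are assumptions for illustration (this post names the providers, not their exact mail domains), and it is not the code we actually run.

```python
# Assumed mail domains for the providers named above; a real list
# would need every domain each provider accepts mail for.
BLOCKED_FORWARD_DOMAINS = {"aol.com", "comcast.net"}

def forwarding_allowed(address):
    """Return True if auto-forwarding to this address is permitted."""
    if "@" not in address:
        raise ValueError("not an email address: %r" % address)
    # Domains are case-insensitive, so normalize before comparing.
    domain = address.rsplit("@", 1)[1].strip().lower()
    return domain not in BLOCKED_FORWARD_DOMAINS
```

For example, `forwarding_allowed("someone@Comcast.net")` is `False`, while an address at any domain not on the list is allowed.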

-JM


February 06, 2008

Filesystem Performance: Not out of the woods, yet

As we've previously explained (here and here and here), Modwest has been challenged periodically to coax more performance out of the centralized storage architecture of our shared web hosting system. When system loads start to reach the point at which performance suffers, we address the issue by adjusting NFS and backup configurations, identifying resource-intensive sites and figuring out how to make them less so, and publishing tips for site owners about how to make their sites less storage-dependent.

Well, the issue has returned:

Load
(The week-long gap at the end of January was the result of an operating system upgrade on the monitoring server which sort of deconfigured the graphing subsystem. Oops!)

As you can see from the graph, we're approaching 52-week highs again. Monday of this week was especially rough in the middle of the day.

We've pretty much run out of 'big stuff that needs fixing' on the current architecture, so now we make small changes, each of which could provide some incremental gain in performance via reduced utilization of the central storage system. In addition to finding a few super-frequent cron jobs (which rarely need super-frequency), one of the actions we've taken today is FTP rate-limiting. That means that if you're on a mega-fast uplink you might not see your full potential in FTP upload speed.
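Rate limiting in general is often explained with a "token bucket": a sender may burst up to a fixed allowance, then is held to a steady rate. The sketch below illustrates that general idea only. It is not how our FTP servers are configured, and the class name `TokenBucket` and its parameters are made up for this example.

```python
class TokenBucket:
    """Illustrative token bucket; times are passed in so it is easy to test."""

    def __init__(self, rate_bytes_per_sec, burst_bytes, now=0.0):
        self.rate = rate_bytes_per_sec   # steady refill rate
        self.capacity = burst_bytes      # largest allowed burst
        self.tokens = burst_bytes        # start with a full bucket
        self.last = now

    def delay_for(self, nbytes, now):
        """Return how many seconds to wait before sending nbytes at time now."""
        # Refill for the time that has passed, never beyond capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        # Spend the tokens; a negative balance is debt to be waited out.
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate
```

With a 1000 bytes/sec rate and a 1000-byte burst, the first 1000 bytes go out immediately and the next 500 bytes must wait half a second. That is the effect a fast uploader would notice: short transfers are unaffected, long ones settle at the capped rate.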

Of course what really needs to happen is a re-thinking of our centralized storage architecture. I'm happy to say we started that thought process more than a year ago. We already run a load balanced cluster of web servers, and a load balanced cluster of mail servers. Why not storage?

It's a hard problem, but a solvable problem to which we've been devoting a lot of engineering time over the past year. We're within three months or so of offering access to a "re-imagined" storage architecture that will not only address the issues we're currently experiencing, but will also open the door to some interesting features you won't find elsewhere.

-JM

P.S. I promise I'll publish more details about the new system soon, but as always, let us know of any questions by commenting below or otherwise contacting us.

February 02, 2008

Email changes: Attachment Size, Spam Filtering

This past week we made a couple of improvements to our email systems that we think will be better for everyone.

First, we changed the maximum email attachment size to twenty megabytes. This is more in line with other large email providers and will help some of our graphic designer and photographer customers send and receive the large files associated with their trade. Remote recipients might not be able to receive 20MB attachments sent through our system (since their email provider might have a different policy), but incoming attachments should come through just fine as long as they can make it through our dangerous content filters.
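As a rough illustration of what a size limit like this checks, here is a sketch using Python's standard `email` package. The helper name `oversized_attachments` and the reading of "twenty megabytes" as 20 × 1024 × 1024 bytes are assumptions, and real mail servers typically enforce the limit on the whole encoded message rather than on each decoded attachment.

```python
from email.message import EmailMessage

# Assumption: "twenty megabytes" taken as 20 * 1024 * 1024 bytes.
MAX_ATTACHMENT_BYTES = 20 * 1024 * 1024

def oversized_attachments(msg, limit=MAX_ATTACHMENT_BYTES):
    """Return filenames of attachments whose decoded size exceeds limit."""
    too_big = []
    for part in msg.iter_attachments():
        # decode=True undoes base64 or quoted-printable transfer encoding.
        payload = part.get_payload(decode=True) or b""
        if len(payload) > limit:
            too_big.append(part.get_filename() or "(unnamed)")
    return too_big
```

Keep in mind that attachments are usually base64-encoded in transit, which makes them roughly a third larger on the wire than the file on disk. That is one reason a file just under a provider's stated limit can still be refused.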

Second, all newly created mailboxes will have their spam filter turned on, and anything detected as spam will be placed in the Spam/ subfolder of the mailbox. Note that messages in the Spam/ and Trash/ folders are auto-deleted after a period of time.

We're committed to offering maximum control over your incoming messages and had to think hard about enabling the filter by default. But two things gave us confidence: a relatively commonly expressed concern from new customers that since transferring to Modwest, they seem to receive more spam, and the (directly related) discovery that the majority of our customers had not chosen to enable spam filtering at all. So this change should help with that phenomenon. Existing mailboxes are not affected by this change.

Questions? Comment below, or contact our support team.

-JM




