Anyone who has visited this site within the last couple of days was probably greeted with a wonderful .NET runtime exception (SQL Server timeouts). It has taken me a couple of days to track down the issue with the virtual machine this site runs on, but I think I finally nailed it. Given that I’m much better at debugging code than troubleshooting server issues, it probably took me longer than it should have. The symptoms were strange though:
- Tons of hard drive activity on the server (it has 3 drives, which compounds the problem…which one is out of control?).
- Looking at Task Manager on the host OS revealed the vsssvr process (Virtual Server) was spiking the CPU, but disk I/O wasn’t anything out of the ordinary for that process.
- Trying to do anything on the guest OS that my site and database are on was basically an exercise in futility as it was all but unresponsive (click something, wait 5 minutes for it to do anything…really tested my patience). Task Manager on that VM revealed no process using more than 20% processor time (which contradicts the previous bullet, as Virtual Server was consuming most of the host’s processor most of the time); again, disk I/O didn’t seem to be out of line.
- Rebooting the guest OS a couple of times for good measure didn’t help (this usually takes 5 minutes or so, but with the new issue took more like an hour each time).
- Nothing worth taking a second look at in Event Viewer. Attempting to look at Performance Monitor counters simply wasn’t happening as I would always get timeouts.
- OK, so it’s not a processor race, and there’s plenty of unused memory on the VM (both physical and virtual, so no excessive paging going on), but the HDD is spinning out of control. Next logical step is to break out FileMon from the wonderful folks over at SysInternals. Result?
I have (well, had) eTrust EZ Antivirus installed on that VM. FileMon revealed that explorer.exe was making 500+ I/O requests/second to the main eTrust .exe. I’m pretty sure that’s enough to make an HDD spin out of control, so I got rid of it (which took more than an hour due to the drive thrashing). Rebooted, et voilà…that VM is happily serving up web requests again. This brings 2 questions to mind:
- Is there some sort of issue with eTrust and Virtual Server? Or (an even bigger question IMO)…
- Do we even need to install a/v software on our virtual machines if the host machine has some sort of a/v solution already in place?
Again, I’m not a hardware guy so I don’t know if virtualization runs VMs in some sort of isolation from the host OS, where the host’s a/v software wouldn’t pick up would-be nasties on the guest OSes…but I’m inclined to think that whatever bytes are pouring onto the disk, no matter where they’re going, will be scanned by the a/v software. Yes? No? I’d love a definitive answer on this. Regardless, hopefully (knock on wood) my JK.com woes are over and done with. I really hope so, it’s been seriously affecting my productivity/sleep patterns/time spent with my dog.
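For anyone chasing a similar problem, here is a rough sketch of the kind of tally I was doing by eye in FileMon: count I/O events per (process, target path) pair and turn that into a requests/second figure. To be clear, this is not FileMon’s actual export format; the `(timestamp_seconds, process_name, path)` tuples and the `busiest_pairs` name are made up for illustration, and you would have to massage a real capture into that shape first.

```python
from collections import Counter


def busiest_pairs(events, top=3):
    """Rank (process, path) pairs by approximate I/O requests per second.

    `events` is an iterable of (timestamp_seconds, process_name, path)
    tuples. This is a hypothetical, simplified shape, not a real
    FileMon log format.
    """
    events = list(events)
    if not events:
        return []

    times = [t for t, _, _ in events]
    # Length of the capture window in seconds. Clamp to at least one
    # second so a very short capture does not divide by zero or
    # produce wildly inflated rates.
    duration = max(max(times) - min(times), 1.0)

    # Count how many events each (process, path) pair generated.
    counts = Counter((proc, path) for _, proc, path in events)

    return [
        (proc, path, n / duration)
        for (proc, path), n in counts.most_common(top)
    ]
```

Feed it a couple of seconds of capture and any pair sitting in the hundreds of requests/second (like my explorer.exe hammering the a/v executable) should float straight to the top of the list.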
Sidenote: Speaking of my dog, he has had a severe case of the upset tummies for 2 days now (I won’t get into graphic detail, let’s just say he’s had to go out a lot…as in 3 times last night; then he had me up at 6:30 a.m. and has been out every 3 hours on the dot since then. It’s 11:00 p.m. now, you do the math. I live on the 8th floor of an apartment building, so it’s not like I can just open up the back door and let him out…it’s been a total PITA, each walk has been a 20-minute ordeal). Worst of all, he refuses to eat, which would probably…err…solidify things at this point. Poor guy.
Posted Mar 06 2006, 11:17 PM by Jayson Knight