Friday, March 18, 2005

Yesterday, we increased server capacity by 20% to address some of the stability problems we are seeing. This has had a positive impact, but there is more to do.

What we are seeing is that individual application servers trend toward 100% CPU usage over time. Put simply, the appservers get pegged, and users on those servers encounter paralyzingly slow load times throughout the site. Over the course of the next week, we will be doubling the number of machines responsible for serving in order to address this.

Users whose blogs have more than 500 posts are also being severely hampered at this time. We believe this is due to an improper use of system resources when users of such blogs either access the Edit Posts page or attempt to publish. We will be testing a potential fix for this problem over the next couple of days and hope to push it to production early next week. Because of the extent of the change, we need to fully assess its impact on the service before deployment.

We will update this blog with further news on these fixes as they progress.