Statistics vs. Resources
How Collecting and Processing Additional Statistics Might do you a Bad Favor!
We all know how useful statistics might be. It gives quick overview to aggregated information and provides quality data analysis to help us make better, more informed decisions. However, in some cases statistics collection and processing requires more server resources than the initially intended.
Today we would like to share with you our recent experience. This is a real story about two stat modules which were added to the hosting service, but doubled the server resource usage.*
Piwik module is one of the most widespread stats analysis softwares. It is normally installed in the hosting service. The system collects and provides statistics for your website visitors as well as search engine bots and keywords that bring users to your website.
Our team had a detailed check to find the cause for so many simultaneously running processes.
Our customer was using Google Analytics, but he had also recently installed Piwik as a second statistics source. The system was used in a few of his projects. Meanwhile one of his web projects was having an active promo campaign, hence it had been attracting more visitors than usual.
When a website with a Piwik installed is loaded, a request for saving the visitor's data is also sent to Piwik. This request is processed by a PHP script. The rising number of website visits leads also to more requests to Piwik.
After we came to this conclusion, our team reached Mr. Dimitrov and we increased his Managed VPS resources in terms of RAM and number of simultaneous running processes.
We would also like to quote Dimitar Dimitrov (with his consent). While looking at the graph with system parameters, he exclaimed:
I was very surprised to find out that a software, consisting of only five modules can turn out to be so costly! I just wanted to use it as a second source of information. 🙂
Analyzing the PHP processes which reached their limit was one of the steps before going further with the increase in CPU usage. CPU usage includes the time required for PHP request processing as well as the time required for MySQL query processing. There was a discernible difference between the two parameters – the PHP request processing time reached nearly 78 minutes while the MySQL time was over 1433 minutes! This was more than 18 times longer!
We were absolutely determined to analyze and find out what kind of queries were being executed to the database. We observed the MySQL server and noticed that there was an SQL query executed to one of the databases, but it was not optimized at all.
Here the "credit" goes to HeatMapTracker. This system enables you to track places on your website where users click, what pages attract their interest or are avoided, where they scroll and many others.
Similarly to Piwik, HeatMap Tracker is installed in the hosting service and stores the collected data in the database. Over time statistics grows bigger and more and more data piles up.
One of the unspoken rules when dealing with software is: If there are unoptimized processes, they immediately pop up in large volumes of data.
The unoptimized execution of an SQL query has led to excessive resource usage.
But then what exactly means optimized execution of an SQL query?
It is normal for web projects retrieving data to execute queries for less than a second. However, optimized execution is a combination of two factors:
- query execution time
- query execution frequency
If an SQL query is executed for longer than a second and once in a few seconds, this is considered to be unoptimized. This situation leads to the following consequences:
- Increase in CPU usage
- Slowdown in the website's operation in this regard which waits for the query result
In our customer's case we noticed one particular SQL query which was executed for longer than a second and very intensively. The query is:
# User@Host: user[user] @ localhost 
# Query_time: 1.683540 Lock_time: 0.000094 Rows_sent: 1 Rows_examined: 63160
SELECT * FROM hmtrack_main_8053a0fc9dec652dba7f3c9b05c3531e6fcbf31e WHERE session_id = '0' AND `project` = 'project';
Usually a well-structured SQL query aims at processing a maximum number of lines. For this purpose the query might be reworked or indexes might be added to the tables.
We added an index to the session_id column. Afterwards the query started being executed for 0.0009 seconds. The execution time was reduced by 1870 times!
Regarding the CPU usage – See it by yourself 😉
Managed VPS services provide you with more resources and CPU usage. However, try not to waste your resources.
Be careful when installing stats modules. Evaluate your sources and the means of analysis, but do not underestimate the limited availability of your server resources.
*P.S. Our customer's profile is in Bulgarian, so the screenshots are in Bulgarian, too.