Thursday, February 10, 2011

Apache MaxMemFree For Lean Memory Control

We were experiencing unexplained memory spikes on our CentOS Apache VMs.
This was a big problem because within the span of 5-10 minutes our Apache webserver VMs would go from 23% memory usage to swapping out to the NFS datastore seriously impacting performance until we restarted apache to clear the condition.

This was a non-trivial (interesting!) problem to analyze for several reasons:
1) The spikes were not easily tied to any particular large increase in number of requests (in fact the scoreboard showed most threads were idle during these memory spikes)

2) determining the constituent apache memory components contributing to the memory usage spikes was not made easy:
2a) top reports memory usage based on Shared pages not the REAL memory actually bound to that httpd process exclusively
2b) we did not know which apache requests or which apache processes were consuming the memory (out of the dozens of httpd processes and thousands of requests)
3) The restart fix was easy enough so the root cause analysis was deferred for other priorities.

Road to the Solution (skip to bottom for the Solution ;):

First we needed a way to tie to the httpd PIDs to the requests they were serving.
Our existing LogFormat did not include the PID for the httpd serving the request
Adding %P to the end solved this:

LogFormat "%h %V %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D %P" combined

Next, getting the REAL memory consumption for these bloated httpd's was not happening with top. Turns out with Linux you need to get into the /proc/*/smap files and analyze the Private_Dirty entries (Credit to Hongli Lai and his helpful post for giving the seed of this script)

[root@web08 02]# more ~/ShowMem.csh

foreach i (`ls /proc/*/smaps`)
echo `grep Private_Dirty $i | awk '{ print $2 }' | xargs ruby -e 'puts ARGV.inject { |i, j| i.to_i + j.to_i }'` " " `head -1 $i | awk '{print $NF}'` " " $i

Running this through sort shows the highest REAL (private dedicated to that PID) memory:
[root@web08 02]# ~/ShowMem.csh | sort -nr | head

16576 /usr/sbin/httpd /proc/23691/smaps
15484 /usr/sbin/httpd /proc/24871/smaps
3432 /usr/sbin/httpd /proc/24734/smaps
3188 /usr/sbin/httpd /proc/25354/smaps
I then had the PID(s) of the most bloated httpds to search through the apache access logs.
I chose to focus on the LARGEST payload requests for these PIDs first.
Sort the access log by size of request:

awk '{print $(NF-1) " " $1 " " $2 " " $3 " " $4 " " $5 " " $7}' access_log.20110207 | sort -nr | more
Starting with a fresh restart of apache, so there were no bloated httpds yet, I then tested several of the high payload URLs from the access log while watching the output of repeated ShowMem.csh runs to catch any httpds growing.
Surprisingly I observed the httpds did not grow via "straight from the filesystem" 300mb+ mp4/mov files, but they did when the SAME FILE was served via mod_jk from the app layer!
(Quickly checked mod_jk was up to date and no fixes for memory in newer version)
I could not explain yet why this webapp/mod_jk combination caused apache to hold onto the memory in its smap anon space, but now I could readily reproduce and observe the issue at will (and that is 99% of the battle)

Armed with this info, I started researching for apache memory directives and quickly found

MaxMemFree !!

After adding

MaxMemFree 10000

I repeated the test and did not see desired effect advertised by the documentation.
Then I read in the forums the units may be Mb instead of KB as documented.
I then tried:

MaxMemFree 10

Repeating my test I observed the httpd serving the 300Mb mp4 video file via mod_jk balloon while serving the request to > 200Mb, then quickly free up this memory and return to 2.3Mb!
Success (MaxMemFree FTW!)
Our Apache instances are now running much leaner and effectively we've increased our capacity and eliminated our exposure to random requests bloating our httpd memory consumption.


No comments: