This was a big problem because within the span of 5-10 minutes our Apache webserver VMs would go from 23% memory usage to swapping out to the NFS datastore, seriously impacting performance until we restarted Apache to clear the condition.

This was a non-trivial (interesting!) problem to analyze for several reasons:
1) The spikes were not easily tied to any particularly large increase in the number of requests (in fact, the scoreboard showed most threads were idle during these memory spikes).
2) Determining which Apache memory components contributed to the spikes was not easy:
2a) top reports memory usage including shared pages, not the REAL memory bound exclusively to that httpd process.
2b) We did not know which Apache requests or which Apache processes (out of the dozens of httpd processes and thousands of requests) were consuming the memory.
3) The restart fix was easy enough, so the root cause analysis was deferred in favor of other priorities.
Road to the Solution (skip to bottom for the Solution ;):
First we needed a way to tie the httpd PIDs to the requests they were serving.
Our existing LogFormat did not include the PID of the httpd serving the request.
Adding %P to the end solved this:
LogFormat "%h %V %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D %P" combined
Next, getting the REAL memory consumption for these bloated httpds was not happening with top. It turns out that on Linux you need to get into the /proc/*/smaps files and sum the Private_Dirty entries (credit to the author of a helpful post for giving the seed of this script):
[root@web08 02]# more ~/ShowMem.csh
#!/bin/csh
# For each process, sum the Private_Dirty values (in kB) across all of its
# smaps entries, then print the total, the first mapped file (typically the
# process binary), and the smaps path.
foreach i (`ls /proc/*/smaps`)
echo `grep Private_Dirty $i | awk '{ print $2 }' | xargs ruby -e 'puts ARGV.inject { |i, j| i.to_i + j.to_i }'` " " `head -1 $i | awk '{print $NF}'` " " $i
end
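If ruby is not on the box, a plain-sh sketch can do the summing in awk alone (it reports kB, same as smaps, and skips the binary-name column):
#!/bin/sh
# Sketch: sum Private_Dirty (kB) per process using only awk
for f in /proc/[0-9]*/smaps; do
    awk -v f="$f" '/^Private_Dirty/ { sum += $2 }
                   END { if (sum) print sum, f }' "$f" 2>/dev/null
done | sort -nr | head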
Running this through sort shows the highest REAL (private dedicated to that PID) memory:
[root@web08 02]# ~/ShowMem.csh | sort -nr | head
16576 /usr/sbin/httpd /proc/23691/smaps
15484 /usr/sbin/httpd /proc/24871/smaps
3432 /usr/sbin/httpd /proc/24734/smaps
3188 /usr/sbin/httpd /proc/25354/smaps
I then had the PID(s) of the most bloated httpds to search for in the Apache access logs.
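For example, to pull every request served by the top offender (PID 23691; the log path here is illustrative), and relying on %P being the last field:
grep ' 23691$' /var/log/httpd/access_log.20110207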
I chose to focus on the LARGEST payload requests for these PIDs first.
Sort the access log by size of request:
awk '{print $(NF-1) " " $1 " " $2 " " $3 " " $4 " " $5 " " $7}' access_log.20110207 | sort -nr | more
Starting with a fresh restart of Apache, so there were no bloated httpds yet, I then tested several of the high-payload URLs from the access log while watching the output of repeated ShowMem.csh runs to catch any httpds growing.
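To catch a growing httpd in the act, wrapping ShowMem.csh in watch works well (a sketch; the two-second interval is arbitrary):
watch -n 2 '~/ShowMem.csh | sort -nr | head'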
Surprisingly, I observed that the httpds did not grow when serving 300MB+ mp4/mov files straight from the filesystem, but they did when the SAME FILE was served via mod_jk from the app layer!
(I quickly checked that mod_jk was up to date and that newer versions had no memory-related fixes.)
I could not yet explain why this webapp/mod_jk combination caused Apache to hold onto the memory in its smaps anon space, but now I could readily reproduce and observe the issue at will (and that is 99% of the battle).
Armed with this info, I started researching Apache memory directives and quickly found
MaxMemFree !!
After adding
MaxMemFree 10000
I repeated the test and did not see the desired effect advertised by the documentation.
Then I read in the forums that the units may be MB instead of KB as documented.
I then tried:
MaxMemFree 10
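For reference, here is a sketch of where the directive lives in httpd.conf (assuming the worker MPM; apply it with a graceful restart):
<IfModule worker.c>
    MaxMemFree 10
</IfModule>
Then: apachectl graceful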
Repeating my test, I observed the httpd serving the 300MB mp4 video file via mod_jk balloon to more than 200MB while serving the request, then quickly free up this memory and return to 2.3MB!
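For anyone wanting to reproduce the before/after, fetch one of the large files through the mod_jk path while snapshotting memory (the URL here is hypothetical):
curl -s -o /dev/null http://web08/app/videos/big.mp4 &
sleep 5; ~/ShowMem.csh | sort -nr | head -3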
Success (MaxMemFree FTW!)
Our Apache instances are now running much leaner; effectively, we've increased our capacity and eliminated our exposure to random requests bloating our httpd memory consumption.
Solution:
http://httpd.apache.org/docs/2.0/mod/mpm_common.html#maxmemfree