With memory pages shared among applications, sometimes it's hard to tell how much you'll really save by killing that memory hog. Two new memory metrics offer help for busy administrators.
Anybody who has tried to figure out why a Linux system is running short of memory can attest that the memory usage information made available by the kernel is, at best, difficult to use. Matt Mackall has recently been working on a set of patches aimed at improving this situation. Given the constraints imposed by embedded Linux systems, it is not surprising that Matt chose the Embedded Linux Conference to present his work (which, incidentally, was funded by the Consumer Electronics Linux Forum).
Matt pointed out that the currently-available information is confusing at best. The page cache muddies the situation, and the sharing of pages between applications complicates things even more. The result is that it is hard to say where memory is being used; one can't even get a definitive answer to the question of how big a specific application is. More detailed questions - such as which parts of an application are using the most memory - are even harder to answer. Trying to answer questions of interest to embedded systems developers - how many applications can run on a specific device without pushing it into thrashing, for example - is nearly impossible without simply running a test.
The problem is that the numbers exported by the current kernels are nearly meaningless. The reported virtual size of an application is nearly irrelevant; it says nothing about how much of that virtual space is actually being used. The resident set size (RSS) number is a little better, but there is no information on sharing of pages there. The /proc/pid/smaps file gives a bit of detail, but also lacks sharing information. And the presence of memory pressure can change the situation significantly.
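For comparison, the numbers being criticized here are the ones any current tool can already pull out of /proc. A minimal sketch in Python, assuming the standard VmSize and VmRSS fields of /proc/[pid]/status (both reported in kB):

```python
import os

def classic_sizes(pid):
    """Read the classic (and, as argued above, misleading) virtual
    size and resident set size for a process from /proc/<pid>/status."""
    sizes = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                # Values look like "VmRSS:   41000 kB"
                sizes[key] = int(value.split()[0])
    return sizes

print(classic_sizes(os.getpid()))
```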
The Linux virtual memory system, in other words, is a black box which provides too little information on what is going on inside. Matt's project is to open up that box and shine some light inside.
The first step is to add a new file (pagemap) in each process's /proc directory. It is a binary file containing the page frame number for each page in the process's address space. The file can be read to see where a process's pages have been placed and, more interestingly, it can be compared between processes to see which pages are being shared. Matt has a little graphical tool which can display this file, showing the patterns of which pages are present in memory and which are not.
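A minimal sketch of how such a file might be read from user space, assuming the layout the interface eventually took in mainline kernels (one little-endian 64-bit entry per virtual page, with the page frame number in the low 55 bits and a "present" flag in bit 63); the format in these early patches may differ, and reading pagemap generally requires sufficient privileges:

```python
import struct

PAGE_SIZE = 4096   # assumption: 4 KB pages
ENTRY_SIZE = 8     # one 64-bit entry per virtual page

def pfn_for_vaddr(pid, vaddr):
    """Look up the page frame number backing one virtual address,
    or return None if that page is not present in memory."""
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // PAGE_SIZE) * ENTRY_SIZE)
        entry, = struct.unpack("<Q", f.read(ENTRY_SIZE))
    if entry & (1 << 63):                # "present" bit
        return entry & ((1 << 55) - 1)   # PFN in the low bits
    return None
```

Comparing the frame numbers obtained for the same shared-library mapping in two different processes shows directly which of those pages are shared.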
Then, there is a file (/proc/kpagemap) which provides information about the kernel's memory map. For each physical page in the system, kpagemap contains the mapping count and the page flags. This information can be used to learn about sharing of pages and about how each page is being used. There were a couple of graphical applications using this file as well; one showed the degree to which each page is being shared, while the other showed the use of each page as determined by its flags.
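In mainline kernels this information ended up split across two files, /proc/kpagecount and /proc/kpageflags, each indexed by page frame number with one 64-bit value per physical page; a sketch along those lines (the single kpagemap file described here belongs to the early patches, so treat the paths and layout as assumptions, and note that reading them normally requires root):

```python
import struct

def page_info(pfn):
    """Return (mapping count, raw flags) for one physical page frame,
    assuming one little-endian 64-bit value per PFN in each file."""
    values = []
    for path in ("/proc/kpagecount", "/proc/kpageflags"):
        with open(path, "rb") as f:
            f.seek(pfn * 8)
            value, = struct.unpack("<Q", f.read(8))
            values.append(value)
    return tuple(values)
```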
Once this information is available, one can start to generate some useful numbers on memory use. Matt is proposing two new metrics. The "proportional set size" (PSS) of a process is the count of pages it has in memory, where each page's contribution is divided by the number of processes sharing it. So if a process has 1000 pages all to itself, and 1000 shared with one other process, its PSS will be 1500. The "unique set size" (USS), instead, is a simple count of unshared pages. It is, for all practical purposes, the number of pages which will be returned to the system if the process is killed.
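Given the mapping count of each resident page of a process, which is just what the pagemap files expose, both metrics reduce to a few lines of arithmetic. This is only an illustrative sketch, not Matt's tools:

```python
def pss_and_uss(map_counts):
    """map_counts: mapping count for each resident page of one process.
    PSS divides each page among the processes mapping it; USS counts
    only the pages mapped by this process alone."""
    pss = sum(1.0 / count for count in map_counts if count > 0)
    uss = sum(1 for count in map_counts if count == 1)
    return pss, uss

# The example from the text: 1000 private pages plus 1000 pages
# shared with one other process gives a PSS of 1500 pages.
counts = [1] * 1000 + [2] * 1000
print(pss_and_uss(counts))   # (1500.0, 1000)
```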
These numbers are relatively expensive to calculate, since they require a pass through the process's address space. So they will not be something which is regularly exported from the kernel. They can be calculated in user space using the pagemap files, though. Matt demonstrated a couple of tools to do these calculations. Using "memstats" on a galeon process, he supplemented the currently-available virtual size and resident set size numbers (105MB and 41MB, respectively) with a PSS of 26MB and a USS of 20MB. There is also a "memrank" tool which lists processes in the system sorted by decreasing PSS. With a tool like that, finding the memory hogs on the system becomes a trivial task.
Matt pointed out that these numbers, while useful, will change depending on the amount of memory pressure being experienced by the system. It would be nice to be able to figure out how much memory a given process truly needs before it will begin to thrash. To this end, his patch creates a new clear_refs file for each process; this file can be used to reset the "referenced" flag on each page in the process's working set. After the process runs for a bit, one can look at which pages have had their referenced bits set again; those are the pages it actually needed to run during that time.
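A sketch of how that measurement cycle might look from user space, assuming the semantics that clear_refs and the "Referenced:" lines in smaps took on in mainline (writing "1" resets the referenced bits, and the smaps values are reported in kB):

```python
import time

def referenced_kb(pid, interval=5):
    """Clear the referenced bits for a process, wait, then total the
    'Referenced:' lines in smaps -- roughly, how much memory the
    process actually touched during the interval."""
    with open(f"/proc/{pid}/clear_refs", "w") as f:
        f.write("1\n")              # reset the referenced bits
    time.sleep(interval)            # let the process do some work
    total = 0
    with open(f"/proc/{pid}/smaps") as f:
        for line in f:
            if line.startswith("Referenced:"):
                total += int(line.split()[1])   # value is in kB
    return total
```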
The patches are in the -mm tree currently; it's possible that they could find their way into the mainline once the 2.6.22 merge window opens up. Those who would like to play with Matt's scripts can find them in this directory; the slides from his talk are packaged there as well. With luck, understanding system memory usage will require far less guesswork in the near future.
This story, "Kernel space: How much memory am I really using?" was originally published by LinuxWorld (US).