Infrequently Asked Questions: memory, RPM recovery, monitoring

Most of my memory is gone, my hard drive just became half-sized, and now the RPM command won't answer my queries. Can anyone help?

Where is all my RAM going?

When people migrate to Linux, one of the first things they notice is that the amount of free RAM seems to be significantly lower than they've been accustomed to seeing. "What's happening with my RAM?" they will frequently ask.

Using my system as an example (IBM ThinkPad T40, 512MB RAM, SuSE v9.3), let's inspect my memory usage (in KDE: System → Monitor → Memory). For context, the only apps I am running are Firefox, gkrellm, and kate, but out of my 503MB of total available RAM, only 71MB is displayed as being free. At first glance that might seem alarming, but because Linux actively works to keep recently used data cached in memory, the amount of RAM that appears to be in use is higher than it really is. The performance advantage of keeping recently used data cached in RAM is that if you need to access it again, the system does not have to re-read it from the hard drive. That's a huge difference, given that access times within RAM are measured in nanoseconds, compared to milliseconds for physical I/O.

To distinguish between RAM that is actively being used and RAM that is cached, you can open a console session and enter the command free -m. The results will look something like this:

default@linux:~> free -m
             total       used       free     shared    buffers     cached
Mem:           503        431         71          0         67        185
-/+ buffers/cache:        179        324
Swap:          243          0        243

This gives a much truer picture of how the RAM is actually being used by your apps. The "Mem" line shows that 431MB of my 503MB total is in use, but the second line, "-/+ buffers/cache", gives the real picture: once buffers and cache are subtracted, only 179MB is actively used by applications, and 324MB is effectively available. In a sense, cached memory is being used, but not actively, and thus it would become immediately available if your system needed it. In any event, by caching that data your system is making maximum use of otherwise idle resources, so a seemingly high memory usage in Linux should be interpreted as your system taking maximum advantage of the physical RAM it has, which is desirable. As an aside, if you are wondering why the system only shows 503MB when I physically have 512MB, it's because the kernel can never be swapped out, and therefore the RAM it occupies is off limits to all other uses.
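The arithmetic behind the "-/+ buffers/cache" line can be reproduced by hand. The following is only a sketch: real_usage is a made-up helper name, and it assumes the older six-column free layout shown above (newer versions of free print different columns).

```shell
# Hypothetical helper: reads old-style `free -m` output on stdin and prints
# "used-by-apps available" in MB, mirroring the "-/+ buffers/cache" line.
# Assumed column order: total used free shared buffers cached.
real_usage() {
    awk '/^Mem:/ { print $3 - $6 - $7, $4 + $6 + $7 }'
}

# Feed it the sample output from above
printf '%s\n' \
    '             total       used       free     shared    buffers     cached' \
    'Mem:           503        431         71          0         67        185' |
    real_usage
```

With the sample figures this prints 179 and 323. The second number is 1MB short of the 324 that free reported, presumably because free rounds each value to whole megabytes before displaying it.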

Two final comments: if you are interested in observing how much memory individual processes are using, check out the "top" command. Lastly, if you have a large amount of physical RAM (meaning more than 1G) but your system only shows 880MB total, you will want to recompile your kernel and enable High Memory Support (4G) in order for all your RAM to be recognized.

-- J.W.

Copying an old hard drive with dd?

I installed a new 40G hard drive to replace a creaking 19G hard drive with Microsoft Windows XP on it. I used dd in Knoppix to copy an image of the whole old drive to the new one. Now in Windows it displays as just being 19G, so half the new capacity is invisible. Debian and Knoppix read it correctly.

My Windows colleague blamed me for using Knoppix, but any ideas how I can get Windows to see the rest of the space?

-- thread

dd copies data from one drive to another in a bitwise copy. Rather than trying to understand files, it just read()s and write()s data at the block level. This is good in the sense that it is usable regardless of filesystem type and also preserves things like deleted files for data recovery applications. The downside is that it copies the filesystem exactly as is: in this case, this was a 19G filesystem. A number of approaches to work around this are possible.
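To see why the extra space stays invisible, here is a small sketch that uses ordinary files as stand-ins for the two drives. The file names and sizes are made up for illustration; the only point is that dd copies the bytes it is given and leaves the rest of the larger target untouched.

```shell
# Sketch with ordinary files standing in for drives (names are hypothetical)
workdir=$(mktemp -d)

# A 64 KiB "old drive" filled with arbitrary bytes
dd if=/dev/urandom of="$workdir/old.img" bs=1024 count=64 2>/dev/null

# A "new drive" twice the size
dd if=/dev/zero of="$workdir/new.img" bs=1024 count=128 2>/dev/null

# Block-level copy; conv=notrunc keeps the target file at its full size
dd if="$workdir/old.img" of="$workdir/new.img" bs=1024 conv=notrunc 2>/dev/null

# The first 64 KiB are now identical; nothing beyond them was written
cmp -n 65536 "$workdir/old.img" "$workdir/new.img" && echo "first 64 KiB identical"
```

Whatever filesystem lived in the old image still describes itself as 64 KiB, which is the same situation as the 19G filesystem sitting on the 40G drive.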

The first option is for NTFS systems, which most Windows installs are these days. There's a handy tool called 'ntfsresize'. As its name implies, ntfsresize allows you to resize NTFS filesystems. Run with just the device name and no size option, it will automatically enlarge the filesystem to fill the entire partition. Alternatively, you can use one of the graphical variants of parted, such as GParted, which operate as frontends to ntfsresize. These are even more helpful in cases where the partition table was copied with the original dd (generally a bad idea with drives of different sizes, anyway).

Another option, if Linux can read the source drive and write to the filesystem on the destination drive, is to simply use cp -R. Recent stabilization of arbitrary NTFS write support in the Linux NTFS driver makes this a much more convenient and viable option than it would have been a year or so ago.

-- Matir

Understanding the "tee" command

I've always used the redirect symbols to write to a file (>) or append to a file (>>). What does tee add that you all use it instead of redirects?

-- thread

DotHQ had just recently heard of the utility called 'tee' (part of the GNU coreutils) and that it had similar functionality to the standard shell output redirectors. He wanted to know more about it and how it was more useful or different from shell redirection. Several users posted parts of the answers to his question.

Tee copies "standard input to each FILE, and also to standard output." It allows you to make multiple copies of the output of a command, rather than a single destination. It also directs output to the standard output on the screen. This functionality is commonly used when one wants to log the output of a command to a file as well as monitor it in real-time on the console. This is tee's most basic invocation: ./foo | tee foo.log. In that case, you can both watch the output and have a record in foo.log.

In some cases, you might want multiple copies of the same file. You could use something like ./foo | tee foo.log | tee /nfs/foo.log, but tee is more helpful than that: it can write to multiple files at once, as in ./foo | tee foo.log /nfs/foo.log.

Maybe you want the output of a series of commands in one file as well as one file per command. To output multiple commands to a single file without overwriting, one would normally use the shell >> operator. Tee provides this same append functionality with the -a flag. For example:
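A quick way to convince yourself that one tee really writes every copy is to try it in a scratch directory. In this sketch the foo function is a made-up stand-in for ./foo, and a temporary directory replaces the NFS path.

```shell
# Hypothetical stand-in for ./foo: prints two lines
foo() { printf 'line 1\nline 2\n'; }

logdir=$(mktemp -d)

# One tee writes both copies and still passes the output through to the terminal
foo | tee "$logdir/foo.log" "$logdir/foo.copy.log"

# The two files are identical
cmp "$logdir/foo.log" "$logdir/foo.copy.log" && echo "copies match"
```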

for dir in etc home
do
   ls /$dir | tee -a ls.log ls.$dir.log
done

Tee can also stand in for tail's -f option in some cases. Rather than writing something like ./bar > bar.log & tail -f bar.log, one can simply write ./bar | tee bar.log.

As a final use, tee makes it convenient to only show errors to the console while logging all output. We can also add a separate error log. Suppose the 'bar' program prefixes all Errors with 'Error: '. We can grep through to grab the error lines as well as logging output: ./bar | tee bar.log | grep '^Error:' | tee bar.error.log
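Here is that last pipeline as a self-contained sketch. The bar function is a made-up stand-in for the 'bar' program, and the logs go to a temporary directory so nothing is left lying around.

```shell
# Hypothetical stand-in for ./bar: mixes normal output with 'Error: ' lines
bar() {
    echo "starting"
    echo "Error: disk full"
    echo "finished"
}

logdir=$(mktemp -d)

# Full output goes to bar.log; only the error lines reach the console
# and the separate error log
bar | tee "$logdir/bar.log" | grep '^Error:' | tee "$logdir/bar.error.log"
```

Only the line "Error: disk full" shows up on the console, while bar.log holds all three lines. Note that this covers standard output only; a program that writes its errors to standard error would need 2>&1 before the first pipe.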

Tee is a useful but underutilized utility for managing the output of a lot of your commands. Hopefully this will have shown you a little bit about the flexibility tee offers.

-- Matir

Two RPM questions

Is it possible to trace the install date of packages?

-- thread

Yes, using the RPM package manager and a bit of scripting glue, that is very easy. Running rpm --querytags|grep -i install gives (among other choices) the "INSTALLTIME" tag, and rpm --querytags|grep -i name gives (among other choices) the "NAME" tag. Now rpm will show the time as epoch seconds (seconds since 00:00:00 UTC, January 1, 1970). We'll use "date" to convert it. Here's the Bash function we use:

epoch2date() {
    # GNU date accepts "@SECONDS" for seconds since the epoch, whatever the local time zone
    /bin/date '+%Y-%m-%d %H:%M:%S %Z' --date "@$1"; }

Now we glue it together using rpm's flexible "queryformat" option:

#!/bin/sh -
# The epoch2date function shown above must be defined before this loop runs
rpm -qa --queryformat="%{NAME} %{INSTALLTIME}\n" | while read pkg epoch; do
 echo "${pkg} $(epoch2date "$epoch")"
done
exit 0
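If you want to try the conversion without touching the RPM database, the sketch below feeds the loop two made-up lines in the same "%{NAME} %{INSTALLTIME}" format. It assumes GNU date, whose --date option accepts "@SECONDS"; epoch_to_date is a hypothetical helper name, and TZ=UTC is forced only so the output is the same on every machine.

```shell
# Sketch assuming GNU date; "@SECONDS" means seconds since the epoch
epoch_to_date() {
    TZ=UTC date --date "@$1" '+%Y-%m-%d %H:%M:%S %Z'
}

# Hypothetical sample lines standing in for the rpm query output
printf '%s\n' 'bash 1136073600' 'coreutils 1136160000' |
while read -r pkg epoch; do
    echo "$pkg $(epoch_to_date "$epoch")"
done
```

This prints "bash 2006-01-01 00:00:00 UTC" and "coreutils 2006-01-02 00:00:00 UTC".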

I suspect (fear) the RPM database is corrupt because "rpm -qa" gives no output and "rpm -q {known installed rpm}" gives a "package is not installed" message.

I cannot install/upgrade any rpms as it fails on all dependencies. Other than the rpm problem the server seems to be functioning correctly.

Is there any way of resolving this RPM issue, without needing a reinstall?

-- thread

What we're looking at is a broken RPM database. Bummer. Since rpm --rebuilddb didn't work out as expected and the user didn't make backups but still has the output from the /etc/cron.daily/rpm cronjob, we're going to use RPM in reverse, that is: populate the RPM database with already installed packages.

I would like to emphasize you should not do this if you don't trust your system or worse, to "repair" a compromised system!

That said, here's what we will do. We'll initialise the RPM database, take the output from /var/log/rpmpkgs (which contains the names of all the installed RPM packages), find the RPMs and enter the package data in the database. There's one caveat: if you store all available RPMs in one directory then they will all be found in one run. If you use removable media you will have to run the script multiple times until all RPMs are "installed". The "--justdb" option makes sure we only enter data in the database and the "--no.*" options are extra guards against making involuntary changes to the filesystem. If you would like to run the script in "dry run" mode just add the "--test" option.

#!/bin/sh -
# The umask setting makes sure access rights are not too permissive
umask 0027
# Make a backup just in case
mv -f /var/lib/rpm /var/lib/.rpm
# Create the RPM database directory
mkdir /var/lib/rpm
# Initialise the database
rpm --initdb
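# For every package name in the list, look for the matching file on the media
cat /var/log/rpmpkgs | while read name; do
 find /mnt/cdrom -maxdepth 5 -type f -name "$name" | while read package; do
  # Register the package in the database only; nothing is written to the filesystem
  /bin/rpm --install --noscripts --nodeps --notriggers --justdb "$package" > /dev/null
 done
done
exit 0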

If you can get a listing with "rpm -qa" then this task is finished.
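Since removable media may need several passes, it can help to see beforehand which listed packages a given disc actually holds. The following is only a sketch: check_media is a made-up helper, the package names are invented, and temporary files stand in for /var/log/rpmpkgs and /mnt/cdrom so that it never calls rpm at all.

```shell
# Stand-ins for the real locations (all names here are hypothetical)
media=$(mktemp -d)          # plays the role of /mnt/cdrom
mkdir -p "$media/RPMS"
touch "$media/RPMS/bash-3.0-1.i386.rpm" "$media/RPMS/zlib-1.2.3-1.i386.rpm"
pkglist=$(mktemp)           # plays the role of /var/log/rpmpkgs
printf '%s\n' 'bash-3.0-1.i386.rpm' 'missing-1.0-1.i386.rpm' > "$pkglist"

# Report which listed packages can be found under the media directory
# $1 = package list file, $2 = directory to search
check_media() {
    while read -r name; do
        if [ -n "$(find "$2" -maxdepth 5 -type f -name "$name")" ]; then
            echo "found $name"
        else
            echo "missing $name"
        fi
    done < "$1"
}

check_media "$pkglist" "$media"
```

Anything reported as missing would have to come from another disc or directory on a later run.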

-- unSpawn

Monitoring the easy way

I need to make a regular check of the free space on one of the mounts and issue a few commands, like email or something, if the free space drops down and passes a threshold.

Am I right to think about a solution of a script to put into cron? Or is there any other more reasonable or automated procedure?

-- thread

(See also service check script thread, Monitoring Linux Servers thread)

If you have ever come across a service that just broke when you needed it or your /var/log partition filling up killing Apache, you know how annoying the situation can be. There are different ways to monitor a process, and often people just make a restart script and run it as a cron job to check if the process still is alive. If you have a lot to monitor this can quickly grow into an unmanageable situation. I'd like to take a minute to plug one application that has saved my hide a few times: Monit. It can do all sorts of process checks ranging from available diskspace to checking remote SSL-enabled services for particular responses, alert via email and perform any "healing" actions you would want.

Looking at these questions, what would, for instance, a free diskspace check look like? Say you would like to monitor /var every minute, get an alert when the disk reaches 90% capacity and halt (this is an example, OK?) the box if it reaches 99% capacity. Start by setting the polling interval in the global configuration of /etc/monit.conf to a sensible value. What you want depends on your polling needs for different services. Let's poll every 30 seconds:

set daemon  30

Now add an email recipient who will receive (and read!) any alerts:

set alert bill@localhost

Fill in the "service" name (let's call it "space_var"), the partition (/dev/hda1), the alert and add the halt command:

check device space_var with path /dev/hda1 every 2 cycles
 if space usage > 90 % then alert
 if space usage > 99 % then exec "/sbin/shutdown -nh now"

Save, then run a configuration check: monit -c /etc/monit.conf -t, which should return the message "Control file syntax OK". Finally, run pkill -HUP -f monit to make it reread its configuration file if it's already running, or (re)start it manually.
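For comparison, the hand-rolled cron check the question had in mind might look roughly like the sketch below. The usage_pct helper is made up for illustration, the root filesystem is used only so the sketch runs anywhere, and the echo would be replaced by whatever mail or healing command you prefer.

```shell
# Hypothetical helper: prints the usage percentage (without the % sign)
# taken from the second line of `df -P` output read on stdin
usage_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

threshold=90                      # same alert threshold as the Monit check
pct=$(df -P / | usage_pct)        # "/" used here; substitute the mount of interest

if [ "${pct:-0}" -gt "$threshold" ]; then
    echo "WARNING: / is at ${pct}% capacity"
fi
```

It works, but every additional service means another script and another crontab line, which is exactly the sprawl Monit is meant to avoid.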

So, what could a process and network check look like? Mindful of the phrase "who watches the watchmen" we choose to monitor Nagios, the Network monitor process on localhost and the network connection on the remote box "mincemeat":

Local Nagios process, every 10 minutes, restart when failed:

check process nagios with pidfile /var/run/ every 20 cycles
 start program = "/etc/init.d/nagios start"
 stop program = "/etc/init.d/nagios stop"

Nagios on remote host mincemeat with the URI Nagios should reside at, every hour and alert when unable to connect:

check host nagios with address mincemeat every 120 cycles
 if failed url https://login:password@ then alert
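The "every N cycles" values in these checks follow directly from the 30-second polling interval set earlier. A small sketch of the arithmetic, with cycles_for as a made-up helper name:

```shell
# Polling interval in seconds, as in "set daemon 30"
daemon=30

# Hypothetical helper: number of cycles for a check period given in seconds
cycles_for() {
    echo $(( $1 / daemon ))
}

cycles_for 60      # disk space check, every minute
cycles_for 600     # local Nagios process, every 10 minutes
cycles_for 3600    # remote Nagios host, every hour
```

This prints 2, 20 and 120, matching the three checks. If you change the daemon interval, the cycle counts have to be recalculated to keep the same schedule.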

I'd say that's even easier than scripting.

-- unSpawn

Copyright © 2006 IDG Communications, Inc.