A problem with NLMs not cleaning up after themselves

Opinion
Sep 14, 2004
4 mins
Enterprise Applications

* Memory fragmentation issue with NetWare 6.0/6.5

Ever since Windows NT was first released (strangely enough as Version 3.1 – Microsoft never did get the hang of proper numbering), NetWare aficionados have swapped stories about uptime – how long a server goes between bootups. Windows servers used to have uptime numbers that were measured in hours or days, while NetWare uptime was measured in months or even years. With older versions of NT, starting or stopping a service often meant you needed to reboot, while recent versions of NetWare haven’t needed a reboot even to upgrade the operating system.

So it was with a great deal of shock that I read Novell’s Technical Information Document (TID) number 10091980, “Memory Fragmentation Issue with NetWare 6.0/6.5” (https://support.novell.com/cgi-bin/search/searchtid.cgi?/10091980.htm).

This TID has been modified since I last looked at it. It now incorporates the changes and patches that I’ll be mentioning later in this newsletter. Too bad I didn’t save a copy of the original. Too bad Novell is trying to re-write the story.

Now, memory fragmentation isn’t a new issue for NetWare; it’s been around – off and on – since the days of NetWare 3. Some NLMs simply don’t do a good job of cleaning up after themselves: when they allocate and release RAM, they leave little bits and pieces unavailable for other services and applications. Over time, these little bits and pieces add up so that there’s no block of RAM large enough left for new services. Rebooting clears this up, but it’s always been a top priority at Novell to correct the situation, not simply to work around it.
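For readers who want to see the effect in miniature, here is a toy sketch in Python. It assumes a made-up pool of 16 memory units and a simple first-fit allocator; the module names are hypothetical and nothing here reflects how NetWare's memory manager actually works. It only illustrates how plenty of free RAM can still be useless when it is scattered in small pieces.

```python
# Toy model only: a fixed pool of 16 units with first-fit allocation.
# This is an illustration, not NetWare's actual memory manager.

POOL_SIZE = 16


def find_first_fit(pool, size):
    """Return the start index of the first free run of `size` units, or None."""
    run = 0
    for i, owner in enumerate(pool):
        run = run + 1 if owner is None else 0
        if run == size:
            return i - size + 1
    return None


def allocate(pool, owner, size):
    """Mark `size` contiguous units as owned; return True on success."""
    start = find_first_fit(pool, size)
    if start is None:
        return False
    pool[start:start + size] = [owner] * size
    return True


def free(pool, owner):
    """Release every unit held by `owner`."""
    for i, current in enumerate(pool):
        if current == owner:
            pool[i] = None


def largest_free_run(pool):
    """Length of the longest contiguous stretch of free units."""
    best = run = 0
    for owner in pool:
        run = run + 1 if owner is None else 0
        best = max(best, run)
    return best


pool = [None] * POOL_SIZE
for n in range(8):
    allocate(pool, f"mod{n}", 2)   # eight hypothetical modules fill the pool
for n in range(0, 8, 2):
    free(pool, f"mod{n}")          # every other module unloads

total_free = pool.count(None)
biggest = largest_free_run(pool)
can_fit_4 = allocate(pool, "backup", 4)  # hypothetical 4-unit backup buffer
print(total_free, biggest, can_fit_4)    # prints: 8 2 False
```

Half the pool is free, yet a request for four contiguous units fails because no free stretch is longer than two. In this toy model, as on the server, starting over with an empty pool makes the problem disappear, which is why a reboot "fixes" it.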

So when I heard from more than one reader that “Memory management issues in NW6.5 are causing *many* servers to abend during nightly backups,” I checked the TID database and found the rather long-winded #10091980, which said, in about 3,000 words, “re-boot the server periodically and you won’t have any problems.”

It also said that Novell wouldn’t be spending resources to correct the situation but did give a whole laundry list of settings to change, which might enable the server to stay up a bit longer by reducing the amounts of RAM in some memory pools and reallocating others. But you still needed to re-boot fairly frequently. The “culprit” was generally determined to be the TSAFS module, an interface for archive systems to use when backing up the file system.

My first thought, of course, was that Microsoft had taken over the Provo engineering offices of Novell while the bigwigs were napping in Boston.

So I fired off a stiff note to Novell’s PR team, which, although it is often kept as much in the dark as I am, did yeoman work and tracked down a definitive answer.

According to Novell’s PR, the issue was fixed with NetWare 6.5 Support Pack 2 (SP2). Checking back to the TID, I note it now says “There has been information written [not by me, at least not until today] accusing TSAFS.NLM of fragmenting memory on netware [sic!] servers. TSAFS.NLM does not cause fragmentation problems.” Gee, if not TSAFS, what was causing the problems during backup?

According to the TID, “These issues were part of the netware [sic] operating system, and have been addressed in later versions of server.exe.” (such as the one in SP2, it implies). A plausible answer, surely, but one that doesn’t stand up to close examination. With server.exe loaded into memory, any NLM subsequently loaded becomes, in effect, a new module that’s part of the operating system. A problem within that new module can be attributed as a problem with a “part of the NetWare OS.”

But here’s the real kicker. Immediately after those quotes, the TID goes on to say: “TSAFS.NLM can and should be configured to use less ram [sic] in such a situation.” So unless you re-configure TSAFS to use less RAM, then your server will run out of available RAM and need to be re-booted (or crash). But, says the TID, “again, TSAFS.NLM is not the cause of the fragmentation.”

So why not rewrite TSAFS so it uses less RAM? Why not fix the proximate cause of the problem?

Still, if you are noticing that you need to reboot NetWare 6.x with alarming frequency (i.e., more than once a year), the TID does have that whole laundry list of settings you can play with to modify its memory usage and perhaps get back to counting uptime in terms of years – or even decades.