Resource Leaks – Increasing Quotas is not always the solution

I recently answered a query about the OpenVMS $ASSIGN system service returning SS$_NOIOCHAN (no channel available). The solution seems obvious: increase CHANNELCNT. However, in my experience, increasing CHANNELCNT is most often a misleading palliative: it masks the immediate symptom without solving the underlying problem, and only delays the inevitable failure.

In my experience, an apparent I/O channel shortage is most often a consequence of I/O channel mismanagement, not a genuine lack of channels.

The underlying problem is often a creeping resource leak: the $DASSGN system service[1] has not been used to release channels that are no longer needed. Worse, increasing CHANNELCNT[2] requires both the authority to modify the system parameter file and a system reboot, which disrupts other users. The actual correction requires neither elevated privileges nor a system reboot.

Executing programs often assign and de-assign channels throughout their execution, either explicitly using the $ASSIGN and $DASSGN system services or implicitly through RMS OPEN/CLOSE requests.[1,3] Channel leakage is the same anti-pattern as a memory leak: resources are allocated and never released, even though their usefulness has ended. Starvation eventually ensues.
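The effect can be illustrated with a toy model. The sketch below is plain Python, not OpenVMS code: ChannelTable, assign, deassign, and run_requests are hypothetical stand-ins for a per-process channel table and the $ASSIGN/$DASSGN pair, and the limits of 8 and 16 are arbitrary. It assumes each request needs one channel and either releases it or does not.

```python
class ChannelTable:
    """Toy stand-in for a per-process channel table (not the OpenVMS API)."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = set()

    def assign(self):
        # Hand out the lowest free channel number (a simplification).
        for chan in range(1, self.limit + 1):
            if chan not in self.in_use:
                self.in_use.add(chan)
                return chan
        raise RuntimeError("no I/O channel available")  # models SS$_NOIOCHAN

    def deassign(self, chan):
        self.in_use.remove(chan)


def run_requests(table, count, release):
    """Process up to `count` requests; return how many completed."""
    done = 0
    for _ in range(count):
        try:
            chan = table.assign()
        except RuntimeError:
            break  # starvation: no channel left for this request
        if release:
            table.deassign(chan)
        done += 1
    return done


leaky = run_requests(ChannelTable(limit=8), count=100, release=False)
leaky_bigger = run_requests(ChannelTable(limit=16), count=100, release=False)
clean = run_requests(ChannelTable(limit=8), count=100, release=True)
print(leaky, leaky_bigger, clean)  # 8 16 100
```

In this model, doubling the limit only doubles the number of requests completed before the failure, while releasing each channel lets all 100 requests complete within the original limit.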

What symptoms distinguish channel starvation from simple insufficiency? Timing.

When does $ASSIGN return SS$_NOIOCHAN? Immediately upon initiation? After a particular function is executed or a particular input file is processed? At random times after initiation, with no specific pattern? Each underlying problem has subtly different symptoms. A simple debugging statement displaying channel numbers as they are assigned will reveal whether some channel numbers are never released and reused.
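Once such debugging output exists, finding the unreleased channels is mechanical. The following sketch assumes a hypothetical trace format of (action, channel) pairs collected from the debugging statements; the format, the function name outstanding_channels, and the channel numbers are illustrative assumptions, not output from any OpenVMS tool.

```python
def outstanding_channels(trace):
    """Return channel numbers that were assigned in `trace` but never released."""
    open_chans = set()
    for action, chan in trace:
        if action == "assign":
            open_chans.add(chan)
        elif action == "deassign":
            open_chans.discard(chan)
    return sorted(open_chans)


# Hypothetical trace: channel 16 is reused repeatedly, 32 and 48 never come back.
trace = [
    ("assign", 16), ("deassign", 16),
    ("assign", 16), ("assign", 32), ("deassign", 16),
    ("assign", 16), ("assign", 48), ("deassign", 16),
]
print(outstanding_channels(trace))  # [32, 48]
```

A channel number that keeps reappearing in the trace is being recycled properly; numbers that appear once and never again are the candidates for a missing release.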

If one’s account is authorized for CMKRNL access, SDA can be used to look at a problem using the DCL command ANALYZE/SYSTEM. In SDA, the command SHOW PROCESS/ID=nn/CHANNELS will display the channels associated with the specified process.

If the error condition occurs during initial program startup, the problem is less likely to stem from unreleased channels and more likely to be a simple shortage of channels. Static analysis often identifies this situation.

The individual account's FILLM[4] can be part of the problem, as can the running system's CHANNELCNT, PQL_DFILLM, and PQL_MFILLM.[5,6] All are easily checked from any user account using the F$GETSYI and F$GETJPI DCL lexical functions, as shown below:

$ write sys$output f$getsyi("channelcnt")
512
$ write sys$output f$getjpi(0, "fillm")
128

Changing the system-wide CHANNELCNT requires write access to the system parameter file followed by a system reboot. The architecture-dependent system parameter file is located in SYS$SPECIFIC:[SYSEXE].

By contrast, changing a user-specific FILLM only requires write access to the system authorization file, generally SYS$COMMON:[SYSEXE]SYSUAF.DAT. The location and name of the active SYSUAF may be different in a particular OpenVMS system, particularly in OpenVMScluster environments.

Changing CHANNELCNT requires at least a system reboot. Symmetrically configured OpenVMScluster nodes generally have a common CHANNELCNT; OpenVMScluster members with dramatically different configurations may have different CHANNELCNT values. This was more common when memory capacities were more constrained.

If failures are associated with a particular function, or group of related functions, the underlying problem may be either the previously described resource shortage or resource leakage. As in the previously described context, the first step is static analysis of the function to determine the failure point. Does the problem happen immediately upon executing the function, or does it appear only on the second, third, or a subsequent execution? Is that number constant, or does it vary?
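The arithmetic behind those questions can be sketched with a small model. It assumes, purely for illustration, that every call of the function needs a fixed number of channels and leaks a fixed number of them; first_failing_call and its parameters are hypothetical names, and the numbers are arbitrary.

```python
def first_failing_call(free_channels, needed_per_call, leaked_per_call,
                       max_calls=10_000):
    """Return the 1-based call number that first fails, or None if none does."""
    for call in range(1, max_calls + 1):
        if needed_per_call > free_channels:
            return call                   # this call cannot get its channels
        free_channels -= leaked_per_call  # channels never given back
    return None


print(first_failing_call(10, 12, 0))  # 1: simple shortage, fails immediately
print(first_failing_call(10, 2, 1))   # 10: leak, fails on a later call
print(first_failing_call(20, 2, 1))   # 20: a larger quota only postpones it
print(first_failing_call(10, 2, 0))   # None: no leak, no failure
```

A failure on the first call points toward a shortage. A failure on a constant later call points toward a leak on every execution. A failure count that varies from run to run suggests the leak sits on a conditional path.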

Sometimes leakage is a simple matter of a missing $DASSGN or RMS CLOSE call in an otherwise unremarkable code path. Sometimes the call is present but is skipped because of an unrelated condition. Branching and conditional code paths are more difficult to analyze than simple omissions. The missing $DASSGN may be many layers deep in a series of nested subroutines and functions. In object-oriented languages, the omission may lie in a destructor or in the failure to execute a nested destructor.
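The conditional case is the easiest to overlook, so a small illustration may help. The sketch below is a Python model rather than OpenVMS code: ChannelTable, process_leaky, and process_fixed are hypothetical names, and the empty-record check stands in for any unrelated condition that causes an early exit. The corrected version places the release where every exit path reaches it.

```python
class ChannelTable:
    """Toy bookkeeping for assigned channels (hypothetical, not an OpenVMS API)."""

    def __init__(self):
        self.open = set()
        self._next = 0

    def assign(self):
        self._next += 1
        self.open.add(self._next)
        return self._next

    def deassign(self, chan):
        self.open.discard(chan)


def process_leaky(table, record):
    chan = table.assign()
    if not record:          # unrelated condition: nothing to process
        return False        # early return skips the release below
    table.deassign(chan)
    return True


def process_fixed(table, record):
    chan = table.assign()
    try:
        if not record:
            return False
        return True
    finally:
        table.deassign(chan)  # runs on every exit path, including early returns


records = ["alpha", "", "beta", ""]

leaky_table = ChannelTable()
for rec in records:
    process_leaky(leaky_table, rec)

fixed_table = ChannelTable()
for rec in records:
    process_fixed(fixed_table, rec)

print(len(leaky_table.open), len(fixed_table.open))  # 2 0
```

In this model the leaky version loses one channel per empty record, so the failure point depends on the input data rather than on the number of calls.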

Taking a more strategic perspective, channel leakage and other resource leakage problems are creeping problems. Leakage begins as a nuisance that can be accommodated by increased resources until aggregate leakage impacts operations. It is far more effective to eliminate resource leakage issues when they are small and seemingly unimportant, long before they lead to production failures.

Notes

[1] VMS Software, Inc. (2024, February) VSI OpenVMS System Services Reference Manual: A–GETUAI, p. 372
[2] CHANNELCNT
[3] VMS Software, Inc. (2024, February) VSI OpenVMS System Services Reference Manual: A–GETUAI, p. 118
[4] FILLM is defined per username in the active system authorization file, by default SYS$SYSTEM:SYSUAF.DAT
[5] The $CREPRC system service determines a process's FILLM during process creation by using the larger of the username FILLM retrieved from the system authorization file and the running system's PQL_MFILLM parameter.
[6] Lawrence Kenah and Simon F. Bate (1984) VAX/VMS Internals and Data Structures, Digital Press, EY-000014-DP, Section 18.1.1 Channel Assignment, p. 393
[7] VMS Software, Inc. (2024, February) VSI OpenVMS System Management Utilities Reference Manual, Volume II: M–Z, Document Number: DO-DSYUR2-01A
    CHANNELCNT, p. 481
    PQL_DFILLM, p. 521
    PQL_MFILLM, p. 522

Robert Gezelter, CDP
http://www.rlgsc.com
+1 (718) 463 1079