Frequently Asked Questions about lsof ********************************************************************** | The latest release of lsof is always available via anonymous ftp | | from vic.cc.purdue.edu. Look in pub/lsof.README for its location. | ********************************************************************** ______________________________________________________________________ This file contains frequently asked questions about lsof and answers to them. Vic Abell August 21, 2000 ______________________________________________________________________ Table of Contents: 1.0 General Concepts 1.1 Lsof -- what is it? 1.2 Where do I get lsof? 1.2.1 Are there mirror sites? 1.2.2 Are lsof executables available? 1.3 Is lsof year 2000 (Y2K) compliant? 2.0 Lsof Ports 2.1 What ports exist? 2.2 What about a new port? 2.2.1 User-contributed Ports 2.3 Why isn't there an AT&T SVR4 port? 2.4 Why isn't there an SGI IRIX port? 3.0 Lsof Problems 3.1 Why doesn't lsof report full path names? 3.1.1 Why do lsof -r reports show different path names? 3.1.2 Why does lsof report the wrong path names? 3.1.3 Why doesn't lsof report path names for unlinked (rm'd) files? 3.1.4 Why doesn't lsof report the "correct" hard linked file path name? 3.2 Why is lsof so slow? 3.3 Why doesn't lsof's setgid or setuid permission work? 3.4 Does lsof have security problems? 3.5 Will lsof show remote hosts using files via NFS? 3.6 Why doesn't lsof report locks held on NFS files? 3.6.1 Why does lsof report a one byte lock on byte zero as a full file lock? 3.7 Why does lsof report different values for open files on the same file system (the automounter phenomenon)? 3.8 Why don't lsof and netstat output match? 3.8.1 Why can't lsof find accesses to some TCP and UDP ports? 3.9 Why does lsof update the device cache file? 3.10 Why doesn't lsof report state for UDP socket files? 3.11 I am editing a file with vi; why doesn't lsof find the file? 
3.12 Why doesn't lsof report TCP/TPI window and queue sizes for my dialect? 3.13 What does "no more information" in the NAME column mean? 3.14 Why doesn't lsof find a process that ps finds? 3.15 Why doesn't -V report a search failure? 3.16 Portmap problems 3.16.1 Why isn't a name displayed for the portmap registration? 3.16.2 How can I display only portmap registrations? 3.16.3 Why doesn't lsof report portmap registrations for some ports? 3.17 Why is `lsof | wc` bigger than my system's open file limit? 3.18 Why doesn't lsof report file offset (position)? 3.19 Problems with path name arguments 3.19.1 How do I ask lsof to search a file system? 3.19.2 Why doesn't lsof find all the open files in a file system? 3.19.3 Why does the lsof exit code report it didn't find open files when some files were listed? 3.19.4 Why won't lsof find all the open files in a directory? 3.19.5 Why are the +D and +d options so slow? 3.19.6 Why do the +D and +d options produce warning messages? 3.20 Why can't my C compiler find the rpcent structure definition? 3.21 Why doesn't lsof report fully on file "foo" on UNIX dialect "bar?" 3.22 Problems loading and executing lsof 3.22.1 Why do I get a complaint when I execute lsof that some library file can't be found? 3.22.2 Why does lsof complain it can't open files? 3.22.3 Why does lsof warn "compiled for x ... y; this is z."? 3.22.4 How can I disable the kernel identity check? 3.23 Why don't ps(1) and lsof agree on the owner of a process? 3.24 Why doesn't lsof find an open socket file whose connection state is past CLOSE_WAIT? 3.25 Why don't machine.h definitions work when the surrounding comments are removed? 3.26 What do "can't read inpcb at 0x...", "no protocol control block", "no PCB, CANTSENDMORE, CANTRCVMORE", etc. mean? 3.27 What do the "unknown file system type" warnings mean? 4.0 AIX Problems 4.1 What is the Stale Segment ID bug and why is -X needed? 
4.1.1 Stale Segment ID APAR 4.2 Gcc Work-around for AIX 4.1x 4.3 Gcc and AIX 4.2 4.4 Why won't lsof's Configure allow the use of gcc for AIX below 4.1? 4.5 What is an AIX SMT file type? 4.6 Why does AIX lsof start so slowly? 4.7 Why does exec complain it can't find libc.a[shr.o]? 4.8 What does lsof mean when it says, "TCP no PCB, CANTSENDMORE, CANTRCVMORE" in a socket file's NAME column? 4.9 When the -X option is used on AIX 4.3.3, why does lsof disable it, saying "WARNING: user struct mismatch; -X option disabled?" 5.0 BSD/OS BSDI Problems 5.1 Why doesn't lsof report on open kernfs files? 6.0 DEC OSF/1, Digital UNIX, and Tru64 UNIX Problems 6.1 Why does lsof complain about non-existent /dev/fd entries? 6.2 Why does the Digital UNIX V3.2 ld complain about Ots* symbols? 6.3 Why can't lsof locate named pipes (FIFOs) under V3.2? 6.4 Why does lsof use the wrong configuration header files? For example, why can't the lsof compilation find cpus.h? 6.5 Why does lsof indicate incomplete paths with " -- " for Tru64 UNIX 5.1 files? 7.0 FreeBSD Problems 7.1 Why doesn't lsof report on open kernfs files? 7.2 Why doesn't lsof work under FreeBSD 4.0? 8.0 HP-UX Problems 8.1 Why doesn't an HP-UX lsof compilation use -O? 8.2 Where is x25L3.h under HP-UX 9.x? 8.3 Why doesn't lsof report all HP-UX 9.x locks? 8.4 Why doesn't lsof report HP-UX 10.20 locks correctly? 8.5 Why doesn't the CCITT support work under 10.x? 8.6 Why can't lsof be compiled with `cc -Aa` or `gcc -ansi` under HP-UX 10.x? 8.7 Why does lsof complain about no C compiler? 8.8 Why does Configure complain about q4 for HP-UX 11? 9.0 Linux Problems 9.1 What do /dev/kmem-based and /proc-based lsof mean? 9.2 /dev/kmem-based Linux lsof Questions 9.2.1 Why doesn't /dev/kmem-based lsof work (or even compile) on my Linux system? 9.2.1.1 Why does /dev/kmem-based Configure complain about /usr/src/linux? 9.2.2 Why does /dev/kmem-based lsof complain about /dev/kmem? 9.2.3 Why can't /dev/kmem-based lsof find kernel addresses? 
9.2.4 Why does /dev/kmem-based lsof have trouble reading kernel structures? 9.2.5 Where is the system map file for kernel symbol->address translations for /dev/kmem-based lsof? Why doesn't it match my kernel? 9.2.5.1 What do kernel symbol address mismatch error messages mean for /dev/kmem-based lsof? 9.2.5.2 Why does /dev/kmem-based lsof complain that query_module is unimplemented? 9.2.6 Why does /dev/kmem-based lsof complain about the random_fops and urandom_fops kernel symbols? 9.2.7 Why does /dev/kmem-based lsof complain about get_kernel_syms()? 9.2.8 Why does /dev/kmem-based lsof complain "WARNING: uncertain kernel loader format; assuming..."? 9.2.9 /dev/kmem-based lsof Problems Under Linux 2.1.x 9.2.9.1 Why does Configure say ``Testing lseek() with gcc'' for /dev/kmem-based lsof for Linux 2.1.x? 9.2.9.2 Why does Configure's lseek() test complain about read permissions for /dev/kmem-based lsof for Linux 2.1.x? 9.2.10 What do the /dev/kmem-based lsof WARNING messages about kncache.h mean? 9.2.11 What does "WARNING: kncache.h defines no kernel name caches." from /dev/kmem-based lsof mean? 9.2.12 Why doesn't my /dev/kmem-based lsof have PPID support? 9.3 /proc-based Linux lsof Questions 9.3.1 Why doesn't /proc-based lsof report file offsets (positions)? 9.3.2 Why does /proc-based lsof report "can't identify protocol" for some socket files? 9.3.3 Why does /proc-based lsof warn about unsupported formats? 9.3.4 Why does /proc-based lsof report "(deleted)" after a path name? 9.3.5 Why doesn't /proc-based lsof report full open file information for all processes? 9.3.6 Why won't Customize offer to change HASDCACHE or WARNDEVACCESS for /proc-based lsof? 9.4 What about lsof, Linux, and the Alpha or SPARC processors? 10.0 NetBSD Problems 10.1 Why doesn't lsof report on open kernfs files? 11.0 NEXTSTEP and OpenStep Problems 11.1 Why can't lsof report on 3.1 lockf() or fcntl(F_SETLK) locks? 11.2 Why doesn't lsof compile for NEXTSTEP with AFS? 
12.0 OpenBSD Problems 12.1 Why doesn't lsof support kernfs on my OpenBSD system? 12.2 Will lsof work on OpenBSD on non-Intel-based architectures? 12.3 problems 12.3.1 Why does the compiler claim nbpg isn't defined? 12.3.2 What value should I assign to nbpg? 13.0 Output problems 13.1 Why do the lsof column sizes change? 13.2 Why does the offset have ``0t' and ``0x'' prefixes? 13.3 What are the values printed in the FILE_FLAG column and why is 0x sometimes included? 13.3.1 Why doesn't lsof display FILE_FLAG values for my dialect? 13.4 Network Addresses 13.4.1 Why does lsof's -n option cause IPv4 addresses, mapped to IPv6, to be displayed in IPv6 notation? 14.0 Pyramid Version Problems 14.1 DC/OSx Problems 14.2 Reliant UNIX Problems 14.2.1 Why does lsof complain that it can't find /stand/unix? 14.2.2 Why does lsof complain about bad kernel addresses? 14.2.3 Why does the Reliant C compiler give so many warning messages when compiling lsof? 14.2.4 Why does the lsof compilation require -Klp64 for Reliant UNIX 5.44 and why does my compiler reject it? 15.0 SCO Problems 15.1 SCO OpenServer Problems 15.1.1 How can I avoid segmentation faults when compiling lsof? 15.1.2 Where is libsocket.a? 15.1.3 Why do I get "warning C4200" messages when I compile lsof? 15.2 SCO UnixWare Problems 16.0 Sun Problems 16.1 My Sun gcc-compiled lsof doesn't work -- why? 16.2 How can I make lsof compile with gcc under Solaris 2.[456], 2.5.1, or 7? 16.3 How can I make lsof compile with gcc under SunOS 4.1.x? 16.4 Why does Solaris Sun C complain about system header files? 16.5 Why doesn't lsof work under my Solaris 2.4 system? 16.6 Where are the Solaris header files? 16.7 Where is the Solaris /usr/src/uts//sys/machparam.h? 16.8 Why does Solaris lsof say ``can't read proc table''? 16.9 Why does Solaris lsof complain about a bad cached clone device? 16.10 Why doesn't Solaris make generate .o files? 16.11 Why does lsof report some Solaris 2.3 and 2.4 lock types as `N'? 
16.12 Why does lsof Configure say "WARNING: no cc in ..."? 16.13 Solaris 7 and 8 Problems 16.13.1 Why does lsof say the compiler isn't adequate for Solaris 7 or 8? 16.13.2 Why does Solaris 7 or 8 lsof say "FATAL: lsof was compiled for..."? 16.13.3 How do I build lsof for a 64 bit Solaris kernel under a 32 bit Solaris kernel? 16.13.4 How do I install lsof for Solaris 7 or 8? 16.13.5 Why does my Solaris 7 or 8 system say it cannot execute lsof? 16.13.6 How do I build a gcc that will produce 64 bit Solaris 7 and 8 executables? 16.13.7 Why does lsof on my Solaris 7 or 8 system say, "can't read namelist from /dev/ksyms?" 16.14 Solaris and COMMON 16.14.1 What does COMMON mean in the NAME column for a Solaris VCHR file? 16.14.2 Why does a COMMON Solaris VCHR file sometimes seem to have an incorrect minor device number? 16.15 Why don't lsof and Solaris pfiles reports always match? 16.16 Why doesn't lsof report node number and name for some SunOS 4.1.4 Auspex processes? 17.0 Lsof Features 17.1 Why doesn't lsof doesn't report on /proc entries on my system? 17.2 How do I disable the device cache file feature or alter it's behavior? 17.2.1 What's the risk with a perverted device cache file? 17.2.2 How do I put the full host name in a personal device cache file path? 17.2.3 How do I put the personal device cache file in /tmp? 17.3 Why doesn't lsof know about AFS files on my favorite dialect? 17.3.1 Why doesn't lsof report node numbers for all AFS volume files, or how do I reveal dynamic module addresses to lsof? ______________________________________________________________________ 1.0 General Concepts 1.1 Lsof -- what is it? Lsof is a UNIX-specific tool. Its name stands for LiSt Open Files, and it does just that. It lists information about files that are open by the processes running on a UNIX system. See the lsof man page, the 00DIST file, the 00QUICKSTART file, and the 00README file of the lsof distribution for more information. 1.2 Where do I get lsof? 
Lsof is available via anonymous ftp from vic.cc.purdue.edu. Look in the pub/tools/unix/lsof sub-directory. Compressed and gzip'd tar files with PGP certificates are available. 1.2.1 Are there mirror sites? The lsof distribution is currently mirrored at: ftp://coast.cs.purdue.edu/pub/tools/unix/lsof ftp://ftp.auscert.org.au/pub/mirrors/vic.cc.purdue.edu/lsof ftp://ftp.cert.dfn.de/pub/tools/admin/lsof ftp://ftp.crc.doc.ca/packages/lsof ftp://ftp.cs.columbia.edu/archives/lsof ftp://ftp.fu-berlin.de/pub/unix/tools/lsof ftp://ftp.gre.ac.uk/pub/tools/lsof ftp://ftp.rge.com/pub/lsof ftp://ftp.sunet.se/pub/unix/admin/lsof ftp://ftp.tau.ac.il/pub/unix/admin ftp://the.wiretapped.net/pub/security/host-security/lsof/ ftp://ftp.tu-darmstadt.de/pub/sysadmin/lsof ftp://ftp.tux.org/pub/sites/vic.cc.purdue.edu/tools/unix/lsof ftp://ftp.unicamp.br/pub/unix-tools/lsof ftp://ftp.uni-mainz.de/pub/misc/lsof ftp://ftp.web.ad.jp/pub/UNIX/tools/lsof ftp://gd.tuwien.ac.at/utils/admin-tools/lsof ftp://raver.net/pub/lsof ftp://sunsite.icm.edu.pl/pub/unix/net/lsof ftp://sunsite.ualberta.ca/pub/Mirror/lsof ftp://wuarchive.wustl.edu/packages/security/lsof 1.2.2 Are lsof executables available? Some lsof executables are available in the subdirectory tree pub/tools/unix/lsof/binaries These are neither guaranteed to be current nor cover every dialect and machine architecture. I don't recommend you use pre-compiled lsof binaries; I recommend you obtain the sources and build your own binary. Even if you're a Sun user without a Sun C compiler, you can use gcc to compile lsof. If you must use a binary file, please be conscious of the security and configuration implications in using an executable of unknown or different origin. The lsof binaries are accompanied by PGP certificates. Please use them! Three additional cautions apply to executables: 1. Don't try to use an lsof executable, compiled for one version of a UNIX dialect, on another. 2. 
A SunOS lsof executable, compiled for one Sun architecture, won't work on a different Sun architecture, even if both systems run the same version of SunOS.
3. A Solaris lsof executable, compiled for one Sun architecture, isn't guaranteed to work on a different Sun architecture, even if both systems run the same version of Solaris.

1.3 Is lsof year 2000 (Y2K) compliant?

Probably. Lsof doesn't do any time or day computations with anything other than UNIX time_t values -- e.g., checking the time stamps of files. It doesn't use ctime(3) or localtime(3).

However, I haven't done any lsof Y2K compliance testing. Since I distribute the lsof sources freely and find it difficult enough to keep lsof running on 40+ UNIX dialects in the face of constant new dialect releases, I expect lsof beneficiaries to share the work. Y2K compliance testing is a share.

These people have done lsof Y2K testing and report that it passed their tests:

    Sylvain Robitaille

2.0 Lsof Ports

2.1 What ports exist?

The pub/lsof.README file carries the latest port information:

    AIX 4.1.[45], 4.2[.1], and 4.3[.123]
    BSDI BSD/OS 2.1, 3.[01], and 4.[01] for Intel-based systems
    DC/OSx 1.1 for Pyramid systems
    DEC OSF/1, Digital UNIX, Tru64 UNIX 2.0, 3.2, 4.0, and 5.[01]
    FreeBSD 2.1.6, 2.2[.x], 3.[012345], 4.[01], and 5.0 for Intel-based systems
    HP-UX 9.01, 10.20, and 11.00
    Linux 2.0.3[2346] and 2.[1234].x for Intel-based systems
    NetBSD 1.[2345] for Alpha, Intel, and SPARC-based systems
    NEXTSTEP 3.[13]
    OpenBSD 2.[01234567] for Intel-based systems
    Reliant UNIX 5.4[34] for Pyramid systems
    SCO OpenServer Release 3.0 and 5.0.[02456] for Intel-based systems
    SCO UnixWare 2.1.[123] and 7[[.0].1] for Intel-based systems
    Sequent PTX 2.1.9, 4.2.[13], 4.[34], 4.4.[1246], and 4.5[.1]
    Solaris 2.5.1, 2.6, 7, 8 BETA, and 8 BETA-Refresh
    SunOS 4.1.x
    Ultrix 4.2

Lsof version 4 predecessors, versions 2 and 3, may support older versions of some dialects.
You can find their distributions on vic.cc.purdue.edu in the pub/tools/unix/lsof/OLD subdirectory.

2.2 What about a new port?

The 00PORTING file in the distribution gives hints on doing a port. I will consider doing a port in exchange for permanent access to a test host. I require permanent access so I can test new lsof revisions, because I will not offer distributions of dialect ports I cannot upgrade and test.

2.2.1 User-contributed Ports

Sometimes I receive contributions of ports of lsof to systems where I can't test future revisions of lsof. Hence, I don't incorporate these contributions into my lsof distribution. However, I do make these contributions available in the directory:

    pub/tools/unix/lsof/contrib

on vic.cc.purdue.edu. Consult the 00INDEX file in the contrib/ directory for a list of the available contributions.

2.3 Why isn't there an AT&T SVR4 port?

I haven't produced an AT&T SVR4 port because I haven't seen a UNIX dialect that is strictly limited to the AT&T System V, Release 4 source code. Every one I have seen is a derivative with vendor additions.

The vendor additions are significant to lsof because they affect the internal kernel structures with which lsof does business. While some vendor derivatives of SVR4 are similar, each one I have encountered so far has been different enough from its siblings to require special source code.

If you're interested in an SVR4 version of lsof, here are some existing ports you might consider:

    DC/OSx
    Reliant UNIX
    SCO UnixWare
    Sequent PTX
    Solaris

2.4 Why isn't there an SGI IRIX port?

Lsof support for IRIX was terminated at lsof revision 4.36, because it had become increasingly difficult for me to obtain information on the IRIX kernel structures lsof needs to access. At IRIX 6.5 I decided the obstacles were too large for me to overcome, and I stopped supporting lsof on IRIX.
You'll find the sources for the last revision of lsof (4.36) for IRIX via anonymous ftp at vic.cc.purdue.edu in:

    pub/tools/unix/lsof/OLD/src/lsof_4.36.irix.tar.gz

If you wish to pursue the issue, don't contact me, contact SGI. This case was opened with SGI on the subject:

    Case ID: 0982584
    Category: Unix
    Priority: 30-Moderate Impact
    Problem Summary: kernel structure header files needed for continued lsof support
    Problem Description: Email In 07/17/98 19:09:23

3.0 Lsof Problems

3.1 Why doesn't lsof report full path names?

Lsof reports full path names in limited cases: some systems -- e.g., some SunOS versions -- contain full directory path names in their user structure, so lsof reports those.

Lsof reports the full path name when it is specified as a search argument for open files that match the argument. However, if the argument is a file system mounted-on directory, and lsof finds additional path name components from the kernel name cache, it will report them.

Lsof reports path names for file system types that have path name lookup features -- e.g., AdvFS for Digital and Tru64 UNIX; the Linux /proc-based lsof reports full path names, because the Linux /proc file system provides them.

Otherwise, lsof uses the kernel name cache, where it exists and can be accessed, and reports some or all path name components (e.g., the sys and proc.h components of /usr/include/sys/proc.h) for these dialects:

    DC/OSx
    DEC OSF/1, Digital UNIX, Tru64 UNIX
    FreeBSD
    HP-UX
    NetBSD
    NEXTSTEP
    OpenBSD
    OpenStep
    Reliant UNIX
    SCO OpenServer
    SCO UnixWare
    Sequent PTX
    Solaris 2.x, 7, 8 BETA, and 8 BETA-Refresh
    SunOS 4.1.x
    Ultrix

As far as I can determine, AFS path lookups don't share in kernel name cache operations, so lsof can't identify open AFS path name components.

Since the size of the kernel name cache is limited and the cache is in constant flux, it does not always contain the names of all components in an open file's path; sometimes it contains none of them.
Lsof reports the file system directory name and whatever components of the file's path it finds in the cache, starting with the last component and working backwards through the directories that contain it. If lsof finds no path components, lsof reports the file system device name instead.

When lsof does report some path components in the NAME column, it prefixes them with the file system directory name, followed by " -- ", followed by the components -- e.g., /usr -- sys/path.h for /usr/include/sys/path.h. The " -- " is omitted when lsof finds all the path name components of a file's name.

Lsof can't obtain path name components from the kernel name caches of the following dialects:

    AIX

Only the Linux kernel records full path names in the structures it maintains about open files; most other kernels instead convert path names to device and node number doublets and use them for subsequent file references once files have been opened.

To convert the device and node number doublet into a complete path name, lsof would have to start at the root node (root directory) of the file system on which the node resides, and search every branch for the node, building possible path names along the way. That would be a time consuming operation and require access to the raw disk device (usually implying setuid-root permission).

If the prospect of all that local disk activity doesn't concern you, think about the cost when the device is NFS-mounted.

Try using the file system mount point and node number lsof reports as parameters to find -- e.g.,

    $ find <mount_point> -inum <node_number> -print

and you may get an appreciation of what a file system directory tree search would cost.

3.1.1 Why do lsof -r reports show different path names?

When you run lsof with its repeat (``-r'') option, you may notice that the extent to which it reports path names for the same files may vary from cycle to cycle.
That happens because other processes are making kernel calls affecting the cache and causing entries to be removed from and added to it.

3.1.2 Why does lsof report the wrong path names?

Under some circumstances lsof may report an incorrect path name component, especially for files in a rapidly changing directory like /tmp.

In a rapidly changing directory, like /tmp, if the kernel doesn't clear the cache entry when it removes a file, a new file may be given the same keys and lead lsof to believe that the old cache entry with the same keys belongs to the new file.

Lsof tries to avoid this error by purging duplicate entries from its copy of the kernel name cache when they have the same device and inode number, but different names.

This error is less likely to occur in UNIX dialects where the keys to the name cache are node address and possibly a capability ID. The BSDI, Digital UNIX, FreeBSD, HP-UX, NEXTSTEP, OpenStep, PTX, SunOS, Solaris, Tru64 UNIX, Ultrix and UnixWare dialects use node address. BSDI, FreeBSD, NetBSD, OpenBSD, Tru64 UNIX, and Ultrix also use a capability ID to further identify name cache entries.

3.1.3 Why doesn't lsof report path names for unlinked (rm'd) files?

Lsof never reports a path name for a file that has been unlinked from its parent directory -- e.g., deleted via rm, or the unlink() system call -- even when some process may still hold the file open. That's because the path name is erased from name caches and the parent directory file when the file is unlinked.

Unlinked open files are sometimes used by applications for temporary, but invisible storage (i.e., ls won't show them, and no other process can open them). However, they may occasionally consume disk space to excess and cause concern for a system administrator, who will be unable to locate them with find, ls, du, or other tools that rely on finding files by examining the directory tree.

By using lsof's +L option you can see the link count of open files -- in the NLINK column.
An unlinked file will have an NLINK value of zero. By using the option +L1 you can tell lsof to display only files whose link count is less than one (i.e., zero). 3.1.4 Why doesn't lsof report the "correct" hard linked file path name? When lsof reports a rightmost path name component for a file with hard links, the component may come from the kernel's name cache. Since the key which connects an open file to the kernel name cache may be the same for each differently named hard link, lsof may report only one name for all open hard-linked files. Sometimes that will be "correct" in the eye of the beholder; sometimes it will not. Remember, the file identification keys significant to the kernel are the device and node numbers, and they're the same for all the hard linked names. 3.2 Why is lsof so slow? Lsof may appear to be slow if network address to host name resolution is slow. This can happen, for example, when the name server is unreachable, or when a Solaris PPP cache daemon is malfunctioning. To see if name lookup is causing lsof to be slow, turn it off with the ``-n'' option. Port service name lookup or portmap registration lookup may also be causes of slow-down. To suppress port service name lookup, specify -P. Lsof doesn't usually make direct portmap calls -- only when +M is specified, or when HASPMAPENABLED is defined during lsof construction. (The lsof help panel, produced with `lsof -h` will display the default portmap registration reporting state.) The quickest first step in checking if lsof is slow because of the portmapper is to use lsof's -M option. AIX lsof may be slow to start because of its oslevel identity comparison. See the "Why does AIX lsof start so slowly?" and "Why does lsof warn "compiled for x ... y; this is z.?" sections for more information. 3.3 Why doesn't lsof's setgid or setuid permission work? 
If you install lsof on an NFS file system that has been mounted with the nosuid option, lsof may not be able to use the setgid or setuid permission you give it, complaining it can't open the kernel memory device -- e.g., /dev/kmem. The only solution is to install lsof on a file system that doesn't inhibit setgid or setuid permission. 3.4 Does lsof have security problems? I don't think so. However, lsof does usually start with setgid permission, and sometimes with setuid-root permission. Any program that has setgid or setuid-root permission, should always be regarded with suspicion. Lsof drops setgid power, holding it only while it opens access to kernel memory devices (e.g., /dev/kmem, /dev/mem, /dev/swap). That allows lsof to bypass the weaker security of access(2) in favor of the stronger checks the kernel makes when it examines the right of the lsof process to open files declared with -k and -m. Lsof also restricts some device cache file naming options when it senses the process has setuid-root power. On a few dialects lsof requires setuid-root permission during its full execution in order to access files in the /proc file system. These dialects include: DC/OSx 1.1 for Pyramid systems Reliant UNIX 5.4[34] for Pyramid systems When lsof runs with setuid-root permission it severely restricts all file accesses it might be asked to make with its options. The device cache file (typically .lsof_hostname in the home directory of the real user ID that executes lsof) has 0600 modes. (The suffix, hostname, is the first component of the host's name returned by gethostname(2).) However, even when lsof runs setuid-root, it makes sure the file's ownerships are changed to that of the real user and group. In addition, lsof checks the file carefully before using it (See the question "How do I disable the device cache file feature or alter it's behavior?" 
for a description of the checks.); discards the file if it fails the scrutiny; complains about the condition of the file; then rebuilds the file.

See the 00DCACHE file of the lsof distribution for more information about device cache file handling and the risks associated with the file.

3.5 Will lsof show remote hosts using files via NFS?

No. Remember, lsof displays open files for the processes of the host on which it runs. If the host on which lsof is running is an NFS server, the remote NFS client processes that are accessing files on the server leave no process records on the server for lsof to examine.

3.6 Why doesn't lsof report locks held on NFS files?

Generally lock information held by local processes on remote NFS files is not recorded by the UNIX dialect kernel. Hence, lsof can't report it.

One exception is some patch levels of Solaris 2.3, and all versions of Solaris 2.4 and above. Lsof for those dialects does report on locks held by local processes on remotely mounted NFS files.

3.6.1 Why does lsof report a one byte lock on byte zero as a full file lock?

When a process has a lock of length one, starting at byte zero, lsof can't distinguish it from a full file lock. That's because most UNIX dialects represent both locks the same way in their file lock (flock or eflock) structures.

3.7 Why does lsof report different values for open files on the same file system (the automounter phenomenon)?

On UNIX dialects where file systems may be mounted by an automounter with the ``direct'' type, lsof may sometimes report different DEVICE, SIZE/OFF, INODE and NAME values when asked to report files open on the file system.

This happens because some files open on the file system -- e.g., the current directory of a shell that changed its directory to the file system as the file system's first reference -- may be characterized in the kernel with temporary automounter node information. The cd doesn't cause the file system to be mounted.
A subsequent reference to the file system -- e.g., an ls of any place in it -- will cause the file system to be mounted. Processes with files open to the mounted file system are characterized in the kernel with data that reflects the mounted file system's parameters. Unfortunately some kernels (e.g., some versions of Solaris 2.x) don't revisit the process that did only a change-directory for the purpose of updating the data associated with the open directory file. The file continues to be characterized with temporary automounter information until it does another directory change, even a trivial ``cd .''. Lsof will report on both reference types, when supplied the file system name as an argument, but the data lsof reports will reflect what it finds in the kernel. For the different types lsof will display different data, including different major and minor device numbers in the DEVICE column, different lengths in the SIZE/OFF column, different node numbers in the INODE column, and slightly different file system names in the NAME column. In contrast, fuser, where available, can only report on one reference type when supplied the file system name as an argument. Usually it will report on the one that is associated with the mounted file system information. If the only reference type is the temporary automounter one, fuser will often be silent about it. 3.8 Why don't lsof and netstat output match? Lsof and netstat output don't match because lsof reports the network information it finds in open file system objects -- e.g., socket files -- while netstat often gets its information from separate kernel tables. The information available to netstat may describe network activities never or no longer associated with open files, but necessary for proper network state machine operation. 
For example, a TCP connection in the FIN_WAIT_[12] state may no longer have an associated open file, because the connection has been closed at the application layer and is now being closed at the TCP/IP protocol layer.

3.8.1 Why can't lsof find accesses to some TCP and UDP ports?

Kernel implementations sometimes set aside TCP and UDP ports for communicating with support activities running in application layer servers -- the automountd and amd daemons, and the NFS biod and nfsd daemons are examples. Netstat may report the ports are in use, but lsof doesn't.

These kernel ports are not associated with file system objects, may be set aside by the kernel on demand, and sometimes are never released. Because they aren't associated with open file system objects, they are transparent to lsof.

After all, lsof does stand for LiSt Open Files, and there are no open files associated with these kernel ports.

I don't know a way to determine when ports reported by netstat but not by lsof are reserved by the kernel.

3.9 Why does lsof update the device cache file?

At the end of the lsof output you may see the message:

    lsof: WARNING: /Homes/abe/.lsof_vic was updated.

In this message /Homes/abe/.lsof_vic is the path to the private device cache file for login abe. (See 00DCACHE.)

Lsof issues this message when it finds it necessary to recheck the system device directory (e.g., /dev or /devices) and rebuild the device cache file during the open file scan. Lsof may need to do these things if it finds that a device directory node has changed, or if it cannot find a device in the cache.

3.10 Why doesn't lsof report state for UDP socket files?

Lsof reports UDP TPI connection state -- TS_IDLE, TS_BOUND, etc. -- for a limited set of dialects, including DC/OSx, Reliant UNIX, Solaris 2.x, 7, 8 BETA, 8 BETA-Refresh, and PTX. TPI state is stream-based TCP/IP information that isn't available in many dialects. The general rule is if netstat(1) reports TPI state, lsof will too.
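
One rough way to see whether that general rule applies on a given dialect is to put the two reports side by side and look at the UDP entries in each. What follows is only a sketch, not a tested procedure for any particular dialect. It assumes a netstat that accepts the common -a and -n options, and it uses lsof's -n and -P options to turn off host name and port service name lookups. The PROTO variable is just an illustrative name.

```shell
#!/bin/sh
# Sketch only: compare what netstat and lsof each report for UDP.
# PROTO is an illustrative variable name, not something lsof defines.
PROTO="UDP"

if command -v netstat >/dev/null 2>&1; then
    # -a: all sockets; -n: numeric addresses (assumed to be accepted here).
    netstat -a -n || true
fi

if command -v lsof >/dev/null 2>&1; then
    # -n and -P suppress host name and port service name lookups;
    # -i restricts the listing to Internet files of the named protocol.
    # lsof exits non-zero when it lists nothing, so ignore its status.
    lsof -n -P -i "$PROTO" || true
fi
```

If the netstat report carries no state for its UDP entries, the lsof NAME column will probably carry none either.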
3.11 I am editing a file with vi; why doesn't lsof find the file?

Vi doesn't have the file open. It opens the file, makes a temporary copy (usually in /tmp or /usr/tmp), and does its work in that file. When you update the file from vi, it reopens and rewrites the file.

During the vi session itself, except for the brief periods when vi is reading or rewriting the file, lsof can't find an open reference to the file from the vi process, because there is none.

3.12 Why doesn't lsof report TCP/TPI window and queue sizes for my dialect?

Lsof only reports TCP/TPI window sizes for Solaris, because only its netstat reports them. The intent of providing TCP/TPI information in lsof NAME column output is to make it easier to match netstat output to lsof output.

In general lsof only reports queue sizes for both TCP and UDP (TPI) connections on BSD-derived UNIX dialects, where both sets of values appear in kernel socket queue structures. SYSV-derived UNIX dialects whose TCP/IP implementations are based on streams generally provide only TCP queue sizes, not UDP (TPI) ones.

While you may find that netstat on some SYSV-derived UNIX dialects with streams TCP/IP may report UDP (TPI) queue sizes, you will probably also find that the sizes are always zero -- netstat supplies a constant zero for UDP (TPI) queue sizes to make its headers align the same for TCP and UDP (TPI) connections. Solaris seems to get it right -- i.e., its netstat does not report UDP (TPI) queue sizes.

When in doubt, I chose to avoid reporting UDP (TPI) queue sizes for UNIX dialects whose netstat-reported values I knew to be a constant zero or whose origin I couldn't determine. Dialects in this category include DC/OSx, OSR, PTX, and Reliant UNIX.

3.13 What does "no more information" in the NAME column mean?

When lsof can find no successor structures -- a gnode, inode, socket, or vnode -- connected to the file structure of an open descriptor of a process, it reports "no more information" in the NAME column.
The TYPE, DEVICE, SIZE/OFF, and INODE columns will be blank.

Because the file structure is supposed to contain a pointer to the next structure of a file's processing support, if the pointer is NULL, lsof can go no further.

Some UNIX dialects have file structures for system processes -- e.g., the sched process -- that have no successor structure pointers. The "no more information" NAME will commonly appear for these processes in lsof output.

It may also be the case that lsof has read the file structure while it is being assembled and before a successor structure pointer value has been set. The "no more information" NAME will again result.

Unless lsof output is filled with "no more information" NAME column messages, the appearance of a few should be no cause for alarm.

3.14 Why doesn't lsof find a process that ps finds?

If lsof fails to display open files for a process that ps indicates exists, there may be several reasons for the difference.

The process may be a "zombie" for which ps displays the "(defunct)" state. In that case, the process has exited and has no open file information lsof can display. It does still have a process structure, sufficient for the needs of ps.

Another possible explanation is that kernel tables and structures may have been changing when lsof looked for the process, making lsof unable to find all relevant process structures. Try repeating the lsof request.

3.15 Why doesn't -V report a search failure?

The usual reason that -V won't report a search failure is that lsof located the search item, but was prevented from listing it by an option that doesn't participate in search failure reporting.

For example, this lsof invocation:

    $ lsof -V -i TCP@foobar -a -d 999

may not report that it can't find the Internet address TCP@foobar, even if there is an open file connected to that address, unless the open file also has a file descriptor number of 999 (the ``-a -d 999'' options).
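The search failure messages that -V produces can be picked out of lsof's output with a short shell filter. The following is only a sketch: the function name search_failures is hypothetical, and the pattern assumes every failure message has the form ``lsof: no <type> located: <item>'', as in the ``lsof: no file use located: ./lsof.h'' message shown elsewhere in this FAQ. Messages for other search types may be worded differently.

```shell
# search_failures (hypothetical helper): read lsof -V output on stdin and
# print one line per search failure message, as "item (type)".
# Assumption: failure messages look like "lsof: no <type> located: <item>".
search_failures() {
    sed -n 's/^lsof: no \(.*\) located: \(.*\)$/\2 (\1)/p'
}
```

For example, ``lsof -V -i TCP@foobar 2>&1 | search_failures'' would print any failure messages for a search on the Internet address alone, without the ``-a -d 999'' restriction. The ``2>&1'' is there because this sketch doesn't assume which output stream carries the messages.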
3.16 Portmap problems

3.16.1 Why isn't a name displayed for the portmap registration?

When portmap registration reporting is enabled, any time there is a registration for a local TCP or UDP port, lsof displays it in square brackets, following the port number or service name -- e.g., ``:1234[name]'' or ``:name[100083]''.

The TCP or UDP port number or service name (what follows the `:') is displayed under the control of the lsof -P option. The registration identity is held by the portmapper and may be a name or a number, depending on how the registration's owner declared it. Lsof reports what the port map holds and cannot derive a registration name from a registration number.

Lsof can be compiled with registration reporting enabled or disabled by default, under the control of the HASPMAPENABLED #define (usually in machine.h). The lsof help panel (`lsof -h`) will show the default. Lsof is distributed with reporting disabled by default.

3.16.2 How can I display only portmap registrations?

Lsof doesn't have an option that will display only TCP or UDP ports with portmap registrations. The +M option only enables the reporting of registration information when Internet socket files are displayed; +M doesn't select the displaying of Internet socket files -- the -i option does that.

This simple lsof pipe to grep will do the job:

    $ lsof -i +M | grep "\["

This works because -i selects Internet socket files, +M enables portmap registration reporting, and only output lines with opening square brackets will have registrations.

When portmap registration reporting is enabled by default, because the lsof builder constructed it that way, +M is not necessary. (The lsof help panel, produced with `lsof -h`, will display the default portmapper registration reporting state.) However, specifying +M when reporting is already enabled is acceptable, as is specifying -M when reporting is already disabled.

Digression: lsof will accept `+' or `-' as a prefix to most options.
(That isn't documented in the man page or help panel to reduce confusion and complexity.) The -i option is as acceptable as +i, so the above example could be written a little more tersely as:

    $ lsof +Mi | grep "\["

But be careful to use the ``Mi'' ordering, since ``iM'' implies M is an address argument to `i'.

3.16.3 Why doesn't lsof report portmap registrations for some ports?

Lsof reports portmap registrations for local TCP and UDP ports only. It identifies local ports this way:

  * The port appears in the local address section of the kernel structure that contains it.

  * The port appears in the foreign address section of a kernel structure whose local and foreign Internet addresses are the same.

  * The port appears in the foreign address section of a kernel address structure whose Internet address is INADDR_LOOPBACK (127.0.0.1).

Following these rules, lsof ignores foreign portmapped ports. That's done for reasons of efficiency and possible security prohibitions. Contacting all remote portmappers could take a long time and be blocked by network difficulties (i.e., be inefficient). Many firewalls block portmapper access for security reasons.

Lsof may occasionally ignore portmap registration information for a legitimate local port by virtue of its local port rules. This can happen when a port appears in the foreign part of its kernel structure and the local and foreign Internet addresses don't match (perhaps because they're on different interfaces), and the foreign Internet address isn't INADDR_LOOPBACK (127.0.0.1).

3.17 Why is `lsof | wc` bigger than my system's open file limit?

There is a strong temptation to count files by piping lsof output to wc. If your purpose is to compare the number you get to some Unix system parameter that defines the number of open files your system can have, resist the temptation.
One reason is that lsof reports a number of "files" that don't occupy Unix file table space -- current working directories, root directories, text files, library files, and memory mapped files are some. Another reason is that lsof can report a file shared by more than one process that itself occupies only one file table slot.

If you want to know the number of open files that occupy file table slots, use the +ff option and process the lsof output's FILE_ADDR column information with standard Unix tools like cut, grep, sed, and sort.

You might also consider using lsof's field output with +ff, selecting the file struct address with -FF, and processing the output with an AWK or Perl script. See the list_fields.awk and list_fields.perl scripts in the scripts/ subdirectory of the lsof distribution for hints on file struct post-processing filters.

3.18 Why doesn't lsof report file offset (position)?

Lsof won't report a file offset (position) value if the -s option has been specified, or if the dialect doesn't support the displaying of file offset (position).

That lsof is reporting only file size is indicated by the fact that the appropriate column header says SIZE instead of SIZE/OFF.

If lsof doesn't support the displaying of file offset (position) -- e.g., for Linux /proc-based lsof -- the -h or -? output panel won't list the -o option.

3.19 Problems with path name arguments

3.19.1 How do I ask lsof to search a file system?

You can ask lsof to search for all open files on a file system by specifying its mounted path name as an lsof argument -- e.g.,

    $ lsof /

Output of the mount command will show file system mounted path names. It will also show the mounted-on device path for the file system.
If the mounted-on device is a block device (the permission field in output of `ls -l` for the device starts with a `b'), you can specify its name, too -- e.g.,

    $ lsof /dev/sd0a

If the mounted-on device isn't a block device -- for example, some UNIX dialects call a CD-ROM device a character device (ls output starts with a `c') -- you can force lsof to assume that the specified device names a file system with the +f option -- e.g.,

    $ lsof +f -- /dev/sd0a

(Note: you must use ``--'' after +f or -f if a file name follows immediately, because +f and -f can be followed by characters that specify flag output selections.)

When you use +f and lsof can't match the device to a file system, lsof will issue a complaint.

The +f option may be used in some dialects to ask lsof to search for an NFS file system by its server name and server mount point. If the mount application reports an NFS file system mounted-on value that way, then this sample lsof request should work.

    $ lsof +f -- fleet:/home/fleet/u5

Finally, you can use -f if you don't want a mounted file system path name to be considered a request to report all open files on the file system. This is useful when you want to know if anyone is using the file system's mounted path name. This example directs lsof to report on open access to the `/' directory, including when it's being used as a current working or root directory.

    $ lsof -f -- /

The lsof -f option performs the same function as -f does in some fuser implementations. However, since the lsof -c option was chosen for another purpose before the `f' option was added to lsof, +f was selected as the analogue to the fuser -c option. (Sorry for the potential confusion.)

3.19.2 Why doesn't lsof find all the open files in a file system?

Lsof may not find all the open files in a file system for several reasons.

First, some processes with files open on the file system may have been changing status when lsof examined the process table, and lsof "missed" them.
Remember, the kernel changes much faster than lsof can respond to the changes.

Second, be sure you have specified the file system correctly. Perhaps you specified a file instead. You can use lsof's -V option to have lsof report in detail on what it couldn't find. Make sure the report for the file system you specified says "file system." Here's some -V output:

    $ ./lsof -V /tmp ./lsof.h ./lsof
    COMMAND  PID USER  FD TYPE DEVICE SIZE/OFF  INODE NAME
    lsof    2688  abe txt VREG 18,1,7  1428583 226641 ./lsof
    lsof    2689  abe txt VREG 18,1,7  1428583 226641 ./lsof
    lsof: no file use located: ./lsof.h

You can also use lsof's +f option to force it to consider a path name as a file system. If lsof can't find a file system by the specified name, it will issue a complaint -- e.g.,

    $ lsof +f -- /usr
    lsof: not a file system: /usr

(/usr is a directory in the / file system.)

3.19.3 Why does the lsof exit code report it didn't find open files when some files were listed?

Sometimes lsof will list some open files, yet return a non-zero exit code, suggesting it hasn't found all the specified files.

The first thing you should do when you suspect lsof is incorrect is to repeat the request, adding the -V option. In the resulting report you may find that your file system specification really wasn't a file system specification, just a file specification.

Finally, if you specify the same file or file system twice, lsof will credit all matches to the first specification and believe that there were no matches for the second. It's possible to specify a single file system twice with different path names by using both its mounted directory path name and mounted-on device name.

    $ lsof +f -V spcuna:/sysprog /sysprog
    COMMAND   PID USER  FD TYPE DEVICE SIZE/OFF  INODE NAME
    ksh     11092  abe cwd VDIR 39,0,1     1536 226562 /sysprog (spcuna:/sysprog)
    ...
    lsof: no file system use located: spcuna:/sysprog

All matches were credited to /sysprog; none to spcuna:/sysprog.

3.19.4 Why won't lsof find all the open files in a directory?
When you give lsof a simple directory path name argument (not a file system mounted-on name), you are asking it to search for processes that have the directory open as a file, or as a process-specific directory -- e.g., root or current working directory.

If you want to list instances of open files inside the directory, you need to specify the individual path names of those files, or use the lsof +D and +d options.

See the answer to the question "Why are the +D and +d options so slow?" before you use +D or +d casually.

See the answer to the question "Why do the +D and +d options produce warning messages?" for an explanation of some process authority limitations of +D and +d.

3.19.5 Why are the +D and +d options so slow?

The +D and +d options cause lsof to build a path name search list for a specified directory. +D causes lsof to descend the directory to its furthest subdirectory, while +d restricts it to the top level. In both cases, the specified directory itself is included in the search list. In both cases symbolic links are ignored.

Building such a search list can take considerable time, especially when the specified directory contains many files and subdirectories -- lsof must call the system readlink() and stat() functions for each file and directory. Storing the search list can cause lsof to use more than its normal amount of dynamic memory -- each file recorded in the search list consumes dynamic memory for its path name, characteristics, and search linkages. Using the list means lsof must search it for every open file in the system.

As an example of the load +D can impose, consider that an `lsof +D /` on a lightly loaded NeXT '040 cube with a 1GB root file system disk took 4+ minutes of real time. It also generated several hundred error messages about files and directories the lsof process didn't have permission to access with stat(2).

The bottom line is that +D and +d should be used cautiously.
+D is more costly than +d for deeply nested directory trees, because of the full directory descent it causes. So use +d where possible.

In view of these warnings, when is it appropriate to use +D or +d? Probably the most appropriate time is when you would specify the directory's contents to lsof with a shell globbing construct -- e.g., `lsof *`. If that's what you need to do, `lsof +d .` is probably more efficient than having the shell produce a directory list, form it into an argument vector, and pass the vector to lsof for it to unravel.

See the answer to the question "Why do the +D and +d options produce warning messages?" for an explanation of some process authority limitations of +D and +d.

3.19.6 Why do the +D and +d options produce warning messages?

+D and +d option processing is limited by the authority of the lsof process -- i.e., lsof can only examine (with lstat(2) and stat(2)) files the owner of the process can access.

If the ownership, group membership, or permissions of the specified directory, file within it, or directory within it prevents the owner of the lsof process from using lstat(2) or stat(2) on it, lsof will issue a warning message, naming the path and giving the system's (lstat(2)'s or stat(2)'s) reason (errno explanation text) for refusing access.

As an example, assume user abc has a subdirectory in /tmp, owned by abc and readable, writable and searchable by only its owner. If user def asks lsof to search for all /tmp references with +D or +d, lsof will be unable to lstat(2) or stat(2) anything in abc's private subdirectory, and will issue an appropriate warning.

Lsof warnings can usually be suppressed with the -w option. However, using -w with +D or +d means that there will be no indication why lsof couldn't find an open reference to a restricted directory or something contained in it.

Hint: if you need to use +D or +d and avoid authority warnings, and if you have super-user power, su and use lsof with +D or +d as root.
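Before running +D or +d on a large directory it can help to estimate how many entries lsof would have to examine. The sketch below is not part of lsof and only approximates the size of the search list described in "Why are the +D and +d options so slow?". The function names are hypothetical, it assumes a find that supports -maxdepth (GNU and BSD find do; POSIX doesn't require it), and it counts symbolic links, which lsof ignores, so the numbers are rough upper bounds.

```shell
# Hypothetical helpers: count the directory entries a +d or +D search
# list might cover.  The directory itself is included in both counts.
# Assumptions: find supports -maxdepth; no path names contain newlines.

# Top level only, as with +d.
plus_d_entries() {
    find "$1" -maxdepth 1 2>/dev/null | wc -l | tr -d ' '
}

# Full descent, as with +D.
plus_D_entries() {
    find "$1" 2>/dev/null | wc -l | tr -d ' '
}
```

If ``plus_D_entries /some/dir'' reports a number much larger than ``plus_d_entries /some/dir'', that is a hint that +d will be considerably cheaper than +D for that directory.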
3.20 Why can't my C compiler find the rpcent structure definition?

When you try to compile lsof your compiler may complain that the rpcent structure is undefined. The complaints may look like this:

    >print.c: In function `fill_portmap':
    >print.c:213: dereferencing pointer to incomplete type
    >...

The most likely cause is that someone has allowed a BIND installation to update /usr/include/netdb.h (or perhaps /usr/include/rpc/netdb.h), removing the rpcent structure definition that lsof expects to find there.

Only Solaris has an automatic work-around. (See dlsof.h in dialects/sun.) The Solaris work-around succeeds because there is another header file with the rpcent structure definition, and there is a Solaris C pre-processor test that can tell when the BIND header file is in place and hence the other header file must be included.

Doubtlessly there are similar work-arounds possible in other UNIX dialects whose header files have been "touched" by BIND, but in general I recommend restoration of the vendor's header files, including any others BIND might have replaced. (I think BIND replaces several others as well.)

3.21 Why doesn't lsof report fully on file "foo" on UNIX dialect "bar?"

Lsof sometimes won't report much information on a given file, or may even report an error message in its NAME column. That's usually because the file is of a special type -- e.g., in a file system specific to the UNIX dialect -- and I haven't used a system where the file appeared during my testing.

If you encounter such a situation, send me e-mail and we may be able to devise an addition to lsof that will report on the file in question.

3.22 Problems loading and executing lsof

3.22.1 Why do I get a complaint when I execute lsof that some library file can't be found?
On systems where the LIBPATH (or the equivalent) environment variable is used to record the library search path in executable files when they are built, an incorrect value may make it impossible for the system to find the shared libraries needed to load lsof for execution.

This may be particularly true on systems like AIX >= 4.1.4, where the lsof Makefile takes the precautionary step of using the -bnolibpath loader flag to ensure that the path to the private static lsof library is not recorded in the lsof binary. Should LIBPATH be invalid when lsof is built, it will be recorded in the lsof binary as the default library path search order and lead to an inability to find libraries when lsof is executed.

So, if you get missing library complaints when you try to execute lsof, check LIBPATH, or whatever environment variable is used on your system to define library search order in executable files. Use the tools at your disposal to look at the library paths recorded in the lsof binary -- e.g., chatr on HP-UX, dump on AIX, ldd on Solaris.

Make sure, too, that when the correct library search path has been recorded in the executable file, the required library files exist at one or more of the search paths.

3.22.2 Why does lsof complain it can't open files?

When lsof begins execution, unless it has been asked to report only help or version information, typically it will attempt to access kernel memory and symbol files -- e.g., /unix, /dev/kmem. Even though lsof needs only permission to open these files for reading, read access to them might be restricted by ownerships and permission modes.

So the first step to diagnosing lsof problems with opening files is to use ls(1) to examine the ownerships and permission modes of the files that lsof wants to open. You may find that lsof needs to be installed with some type of special ownership or permission modes to enable it to open the necessary files for reading. See the Installing Lsof section of 00README for more information.
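The ls(1) check described above can be wrapped in a small shell function. This is a sketch only: check_readable is a hypothetical name, and the paths you pass it should be whatever kernel memory and symbol files your dialect's lsof opens (/unix and /dev/kmem are just the examples used above).

```shell
# check_readable (hypothetical helper): for each path, report whether the
# current user can open it for reading, and show its ownership and
# permission modes when it can't.
check_readable() {
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
        elif [ -r "$f" ]; then
            echo "readable: $f"
        else
            echo "not readable: $f"
            ls -ld "$f"     # owner, group, and modes explain the refusal
        fi
    done
    return 0
}
```

Run it as the user who will be running lsof -- e.g., ``check_readable /unix /dev/kmem''. A "not readable" report suggests lsof needs the special ownership or permission modes discussed in 00README.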
3.22.3 Why does lsof warn "compiled for x ... y; this is z."?

Unless warnings are suppressed (with -w) or the kernel identity check symbol (HASKERNIDCK) definition has been deleted, all but one lsof dialect version (exception: /proc-based Linux lsof) compare the identity of the running kernel to that of the one for which lsof was constructed. If the identities don't match, lsof issues a warning like this:

    lsof: WARNING: compiled for Solaris release 5.7; this is 5.6.

Three kernel identity differences can generate this warning -- the version number, the release number, and the architecture type. Not all are compared for every dialect version; architecture type is only compared for SunOS, for example.

Build and running identity differences are usually significant, because they usually indicate kernels whose structures are different -- kernel structures commonly change at dialect version releases. Since lsof reads data from the kernel in the form of structures, it is sensitive to changes in them.

The general rule is that an lsof compiled for one UNIX dialect version will not work correctly when run on a different version. There are three work-arounds:

  1) use -w to suppress the warning -- and risk missing other warnings;

  2) permanently disable the identity check by deleting the definition of HASKERNIDCK in the dialect's machine.h header file -- with the same risk; or

  3) rebuild lsof on the system where it is to be run.

(Deleting HASKERNIDCK can be done with the Customize script or by editing machine.h.)

Generally checking kernel identity is a quick operation for lsof. However, it is potentially slow under AIX, where lsof must run /usr/bin/oslevel. To speed up lsof, use -w to suppress the /usr/bin/oslevel test. See "Why does AIX lsof start so slowly?" for more information.

3.22.4 How can I disable the kernel identity check?

The kernel identity check is controlled by the HASKERNIDCK definition.
When it is defined, most dialects (exclusion: /proc-based Linux lsof) will compare the build-time kernel identity with the run-time one.

To disable the kernel identity check, disable the HASKERNIDCK definition in the dialect's machine.h header file. The Customize script can be used to do that in its section about the kernel identity check.

Caution: while disabling the kernel identity check may result in smaller lsof startup overhead, it comes with the risk of executing an lsof that may produce warning messages, error messages, incorrect output, or no output at all.

3.23 Why don't ps(1) and lsof agree on the owner of a process?

Generally the user ID lsof reports in its USER column is the process effective user ID, as found in the process structure. Sometimes that may not agree with what ps(1) reports for the same process.

There are sundry reasons for the difference. Sometimes ps(1) uses a different source for process information, e.g., the /proc file system or the psinfo structure. Sometimes the kernel is lax or confused (e.g., SunOS 4.1.x or Solaris 2.5.1) about what ID to report as the effective user ID. Sometimes the system carries only one user ID in its process structure (some BSD derivatives), leaving lsof no choice.

The differences between lsof and ps(1) user identifications should be small and normally it will be apparent that the confusion is over a process whose application has changed to an effective user ID different from the real one.

3.24 Why doesn't lsof find an open socket file whose connection state is past CLOSE_WAIT?

TCP/IP connections in states past CLOSE_WAIT -- e.g., FIN_WAIT_1, CLOSING, LAST_ACK, FIN_WAIT_2, and TIME_WAIT -- don't always have open files associated with them. When they don't, lsof can't identify them. When the connection state advances from CLOSE_WAIT, sometimes the open file associated with the connection is deleted.

3.25 Why don't machine.h definitions work when the surrounding comments are removed?
The machine.h header files in dialect subdirectories have some commented-out definitions like:

    /* #define HASSYSDC "/your/choice/of/path" */

You can't simply remove the comments and expect the definition to work. That's intended to make you think about what value you are assigning to the symbol. The assigned value might have a system-specific convention. HASSYSDC, for example, might be /var/db/lsof.dc for FreeBSD, but it might be /var/adm/lsof.dc for Solaris.

The symbols are described in 00PORTING, in other machine.h comments, and in other lsof documentation files. HASSYSDC, for example, is discussed in 00DCACHE.

When comments and documentation don't suffice, consult the source code for hints on how the symbol is used.

3.26 What do "can't read inpcb at 0x...", "no protocol control block", "no PCB, CANTSENDMORE, CANTRCVMORE", etc. mean?

Sometimes lsof will report "can't read inpcb at 0x00000000", "no protocol control block", "no PCB, CANTSENDMORE, CANTRCVMORE" or a similar message in the NAME column for open TCP socket files. These messages mean the file's socket structure lacks a pointer to the INternet Protocol Control Block (inpcb) where lsof expects to find connection addresses -- local and foreign ports, local and foreign IP addresses.

The socket file has probably been submitted to the shutdown(2) function for processing. In some implementations lsof issues the "no PCB, CANTSENDMORE, CANTRCVMORE" message, which tries to explain the absence of a protocol control block by showing the socket state settings that have been made by the shutdown(2) function.

If a non-zero address follows the "0x" in the "can't read inpcb" message, it means lsof couldn't read inpcb contents from the indicated address in kernel memory.

3.27 What do the "unknown file system type" warnings mean?

Lsof may report a message similar to:

    unknown file system type, v_op: 0x10472f10

in the NAME column for some files.
This means that lsof has encountered a vnode for the file whose operation switch address (from v_op) references a file system type for which there is no support in lsof. After lsof identifies the file system type, it uses pre-compiled code to locate the file system specific node for the file where lsof finds information like file size, device number, node number, etc.

To get some idea of what the file system type might be, use nm on your kernel symbol file to locate the symbol name that corresponds to the v_op address -- e.g., on Solaris do:

    $ nm -x /dev/ksyms | grep 0x10472f10
    0x10472f10 ... |file_system_name_vnodeops

where "file_system_name" is the clue to the unsupported file system.

Lsof doesn't use the v_op address to identify file system types on all dialects. Sometimes it uses an index number it finds in the vnode. It will translate that symbol to a short name in the warning message -- e.g., "nfs3" -- if possible.

4.0 AIX Problems

4.1 What is the Stale Segment ID bug and why is -X needed?

Kevin Ruderman reports that he has been informed by IBM that processes using the AIX 3.2.x, 4.1[.12345], 4.2[.1], and 4.3.x kernel's readx() function can cause other AIX processes to hang because of what appears to be file system corruption.

This failure, known as the Stale Segment ID bug, is caused by an error in the AIX kernel's journalled segment memory handler that causes the kernel's dir_search() function erroneously to believe directory entries contain zeroes. The process using the readx() call need not be doing anything wrong. Usually the system must be under such heavy load that the segment ID being used in the readx() call has been freed and then reallocated to another process since it was obtained from kernel memory.

Lsof uses the readx() function to access library entry structures, based on the segment ID it finds in the proc structure of a process.
Since IBM probably will never fix the kernel bug, I've added an AIX-specific option to lsof that controls its use of the readx() function.

By default lsof readx() use is disabled; specifying the ``-X'' option enables readx() use. If you want to change the default readx() behavior of AIX lsof, change the HASXOPT, HASXOPT_ROOT, and HASXOPT_VALUE definitions in dialects/aix/machine.h.

You can also use these definitions to enable or disable readx() -- consult the comments in machine.h. You may want to disable readx() use permanently if you plan to make lsof publicly executable.

When HASXOPT_ROOT is defined, lsof will restrict use of the -X option to processes whose real UID is root; if HASXOPT_ROOT isn't defined, any user may specify the -X option. The Customize script offers the option to change HASXOPT_ROOT when HASXOPT is defined and HASXOPT_ROOT is named in any dialect's machine.h header file.

I have never seen lsof cause a problem with its use of readx(), but I believe there is some chance it could, given the right circumstances.

4.1.1 Stale Segment ID APAR

Here are the details of the Stale Segment ID bug and IBM's response, provided by Kevin Ruderman.

    AIX V3
        APAR=ix49183
        user process hangs forever in kernel due to file system corruption
        STAT=closed prs TID=tx2527 ISEV=2 SEV=2
        (A "closed prs" is one closed with a Permanent ReStriction.)
        RCOMP=575603001 aix v3 for rs/6 RREL=r320

    AIX V4 (internal defect, no apar #)
        prefix   p
        name     175671
        abstract KERMP: loop for ever in dir_search()

Problem description:

  1. Some user application -- e.g., lsof -- gets the segment ID (SID) for the process private segment of a target process from the process table.

  2. The target process exits, deleting the process private segment.

  3. The SID is reallocated for use as a persistent segment.

  4. The user application runs again and tries to read the user area structure from /dev/mem, using the SID it read from the process table.

  5. The loads done by the driver for /dev/mem cause faults in the directory; new blocks are allocated; the size changed; and zero pages created.

  6. The next application that looks for a file in the affected directory hangs in the kernel's dir_search() function because of the zero pages. This occurs because the kernel's dir_search() function loops through the variable length entries one at a time, moving from one to the next by adding the length of the current entry to its address to get the address of the next entry. This process should end when the current pointer passes the end of the known directory length.

     However, while the directory length has increased, the entry length data has not, so when dir_search() reaches the zero pages, it loops forever, adding a length of zero to the current pointer, never passing the end of the directory length. The application process is hung; it can't be killed or stopped.

IBM closed the problem with a PRS code (Permanent ReStriction) under AIX Version 3 and had targeted a fix for AIX 4.2. They have recently (I became aware of it September 10, 1996) cancelled the defect report altogether and have indicated they are not going to fix the defect.

4.2 Gcc Work-around for AIX 4.1x

When gcc is used to compile lsof for AIX 4.1x, it doesn't align one element of the user structure correctly. Xlc sees the U_irss element as a type "long long" and aligns it on an 8 byte boundary. That's because the default mode of xlc is -qlonglong; when -qlonglong is enabled, the _LONG_LONG symbol is also defined. Gcc sees U_irss as a two element array of type long, because _LONG_LONG isn't defined. Hence gcc aligns the U_irss element array on a 4 byte boundary, rather than an 8 byte one, making the gcc incantation of the user structure 4 bytes shorter than xlc's.

When the length of gcc's user structure is supplied as argument 4 to the undocumented getuser() function of the AIX kernel, getuser() rejects it as an incorrect size and returns EINVAL.
Lsof has a work-around for this problem. It involves a special test in the Configure script when the "aixgcc" Configure abbreviation is used -- e.g.,

    $ Configure -n aixgcc

The test is to compile a small program with gcc and check the alignment of U_irss. If it's not aligned on an 8 byte boundary, the Configure script makes a special copy of in ./dialects/aix/aix whose U_irss will align properly, and generates compile time options to use it.

While I have tested this work-around only with 4.1.4, it should work with earlier versions of AIX 4.1. It does not work for AIX 4.2; a different work-around is employed there. (See the next section.)

If you want to use this technique to compile other AIX 4.1x programs that use getuser() with gcc, check the Configure script.

Stuart D. Gathman identified this gcc AIX alignment problem.

4.3 Gcc and AIX 4.2[.1]

Alignment problems with gcc and AIX 4.2[.1] inside the user structure are more severe, because there are some new 64 bit types in AIX that gcc doesn't yet (as of 2.7.x) support. The U_irss element problem, discussed in 4.2 above, doesn't exist in 4.2[.1].

The AIX lsof machine.h header file has a work-around, provided by Henry Grebler, that bypasses gcc alignment problems. Later versions of gcc (e.g., 2.8.x) will probably bypass the problems as well.

4.4 Why won't lsof's Configure allow the use of gcc for AIX below 4.1?

Gcc can't reliably be used to compile lsof for AIX versions below AIX 4.1 because of possible kernel structure element alignment differences between it and xlc.

4.5 What is an AIX SMT file type?

When you run AIX X clients with the DISPLAY environment variable set to ``:0.0'' they communicate with the AIX X server via files whose kernel file structure has an undefined type (f_type == 0xf) -- at least there's no definition for it in . These are Shared Memory Transport (SMT) sockets, an artifact of AIXWindows, designed for more efficient data transfers between the X server and its clients.
Henry Grebler and David J. Wilson alerted me to the existence of these files. Mike Feldman and others helped me identify them as SMT sockets. The curious reader can find more about SMT sockets in /usr/lpp/X11/README.SMT.

4.6 Why does AIX lsof start so slowly?

When AIX lsof starts it compares the running kernel's identity to the one for which it was built, using /usr/bin/oslevel. That comparison can sometimes take a long time to complete, depending on the system's maintenance level and how recently it was examined with oslevel.

You can skip the oslevel test by suppressing warning messages with lsof's -w option. Doing that carries with it the risk of missing other warning messages, however.

You can also disable the kernel identity check by disabling the definition of the HASKERNIDCK symbol, either by editing the AIX machine.h header file or by using the Customize script. See the "Why does lsof warn 'compiled for x ... y; this is z.'?" section (3.22.3) for more information.

4.7 Why does exec complain it can't find libc.a[shr.o]?

When you try to execute lsof you may get this complaint:

    exec(): 0509-036 Cannot load program ./lsof because of the following errors:
        0509-022 Cannot load library libc.a[shr.o].
        0509-026 System error: A file or directory in the path name does not exist.

This is probably the result of making lsof when the LIBPATH environment variable contained a directory path that doesn't contain libc.a.

You can see what LIBPATH contained when lsof was made by using the dump application on lsof. For example, if LIBPATH contained /foo/bar when lsof was made, you will see this (partial) dump output:

    $ dump -H lsof
    ...
    ***Import File Strings***
    INDEX  PATH      BASE ...
    0      /foo/bar

To correct the problem, revisit the lsof source directory and remake lsof this way:

    $ unset LIBPATH; make       (sh or ksh)
or
    % unsetenv LIBPATH; make    (csh or tcsh)

4.8 What does lsof mean when it says, "no PCB, CANTSENDMORE, CANTRCVMORE" in a socket file's NAME column?
When an AIX application calls shutdown(2) on an open socket file, but hasn't called close(2) on the file, the file will remain visible to lsof as an open socket file without any extended protocol information.

Lsof reports that state in the NAME column by saying that there is "no PCB" (Protocol Control Block) for the protocol (e.g., TCP in the NODE column). If the open socket file has the state variables SO_CANTSENDMORE and SO_CANTRCVMORE set -- i.e., from the shutdown(2) call -- lsof reports them with the CANTSENDMORE and CANTRCVMORE notes in the NAME column.

4.9 When the -X option is used on AIX 4.3.3, why does lsof disable it, saying "WARNING: user struct mismatch; -X option disabled?"

The -X option causes lsof to read the loader information of the user structure from virtual memory via the readx() system call. It does that with the user structure definition from that was compiled into the lsof executable.

On AIX 4.3.3 there are two different user structure definitions in two separate header files, distributed at different times by IBM. If lsof was compiled with one and the kernel on which lsof is being run was compiled with the other, lsof normally won't get correct loader information when it calls readx().

In an attempt to compensate for that difference, lsof makes an independent check of the loader information by getting the user structure's open file count via readx() and comparing it to the open file count obtained independently via getprocs(). When the two counts don't match, lsof tries to read the count (and re-read the loader information) with two offsets, based on observed differences between the two user structures.

When one of the three attempts produces a correct open file count, lsof uses its corresponding offset on subsequent readings of the loader information. When none of the three attempts produces a correct open file count, lsof issues the WARNING message and disables -X processing.
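The three-attempt check described above can be sketched as a simple search over candidate offsets. This is an illustration only, not lsof's AIX code: the function name, the offset values, and the fake reader that stands in for readx() are all assumptions.

```python
def pick_offset(read_count, expected_count, candidate_offsets):
    """Return the first offset at which read_count() agrees with the
    independently obtained open file count, or None if none does."""
    for offset in candidate_offsets:
        if read_count(offset) == expected_count:
            return offset
    return None


# Hypothetical values: no adjustment, plus two adjustments standing in for
# the "observed differences between the two user structures".
CANDIDATE_OFFSETS = (0, 8, 16)


def fake_readx_count(offset):
    """Stand-in for reading the open file count via readx() at an offset."""
    return {0: 7, 8: 42, 16: 3}[offset]


# 42 plays the role of the count obtained independently via getprocs().
chosen = pick_offset(fake_readx_count, 42, CANDIDATE_OFFSETS)
if chosen is None:
    print("WARNING: user struct mismatch; -X option disabled")
else:
    print("using offset", chosen, "for later loader information reads")
```

When no candidate offset yields a matching count, the caller takes the same path the FAQ describes: issue the warning and stop using -X.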
To eliminate this problem, obtain an lsof binary that matches the kernel of the AIX 4.3.3 system where you want to run lsof. Compiling lsof on the target system is the preferred way to get a matching binary.

5.0 BSD/OS BSDI Problems

5.1 Why doesn't lsof report on open kernfs files?

Lsof doesn't report on open BSD/OS BSDI kernfs files because the structures lsof needs aren't defined in the kernfs.h header file in /sys/misc/kernfs.

6.0 DEC OSF/1, Digital UNIX, and Tru64 UNIX Problems

6.1 Why does lsof complain about non-existent /dev/fd entries?

When you run lsof for Digital UNIX 3.2, lsof may complain:

    lsof: can't lstat /dev/fd/xxx: No such file or directory
    lsof: can't lstat /dev/fd/yyy: No such file or directory

(Or it may warn about other missing /dev/fd paths.) When you do an ``ls /dev/fd'' none of the missing paths are listed.

This is caused by a bug in the DEC library function getdirentries(). For some reason, when /dev/fd is a file system mount point, getdirentries() returns an incorrect size for it to readdir(). (Lsof calls readdir() in its ddev.c readdev() function.) Because of the incorrect size, readdir() goes past the end of the /dev/fd directory buffer, encounters random paths and returns them to lsof. Lsof then attempts to lstat(2) the random paths, gets error replies from lstat(2), and complains about the paths.

Duncan McEwan discovered this error and has reported it to DEC. Duncan also supplied an alternate readdir() function as a work-around. I've incorporated his readdir() in dialects/osf/ddev.c (as the static ReadDir() function) with some slight modifications, and enabled its use when the USELOCALREADDIR symbol is defined.

The Configure script defines USELOCALREADDIR for Digital UNIX version and 3.2. If you don't want to use Duncan's local readdir() function, edit the Makefile and remove -DUSELOCALREADDIR from the CFGF string.
When DEC releases a corrected getdirentries() function, I'll modify the Configure script to stop defining USELOCALREADDIR.

6.2 Why does the Digital UNIX V3.2 ld complain about Ots* symbols?

When you compile lsof on your Digital UNIX V3.2 system, ld may complain:

    ld: Unresolved:
    knlist
    _OtsRemainder32Unsigned
    _OtsDivide64Unsigned
    _OtsRemainder64Unsigned
    _OtsDivide32Unsigned
    _OtsMove
    _OtsDivide32
    _OtsRemainder32
    *** Exit 1

Chris Eleveld reports this happens on Digital UNIX V3.2 systems after the Fortran compiler has been installed.

The best work-around seems to be to remove -lmld from the CFGL string in the Makefile produced by Configure -- i.e., change:

    CFGL= -lmld
to
    CFGL=

According to the V3.2 man page for nlist(3), this shouldn't work, but my testing shows that it does.

Although I haven't been able to test this second work-around, you might try adding -lots to CFGL, rather than removing -lmld -- i.e., change:

    CFGL= -lmld
to
    CFGL= -lmld -lots

WARNING: my testing also shows that the V2.0 nlist(3) man page means what it says when it calls for -lmld -- lsof loaded without -lmld under V2.0 can't locate the proc (process) table address. DON'T REMOVE -lmld FROM THE DIGITAL UNIX V2.0 MAKEFILE.

If you run into this problem, please let me know what problem you encountered and how you solved it.

6.3 Why can't lsof locate named pipes (FIFOs) under V3.2?

While lsof for V3.2 can report on named pipes (FIFOs), it can't find them by name. That appears to happen because of the way the V3.2 kernel lstat(2) function reports named pipe device numbers. The V3.2 kernel reports the device number as 0xfffffff, while the kernel structures for named pipes that lsof examines contain the device number of the file system on which the named pipe resides. Consequently, lsof can't match the device and inode number pair it receives from applying lstat(2) to the named pipe with any device and inode number pair it finds when scanning kernel structures.

I don't have a work-around.
You can, of course, ask for full lsof output and use a post-processing filter (e.g., grep) to locate the named pipe of interest.

This problem doesn't exist under V2.0.

6.4 Why does lsof use the wrong configuration header files? For example, why can't the lsof compilation find cpus.h?

DEC OSF/1, Digital UNIX, and Tru64 UNIX configuration header files describe the hardware and software environment for which your kernel boot file was constructed. For example, /sys//cpus.h defines the number of CPUs in its NCPUS #define.

Lsof searches for the configuration header file subdirectory in /sys (/usr/sys for Digital UNIX version 4.0 and Tru64 UNIX) by converting the first host name component to capital letters -- e.g., TOMIS is derived from tomis.bio.purdue.edu. If that subdirectory exists, lsof uses header files from it. (Configure reports what subdirectory is being used.)

If Configure doesn't find a host-name derived subdirectory, it prompts you for the entry of a subdirectory name. If you can't find one, quit Configure and run the kernel generation process to create a proper configuration subdirectory.

If you don't identify a proper configuration subdirectory and you try to compile lsof, the compiler will complain about missing header files -- e.g., a missing cpus.h.

Once you have located or generated a proper configuration subdirectory, rerun Configure. If you have generated a configuration subdirectory whose name is derived from the host name, Configure will find and use it. If not, you will have to specify its name to Configure.

6.5 Why does lsof indicate incomplete paths with " -- " for Tru64 UNIX 5.1 files?

When lsof can't find a component of a path in the kernel's name cache (aka DNLC), or can't determine that the left-most component has as its parent the file system root, it uses an "incomplete path" notation.
That notation begins with the file system root name, followed by " -- ", followed by the consecutive path name components lsof was able to find in the DNLC -- e.g., "/ -- init".

Because the DNLC was significantly redesigned in Tru64 UNIX 5.1, lsof's handling of the cache had to be completely redone. As part of the DNLC redesign a name cache entry parameter lsof formerly used to locate the file system root of a path was removed.

With help from Chang Song I've been able to implement an alternate method for detecting the root of these file system types: AdvFS (MSFS), CDFS, DVDFS, FDFS, NFS, NFS3, and UFS. When lsof doesn't know how to identify the root for a file system type, it will resort to the " -- " incomplete path notation.

7.0 FreeBSD Problems

7.1 Why doesn't lsof report on open kernfs files?

Lsof doesn't report on open FreeBSD kernfs files because the structures lsof needs aren't defined in the kernfs.h header file in /sys/misc/kernfs.

7.2 Why doesn't lsof work under FreeBSD 4.0?

If lsof doesn't work under FreeBSD 4.0, first make sure you have the latest lsof revision, 4.41 or higher. Next check that your kernel and libkvm are in proper synchronization. Recompile them, if necessary.

You might also try compiling lsof this way:

    $ make DEBUG="-O -DCOMPAT_LINUX_THREADS"

Strictly speaking, -DCOMPAT_LINUX_THREADS shouldn't be needed, but slightly unsynchronized FreeBSD 4.0 kernels, header files, and libraries may make it necessary.

8.0 HP-UX Problems

8.1 Why doesn't an HP-UX lsof compilation use -O?

If you only have the standard (bundled) HP-UX C compiler and haven't purchased and installed the optional one, then you can't use cc's -O option. The HP-UX cc(1) man page says this:

    "Options
     Note that in the following list, the cc and c89 options -A , -G , -g , -O , -p , -v , -y , +z , and +Z are not supported by the C compiler provided as part of the standard HP-UX operating system. They are supported by the C compiler sold as an optional separate product."
Lsof's Configure script tries to detect what C compiler product you have installed by examining your compiler. If that examination reveals a standard (bundled) compiler, lsof avoids using -O.

If the Configure compiler test fails, the C compiler will complain that it doesn't support -O. You can suppress that complaint by editing the Makefile produced by Configure and removing the DEBUG= -O make string.

8.2 Where is x25L3.h under HP-UX 9.x?

If you try to compile lsof with CCITT support under HP-UX 9.x, the compiler will complain it can't find the x25L3.h header file. While that header file was shipped with HP-UX 8.x releases, it is missing from 9.x.

If you have access to an 8.x system, you can copy the x25L3.h header file from its /etc/conf/x25 directory to the same place on your 9.x system. If you can't do that, you'll have to appeal to HP or to an 8.x user who can supply that header file.

You can disable HP-UX CCITT support by editing the Makefile the Configure script generates. Delete the DINC string and remove ``-DHPUX_CCITT'' from CFGF.

My thanks go to Pasi Kaara for this information.

8.3 Why doesn't lsof report all HP-UX 9.x locks?

Lsof can't report on HP-UX 9.x locks created via the fcntl() system call, because the information about them is not stored in structures known to lsof. Locks so created appear to work correctly; the problem is that lsof can't find how they're reported in the file and node structures for the locking process.

Lsof has no similar trouble finding information on locks created with the lock() system call. It finds information about them in the locklist structure, attached to the file's inode.

I'm at a loss to find the fcntl() lock information. It can be found in HP-UX 10.[12]0, in locklist structures, attached to the vnode, although the 10.20 locklist structure is incorrect. (See the next section.)

8.4 Why doesn't lsof report HP-UX 10.20 locks correctly?
Lsof doesn't report the length of HP-UX 10.20 locks -- byte or full file -- correctly because the kernel structure lsof examines to determine that a process has a lock on a vnode (the locklist structure from ) contains incorrect lock start and end byte values.

Even though this appears to be a kernel bug, HP-UX locks seem to work correctly. All I can conclude is that the correct lock information is stored somewhere else in the kernel, in a place not visible to lsof.

As a consequence of this incorrect locklist structure information, lsof always reports all locks with a byte-level `r' (read) or `w' (write) lock indication, and never reports a full-file read (`R') or write (`W') lock.

8.5 Why doesn't the CCITT support work under 10.x?

Pasi Kaara, who originally provided the HP-UX CCITT support, reports that it no longer works under HP-UX 10.x. Consequently, at lsof revision 4.02 it has been disabled.

8.6 Why can't lsof be compiled with `cc -Aa` or `gcc -ansi` under HP-UX 10.x?

Some HP-UX 10.x header files, needed by lsof, can't be compiled properly in ANSI-C mode; structure element definition and alignment problems result. The f_offset member of the file structure, for example, is incorrect.

This ANSI-C obstacle extends to using the -Aa option of the HP C compiler and the -ansi option of gcc.

8.7 Why does lsof complain about no C compiler?

Lsof's Configure script looks in /bin and /usr/ccs/bin for an HP C compiler, because it needs to know if the compiler is the standard (bundled) one or the optional separate product. If it finds no compiler in either place, Configure quits after complaining:

    No executable cc in /bin or /usr/ccs/bin

If you don't have a C compiler in either of these standard places, you should consider installing it. If you have gcc installed, you can use it by declaring the ``hpuxgcc'' abbreviation to lsof's Configure script.
If you have a C compiler in a non-standard location, you can use the HPUX_CCDIR[12] environment variables to name the path to it. Consult the 00XCONFIG file of the lsof distribution for more information.

8.8 Why does Configure complain about q4 for HP-UX 11?

When you run Configure on an HP-UX 11 system, it may complain:

    !!!ERROR!!! !!!ERROR!!! !!!ERROR!!! !!!ERROR!!!
    Configure can't use /usr/contrib/bin/q4 to examine the ipis_s structure. You must do that yourself, report the result in the HPUX_IPC_S_PATCH environment variable, then repeat the Configure step. Consult the Configure script's use of /usr/contrib/bin/q4 and the 00XCONFIG file for information on ipis_s testing and the setting of HPUX_IPC_S_PATCH.
    !!!ERROR!!! !!!ERROR!!! !!!ERROR!!! !!!ERROR!!!

This message states that Configure cannot use q4 from /usr/contrib/bin to examine the kernel's boot image for the ipis_s structure. That structure was introduced in early 1999. Patch bundle B.11.00.43 and patches PHNE_20008 and PHNE_20735 appear to be responsible for ipis_s.

Note: q4 may also fail if it can't execute nm -- e.g., it can't find /usr/bin/nm, or you have a conflicting, private version of nm earlier in your path.

The ipis_s structure isn't described in any header file HP-UX releases with HP-UX 11. It appears in the private lsof header file .../dialects/hpux/kmem/hpux11/ipc_s.h. I had to create ipc_s.h during the lsof port to HP-UX 11 by using q4. Lsof gets local and remote connection addresses (IP and port numbers) from ipc_s, so an incorrect ipc_s definition will cause incorrect reporting of TCP/IP connection addresses.

Over the long run -- e.g., after the current patch has been replaced by yet another one -- using q4 is the most reliable way to tell if ipis_s exists and what it contains. Unfortunately, q4 needs to be installed in /usr/contrib/bin and the kernel boot image, /stand/vmunix, needs to be processed with pxdb.
If either is untrue, lsof issues the above error message, perhaps preceded by q4 messages. For example, if /stand/vmunix hasn't been processed by pxdb, the q4 messages will include:

    q4: (error) vmunix not pxdb'd
or
    q4: (warning) /stand/vmunix has not been processed by pxdb.

To be able to complete HP-UX configuration of lsof, you must determine if the ipis_s structure is defined in your kernel, if the ipis_s structure of your kernel has an ipis_msgsqueued member, and if the ipc_s structure of your kernel has an ipc_ipis member. That means you may have to process /stand/vmunix with pxdb, and perhaps install q4 and run it on /stand/vmunix.

If you must run q4 to determine the state of ipis_s and ipc_s, use these q4 commands:

    $ /usr/contrib/bin/q4 /stand/vmunix
    ...
    q4> fields -c struct ipis_s
    ...
    q4> fields -c struct ipc_s

Look in the q4 output for the ipc_ipis member of the ipc_s structure, and look in the ipis_s structure for the ipis_msgsqueued member.

If ipc_s has ipc_ipis but ipis_s lacks ipis_msgsqueued, set the HPUX_IPC_S_PATCH environment variable to "1". If ipc_s has ipc_ipis and ipis_s has ipis_msgsqueued, set HPUX_IPC_S_PATCH to "2" -- e.g.,

    $ HPUX_IPC_S_PATCH=1 Configure -n hpux
or
    $ HPUX_IPC_S_PATCH=2 Configure -n hpux
or
    % setenv HPUX_IPC_S_PATCH 1
    % Configure -n hpux
or
    % setenv HPUX_IPC_S_PATCH 2
    % Configure -n hpux

(Use setenv if your shell is csh.)

If ipc_s has no ipc_ipis member, set HPUX_IPC_S_PATCH to "N" -- e.g., use this Configure step:

    $ HPUX_IPC_S_PATCH=N Configure -n hpux
or
    % setenv HPUX_IPC_S_PATCH N
    % Configure -n hpux

9.0 Linux

9.1 What do /dev/kmem-based and /proc-based lsof mean?

At approximately Linux 2.1.72 and exactly at lsof revision 4.23, lsof support for Linux forks.

The first fork, containing the oldest lsof form, is based on access to kernel memory structures, and is called /dev/kmem-based lsof. A /dev/kmem-based lsof is heavily intertwined with the Linux kernel version, its header files, and its system map file.
Typically a /dev/kmem-based lsof needs only setgid permission to locate all open file information.

After approximately Linux 2.1.72 and at revision 4.23 lsof obtains all its information from the /proc file system. That lsof is called the /proc-based lsof. A /proc-based lsof does not read kernel memory, needs neither kernel header files nor the system map file, and is less likely to be affected by Linux kernel changes. However, it does require setuid-root permission to list all open files, and it can't report file offsets (positions).

The lsof Configure script automatically determines which type of lsof it will activate. The sources for both lsof types reside in subdirectories of .../dialects/linux; the /dev/kmem-based sources in .../dialects/linux/kmem; the /proc-based, in .../dialects/linux/proc.

Configure determines which sources to use by testing the presence of files and file contents in /proc, so the exact Linux kernel version at which Configure activates /proc-based lsof may differ slightly from 2.1.72, the version where I developed /proc-based lsof. (I know the /proc-based lsof does not find the information it needs in the Linux version 2.0.32 /proc.)

9.2 /dev/kmem-based Linux lsof Questions

9.2.1 Why doesn't /dev/kmem-based lsof work (or even compile) on my Linux system?

I test lsof on what Linux systems are available to me. Currently my test systems are provided by Jim Mintha and Jonathan Sergent.

If lsof doesn't even compile on your Linux system, you may be using a version of Linux whose header files differ from the ones I used. Or you may not have installed /usr/src/linux, and lsof can't find header files that it needs from that directory.

9.2.1.1 Why does /dev/kmem-based Configure complain about /usr/src/linux?

/dev/kmem-based lsof needs kernel header files from /usr/src/linux/... in order to know the definitions of some structures it reads from the kernel.
If your /usr/src/linux is missing or lacks the header files lsof needs, you won't be able to build lsof.

Hint: if your kernel source files are in a different place, you may be able to use the LINUX_KERNEL environment variable to identify it. Consult the 00XCONFIG file of the lsof distribution for more information on LINUX_KERNEL and other cross-configuration controls.

9.2.2 Why does /dev/kmem-based lsof complain about /dev/kmem?

/dev/kmem-based lsof reads kernel information via /dev/kmem. If you get this error message:

    lsof: can't open /dev/kmem

then the permissions on /dev/kmem or the authority you have when using lsof aren't powerful enough to allow lsof to read from it.

Often /dev/kmem is owned by the kmem or system group and has group read permission, so lsof needs to run setgid kmem or system, or the login that runs it must be in the kmem or system group (that's the way I test lsof). So, become the super user and:

    either  $ chgrp kmem lsof
    or      $ chgrp system lsof
    and     $ chmod 2755 lsof

9.2.3 Why can't /dev/kmem-based lsof find kernel addresses?

The failure of /dev/kmem-based lsof to read kernel addresses usually is accompanied by error messages like:

    lsof: can't read kernel name list from
    lsof: missing kernel high memory definition
    lsof: missing kernel memory map definition
    lsof: missing kernel memory start definition
    lsof: no _task kernel definition
    lsof: can't read memory parameters

These messages describe failures in obtaining addresses for the symbols that identify kernel structures lsof wants to read.

Lsof obtains kernel symbol addresses from the /zSystem.map or /System.map files -- one will usually be the argument in the "can't read kernel name list from" error message. You might not have that file, or it might not be in that place. (See the following sections for more information.)

9.2.4 Why does /dev/kmem-based lsof have trouble reading kernel structures?

Your kernel and /System.map or /zSystem.map file may not match. (See the next section.)
9.2.5 Where is the system map file for kernel symbol->address translations for /dev/kmem-based lsof? Why doesn't it match my kernel?

/dev/kmem-based lsof uses the system map file (also called the symbol->address translation file) to locate addresses of the symbols for kernel information it needs to read. Without this file, lsof cannot function.

The system map file should be installed when you install a new kernel. It is not uncommon that people make and install a new kernel, but forget to install its corresponding system map file.

/dev/kmem-based lsof tries to determine the system map file's path when it runs, by looking for it in these places (in this order):

    /System.map
    /boot/System.map
    /zSystem.map
    /boot/zSystem.map
    /usr/src/linux/System.map
    /usr/src/linux/zSystem.map

(This list is coded in the Linux dproc.c module, should you want to change it.)

If you want to know what system map file lsof will use, specify the -h option and look at the default name (listed in parentheses) for the -k option. If lsof can't find a system map file, it issues an error message to that effect and quits.

If lsof's system map file scan produces the wrong result, and you know where a system map file can be found that matches the running kernel, you can use lsof's -k option to defeat the scan.

9.2.5.1 What do kernel symbol address mismatch error messages mean for /dev/kmem-based lsof?

/dev/kmem-based lsof has code, courtesy of Marty Leisner, that tries to determine if the system map file and the booted kernel are a matched set. The code compares symbol names and addresses from the system map file to symbol names and addresses from /proc/ksyms. If any matching pair of names has different addresses, lsof complains and stops -- e.g.,

    $ lsof -k ./XXX
    lsof: kernel symbol address mismatch: do_munmap
          /proc/ksyms value=0x122018; ./XXX value=0x12201a
          There were 161 additional mismatches.
          ./XXX and the booted kernel may not be a matched set.
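The comparison described above amounts to intersecting two symbol tables by name and flagging any name whose addresses differ. The sketch below is not lsof's code: it assumes both tables have already been parsed into name-to-address dictionaries, and the function name and the second symbol (sys_open, with a made-up address) are illustrative.

```python
def find_mismatches(map_symbols, kernel_symbols):
    """Return a sorted list of (name, kernel_address, map_address) for every
    symbol present in both tables whose addresses differ."""
    return sorted(
        (name, kernel_symbols[name], map_address)
        for name, map_address in map_symbols.items()
        if name in kernel_symbols and kernel_symbols[name] != map_address
    )


# do_munmap uses the addresses from the example message above; sys_open and
# its address are hypothetical and match in both tables.
map_file_symbols = {"do_munmap": 0x12201A, "sys_open": 0x130000}
running_kernel_symbols = {"do_munmap": 0x122018, "sys_open": 0x130000}

mismatches = find_mismatches(map_file_symbols, running_kernel_symbols)
for name, kernel_address, map_address in mismatches:
    print("kernel symbol address mismatch: %s" % name)
    print("  kernel value=%#x; map file value=%#x" % (kernel_address, map_address))
```

Symbols that appear in only one table are ignored here, which mirrors the FAQ's wording that only a "matching pair of names" is compared.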
9.2.5.2 Why does /dev/kmem-based lsof complain that query_module is unimplemented?

When started, the /dev/kmem-based lsof may issue these messages:

    lsof: query_module unimplemented
    lsof: (Is CONFIG_MODULES defined in autoconf.h?)
    lsof: unable to verify symbols in /System.map

This fatal error occurs because the running Linux kernel has not been configured with kernel module support. Lsof needs kernel module support to be able to verify that addresses in the System.map file match those of the running kernel.

To solve this problem it is necessary to reconfigure and rebuild the Linux kernel, answering "yes" when asked if kernel module support should be included. After the kernel has been configured and rebuilt, when installing it, don't forget to install its matching System.map file.

9.2.6 Why does /dev/kmem-based lsof complain about the random_fops and urandom_fops kernel symbols?

When /dev/kmem-based lsof is run on the Linux 1.3.57 through 1.3.61 kernels, it complains about address conflicts for two symbols, random_fops and urandom_fops, with a message that looks like this:

    lsof: kernel symbol address mismatch: random_fops
          get_kernel_syms() value is 0x100d76c; /System.map value is 0x19abb0.
          There was 1 additional mismatch.
          /System.map and the booted kernel may not be a matched set.

Then lsof exits, because the address conflict on these symbols between /System.map and get_kernel_syms() output makes lsof believe that the /System.map file and the running kernel do not match. (See the sections on /dev/kmem-based lsof system map file problems.)

The address mismatch for these two symbols appears to be a kernel bug, triggered by using the mouse/psaux loadable module.

Keith Parks first reported the problem. He discussed it with Ted Ts'o, and Ted suggested a patch to the random.h header file that Keith reports seems to solve the problem. The patch became available at Linux release 1.3.62.
If you have a release below that, but above 1.3.56, you should look at the file Linux-mouse-module.patch in the subdirectory .../lsof*/dialects/linux/kmem/patches. Keith's description in that file has more detail than appears in this 00FAQ section.

9.2.7 Why does /dev/kmem-based lsof complain about get_kernel_syms()?

/dev/kmem-based lsof may complain:

    lsof: get_kernel_syms() unimplemented
    lsof: (Is CONFIG_MODULES defined in autoconf.h?)
    lsof: unable to verify symbols in /System.map

The fatal error described by these messages means the get_kernel_syms() function isn't implemented in the Linux kernel, probably because the kernel wasn't configured for module support.

/dev/kmem-based lsof uses the information obtained from get_kernel_syms() to validate the information in /System.map. Since it is easy to install a new Linux kernel without installing its /System.map file, this lsof check is an important one.

The "unable to verify..." message indicates that lsof is unable to validate the /System.map symbols with information obtained from get_kernel_syms().

If you look at /usr/src/linux/include/linux/autoconf.h you'll probably find there's no ``#define CONFIG_MODULES'' in it. To eliminate this error you must define CONFIG_MODULES in autoconf.h and rebuild the kernel. Don't forget to install the System.map file from the newly built kernel.

9.2.8 Why does /dev/kmem-based lsof complain "WARNING: uncertain kernel loader format; assuming..."?

Lsof may complain:

    lsof: WARNING: uncertain kernel loader format; assuming ...

if it was unable to determine whether the Linux kernel was loaded in COFF or ELF format from the information provided by get_kernel_syms() or query_module(). The kernel load format dictates whether the kernel symbols whose addresses lsof requires should have a leading underscore. (COFF kernel symbols do.)
When /dev/kmem-based lsof can't determine the kernel load format, it assumes and reports a default, established by the Configure script's analysis of autoconf.h -- i.e., if autoconf.h contains ``#define CONFIG_KERNEL_ELF'' the default is ELF.

If the default kernel load format chosen by lsof is wrong, other kernel symbol mismatch warning messages can result.

The best attempt at a solution is to rebuild the kernel, first making sure that CONFIG_MODULES is defined in /usr/src/linux/include/linux/autoconf.h with a ``#define CONFIG_MODULES'' preprocessor directive. Don't forget to install the System.map file from the newly built kernel.

9.2.9 /dev/kmem-based lsof Problems Under Linux 2.1.x

9.2.9.1 Why does Configure say ``Testing lseek() with gcc'' for /dev/kmem-based lsof for Linux 2.1.x?

Some versions of Linux 2.1.x libc have a flawed lseek() function that will not accept kernel addresses as seek offsets. That's because the kernel addresses have their high order bit set to one -- i.e., appear to be negative -- and the lseek() function in some 2.1.x libc's mistakenly perceives them, when returned as a legitimate response to a kernel lseek() system call, to be a kernel error response.

Later Linux 2.1.x distributions have a corrected lseek() function in their libc -- it will be 5.4.x, but I don't yet know the value of x. Look at the __lseek.S libc source module and see if it has ``#define __CHECK_RETURN_ADR'' in it. If it does, then your libc has H. J. Lu's lseek() fix, and /dev/kmem-based lsof's Configure should find that it works correctly.

To determine the correct operation of lseek(), /dev/kmem-based lsof's Configure script runs a small test program that does a test seek on /dev/kmem. If the seek fails, the test program's report to Configure causes it to use the source code for a private lseek() interface to the kernel (it's in .../dialects/linux/__lseek.s) and issue the message:

    Lseek() is suspect; using private __lseek.s.
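The sign problem described above can be shown with a little arithmetic. This sketch assumes 32 bit offsets, which is an assumption about the systems in question, and the address 0xC0100000 is a made-up example of a kernel address with its high order bit set; the function names are illustrative only.

```python
def as_signed32(value):
    """Reinterpret the low 32 bits of value as a signed 32 bit integer."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        return value - 0x100000000
    return value


def naive_check_says_error(lseek_result):
    """Model a flawed check that treats any negative return as an error."""
    return as_signed32(lseek_result) < 0


kernel_address = 0xC0100000  # hypothetical; high order bit is one
low_address = 0x00100000     # hypothetical; high order bit is zero

print(as_signed32(kernel_address) < 0)         # the address looks negative
print(naive_check_says_error(kernel_address))  # so a good seek looks like a failure
print(naive_check_says_error(low_address))     # a low offset is unaffected
```

A successful seek to such an address returns the address itself, so a check of this kind cannot tell it from an error return.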
If /dev/kmem-based lsof reports your lseek() function is suspect, you should update your libc.

9.2.9.2 Why does Configure's lseek() test complain about read permissions for /dev/kmem-based lsof for Linux 2.1.x?

The lseek() test program that /dev/kmem-based Configure runs must be able to open /dev/kmem with read permission. If it can't do that, it fails. Configure then reports:

    Configure for Linux 2.1.x needs read permission to /dev/kmem in order to test lseek() on it.

You must run the Configure script with sufficient power that its test program can open /dev/kmem for reading -- e.g., run Configure while su'd to root, give /dev/kmem broader permissions, etc.

9.2.10 What do the /dev/kmem-based lsof WARNING messages about kncache.h mean?

The dialects/linux/kmem/Mksrc shell script, called by the Configure shell script, builds a private lsof header file in dialects/linux/kmem/include called kncache.h. It defines structures and constants for lsof's probing of kernel name caches.

/dev/kmem-based lsof builds kncache.h by examining files in the Linux kernel and application source trees, including:

    /usr/include/linux/nfs.h
    /usr/include/linux/nfs_fs.h
    /usr/src/linux/fs/dcache.c
    /usr/src/linux/fs/nfs/dir.c
    /usr/src/linux/include/linux/dcache.h

(Mksrc doesn't examine all these files for all Linux kernels. See Mksrc for details.)

Warning messages that mention kncache.h are issued when Mksrc's examination of these application and kernel files fails to yield the definitions lsof needs.

If your Configure step yields a kncache.h diagnostic message, be sure the Linux kernel source tree is available to and can be read by the Mksrc script. If it is, check Mksrc for further information or contact me via e-mail.

9.2.11 What does "WARNING: kncache.h defines no kernel name caches." from /dev/kmem-based lsof mean?

If the /dev/kmem-based lsof Mksrc script can't locate any kernel name cache definitions for the kncache.h header file, when lsof executes it issues this warning message.
The warning means that lsof will be unable to transfer any path name components from kernel name caches to its output NAME column.

See the preceding section for information on how Mksrc locates kernel name cache definitions.

9.2.12 Why doesn't my /dev/kmem-based lsof have PPID support?

/dev/kmem-based lsof for Linux kernels below 2.0.27 doesn't support the parent PID (PPID) option, -R, because at the time Linux PPID support was added, that was the only boundary I knew to apply. David Bacon reports his 2.0.18 Linux kernel appears to support PPID properly.

If you have an old Linux kernel and want to see if it will support PPID, edit or remove the #if/#endif bracket around the HASPPID #define in .../dialects/linux/machine.h, reconfigure, and rebuild. If you can compile lsof without error then PPID support may work. You can verify that it does by comparing ps and lsof output.

9.3 /proc-based Linux lsof Questions

9.3.1 Why doesn't /proc-based lsof report file offsets (positions)?

/proc-based lsof can't report file offsets (positions) when no offset information is available in the /proc//fd/ links that describe the files open to a process.

During its initialization /proc-based lsof tests to see if offset information might be present in the st_size element of the stat structure returned by the lstat(2) kernel function, when applied to one of its own open files. To see if /proc-based lsof thinks your kernel reports reliable offset information, specify the -o option to it. If it replies with:

    lsof: WARNING: can't report offset; disregarding -o.

then its initialization test has indicated that using lstat(2) on one of its own open files in /proc//fd doesn't deliver offset information. (The /proc-based lsof offset test may be found in the .../dialects/linux/proc/dproc.c initialize() function.)

Contact me via e-mail for information on a possible kernel patch that allows lstat(/proc//fd/) to deliver offset (position) information.
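The idea behind the initialization test can be approximated in Python. This is a hedged sketch, not a translation of the dproc.c code: the function name and the probe offset 12345 are assumptions, and the sketch only checks whether lstat() on the process's own descriptor link returns the current offset in st_size.

```python
import os
import tempfile


def lstat_reports_offset(test_offset=12345):
    """Return True or False according to whether lstat() on this process's
    own /proc fd link yields the file offset in st_size; return None when
    the link can't be examined (e.g., no /proc file system)."""
    fd, path = tempfile.mkstemp()
    try:
        # The file is empty, so a st_size equal to test_offset can only
        # have come from the descriptor's offset.
        os.lseek(fd, test_offset, os.SEEK_SET)
        try:
            st = os.lstat("/proc/self/fd/%d" % fd)
        except OSError:
            return None
        return st.st_size == test_offset
    finally:
        os.close(fd)
        os.unlink(path)


print(lstat_reports_offset())
```

A False result corresponds to the situation in which lsof disregards -o; a None result means the check could not be made at all.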
9.3.2 Why does /proc-based lsof report "can't identify protocol" for some socket files?

/proc-based lsof may report:

    COMMAND PID ... TYPE ... NODE NAME
    pump    226 ... sock ...  309 can't identify protocol

This means that it can't identify the protocol (i.e., the AF_* designation) being used by the open socket file. Lsof identifies protocols by matching the node number associated with the /proc//fd entry to the node numbers found in selected files of the /proc/net sub-directory. Currently /proc-based lsof examines these protocol files:

    /proc/net/ax25 (untested)
    /proc/net/ipx  (needs kernel patch)
    /proc/net/raw
    /proc/net/tcp
    /proc/net/udp
    /proc/net/unix

If /proc-based lsof says it can't identify the protocol for an open socket file, you may be able to identify the protocol yourself by using grep to look for the specific node number in the files of /proc/net -- e.g.,

    $ grep /proc/net/*

You may not be able to find the desired node number, because not all kernel protocol modules fully support /proc/net information. The AF_PACKET driver, for example, doesn't create a /proc/net/packet file with information about its open socket files.

If you find a matching node number in a /proc/net file that is not currently being processed by lsof, contact me via e-mail. I'll discuss adding support to /proc-based lsof for the protocol with you.

Some /proc-based lsof protocol support is incomplete without further work. The code that processes /proc/net/ax25 has never been tested on a machine with active AX25 support. The code that matches node numbers of open IPX protocol socket files to those in /proc/net/ipx requires Jonathan Sergent's Linux 2.1.79 patch to /usr/src/linux/net/ipx/af_ipx.c. The patch, suitable for input to Larry Wall's patch program, may be found in the lsof distribution file:

    .../dialects/linux/proc/patches/net_ipx_af_ipx.c.patch

9.3.3 Why does /proc-based lsof warn about unsupported formats?
Lsof may issue the following warning:

    lsof: WARNING: unsupported format: /proc/net/

if the header line of the indicated file in /proc/net -- ax25, ipx, raw, tcp, udp, or unix -- doesn't match what lsof expects to find. When the header line of a /proc/net file isn't what lsof expects, lsof probably can't parse the rest of the file correctly and doesn't try. As a result, lsof can't report any NAME column information (e.g., local and remote addresses) for socket files bound to the indicated network protocol.

If you get this warning, please send me e-mail. Include the contents of the file lsof claims has an unsupported format.

9.3.4 Why does /proc-based lsof report "(deleted)" after a path name?

The "(deleted)" notation following a path name in /proc-based lsof's NAME column comes from the /proc//fd/ entry for the open file. It's the Linux kernel's way of indicating the file is open but has been unlinked (rm'd).

9.3.5 Why doesn't /proc-based lsof report full open file information for all processes?

/proc-based lsof can only report on processes whose /proc files it has permission to read. /proc normally grants permission to read all its files only to root or to the owning user ID. Without permission to read most /proc files, lsof can only report full information for processes belonging to the user who is running lsof.

/proc-based lsof may be able to report some information for all processes, depending on the permissions of their associated /proc files, but usually /proc-based lsof won't be able to access the files in /proc//fd/ that describe regular open files.

If you want /proc-based lsof to report on all processes, you must install it with setuid-root permission.

9.3.6 Why won't Customize offer to change HASDCACHE or WARNDEVACCESS for /proc-based lsof?

/proc-based lsof doesn't read device information from /dev or the device cache file, so it makes no sense to change the state of device cache processing or /dev node accessibility warnings.
9.4 What about lsof, Linux, and the Alpha or SPARC processors?

Some versions of lsof, /dev/kmem-based and /proc-based, may work under some versions of Linux on the Alpha or SPARC processors. There are some Alpha and SPARC pre-processor tests that change sections of lsof code, but they aren't tested often, since I don't have regular access to these architectures. (The folks at Red Hat test more regularly than I do.)

Try lsof on your Alpha or SPARC system; it may work for you. If it doesn't, feel free to discuss it with me via e-mail.

10.0 NetBSD Problems

10.1 Why doesn't lsof report on open kernfs files?

Lsof doesn't report on open NetBSD kernfs files because the structures lsof needs aren't defined in the kernfs.h header file in /sys/misc/kernfs.

11.0 NEXTSTEP and OpenStep Problems

11.1 Why can't lsof report on 3.1 lockf() or fcntl(F_SETLK) locks?

Lsof has code to test for locks defined with lockf() or fcntl(F_SETLK) under NEXTSTEP 3.1, but that code has never been tested. I couldn't test it, because my NEXTSTEP 3.1 lockf() and fcntl(F_SETLK) functions return "Invalid argument" every way I have tried to invoke them.

If your NEXTSTEP 3.1 system does allow you to use lockf() and fcntl(F_SETLK) and lsof doesn't report locks set with them, then the code in .../dialects/next/dnode.c probably isn't correct. Please contact me via e-mail and tell me how you got your lockf() and fcntl(F_SETLK) system calls to work.

11.2 Why doesn't lsof compile for NEXTSTEP with AFS?

I no longer have a NEXTSTEP test system that has AFS. Changes to lsof since I once had a test system have caused me to change the AFS code in NEXTSTEP without being able to test the changes.

If you need AFS support for NEXTSTEP and can't get it to compile, please contact me. Perhaps we can jointly fix the problems.

12.0 OpenBSD Problems

12.1 Why doesn't lsof support kernfs on my OpenBSD system?
Lsof supports the kernel file system on OpenBSD versions whose /sys/miscfs/kernfs/kernfs.h (or header file correctly defines the kern_target structure. The lsof Configure script's openbsd stanza checks for the presence of the structure's kt_name element and activates kernfs support for the CFLAGS -DHASKERNFS definition only when it finds kt_name.

The kernfs.h header file is scheduled to be updated in the OpenBSD 2.1 release, according to Kenneth Stailey, who authored its changes.

12.2 Will lsof work on OpenBSD on non-Intel-based architectures?

I've not tested lsof on an OpenBSD system that uses a non-Intel-based architecture, but I've had one report that lsof 4.33 compiles and works on OpenBSD for the pmax architecture (decstation 3100).

12.3 problems

12.3.1 Why does the compiler claim nbpg isn't defined?

When compiling lsof on some (older) OpenBSD SPARC versions, the compiler may complain:

    In file included from ../dlsof.h:191,
                     from ../lsof.h:166,
                     from fino.c:52:
    /usr/include/sys/pipe.h:83: `nbpg' undeclared here (not in a function)
    /usr/include/sys/pipe.h:83: size of array `ms' has non-integer type

This happens because uses NBPG from to size the `ms' array, and some OpenBSD systems define NBPG in terms of a kernel integer variable, nbpg.

Lsof revisions 4.46 and above have a hack to dlsof.h, developed by Volker Borchert that avoids the compiler problem for SPARC OpenBSD 2.3. The hack might work for other OpenBSD SPARC versions, but hasn't been tested there.

If you want to enable the hack for your OpenBSD SPARC version, modify this code in .../dialects/n+obsd/dlsof.h:

    # if defined(OPENBSDV)
    # if OPENBSDV==2030 && defined(__sparc__)
    # if defined(nbpg)
    #undef nbpg
    # endif /* defined(nbpg) */
    #define nbpg 4096 /* WARNING!!! ... */
    # endif /* OPENBSDV==2030 && defined(__sparc__) */
    #include
    #endif /* defined(OPENBSDV) */

You will probably want to change the second #if test to match your OpenBSD version. You may also want to change what value is assigned to nbpg.
See the next section, "What value should I assign to nbpg?"

12.3.2 What value should I assign to nbpg?

If you need to enable the nbpg hack, described in "Why does the compiler claim nbpg isn't defined?", you may also need to assign a value other than 4096 to nbpg. 4096 works for the sun4c processor and should work for sun4m, but 8192 may be needed for sun4.

Check and other OpenBSD documentation to determine the correct nbpg assignment.

13.0 Output Problems

13.1 Why do the lsof column sizes change?

Lsof dynamically sizes its output columns each time it runs to make sure that each column takes the minimum space. Column parsing -- e.g., with awk -- is possible, because each column is guaranteed to be separated from the preceding one by at least one space, and no column except the last (NAME) contains embedded spaces.

13.2 Why does the offset have ``0t'' and ``0x'' prefixes?

The offset value that appears in the SIZE/OFF column has ``0t'' and ``0x'' prefixes to distinguish it from size values that may appear in the same column. Normally if the offset value is less than 100,000,000 (8 digits), it appears in decimal with a ``0t'' prefix; over 99,999,999, in hexadecimal with a ``0x'' prefix.

A decimal offset is handy, for example, when tracking the progress of an outbound ftp transfer. When lsof reports on the ftp process, it will report the size of the file being sent with its open descriptor; it will report the progress of the transfer via the offset of the outbound open ftp data socket descriptor.

The ``-o [n]'' option may be used to specify the maximum number of decimal digits to be printed after ``0t'' before lsof switches to the hexadecimal digits after ``0x''. As already noted, the default decimal digit count is 8.

13.3 What are the values printed in the FILE_FLAG column and why is 0x sometimes included?
The two comma separated lists, separated by a semicolon, printed in the FILE-FLAG column (when the "+fg" option is specified), are short-hand names or hexadecimal values for the bits lsof finds in the f_flag or f_flags member of file structures for files (the first list, the one before the semicolon), and process open files flags found in various kernel structures, often named "pofile" (the second list, the one after the semicolon).

Lsof determines the short-hand names from symbols in the , , , , o, and header files. See the discussion of FILE-FLAG in the OUTPUT section of the lsof man page, and the FF_* and POF_* symbols in lsof.h for a list of the names.

Bits with no names defined for them are represented by an 0x member of the comma-separated list -- a hexadecimal integer.

When "+fG" is specified (instead of "+fg"), lsof will list all flag values as two hexadecimal integers, separated by a semicolon.

When "-FG" is specified to get the flags in an output field, the format defaults to hexadecimal. You can get names instead by following "-FG" with "+fg" -- e.g.,

    $ lsof -FG +fg ...

However, when you precede "-FG" with "+fg" -- e.g.,

    $ lsof +fg -FG

the format will be hexadecimal; order is important.

13.3.1 Why doesn't lsof display FILE_FLAG values for my dialect?

All versions of lsof except the /proc-based Linux lsof report FILE-FLAG values. Lsof can't obtain FILE-FLAG information from the Linux /proc interface.

13.4 Network Addresses

13.4.1 Why does lsof's -n option cause IPv4 addresses, mapped to IPv6, to be displayed in IPv6 notation?

When you use the -n option to tell lsof to display numeric network addresses, and an IPv4 address has been mapped to IPv6, lsof displays the address in IPv6 format and puts "ipv4" in the TYPE column. That combination indicates the IPv4 address has been mapped to IPv6.
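A script that post-processes lsof output can recover the original IPv4 address from such a mapped address. The following Python sketch uses the standard ipaddress module; the function name is hypothetical, and the bracketed input is assumed to be an address copied from lsof's NAME column without its port suffix.

```python
import ipaddress


def mapped_ipv4(lsof_address):
    """Return the IPv4 address embedded in an IPv4-mapped IPv6 address,
    or None when the address is not a mapped one."""
    text = lsof_address.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]  # drop the enclosing brackets
    mapped = ipaddress.IPv6Address(text).ipv4_mapped
    return None if mapped is None else str(mapped)


print(mapped_ipv4("[::ffff:1.2.3.4]"))  # 1.2.3.4
print(mapped_ipv4("[::1]"))             # None
```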
For example, the IPv4 address 1.2.3.4, when mapped to an IPv6 address, will be displayed by lsof as:

    [::ffff:1.2.3.4]

The enclosing brackets are lsof's signal that this is an IPv6 address. Inside the brackets is a standard IPv6 address, reported by inet_ntop(). The first two colons, signifying zeroes in the first 64 bits of the IPv6 address, and the hexadecimal ffff in the next 32 bits, indicate that the last 32 bits contain a mapped IPv4 address, which is then displayed in IPv4 dot notation.

14.0 Pyramid Version Problems

14.1 DC/OSx Problems

14.2 Reliant UNIX Problems

14.2.1 Why does lsof complain that it can't find /stand/unix?

When you attempt to run lsof on a Reliant UNIX multi-processor, it may complain that it can't find the kernel boot file, /stand/unix. That's because normally the /stand/unix file is only located on one node's root file system.

Lsof needs the file to obtain kernel data addresses. The work-around is to copy /stand/unix to each node.

14.2.2 Why does lsof complain about bad kernel addresses?

Lsof may complain that some Reliant UNIX kernel addresses aren't usable -- e.g., it may issue a warning like this:

    lsof: WARNING: can't read kernel's name cache: 0x00000000

This is usually the result of having a /stand/unix file on a Reliant multi-processor that isn't the booted kernel file. Because it doesn't have symbol addresses that match those of the running kernel, lsof has problems reading kernel values.

One work-around is to copy the correct boot file to /stand/unix. If the booted kernel file is available under another name -- e.g., /stand/unix.myboot -- another work-around is to use lsof's -k flag to specify the alternate name as the source of kernel name list values:

    $ lsof -k /stand/unix.myboot ...

14.2.3 Why does the Reliant C compiler give so many warning messages when compiling lsof?

The Reliant Unix Pyramid C compiler issues warning messages that I haven't found a convenient way to suppress.
You can ignore warning messages about casts and conversions that lose bits.

The message "warning: undefining __STDC__" is intentionally caused by the lsof MkKernOpts configuration script to suppress warning messages about cast and conversion problems in standard system header files, such as and .

14.2.4 Why does the lsof compilation require -Klp64 for Reliant UNIX 5.44 and why does my compiler reject it?

The -Klp64 flag enables the 64 bit data model lsof requires for handling Reliant Unix 5.44 and above 64 bit kernel pointers. Some compilers don't support -Klp64.

If lsof's Configure script detects that -Klp64 is required, it test-compiles a null program to see if the compiler supports -Klp64. If the compiler doesn't support -Klp64, Configure echoes this message and quits:

    /usr/ccs/bin/cc doesn't support -Klp64. Consult 00FAQ.

You can't proceed until you have a compiler that supports -Klp64.

Hint: if you have a compiler that does support it, but the compiler is located at a path other than /usr/ccs/bin/cc, supply its path to Configure via the LSOF_CC environment variable -- e.g., if the compiler that supports -Klp64 is in /opt/C/bin/cc, you might use this Configure command:

    $ LSOF_CC=/opt/C/bin/cc Configure pyramid

I have used both these compilers successfully on Reliant Unix 5.44 and above:

    Pyramid C Compiler 06.0A00
    CDS++ Version 02.A00

Compilers known to lack support for -Klp64 include C-DS-MI V1.2 and gcc.

15.0 SCO Problems

15.1 SCO OpenServer Problems

15.1.1 How can I avoid segmentation faults when compiling lsof?

If you have an older SCO OpenServer compiler, it may get a segmentation fault when compiling some lsof modules. That appears to happen because of the -Ox optimization action requested in the lsof Makefile.

Try changing -Ox to -O in the DEBUG string of the Makefile that Configure generates for you -- e.g., change

    DEBUG= -Ox

to

    DEBUG= -O

Bela Lubkin supplied this tip and Steve Williams verified it.

15.1.2 Where is libsocket.a?
If you compile lsof and the loader says it can't find the socket library, libsocket.a, called by the -lsocket option in the lsof compile flags, you probably are running an SCO OpenServer release earlier than 5.0 and don't have the TCP/IP Development System package installed.

You may have the necessary header files, because you have the TCP/IP run-time package installed, but if you don't have the TCP/IP Development System package installed, you won't have libsocket.a.

Your choices are to install the TCP/IP Development System package or upgrade to OpenServer Release 5.0. You will find libsocket.a in 5.0 -- you'll find all the libraries and header files there, in fact -- and you can use gcc to compile lsof if you don't want to install the 5.0 Development System package.

15.1.3 Why do I get "warning C4200" messages when I compile lsof?

When you compile lsof under OSR 3.2v4.2 (and perhaps under earlier versions as well), you may get many compiler warning messages of the form:

    node.c(183) : warning C4200: previous declarator is not compatible with default argument promotion

In my opinion this is a bug in the OSR compiler. Because the compiler cannot handle full ANSI-C prototypes, it assumes default types for function parameters as it encounters them untyped in a function prototype -- e.g., in this function declaration from node.c,

    readrnode(ra, r)
            KA_T ra;
            struct rnode *r;
    { ...

the compiler assigns default int types to the ra and r arguments. Then, when the compiler encounters the fully typed parameters after the function skeleton and sees parameters with types that don't match the assumptions it previously made, it whines about its own assumptions.

You can ignore these messages.

15.2 SCO UnixWare Problems

16.0 Sun Problems

16.1 My Sun gcc-compiled lsof doesn't work -- why?

Gcc can be used to build lsof successfully. However, an improperly installed Sun gcc compiler will usually not produce a working lsof.
Under SunOS 4.1.x this may happen when the gcc compiler is copied from one Sun architecture -- e.g., from a sun4m to a sun4. The problem comes from the copying of the special #include header files that gcc "fixes" during installation to circumvent ANSI-C conflicts, especially on #else and #endif pre-processor declarations. Some of the "fixed" header files declare kernel structures whose length varies with architecture type. In particular, the size of the user structure () changes with architecture type, and, since lsof gets command name and file pointers from that structure, can cause lsof to malfunction when its length is incorrect.

While the "fixing" of header files is eliminated at gcc 2.8 and above, architecture-specific header files may still remain in the SunOS gcc private include tree. In particular, gcc's private machine/ include subdirectory may still exist and may point to a specific architecture's private header files in the same tree -- e.g., machine -> sun4c/. If your machine/ points to an architecture type different from the output of `uname -m`, try the temporary work-around described in the "How can I make lsof compile with gcc under SunOS 4.1.x?" section.

These architecture-related structure differences generally do not occur under Solaris 2.x, 7, 8 BETA, and 8 BETA-Refresh. Instead, the more common reason a gcc-compiled lsof doesn't work there is that the special gcc header files were not updated during the change from one version of Solaris to the next -- e.g., from 2.4 to 2.5.

If your Sun gcc-compiled lsof doesn't report anything, or reports ``can't read proc table,'' check that the gcc fixincludes or fixinc.svr4 (Solaris 2.x, 7, 8 BETA, and 8 BETA-Refresh) step was run on the system where you're using gcc to compile lsof. As an alternative, if you have the SunPro C compiler available, use it to compile lsof -- e.g., use the solariscc or sunoscc Configure abbreviations.

16.2 How can I make lsof compile with gcc under Solaris 2.[456], 2.5.1, or 7?
Presuming your gcc-specific header files are wrong for Solaris, edit the lsof Configure-generated Makefile and lib/Makefile and make this change:

    CFGF= -Dsolaris=20400 ...

to

    CFGF= -Dsolaris=20400 -D__STDC__=0 -I/usr/include ...

or change:

    CFGF= -Dsolaris=20500 ...

to

    CFGF= -Dsolaris=20500 -D__STDC__=0 -I/usr/include ...

or change:

    CFGF= -Dsolaris=20501 ...

to

    CFGF= -Dsolaris=20501 -D__STDC__=0 -I/usr/include ...

This is only a temporary work-around. You really should rerun gcc's fixinc.svr4 script to update your gcc-specific header files or install gcc 2.8.0, which has no need for private copies of Solaris include files.

16.3 How can I make lsof compile with gcc under SunOS 4.1.x?

Presuming your gcc-specific header files are wrong for SunOS 4.1.x, edit the lsof Configure-generated Makefile and lib/Makefile and change:

    CFGF= -ansi -DSUNOSV=40103 ...

to

    CFGF= -DSUNOSV=40103 -I/usr/include ...

This is only a temporary work-around. You really should rerun gcc's fixincludes script to update your gcc-specific header files or install gcc 2.8, which uses fewer private include files. (But see the caution in the "My Sun gcc-compiled lsof doesn't work -- why?" section about remaining architecture specific SunOS header files that gcc 2.8 will still have in its private include directory.)

16.4 Why does Solaris Sun C complain about system header files?

You're probably trying to use /usr/ucb/cc if you get compiler complaints like:

    cc -O -Dsun -Dsolaris=20300 ...
    "/usr/include/sys/machsig.h", line 81: macro BUS_OBJERR redefines previous macro at "/usr/ucbinclude/sys/signal.h", line 444

Note the reference to "/usr/ucbinclude/sys/signal.h". It reveals that the BSD Compatibility Package C compiler is in use. Lsof requires the ANSI C version of the Solaris C compiler, usually found in /usr/opt/bin/cc or /opt/SUNWspro/bin/cc.

Try adding a CC string to the lsof Makefile that points to the Sun ANSI C version of the Sun C compiler -- e.g.,

    CC= /usr/opt/bin/cc

or

    CC= /opt/SUNWspro/bin/cc
16.5 Why doesn't lsof work under my Solaris 2.4 system?

If lsof doesn't work under your Solaris 2.4 system -- e.g., it produces no output, little output, or the output is missing command names or file descriptors -- you may have a pair of conflicting Sun patches installed.

Solaris patch 101945-32 installs a kernel that was built with a header file whose NUM_*_VECTORS definitions don't match the ones in the updated by Solaris patch 102303-02.

NUM_*_VECTORS in the kernel of patch 101945-32 are smaller than the ones in the of patch 102303-02. The consequence is that when lsof is compiled with the whose NUM_*_VECTORS definitions are larger than the ones used to compile the patched kernel, lsof's user structure does not align with the one that the kernel employs.

If you have these two patches installed, contact Sun and complain about the mis-match.

The lsof Configure script attempts to work around the mis-matched patches by including a modified header file from ./dialects/sun/include/sys. That auxv.h has these alternate definitions:

    #define NUM_GEN_VECTORS 4
    #define NUM_SUN_VECTORS 8

The Configure script issues a prominent WARNING that it is putting this work-around into effect. If it doesn't succeed for you, please contact me.

I thank Leif Hedstrom for identifying the offending patches.

16.6 Where are the Solaris header files?

If you try to compile lsof under Solaris and get a compiler complaint that it can't find system header files, perhaps you forgot to add the header file package, SUNWhea.

16.7 Where is the Solaris /usr/src/uts//sys/machparam.h?
When you try to Configure lsof for Solaris 2.[23456], 2.5.1, and 7 -- e.g., on a `uname -m` == sun4m system -- Configure complains:

    grep: /usr/src/uts/sun4m/sys/machparam.h: No such file or directory
    grep: /usr/src/uts/sun4m/sys/machparam.h: No such file or directory

And when you try to compile the configured lsof, cc or gcc complains:

    dproc.c:530: `KERNELBASE' undeclared (first use this function)

The explanation is that somehow your Solaris system doesn't have the header files in /usr/src/uts it should have. Perhaps someone removed the directory to save space. Perhaps you're using a gcc installation, copied from another system. In any event, you will have to load the header files from the SUNWhea package of your Solaris distribution.

KERNELBASE is an important symbol to lsof -- it keeps lsof from sending an illegal kernel value to kvm_read() where a segmentation violation might result (a bug in the kvm library). Lsof can get illegal kernel values because it reads, relatively slowly with kvm_read() calls, kernel values that the kernel is changing rapidly.

Lsof doesn't need KERNELBASE at Solaris 2.5 and above, because it has a Kernelbase value whose address lsof can find with /dev/ksyms and whose value it can read with kvm_read(). Under Solaris 2.5 /usr/src/uts has moved to /usr/platform.

16.8 Why does Solaris lsof say ``can't read proc table''?

When lsof collects data on processes, using the kvm_*() functions to scan the kernel's proc structure table, it checks to make sure it has identified a reasonable number of them -- a minimum of three. When lsof can't identify three processes during a scan, it repeats the scan. When five scans fail to yield three processes, lsof issues the fatal message:

    lsof: can't read proc table

and exits.

Usually lsof fails to identify three processes during a scan because its idea of the form of the proc structure differs from that being used by the kernel.
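The scan-and-retry rule can be sketched as follows. This Python fragment is an illustration under stated assumptions, not lsof's code: the function and constant names are hypothetical, and scan_once stands in for whatever routine walks the kernel's proc table and returns the processes it managed to identify.

```python
MIN_PROCESSES = 3  # minimum "reasonable" number of processes, per the text
MAX_SCANS = 5      # number of scans attempted before giving up, per the text


def gather_processes(scan_once):
    """Call scan_once() up to MAX_SCANS times and return the first result
    holding at least MIN_PROCESSES entries; otherwise raise RuntimeError."""
    for _ in range(MAX_SCANS):
        processes = scan_once()
        if len(processes) >= MIN_PROCESSES:
            return processes
    raise RuntimeError("can't read proc table")


# A stand-in scanner that fails twice before yielding three processes.
results = iter([[], [1], [1, 2, 3]])
print(gather_processes(lambda: next(results)))
```

When the proc structure layout is wrong, every scan fails in the same way, so repeating the scan cannot help and the fatal message is the result.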
Since the proc structure is defined in and other /usr/include header files, the root cause of a proc structure discrepancy usually can be found in the composition of /usr/include.

One common way that /usr/include header files can be incorrect is that gcc was used to compile lsof, gcc used its special (i.e., "fixed") header files instead of the ones in /usr/include, and the special gcc header files weren't updated when Solaris was.

Answers to these questions:

    My Sun gcc-compiled lsof doesn't work -- why?
    How can I make lsof compile with gcc under Solaris 2.[456], 2.5.1, or 7?
    How can I make lsof compile with gcc under SunOS 4.1.x?
    Why does Solaris Sun C complain about system header files?

discuss the gcc header file problem and offer suggestions on how to fix it or work around it.

It may also be that you are trying to run a version of lsof that was compiled on an older version of Solaris. For example, an lsof executable, compiled for Solaris 2.4, will produce the ``can't read proc table'' message if you try to run it under Solaris 2.5. If you have compiled lsof under Solaris 2.5 and it still won't work, see if the header files in /usr/include have been updated to 2.5, or still represent a previous version of Solaris.

Another source of header file discrepancies to consider is the Solaris patch level and whether a binary kernel patch was not matched with a corresponding header file update. See the "Why doesn't lsof work under my Solaris 2.4 system?" question for an example of one in Solaris 2.4 -- there may be other such patch conflicts I don't know about.

16.9 Why does Solaris lsof complain about a bad cached clone device?

When lsof revisions below 4.04 have been run on a Solaris system and have been allowed to create a device cache file, the running of revisions 4.04 and above on the same systems may produce this complaint:

    lsof: bad cached clone device: ...
    lsof: WARNING: created device cache file: ...
This is the result of a change in the device cache file that took place at lsof revision 4.04. The change introduced a node number into the clone device lines of the device cache file and was done in such a way that lsof could detect device cache files whose clone lines don't have node numbers (lines created by previous lsof revisions) and recognize the need to regenerate the device cache file.

16.10 Why doesn't Solaris make generate .o files?

Solaris /usr/ccs/bin/make won't generate .o files from .c files if /usr/share/lib/make/make.rules is missing. It may be found in and installed from the SUNWsprot package.

16.11 Why does lsof report some Solaris 2.3 and 2.4 lock types as `N'?

For Solaris 2.3 with patch P101318 installed at level 45 or above, and for all versions of Solaris 2.4, NFS locks are represented by a NFS-specific kernel lock structure that sometimes lacks a read or write lock type indicator. When lsof encounters such a lock structure, it reports the lock type as `N'.

16.12 Why does lsof Configure say "WARNING: no cc in ..."?

When lsof's Configure script is executed with the solariscc abbreviation it tries to make sure it's using the Sun C compiler and not the UCB substitute from /usr/ucb/cc. Thus, it looks for cc in the "standard" Sun compiler location, /opt/SUNWspro/bin.

If Configure can't find cc there, it issues the warning:

    lsof: WARNING: no cc in /opt/SUNWspro/bin; using cc without path.

and uses cc for the compiler name, letting the shell find cc with its PATH environment variable.

You can tell Configure where to find your cc with the SOLARIS_CCDIR cross-configuration environment variable. (See 00XCONFIG for more information on SOLARIS_CCDIR). For example, use this Configure shell command:

    SOLARIS_CCDIR=/usr/special/bin Configure -n solariscc

(SOLARIS_CCDIR should be the full path to the directory containing your cc.)

16.13 Solaris 7 and 8 Problems

16.13.1 Why does lsof say the compiler isn't adequate for Solaris 7 or 8?
Solaris 7 and 8 kernels come in two flavors, 32 and 64 bit. 64 bit kernels run on machines that support the SPARC v9 instruction set architecture. Separate executables for some programs -- e.g., ones using libkvm like lsof -- must be built for 32 and 64 bit kernels.

Previous Sun (e.g., SC4.0) and gcc (e.g., 2.8.0) compilers will build lsof for 32 bit kernels, but they won't build it for 64 bit kernels. The only compilers that will build lsof for 64 bit Solaris 7 and 8 kernels are the Sun WorkShop Compilers C 5.0 and an appropriately built gcc 2.95 or higher (gcc 2.96.2 is more easily built.) (See the "How do I build a gcc that will produce 64 bit Solaris 7 and 8 executables?" section for tips on building an appropriate gcc.)

When given the ``-xarch=v9'' flag, the C 5.0 compiler and associated loader and 64 bit libraries will build a 64 bit lsof executable; when given the "-m64" or "-mcpu=v9" (deprecated) flags, an appropriate gcc 2.95 or above compiler will build a 64 bit lsof executable.

When the lsof Configure script detects a 64 bit kernel is in use (e.g., by executing `/bin/isainfo -kv`), and when it finds that the specified compiler is inappropriate, it complains with these messages:

For gcc:

    "!!!WARNING!!!=========!!!WARNING!!!=========!!!WARNING!!!"
    "!                                                       !"
    "! LSOF NEEDS TO BE CONFIGURED FOR A 64 BIT KERNEL, BUT  !"
    "! THIS GCC DOESN'T SUPPORT THE BUILDING OF 64 BIT       !"
    "! SOLARIS EXECUTABLES. LSOF WILL BE CONFIGURED FOR A    !"
    "! 32 BIT echo KERNEL.                                   !"
    "!                                                       !"
    "!!!WARNING!!!=========!!!WARNING!!!=========!!!WARNING!!!"

For Sun C:

    !!!WARNING!!!==========!!!WARNING!!!==========!!!WARNING!!!
    !                                                         !
    ! LSOF NEEDS TO BE CONFIGURED FOR A 64 BIT KERNEL, BUT    |
    ! THE VERSION OF SUN C AVAILABLE DOESN'T SUPPORT THE      !
    ! -xarch=v9 FLAG. LSOF WILL BE CONFIGURED FOR A 32 BIT    !
    ! KERNEL.                                                 !
    !                                                         !
    !!!WARNING!!!==========!!!WARNING!!!==========!!!WARNING!!!

16.13.2 Why does Solaris 7 or 8 lsof say "FATAL: lsof was compiled for..."?

Solaris 7 or 8 lsof may say: lsof: FATAL: lsof was compiled for a xx bit kernel, but this machine has booted a yy bit kernel. Where: xx = 32 or 64 yy = 64 or 32 (xx and yy won't match.) This message indicates that lsof was compiled for one size kernel and is being asked to execute on a different size one. That's not possible for programs like lsof that use libkvm. Depending on the instruction sets for which you need Solaris 7 or 8 lsof, you may need two or more versions of lsof, compiled for each kernel size, installed for use with /usr/lib/isaexec. See the "How do I install lsof for Solaris 7 or 8?" section of this document for more information on that. 16.13.3 How do I build lsof for a 64 bit Solaris kernel under a 32 bit Solaris kernel? If your Solaris system has an appropriate compiler (WorkShop Compilers C 5.0) and the 64 bit libraries have been installed, you can force lsof's Configure script to build a 64 bit version of lsof with: $ SOLARIS_KERNBITS=64 Configure -n solariscc The SOLARIS_KERNBITS environment variable is part of the lsof cross-configuration support, described in the 00XCONFIG file of the lsof distribution. 16.13.4 How do I install lsof for Solaris 7 or 8? If you are installing lsof where it will be used only under the bit size kernel for which it was built, no special installation is required. If, however, you are installing different versions of lsof for different bit sizes -- e.g., for use on a 64 bit NFS server and from its 32 bit clients -- you should read the man page for isaexec(3C) and install lsof according to its instructions. The executable at the directory where lsof is to be found should be a hard link to /usr/lib/isaexec or a copy of it. In the directory there must be instruction architecture subdirectories -- e.g., .../sparc/ and .../sparcv9/. The lsof for 64 bit size kernels is installed in the .../sparcv9/ subdirectory; the one for 32 bit size kernels, in .../sparc/. 
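
To illustrate why that subdirectory layout works, here is a small Python sketch that imitates the kind of search isaexec(3C) is documented to perform: it walks an ordered list of instruction set architecture names and uses the first subdirectory that holds an executable of the requested name. It is only a model for reasoning about the layout. The function names, the temporary demo tree, and the ISA lists passed to it are assumptions made for the example; they are not taken from lsof or from the Solaris sources.

```python
import os
import tempfile


def pick_isa_binary(base_dir, name, isa_list):
    """Return the first base_dir/<isa>/<name> that is an executable file.

    isa_list is ordered from most to least preferred, in the way a
    64 bit kernel would list sparcv9 ahead of sparc.
    """
    for isa in isa_list:
        candidate = os.path.join(base_dir, isa, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def make_demo_tree():
    """Create a throwaway directory holding sparc/lsof and sparcv9/lsof."""
    base = tempfile.mkdtemp()
    for isa in ("sparc", "sparcv9"):
        os.mkdir(os.path.join(base, isa))
        path = os.path.join(base, isa, "lsof")
        with open(path, "w") as f:
            f.write("#!/bin/sh\n")  # placeholder, not a real lsof
        os.chmod(path, 0o755)
    return base


if __name__ == "__main__":
    base = make_demo_tree()
    # Hypothetical ISA list for a 64 bit kernel: the sparcv9 copy wins.
    print(pick_isa_binary(base, "lsof", ["sparcv9", "sparcv8plus", "sparc"]))
    # Hypothetical ISA list for a 32 bit kernel: falls through to sparc.
    print(pick_isa_binary(base, "lsof", ["sparcv8plus", "sparc"]))
```

Because the search stops at the first match, a 32 bit kernel whose list never names sparcv9 cannot pick up the 64 bit lsof by accident.
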
For example, if you're installing 32 and 64 bit lsof executables in /usr/local/etc, you would:

    # cd /usr/local/etc
    # ln /usr/lib/isaexec lsof
    # mkdir sparc sparcv9
    # install the 32 bit lsof as sparc/lsof
    # install the 64 bit lsof as sparcv9/lsof
    # chmod, chown, and chgrp sparc/lsof and sparcv9/lsof appropriately

Lsof permissions and ownerships are the same whether one or more lsof executables are being installed, with or without the /usr/lib/isaexec hard link.

16.13.5 Why does my Solaris 7 or 8 system say it cannot execute lsof?

When you attempt to execute lsof, your Solaris 7 or 8 shell may complain:

    ksh: ./lsof: cannot execute

If the lsof executable exists and has the proper execution permissions, this error may be the result of trying to execute an lsof, built for a 64 bit kernel, on a 32 bit kernel.

This will tell you about the lsof executable:

    $ file lsof
    lsof: ELF 64-bit MSB executable SPARCV9 Version 1, dynamically linked, not stripped

The "64-bit" notation indicates the binary was built for a 64 bit kernel. To see the running kernel bit size, use this command:

    $ isainfo -kv
    32-bit sparc kernel modules

The "32-bit" notation indicates a 32 bit kernel has been booted.

The only work-around is to obtain, or Configure and make, an lsof for the appropriate kernel bit size. If you Configure and make lsof on the kernel where you wish to run it with the proper compiler, the lsof Configure step will generate Makefiles that can be used with make to build an appropriate lsof executable. To compile a 64 bit lsof, you must have a Sun compiler that supports the -xarch=v9 option -- i.e., WorkShop Compilers C 5.0 or higher.

16.13.6 How do I build a gcc that will produce 64 bit Solaris 7 and 8 executables?

Gcc 2.96 snapshots, dated August 14, 2000 or later, can be used to build 64 bit Solaris 7 and 8 executables. Get the latest gcc snapshot from:

    ftp://sourceware.cygnus.com/pub/gcc/snapshots

(The "core" distribution is limited to gcc.)

To build this with a Sun Workshop C compiler (version 5 or above), put "CC=cc" in your environment before executing the gcc configure script.

16.13.7 Why does lsof on my Solaris 7 or 8 system say, "can't read namelist from /dev/ksyms?"

You're probably trying to use an lsof executable built for an earlier Solaris release on a 64 bit Solaris 7 or 8 kernel. The output from `lsof -v` will tell you the build environment of your lsof executable.

You should also have gotten a warning message that lsof is compiled for a different Solaris version than the one under which it is running -- something like this:

    lsof: WARNING: compiled for Solaris release X; this is Y

You need to build lsof on the system where you want to use it. For 64 bit Solaris 7 and 8 you need a compiler that can generate 64 bit Solaris executables -- e.g., the Sun Workshop 5 C compiler, or gcc 2.95 and above. See the "Why does lsof say the compiler isn't adequate for Solaris 7 or 8?" section and the ones following it for a discussion of building lsof for 64 bit Solaris 7 or 8.

16.14 Solaris and COMMON

16.14.1 What does COMMON mean in the NAME column for a Solaris VCHR file?

When lsof puts COMMON or (COMMON) in the NAME column of a Solaris VCHR file, it means that the file is handled by the special file system functions of the kernel through a common vnode.

16.14.2 Why does a COMMON Solaris VCHR file sometimes seem to have an incorrect minor device number?

When lsof reports on an open file in a Solaris special file system that uses a COMMON vnode, and the file is a VCHR file, lsof tries to locate the associated device node by looking for matches on the major and minor device numbers first. If no major and minor match results, lsof then looks for a match on pseudo and clone device files. (See /devices/pseudo.) Those device nodes are matched specially by either their major or minor device numbers, but not both.

Hence, when lsof finds a match under those special conditions, it may report a value in its output DEVICE column that differs from one of the major and minor numbers of the device node. Here's an example from a sun4m Solaris 7 system: $ ls -li /devices/pseudo/pm@0:pm 151261 crw-rw-rw- 1 root sys 117, 0 ... $ lsof /devices/pseudo/pm@0:pm COMMAND ... DEVICE ... NODE NAME powerd 117,1 ... 151261 /devices/pseudo/pm@0:pm (COMMON) Xsun ... 117,0 ... 151261 /devices/pseudo/pm@0:pm Note that the DEVICE value for the file with (COMMON) in its name field has a different minor device number (1) from what ls reports (0), while the DEVICE value for the file without (COMMON) matches the ls output exactly. Both match on the major device number, 117. The minor device number mis-match is a result of the way the Solaris kernel handles special file system common vnodes, and it's the reason lsof puts (COMMON) after the name to signal that a mis-match is possible. 16.15 Why don't lsof and Solaris pfiles reports always match? /usr/proc/bin/pfiles for Solaris 2.6, 7, 8 BETA, and 8 BETA-Refresh also reports information on open files for processes. Sometimes the information it reports differs from what lsof reports. There are several reasons why this might be true. First, because pfiles is a Sun product, based on Sun kernel features, its developers have a better chance of knowing exactly how open file information is organized. I sometimes have to guess at how kernel file structure linkages are constructed by gleaning hints from header files. Second, lsof is aimed at providing information, specifically device and node numbers, that can be used to identify named file system objects -- i.e., path names. Thus, lsof tries to make sure its device and node numbers match those reported by stat(2). Pfiles doesn't always report numbers that match stat(2) -- e.g., for files using clone and pseudo devices via common vnodes like the nlist() /dev/ksyms usage. 
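
For readers who want to check by hand the numbers that stat(2) reports for a path, the following Python sketch prints them in major,minor form. It is offered only as a convenience for comparing against lsof's DEVICE and NODE columns; the function name is made up for this example, and the code says nothing about how lsof or pfiles obtain their values from the kernel.

```python
import os
import stat
import tempfile


def device_and_node(path):
    """Return the stat(2) device and inode numbers for path.

    "dev" is the device holding the file; "rdev" is present only for
    character and block special files and is the device the node names.
    """
    st = os.stat(path)
    info = {
        "dev": (os.major(st.st_dev), os.minor(st.st_dev)),
        "ino": st.st_ino,
    }
    if stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode):
        info["rdev"] = (os.major(st.st_rdev), os.minor(st.st_rdev))
    return info


if __name__ == "__main__":
    # A regular temporary file stands in for a real device node here.
    with tempfile.NamedTemporaryFile() as tmp:
        print(device_and_node(tmp.name))
```

Run against a character device node, the "rdev" pair is the value to compare with lsof's DEVICE column and "ino" with its NODE column.
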
Here's the Solaris 7 COMMON VCHR example again with additional pfiles output:

    $ ls -li /devices/pseudo/pm@0:pm
    151261 crw-rw-rw- 1 root sys 117, 0 ...
    $ lsof /devices/pseudo/pm@0:pm
    COMMAND ... DEVICE ... NODE   NAME
    powerd  ... 117,1  ... 151261 /devices/pseudo/pm@0:pm (COMMON)
    Xsun    ... 117,0  ... 151261 /devices/pseudo/pm@0:pm
    $ pfiles ...
       0: S_IFCHR ... dev:32,24 ino:61945 ... rdev:117,1
    ...
      14: S_IFCHR ... dev:32,24 ino:151261 ... rdev:117,0

Note that the NODE number, reported by lsof, matches what ls(1) and stat(2) report, while the ino value pfiles reports doesn't. Lsof also indicates with the (COMMON) notation that the DEVICE number is a pseudo one, derived from the character device's value. The lsof DEVICE value matches the pfiles rdev value, correct behavior for a character device, but pfiles gives no sign that it's not possible to find that character device number in /devices with ls(1) or stat(2).

16.16 Why doesn't lsof report node number and name for some SunOS 4.1.4 Auspex processes?

Some Auspex processes have files open for which lsof doesn't report a node number or a name. There is no more information in the kernel node structure chains for these files that would allow lsof to supply the missing items.

I suspect these files are using a clone of the master device /dev/ipc. Its device number is (37,80). The major of 37 indicates a clone master and the minor of 80 is reflected as the major number of the Auspex files lsof can't fully describe. However, the Auspex files don't have a stream initialized, so lsof can't connect them to the /dev/ipc clone master.

17.0 Lsof Features

17.1 Why doesn't lsof report on /proc entries on my system?

/proc file system support is generally available only for BSD, SYSV R4 dialects, and Tru64 UNIX (Digital UNIX, DEC OSF/1). It's also available for Linux, and Pyramid DC/OSx and Reliant UNIX.

Even on some SYSV R4 dialects I encountered many problems while trying to incorporate /proc file system support. The chief problem is that some vendors don't distribute the header file that describes the /proc file system node -- usually called prdata.h.

17.2 How do I disable the device cache file feature or alter its behavior?

To disable the device cache file feature for a dialect, remove the HASDCACHE definition from the dialect's machine.h header file. You can also use HASDCACHE to change the default prefix (``.lsof'') of the device cache file.

Be sure you consider disabling the device cache file feature carefully. Having a device cache file significantly reduces lsof startup overhead by eliminating a full scan of /dev (or /devices) once the device cache file has been created. That full scan also overloads the kernel's name cache with the names of the /dev (or /devices) nodes, reducing the opportunity for lsof to find path name components of open files.

If you're worried about the presence of mode 0600 device cache files in the home directories of the real user IDs that execute lsof, consider these checks that lsof makes on the file before using it:

    1. To read the device cache file, lsof must gain permission from access(2).

    2. The device cache file's modes must be 0600 (0644 if lsof is reading a system-wide device cache file) and its size non-zero.

    3. There must be a correctly formatted section count line at the beginning of the file.

    4. Each section must have a header line with a count that properly numbers the lines in the section. Legal sections are device, clone, pseudo-device, and CRC.

    5. The lines of a section must have the proper format.

    6. All lines are included in a 16 bit CRC, and it is recorded in a non-checksummed section line at the end of the file.

    7. The checksum computed when the file is read must match the checksum recorded when the file was written.

    8. The checksum section line must be followed by end-of-information.

    9. Lsof must be able to get matching results from stat(2) on a randomly chosen entry of the device section.

For more information on the device cache file, read the 00DCACHE file of the lsof distribution.

17.2.1 What's the risk with a perverted device cache file?

Even with the checks that lsof makes on the device cache file, it's conceivable that an intruder could modify it so it would pass lsof's tests.

The only serious consequence I know of from such a change is the removal of a device whose major device number identifies a socket from some user ID's device cache file. When such a device has been removed from the device cache file, and when lsof doesn't detect the removal, lsof may not be able to identify socket files when executed by the affected user ID. Only certain dialects are at risk from this attack -- e.g., SCO OpenServer and Solaris 2.x, 7, 8 BETA, and 8 BETA-Refresh (but not SunOS 4.1.x).

If you're tracking a network intruder with lsof, that could be important to you. If you suspect that someone has corrupted the device cache file you're using, I recommend you use lsof's -Di option to tell it to ignore it and use the contents of /dev (or /devices) instead; or remove the device cache file (usually .lsof_hostname, where hostname is the first component of the host's name returned by gethostname(2)) from the user ID's home directory and let lsof create a new one for you.

17.2.2 How do I put the full host name in a personal device cache file path?

Lsof constructs the personal device cache file path name from a format specified in the HASPERSDC #define in the dialect's machine.h header file. As distributed, HASPERSDC declares the path to be ``.lsof_'' plus the first component of the host name with the format ``.lsof_%L''.

If you want to change the way lsof constructs the personal device cache file path name, you can change the HASPERSDC #define and recompile lsof.

If, for example, you #define HASPERSDC to be ``.lsof_%l'' (note the lower case `l'), Configure and remake lsof, then the personal device cache file path will be ``.lsof_'' plus the host name returned by gethostname(2). See the 00DCACHE file of the lsof distribution for more information on the formation of the personal device cache file path and the use of the HASPERSDC #define. 17.2.3 How do I put the personal device cache file in /tmp? Change the HASPERSDC definition in your dialect's machine.h header file. When you redefine HASPERSDC, make sure you put at least one user identification conversion in it to keep separate the device cache files for each user of lsof. Also give some thought to including the ``%0'' conversion to define an alternate path for setuid-root and root processes. Here's a definition that puts a personal device cache file in /tmp with the name ``.lsof_UID''. #define HASPERSDC "/tmp/.lsof_%U" Thus the personal device cache file path for UID 548 would be: /tmp/.lsof_548 You can add the login name to the path with the ``%u'' conversion; the full host name with ``%l''; and the first host name component with ``%L''. CAUTION: be careful using absolute paths like /tmp lest lsof processes that are setuid-root or whose real UID is root be used to exploit some security weakness via /tmp. Elect instead to add an alternate path for those processes with the ``%0'' conversion. Here's an extension of the previous HASPERSDC format for /tmp that declares an alternate path: #define HASPERSDC "/tmp/.lsof_%U%0%h/.lsof_%l" When the lsof process is setuid-root or its real UID is root, presuming root's home directory is `/' and the host's name is ``vic.cc.purdue.edu'', the extended format yields: /.lsof_vic.cc.purdue.edu 17.3 Why doesn't lsof know about AFS files on my favorite dialect? 
Lsof currently supports AFS for these dialects:

    AIX 4.1.4 (AFS 3.4a)
    HP-UX 9.0.5 (AFS 3.4a)
    Linux 1.2.13 (AFS 3.3)
    NEXTSTEP 3.2 (AFS 3.3)
    Solaris 2.[56] (AFS 3.4a)
    SunOS 4.1.4 (AFS 3.3a)
    Ultrix 4.2 RISC (AFS 3.2b)

It may recognize AFS files on other versions of these dialects, but I have no way to test that. Lsof may report correct information for AFS files on other dialects, but I can't test that either.

AFS support must be custom crafted for each UNIX dialect and then tested. If lsof supports your favorite dialect, but doesn't recognize its AFS files, probably I don't have access to a test system. If you want AFS support badly for your dialect, consider helping me do the development and testing.

17.3.1 Why doesn't lsof report node numbers for all AFS volume files, or how do I reveal dynamic module addresses to lsof?

When AFS is implemented via dynamic kernel modules -- e.g., in NEXTSTEP or SunOS -- lsof can't obtain the addresses of AFS variables in the kernel that it uses to identify AFS vnodes. It can guess that a vnode is assigned to an AFS file and it can obtain other information about AFS files, but it has trouble computing AFS volume node numbers.

To determine node numbers for AFS volumes other than the root volume, /afs, lsof needs access to a hashed volume structure pointer table. When it can't find the address of that table, because AFS support is implemented via dynamic kernel modules, lsof will return blanks in the INODE column for AFS volume files. Lsof can identify the root volume's node number (0), and can compute the node numbers for all other AFS files.

If you have a name list file that contains the addresses of the AFS dynamic modules -- e.g., you saved SunOS module symbols when you created a loadable module kernel with modload(8) by specifying -sym -- lsof may be able to find the kernel addresses it needs in that file.

Lsof looks up AFS dynamic kernel addresses for two dialects at these default paths:

    NEXTSTEP 3.2    /usr/vice/etc/afs_loadable
    SunOS 4.1.4     /usr/vice/etc/modload/libafs

A different path to a name list file with AFS dynamic kernel addresses may be specified with the -A option, when the -A option description appears in lsof's -h or -? (help) output.

If any addresses appear in the -A name list file that also appear in the regular kernel name list file -- e.g., /vmunix -- they must match, or lsof will silently ignore the -A addresses on the presumption that they are out of date.
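
The matching rule in the last paragraph can be modelled in a few lines of Python. This is a sketch of the rule as stated above, not lsof's code: the function name is invented, the symbol names and addresses in the demonstration are made up, and the two name lists are represented simply as dictionaries mapping a symbol name to its address.

```python
def usable_extra_addresses(kernel_syms, extra_syms):
    """Model the -A consistency rule described above.

    kernel_syms: symbol -> address from the regular kernel name list.
    extra_syms:  symbol -> address from the -A name list file.

    If any symbol present in both lists has differing addresses, the
    whole -A list is treated as out of date and an empty dict results.
    Otherwise a copy of the -A addresses is returned.
    """
    for name, addr in extra_syms.items():
        if name in kernel_syms and kernel_syms[name] != addr:
            return {}
    return dict(extra_syms)


if __name__ == "__main__":
    # Hypothetical symbols and addresses, for illustration only.
    kernel = {"_shared_sym": 0xF0001000, "_kernel_only": 0xF0002000}
    fresh = {"_shared_sym": 0xF0001000, "_afs_only": 0xF0300000}
    stale = {"_shared_sym": 0xF0009999, "_afs_only": 0xF0300000}

    print(usable_extra_addresses(kernel, fresh))  # -A addresses kept
    print(usable_extra_addresses(kernel, stale))  # mismatch: all ignored
```

Note that a single mismatch discards every -A address, including ones that have no counterpart in the kernel name list, which mirrors the "out of date" presumption.
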