Patch Name: PHKL_16216

Patch Description: s700 10.10 Adv. VxFS, JFS cumulative patch

Creation Date: 98/11/11

Post Date: 98/11/23

Hardware Platforms - OS Releases:
    s700: 10.10

Products: N/A

Filesets:
    AdvJournalFS.VXFS-ADV-KRN
    JournalFS.VXFS-BASE-KRN
    OS-Core.CORE-KRN

Automatic Reboot?: Yes

Status: General Superseded

Critical:
    No (superseded patches were critical)
    PHKL_15079: PANIC
    PHKL_8733: CORRUPTION
    PHKL_8311: CORRUPTION
    PHKL_11209: PANIC
    PHKL_10170: HANG
    PHKL_10134: HANG
    PHKL_9836: HANG
    PHKL_9567: HANG
    PHKL_8823: HANG CORRUPTION
    PHKL_7935: PANIC
    PHKL_7580: CORRUPTION
    PHKL_7207: ABORT
    PHKL_7017: PANIC

Path Name: /hp-ux_patches/s700/10.X/PHKL_16216

Symptoms:
    PHKL_16216:
    On systems with PHKL_15079 installed, the panic is prevented, but the request does not go through the normal I/O path when the direct I/O code returns ERETRY.

    PHKL_15079:
    Systems with advanced JFS and low memory will panic with the message "Interrupt Type 17 (Non-accessed data TLB miss)" while trying to access a page with a space id of -1. This happens ONLY on systems with very low memory, while an application tries to read/write large amounts of data using direct I/O instead of regular I/O.

    PHKL_9413:
    Executables residing on a VxFS (JFS) file-system would no longer execute after being marked for VX_DIRECT operation. A typical scenario would be as follows:

        # cd /vxfs
        # cp /usr/bin/ksh .
        # ./ksh
        # cc vxdirect.c -o vxdirect
        # ./vxdirect ./ksh
        # ./ksh
        ./ksh: ./ksh: cannot execute

    See the vxfsio(7) man pages for details on VX_DIRECT.

    PHKL_8733:
    Under OnlineJFS, when very large writes (e.g. >phymem/16) are done to JFS file(s), old data can appear in the middle of the newly written file(s).

    PHKL_8311:
    Data corruption can occur if HP OnLineJFS is installed and a very large write (over a megabyte) is done to a JFS file. A sequence of 1 - 8191 zeros can appear in the middle of the data just written.

    PHKL_16508:
    sar -c reports incorrect values for rchar/s and wchar/s for VxFS.

    PHKL_12858:
    Performance for block devices is poor in comparison to the 9.xx release.
    PHKL_12563:
    On a VxFS filesystem with quota turned on, users who do not have a quota set should not have a quota limit. However, they receive the error message "write: Disk quota exceeded" when creating files on this file system. This message should not be displayed in this situation.

    PHKL_11504:
    Loading a program for a second and subsequent time takes much longer if the program is using shared libraries on JFS.

    PHKL_11469:
    The quota command shows poor performance on a busy system under JFS.

    PHKL_11209:
    With a system running a heavy load using JFS on LVM, the following panic may occur:

        "lv_syncx: returning error: 5"

    PHKL_10170:
    KI queuedone, enqueue and queuestart traces on JFS may contain NULL values in the b_apid and b_upid fields.

    Systems running JFS may hang due to a deadlock problem.

    The setuid bit will be removed on a JFS file when the file is edited or text has been appended to it when run as root.

    PHKL_10134:
    Customer is running processes which access memory mapped files. Once in a while, these processes deadlock. Any other processes attempting to access the memory mapped file hang as well.

    PHKL_9836:
    A deadlock can occur on a system running LVM, JFS and HFS. The hang was caused by one process running lvmerge on HFS logical volumes and another process running umount on a JFS logical volume. This deadlock can only occur with the following scenario:
    (1). Process A is running an lvmerge or an lvsplit on an HFS logical volume.
    (2). Process B is running a mount, umount or sync on a JFS logical volume.

    PHKL_9709:
    Each time edquota -t is invoked for a VxFS file system, it resets the previously defined file system time limit back to the default (7 days).
    PHKL_9567:
    This patch addresses 2 distinct VxFS (JFS) symptoms:
    - When trying to take a file-system snapshot, the mount command could fail with the following error message:

        # mount -F vxfs -o snapof=/dev/vg00/vxonline \
              /dev/vg00/vxbackup /vxbackup
        vxfs mount: /dev/vg00/vxbackup is already mounted, /vxbackup is busy, or allowable number of mount points exceeded

    - The system could hang when manipulating directories.

    PHKL_9265:
    When MMF activity on VxFS files is very high for a given process (for example, a process doing a lot of mmap access), the vhand process may want to page out some pages to the VxFS file. On very rare occasions, the pageout write could not be satisfied without waiting for another resource (such as memory). Since vhand cannot wait, the page was marked zomb, and a later fault on that page caused the faulting process to be killed by the OS.

    PHKL_8823:
    When using edquota, the effective user id in the credential structure would sometimes be corrupted. Also, when using chown for certain user IDs, the command would fail.

    PHKL_8349:
    "vxfs: mesg 008: vx_direrr - /xxx file system inode x block y error 22" followed by erroneous indications that the filesystem is corrupted.

    PHKL_7935:
    "panic: data page fault", when using fsadm to resize a mounted VxFS filesystem with disk quotas.

    PHKL_7580:
    (1) Applications using ftruncate(2) on VxFS files could possibly lose data. This problem was reported with the Empress database.
    (2) msync(2) with the MS_SYNC flag on VxFS memory mapped files did not work as documented. Stale data could be found in the buffer cache when resuming file system operations, possibly resulting in data corruption.
    (3) Poor system performance when directories containing shared libraries, for example /usr, reside on a VxFS file-system.

    PHKL_7207:
    Attempting to remove a linked text file when the original file is busy fails with ETXTBSY.

    PHKL_7017:
    This fixes two separate VxFS (JFS) problems.
    1) trap type 15 in vx_iget
    2) O_DSYNC is ignored for JFS filesystems

    PHKL_6991:
    Systems with /usr on a VxFS file-system were experiencing poor performance.

    PHKL_6953:
    VxFS reports "No space left on device" when reaching the quota limit rather than "Disc quota exceeded" over NFS.

Defect Description:
    PHKL_16216:
    The error code returned from vx_dio_read1 should be checked against VX_ERETRY so that the request falls through to the normal code path and does a normal I/O.

    PHKL_15079:
    Systems with low memory will panic when an application tries to read/write large chunks of data. This is because VxFS, for performance reasons, tries to perform a direct I/O. In order to achieve this, the user stack has to be grown to the amount of data requested for read/write. When vx_dio_iovec() is passed a negative space id (meaning there is no free page left), there is no check for a 0xffffffff (-1) space id. The fix was to perform a useracc() to find out whether the process has permission to write to the page. If it fails, the direct I/O is not performed and a regular I/O is performed instead.

    NOTE:
    ----
    After the fix, running the same application on systems with low memory and advanced JFS to perform huge reads/writes "should" fail with the following message:

        "Pid xx received a SIGSEGV for stack growth failure. Possible causes: insufficient memory or swap space, or stack size exceeded maxssiz."

    where xx is the process id of the application performing the read/write system call.

    PHKL_9413:
    The execve() kernel routine was asking for a KERNEL IO to read in the a.out header, but the VxFS code handling direct IOs (VX_DIRECT) was generating a USER IO.

    PHKL_8733:
    OnLineJFS breaks large write requests into multiple direct I/Os. Depending on the size of the data block, the beginning and end of a direct I/O request may not align on the block boundaries. In these cases, the data is handled through the buffer cache.
    After the first direct I/O, the subsequent iteration may begin with a write that has data starting in the middle of the buffer. If this write passes the current EOF, the buffer is simply allocated and filled with new data. If this buffer happens to be one that previously held old data, the old data remains in the portion that is not overwritten by the new data. Writing this buffer to disk corrupts the file. The fix is to check against the correct file size so that the first buffer of the subsequent iteration is read in from disk and contains the correct data written in the last iteration.

    PHKL_8311:
    The problem can be reproduced with this test program:

        #include <fcntl.h>
        #include <string.h>
        #include <unistd.h>

        char buf[2097152];      /* 2 MB of data to write */

        int main(void)
        {
            int fd;

            memset((void *)buf, 'A', sizeof(buf));
            fd = creat("A.dat", 0644);
            write(fd, buf, 512);            /* small write first */
            write(fd, buf, sizeof(buf));    /* then one very large write */
            close(fd);
            return 0;
        }

    Every byte of the file should contain the character 'A'. This works fine on UFS. It also works on VxFS as long as HP OnLineJFS is not installed. But with HP OnLineJFS, a sequence of null bytes appears in the middle of the file (not at the boundary between the writes, but in the middle of the second write).

    PHKL_16508:
    The kernel was not incrementing the read/write counters for vxfs in vx_bread(), vx_btranwrite(), vx_bcacheclear(), vx_blkinval(), vx_vnode_flush(), vx_reada_chain(), vx_flush_chain(), and vx_fast_read().

    The following script can be used to reproduce this defect:

        #!/bin/sh
        # The unqualified file names below are created in the current
        # directory, which the log messages imply is on a VxFS file system.
        sync;sync;sync
        sleep 10
        sar -o sar.$$ 1 60 &
        sleep 2
        date >cmdlog.$$
        echo "starting copy from HFS to HFS." >>cmdlog.$$
        cp /stand/vmunix /stand/vmunix.$$
        date >>cmdlog.$$
        sync;sync;sync
        sleep 10
        echo >>cmdlog.$$
        rm /stand/vmunix.$$
        date >>cmdlog.$$
        echo "starting copy from HFS to VxFS." >>cmdlog.$$
        cp /stand/vmunix vmunix.$$
        date >>cmdlog.$$
        sync;sync;sync
        sleep 10
        echo >>cmdlog.$$
        date >>cmdlog.$$
        echo "starting copy from VxFS to VxFS." >>cmdlog.$$
        cp vmunix.$$ vmunix2.$$
        date >>cmdlog.$$
        sync;sync;sync
        sleep 10
        rm vmunix.$$ vmunix2.$$

    PHKL_12858:
    In 10.xx buffer caching was disabled for block devices. This produced degraded performance in reads/writes to block devices.

    PHKL_12563:
    An uninitialized variable, depending on what value it picks up from the stack, causes the quota checking routine to return EDQUOT erroneously during extent allocation.

    PHKL_11504:
    Performance enhancement to JFS which only invalidates pages in the buffer cache if the corresponding pages in the MMF are dirty, rather than the entire file, as was done previously.

    PHKL_11469:
    The quota command uses quotactl(Q_SYNC, NULL, 0, NULL) to update the quota usage file on all quota active file systems. For each VxFS file system, the quota sync operation flushes all transactions and writes quota information to the disk file synchronously. On a system with heavy I/O, this results in a long delay. The fix is to flush the transaction log only and use asynchronous I/O for the disk file update.

    PHKL_11209:
    A panic may occur with JFS on LVM due to an inode being able to change identity before it and its dirty pages are flushed to disk in vx_freeze_iflush().

    PHKL_10170:
    KI problem: The JFS buffer allocation and IO paths were not fully instrumented, causing the buffer header b_apid and b_upid fields not to be updated consistently. The resulting KI queuedone, enqueue, and queuestart traces contain NULL values in these fields.

    The system can deadlock due to a locking order problem in JFS when vx_fast_read() is called from VOP_BREAD.

    When a JFS file is created with the SETUID flag, the setuid bit is removed when the file has been edited with vi or text has been appended to it; this should only be the case when the writer is not root.

    PHKL_10134:
    The problem corrected is a deadlock caused by procedure vm_wait_for_io being called with the iglock being held and releasing the region lock prior to sleeping.
    The deadlock is thus caused by another process being able to get the region lock and waiting for the iglock. The fix is to call vm_wait_for_io at the end of vx_pageout, after the iglock has been released.

    PHKL_9836:
    A deadlock resulted from a process running lvmerge on HFS logical volumes, and another process running umount on a JFS logical volume.

    The umount process grabs the JFS update sleep lock (used to serialize JFS syncs/mounts/umounts), calls spec_close to close the device being unmounted, and eventually gets to an LVM close routine which sleeps waiting to acquire the LVM volume-group lock.

    The lvmerge process is holding the LVM volume-group lock and proceeds to call freeze_and_sync_fs_dev() to freeze and sync the file system associated with the device. The routine ufs_freeze() is called first, which in turn calls walk_and_freeze_fs() without a pointer to a vfs structure. This proves faulty, since update is now called without a vfsp and will try to sync every mounted file system instead of just the file system being frozen. It therefore tries to sync a JFS file system, which first tries to grab the JFS update sleep lock, and a deadlock occurs.

    This problem can be reproduced by having one process running an lvsplit or lvmerge on an HFS logical volume, and another process running a mount, umount or sync on a JFS logical volume.

    The fix for this problem is to pass the vfsp to walk_and_freeze_fs() from ufs_freeze instead of the do_sync argument. The routine walk_and_freeze_fs() now uses vfsp when it calls update().

    PHKL_9709:
    The VxFS quota routine vx_getquota() resets the time limit for root because it assumes root should not have a quota limit. It ignores the fact that the timelimit fields in root's dqblk structure are used to store the file system time limit.

    PHKL_9567:
    This patch fixes two different VxFS (JFS) defects:
    - A snapshot could not be mounted if a process was waiting arbitrarily long for a file record lock.
      An application using lockf() or fcntl() to get file record locks, and holding the locks for a long period of time, could prevent a file-system snapshot from being mounted.
    - The VxFS rmdir(2) routine could run into a deadlock situation where the directory would be kept locked. Processes attempting to access the locked directory would then wait forever, and eventually this could cause the entire system to hang.

    PHKL_9265:
    Under high MMF pressure, vx_do_pageio called from vhand incorrectly marked a page as r_zomb when EAGAIN occurred on that page. This has the side effect of killing a process that later faults on that page.

    PHKL_8823:
    The "edquota" defect was due to an extra parameter being incorrectly passed when calling procedure vx_read1 from vx_dqextred. The "chown" defect was due to an uninitialized field (ex_elen) in the vx_extent structure when allocated by the vx_dqnewid procedure.

    PHKL_8349:
    This problem was mainly seen on striped logical volumes. If multiple processes were scanning VxFS directories via commands like ls, find, or cpio, they could cause VxFS to erroneously assume the filesystem is corrupt, making it impossible to remount it until fscked. There would also be errors in the syslog referring to vx_direrr.

    The defect was a lack of caching of offsets within the directory block; if the offset changed at an inopportune time, the directory read would fail and the filesystem would be marked corrupt.

    PHKL_7935:
    Resizing VxFS filesystems online effectively does quick unmounts and remounts of the filesystem, switching quickly between the two different data areas containing the filesystem structure information. The VxFS disk quota tracking structures were not updated during the switch, with the end result that the disk quota code was accessing invalid memory. The fix was to update the disk quota structures during the switch.
    PHKL_7580:
    (1) The VxFS file truncation code was breaking an assumption in brealloc(), causing delayed-write buffers to be discarded instead of being flushed to disk.
    (2) A "purge buffer cache" was not performed by the VxFS pageout code. Stale data could then be found in the buffer cache when resuming file-system operations after an msync(2).
    (3) VxFS used to purge the buffer cache at mmap(2) time, and the Dynamic Loader (dld.sl) suffered poor performance with shared libraries residing on a VxFS file-system. The fix was to purge the buffer cache at pageout time, and to flush it at pagein time. The previous fix (PHKL_6991) introduced the potential for data corruption, since not invalidating (i.e. not purging) meant possibly getting stale data from valid old buffers.

    Defects #2 and #3 are fixed in 10.20, but #1 is fixed in 10.30.

    PHKL_7207:
    VxFS did not check whether nlink is 1.

    PHKL_7017:
    JFS neglected to check for the O_DSYNC flag. It only checked for O_SYNC.
    In vx_iget, the code dereferenced a NULL pointer.

    PHKL_6991:
    When creating a memory mapped file, VxFS was flushing and invalidating the file-related buffers from the buffer cache. This behavior caused the dynamic loader (dld.sl) to generate a physical I/O each time it was reading a shared library header before calling mmap(), and shared library headers were never found in the buffer cache. The fix was to only flush (writing dirty buffers) and not do the invalidation.

    PHKL_6953:
    Incorrect "No space left on device" errors are generated when the filesystem is not actually full. The filesystem in question is a VxFS filesystem mounted over NFS from another system with quotas enabled on the server. The message occurs when a user reaches the hard limit on the mounted directory. This is caused by the VxFS code in HP-UX interpreting a class of filesystem space allocation failures all as ENOSPC. The fix was to correct this misinterpretation.
    With this patch installed, when a user exceeds his quota, the error on his terminal will be "Disk quota exceeded".

SR:
    1653150698 1653161471 1653162297 1653166066
    1653166983 1653170464 1653177089 1653180810
    1653182857 1653183699 1653186502 1653194555
    1653216077 1653250423 4701309070 4701329292
    4701329300 4701329441 4701346650 4701357673
    5003311837 5003317487 5003328237 5003336933
    5003344184 5003348425 5003363523 5003410423

Patch Files:
    /usr/conf/lib/libhp-ux.a(spec_vnops.o)
    /usr/conf/lib/libufs.a(ufs_vfsops.o)
    /usr/conf/lib/libvxfs_adv.a(vx_dio.o)
    /usr/conf/lib/libvxfs_base.a(vx_bio.o)
    /usr/conf/lib/libvxfs_base.a(vx_bio1.o)
    /usr/conf/lib/libvxfs_base.a(vx_bsdquota.o)
    /usr/conf/lib/libvxfs_base.a(vx_chain.o)
    /usr/conf/lib/libvxfs_base.a(vx_dirl.o)
    /usr/conf/lib/libvxfs_base.a(vx_iflush.o)
    /usr/conf/lib/libvxfs_base.a(vx_inode.o)
    /usr/conf/lib/libvxfs_base.a(vx_mount.o)
    /usr/conf/lib/libvxfs_base.a(vx_rdwri.o)
    /usr/conf/lib/libvxfs_base.a(vx_vfsops.o)
    /usr/conf/lib/libvxfs_base.a(vx_vm.o)
    /usr/conf/lib/libvxfs_base.a(vx_vnops.o)

what(1) Output:
    /usr/conf/lib/libhp-ux.a(spec_vnops.o):
        spec_vnops.c $Date: 97/10/13 15:16:58 $ $Revision: 1.9.89.8 $ PATCH_10.10 (PHKL_12858)
    /usr/conf/lib/libufs.a(ufs_vfsops.o):
        ufs_vfsops.c $Date: 97/10/13 15:24:21 $ $Revision: 1.16.89.18 $ PATCH_10.10 (PHKL_12858)
    /usr/conf/lib/libvxfs_adv.a(vx_dio.o):
        vx_dio.c $Date: 98/11/10 17:46:38 $ $Revision: 1.3.89.13 $ PATCH_10.10 (PHKL_16216)
    /usr/conf/lib/libvxfs_base.a(vx_bio.o):
        vx_bio.c $Date: 98/09/22 06:57:25 $ $Revision: 1.3.89.14 $ PATCH_10.10 (PHKL_16508)
    /usr/conf/lib/libvxfs_base.a(vx_bio1.o):
        vx_bio1.c $Date: 98/09/22 07:03:11 $ $Revision: 1.3.89.12 $ PATCH_10.10 (PHKL_16508)
    /usr/conf/lib/libvxfs_base.a(vx_bsdquota.o):
        vx_bsdquota.c $Date: 97/09/15 11:17:26 $ $Revision: 1.3.89.15 $ PATCH_10.10 (PHKL_12563)
    /usr/conf/lib/libvxfs_base.a(vx_chain.o):
        vx_chain.c $Date: 98/09/22 07:07:57 $ $Revision: 1.3.89.12 $ PATCH_10.10 (PHKL_16508)
    /usr/conf/lib/libvxfs_base.a(vx_dirl.o):
        vx_dirl.c $Date: 96/08/20 17:44:39 $ $Revision: 1.3.89.5 $ PATCH_10.10 (PHKL_8349)
    /usr/conf/lib/libvxfs_base.a(vx_iflush.o):
        vx_iflush.c $Date: 97/05/28 13:06:32 $ $Revision: 1.3.89.9 $ PATCH_10.10 (PHKL_11209)
    /usr/conf/lib/libvxfs_base.a(vx_inode.o):
        vx_inode.c $Date: 97/05/28 12:59:38 $ $Revision: 1.3.89.12 $ PATCH_10.10 (PHKL_11209)
    /usr/conf/lib/libvxfs_base.a(vx_mount.o):
        vx_mount.c $Date: 97/10/13 15:23:10 $ $Revision: 1.3.89.11 $ PATCH_10.10 (PHKL_12858)
    /usr/conf/lib/libvxfs_base.a(vx_rdwri.o):
        vx_rdwri.c $Date: 98/11/10 15:31:00 $ $Revision: 1.3.89.20 $ PATCH_10.10 (PHKL_16216)
    /usr/conf/lib/libvxfs_base.a(vx_vfsops.o):
        vx_vfsops.c $Date: 97/06/30 13:05:38 $ $Revision: 1.3.89.10 $ PATCH_10.10 (PHKL_11469)
    /usr/conf/lib/libvxfs_base.a(vx_vm.o):
        vx_vm.c $Date: 97/06/24 12:25:15 $ $Revision: 1.3.89.22 $ PATCH_10.10 (PHKL_11504)
    /usr/conf/lib/libvxfs_base.a(vx_vnops.o):
        vx_vnops.c $Date: 96/12/17 18:12:40 $ $Revision: 1.3.89.17 $ PATCH_10.10 (PHKL_9567)

cksum(1) Output:
    2107010728 17104 /usr/conf/lib/libhp-ux.a(spec_vnops.o)
    1492688428 20656 /usr/conf/lib/libufs.a(ufs_vfsops.o)
    33358044 9876 /usr/conf/lib/libvxfs_adv.a(vx_dio.o)
    4215924464 10412 /usr/conf/lib/libvxfs_base.a(vx_bio.o)
    1662106929 4784 /usr/conf/lib/libvxfs_base.a(vx_bio1.o)
    1322127986 27196 /usr/conf/lib/libvxfs_base.a(vx_bsdquota.o)
    1591367483 5084 /usr/conf/lib/libvxfs_base.a(vx_chain.o)
    1838912048 9152 /usr/conf/lib/libvxfs_base.a(vx_dirl.o)
    345917004 26660 /usr/conf/lib/libvxfs_base.a(vx_iflush.o)
    214054025 38468 /usr/conf/lib/libvxfs_base.a(vx_inode.o)
    467511319 19480 /usr/conf/lib/libvxfs_base.a(vx_mount.o)
    4040440703 27024 /usr/conf/lib/libvxfs_base.a(vx_rdwri.o)
    3602244426 13824 /usr/conf/lib/libvxfs_base.a(vx_vfsops.o)
    2916949551 10840 /usr/conf/lib/libvxfs_base.a(vx_vm.o)
    1606142415 24720 /usr/conf/lib/libvxfs_base.a(vx_vnops.o)

Patch Conflicts: None

Patch Dependencies: None

Hardware Dependencies: None

Other Dependencies: None

Supersedes:
    PHKL_6953 PHKL_6991 PHKL_7017 PHKL_7207
    PHKL_7580 PHKL_7935 PHKL_8311 PHKL_8349
    PHKL_8733 PHKL_8823 PHKL_9265 PHKL_9413
    PHKL_9567 PHKL_9709 PHKL_9836 PHKL_10134
    PHKL_10170 PHKL_11209 PHKL_11469 PHKL_11504
    PHKL_12563 PHKL_12858 PHKL_15079 PHKL_16508

Equivalent Patches:
    PHKL_16214: s700: 10.01
    PHKL_16215: s800: 10.01

Patch Package Size: 340 KBytes

Installation Instructions:
    Please review all instructions and the Hewlett-Packard SupportLine User Guide or your Hewlett-Packard support terms and conditions for precautions, scope of license, restrictions, and limitation of liability and warranties, before installing this patch.
    ------------------------------------------------------------
    1. Back up your system before installing a patch.

    2. Login as root.

    3. Copy the patch to the /tmp directory.

    4. Move to the /tmp directory and unshar the patch:

        cd /tmp
        sh PHKL_16216

    5a. For a standalone system, run swinstall to install the patch:

        swinstall -x autoreboot=true -x match_target=true \
            -s /tmp/PHKL_16216.depot

    5b. For a homogeneous NFS Diskless cluster run swcluster on the server to install the patch on the server and the clients:

        swcluster -i -b

        This will invoke swcluster in the interactive mode and force all clients to be shut down.

        WARNING: All cluster clients must be shut down prior to the patch installation. Installing the patch while the clients are booted is unsupported and can lead to serious problems.

        The swcluster command will invoke an swinstall session in which you must specify:

            alternate root path - default is /export/shared_root/OS_700
            source depot path - /tmp/PHKL_16216.depot

        To complete the installation, select the patch by choosing "Actions -> Match What Target Has" and then "Actions -> Install" from the Menubar.

    5c. For a heterogeneous NFS Diskless cluster:
        - run swinstall on the server as in step 5a to install the patch on the cluster server.
        - run swcluster on the server as in step 5b to install the patch on the cluster clients.
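    Once the patch is installed, the values in the cksum(1) Output section can be used to spot-check a patched object. The sketch below is one possible way to do that, not an HP-supplied procedure. It assumes that each listed value is the cksum of the archive member's contents (the patch text does not say how the values were produced), and check_cksum is a hypothetical helper name.

```sh
#!/bin/sh
# check_cksum EXPECTED_CRC EXPECTED_SIZE
# Hypothetical helper (not part of the patch): reads data on standard
# input, runs cksum(1) on it, and compares the result with the expected
# CRC and byte count taken from the patch text.
check_cksum() {
    want_crc=$1
    want_size=$2
    # cksum reading stdin prints "<crc> <size>"; split it into $1 and $2.
    set -- $(cksum)
    if [ "$1" = "$want_crc" ] && [ "$2" = "$want_size" ]; then
        echo "OK"
    else
        echo "MISMATCH: got $1 $2, expected $want_crc $want_size"
        return 1
    fi
}

# Example use (assumption: "ar p" writes the member's contents to stdout):
#   ar p /usr/conf/lib/libvxfs_adv.a vx_dio.o | check_cksum 33358044 9876
```

    Reading from standard input keeps the helper independent of how the object is extracted from the library archive.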
    By default swinstall will archive the original software in /var/adm/sw/patch/PHKL_16216. If you do not wish to retain a copy of the original software, you can create an empty file named /var/adm/sw/patch/PATCH_NOSAVE.

    Warning: If this file exists when a patch is installed, the patch cannot be deinstalled. Please be careful when using this feature.

    It is recommended that you move the PHKL_16216.text file to /var/adm/sw/patch for future reference.

    To put this patch on a magnetic tape and install from the tape drive, use the command:

        dd if=/tmp/PHKL_16216.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions:
    This is a JFS patch that should not be installed on systems using the OmniStorage product. If you are using the OmniStorage product, please obtain the equivalent OmniStorage/JFS patch using the same method by which you received this patch. If you cannot locate the patch, please contact your local HP support entity.

    Due to the number of objects in this patch, the customization phase of the update may take more than 10 minutes. During that time the system will not appear to make forward progress, but it will actually be installing the objects.
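
    Because a patch installed while /var/adm/sw/patch/PATCH_NOSAVE exists cannot be deinstalled, it can be worth checking for that file before running swinstall. The following is a minimal sketch of such a check; patch_save_mode is a hypothetical helper name, and the optional directory argument is an assumption added only so the check can be tried against a scratch directory.

```sh
#!/bin/sh
# patch_save_mode [DIR]
# Hypothetical helper (not an HP-UX command): prints "nosave" if
# DIR/PATCH_NOSAVE exists, otherwise "save".
# DIR defaults to /var/adm/sw/patch, the location named in the
# installation instructions.
patch_save_mode() {
    dir=${1:-/var/adm/sw/patch}
    if [ -e "$dir/PATCH_NOSAVE" ]; then
        echo "nosave"    # originals will not be archived; no deinstall
    else
        echo "save"      # originals will be archived by swinstall
    fi
}
```

    Calling patch_save_mode with no argument before step 5a shows whether the original software will be archived for this installation.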