Wednesday, January 22, 2025

Conflict in Kernel Path Assignment and Device Mapping for Multipath LUN /dev/mapper/mpXXX

 

Problem Overview

The issue stemmed from a conflict involving the multipath device /dev/mapper/mpathch. The kernel reused device paths (e.g., sdmg, sdfl, sdmd, sdft) that were still mapped to mpathch for a new LUN, resulting in duplicate device mappings and access conflicts. This caused errors during device scans and prevented the cleanup of the mpathch device.
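
A quick way to see the duplication (a sketch; scsi_id may live under /usr/lib/udev/ or /lib/udev/ depending on the release):

    # Ask each suspect path which WWID it reports; the same WWID showing up
    # under two maps indicates the conflict
    for d in sdmg sdfl sdmd sdft; do
        printf '%s: ' "$d"
        /usr/lib/udev/scsi_id -g -u "/dev/$d"
    done

    # Compare against the maps device-mapper currently holds
    multipathd show maps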

Errors Observed

  1. During pvscan, errors were encountered when reading the device:


    Error reading device /dev/mapper/mpathch at 0 length 512.
    Error reading device /dev/mapper/mpathch at 0 length 4.
    Error reading device /dev/mapper/mpathch at 4096 length 4.

    These errors indicated that the device could not be properly accessed or read by the system, likely due to the path conflicts.

  2. dmsetup info -c revealed that:

    • The device /dev/mapper/mpathch was still active with paths assigned.
    • The logical volume vgexport-lvexport was in use, blocking further actions on mpathch.

Resolution Steps

  1. Checked Active Devices:

    • Used dmsetup info -c to identify active devices and locate mpathch and associated logical volumes.

    dmsetup info -c | grep mpathch
    dmsetup info -c | grep lv
  2. Removed the Blocking Logical Volume:

    • Identified and removed the logical volume vgexport-lvexport, which was preventing the unmapping of mpathch.
    dmsetup remove vgexport-lvexport
  3. Forcefully Removed the Multipath Device:

    • Used dmsetup remove -f to forcibly delete the mpathch device from the device-mapper layer.

    dmsetup remove -f mpathch

Validation

  • Verified that mpathch and its associated paths were no longer present using:
    dmsetup info -c
    multipath -ll
  • Confirmed the system was no longer referencing the conflicting LUN and informed the storage team for reassignment or cleanup.
  • After rescanning and reloading the multipath maps, the new LUN became visible in the system.
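
For reference, a rescan/reload sequence at this point typically looks like the following (a sketch; rescan-scsi-bus.sh ships with the sg3_utils package, and multipath -F skips maps that are still in use):

    multipath -F            # flush unused multipath maps
    rescan-scsi-bus.sh      # re-scan the SCSI bus for the re-presented LUN
    multipath -r            # rebuild/reload the multipath maps
    multipath -ll           # the new LUN should now appear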

Conclusion

The problem was caused by duplicate kernel path assignments for a new LUN, which conflicted with existing device mappings and caused read errors during device scans. By removing the blocking logical volume and forcefully unmapping the multipath device, the issue was resolved, and the system was returned to a clean state.

Tuesday, October 8, 2024

Kernel Panic During Oracle Cluster Testing on RHEL 8

 

Hey everyone,

I wanted to share some "exciting" issues I ran into while testing an Oracle cluster on RHEL 8—you know, just your everyday kernel panic to spice things up!

The Issue

While conducting tests, I decided to bring down a couple of network interfaces (enp43s1f8 and enp43s1f9) using nmcli. Little did I know that right after deactivating them, I would be treated to a lovely kernel panic, logged as follows:


Oct 8 16:52:24 serverA kernel: sysrq: SysRq : Trigger a crash
Oct 8 16:52:24 serverA kernel: Kernel panic - not syncing: sysrq triggered crash

This surprise happened even with the iSCSI service (iscsi.service) disabled. Apparently, the system thought it was a great time to throw a party!

What I Found

  1. Dispatcher Scripts:

    • I found out that a script (04-iscsi) in /usr/lib/NetworkManager/dispatcher.d/ was doing its own thing and triggering actions whenever the network state changed. It was like that overly enthusiastic colleague who jumps in during a meeting and derails the conversation!
  2. Fixing the Issue:

    • To bring back some sanity, I temporarily removed or renamed the 04-iscsi script. After that, I was able to bring down the interfaces without causing the system to have a meltdown. Who knew a little housekeeping could go such a long way? (A sketch of that housekeeping follows after this list.)
  3. For Future Tests:

    • Always use nmcli to gracefully deactivate connections; it’s less dramatic than a kernel panic!
    • Review any service dependencies to make sure nothing throws a tantrum when you change network states.
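
For the record, here is roughly what that temporary housekeeping looked like (a sketch; the /root backup location is an arbitrary choice):

    # Move the dispatcher hook out of the way before testing
    mv /usr/lib/NetworkManager/dispatcher.d/04-iscsi /root/04-iscsi.bak

    # Deactivate the interfaces gracefully
    nmcli device disconnect enp43s1f8
    nmcli device disconnect enp43s1f9

    # ... run the cluster tests ...

    # Restore the hook afterwards
    mv /root/04-iscsi.bak /usr/lib/NetworkManager/dispatcher.d/04-iscsi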

This little adventure reminded me of the importance of understanding how network management scripts and services like iSCSI interact—especially when they seem to have a mind of their own. By being proactive and keeping a sense of humor, we can avoid these surprises in the future.

Thursday, October 3, 2024

Resolving Oracle ASM Disk Configuration Issues After Node Reboots

Problem Summary:

Each time Node B is rebooted, two disks fail to automatically configure in Oracle ASM (Automatic Storage Management). These disks—ORA01_DSK2 and ORA01_DSK3—require manual intervention to be configured correctly after the system starts. This issue is evident from the output of the oracleasm scandisks command, where these disks are listed as valid but need to be instantiated manually.

Even though the disks are recognized as valid ASM disks, they are not automatically configured during the boot process, requiring the execution of oracleasm scandisks manually to bring them online.
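
Until the fix described below was in place, the manual workaround after each reboot was:

    # Re-scan for ASM disks, then confirm the two stragglers are visible
    oracleasm scandisks
    oracleasm listdisks    # ORA01_DSK2 and ORA01_DSK3 should now be listed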

Diagnosis:

Several potential causes could explain this issue:

  1. Service Dependency Misconfiguration: The Oracle ASM service (oracleasm.service) might be starting before the udev service, which manages device initialization, completes its work. This would result in some disks not being available when ASM performs its scan.
  2. Disk Initialization Timing: Some disks may take longer to be detected by the system, causing them to be unavailable when Oracle ASM performs its automatic scan.

Solution Applied:

To address the problem, the following changes were made to the oracleasm service configuration:

  1. Modifying the oracleasm Service: The configuration file for the Oracle ASM service (/etc/systemd/system/multi-user.target.wants/oracleasm.service) was updated to ensure that Oracle ASM starts after the udev service has fully initialized all devices. The following line was added to the configuration:

    After=systemd-udevd.service

    This ensures that the system’s disks are available by the time Oracle ASM performs its scan.

  2. Adding a Startup Timeout: A 120-second startup timeout was added using the TimeoutStartSec=120s directive. This allows Oracle ASM enough time to detect and configure all disks properly, even if disk initialization takes longer than usual. This prevents ASM from failing due to delayed disk recognition.
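
Put together, the relevant unit entries look like this (a sketch; a drop-in created with systemctl edit oracleasm.service achieves the same effect):

    [Unit]
    After=systemd-udevd.service

    [Service]
    TimeoutStartSec=120s

After editing, run systemctl daemon-reload so systemd picks up the new ordering and timeout.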




Conclusion:

By modifying the Oracle ASM service to depend on udev and increasing the timeout for service startup, the issue of disks not being automatically configured after a reboot was resolved. This solution ensures that all ASM disks are properly recognized and configured during the boot process, eliminating the need for manual configuration.

Monday, February 27, 2023

Typosquatting: New Malicious Python Packages in PyPI

PyPI (Python Package Index) is the official repository for Python packages. It is used by developers and users worldwide to find and install Python packages. However, PyPI has been targeted by attackers who uploaded malicious packages to the repository.

Trojanized PyPI packages are Python packages that have been modified by attackers to include malicious code. These packages are usually uploaded with names similar to popular packages, so users might not notice the difference. When users download and install these packages, the malicious code gets executed on their systems, and attackers can use it to steal data or take control of the affected systems.

Cybersecurity researchers are warning of "imposter packages" mimicking popular libraries available on the Python Package Index (PyPI) repository.

The 41 malicious PyPI packages have been found to pose as typosquatted variants of legitimate modules such as http, aiohttp, requests, urllib, and urllib3. The reported names include:

aio5, aio6, htps1, httiop, httops, httplat, httpscolor, httpsing, httpslib, httpsos, httpsp, httpssp, httpssus, httpsus, httpxgetter, httpxmodifier, httpxrequester, httpxrequesterv2, httpxv2, httpxv3, libhttps, piphttps, pohttp, requestsd, requestse, requestst, ulrlib3, urelib3, urklib3, urlkib3, urllb, urllib33, urolib3, xhttpsp

Finally, as Valentić from ReversingLabs says, developers should frequently conduct security assessments of the third-party libraries and other dependencies in their code.

PyPI advised any users who think they've been compromised to contact security@pypi.org with details about the sender email address and the URL of the malicious site, to help administrators respond to the issue.

Here is a simple Python script I deployed via Ansible; it uses pkg_resources.get_distribution() to check whether any of those 41 packages are installed.
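
A minimal sketch of that script, reconstructed from the description above (the package list is the one quoted earlier, trimmed here for brevity):

    import sys
    import pkg_resources

    # Suspect names from the list above (extend with the full set)
    SUSPECT_PACKAGES = [
        "aio5", "aio6", "htps1", "httiop", "httops", "httplat",
        "httpscolor", "httpsing", "httpslib", "httpsos", "httpsp",
        "piphttps", "requestsd", "urllib33", "urelib3", "urklib3",
    ]

    found = []
    for name in SUSPECT_PACKAGES:
        try:
            # Raises DistributionNotFound if the package is absent
            dist = pkg_resources.get_distribution(name)
            found.append("%s==%s" % (dist.project_name, dist.version))
        except pkg_resources.DistributionNotFound:
            pass

    if found:
        print("WARNING: suspect packages installed: %s" % ", ".join(found))
        sys.exit(1)   # non-zero exit so Ansible flags the host
    print("None of the suspect packages are installed.")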

Friday, April 29, 2022

systemd-tmpfiles to manage temporary files and directories on CentOS/RHEL

A modern system requires a large amount of temporary files and directories. Some applications (and users) use the /tmp directory to store temporary data, while others use a more specific location for the task, such as daemon and user-specific volatile directories in /run. In this context, volatile means that the file system that stores these files only exists in memory. When the system restarts or loses power, all content in volatile storage will disappear.

To keep a system running smoothly, these directories and files need to be created when they don't exist, as daemons and scripts may rely on them being there, and old files need to be deleted so they don't fill up disk space or provide stale information. CentOS/RHEL 7 and later include a tool called systemd-tmpfiles, which provides a structured and configurable method for managing temporary files and directories. The cleanup is driven by a timer unit (systemd-tmpfiles-clean.timer), which triggers systemd-tmpfiles-clean.service 15 minutes after boot and then every 24 hours.
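
You can verify the schedule on your own system:

    # The [Timer] section should show OnBootSec=15min and OnUnitActiveSec=1d
    systemctl cat systemd-tmpfiles-clean.timer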



The configuration files are located in three places and follow a hierarchical priority, in the following order (highest priority first):
1.   /etc/tmpfiles.d/*.conf
2.   /run/tmpfiles.d/*.conf
3.   /usr/lib/tmpfiles.d/*.conf
The files in /usr/lib/tmpfiles.d/ are provided by relevant RPM packages and should not be edited.
The files under /run/tmpfiles.d/ are themselves volatile files, typically used by daemons to manage their own runtime temporary files.
The files in /etc/tmpfiles.d/ are intended for administrators to configure custom temporary locations and override default values provided by the vendor.
If a file in /run/tmpfiles.d/ has the same file name as a file in /usr/lib/tmpfiles.d/, then the file in /run/tmpfiles.d/ is used. If a file in /etc/tmpfiles.d/ has the same file name as a file in /run/tmpfiles.d/ or /usr/lib/tmpfiles.d/, then the file in /etc/tmpfiles.d/ is used.
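
As an illustration, a custom file in /etc/tmpfiles.d/ might look like this (a sketch; the myapp name, user, and paths are hypothetical):

    # /etc/tmpfiles.d/myapp.conf
    # Type  Path             Mode  UID    GID    Age  Argument
    d       /run/myapp       0755  myapp  myapp  -    -
    d       /tmp/myapp-work  0700  myapp  myapp  10d  -

Apply it immediately with systemd-tmpfiles --create /etc/tmpfiles.d/myapp.conf. The Age column (10d) tells the periodic cleanup run to delete entries older than ten days.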

Friday, July 31, 2020

Systemd automatically unmounts a mounted filesystem

A curious thing in RHEL 7: systemd keeps a kind of cache of the mounted filesystems it generates from /etc/fstab.

To recreate the problem I had to change the LVs backing the filesystems /opt/MyfslogSD and /opt/MyfsWalComprSD. After updating /etc/fstab I re-mounted the filesystems, but checking the log I realised that systemd kept trying to unmount them over and over; it only succeeded hours later, once no processes were running inside them.

Jul 2 14:54:17 Server1 systemd: Unit opt-MyfslogSD.mount is bound to inactive unit dev-vgdata-lvMyfslogSD.device. Stopping, too.
Jul 2 14:54:17 Server1 systemd: Unmounting /opt/MyfslogSD...
Jul 2 14:54:17 Server1 umount: (In some cases useful info about processes that use
Jul 2 14:54:17 Server1 umount: the device is found by lsof(8) or fuser(1))
Jul 2 14:54:17 Server1 systemd: opt-MyfslogSD.mount mount process exited, code=exited status=32
Jul 2 14:54:17 Server1 systemd: Failed unmounting /opt/MyfslogSD.
Jul 2 14:54:17 Server1 systemd: Unit opt-MyfslogSD.mount is bound to inactive unit dev-vgdata-lvMyfslogSD.device. Stopping, too.
Jul 2 14:54:17 Server1 systemd: Unmounting /opt/MyfslogSD...
Jul 2 14:54:17 Server1 umount: (In some cases useful info about processes that use
Jul 2 14:54:17 Server1 umount: the device is found by lsof(8) or fuser(1))
Jul 2 14:54:17 Server1 systemd: opt-MyfslogSD.mount mount process exited, code=exited status=32
Jul 2 14:54:17 Server1 systemd: Failed unmounting /opt/MyfslogSD.


[root@Server1 ~]# systemctl --all | grep opt-Myfs
opt-MyfsBackupSD.mount                        loaded    active   mounted   /opt/MyfsBackupSD
opt-MyfsDataSD.mount                          loaded    active   mounted   /opt/MyfsDataSD
opt-MyfsLogSD.mount                           loaded    active   mounted   /opt/MyfsLogSD
opt-MyfsScriptsSD.mount                       loaded    active   mounted   /opt/MyfsScriptsSD
opt-MyfsWalComprSD.mount                      loaded    inactive mounted   /opt/MyfsWalComprSD

opt-MyfslogSD.mount                           loaded    inactive mounted   /opt/MyfslogSD



After altering fstab one should either run systemctl daemon-reload (this makes systemd reparse /etc/fstab and pick up the changes) or reboot.
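
A minimal recovery sequence, assuming the fstab entries are now correct:

    # Regenerate mount units from the edited /etc/fstab
    systemctl daemon-reload

    # Restart the affected mount units so their device bindings are re-resolved
    systemctl restart opt-MyfslogSD.mount opt-MyfsWalComprSD.mount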

[root@Server1 ~]# systemctl --all | grep opt-Myfs
opt-MyfsBackupSD.mount                        loaded    active   mounted   /opt/MyfsBackupSD
opt-MyfsDataSD.mount                          loaded    active   mounted   /opt/MyfsDataSD
opt-MyfsLogSD.mount                           loaded    active   mounted   /opt/MyfsLogSD
opt-MyfsScriptsSD.mount                       loaded    active   mounted   /opt/MyfsScriptsSD
opt-MyfsWalComprSD.mount                      loaded    active   mounted   /opt/MyfsWalComprSD

Thursday, June 25, 2020

Replacing a Boot Mirrored Disk in HP-UX 11.31 (11i v3)

The procedure below replaces the failed disk of a mirrored boot pair and initializes boot information on the replacement.

Save the hardware paths to the disk.
MyHPUX01:(/root/home/root)(root)#ioscan -m lun /dev/disk/disk7
Class     I  Lun H/W Path  Driver  S/W State   H/W Type     Health  Description
======================================================================
disk      7  64000/0xfa00/0x1   esdisk  CLAIMED     DEVICE       online  HP      DG146BB976 
             0/4/1/0.0x5000c5000c7bc53d.0x0
                      /dev/disk/disk7      /dev/disk/disk7_p3   /dev/rdisk/disk7_p2
                      /dev/disk/disk7_p1   /dev/rdisk/disk7     /dev/rdisk/disk7_p3
                      /dev/disk/disk7_p2   /dev/rdisk/disk7_p1


In my case, the disk to be replaced has:
LUN hardware path: 64000/0xfa00/0x1
lunpath hardware path: 0/4/1/0.0x5000c5000c7bc53d.0x0

The disk is hot-swappable, so it can be replaced without halting the system.

Halt LVM access to the disk.

# pvchange -a N /dev/disk/disk7_p2


Determine the new LUN instance number for the replacement disk.
# ioscan -m lun

- Create a partition description file:
# vi /tmp/partitionfile
3
EFI 500MB
HPUX 100%
HPSP 400MB

# idisk -wf /tmp/partitionfile /dev/rdisk/disk-newdisk-


           -w   Enable write mode.  By default, idisk operates in read-only
                mode.  To create and write partition information to the disk
                you must specify the -w option.



- Create the new device files for the new partitions (e.g., disk28_p1, disk28_p2, disk28_p3):
# insf -e -C disk

# You should now see the new partitions:
# ioscan -m lun


Now assign the old instance number to the replacement disk.
# io_redirect_dsf -d /dev/disk/disk-old- -n /dev/disk/disk-new-

# ioscan -m lun /dev/disk/disk-new-

The LUN representation of the old disk (LUN hardware path 64000/0xfa00/0x0) was removed. The LUN representation of the new disk (LUN hardware path 64000/0xfa00/0x1c) was reassigned from LUN instance disk-new- to LUN instance 14, and its device special files were renamed to /dev/disk/disk14 and /dev/rdisk/disk14.


# Use efi_fsinit(1M) to initialize the FAT filesystems on the EFI (_p1) and HPSP (_p3) partitions:

# efi_fsinit -d /dev/rdisk/disk7_p1
# efi_fsinit -d /dev/rdisk/disk7_p3

# mkboot -e -l /dev/rdisk/disk7
# efi_ls -d /dev/rdisk/disk7_p1     (to check EFI)
# lifls -l /dev/rdisk/disk7_p2     (to check LIF)
- Check the content of the AUTO file on the EFI partition:

# efi_cp -d /dev/rdisk/disk7_p1 -u /EFI/HPUX/AUTO /tmp/x
# cat /tmp/x
boot vmunix
NOTE: Specify the -lq option if you prefer that your system boots up without
interruption in case of a disk failure. On the original boot disk:
# mkboot -a "boot vmunix -lq" /dev/rdisk/disk7


Restore LVM configuration information to the new disk.

For example:

# vgcfgrestore -n /dev/vg00 /dev/rdisk/disk7_p2

Restore LVM access to the disk.
If you did not reboot the system after halting LVM access (the pvchange step above), reattach the disk as follows:

# vgchange -a y /dev/vg00
# vgdisplay -v vg00     (repeat to watch the stale extents resynchronize)

Synchronize volume group data (only if the sync does not start automatically):

# cd /tmp
# nohup vgsync /dev/vg00 &
(output goes to /tmp/nohup.out)
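
To confirm the mirrors are fully synchronized:

# No "stale" status in the output means the resync is complete
# vgdisplay -v vg00 | grep -i stale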

Initialize/check boot information on the disk.
- Check whether the contents of the LABEL file (i.e. the root, boot, swap and dump device definitions) have been initialized on the mirror disk (this is done by lvextend):

# lvlnboot -v