Tuesday, October 8, 2024

Kernel Panic During Oracle Cluster Testing on RHEL 8

 

Hey everyone,

I wanted to share some "exciting" issues I ran into while testing an Oracle cluster on RHEL 8—you know, just your everyday kernel panic to spice things up!

The Issue

While conducting tests, I decided to bring down a couple of network interfaces (enp43s1f8 and enp43s1f9) using nmcli. Little did I know that right after deactivating them, I would be treated to a lovely kernel panic, logged as follows:


Oct 8 16:52:24 serverA kernel: sysrq: SysRq : Trigger a crash
Oct 8 16:52:24 serverA kernel: Kernel panic - not syncing: sysrq triggered crash

This surprise happened even with the iSCSI service (iscsi.service) disabled. Apparently, the system thought it was a great time to throw a party!
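For reference, the deactivation itself was nothing exotic, just nmcli taking the two connections down (interface names from the post; the sketch is guarded so it is harmless on a machine without NetworkManager or without connections by these names):

```shell
# Deactivating the two test interfaces (names from the post).
# Guarded so the sketch is harmless where nmcli or these
# connections do not exist.
if command -v nmcli >/dev/null 2>&1; then
    nmcli connection down enp43s1f8 || true   # ignore failure if the connection is absent
    nmcli connection down enp43s1f9 || true
else
    echo "nmcli not present; commands shown for reference only"
fi
DEACTIVATED=attempted
```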

What I Found

  1. Dispatcher Scripts:

    • I found out that a script (04-iscsi) in /usr/lib/NetworkManager/dispatcher.d/ was doing its own thing and triggering actions whenever the network state changed. It was like that overly enthusiastic colleague who jumps in during a meeting and derails the conversation!
  2. Fixing the Issue:

    • To bring back some sanity, I temporarily removed or renamed the 04-iscsi script. After that, I was able to bring down the interfaces without causing the system to have a meltdown. Who knew a little housekeeping could go such a long way?
  3. For Future Tests:

    • Use nmcli to deactivate connections gracefully, but first check /usr/lib/NetworkManager/dispatcher.d/ for scripts that will fire on the state change; graceful nmcli alone won’t save you if a dispatcher script decides to crash the party!
    • Review any service dependencies to make sure nothing throws a tantrum when you change network states.
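For context on why that script had so much power: NetworkManager runs every executable in dispatcher.d on each network event, passing the interface and the event name as arguments. Here is a stripped-down sketch of how a dispatcher script like 04-iscsi reacts to those events (the logic is hypothetical; the real 04-iscsi shipped with the iSCSI tooling does considerably more). The defaults are only so the sketch runs standalone:

```shell
#!/bin/sh
# Stripped-down sketch of a NetworkManager dispatcher script (hypothetical
# logic; the real 04-iscsi does more). NetworkManager invokes each script
# as: <script> <interface> <action>.
# Defaults below are only so the sketch runs standalone for demonstration.
IFACE="${1:-enp43s1f8}"
ACTION="${2:-down}"

case "$ACTION" in
    up)
        MSG="would log into iSCSI targets reachable via $IFACE"
        ;;
    down|pre-down)
        MSG="would tear down iSCSI sessions riding on $IFACE"
        ;;
    *)
        MSG="no action for event '$ACTION'"
        ;;
esac
echo "$MSG"
```

The takeaway: anything in that directory runs on every state change, which is exactly why renaming 04-iscsi was enough to stop the fireworks.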

This little adventure reminded me of the importance of understanding how network management scripts and services like iSCSI interact—especially when they seem to have a mind of their own. By being proactive and keeping a sense of humor, we can avoid these surprises in the future.

Thursday, October 3, 2024

Resolving Oracle ASM Disk Configuration Issues After Node Reboots

Problem Summary:

Each time Node B is rebooted, two disks fail to configure automatically in Oracle ASM (Automatic Storage Management). The disks in question (ORA01_DSK2 and ORA01_DSK3) are reported as valid by the oracleasm scandisks command, yet they are not instantiated during the boot process; oracleasm scandisks must be run manually after startup to bring them online.
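The manual workaround after each reboot is simply a rescan. A guarded sketch of those commands (oracleasm will not exist on every machine, so the block checks for it first):

```shell
# Manual recovery run after each reboot of Node B (commands from the post).
# Guarded so the sketch is harmless on machines without oracleasm installed.
if command -v oracleasm >/dev/null 2>&1; then
    oracleasm scandisks      # instantiates valid-but-unconfigured ASM disks
    oracleasm listdisks      # ORA01_DSK2 and ORA01_DSK3 should now be listed
    RESCANNED=yes
else
    echo "oracleasm not present; commands shown for reference only"
    RESCANNED=no
fi
```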

Diagnosis:

Several potential causes could explain this issue:

  1. Service Dependency Misconfiguration: The Oracle ASM service (oracleasm.service) might be starting before the udev service, which manages device initialization, completes its work. This would result in some disks not being available when ASM performs its scan.
  2. Disk Initialization Timing: Some disks may take longer to be detected by the system, causing them to be unavailable when Oracle ASM performs its automatic scan.

Solution Applied:

To address the problem, the following changes were made to the oracleasm service configuration:

  1. Modifying the oracleasm Service: The configuration file for the Oracle ASM service (/etc/systemd/system/multi-user.target.wants/oracleasm.service) was updated to ensure that Oracle ASM starts after the udev service has fully initialized all devices. The following line was added to the configuration:

    After=systemd-udevd.service

    This ensures that the system’s disks are available by the time Oracle ASM performs its scan.

  2. Adding a Startup Timeout: A 120-second startup timeout was added using the TimeoutStartSec=120s directive. This allows Oracle ASM enough time to detect and configure all disks properly, even if disk initialization takes longer than usual. This prevents ASM from failing due to delayed disk recognition.
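Taken together, the two additions amount to a unit-file fragment like this (a sketch only; surrounding directives are omitted, and the placement assumes the unit’s standard [Unit] and [Service] sections):

```ini
# Fragment of oracleasm.service after the change (sketch; other
# directives omitted).
[Unit]
# Do not start ASM until udev has been started and devices initialized.
After=systemd-udevd.service

[Service]
# Allow up to 120 seconds for slow disks to appear during startup.
TimeoutStartSec=120s
```

After a change like this, remember to run systemctl daemon-reload so systemd picks up the edited unit.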

Conclusion:

By modifying the Oracle ASM service to depend on udev and increasing the timeout for service startup, the issue of disks not being automatically configured after a reboot was resolved. This solution ensures that all ASM disks are properly recognized and configured during the boot process, eliminating the need for manual configuration.