Linux Notes: Managing HPE RAID hardware

  1. The information presented here is intended for educational use by qualified computer technologists.
  2. The information presented here is provided free of charge, as-is, with no warranty of any kind.
Created: 2023-12-31

HPE / HP Drives: SAS for MSA (more alphabet soup)

[image: hpe-sff-disk-old - Older SFF Adapter]
[image: hpe-sff-disk-new - Newer SFF Adapter]


Managing hardware RAID volumes on HP/HPE systems

date: 2018-12-20 (updated: 2021-05-20)

Executive Summary: There are only three ways (that I'm aware of) to manage h/w RAID on HPE systems:
  1. access ORCA (Option ROM Configuration for Arrays) from firmware during any boot
    • mostly limited to ADD / DELETE
  2. access SSA (Smart Storage Administrator) after booting HPE Firmware + Diagnostics (from either USB or DVD)
    •  SSA at this point lets you do whatever you want (very dangerous yet powerful)
    • this media is only available with a support contract and goes by the name Service Pack for ProLiant (SPP)
  3. access SSACLI (Smart Storage Administrator Command Line Interface) from within Linux
    • you will not be able to DELETE logical volumes once they have been associated with Linux devices under /dev (see the sketch after this list)
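
Regarding point 3, here is a minimal sketch (assuming the lsscsi and lsblk tools are installed, and that the embedded controller is in slot 0) showing how to check which logical drives Linux has already claimed before attempting anything destructive:

#!/bin/bash
# sketch only: list the controller's logical drives, then the Linux block
# devices created from them (HP/HPE logical drives normally appear in lsscsi
# with the model string "LOGICAL VOLUME")
ssacli ctrl slot=0 ld all show			# assumes the embedded controller is in slot 0
lsscsi | grep -i "logical volume"		# assumes the lsscsi package is installed
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT		# a mounted partition here means the volume is in use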

1) GUI-based SSA Utility (way cool tool)

Steps:
	1.  login as root from the graphical front console
	2.  download all files related to ssa-2.65-7.0.x86_64.rpm from https://support.hpe.com 
	3.  rpm -i  ssa-2.65-7.0.x86_64.rpm
	4.  /usr/sbin/ssa -local (Firefox opens automatically with a beautiful colored diagram of your
	    RAID config). See page 18 of this manual
	5.  /usr/sbin/ssa -help  (view all available command-line switches)
---------------------------------------------------------------------------
Tips:	1. I have used this tool to convert a volume from "8-disk RAID-60" to "8-disk RAID-0" on the fly (a hedged CLI sketch follows this list)
		This took several hours and definitely impacted server performance
	2. My next experiment was to convert from "8-disk RAID-0" to "4-disk RAID-0" on the fly
		I didn't even know this was possible (it would not have worked if the volume was full)
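
For the record, the same RAID-level migration can be requested from the command line. I only did it through the GUI, so treat the exact syntax below as an assumption to be verified against "ssacli help" before use:

# sketch only (untested by me): migrate logical drive 1 on the embedded controller to RAID 0
ssacli ctrl slot=0 ld 1 modify raid=0
ssacli ctrl slot=0 ld 1 show			# transformation progress is reported here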

2) CLI-based SSA Utility (great for scripting)

Steps:
 
	1. login as root from anywhere
	2. rpm -i ssacli-2.65-7.0.x86_64.rpm
	3. then just type "ssacli" (in the session below, my typing follows the "=>" prompt)
	4. Notice that the drive in bay-8 is marked "Predictive Failure"
 
###############################################################################################
 
[root@localhost ~]# ssacli
Smart Storage Administrator CLI 2.65.7.0
Detecting Controllers...Done.
Type "help" for a list of supported commands.
Type "exit" to close the console.
 
=> ctrl all show   		# firmware sensitive; does not work on all platforms

   this is the only way to see which controllers were found (slot #0 means embedded)

=> set target ctrl slot=0 	# or: set target ctrl all
				# or: set target ctrl first
=> show config

   Smart Array P420i in Slot 0 (Embedded)    (sn: 001438024F5D170)

   Port Name: 1I
   Port Name: 2I

   Internal Drive Cage at Port 1I, Box 2, OK
   Internal Drive Cage at Port 2I, Box 2, OK

   Array A (SAS, Unused Space: 0 MB)

      logicaldrive 1 (1.1 TB, RAID 60, OK)

      physicaldrive 1I:2:1 (port 1I:box 2:bay 1, SAS HDD, 300 GB, OK)
      physicaldrive 1I:2:2 (port 1I:box 2:bay 2, SAS HDD, 300 GB, OK)
      physicaldrive 1I:2:3 (port 1I:box 2:bay 3, SAS HDD, 300 GB, OK)
      physicaldrive 1I:2:4 (port 1I:box 2:bay 4, SAS HDD, 300 GB, OK)
      physicaldrive 2I:2:5 (port 2I:box 2:bay 5, SAS HDD, 300 GB, OK)
      physicaldrive 2I:2:6 (port 2I:box 2:bay 6, SAS HDD, 300 GB, OK)
      physicaldrive 2I:2:7 (port 2I:box 2:bay 7, SAS HDD, 300 GB, OK)
      physicaldrive 2I:2:8 (port 2I:box 2:bay 8, SAS HDD, 300 GB, Predictive Failure)

   SEP (Vendor ID PMCSIERA, Model SRCv8x6G) 380 (WWID: 5001438024F5D17F)

=> show status

   Smart Array P420i in Slot 0 (Embedded)
      Controller Status: OK
      Cache Status: OK
      Battery/Capacitor Status: OK

=> show config detail
   bla...bla...bla... drive details bla...bla...bla...

=> exit
[root@localhost ~]#
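
Everything typed at the "=>" prompt above can also be passed to ssacli as command-line arguments, which is what makes it so script friendly (these examples assume the embedded controller is in slot 0):

ssacli ctrl all show config			# whole configuration in one shot
ssacli ctrl slot=0 pd all show status		# one status line per physical drive
ssacli ctrl slot=0 ld all show status		# one status line per logical drive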

BASH Scripts (tested on CentOS-7)

#!/bin/bash
#=============================================================================
# title  : raid_monitor.sh
# purpose: inspect the health of drives not visible to Linux
# notes  : meant to be run as root since ssacli is not sudo friendly
#        : this script will be run 3 times a day from crontab (a sample entry follows this script)
# history:
# NSR 20190906 1. original effort
# NSR 20190911 2. more work
# NSR 20190917 3. minor fix in cleanup
# NSR 20191104 4. moved logging to /var/log
# NSR 20200306 5. now do not stop on error (needed if ssacli is not installed)
#=============================================================================
set -vex			# tron (v=verbose, e=stop-on-error, x=trace expanded commands)
STUB="raid_monitor-"
YADA="/var/log/"${STUB}$(date +%Y%m%d.%H%M%S)".trc"
echo "-i-diverting output to file: "${YADA}
exec 1>>${YADA}
exec 2>&1
set +e				# do not stop on errors (in this script)
echo "-i-starting: "${0}" at "$(date +%Y%m%d.%H%M%S)
rm -f raid_monitor.tmp
# ssacli is installed with RPM
ssacli ctrl slot=0 show config > raid_monitor.tmp
saved_status=$?
echo "-i-saved_status:"$saved_status
if [ $saved_status -ne 0 ];
then
#   mail -s "RAID Problem" neil,neil@kawc09.on.bell.ca,neil@kawc96.on.bell.ca <<< "-e-could not execute SSACLI"
    # note: ats_adm_list is an alias defined here: /etc/aliases
    mail -s "RAID Problem-01 on host: "$HOSTNAME ats_adm_list <<< "-e-could not execute SSACLI"
    mail -s "RAID Problem-01 on host: "$HOSTNAME root         <<< "-e-could not execute SSACLI"
    exit
fi
# this next script will analyze "raid_monitor.tmp"
/root/raid_analyze_file.sh
saved_status=$?
echo "-i-saved_status:"$saved_status
if [ $saved_status -ne 0 ];
then
    # note: ats_adm_list is an alias defined here: /etc/aliases
    mail -s "RAID Problem-02 on host: "$HOSTNAME ats_adm_list <<< "-e-one or more drives are not 100% healthy"
    mail -s "RAID Problem-02 on host: "$HOSTNAME root         <<< "-e-one or more drives are not 100% healthy"
    exit
fi
#mail -s "RAID Test OKAY host: "$HOSTNAME root                <<< "-i-test OKAY"
#-----------------------------------------------------------------------------
#find /var/log -name ${STUB}"*.trc" -a -mtime +2 -exec ls -la {} \;
find /var/log -name ${STUB}"*.trc" -a -mtime +2 -exec rm {} \;
echo "-i-exiting:  "${0}" at "$(date +%Y%m%d.%H%M%S)
#
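
A possible crontab entry for the three-times-a-day schedule mentioned above (the run times here are my own picks, not from production):

# root's crontab (crontab -e as root); adjust times to taste
0 6,14,22 * * * /root/raid_monitor.sh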

#!/bin/bash
#=============================================================================
# script : raid_analyze_file.sh
# author : Neil Rieck
# created: 2019-09-06
# purpose: 1) Reads a text file (searching for some key words)
#          2) this script is called by /root/raid_monitor.sh
#=============================================================================
#set -vex				# verbose, stop-on-error, xpand
echo "-i-starting script: "${0}
MYFILE="./raid_monitor.tmp"		# hard coded fname
echo "-i-reading: "${MYFILE}
declare -i line
declare -i good
declare -i bad
line=0
good=0
bad=0
while IFS='' read -r LINE || [[ -n ${LINE} ]]; do
#    echo "-i-data: "${LINE}
if [[ ${LINE} == *"physical"* ]];
then
  ((line=line+1))
  # test for: "Predictive Failure" or "Failed"
  if [[ ${LINE} == *"Fail"* ]];
  then
    echo "-i-bad  data: "${LINE}
    ((bad=bad+1))
  fi
  if [[ ${LINE} == *"OK)"* ]];
  then
    echo "-i-good data: "${LINE}
    ((good=good+1))
  fi
fi
done < ${MYFILE}
echo "-i-testing has concluded"
echo "-i-report card"
echo "-i-  lines:"$line
echo "-i-  bad  :"$bad
echo "-i-  good :"$good
#
# a little martial arts so we get the best exit value
#
if [ ${line} -eq 0 ] || [ ${bad} -gt 0 ] || [ ${line} -ne ${good} ];
then
   echo "-w-problems were detected"
   rc=99
else
   echo "-i-all is well"
   rc=0
fi
echo "-i-will exit with code: "$rc
echo "-i-exiting script: "${0}
exit $rc
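
To exercise the analyzer by hand (without waiting for cron), something like this works, assuming ssacli is installed and the embedded controller is in slot 0:

cd /root					# raid_monitor.tmp is expected in the current directory
ssacli ctrl slot=0 show config > raid_monitor.tmp
./raid_analyze_file.sh ; echo "exit code: $?"	# 0 = all drives OK, 99 = trouble detected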
 
Neil Rieck
Waterloo, Ontario, Canada.