#!/bin/bash
: <<=cut
=head1 NAME
emc_vnx_block_lun_perfdata - Plugin to monitor Block statistics of EMC VNX 5300
Unified Storage Processors
=head1 AUTHOR
Evgeny Beysembaev <megabotva@gmail.com>
=head1 LICENSE
GPLv2
=head1 MAGIC MARKERS
#%# family=auto
#%# capabilities=autoconf
=head1 DESCRIPTION
The plugin monitors the LUNs of EMC Unified Storage FLARE Storage Processors
(SPs). It is probably also compatible with other Clariion systems. It connects
to the Control Stations over SSH, remotely executes /nas/sbin/navicli, and
parses its output. The plugin can easily be reconfigured to use a locally
installed /opt/Navisphere CLI instead of the navicli on the Control Stations.
It does not matter which Storage Processor is queried, so the plugin tries
both and uses the first active one. It also chooses the Primary Control
Station from the configured list automatically by calling
/nasmcd/sbin/getreason and /nasmcd/sbin/t2slot.

Some parts of this plugin are left in a rudimentary state to make it easy to
reconfigure it to draw more (or less) data.
=head1 COMPATIBILITY
The plugin was written for the EMC VNX5300 storage system, as this is the only
EMC storage I have. I am fairly sure it also works with other VNX1 systems,
such as the VNX5100 and VNX5500, and with old-style Clariion systems.
I do not know whether the plugin works with the VNX2 series; it may need some
corrections in the command-line backend. The same applies to other EMC
systems, so I encourage you to try the plugin and fix it where needed.
=head1 CONFIGURATION
=head2 Prerequisites
First of all, make sure that statistics collection is turned on. You can do
this by typing:
    navicli -h spa setstats -on
on your Control Station or locally through /opt/Navisphere.
Also, the plugin makes heavy use of the buggy "cdef" feature of Munin 2.0, so
it can be hit by the following bugs:
http://munin-monitoring.org/ticket/1017 - The plugin contains workarounds for
this one; make sure that they are working.
http://munin-monitoring.org/ticket/1352 - Field names in this plugin can be
much longer than 15 characters.
Without these workarounds "Load" and "Queue Length" would not work.
=head2 Installation
The plugin uses SSH to connect to the Control Stations. It is possible to use
the 'nasadmin' user, but it is better to create a read-only global user with
the Unisphere Client. The user should have only the Operator role. I created
an "operator" user, but because the Control Stations already had an internal
"operator" user, the new one was named "operator1". So be careful.

On the munin-node side, choose a user that will be used to connect through
SSH. Generally the user "munin" is fine. Then run "sudo su munin -s /bin/bash",
"ssh-keygen", and "ssh-copy-id" to both Control Stations with the newly created
user.
Make a link from /usr/share/munin/plugins/emc_vnx_block_lun_perfdata to
/etc/munin/plugins/emc_vnx_block_lun_perfdata_<NAME>, where <NAME> is an
arbitrary name for your storage system. The plugin returns <NAME> in its
answer as the "host_name" field. Assume your storage system is called "VNX5300".
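
The storage name is taken from the link name itself. As a minimal sketch (the
helper name storage_name_from_link is hypothetical and not part of the plugin),
the sixth "_"-separated field of the link's base name is what ends up as
host_name:

```shell
# Hypothetical helper: extract <NAME> from a plugin link path.
# It mirrors the plugin's approach of cutting the sixth "_"-separated field
# out of the base name, so anything after a further "_" in <NAME> is dropped.
storage_name_from_link() {
    local link_path="$1"
    echo "${link_path##*/}" | cut -d _ -f 6
}

storage_name_from_link /etc/munin/plugins/emc_vnx_block_lun_perfdata_VNX5300
```

With the link name above the helper prints VNX5300, so a <NAME> without
underscores is the safe choice.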
Make a configuration file at
/etc/munin/plugin-conf.d/emc_vnx_block_lun_perfdata_VNX5300:
    [emc_vnx_block_lun_perfdata_VNX5300]
    user munin
    env.username operator1
    env.cs_addr 192.168.1.1 192.168.1.2
Where:
    user - Local user that runs the SSH client
    env.username - Remote user with the Operator role
    env.cs_addr - Addresses of the Control Stations
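
env.cs_addr is a space-separated list that the plugin walks in order until one
address reports itself as the primary Control Station (state code 10). The loop
below sketches only that selection step. fake_getreason is a stand-in invented
for this example, whereas the plugin asks each address over SSH.

```shell
# Stand-in for the remote check (hypothetical): prints the state code of a
# Control Station. Here 192.168.1.2 plays the primary one (code 10) and every
# other address answers with some other code.
fake_getreason() {
    case "$1" in
        192.168.1.2) echo 10 ;;
        *) echo 11 ;;
    esac
}

# Print the first address whose state code is exactly 10; fail if none is.
pick_primary_cs() {
    local addr
    for addr in "$@"; do
        if fake_getreason "$addr" | grep -qx 10; then
            echo "$addr"
            return 0
        fi
    done
    return 1
}

cs_addr="192.168.1.1 192.168.1.2"
# Left unquoted on purpose so that the list splits into separate addresses.
pick_primary_cs $cs_addr
```

Listing both Control Stations lets monitoring continue after a failover,
because the order of the addresses does not decide which one is used.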
=head1 ERRATA
The Queue Length is not computed in a fully correct way. The request counters
are totals for both SPs, but they are then scaled separately by the load of
SPA and of SPB. Still, in most AAA / ALUA cases the formula is correct.
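
For reference, the expression that the plugin hands to Munin for each SP is
((Sum of Outstanding Requests - Non-Zero Request Count Arrivals / 2) /
(Read Requests + Write Requests)) * (Busy Ticks / (Busy Ticks + Idle Ticks)).
The awk helper below only illustrates that arithmetic with made-up numbers.
queue_length is a hypothetical name, the zero guard is an addition of this
sketch, and Munin evaluates the real expression on counter rates.

```shell
# Hypothetical helper, not part of the plugin.
# Arguments: outstanding_sum nonzero_arrivals read_req write_req busy idle
queue_length() {
    awk -v os="$1" -v nz="$2" -v rd="$3" -v wr="$4" -v busy="$5" -v idle="$6" 'BEGIN {
        # Guard against division by zero (assumption of this sketch).
        if (rd + wr == 0 || busy + idle == 0) { print 0; exit }
        printf "%.3f\n", ((os - nz / 2) / (rd + wr)) * (busy / (busy + idle))
    }'
}

# (1000 - 400 / 2) / (300 + 100) = 2, scaled by 50 / (50 + 150) = 0.25
queue_length 1000 400 300 100 50 150
```

The call prints 0.500. Because the first four inputs are totals over both SPs
while the tick counts belong to a single SP, the result is only an
approximation whenever both SPs serve the same LUN.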
=head1 HISTORY
09.11.2016 - First Release
26.12.2016 - Compatibility with Munin coding style
=cut

export LANG=C

. "$MUNIN_LIBDIR/plugins/plugin.sh"

# The storage name is the sixth "_"-separated field of the plugin link name
TARGET=$(echo "${0##*/}" | cut -d _ -f 6)

# All SPs we have
SPALL="SPA SPB"
NAVICLI="/nas/sbin/navicli"

# Print the state code of the Control Station at address $1
ssh_check_cmd() {
    ssh -q "$username@$1" "/nasmcd/sbin/getreason | grep -w slot_\`/nasmcd/sbin/t2slot\` | cut -d- -f1"
}

check_conf () {
    if [ -z "$username" ]; then
        echo "No username ('username' environment variable)!"
        return 1
    fi

    if [ -z "$cs_addr" ]; then
        echo "No control station addresses ('cs_addr' environment variable)!"
        return 1
    fi

    # Choose the primary Control Station: its state code has to be "10"
    for CS in $cs_addr; do
        if [[ "10" -eq "$(ssh_check_cmd "$CS")" ]]; then
            PRIMARY_CS=$CS
            break
        fi
    done

    if [ -z "$PRIMARY_CS" ]; then
        echo "No alive primary Control Station from list \"$cs_addr\""
        return 1
    fi
    return 0
}

if [ "$1" = "autoconf" ]; then
    check_conf_ans=$(check_conf)
    if [ $? -eq 0 ]; then
        echo "yes"
    else
        echo "no ($check_conf_ans)"
    fi
    exit 0
fi

check_conf 1>&2
if [[ $? -eq 1 ]]; then
    exit 1
fi

SSH="ssh -q $username@$PRIMARY_CS "

# Print the name of the first Storage Processor that answers navicli
get_working_sp() {
    local probe_sp
    for probe_sp in $SPALL; do
        if $SSH $NAVICLI -h $probe_sp >/dev/null 2>&1; then
            echo "$probe_sp"
            return 0
        fi
    done
}

StorageProcessor=$(get_working_sp)
[ -z "$StorageProcessor" ] && echo "No active Storage Processor found!" >&2 && exit 1
NAVICLI="$NAVICLI -h $StorageProcessor"

run_remote_navicli() {
    $SSH $NAVICLI "$@"
}

run_remote_ssh() {
    $SSH "$@"
}

# Get LUN list
LUNLIST=$(run_remote_navicli "lun -list -drivetype | sed -ne 's/^Name:\ *//p'")

echo "host_name ${TARGET}"
echo

if [ "$1" = "config" ] ; then
    cat <<-EOF
multigraph emc_vnx_block_blocks
graph_category disk
graph_title EMC VNX 5300 LUN Blocks
graph_vlabel Blocks Read (-) / Written (+)
graph_args --base 1000
EOF
    while read -r LUN ; do
        LUN="$(clean_fieldname "$LUN")"
        cat <<-EOF
${LUN}_read.label none
${LUN}_read.graph no
${LUN}_read.min 0
${LUN}_read.draw AREA
${LUN}_read.type COUNTER
${LUN}_write.label $LUN Blocks
${LUN}_write.negative ${LUN}_read
${LUN}_write.type COUNTER
${LUN}_write.min 0
${LUN}_write.draw STACK
EOF
    done <<< "$LUNLIST"

    cat <<-EOF
multigraph emc_vnx_block_req
graph_category disk
graph_title EMC VNX 5300 LUN Requests
graph_vlabel Requests: Read (-) / Write (+)
graph_args --base 1000
EOF
    while read -r LUN ; do
        LUN="$(clean_fieldname "$LUN")"
        cat <<-EOF
${LUN}_readreq.label none
${LUN}_readreq.graph no
${LUN}_readreq.min 0
${LUN}_readreq.type COUNTER
${LUN}_writereq.label $LUN Requests
${LUN}_writereq.negative ${LUN}_readreq
${LUN}_writereq.type COUNTER
${LUN}_writereq.min 0
EOF
    done <<< "$LUNLIST"

    cat <<-EOF
multigraph emc_vnx_block_ticks
graph_category disk
graph_title EMC VNX 5300 Counted Load per LUN
graph_vlabel Load, % * Number of LUNs
graph_args --base 1000 -l 0 -r
EOF
    echo -n "graph_order "
    while read -r LUN ; do
        LUN="$(clean_fieldname "$LUN")"
        echo -n "${LUN}_busyticks ${LUN}_idleticks ${LUN}_bta=${LUN}_busyticks_spa ${LUN}_idleticks_spa ${LUN}_btb=${LUN}_busyticks_spb ${LUN}_idleticks_spb "
    done <<< "$LUNLIST"
    echo ""
    while read -r LUN ; do
        LUN="$(clean_fieldname "$LUN")"
        cat <<-EOF
${LUN}_busyticks_spa.label $LUN Busy Ticks SPA
${LUN}_busyticks_spa.type COUNTER
${LUN}_busyticks_spa.graph no
${LUN}_bta.label $LUN Busy Ticks SPA
${LUN}_bta.graph no
${LUN}_idleticks_spa.label $LUN Idle Ticks SPA
${LUN}_idleticks_spa.type COUNTER
${LUN}_idleticks_spa.graph no
${LUN}_busyticks_spb.label $LUN Busy Ticks SPB
${LUN}_busyticks_spb.type COUNTER
${LUN}_busyticks_spb.graph no
${LUN}_btb.label $LUN Busy Ticks SPB
${LUN}_btb.graph no
${LUN}_idleticks_spb.label $LUN Idle Ticks SPB
${LUN}_idleticks_spb.type COUNTER
${LUN}_idleticks_spb.graph no
${LUN}_load_spa.label $LUN load SPA
${LUN}_load_spa.draw AREASTACK
${LUN}_load_spb.label $LUN load SPB
${LUN}_load_spb.draw AREASTACK
${LUN}_load_spa.cdef 100,${LUN}_bta,${LUN}_busyticks_spa,${LUN}_idleticks_spa,+,/,*
${LUN}_load_spb.cdef 100,${LUN}_btb,${LUN}_busyticks_spb,${LUN}_idleticks_spb,+,/,*
EOF
    done <<< "$LUNLIST"

    cat <<-EOF
multigraph emc_vnx_block_outstanding
graph_category disk
graph_title EMC VNX 5300 Sum of Outstanding Requests
graph_vlabel Requests
graph_args --base 1000
EOF
    while read -r LUN ; do
        LUN="$(clean_fieldname "$LUN")"
        cat <<-EOF
${LUN}_outstandsum.label $LUN
${LUN}_outstandsum.type COUNTER
EOF
    done <<< "$LUNLIST"

    cat <<-EOF
multigraph emc_vnx_block_nonzeroreq
graph_category disk
graph_title EMC VNX 5300 Non-Zero Request Count Arrivals
graph_vlabel Count Arrivals
graph_args --base 1000
EOF
    while read -r LUN ; do
        LUN="$(clean_fieldname "$LUN")"
        cat <<-EOF
${LUN}_nonzeroreq.label $LUN
${LUN}_nonzeroreq.type COUNTER
EOF
    done <<< "$LUNLIST"

    cat <<-EOF
multigraph emc_vnx_block_trespasses
graph_category disk
graph_title EMC VNX 5300 Trespasses
graph_vlabel Trespasses
EOF
    while read -r LUN ; do
        LUN="$(clean_fieldname "$LUN")"
        cat <<-EOF
${LUN}_implic_tr.label ${LUN} Implicit Trespasses
${LUN}_explic_tr.label ${LUN} Explicit Trespasses
EOF
    done <<< "$LUNLIST"

    cat <<-EOF
multigraph emc_vnx_block_queue
graph_category disk
graph_title EMC VNX 5300 Counted Block Queue Length
graph_vlabel Length
EOF
    while read -r LUN ; do
        LUN="$(clean_fieldname "$LUN")"
        cat <<-EOF
${LUN}_busyticks_spa.label ${LUN}
${LUN}_busyticks_spa.graph no
${LUN}_busyticks_spa.type COUNTER
${LUN}_idleticks_spa.label ${LUN}
${LUN}_idleticks_spa.graph no
${LUN}_idleticks_spa.type COUNTER
${LUN}_busyticks_spb.label ${LUN}
${LUN}_busyticks_spb.graph no
${LUN}_busyticks_spb.type COUNTER
${LUN}_idleticks_spb.label ${LUN}
${LUN}_idleticks_spb.graph no
${LUN}_idleticks_spb.type COUNTER
${LUN}_outstandsum.label ${LUN}
${LUN}_outstandsum.graph no
${LUN}_outstandsum.type COUNTER
${LUN}_nonzeroreq.label ${LUN}
${LUN}_nonzeroreq.graph no
${LUN}_nonzeroreq.type COUNTER
${LUN}_readreq.label ${LUN}
${LUN}_readreq.graph no
${LUN}_readreq.type COUNTER
${LUN}_writereq.label ${LUN}
${LUN}_writereq.graph no
${LUN}_writereq.type COUNTER
EOF
        # Queue Length SPA = ((Sum of Outstanding Requests SPA - NonZero Request Count Arrivals SPA / 2) / (Host Read Requests SPA + Host Write Requests SPA))
        #                    * (Busy Ticks SPA / (Busy Ticks SPA + Idle Ticks SPA))
        # The request counters are totals for SPA and SPB together, although that is not fully correct
        cat <<-EOF
${LUN}_ql_l_a.label ${LUN} Queue Length SPA
${LUN}_ql_l_a.cdef ${LUN}_outstandsum,${LUN}_nonzeroreq,2,/,-,${LUN}_readreq,${LUN}_writereq,+,/,${LUN}_busyticks_spa,*,${LUN}_busyticks_spa,${LUN}_idleticks_spa,+,/
${LUN}_ql_l_b.label ${LUN} Queue Length SPB
${LUN}_ql_l_b.cdef ${LUN}_outstandsum,${LUN}_nonzeroreq,2,/,-,${LUN}_readreq,${LUN}_writereq,+,/,${LUN}_busyticks_spb,*,${LUN}_busyticks_spb,${LUN}_idleticks_spb,+,/
EOF
    done <<< "$LUNLIST"
    exit 0
fi

# Prepare one big compound command so that most of the work is done remotely
# on the Control Station in a single SSH session.
BIGSSHCMD=""
while read -r LUN ; do
    FILTERLUN="$(clean_fieldname "$LUN")"
    BIGSSHCMD+="$NAVICLI lun -list -name $LUN -perfData |
sed -ne 's/^Blocks Read\:\ */${FILTERLUN}_read.value /p;
s/^Blocks Written\:\ */${FILTERLUN}_write.value /p;
s/Read Requests\:\ */${FILTERLUN}_readreq.value /p;
s/Write Requests\:\ */${FILTERLUN}_writereq.value /p;
s/Busy Ticks SP A\:\ */${FILTERLUN}_busyticks_spa.value /p;
s/Idle Ticks SP A\:\ */${FILTERLUN}_idleticks_spa.value /p;
s/Busy Ticks SP B\:\ */${FILTERLUN}_busyticks_spb.value /p;
s/Idle Ticks SP B\:\ */${FILTERLUN}_idleticks_spb.value /p;
s/Sum of Outstanding Requests\:\ */${FILTERLUN}_outstandsum.value /p;
s/Non-Zero Request Count Arrivals\:\ */${FILTERLUN}_nonzeroreq.value /p;
s/Implicit Trespasses\:\ */${FILTERLUN}_implic_tr.value /p;
s/Explicit Trespasses\:\ */${FILTERLUN}_explic_tr.value /p;
' ;"
done <<< "$LUNLIST"
ANSWER=$(run_remote_ssh "$BIGSSHCMD")
# For debugging, inspect the generated command instead of running it:
#ANSWER=$BIGSSHCMD

# Print the collected "<lun>_<field>.value" lines of all LUNs for field $1
get_answer_field() {
    echo "$ANSWER" | grep -F "_${1}."
}

echo "multigraph emc_vnx_block_blocks"
get_answer_field "read"
get_answer_field "write"

echo -e "\nmultigraph emc_vnx_block_req"
get_answer_field "readreq"
get_answer_field "writereq"

echo -e "\nmultigraph emc_vnx_block_ticks"
while read -r LUN ; do
    LUN="$(clean_fieldname "$LUN")"
    # Munin computes these values later, using cdef
    echo "${LUN}_load_spa.value 0"
    echo "${LUN}_load_spb.value 0"
done <<< "$LUNLIST"
get_answer_field "busyticks_spa"
get_answer_field "idleticks_spa"
get_answer_field "busyticks_spb"
get_answer_field "idleticks_spb"

echo -e "\nmultigraph emc_vnx_block_outstanding"
get_answer_field "outstandsum"

echo -e "\nmultigraph emc_vnx_block_nonzeroreq"
get_answer_field "nonzeroreq"

echo -e "\nmultigraph emc_vnx_block_trespasses"
get_answer_field "implic_tr"
get_answer_field "explic_tr"

echo -e "\nmultigraph emc_vnx_block_queue"
# Queue Length
# get_answer_field matches "_<field>." exactly, so the per-SP tick fields
# have to be requested by their full names.
get_answer_field "busyticks_spa"
get_answer_field "idleticks_spa"
get_answer_field "busyticks_spb"
get_answer_field "idleticks_spb"
get_answer_field "outstandsum"
get_answer_field "nonzeroreq"
get_answer_field "readreq"
get_answer_field "writereq"
while read -r LUN ; do
    LUN="$(clean_fieldname "$LUN")"
    # Munin computes these values later, using cdef
    echo "${LUN}_ql_l_a.value 0"
    echo "${LUN}_ql_l_b.value 0"
done <<< "$LUNLIST"
exit 0