Nagios Installation and Configuration

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific prompts (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

Exercises Part I
----------------

0. Log in to your PC or open a terminal window as the sysadmin user.

   Note that at this stage, your PC is connected to BOTH networks.

   It will have two IP addresses, as explained by the instructor. Even so,
   use the *CURRENT* IP/name (pcX.mgmt) to contact your PC for now.

1. Install Nagios version 3. Do this as root, or as the sysadmin user using
   the "sudo" command:

    # apt-get install nagios3

   Unless you already have an MTA installed, nagios3 will install
   postfix as a dependency. Select the "Internet Site" option. (If you want
   to use a different MTA, install it before nagios3.)

   You will be prompted for the nagiosadmin password. Give it the normal
   workshop password.

   To get the documentation in /usr/share/doc/nagios3-doc/html/ (which
   can also be read via the Nagios web interface), do:

    # apt-get install nagios3-doc


2. Look at the file which contains the password. The password is stored
   hashed, not in plain text:

    # cat /etc/nagios3/htpasswd.users


3. You should already have a working Nagios!

    - Open a browser and go to

    http://pcX.mgmt/nagios3/

        Check with the instructor or your neighbor if you are in doubt.

    - At the login prompt, log in as:

        user: nagiosadmin
        pass: (the workshop password you gave during installation)

    Browse to the "Host Detail" page to see what's already configured.


4. Let's look at the configuration layout...

    # cd /etc/nagios3
    # ls -l

    -rw-r--r-- 1 root root    1882 2008-12-18 13:42 apache2.conf
    -rw-r--r-- 1 root root   10524 2008-12-18 13:44 cgi.cfg
    -rw-r--r-- 1 root root    2429 2008-12-18 13:44 commands.cfg
    drwxr-xr-x 2 root root    4096 2009-02-14 12:33 conf.d
    -rw-r--r-- 1 root root      26 2009-02-14 12:36 htpasswd.users
    -rw-r--r-- 1 root root   42539 2008-12-18 13:44 nagios.cfg
    -rw-r----- 1 root nagios  1293 2008-12-18 13:42 resource.cfg
    drwxr-xr-x 2 root root    4096 2009-02-14 12:32 stylesheets

    # cd conf.d
    # ls -l

    -rw-r--r-- 1 root root 1695 2008-12-18 13:42 contacts_nagios2.cfg
    -rw-r--r-- 1 root root  418 2008-12-18 13:42 extinfo_nagios2.cfg
    -rw-r--r-- 1 root root 1152 2008-12-18 13:42 generic-host_nagios2.cfg
    -rw-r--r-- 1 root root 1803 2008-12-18 13:42 generic-service_nagios2.cfg
    -rw-r--r-- 1 root root  210 2009-02-14 12:33 host-gateway_nagios3.cfg
    -rw-r--r-- 1 root root  976 2008-12-18 13:42 hostgroups_nagios2.cfg
    -rw-r--r-- 1 root root 2167 2008-12-18 13:42 localhost_nagios2.cfg
    -rw-r--r-- 1 root root 1005 2008-12-18 13:42 services_nagios2.cfg
    -rw-r--r-- 1 root root 1609 2008-12-18 13:42 timeperiods_nagios2.cfg

    Notice that the package installs files with "nagios2" in their name.
    This is because they are the same files as were used for the Nagios
    version 2 Debian package. However, the host-gateway configuration file
    was changed, so it has a new name.


5. You have a config which is already monitoring your own system
(localhost_nagios2.cfg) and your upstream default gateway
(host-gateway_nagios3.cfg).

Have a look at the config file for the default gateway: it's very simple.
(Note: tab completion is useful here. Type "cat host-g" then hit Tab; the
filename will be filled in for you.)

    # cat host-gateway_nagios3.cfg

    # a host definition for the gateway of the default route
    define host {
            host_name   gateway
            alias       Default Gateway
            address     10.10.X.254
            use         generic-host
            }

It should point to the virtual Cisco router which is upstream of your VM.


6. You should be ssh'd into your VM on its management address
   (10.10.0.x or pcX.mgmt). If so, it is safe to temporarily break its link to
   the outside world, which is now running via your virtual router.

   We will still be able to reach it on its management IP. (This is why we
   build separate management LANs :-)

   Break the connectivity like this, and double-check that you can no
   longer reach anything outside.

    # ifconfig eth2 down
    # ping 10.10.254.254    # should not get any response
    connect: Network is unreachable

   Now monitor your Nagios host detail page and refresh it from time to time.
   After a few minutes, you should see the problem detected and shown
   in Nagios (i.e. your router is no longer reachable).

   Once you've seen this, restore the connectivity on your VM:

    # ifconfig eth2 up
    # route add default gw 10.10.x.254    # the default GW is lost when you
                                          # do "ifconfig down" on Linux

   Then check that, within a few minutes, the problem clears and Nagios
   detects that your router is up again.


   Now, we will ask you to break the connection of your PC to the 10.10.0.0/24
   backbone:

    # ifconfig eth1 down

   You will LOSE your connection to your PC at this point, but you will STILL
   be able to reach it via its 10.10.x.1 IP.



PART II
Configuring Equipment
-----------------------------------------------------------------------------

0. Order of configuration

Conceptually, we will build our configuration files starting with the
"nearest" devices and then moving on to those further away.

By going in this order, every device that acts as a parent for other devices
will already be defined by the time you need it.

Your upstream Cisco virtual router (your default GW) is already defined.


1. Let's configure Nagios to start monitoring the classroom backbone router
and then the switch.


    # cd /etc/nagios3/conf.d/

Let's create the router:

        # joe routers.cfg

define host {
    use         generic-host
    host_name   bb-gw
    alias       backbone gw
    address     10.10.254.254
    parents     gateway
}

Now, add the switch:

    # joe switches.cfg

define host {
    use         generic-host
    host_name   bb-sw
    alias       backbone switch
    address     10.10.0.253
    parents     bb-gw
}

Notice the "parents" entry. This must point at a device or devices which are
also defined. "gateway" is already defined in host-gateway_nagios3.cfg, so
this will work.

We end up with this relationship from the point of view of Nagios:

bb-sw <---- bb-gw <---- gateway <---- PC-running-nagios

(This is reflected in the configuration above: bb-sw has parent bb-gw, and
bb-gw has parent gateway, which is *your* virtual router.)



STEPS 2a - 2c SHOULD BE REPEATED WHENEVER YOU UPDATE THE CONFIGURATION!


2a. Verify that your configuration files are OK:

    # nagios3 -v /etc/nagios3/nagios.cfg

    ... You should get:

    Warning: Host 'bb-sw' has no services associated with it!
    Warning: Host 'bb-gw' has no services associated with it!
    ...
    Total Warnings: 2
    Total Errors:   0

    Things look okay - No serious problems were detected during the check.

The warnings mean that Nagios considers it unusual to monitor a device just
for its existence on the network, without also monitoring some service on it.


2b. Reload/Restart Nagios

    # /etc/init.d/nagios3 restart

Using the "restart" option is not always 100% reliable, due to a bug in the
Nagios init script. To be sure, you may want to get used to doing:

    # /etc/init.d/nagios3 stop
    # /etc/init.d/nagios3 start


2c. Go to the web interface (http://pcX.mgmt/nagios3) and check that the hosts
   you just added are now visible in the interface. Click on the "Host Detail" item
   on the left of the Nagios screen to see this. You may see them in "PENDING"
   status until the first check is carried out.


HINT: You will be doing this a lot. If you do it all on one line, like this,
then you can hit cursor-up and rerun it all in one go:

    # nagios3 -v /etc/nagios3/nagios.cfg && /etc/init.d/nagios3 restart

The '&&' ensures that the restart only happens if the config is valid.
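
If you prefer the stop/start sequence recommended in 2b, the same idea can be
wrapped in a small shell function. This is only a sketch: the function name
nagios_reload and the NAGIOS_BIN, NAGIOS_CFG and NAGIOS_INIT variables are
hypothetical, with defaults taken from the paths used in this exercise.

```shell
# Hypothetical helper (not part of the nagios3 package): run the
# pre-flight check and, only if it passes, stop and start Nagios.
# Defaults are the Ubuntu nagios3 paths used in this exercise.
NAGIOS_BIN=${NAGIOS_BIN:-nagios3}
NAGIOS_CFG=${NAGIOS_CFG:-/etc/nagios3/nagios.cfg}
NAGIOS_INIT=${NAGIOS_INIT:-/etc/init.d/nagios3}

nagios_reload() {
    # Step 2a: verify the configuration; give up if it has errors
    if ! "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        echo "Configuration check failed; Nagios was NOT restarted." >&2
        return 1
    fi
    # Step 2b: stop + start instead of "restart"
    "$NAGIOS_INIT" stop
    "$NAGIOS_INIT" start
}
```

Save it in a file (for example ~/nagios-reload.sh, a hypothetical name), load
it with "# . ~/nagios-reload.sh" and then run "# nagios_reload" after each
change.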


3. Create entries for other routers and PCs in the classroom

Now that we have our routers and switches defined, it is quite easy to create
entries for all our PCs. Think about the parent relationships:

* The parent of the NOC is the backbone switch, and then the backbone router:

NOC <--- bb-sw <--- bb-gw

* The parent of one of your neighbors' PCs is THEIR router, then YOUR router:

 r16 <---- rX
 |         ^
 V         |
pc16     Nagios-on-your-PC

... where rX is *your* router.

If you are in doubt: DRAW this on paper!

Below are sample entries: one for the NOC, and a router/PC pair each for
group 1 and for a generic group Y. You should be able to use these examples
to create entries for all classroom PCs plus the NOC.

We could put these entries into separate files, but as our network is small
we'll use a single file called pcs.cfg.

NOTE! Do not add entries for your own PC or router. Your PC is already
defined in the file /etc/nagios3/conf.d/localhost_nagios2.cfg, and this
definition is what defines the Nagios network viewpoint. Your router is
already defined as "gateway" in host-gateway_nagios3.cfg. So, when you come
to the spot where you would add entries for your own group, skip it and go
on to the next group in the list.

        # joe pcs.cfg

# Our classroom NOC

define host {
    use         generic-host
    host_name   noc
    alias       Workshop NOC machine
    address     10.10.0.200
    parents     bb-sw
}

# Group 1 devices

define host {
    use         generic-host
    host_name   r1
    alias       pc1 router
    address     10.10.254.1
    parents     gateway
}
define host {
    use         generic-host
    host_name   pc1
    alias       pc1 outside interface
    address     10.10.1.1
    parents     r1
}

...

# Group Y devices

define host {
    use         generic-host
    host_name   rY
    alias       pcY router
    address     10.10.254.Y
    parents     gateway
}
define host {
    use         generic-host
    host_name   pcY
    alias       pcY outside interface
    address     10.10.Y.1
    parents     rY
}

Using the entries above as a template, create the remaining entries for
all active groups.

Remember, not ALL PCs and routers are allocated, so check with the instructor
which ones to monitor, or if you have any questions about IP addresses, etc.

You can review the Network Diagram for the class linked off the classroom wiki
main page.
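
Because every group follows the same pattern, the router/PC pairs can also be
generated with a small shell function and pasted into pcs.cfg. This is a
sketch that assumes the addressing shown above (router rY at 10.10.254.Y, PC
pcY at 10.10.Y.1 with parent rY); gen_group_hosts is a hypothetical name, and
the group numbers you pass must be the active groups confirmed by the
instructor. Your own group is skipped, and the NOC entry is still added by
hand.

```shell
# Hypothetical generator for the group entries in pcs.cfg.
# Usage: gen_group_hosts MY_GROUP GROUP...
# Assumes router rY = 10.10.254.Y (parent "gateway") and
# PC pcY = 10.10.Y.1 (parent rY), as in the sample entries.
gen_group_hosts() {
    my_group=$1
    shift
    for y in "$@"; do
        # Your own router and PC are already defined ("gateway", "localhost")
        if [ "$y" = "$my_group" ]; then
            continue
        fi
        cat <<EOF
# Group $y devices

define host {
    use         generic-host
    host_name   r$y
    alias       pc$y router
    address     10.10.254.$y
    parents     gateway
}
define host {
    use         generic-host
    host_name   pc$y
    alias       pc$y outside interface
    address     10.10.$y.1
    parents     r$y
}

EOF
    done
}
```

For example, if you are in group 3 and groups 1 to 6 are active, run
"$ gen_group_hosts 3 1 2 3 4 5 6 > /tmp/groups.cfg", check the result, and
copy it into pcs.cfg.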


Save the file pcs.cfg and exit.

As before, repeat steps 2a-2c to verify your configuration, correct any
errors, and activate it.


4. Look at your Nagios instance on the web. Note that "Status Map" gives
you a graphical view of the parent-child relationships you have defined.


PART III
Configure Service check for the classroom NOC
-----------------------------------------------------------------------------

0. Configuring

Now that we have our hardware configured, we can start telling Nagios what
services to monitor on the configured hardware, how to group the hardware in
interesting ways, how to group services, etc.

1. Associate a service check with our classroom NOC

    # joe hostgroups_nagios2.cfg

    - Find the hostgroup named "ssh-servers". In the members section of the
      definition, change the line:

members                 localhost

    to

members                 localhost,noc

Save the file and exit.

Verify that your changes are OK:

        # nagios3 -v /etc/nagios3/nagios.cfg

Restart Nagios to see the new service association with the noc host:

        # /etc/init.d/nagios3 restart

Click on the "Service Detail" link in the Nagios web interface to see your new entry.


PART IV
Defining Services for all PCs
-----------------------------------------------------------------------------

0. For services, the default normal_check_interval is 5 (minutes) in
   generic-service_nagios2.cfg. You may wish to change this to 1 to speed up
   how quickly service issues are detected, at least in the workshop.
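
If you would rather make that change from the command line, a sed one-liner
wrapped in a function is one option. It is a sketch that assumes
normal_check_interval appears at the start of its own line followed by a
number, as in the Ubuntu generic-service_nagios2.cfg; set_check_interval is a
hypothetical name, and sed keeps the original as a .bak copy.

```shell
# Hypothetical helper: set normal_check_interval in a Nagios template file.
# Usage: set_check_interval FILE MINUTES   (original kept as FILE.bak)
set_check_interval() {
    file=$1
    minutes=$2
    # \1 keeps the directive name and the whitespace after it; only the
    # number that follows is replaced (any trailing comment is untouched).
    sed -i.bak "s/^\([[:space:]]*normal_check_interval[[:space:]]*\)[0-9][0-9]*/\1$minutes/" "$file"
}
```

For example: "# set_check_interval /etc/nagios3/conf.d/generic-service_nagios2.cfg 1",
followed by steps 2a-2c.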

1. Determine what services to define for what devices

   - This is core to how you use Nagios and network monitoring tools in
     general. So far we are simply using ping to verify that physical hosts
     are up on our network, and we have started monitoring a single service on
     a single host (the NOC). The next step is to decide what services you wish
     to monitor for each host in the classroom.

   - In this particular class we have:

     routers:  running ssh and snmp
     switches: running telnet and possibly ssh, as well as snmp
     pcs:      All PCs are running ssh and http and should be running snmp
               The NOC is currently running an snmp daemon

     So, let's configure Nagios to check for these services on these
     devices.

2. Verify that SSH is running on the routers and workshop PCs

   - In the file services_nagios2.cfg there is already an entry for the SSH
     service check, so you do not need to create one. Instead, you
     simply need to re-define the "ssh-servers" entry in the file
     /etc/nagios3/conf.d/hostgroups_nagios2.cfg. After Part III the entry
     looks like:

# A list of your ssh-accessible servers
define hostgroup {
        hostgroup_name  ssh-servers
                alias           SSH servers
                members         localhost,noc
        }

     What do you think you should change? Correct, the "members" line. You should
     add entries for all the classroom pcs, routers and the switches that run ssh.
     With this information and the network diagram you should be able to complete
     this entry.

     The entry will look something like this:

define hostgroup {
        hostgroup_name  ssh-servers
                alias           SSH servers
                members         localhost,pc1,pc2,pc3,pc4....,bb-gw
        }

         Note: leave in "localhost" - this is your PC and represents Nagios' network
         point of view. So, for instance, if you are on "pc3" you would not include
         "pc3" in the list of all the classroom pcs, as it is represented by the
         "localhost" entry.

         The "members" entry will be a long line and will likely wrap on the screen.

         Remember to include all your PCs.

    - Once you are done, run the pre-flight check:

    # nagios3 -v /etc/nagios3/nagios.cfg

    If everything looks good, then restart Nagios

    # /etc/init.d/nagios3 stop
    # /etc/init.d/nagios3 start

    and view your changes in the Nagios web interface.
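
Typing the long members value by hand is error-prone. The small function below
is one way to build it as a single line, starting with localhost and leaving
out your own PC. It is a sketch: build_members is a hypothetical name, and the
host names you pass must match the host_name values you defined earlier.

```shell
# Hypothetical helper: print a comma-separated "members" value on one line.
# Usage: build_members MY_PC HOST...
# "localhost" is always first; MY_PC is skipped because localhost already
# represents it.
build_members() {
    my_pc=$1
    shift
    list=localhost
    for h in "$@"; do
        if [ "$h" = "$my_pc" ]; then
            continue
        fi
        list="$list,$h"
    done
    printf '%s\n' "$list"
}
```

For example, on pc3, "$ build_members pc3 noc pc1 pc2 pc3 pc4 bb-gw" prints
"localhost,noc,pc1,pc2,pc4,bb-gw", ready to paste after "members".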

3. Check that http is running on all the classroom PCs.

    - This is almost identical to the previous exercise. Just add each PC (no
      routers or switches) to the members of the "http-servers" hostgroup in
      hostgroups_nagios2.cfg, which the HTTP service check uses. Remember, you
      don't need to add your machine, as it is already defined as "localhost".

4.  OPTIONAL EXTRA: as opposed to just checking that a web server is
     running on the classroom PCs, you could also check that the nagios3
     service is available, by requesting the /nagios3/ path. This means
     passing extra options to the check_http plugin.

     For a description of the available options, type this:

      # /usr/lib/nagios/plugins/check_http
      # /usr/lib/nagios/plugins/check_http --help

     and of course you can browse the online Nagios documentation or google
     for information on check_http. You can even run the plugin by hand to
     perform a one-shot service check:

     # /usr/lib/nagios/plugins/check_http -H localhost -u /nagios3/

     So the goal is to configure Nagios to call check_http in this way, for
     example with these definitions:

define command{
        command_name    check_http_arg
        command_line    /usr/lib/nagios/plugins/check_http -H '$HOSTADDRESS$' $ARG1$
        }

define service {
        hostgroup_name                  nagios-servers
        service_description             NAGIOS
        check_command                   check_http_arg!-u /nagios3/
        use                             generic-service
}

     and of course you'll need to create a hostgroup called nagios-servers to
     link to this service check.

     Once you have done this, check that Nagios warns you about failing
     authentication (because it's trying to fetch the page without providing
     the username/password). There's an extra parameter you can pass to
     check_http_arg to provide that info; see if you can find it.

      WARNING: in the tradition of "Debian Knows Best", their definition of the
      check_http command in /etc/nagios-plugins/config/http.cfg
      is *not* the same as that recommended in the nagios3 documentation.
      It is missing $ARG1$, so any parameters you pass to check_http are
      ignored. So you might think you are monitoring /nagios3/ but actually
      you are monitoring the root page (/)!

     This is why we had to make a new command definition "check_http_arg".
     You could make a more specific one like "check_nagios", or you could
     modify the Ubuntu check_http definition to fit the standard usage.
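
For the nagios-servers hostgroup mentioned above, one possible definition is
sketched below, printed by a small shell function so you can paste it into
hostgroups_nagios2.cfg. The function name and the alias text are assumptions;
which hosts to list is up to you (only machines where you expect /nagios3/ to
answer).

```shell
# Hypothetical helper: print a "nagios-servers" hostgroup for the NAGIOS
# service defined above. Usage: nagios_servers_hostgroup MEMBERS
# MEMBERS is a single comma-separated list, e.g. localhost,pc1,pc2
nagios_servers_hostgroup() {
    cat <<EOF
# Hosts on which we expect the Nagios web interface at /nagios3/
define hostgroup {
        hostgroup_name  nagios-servers
                alias           Nagios servers
                members         $1
        }
EOF
}
```

For example, "$ nagios_servers_hostgroup localhost,pc1,pc2" prints a
definition with those three members on a single line.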



PART V
Create More Host Groups
-----------------------------------------------------------------------------

0. In the web view, look at the pages "Hostgroup Overview", "Hostgroup
   Summary", "Hostgroup Grid". These give a convenient way to group together
   hosts which are related (e.g. in the same site, serving the same purpose).

1. Update /etc/nagios3/conf.d/hostgroups_nagios2.cfg

    - For the following exercises it will be very useful to create or
      update the following hostgroups:

      debian-servers
      routers
      switches

      If you edit the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg you
      will see an entry for debian-servers that just contains localhost.
      Update this entry to include all the classroom PCs, including the
      noc (this assumes that you created a "noc" entry in your pcs.cfg
      file). Remember to skip your PC entry, as it is represented by the
      localhost entry.

    # joe /etc/nagios3/conf.d/hostgroups_nagios2.cfg

     Update the entry that says:

# A list of your Debian GNU/Linux servers
define hostgroup {
        hostgroup_name  debian-servers
                alias           Debian GNU/Linux Servers
                members         localhost
        }

      so that the "members" parameter contains something like this. Use your
      classroom network diagram to confirm the exact number of machines and names
      in your workshop.

                members         localhost,noc,pc1,pc2,pc3,pc4,pc5,pc6,pc7,pc8,pc9,pc10,pc11,pc12,pc13,pc14,pc15,pc16,pc17,pc18

        Be sure that the line wraps and is not on two separate lines. Otherwise
        you will get an error when you go to restart Nagios. Remember that
        your own PC is "localhost".

      - Once you have done this, add two more host groups, one for routers and
        one for switches. Call these entries "routers" and "switches".

      - When you are done, be sure to verify your work and restart Nagios.
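
A common mistake in this part is a "members" (or command_line) value that ends
up split over two lines. The awk script below is a rough lint for that, a
sketch under the assumption that inside each define block every real line is
either "directive value", a comment, or the closing brace. find_split_lines is
a hypothetical name, and the "nagios3 -v" pre-flight check remains the
authoritative test.

```shell
# Hypothetical lint: report lines inside "define ... {" blocks that do not
# start with a directive name followed by whitespace. Such lines are usually
# the tail of a members or command_line value split across two lines.
# Usage: find_split_lines FILE...   (exit status 1 if anything is reported)
find_split_lines() {
    awk '
        FNR == 1 { inblock = 0 }
        /^[ \t]*define[ \t]+[a-z]+[ \t]*[{]/ { inblock = 1; next }
        inblock && /^[ \t]*[}]/  { inblock = 0; next }
        inblock && /^[ \t]*$/    { next }   # blank line
        inblock && /^[ \t]*[#;]/ { next }   # comment line
        inblock && !/^[ \t]*[A-Za-z0-9_]+[ \t]/ {
            printf "%s:%d: %s\n", FILENAME, FNR, $0
            bad = 1
        }
        END { exit bad }
    ' "$@"
}
```

For example, "# find_split_lines /etc/nagios3/conf.d/*.cfg" prints file name,
line number and text for each suspicious line.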

2. Go back to the web interface and look at your new hostgroups.


PART VI
Extended Host Information ("making your graphs pretty")
-----------------------------------------------------------------------------

1. Update extinfo_nagios2.cfg

    - If you would like to use appropriate icons for your defined hosts in
      Nagios, this is where you set them up. We have three types of devices:

      Cisco routers
      Cisco switches
      Ubuntu servers

      There is a fairly large repository of icon images available for you to
      use, located here:

      /usr/share/nagios/htdocs/images/logos/

      These were installed by default as dependent packages of the nagios3
      package in Ubuntu. In some cases you can find model-specific icons for
      your hardware, but to make things simpler we will use the following
      icons for our hardware:

      /usr/share/nagios/htdocs/images/logos/base/debian.*
      /usr/share/nagios/htdocs/images/logos/cook/router.*
      /usr/share/nagios/htdocs/images/logos/cook/switch.*

    - The next step is to edit the file /etc/nagios3/conf.d/extinfo_nagios2.cfg
      and tell Nagios what image you would like to use to represent your devices.

    # joe /etc/nagios3/conf.d/extinfo_nagios2.cfg

      Here is what an entry for your routers looks like (there is already an entry
      for debian-servers that will work as is). Note that the router model (3600)
      is not all that important. The image used represents a router in general.

define hostextinfo {
        hostgroup_name   routers
        icon_image       cook/router.png
        icon_image_alt   Cisco Routers (3600)
        vrml_image       router.png
        statusmap_image  cook/router.gd2
}

      Now add an entry for your switches. Once you are done, check your
      work and restart Nagios. Take a look at the Status Map in the web interface.
      It should be much nicer, with real icons instead of question marks.
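
Before restarting, you can check that the image files named in your
hostextinfo entries really exist; the image paths are relative to the logos
directory. The function below is a sketch: check_icons is a hypothetical name,
it only looks at icon_image and statusmap_image lines, and it assumes the
Ubuntu logos path listed above unless you pass another directory.

```shell
# Hypothetical check: report whether each icon_image / statusmap_image
# named in a hostextinfo file exists under the logos directory.
# Usage: check_icons FILE [LOGOS_DIR]   (exit status 1 if any are missing)
check_icons() {
    cfg=$1
    logos=${2:-/usr/share/nagios/htdocs/images/logos}
    missing=0
    # awk prints the value (2nd field) of each icon_image/statusmap_image line
    for img in $(awk '$1 == "icon_image" || $1 == "statusmap_image" { print $2 }' "$cfg"); do
        if [ -f "$logos/$img" ]; then
            echo "ok       $img"
        else
            echo "MISSING  $img"
            missing=1
        fi
    done
    return "$missing"
}
```

For example: "# check_icons /etc/nagios3/conf.d/extinfo_nagios2.cfg".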


PART VII
Create Service Groups
-----------------------------------------------------------------------------

1. Create service groups for ssh and http for each set of pcs.

   - The idea here is to create several service groups, each covering one
     section of the classroom (for example, a quarter of the PCs). We want to
     see these PCs grouped together, including the status of their ssh and
     http services. To do this, create and edit the file:

   # joe /etc/nagios3/conf.d/servicegroups.cfg

     Here is a sample of the service group for group 1:

define servicegroup {
        servicegroup_name       group1-servers
        alias                   group 1 servers
        members                 pc1,SSH,pc1,HTTP,pc2,SSH,pc2,HTTP,pc3,SSH,pc3,HTTP,pc4,SSH,pc4,HTTP
        }

        - Note that the members line should wrap and not be on two lines.

        - Note that "SSH" and "HTTP" need to be uppercase, as this is how the
          service_description is written in the file
          /etc/nagios3/conf.d/services_nagios2.cfg

        - You should create an entry for the other groups of servers too.

    - Save your changes, verify your work and restart Nagios. Now if you click on
      the Servicegroup menu items in the Nagios web interface you should see
      this information grouped together.
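
The members value of a servicegroup is a list of host,service pairs, which
gets long quickly. One way to generate it is sketched below;
servicegroup_members is a hypothetical name, and the service names must match
the service_description values exactly (SSH, HTTP). If your own PC is in the
group, pass localhost instead of its pcX name.

```shell
# Hypothetical helper: print servicegroup members as host,service pairs
# on one line. Usage: servicegroup_members "SERVICE..." HOST...
servicegroup_members() {
    services=$1      # space-separated, e.g. "SSH HTTP"
    shift
    out=
    for host in "$@"; do
        for svc in $services; do
            # ${out:+$out,} adds a comma only after the first pair
            out="${out:+$out,}$host,$svc"
        done
    done
    printf '%s\n' "$out"
}
```

For example, '$ servicegroup_members "SSH HTTP" pc1 pc2 pc3 pc4' prints the
members line of the group 1 sample.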



PART VIII
Configure Guest Access to the Nagios Web Interface
-----------------------------------------------------------------------------

1. Edit /etc/nagios3/cgi.cfg to give a read-only guest user access to the Nagios
   web interface.

    - By default Nagios is configured to give full r/w access via the Nagios
      web interface to the user nagiosadmin. You can change the name of this
      user, add other users, change how you authenticate users, what users
      have access to what resources and more via the cgi.cfg file.

    - First, let's create a "guest" user and password in the htpasswd.users
      file.

    # htpasswd /etc/nagios3/htpasswd.users guest

      You can use any password you want (or none). A password of "guest" is
      not a bad choice.

    - Next, edit the file /etc/nagios3/cgi.cfg and look for what type of access
      has been given to the nagiosadmin user. By default you will see the following
      directives (note, there are comments between each directive):

      authorized_for_system_information=nagiosadmin
      authorized_for_configuration_information=nagiosadmin
      authorized_for_system_commands=nagiosadmin
      authorized_for_all_services=nagiosadmin
      authorized_for_all_hosts=nagiosadmin
      authorized_for_all_service_commands=nagiosadmin
      authorized_for_all_host_commands=nagiosadmin

      Now let's tell Nagios to allow the "guest" user some access to
      information via the web interface. You can choose whatever you would
      like, but what is pretty typical is this:

      authorized_for_system_information=nagiosadmin,guest
      authorized_for_configuration_information=nagiosadmin,guest
      authorized_for_system_commands=nagiosadmin
      authorized_for_all_services=nagiosadmin,guest
      authorized_for_all_hosts=nagiosadmin,guest
      authorized_for_all_service_commands=nagiosadmin
      authorized_for_all_host_commands=nagiosadmin

    - Once you make the changes, save the file cgi.cfg, verify your
      work and restart Nagios.

    - To log in as the "guest" user you may need to make your browser forget
      the saved nagiosadmin login (for example by clearing its saved logins or
      restarting it). You will not notice any difference in the web interface.
      The difference is that a number of items that are available via the web
      interface (forcing a service/host check, scheduling checks, comments,
      etc.) will not work for the guest user.
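
The cgi.cfg changes above can also be scripted. This sed-based function is a
sketch: grant_readonly is a hypothetical name, it appends the user only to the
four read-only directives shown in the typical example (leaving the *_commands
directives for nagiosadmin), it assumes each directive starts at the beginning
of its line, and it keeps the original as cgi.cfg.bak. Running it twice does
not add the user twice.

```shell
# Hypothetical helper: give USER read-only access by appending ",USER" to
# the four read-only authorized_for_* directives in cgi.cfg.
# Usage: grant_readonly CGI_CFG USER   (original kept as CGI_CFG.bak)
grant_readonly() {
    cfg=$1
    user=$2
    # Outer address: one of the read-only directives (commented lines are skipped).
    # Inner test: leave the line alone if USER is already in the list.
    sed -i.bak -E '/^authorized_for_(system_information|configuration_information|all_services|all_hosts)=/{/[=,]'"$user"'(,|$)/!s/$/,'"$user"'/;}' "$cfg"
}
```

For example: "# grant_readonly /etc/nagios3/cgi.cfg guest", then verify your
work and restart Nagios as usual.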


OPTIONAL
--------

* Check that SNMP is running on the classroom NOC

    - First you will need to add the appropriate service check for SNMP in the file
      /etc/nagios3/conf.d/services_nagios2.cfg. This is where Nagios is impressive. There
      are hundreds, if not thousands, of service checks available via the various Nagios
      sites on the web. You can see what plugins are installed by Ubuntu in the nagios3
      package that we've installed by looking in the following directory:

    # ls /usr/lib/nagios/plugins

      As you'll see, there is already a check_snmp plugin available to us. If you are
      interested in the options the plugin takes, you can execute the plugin from the
      command line by typing:

    # /usr/lib/nagios/plugins/check_snmp
    # /usr/lib/nagios/plugins/check_snmp --help

      to see what options are available, etc. You can use the check_snmp plugin and
      Nagios to create very complex or specific system checks.

    - Now, to see all the various service/host checks that have been created using the
      check_snmp plugin, you can look in /etc/nagios-plugins/config/snmp.cfg. You will
      see that there are a lot of preconfigured checks using snmp, including:

      snmp_load
      snmp_cpustats
      snmp_procname
      snmp_disk
      snmp_mem
      snmp_swap
      snmp_procs
      snmp_users
      snmp_mem2
      snmp_swap2
      snmp_mem3
      snmp_swap3
      snmp_disk2
      snmp_tcpopen
      snmp_tcpstats
      snmp_bgpstate
      check_netapp_uptime
      check_netapp_cupuload
      check_netapp_numdisks
      check_compaq_thermalCondition

      And, even better, you can create additional service checks quite easily.
      For the case of verifying that snmpd (the SNMP service on Linux) is running, we
      need to ask SNMP a question. If we don't get an answer, then Nagios can assume
      that the SNMP service is down on that host. When you use service checks such as
      check_http, check_ssh and check_telnet this is what they are doing as well.

    - In our case, let's create a new check command and call it "check_system". This
      command will connect to the specified host, use the private community string
      we have defined in class, and ask a question of SNMP on that host; in this
      case we'll ask for the System Description, or the OID "sysDescr.0".

    - To do this, start by editing the file /etc/nagios-plugins/config/snmp.cfg:

    # joe /etc/nagios-plugins/config/snmp.cfg

      At the top (or the bottom, your choice) add the following entry to the file:

# 'check_system' command definition
define command{
       command_name    check_system
       command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -C '$ARG1$' -o sysDescr.0
        }

      You may wish to copy and paste this rather than trying to type it out.

          Note that "command_line" is a single line. If you copy and paste in joe the line
          may not wrap properly and you may have to manually add the part:

                        '$ARG1$' -o sysDescr.0

          to the end of the line.

    - Now you need to edit the file /etc/nagios3/conf.d/services_nagios2.cfg and add
      this service check. Eventually we'll run this check against all the servers in
      the classroom, but for now we'll attach it to a new hostgroup, "snmp-servers".

    - Edit the file /etc/nagios3/conf.d/services_nagios2.cfg

    # joe /etc/nagios3/conf.d/services_nagios2.cfg

      At the bottom of the file add the following definition:

# check that snmp is up on all servers
define service {
        hostgroup_name                  snmp-servers
        service_description             SNMP
        check_command                   check_system!xxxxxx
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}

      The "xxxxxx" is the community string previously (or to be) defined in class.

      Note that we pass our private community string here, rather than hard-coding
      it in the snmp.cfg file earlier. You must change the "xxxxxx" to the snmp
      community string given in class or this check will not work.

    - Now we must create the "snmp-servers" group in our hostgroups_nagios2.cfg file.
      Edit the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg and go to the end of the
      file. Add the following hostgroup definition:

# A list of snmp-enabled devices on which we wish to run the snmp service check
define hostgroup {
           hostgroup_name       snmp-servers
                   alias        snmp servers
                   members      noc
          }

        - Note that you could also add the switches and routers for groups 1 and 2
          to "members". However, the object we are checking ("sysDescr.0") may not
          be available on the switches and/or routers, so the check would then fail.

    - Now verify that your changes are correct and restart Nagios.

    - If you click on the Service Detail menu choice in the web interface you should see
      the SNMP check appear for the noc host.

    - After we do the SNMP presentation and exercises in class, you could come
      back to this exercise and add all the classroom PCs to the members list of the
      snmp-servers hostgroup definition in the hostgroups_nagios2.cfg file. Remember
      to list your PC as "localhost".
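
It is worth running the check by hand before (or after) wiring it into Nagios.
Nagios plugins report their result through their exit status (0 = OK,
1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and print a one-line message. The
sketch below runs check_snmp with the same arguments as check_system and
translates the exit status. The function names are hypothetical, CHECK_SNMP
defaults to the plugin path used above, and "xxxxxx" still stands for the
class community string.

```shell
# Hypothetical helpers: run check_snmp the way the check_system command does
# and show which Nagios state its exit status corresponds to.
CHECK_SNMP=${CHECK_SNMP:-/usr/lib/nagios/plugins/check_snmp}

# Nagios plugin convention: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
plugin_state() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "unexpected exit status $1" ;;
    esac
}

# Usage: snmp_check_by_hand HOST COMMUNITY
snmp_check_by_hand() {
    # The plugin prints its own status line first
    if "$CHECK_SNMP" -H "$1" -C "$2" -o sysDescr.0; then
        rc=0
    else
        rc=$?
    fi
    echo "$1: $(plugin_state "$rc")"
}
```

For example: "$ snmp_check_by_hand noc xxxxxx", using the community string
given in class.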