Agenda: exercises-nagios.txt

File exercises-nagios.txt, 24.0 KB (added by nocadmin, 9 years ago)

Nagios Install and configuration Exercises

Registry Operations Curriculum
Nagios Installation and Configuration

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

Exercises Part I
----------------

0. Log in to your PC or open a terminal window as the tldadmin user.


1. Install Nagios version 3. Do this as root, or as the tldadmin user with
   the "sudo" command:

    # apt-get install nagios3

   Nagios version 3 is already installed, but you can still run the command.


2. Create the Web user password file:

    # htpasswd -c /etc/nagios3/htpasswd.users nagiosadmin

    New password:
    Re-type new password:

   We suggest you use your standard user password used in class.


3. You should already have a working Nagios!

    - Open a browser, and go to

    http://localhost/nagios3/

    - At the login prompt, log in as:

        user: nagiosadmin
        pass: (the password you set in step 2)

4. Let's look at the interface together, then at the configuration files:

    # cd /etc/nagios3/

    # ls -l
    -rw-r--r-- 1 root root    1882 2008-12-18 13:42 apache2.conf
    -rw-r--r-- 1 root root   10524 2008-12-18 13:44 cgi.cfg
    -rw-r--r-- 1 root root    2429 2008-12-18 13:44 commands.cfg
    drwxr-xr-x 2 root root    4096 2009-02-14 12:33 conf.d
    -rw-r--r-- 1 root root      26 2009-02-14 12:36 htpasswd.users
    -rw-r--r-- 1 root root   42539 2008-12-18 13:44 nagios.cfg
    -rw-r----- 1 root nagios  1293 2008-12-18 13:42 resource.cfg
    drwxr-xr-x 2 root root    4096 2009-02-14 12:32 stylesheets

    # ls -l conf.d/

    -rw-r--r-- 1 root root 1695 2008-12-18 13:42 contacts_nagios2.cfg
    -rw-r--r-- 1 root root  418 2008-12-18 13:42 extinfo_nagios2.cfg
    -rw-r--r-- 1 root root 1152 2008-12-18 13:42 generic-host_nagios2.cfg
    -rw-r--r-- 1 root root 1803 2008-12-18 13:42 generic-service_nagios2.cfg
    -rw-r--r-- 1 root root  210 2009-02-14 12:33 host-gateway_nagios3.cfg
    -rw-r--r-- 1 root root  976 2008-12-18 13:42 hostgroups_nagios2.cfg
    -rw-r--r-- 1 root root 2167 2008-12-18 13:42 localhost_nagios2.cfg
    -rw-r--r-- 1 root root 1005 2008-12-18 13:42 services_nagios2.cfg
    -rw-r--r-- 1 root root 1609 2008-12-18 13:42 timeperiods_nagios2.cfg

    Notice that most files in the conf.d directory still carry the "_nagios2"
    suffix. The package did not rename them: they are the same files used by
    the Nagios version 2 Ubuntu package. Only the host-gateway configuration
    file was updated, so it has been renamed with the "_nagios3" suffix.
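
    Nagios reads the files in conf.d because the main configuration file
    names that directory. The sketch below is one way to check this on your
    machine. The helper name list_cfg_sources is hypothetical, and it assumes
    the Ubuntu package uses cfg_dir and cfg_file lines in nagios.cfg to say
    where object definitions live.

```shell
# list_cfg_sources FILE
# Print the cfg_file= and cfg_dir= directives found in a Nagios main
# configuration file. Hypothetical helper for illustration only; it is not
# part of the nagios3 package.
list_cfg_sources() {
    grep -E '^[[:space:]]*cfg_(file|dir)[[:space:]]*=' "$1"
}

# Example, using the path from this exercise:
#   list_cfg_sources /etc/nagios3/nagios.cfg
```

    Any directory printed on a cfg_dir line is scanned for .cfg files, which
    is why new files such as routers.cfg need no extra registration.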

PART II
Configuring Equipment
-----------------------------------------------------------------------------

0. Order of configuration

Conceptually we will build our configuration files from the "top" of our network down. That
is, we define entries for our gateway router and switch first, then our group routers and switches.
Once we have these entries we will add an entry for our NOC machine, then pc1, pc2, pc3, etc.

By going in this order you will have defined the devices that act as parents for other devices
before you refer to them.


1. Let's configure Nagios to start monitoring our classroom gateway router:

    # cd /etc/nagios3/conf.d/

    # vi routers.cfg

define host {
    use         generic-host
    host_name   bb-gw
    alias       aroc border router
    address     10.10.10.254
}

   Now define entries for our two group routers:

define host {
    use         generic-host
    host_name   router1
    alias       aroc-en router 1 router
    address     10.10.10.21
}

define host {
    use         generic-host
    host_name   router2
    alias       aroc-en router 2 router
    address     10.10.10.22
}

Save and exit from the file /etc/nagios3/conf.d/routers.cfg


2. Configure our classroom switches

Now that we have our routers configured we can configure our switches. Note that each
switch will have a parent relationship with the router next to it.

        # vi switches.cfg

First the switch on our classroom backbone:

define host {
    use         generic-host
    host_name   bb-sw
    alias       cctld backbone switch
    address     10.10.10.253
    parents     bb-gw
}

Notice the "parents" entry. You can only add the parent entry once you have a definition for
"bb-gw". We did this in our routers.cfg file, so this will work.

Save and exit from the file switches.cfg


3. Update the file routers.cfg with parents

The border router does not have a parent for purposes of our class. In reality it does, but
you have to stop your monitoring somewhere.

Our two group routers, however, now have a parent defined. This is the backbone switch. We
need to update our group router entries to look like this:

define host {
    use         generic-host
    host_name   router1
    alias       aroc-en router 1 router
    address     10.10.10.21
    parents     bb-sw
}

define host {
    use         generic-host
    host_name   router2
    alias       aroc-en router 2 router
    address     10.10.10.22
    parents     bb-sw
}

Save and exit from the file routers.cfg


4. Create entries for each PC in the classroom

Now that we have our routers and switches defined it is quite easy to create entries for all our
PCs. Think about the parent relationships. The parent of the NOC is the backbone switch. The
parent of pc1 through pc9 is the switch for group 1. The parent for pc10-18 is the switch for
group 2.

Below are three sample entries: one for the NOC, one for a group 1 PC and one for a group 2 PC.
You should be able to use this example to create entries for all 18 classroom PCs plus the NOC.

We could put these entries into separate files, but as our network is small we'll use a single
file called pcs.cfg.

NOTE! You do not add in an entry for your PC. This has already been defined in the file
/etc/nagios3/conf.d/localhost_nagios2.cfg. This definition is what defines the Nagios
network viewpoint. So, when you come to the spot where you might add an entry for your PC
you should skip this and go on to the next PC in the list (i.e. from pc1 to pc18).

        # vi pcs.cfg

# Our classroom NOC

define host {
    use         generic-host
    host_name   noc
    alias       aroc NOC machine
    address     10.10.10.10
    parents     bb-sw
}

# Group 1 PCs

define host {
    use         generic-host
    host_name   tld1
    alias       tld1 aroc-en cctld
    address     10.10.10.41
    parents     router1
}

# Group 2 PCs

define host {
    use         generic-host
    host_name   tld14
    alias       tld14 aroc-en cctld
    address     10.10.10.54
    parents     router2
}

Take the three entries above and expand them to create the remaining entries for pc1-pc9 and
for pc10-pc18. If you have any questions about IP addresses, etc. you can review the Network
Diagram for the class linked off the classroom wiki main page at http://localhost/trac/.
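
Typing many near-identical host entries by hand is error-prone, so a small shell loop can
print them for you to review and paste. This is only a sketch: the function name gen_host is
hypothetical, and the address rule (10.10.10.40 plus the host number) is an assumption
inferred from the two samples above (tld1 is .41, tld14 is .54). Confirm names, addresses
and parents against the network diagram before using the output.

```shell
# gen_host PREFIX NUMBER PARENT
# Print one Nagios host definition in the style used in pcs.cfg.
# ASSUMPTION: the address is 10.10.10.(40 + NUMBER), inferred from the
# sample entries; it is not stated anywhere in the exercise.
gen_host() {
    prefix=$1
    num=$2
    parent=$3
    printf 'define host {\n'
    printf '    use         generic-host\n'
    printf '    host_name   %s%s\n' "$prefix" "$num"
    printf '    alias       %s%s aroc-en cctld\n' "$prefix" "$num"
    printf '    address     10.10.10.%s\n' "$((40 + num))"
    printf '    parents     %s\n' "$parent"
    printf '}\n\n'
}

# Print entries for hosts 2 and 3 behind router1. Remember to leave out
# your own PC, which is already covered by "localhost".
for i in 2 3; do
    gen_host tld "$i" router1
done
```

Redirecting the output to a scratch file and copying from there keeps pcs.cfg untouched
until you have checked every entry.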

Save and exit from the file pcs.cfg

Now let's verify that our initial Nagios configuration is working:


5. Verify that your configuration files are OK:

    # nagios3 -v /etc/nagios3/nagios.cfg

    ... You should get:

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the check.


6. Reload/Restart Nagios

    # /etc/init.d/nagios3 restart

The "restart" option is not always 100% reliable due to a bug in the Nagios init script.
To be sure, you may want to get used to doing:

    # /etc/init.d/nagios3 stop
    # /etc/init.d/nagios3 start
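
Because you will repeat "verify, then stop, then start" after every change in the later
parts, it can help to wrap the three commands so that Nagios is only restarted when the
pre-flight check passes. The following is a minimal sketch, not part of the Nagios package:
the function name verify_and_restart and the NAGIOS_BIN and NAGIOS_INIT variables are
hypothetical names introduced here, with defaults taken from the commands above.

```shell
# verify_and_restart [CONFIG]
# Run the Nagios pre-flight check and, only if it succeeds, stop and then
# start Nagios. NAGIOS_BIN and NAGIOS_INIT are hypothetical override
# variables; their defaults are the commands used in this exercise.
verify_and_restart() {
    cfg=${1:-/etc/nagios3/nagios.cfg}
    if "${NAGIOS_BIN:-nagios3}" -v "$cfg" >/dev/null; then
        "${NAGIOS_INIT:-/etc/init.d/nagios3}" stop
        "${NAGIOS_INIT:-/etc/init.d/nagios3}" start
    else
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}

# Example (as root):
#   verify_and_restart
```

If the check fails, run "nagios3 -v /etc/nagios3/nagios.cfg" directly to read the error
messages, since the wrapper hides the normal output.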

7. Go to the web interface (http://localhost/nagios3) and check that the hosts
   you just added are now visible in the interface. Click on the "Host Detail" item
   on the left of the Nagios screen to see this.


PART III
Configure a Service Check for the Classroom NOC
-----------------------------------------------------------------------------

0. Configuring

Now that we have our hardware configured we can start telling Nagios what services to monitor
on the configured hardware, how to group the hardware in interesting ways, how to group
services, etc.

1. Associate a service check for our classroom NOC

    # vi hostgroups_nagios2.cfg

    - Find the hostgroup named "ssh-servers". In the members section of the definition
      change the line:

members                 localhost

    to

members                 localhost,noc,tld1,tld2,tld3,tld4

Save and exit from the file.

Verify that your changes are OK:

        # nagios3 -v /etc/nagios3/nagios.cfg

Restart Nagios to see the new service association with your host:

        # /etc/init.d/nagios3 restart

Click on the "Service Detail" link in the Nagios web interface to see your new entry.


PART IV
Defining Services for all PCs
-----------------------------------------------------------------------------

1. Determine what services to define for what devices

   - This is core to how you use Nagios and network monitoring tools in
     general. So far we are simply using ping to verify that physical hosts
     are up on our network and we have started monitoring a single service on
     a single host (your PC). The next step is to decide what services you wish
     to monitor for each host in the classroom.

   - In this particular class we have:

     routers:  running ssh and snmp
     switches: running telnet and possibly ssh as well as snmp
     pcs:      All PCs are running ssh and http and should be running snmp
               The NOC is currently running an snmp daemon

     So, let's configure Nagios to check for these services for these
     devices.

2. Verify that SSH is running on the routers and workshop PC images

   - In the file services_nagios2.cfg there is already an entry for the SSH
     service check, so you do not need to create one. Instead, you
     simply need to redefine the "ssh-servers" entry in the file
     /etc/nagios3/conf.d/hostgroups_nagios2.cfg. The initial entry in the file
     looked like:

# A list of your ssh-accessible servers
define hostgroup {
        hostgroup_name  ssh-servers
                alias           SSH servers
                members         localhost
        }

     What do you think you should change? Correct, the "members" line. You should
     add in entries for all the classroom PCs, routers and the switches that run ssh.
     With this information and the network diagram you should be able to complete this entry.

     The entry will look something like this:

define hostgroup {
        hostgroup_name  ssh-servers
                alias           SSH servers
                members         localhost,tld1,tld2,tld3,tld14,noc
        }

         Note: leave in "localhost" - This is your PC and represents Nagios' network point of
         view. So, for instance, if you are on "pc3" you would not include "pc3" in the list
         of all the classroom PCs as it is represented by the "localhost" entry.

         The "members" entry will be a long line and will likely wrap on the screen.

         Remember to include all your PCs.

    - Once you are done, run the pre-flight check:

    # nagios3 -v /etc/nagios3/nagios.cfg

    If everything looks good, then restart Nagios

    # /etc/init.d/nagios3 stop
    # /etc/init.d/nagios3 start

    and view your changes in the Nagios web interface.
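
    The long "members" line is easy to mistype, so one option is to let the
    shell build it. The sketch below assumes your hosts are named with a common
    prefix and a number (for example pc1 to pc18, or tld1 to tld18); the
    function name members_list is hypothetical. It always starts with
    "localhost" and can leave out your own PC.

```shell
# members_list PREFIX FIRST LAST [SKIP]
# Print "localhost,PREFIX<FIRST>,...,PREFIX<LAST>" on a single line, leaving
# out PREFIX<SKIP> (your own PC, which "localhost" already represents).
members_list() {
    prefix=$1
    i=$2
    last=$3
    skip=${4:-}
    list=localhost
    while [ "$i" -le "$last" ]; do
        if [ "$i" != "$skip" ]; then
            list="$list,$prefix$i"
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Example: you are on pc3 in a room with pc1 to pc5.
members_list pc 1 5 3
```

    Paste the output after "members" and append any other hosts (noc, routers,
    switches) by hand, keeping everything on one line.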

3. Check that http is running on all the classroom PCs.

    - This is almost identical to the previous exercise. Just make the same kind of
      change for the HTTP service, adding in each PC (no routers or switches). Remember,
      you don't need to add your machine as it is already defined as "localhost".

PART V
Create More Host Groups
-----------------------------------------------------------------------------

1. Update /etc/nagios3/conf.d/hostgroups_nagios2.cfg

    - For the following exercises it will be very useful if we have created
      or updated the following hostgroups:

      debian-servers
      routers
      switches

      If you edit the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg you
      will see an entry for debian-servers that just contains localhost.
      Update this entry to include all the classroom PCs, including the
      noc (this assumes that you created a "noc" entry in your pcs.cfg
      file). Remember to skip your PC entry as it is represented by the
      localhost entry.

    # vi /etc/nagios3/conf.d/hostgroups_nagios2.cfg

     Update the entry that says:


# A list of your Debian GNU/Linux servers
define hostgroup {
        hostgroup_name  debian-servers
                alias           Debian GNU/Linux Servers
                members         localhost
        }

      So that the "members" parameter contains something like this. Use your
      classroom network diagram to confirm the exact number of machines and names
      in your workshop.

                members         localhost,pc1,pc2,pc3,pc4,pc5,pc6,pc7,pc8,pc9,
                                pc10,pc11,pc12,pc13,pc14,pc15,pc16,pc17,pc18

        Be sure to type this as one long line (it may wrap on your screen) and
        not as two separate lines. Otherwise you will get an error when you go
        to restart Nagios.

      - Once you have done this, add in two more entries. One for routers and
        one for switches. Call these entries "routers" and "switches".

      - When you are done be sure to verify your work and restart Nagios.
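
      As a starting point for the two new entries, the sketch below prints
      hostgroup definitions in the same layout as the existing file. The
      function name gen_hostgroup and the alias texts are hypothetical. The
      member names are the hosts defined earlier in routers.cfg and
      switches.cfg; add any group switches you defined yourself.

```shell
# gen_hostgroup NAME ALIAS MEMBERS
# Print one hostgroup definition laid out like the entries in
# hostgroups_nagios2.cfg. MEMBERS must be a single comma-separated string.
gen_hostgroup() {
    printf 'define hostgroup {\n'
    printf '        hostgroup_name  %s\n' "$1"
    printf '                alias           %s\n' "$2"
    printf '                members         %s\n' "$3"
    printf '        }\n\n'
}

# Aliases are illustrative; host names come from routers.cfg and switches.cfg.
gen_hostgroup routers "Routers" "bb-gw,router1,router2"
gen_hostgroup switches "Switches" "bb-sw"
```

      Every name in "members" must already exist as a host_name, or the
      pre-flight check will report an error.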


PART VI
Extended Host Information ("making your graphs pretty")
-----------------------------------------------------------------------------

1. Update extinfo_nagios2.cfg

    - If you would like to use appropriate icons for your defined hosts in
      Nagios this is where you do this. We have three types of devices:

      Cisco routers
      Cisco switches
      Ubuntu servers

      There is a fairly large repository of icon images available for you to
      use located here:

      /usr/share/nagios/htdocs/images/logos/

      These were installed by default as dependent packages of the nagios3
      package in Ubuntu. In some cases you can find model-specific icons for
      your hardware, but to make things simpler we will use the following
      icons for our hardware:

      /usr/share/nagios/htdocs/images/logos/base/debian.*
      /usr/share/nagios/htdocs/images/logos/cook/router.*
      /usr/share/nagios/htdocs/images/logos/cook/switch.*

    - The next step is to edit the file /etc/nagios3/conf.d/extinfo_nagios2.cfg
      and tell Nagios what image you would like to use to represent your devices.

    # vi /etc/nagios3/conf.d/extinfo_nagios2.cfg

      Here is what an entry for your routers looks like (there is already an entry
      for debian-servers that will work as is). Note that the router model (3600)
      is not all that important. The image used represents a router in general.

define hostextinfo {
        hostgroup_name   routers
        icon_image       cook/router.png
        icon_image_alt   Cisco Routers (3600)
        vrml_image       router.png
        statusmap_image  cook/router.gd2
}

      Now add an entry for your switches. Once you are done check your
      work and restart Nagios. Take a look at the Status Map in the web interface.
      It should be much nicer.
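
      Before writing the switch entry it is worth confirming which image files
      actually exist, since icon_image and statusmap_image name files relative
      to the logos directory. A minimal sketch follows; the function name
      find_icons is hypothetical, and the default directory is the one given
      above.

```shell
# find_icons NAME [DIR]
# List every image file called NAME.<extension> below the logos directory.
# DIR defaults to the logos path named in this exercise.
find_icons() {
    dir=${2:-/usr/share/nagios/htdocs/images/logos}
    find "$dir" -type f -name "$1.*" | sort
}

# Example:
#   find_icons switch
```

      The part of each path after "logos/" is what goes into the hostextinfo
      entry, in the same way as "cook/router.png" in the router example.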

PART VII
Create Service Groups
-----------------------------------------------------------------------------

1. Create service groups for ssh and http for each set of PCs.

   - The idea here is to create three service groups. Each service group will
     be for the group of PCs that are connected to each router xxxxxxx,
     yyyyyy, zzzzzz, etc. We want to see these PCs grouped together
     and include status of their ssh and http services. To do this create
     and edit the file:

   # vi /etc/nagios3/conf.d/servicegroups.cfg

     Here is a sample of the service group for group 1:

define servicegroup {
        servicegroup_name       group1-servers
        alias                   group 1 servers
        members                 pc1,SSH,pc1,HTTP,pc2,SSH,pc2,HTTP,pc3,SSH,pc3,HTTP,pc4,SSH,pc4,HTTP,pc5,SSH,
                        pc5,HTTP,pc6,SSH,pc6,HTTP,pc7,SSH,pc7,HTTP,pc8,SSH,pc8,HTTP,pc9,SSH,pc9,HTTP
        }

        - Note that the members line should be typed as one long line that wraps,
          not as two separate lines.

        - Note that "SSH" and "HTTP" need to be uppercase as this is how the service_description is
          written in the file /etc/nagios3/conf.d/services_nagios2.cfg

        - You should create an entry for the group 2 servers as well.

    - Save your changes, verify your work and restart Nagios. Now if you click on
      the Servicegroup menu items in the Nagios web interface you should see
      this information grouped together.

    - Be sure to do this for TLD1 through TLD8 to create a servicegroup of SSH
      and HTTP servers for all 8 TLDs in the classroom.
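
    A servicegroup "members" line is a flat list of host,service pairs, which
    makes it tedious to type. The sketch below builds it for a numbered range
    of hosts. The function name servicegroup_members is hypothetical, and it
    assumes the service descriptions are exactly "SSH" and "HTTP" as noted
    above.

```shell
# servicegroup_members PREFIX FIRST LAST
# Print "host,SSH,host,HTTP" for every host from PREFIX<FIRST> to
# PREFIX<LAST>, joined by commas on a single line.
servicegroup_members() {
    prefix=$1
    i=$2
    last=$3
    list=
    while [ "$i" -le "$last" ]; do
        # ${list:+$list,} adds a separating comma only after the first pair.
        list="${list:+$list,}$prefix$i,SSH,$prefix$i,HTTP"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Example for two hosts:
servicegroup_members pc 1 2
```

    If your own PC falls inside the range, replace its name with "localhost"
    in the output before pasting it into servicegroups.cfg.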


PART VIII
Configure Guest Access to the Nagios Web Interface
-----------------------------------------------------------------------------

1. Edit /etc/nagios3/cgi.cfg to give read-only guest user access to the Nagios
   web interface.

    - By default Nagios is configured to give full r/w access via the Nagios
      web interface to the user nagiosadmin. You can change the name of this
      user, add other users, change how you authenticate users, what users
      have access to what resources and more via the cgi.cfg file.

    - First, let's create a "guest" user and password in the htpasswd.users
      file.

    # cd /etc/nagios3
    # htpasswd /etc/nagios3/htpasswd.users guest

      You can use any password you want (or none). A password of "guest" is
      not a bad choice.

    - Next, edit the file /etc/nagios3/cgi.cfg and look for what type of access
      has been given to the nagiosadmin user. By default you will see the following
      directives (note, there are comments between each directive):

      authorized_for_system_information=nagiosadmin
      authorized_for_configuration_information=nagiosadmin
      authorized_for_system_commands=nagiosadmin
      authorized_for_all_services=nagiosadmin
      authorized_for_all_hosts=nagiosadmin
      authorized_for_all_service_commands=nagiosadmin
      authorized_for_all_host_commands=nagiosadmin

      Now let's tell Nagios to allow the "guest" user some access to
      information via the web interface. You can choose whatever you would
      like, but what is pretty typical is this:

      authorized_for_system_information=nagiosadmin,guest
      authorized_for_configuration_information=nagiosadmin,guest
      authorized_for_system_commands=nagiosadmin
      authorized_for_all_services=nagiosadmin,guest
      authorized_for_all_hosts=nagiosadmin,guest
      authorized_for_all_service_commands=nagiosadmin
      authorized_for_all_host_commands=nagiosadmin

    - Once you make the changes, save the file cgi.cfg, verify your
      work and restart Nagios.

    - To see if you can log in as the "guest" user you may need to clear
      the cookies in your web browser. The web interface will look the
      same, but a number of items that are available via the web
      interface (forcing a service/host check, scheduling checks,
      comments, etc.) will not work for the guest user.
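
    To double-check what you granted, you can list the directives in cgi.cfg
    that name a given user. This is a rough sketch: the function name
    directives_for_user is hypothetical, and it assumes the user name contains
    no characters that are special in a regular expression.

```shell
# directives_for_user USER [FILE]
# Print the names of the authorized_for_* directives that list USER.
# Commented-out directives are ignored. FILE defaults to the cgi.cfg path
# used in this exercise.
directives_for_user() {
    file=${2:-/etc/nagios3/cgi.cfg}
    grep -E '^[[:space:]]*authorized_for_[a-z_]+=' "$file" |
        grep -E "[=,]$1(,|[[:space:]]*\$)" |
        sed -e 's/^[[:space:]]*//' -e 's/=.*$//'
}

# Example:
#   directives_for_user guest
```

    With the typical settings shown above, "guest" should appear in the four
    information and "all hosts/services" directives and in none of the
    "commands" directives.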


OPTIONAL
--------

1. Check that SNMP is running on the classroom NOC

    - First you will need to add in the appropriate service check for SNMP in the file
      /etc/nagios3/conf.d/services_nagios2.cfg. This is where Nagios is impressive. There
      are hundreds, if not thousands, of service checks available via the various Nagios
      sites on the web. You can see what plugins are installed by Ubuntu in the nagios3
      package that we've installed by looking in the following directory:

    # ls /usr/lib/nagios/plugins

      As you'll see there is already a check_snmp plugin available to us. If you are
      interested in the options the plugin takes you can execute the plugin from the
      command line by typing:

    # /usr/lib/nagios/plugins/check_snmp

      to see what options are available, etc. You can use the check_snmp plugin and
      Nagios to create very complex or specific system checks.

    - Now to see all the various service/host checks that have been created using the
      check_snmp plugin you can look in /etc/nagios-plugins/config/snmp.cfg. You will
      see that there are a lot of preconfigured checks using snmp, including:

      snmp_load
      snmp_cpustats
      snmp_procname
      snmp_disk
      snmp_mem
      snmp_swap
      snmp_procs
      snmp_users
      snmp_mem2
      snmp_swap2
      snmp_mem3
      snmp_swap3
      snmp_disk2
      snmp_tcpopen
      snmp_tcpstats
      snmp_bgpstate
      check_netapp_uptime
      check_netapp_cpuload
      check_netapp_numdisks
      check_compaq_thermalCondition

      And, even better, you can create additional service checks quite easily.
      For the case of verifying that snmpd (the SNMP service on Linux) is running we
      need to ask SNMP a question. If we don't get an answer, then Nagios can assume
      that the SNMP service is down on that host. When you use service checks such as
      check_http, check_ssh and check_telnet this is what they are doing as well.

    - In our case, let's create a new service check and call it "check_system". This
      service check will connect with the specified host, use the private community
      string we have defined in class and ask a question of snmp on that host - in this
      case we'll ask about the System Description, or the OID "sysDescr.0".

    - To do this start by editing the file /etc/nagios-plugins/config/snmp.cfg:

    # vi /etc/nagios-plugins/config/snmp.cfg

      At the top (or the bottom, your choice) add the following entry to the file:

# 'check_system' command definition
define command{
       command_name    check_system
       command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -C '$ARG1$' -o sysDescr.0
        }

      You may wish to copy and paste this rather than trying to type it out.

          Note that "command_line" is a single line. If you copy and paste in vi the line
          may not wrap properly and you may have to manually add the part:

                        '$ARG1$' -o sysDescr.0

          to the end of the line.

    - Now you need to edit the file /etc/nagios3/conf.d/services_nagios2.cfg and add
      in this service check. We'll run this check against the SNMP-enabled servers in
      the classroom, using a hostgroup named "snmp-servers".

    - Edit the file /etc/nagios3/conf.d/services_nagios2.cfg

    # vi /etc/nagios3/conf.d/services_nagios2.cfg

      At the bottom of the file add the following definition:

# check that snmp is up on all servers
define service {
        hostgroup_name                  snmp-servers
        service_description             SNMP
        check_command                   check_system!xxxxxx
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}

      The "xxxxxx" is the community string previously (or to be) defined in class.

      Note that we have included our private community string here rather than
      hard-coding it in the snmp.cfg file earlier. You must change the "xxxxxx" to be
      the snmp community string given in class or this check will not work.

    - Now we must create the "snmp-servers" group in our hostgroups_nagios2.cfg file.
      Edit the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg and go to the end of the
      file. Add in the following hostgroup definition:

# A list of snmp-enabled devices on which we wish to run the snmp service check
define hostgroup {
           hostgroup_name       snmp-servers
                   alias        snmp servers
                   members       noc,tld1,tld2,etc
          }

        - Note that for "members" you could also add in the switches and routers for
          group 1 and 2. But the particular item (OID) we are checking for, "sysDescr.0",
          may not be available on the switches and/or routers, so the check would then fail.

    - Now verify that your changes are correct and restart Nagios.

    - If you click on the Service Detail menu choice in the web interface you should see
      the SNMP check appear.

    - After we do the SNMP presentation and exercises in class, you could come
      back to this exercise and add all the classroom PCs to the members list of the
      snmp-servers hostgroup definition in the hostgroups_nagios2.cfg file. Remember
      to list your PC as "localhost".
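
    A plugin reports its result to Nagios through its exit code, so you can
    try the check_system command line by hand and see what Nagios would
    conclude. The sketch below maps the exit code to a state name. The
    function name plugin_state is hypothetical; the codes 0 to 3 follow the
    standard Nagios plugin convention, and this sketch also reports any other
    value as UNKNOWN.

```shell
# plugin_state CODE
# Translate a Nagios plugin exit code into the state name shown by Nagios.
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. Any other value is
# reported as UNKNOWN here, which is a choice made for this sketch.
plugin_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Example, run by hand against the NOC (10.10.10.10 in pcs.cfg). Replace
# xxxxxx with the community string given in class:
#   /usr/lib/nagios/plugins/check_snmp -H 10.10.10.10 -C xxxxxx -o sysDescr.0
#   plugin_state $?
```

    If the hand-run check prints the system description, the same command
    should also succeed when Nagios runs it for the snmp-servers hostgroup.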



Last update 30 September, 2010 by MM