Registry Operations Curriculum
Nagios Installation and Configuration

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

Exercises Part I
----------------

0. Log in to your PC or open a terminal window as the tldadmin user.


1. You could install Nagios version 3. You would do this as root, or as the
   tldadmin user with the "sudo" command:

    # apt-get install nagios3

   Nagios version 3 is already installed, but you can still run the command.


2. Create the Web user password file:

    # htpasswd -c /etc/nagios3/htpasswd.users nagiosadmin

    New password:
    Re-type new password:

   We suggest you use your standard user password used in class.


3. You should already have a working Nagios!

    - Open a browser, and go to

    http://localhost/nagios3/

    - At the login prompt, log in as:

        user: nagiosadmin
        pass: (the password you created above)

4. Let's look at the interface together...

    # cd /etc/nagios3/

    # ls -l
    -rw-r--r-- 1 root root    1882 2008-12-18 13:42 apache2.conf
    -rw-r--r-- 1 root root   10524 2008-12-18 13:44 cgi.cfg
    -rw-r--r-- 1 root root    2429 2008-12-18 13:44 commands.cfg
    drwxr-xr-x 2 root root    4096 2009-02-14 12:33 conf.d
    -rw-r--r-- 1 root root      26 2009-02-14 12:36 htpasswd.users
    -rw-r--r-- 1 root root   42539 2008-12-18 13:44 nagios.cfg
    -rw-r----- 1 root nagios  1293 2008-12-18 13:42 resource.cfg
    drwxr-xr-x 2 root root    4096 2009-02-14 12:32 stylesheets

    # ls -l conf.d/

    -rw-r--r-- 1 root root 1695 2008-12-18 13:42 contacts_nagios2.cfg
    -rw-r--r-- 1 root root  418 2008-12-18 13:42 extinfo_nagios2.cfg
    -rw-r--r-- 1 root root 1152 2008-12-18 13:42 generic-host_nagios2.cfg
    -rw-r--r-- 1 root root 1803 2008-12-18 13:42 generic-service_nagios2.cfg
    -rw-r--r-- 1 root root  210 2009-02-14 12:33 host-gateway_nagios3.cfg
    -rw-r--r-- 1 root root  976 2008-12-18 13:42 hostgroups_nagios2.cfg
    -rw-r--r-- 1 root root 2167 2008-12-18 13:42 localhost_nagios2.cfg
    -rw-r--r-- 1 root root 1005 2008-12-18 13:42 services_nagios2.cfg
    -rw-r--r-- 1 root root 1609 2008-12-18 13:42 timeperiods_nagios2.cfg

    Notice that most files in the conf.d directory have not been renamed for
    this package - they are the same files as used for the Nagios version 2
    Ubuntu package, so they keep the "_nagios2" suffix. Only the host-gateway
    configuration file was updated, so it has been renamed to "_nagios3".

PART II
Configuring Equipment
-----------------------------------------------------------------------------

0. Order of configuration

Conceptually we will build our configuration files from the "top" of our network down. That
is, we define entries for our gateway router and switch first, then our group routers and switches.
Once we have these entries we will add an entry for our NOC machine, then pc1, pc2, pc3, etc...

By going in this order you will have defined the devices that act as parents for other devices.


1. Let's configure Nagios to start monitoring our classroom gateway router:

    # cd /etc/nagios3/conf.d/

    # vi routers.cfg

define host {
    use         generic-host
    host_name   bb-gw
    alias       cctld border router
    address     192.168.17.2
}

   Now define entries for our two group routers:

define host {
    use         generic-host
    host_name   pc1-pc9-gw
    alias       cctld group 1 router
    address     192.168.5.129
}

define host {
    use         generic-host
    host_name   pc10-pc18-gw
    alias       cctld group 2 router
    address     192.168.5.161
}

Save and exit from the file /etc/nagios3/conf.d/routers.cfg


2. Configure our classroom switches

Now that we have our routers configured we can configure our switches. Note that each
switch will have a parent relationship with the router next to it.

        # vi switches.cfg

First the switch on our classroom backbone:

define host {
    use         generic-host
    host_name   bb-sw
    alias       cctld backbone switch
    address     192.168.17.4
    parents     bb-gw
}

Notice the "parents" entry. You can only add the parent entry once you have a definition for
"bb-gw". We did this in our routers.cfg file, so this will work.

Now add in the switches for the two groups:

define host {
    use         generic-host
    host_name   pc1-pc9-sw
    alias       cctld group 1 switch
    address     192.168.5.130
    parents     pc1-pc9-gw
}

define host {
    use         generic-host
    host_name   pc10-pc18-sw
    alias       cctld group 2 switch
    address     192.168.5.162
    parents     pc10-pc18-gw
}

Save and exit from the file switches.cfg


3. Update the file routers.cfg with parents

The border router does not have a parent for purposes of our class. In reality it does, but
you have to stop your monitoring somewhere.

Our two group routers, however, do have a parent that is now defined: the backbone switch. We
need to update our group router entries to look like this:

define host {
    use         generic-host
    host_name   pc1-pc9-gw
    alias       cctld group 1 router
    address     192.168.5.129
    parents     bb-sw
}

define host {
    use         generic-host
    host_name   pc10-pc18-gw
    alias       cctld group 2 router
    address     192.168.5.161
    parents     bb-sw
}

Save and exit from the file routers.cfg
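
To double-check the parent relationships defined so far, you can trace the chain
from any device up to the border router. The sketch below is only an illustration:
parent_of and path_to_top are hypothetical helper functions, not part of Nagios,
and the mapping simply restates the "parents" lines from routers.cfg and
switches.cfg above. It does not read your configuration files.

```shell
#!/bin/sh
# Hypothetical helper: restates the "parents" lines from routers.cfg and
# switches.cfg. It does not parse the real Nagios configuration.
parent_of() {
    case "$1" in
        bb-sw)                    echo bb-gw ;;
        pc1-pc9-gw|pc10-pc18-gw)  echo bb-sw ;;
        pc1-pc9-sw)               echo pc1-pc9-gw ;;
        pc10-pc18-sw)             echo pc10-pc18-gw ;;
        *)                        echo "" ;;   # bb-gw or unknown: no parent
    esac
}

# Print the chain of parents from a host up to the top of the network.
path_to_top() {
    host=$1
    path=$host
    while p=$(parent_of "$host"); [ -n "$p" ]; do
        path="$path -> $p"
        host=$p
    done
    echo "$path"
}

path_to_top pc10-pc18-sw
```

For the group 2 switch this prints "pc10-pc18-sw -> pc10-pc18-gw -> bb-sw -> bb-gw".
If a device fails, Nagios can use this chain to tell devices that are down from
devices that are merely unreachable behind it.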


4. Create entries for each PC in the classroom

Now that we have our routers and switches defined it is quite easy to create entries for all our
PCs. Think about the parent relationships. The parent of the NOC is the backbone switch. The
parent of pc1 through pc9 is the switch for group 1. The parent of pc10 through pc18 is the
switch for group 2.

Below are three sample entries: one for the NOC, one for pc1 and one for pc10. You should be able
to use these examples to create entries for all 18 classroom PCs plus the NOC.

We could put these entries into separate files, but as our network is small we'll use a single
file called pcs.cfg.

NOTE! You do not add an entry for your own PC. It has already been defined in the file
/etc/nagios3/conf.d/localhost_nagios2.cfg. This definition is what defines the Nagios
network viewpoint. So, when you come to the spot where you might add an entry for your PC
you should skip it and go on to the next PC in the list (i.e. from pc1 to pc18).

        # vi pcs.cfg

# Our classroom NOC

define host {
    use         generic-host
    host_name   noc
    alias       aroc cctld NOC machine
    address     192.168.17.5
    parents     bb-sw
}

# Group 1 PCs

define host {
    use         generic-host
    host_name   pc1
    alias       pc1 group 1 aroc cctld
    address     192.168.5.131
    parents     pc1-pc9-sw
}

# Group 2 PCs

define host {
    use         generic-host
    host_name   pc10
    alias       pc10 group 2 aroc cctld
    address     192.168.5.170
    parents     pc10-pc18-sw
}

Take the three entries above and expand them to create the remaining entries for pc1-pc9 and
for pc10-pc18. If you have any questions about IP addresses, etc. you can review the Network
Diagram for the class linked off the classroom wiki main page at http://localhost/trac/.

Save and exit from the file pcs.cfg
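
If you would rather not type eighteen nearly identical blocks, a small shell
function can print them for you. This is only a sketch: host_entry is a
hypothetical helper, not a Nagios tool, and it is shown with the two sample PCs
from this exercise. Take the addresses for the remaining PCs from the class
Network Diagram rather than guessing a pattern.

```shell
#!/bin/sh
# Hypothetical helper: print one "define host" block in the layout used in
# pcs.cfg. Arguments: host_name, alias, address, parent.
host_entry() {
    printf 'define host {\n'
    printf '    use         generic-host\n'
    printf '    host_name   %s\n' "$1"
    printf '    alias       %s\n' "$2"
    printf '    address     %s\n' "$3"
    printf '    parents     %s\n' "$4"
    printf '}\n\n'
}

# The two sample PCs from this exercise.
host_entry pc1  "pc1 group 1 aroc cctld"  192.168.5.131 pc1-pc9-sw
host_entry pc10 "pc10 group 2 aroc cctld" 192.168.5.170 pc10-pc18-sw
```

The output can be reviewed on screen and then pasted into pcs.cfg. Remember to
leave out the entry for your own PC.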

Now let's verify that our initial Nagios configuration is working:


5. Verify that your configuration files are OK:

    # nagios3 -v /etc/nagios3/nagios.cfg

    ... You should get :

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the check.


6. Reload/Restart Nagios

    # /etc/init.d/nagios3 restart

The "restart" option is not always 100% reliable due to a bug in the Nagios init script.
To be sure you may want to get used to doing:

    # /etc/init.d/nagios3 stop
    # /etc/init.d/nagios3 start

7. Go to the web interface (http://localhost/nagios3) and check that the hosts
   you just added are now visible in the interface. Click on the "Host Detail" item
   on the left of the Nagios screen to see this.
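
You will repeat the verify-then-restart cycle many times in the following parts,
so it can help to wrap it in a function. The sketch below is an assumption-laden
convenience, not part of the Nagios package: check_and_restart is a hypothetical
name, and it assumes that "nagios3 -v" returns a non-zero exit status when it
finds errors.

```shell
#!/bin/sh
# Assumed defaults, taken from the commands used in this exercise. Both can be
# overridden from the environment.
NAGIOS_CHECK=${NAGIOS_CHECK:-"nagios3 -v /etc/nagios3/nagios.cfg"}
NAGIOS_INIT=${NAGIOS_INIT:-/etc/init.d/nagios3}

# Hypothetical helper: stop and start Nagios only if the pre-flight check
# exits successfully.
check_and_restart() {
    # $NAGIOS_CHECK is left unquoted on purpose so that it splits into a
    # command and its arguments.
    if $NAGIOS_CHECK >/dev/null; then
        "$NAGIOS_INIT" stop && "$NAGIOS_INIT" start
    else
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}

# Usage (as root):
#   check_and_restart
```

The check output is discarded here to keep the example short. When the function
reports a failure, run the "nagios3 -v" command by hand to read the error messages.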


PART III
Configure a Service Check for the Classroom NOC
-----------------------------------------------------------------------------

0. Configuring

Now that we have our hardware configured we can start telling Nagios what services to monitor
on the configured hardware, how to group the hardware in interesting ways, how to group
services, etc.

1. Associate a service check for our classroom NOC

    # vi hostgroups_nagios2.cfg

    - Find the hostgroup named "ssh-servers". In the members section of the definition
      change the line:

members                 localhost

    to

members                 localhost,noc

Save and exit from the file.

Verify that your changes are OK:

        # nagios3 -v /etc/nagios3/nagios.cfg

Restart Nagios to see the new service association with your host:

        # /etc/init.d/nagios3 restart

Click on the "Service Detail" link in the Nagios web interface to see your new entry.


PART IV
Defining Services for all PCs
-----------------------------------------------------------------------------

1. Determine what services to define for what devices

   - This is core to how you use Nagios and network monitoring tools in
     general. So far we are simply using ping to verify that physical hosts
     are up on our network and we have started monitoring a single service on
     a single host (your PC). The next step is to decide what services you wish
     to monitor for each host in the classroom.

   - In this particular class we have:

     routers:  running ssh and snmp
     switches: running telnet and possibly ssh as well as snmp
     pcs:      All PCs are running ssh and http and should be running snmp
               The NOC is currently running an snmp daemon

     So, let's configure Nagios to check for these services for these
     devices.

2. Verify that SSH is running on the routers and workshop PCs

   - In the file services_nagios2.cfg there is already an entry for the SSH
     service check, so you do not need to create one. Instead, you
     simply need to re-define the "ssh-servers" entry in the file
     /etc/nagios3/conf.d/hostgroups_nagios2.cfg. The initial entry in the file
     looked like:

# A list of your ssh-accessible servers
define hostgroup {
        hostgroup_name  ssh-servers
        alias           SSH servers
        members         localhost
        }

     What do you think you should change? Correct, the "members" line. You should
     add in entries for all the classroom PCs, the routers and the switches that run ssh.
     With this information and the network diagram you should be able to complete this entry.

     The entry will look something like this:

define hostgroup {
        hostgroup_name  ssh-servers
        alias           SSH servers
        members         localhost,pc1,pc2,pc3,pc4....,bb-gw,pc1-pc9-gw,pc10-pc18-gw
        }

         Note: leave in "localhost" - This is your PC and represents Nagios' network point of
         view. So, for instance, if you are on "pc3" you would not include "pc3" in the list
         of all the classroom PCs as it is represented by the "localhost" entry.

         The "members" entry will be a long line and will likely wrap on the screen.
         Use the host names exactly as you defined them in routers.cfg, switches.cfg
         and pcs.cfg.

         Remember to include all your PCs.

    - Once you are done, run the pre-flight check:

    # nagios3 -v /etc/nagios3/nagios.cfg

    If everything looks good, then restart Nagios

    # /etc/init.d/nagios3 stop
    # /etc/init.d/nagios3 start

    and view your changes in the Nagios web interface.
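
    Typing a long "members" line by hand is error-prone. The sketch below builds
    the PC part of the list and leaves out your own PC, since that one is already
    covered by "localhost". pc_members is a hypothetical helper, and it assumes
    the 18 classroom PCs are named pc1 through pc18 as in this exercise.

```shell
#!/bin/sh
# Hypothetical helper: print "localhost" followed by pc1..pc18, skipping the
# PC named in the first argument (your own machine).
pc_members() {
    own=$1
    list=localhost
    n=1
    while [ "$n" -le 18 ]; do
        if [ "pc$n" != "$own" ]; then
            list="$list,pc$n"
        fi
        n=$((n + 1))
    done
    printf '%s\n' "$list"
}

# Example: you are sitting at pc3.
pc_members pc3
```

    Append the routers (and any switches that run ssh) to the printed line before
    pasting it into the "ssh-servers" definition.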

3. Check that http is running on all the classroom PCs.

    - This is almost identical to the previous exercise. Just make the same change to
      the hostgroup used by the HTTP service check, adding in each PC (no routers or
      switches). Remember, you don't need to add your machine as it is already
      defined as "localhost".

PART V
Create More Host Groups
-----------------------------------------------------------------------------

1. Update /etc/nagios3/conf.d/hostgroups_nagios2.cfg

    - For the following exercises it will be very useful if we create
      or update the following hostgroups:

      debian-servers
      routers
      switches

      If you edit the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg you
      will see an entry for debian-servers that just contains localhost.
      Update this entry to include all the classroom PCs, including the
      noc (this assumes that you created a "noc" entry in your pcs.cfg
      file). Remember to skip your PC entry as it is represented by the
      localhost entry.

    # vi /etc/nagios3/conf.d/hostgroups_nagios2.cfg

      Update the entry that says:


# A list of your Debian GNU/Linux servers
define hostgroup {
        hostgroup_name  debian-servers
        alias           Debian GNU/Linux Servers
        members         localhost
        }

      so that the "members" parameter contains something like this. Use your
      classroom network diagram to confirm the exact number of machines and names
      in your workshop.

        members         localhost,pc1,pc2,pc3,pc4,pc5,pc6,pc7,pc8,pc9,pc10,pc11,pc12,pc13,pc14,pc15,pc16,pc17,pc18,noc

        Be sure that this is one long line in the file (it may wrap on screen) and
        not two separate lines. Otherwise you will get an error when you go to
        restart Nagios.

      - Once you have done this, add in two more entries. One for routers and
        one for switches. Call these entries "routers" and "switches".

      - When you are done be sure to verify your work and restart Nagios.
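
      If you get stuck on the two new entries, the sketch below prints one
      possible pair. It is only a suggestion: hostgroup_entry is a hypothetical
      helper, the alias texts are made up for illustration, and the member names
      assume the host_name values used in routers.cfg and switches.cfg earlier.

```shell
#!/bin/sh
# Hypothetical helper: print one "define hostgroup" block.
# Arguments: hostgroup_name, alias, comma-separated members.
hostgroup_entry() {
    printf 'define hostgroup {\n'
    printf '        hostgroup_name  %s\n' "$1"
    printf '        alias           %s\n' "$2"
    printf '        members         %s\n' "$3"
    printf '        }\n\n'
}

# Aliases are illustrative; member names come from routers.cfg and switches.cfg.
hostgroup_entry routers  "Classroom routers"  bb-gw,pc1-pc9-gw,pc10-pc18-gw
hostgroup_entry switches "Classroom switches" bb-sw,pc1-pc9-sw,pc10-pc18-sw
```

      Paste the output at the end of hostgroups_nagios2.cfg, then run the
      pre-flight check before restarting.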


PART VI
Extended Host Information ("making your graphs pretty")
-----------------------------------------------------------------------------

1. Update extinfo_nagios2.cfg

    - If you would like to use appropriate icons for your defined hosts in
      Nagios this is where you do this. We have three types of devices:

      Cisco routers
      Cisco switches
      Ubuntu servers

      There is a fairly large repository of icon images available for you to
      use located here:

      /usr/share/nagios/htdocs/images/logos/

      These were installed by default as dependent packages of the nagios3
      package in Ubuntu. In some cases you can find model-specific icons for
      your hardware, but to make things simpler we will use the following
      icons for our hardware:

      /usr/share/nagios/htdocs/images/logos/base/debian.*
      /usr/share/nagios/htdocs/images/logos/cook/router.*
      /usr/share/nagios/htdocs/images/logos/cook/switch.*

    - The next step is to edit the file /etc/nagios3/conf.d/extinfo_nagios2.cfg
      and tell Nagios what image you would like to use to represent your devices.

    # vi /etc/nagios3/conf.d/extinfo_nagios2.cfg

      Here is what an entry for your routers looks like (there is already an entry
      for debian-servers that will work as is). Note that the router model (3600)
      is not all that important. The image used represents a router in general.

define hostextinfo {
        hostgroup_name   routers
        icon_image       cook/router.png
        icon_image_alt   Cisco Routers (3600)
        vrml_image       router.png
        statusmap_image  cook/router.gd2
}

      Now add an entry for your switches. Once you are done check your
      work and restart Nagios. Take a look at the Status Map in the web interface.
      It should be much nicer.

PART VII
Create Service Groups
-----------------------------------------------------------------------------

1. Create service groups for ssh and http for each set of pcs.

   - The idea here is to create several service groups, one for the group of
     PCs connected to each router xxxxxxx, yyyyyy, zzzzzz, etc. We want to
     see these PCs grouped together, including the status of their ssh and
     http services. To do this create and edit the file:

   # vi /etc/nagios3/conf.d/servicegroups.cfg

     Here is a sample of the service group for group 1:

define servicegroup {
        servicegroup_name       group1-servers
        alias                   group 1 servers
        members                 pc1,SSH,pc1,HTTP,pc2,SSH,pc2,HTTP,pc3,SSH,pc3,HTTP,pc4,SSH,pc4,HTTP,pc5,SSH,
                        pc5,HTTP,pc6,SSH,pc6,HTTP,pc7,SSH,pc7,HTTP,pc8,SSH,pc8,HTTP,pc9,SSH,pc9,HTTP
        }

        - Note that the members entry must be a single line in the file. It is only
          wrapped here for display.

        - Note that "SSH" and "HTTP" need to be uppercase as this is how the service_description is
          written in the file /etc/nagios3/conf.d/services_nagios2.cfg

        - You should create an entry for the group 2 servers as well.

    - Save your changes, verify your work and restart Nagios. Now if you click on
      the Servicegroup menu items in the Nagios web interface you should see
      this information grouped together.

    - Be sure you do this for TLD1 through TLD8 to create a servicegroup of SSH
      and HTTP servers for all 8 TLDs in the classroom.
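
    The servicegroup "members" line is a flat list of host,service pairs, which
    makes it easy to mistype. The sketch below generates it for a range of PCs.
    sg_members is a hypothetical helper; it assumes the hosts are named pcN and
    that the service descriptions are exactly "SSH" and "HTTP", as noted above.

```shell
#!/bin/sh
# Hypothetical helper: print host,service pairs for pcFIRST..pcLAST, with
# both the SSH and HTTP services for each PC.
# Arguments: first PC number, last PC number.
sg_members() {
    n=$1
    list=""
    while [ "$n" -le "$2" ]; do
        for svc in SSH HTTP; do
            # ${list:+$list,} adds a comma only when the list is non-empty.
            list="${list:+$list,}pc$n,$svc"
        done
        n=$((n + 1))
    done
    printf '%s\n' "$list"
}

sg_members 1 9
```

    Running "sg_members 10 18" gives the matching line for group 2. If one of the
    PCs in the range is the machine running Nagios, it is defined as "localhost"
    rather than pcN, so adjust that pair by hand.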


PART VIII
Configure Guest Access to the Nagios Web Interface
-----------------------------------------------------------------------------

1. Edit /etc/nagios3/cgi.cfg to give read only guest user access to the Nagios
   web interface.

    - By default Nagios is configured to give full r/w access via the Nagios
      web interface to the user nagiosadmin. You can change the name of this
      user, add other users, change how you authenticate users, what users
      have access to what resources and more via the cgi.cfg file.

    - First, let's create a "guest" user and password in the htpasswd.users
      file.

    # cd /etc/nagios3
    # htpasswd /etc/nagios3/htpasswd.users guest

      You can use any password you want (or none). A password of "guest" is
      not a bad choice.

    - Next, edit the file /etc/nagios3/cgi.cfg and look for what type of access
      has been given to the nagiosadmin user. By default you will see the following
      directives (note, there are comments between each directive):

      authorized_for_system_information=nagiosadmin
      authorized_for_configuration_information=nagiosadmin
      authorized_for_system_commands=nagiosadmin
      authorized_for_all_services=nagiosadmin
      authorized_for_all_hosts=nagiosadmin
      authorized_for_all_service_commands=nagiosadmin
      authorized_for_all_host_commands=nagiosadmin

      Now let's tell Nagios to allow the "guest" user some access to
      information via the web interface. You can choose whatever you would
      like, but what is pretty typical is this:

      authorized_for_system_information=nagiosadmin,guest
      authorized_for_configuration_information=nagiosadmin,guest
      authorized_for_system_commands=nagiosadmin
      authorized_for_all_services=nagiosadmin,guest
      authorized_for_all_hosts=nagiosadmin,guest
      authorized_for_all_service_commands=nagiosadmin
      authorized_for_all_host_commands=nagiosadmin

    - Once you make the changes, save the file cgi.cfg, verify your
      work and restart Nagios.

    - To see if you can log in as the "guest" user you may need to clear
      the cookies in your web browser. At first glance you will not notice
      any difference in the web interface. The difference is that a number
      of items that are available via the web interface (forcing a
      service/host check, scheduling checks, comments, etc.) will not work
      for the guest user.
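
    The same edit can be made non-interactively. The sketch below is a filter
    that appends ",guest" to the four read-only directives and leaves the three
    command directives alone. add_guest_readonly is a hypothetical name, and the
    sketch assumes the directives appear exactly as listed above, with no spaces
    around the "=" sign.

```shell
#!/bin/sh
# Hypothetical filter: reads cgi.cfg on standard input and writes a modified
# copy to standard output. Only the four read-only directives are changed.
add_guest_readonly() {
    sed -E 's/^(authorized_for_(system_information|configuration_information|all_services|all_hosts))=(.*)$/\1=\3,guest/'
}

# Example on two sample lines: the first is changed, the second is not.
printf '%s\n' \
    'authorized_for_all_hosts=nagiosadmin' \
    'authorized_for_all_host_commands=nagiosadmin' | add_guest_readonly
```

    To apply it, write the result to a temporary file (for example
    "add_guest_readonly < /etc/nagios3/cgi.cfg > /tmp/cgi.cfg.new"), compare the
    two files, and only then copy the new file into place. Running the filter
    twice would add "guest" twice, so apply it once.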


OPTIONAL
--------

1. Check that SNMP is running on the classroom NOC

    - First you will need to add in the appropriate service check for SNMP in the file
      /etc/nagios3/conf.d/services_nagios2.cfg. This is where Nagios is impressive. There
      are hundreds, if not thousands, of service checks available via the various Nagios
      sites on the web. You can see what plugins are installed by Ubuntu in the nagios3
      package that we've installed by looking in the following directory:

    # ls /usr/lib/nagios/plugins

      As you'll see there is already a check_snmp plugin available to us. If you are
      interested in the options the plugin takes you can execute the plugin from the
      command line by typing:

    # /usr/lib/nagios/plugins/check_snmp

      to see what options are available, etc. You can use the check_snmp plugin and
      Nagios to create very complex or specific system checks.

    - Now to see all the various service/host checks that have been created using the
      check_snmp plugin you can look in /etc/nagios-plugins/config/snmp.cfg. You will
      see that there are a lot of preconfigured checks using snmp, including:

      snmp_load
      snmp_cpustats
      snmp_procname
      snmp_disk
      snmp_mem
      snmp_swap
      snmp_procs
      snmp_users
      snmp_mem2
      snmp_swap2
      snmp_mem3
      snmp_swap3
      snmp_disk2
      snmp_tcpopen
      snmp_tcpstats
      snmp_bgpstate
      check_netapp_uptime
      check_netapp_cpuload
      check_netapp_numdisks
      check_compaq_thermalCondition

      And, even better, you can create additional service checks quite easily.
      For the case of verifying that snmpd (the SNMP service on Linux) is running we
      need to ask SNMP a question. If we don't get an answer, then Nagios can assume
      that the SNMP service is down on that host. When you use service checks such as
      check_http, check_ssh and check_telnet this is what they are doing as well.

    - In our case, let's create a new service check and call it "check_system". This
      service check will connect with the specified host, use the private community
      string we have defined in class and ask a question of snmp on that host - in this
      case we'll ask about the System Description, or the OID "sysDescr.0".

    - To do this start by editing the file /etc/nagios-plugins/config/snmp.cfg:

    # vi /etc/nagios-plugins/config/snmp.cfg

      At the top (or the bottom, your choice) add the following entry to the file:

# 'check_system' command definition
define command{
       command_name    check_system
       command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -C '$ARG1$' -o sysDescr.0
       }

      You may wish to copy and paste this vs. trying to type it out.

          Note that "command_line" is a single line. If you copy and paste in vi the line
          may get broken in two, and you may have to manually re-join the part:

                        '$ARG1$' -o sysDescr.0

          to the end of the line.

    - Now you need to edit the file /etc/nagios3/conf.d/services_nagios2.cfg and add
      in this service check. We'll run this check against a new hostgroup called
      "snmp-servers", which we will create in the next step.

    # vi /etc/nagios3/conf.d/services_nagios2.cfg

      At the bottom of the file add the following definition:

# check that snmp is up on all servers
define service {
        hostgroup_name                  snmp-servers
        service_description             SNMP
        check_command                   check_system!xxxxxx
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}

      The "xxxxxx" is the community string previously (or to be) defined in class.

      Note that we have included our private community string here vs. hard-coding
      it in the snmp.cfg file earlier. You must change the "xxxxxx" to be the snmp
      community string given in class or this check will not work.

    - Now we must create the "snmp-servers" group in our hostgroups_nagios2.cfg file.
      Edit the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg and go to the end of the
      file. Add in the following hostgroup definition:

# A list of snmp-enabled devices on which we wish to run the snmp service check
define hostgroup {
        hostgroup_name  snmp-servers
        alias           snmp servers
        members         noc
        }

        - Note that for "members" you could also add in the switches and routers for
          group 1 and 2. But the particular item (OID) we are checking for, "sysDescr.0",
          may not be available on the switches and/or routers, so the check would then fail.

    - Now verify that your changes are correct and restart Nagios.

    - If you click on the Service Detail menu choice in the web interface you should see
      the SNMP check appear.

    - After we do the SNMP presentation and exercises in class, you could come
      back to this exercise and add all the classroom PCs to the members list of the
      snmp-servers hostgroup definition in the hostgroups_nagios2.cfg file. Remember
      to list your PC as "localhost".
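
    To see how the pieces fit together, it helps to write out the command that
    Nagios ends up running for one host. In "check_system!xxxxxx" the text after
    the "!" becomes $ARG1$, and $HOSTADDRESS$ comes from the host's "address"
    line. The sketch below only imitates that substitution for illustration:
    expand_check_system is a hypothetical helper, and Nagios performs the real
    expansion itself.

```shell
#!/bin/sh
# Hypothetical illustration of macro substitution in the check_system command.
# $1 = host address (replaces $HOSTADDRESS$)
# $2 = first argument after "!" in check_command (replaces $ARG1$)
expand_check_system() {
    printf "/usr/lib/nagios/plugins/check_snmp -H '%s' -C '%s' -o sysDescr.0\n" "$1" "$2"
}

# The NOC's address from pcs.cfg, with the placeholder community string.
expand_check_system 192.168.17.5 xxxxxx
```

    Running the printed command by hand, with the real community string in place
    of "xxxxxx", is a quick way to test the check before Nagios schedules it.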


Last update 25 September, 2010 by HA