Nagios Installation and Configuration

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

Exercises Part I
----------------

0. Log in to your PC or open a terminal window as the sysadm user.

1. You may need to install Nagios version 3. You would do this as root, or as the
   sysadm user using the "sudo" command. As sysadm:

   $ sudo apt-get install nagios3

   Unless you already have an MTA installed, nagios3 will install
   postfix as a dependency. Select the "Internet Site" option. (If you wanted
   to use a different MTA, you would install it before nagios3.)

   You will be prompted for the nagiosadmin password. Give it the normal
   workshop password.

   To get the documentation in /usr/share/doc/nagios3-doc/html/ (which
   can also be read via the nagios web interface), do:

    $ sudo apt-get install nagios3-doc


2. Look at the file which contains the password. The password is stored as
   a hash, not in clear text:

    $ cat /etc/nagios3/htpasswd.users


3. You should already have a working Nagios!

    - Open a browser, and go to your machine like this:

    http://pcN.ws.nsrc.org/nagios3/

    - At the login prompt, log in as:

        user: nagiosadmin
        pass: <CLASS PASSWORD>

    Browse to the "Host Detail" page to see what's already configured.


4. Let's look at the configuration layout... But, first, let's become the root
   user on your machine:

    $ sudo bash

    # cd /etc/nagios3
    # ls -l

    -rw-r--r-- 1 root root    1882 2008-12-18 13:42 apache2.conf
    -rw-r--r-- 1 root root   10524 2008-12-18 13:44 cgi.cfg
    -rw-r--r-- 1 root root    2429 2008-12-18 13:44 commands.cfg
    drwxr-xr-x 2 root root    4096 2009-02-14 12:33 conf.d
    -rw-r--r-- 1 root root      26 2009-02-14 12:36 htpasswd.users
    -rw-r--r-- 1 root root   42539 2008-12-18 13:44 nagios.cfg
    -rw-r----- 1 root nagios  1293 2008-12-18 13:42 resource.cfg
    drwxr-xr-x 2 root root    4096 2009-02-14 12:32 stylesheets

    # cd conf.d
    # ls -l

    -rw-r--r-- 1 root root 1695 2008-12-18 13:42 contacts_nagios2.cfg
    -rw-r--r-- 1 root root  418 2008-12-18 13:42 extinfo_nagios2.cfg
    -rw-r--r-- 1 root root 1152 2008-12-18 13:42 generic-host_nagios2.cfg
    -rw-r--r-- 1 root root 1803 2008-12-18 13:42 generic-service_nagios2.cfg
    -rw-r--r-- 1 root root  210 2009-02-14 12:33 host-gateway_nagios3.cfg
    -rw-r--r-- 1 root root  976 2008-12-18 13:42 hostgroups_nagios2.cfg
    -rw-r--r-- 1 root root 2167 2008-12-18 13:42 localhost_nagios2.cfg
    -rw-r--r-- 1 root root 1005 2008-12-18 13:42 services_nagios2.cfg
    -rw-r--r-- 1 root root 1609 2008-12-18 13:42 timeperiods_nagios2.cfg

    Notice that the package installs files with "nagios2" in their name.
    This is because they are the same files as were used for the Nagios
    version 2 Debian package. However, there was a change made to the
    host-gateway configuration file, so this has a new name.


5. You have a config which is already monitoring your own system
(localhost_nagios2.cfg) and your upstream default gateway
(host-gateway_nagios3.cfg).

Have a look at the config file for the default gateway: it's very simple.
(Note: tab completion is useful here. Type cat host-g then hit tab; the
filename will be filled in for you)

    # cat host-gateway_nagios3.cfg

    # a host definition for the gateway of the default route
    define host {
            host_name   gateway
            alias       Default Gateway
            address     10.10.0.254
            use         generic-host
            }



PART II
Configuring Equipment
-----------------------------------------------------------------------------

0. Order of configuration

Conceptually we will build our configuration files starting with the "nearest"
devices and then moving on to those further away.

By going in this order you will have defined the devices that act as parents
for other devices.

Remember to refer to the Network Diagram for our classroom if you get confused.

We have the following instances:

rtr     (the gateway router: 10.10.0.254)
sw      (the gateway switch: 10.10.0.253, parent: rtr)
rtr1    (group 1 router: 10.10.0.201, parent: sw)
rtr2    (group 2 router: 10.10.0.202, parent: sw)
rtr3    (group 3 router: 10.10.0.203, parent: sw)
rtr4    (group 4 router: 10.10.0.204, parent: sw)
rtr5    (group 5 router: 10.10.0.205, parent: sw)

pc1     (10.10.0.1, parent: sw)
pc2     (10.10.0.2, parent: sw)
...
pc29    (10.10.0.29, parent: sw)
pc30    (10.10.0.30, parent: sw)

s1      (10.10.0.241, parent: sw)
s2      (10.10.0.242, parent: sw)
noc     (10.10.0.250, parent: sw)
ap1     (10.10.0.251, parent: sw)
ap2     (10.10.0.252, parent: sw)

We recommend grouping these items in the files:

routers.cfg             (rtr, rtr1...rtr5)
switches.cfg            (sw)
pcs.cfg                 (pc1...pc30, s1, s2, noc, ap1, ap2)


1. First we need to tell Nagios to monitor the gateway router for
   our classroom, which is 10.10.0.254:

   # cd /etc/nagios3/conf.d/

Create the entry for the gateway router like this:

   # editor routers.cfg

define host {
    use         generic-host
    host_name   rtr
    alias       Gateway Router
    address     10.10.0.254
}

In the same file create the 5 entries for the group routers:

define host {
    use         generic-host
    host_name   rtrX
    alias       Group X Router
    address     10.10.0.20X
    parents     sw
}

... replacing 'X' in the definition above with the router number (1 - 5).
You need one entry each for rtr1, rtr2, rtr3, rtr4 and rtr5.

Note that the entry for "sw", our gateway switch, has not yet been created. That is
next.

Exit and save this file.

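The five group-router entries differ only in the router number, so they can
also be generated rather than typed. The following is a minimal sketch and not
part of the workshop material: the function name gen_group_routers is our own
invention, and it only prints to the screen so that you can review the result
before putting it into routers.cfg.

```shell
#!/bin/sh
# Sketch: print the five group-router host definitions (rtr1..rtr5).
# gen_group_routers is a hypothetical helper name, not part of Nagios.
gen_group_routers() {
    for n in 1 2 3 4 5; do
        printf 'define host {\n'
        printf '    use         generic-host\n'
        printf '    host_name   rtr%s\n' "$n"
        printf '    alias       Group %s Router\n' "$n"
        printf '    address     10.10.0.20%s\n' "$n"
        printf '    parents     sw\n'
        printf '}\n\n'
    done
}

# Print the definitions so they can be checked by eye.
gen_group_routers
```

To use the output, redirect it to a scratch file, check it, and paste it into
routers.cfg below the entry for rtr.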

2. Create a file called switches.cfg and add an entry for this item:

   # editor switches.cfg

define host {
    use         generic-host
    host_name   sw
    alias       Backbone Switch
    address     10.10.0.253
    parents     rtr
}

At this point Nagios is configured to monitor whether our core hosts (the parents)
are up on our classroom network. Your next steps are to add in the individual hosts
such as the classroom virtual PC images on your table (for example for group 1,
pc1 - 6, for group 2, pc7 - 12, etc.), the Wireless Access Points (ap1 and ap2),
the servers s1 and s2, and the noc.

Be sure you add in a proper "parents" entry for each host.

Remember, if you don't understand the parent relations in our network you can
review the logical network diagram here:

        http://noc.ws.nsrc.org/sanog18/wiki/NetworkDiagram

Note the Nagios parent bullet points in the slides!

Nagios Parent Relationships


STEPS 2a - 2c SHOULD BE REPEATED WHENEVER YOU UPDATE THE CONFIGURATION!


2a. Verify that your configuration files are OK:

    # nagios3 -v /etc/nagios3/nagios.cfg

    ... You should get some warnings like:

Warning: Host 'rtr' has no services associated with it!
Warning: Host 'sw' has no services associated with it!
etc....
...
Total Warnings: N
Total Errors:   0

Things look okay - No serious problems were detected during the check.

Nagios is saying that it's unusual to monitor a device just for its
existence on the network, without also monitoring some service.


2b. Reload/Restart Nagios

    # /etc/init.d/nagios3 restart

Using the "restart" option is not always 100% reliable, due to a bug in the Nagios
init script. To be sure, you may want to get used to doing:

    # /etc/init.d/nagios3 stop
    # /etc/init.d/nagios3 start


2c. Go to the web interface (http://pcN.ws.nsrc.org/nagios3) and check that the hosts
   you just added are now visible in the interface. Click on the "Host Detail" item
   on the left of the Nagios screen to see this. You may see them in "PENDING"
   status until the check is carried out.


HINT: You will be doing this a lot. If you do it all on one line, like this,
then you can hit cursor-up and rerun all in one go:

    nagios3 -v /etc/nagios3/nagios.cfg && /etc/init.d/nagios3 restart

The '&&' ensures that the restart only happens if the config is valid.

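If you prefer the stop/start sequence from step 2b, the same idea can be
wrapped in a small shell function. This is only a sketch under assumptions:
nagios_reload and the NAGIOS_BIN, NAGIOS_CFG and NAGIOS_INIT variables are
names we made up for illustration, and their defaults are simply the paths used
in this exercise. It needs to be run as root.

```shell
# Hypothetical helper: verify the config, then stop and start Nagios.
# The variable names are our own; defaults match the paths in this exercise.
NAGIOS_BIN=${NAGIOS_BIN:-nagios3}
NAGIOS_CFG=${NAGIOS_CFG:-/etc/nagios3/nagios.cfg}
NAGIOS_INIT=${NAGIOS_INIT:-/etc/init.d/nagios3}

nagios_reload() {
    # If verification fails, report it and leave the running Nagios alone.
    if ! "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        echo "config check failed; Nagios was not restarted" >&2
        return 1
    fi
    # 'start' only runs if 'stop' succeeded.
    "$NAGIOS_INIT" stop && "$NAGIOS_INIT" start
}
```

After pasting the function into your root shell, type nagios_reload each time
you change a configuration file.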

3. Create entries for the classroom PCs

Now that we have our routers and switches defined it is quite easy to create
entries for all our PCs.  Think about the parent relationships.

Remember, if you do not understand the parent relationship refer back to the
classroom network diagram here:

        http://noc.ws.nsrc.org/sanog18/wiki/NetworkDiagram

Below are three sample entries: one for the NOC, one for pc1 and one for
pc6.  You should be able to use these examples to create entries for all
classroom PCs plus the NOC.

We could put these entries into separate files, but as our network is small
we'll use a single file called pcs.cfg.

NOTE! You do not add in an entry for your own PC or router. This has already
been defined in the file /etc/nagios3/conf.d/localhost_nagios2.cfg.  This
definition is what defines the Nagios network viewpoint. So, when you come to
the spot where you might add an entry for your PC you should skip this and go
on to the next PC in the list.

        # editor pcs.cfg

# Our classroom NOC

define host {
    use         generic-host
    host_name   noc
    alias       Workshop NOC machine
    address     10.10.0.250
    parents     sw
}

# PCs

define host {
    use         generic-host
    host_name   pc1
    alias       pc1
    address     10.10.0.1
    parents     sw
}

define host {
    use         generic-host
    host_name   pc6
    alias       pc6
    address     10.10.0.6
    parents     sw
}

Pay attention to the parent entries and the IP addresses.

Take the three entries above and now expand this to create the remaining
entries for the PCs in your group. That is, if you are in group 1, fill in
for PCs 2 through 5 (remember to skip your own PC!).


Exit and save the file pcs.cfg

As before, repeat steps 2a-2c to verify your configuration, correct any
errors, and activate it.

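The PC entries follow one pattern (host name pcN, address 10.10.0.N, parent
sw), so a short loop can print them for you. This is a sketch only:
gen_pc_hosts is a hypothetical helper of ours, and its three arguments (first
PC number, last PC number, your own PC number) are our own convention. It
skips your own PC, as the NOTE above requires.

```shell
# Hypothetical helper: gen_pc_hosts FIRST LAST SELF
# Prints host entries for pcFIRST..pcLAST, leaving out pcSELF.
gen_pc_hosts() {
    n=$1
    while [ "$n" -le "$2" ]; do
        if [ "$n" -ne "$3" ]; then
            printf 'define host {\n'
            printf '    use         generic-host\n'
            printf '    host_name   pc%s\n' "$n"
            printf '    alias       pc%s\n' "$n"
            printf '    address     10.10.0.%s\n' "$n"
            printf '    parents     sw\n'
            printf '}\n\n'
        fi
        n=$((n + 1))
    done
}

# Example: group 1 (pc1 - pc6), run by the person sitting at pc3.
gen_pc_hosts 1 6 3
```

If you already typed the sample entries for pc1 and pc6 by hand, leave those
out when you paste the output into pcs.cfg, so that no host_name is defined
twice.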
4. Look at your Nagios instance on the web. Note that "Status Map" gives
you a graphical view of the parent-child relationships you have defined.


PART III
Configure Service check for the classroom NOC
-----------------------------------------------------------------------------

0. Configuring

Now that we have our hardware configured we can start telling Nagios what services to monitor
on the configured hardware, how to group the hardware in interesting ways, how to group
services, etc.

1. Associate a service check with our classroom NOC

    # editor hostgroups_nagios2.cfg

    - Find the hostgroup named "ssh-servers". In the members section of the definition
      change the line:

members                 localhost

    to

members                 localhost,noc

Exit and save the file.

Verify that your changes are OK:

        # nagios3 -v /etc/nagios3/nagios.cfg

Restart Nagios to see the new service association with your host:

        # /etc/init.d/nagios3 restart

Click on the "Service Detail" link in the Nagios web interface to see your new entry.


PART IV
Defining Services for all PCs
-----------------------------------------------------------------------------

0. For services, the default normal_check_interval is 5 (minutes) in
   generic-service_nagios2.cfg. You may wish to change this to 1 to speed up
   how quickly service issues are detected, at least in the workshop.

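The change in step 0 is a one-line edit that you can make in your editor. As
an alternative, here is a hedged sketch using sed. set_check_interval is a
hypothetical helper of ours: it reads a config on standard input and writes
the modified text to standard output, so the real file is not touched until
you decide to copy the result over it. It assumes the setting appears as
"normal_check_interval" followed by whitespace and a number.

```shell
# Hypothetical helper: rewrite the value of normal_check_interval.
# Usage: set_check_interval MINUTES < input.cfg > output.cfg
set_check_interval() {
    sed "s/^\([[:space:]]*normal_check_interval[[:space:]][[:space:]]*\)[0-9][0-9]*/\1$1/"
}

# Demonstration on a sample line rather than on the real file:
printf '        normal_check_interval           5\n' | set_check_interval 1
```

For the real file you could run it with generic-service_nagios2.cfg as input,
write the result to a scratch file, and compare the two with diff before
copying the new version into place.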
1. Determine what services to define for what devices

   - This is core to how you use Nagios and network monitoring tools in
     general. So far we are simply using ping to verify that physical hosts
     are up on our network and we have started monitoring a single service on
     a single host (the NOC). The next step is to decide what services you wish
     to monitor for each host in the classroom.

   - In this particular class we have:

     routers:  running ssh and snmp
     switches: running telnet and possibly ssh as well as snmp
     pcs:      All PCs are running ssh and http and should be running snmp
               The NOC is currently running an snmp daemon

     So, let's configure Nagios to check for these services for these
     devices.

2. Verify that SSH is running on the routers and workshop PC images

   - In the file services_nagios2.cfg there is already an entry for the SSH
     service check, so you do not need to create one. Instead, you
     simply need to re-define the "ssh-servers" entry in the file
     /etc/nagios3/conf.d/hostgroups_nagios2.cfg. After your change in Part III,
     the entry in the file looks like:

# A list of your ssh-accessible servers
define hostgroup {
        hostgroup_name  ssh-servers
        alias           SSH servers
        members         localhost,noc
        }

     What do you think you should change? Correct, the "members" line. You should
     add in entries for all the classroom PCs, routers and the switches that run ssh.
     With this information and the network diagram you should be able to complete this entry.

     The entry will look something like this:

define hostgroup {
        hostgroup_name  ssh-servers
        alias           SSH servers
        members         localhost,pc1,pc2,pc3,...,pc6,ap1,ap2,s1,s2,noc
        }

         Note: leave in "localhost" - This is your PC and represents Nagios' network point of
         view. So, for instance, if you are on "pc3" you would not include "pc3" in the list
         of all the classroom PCs as it is represented by the "localhost" entry.

         The "members" entry will be a long line and will likely wrap on the screen.

         Remember to include all the PCs on your table and the routers that you have defined. Do not
         include any entries if they are not already defined in pcs.cfg, switches.cfg or
         routers.cfg.

    - Once you are done, run the pre-flight check:

    # nagios3 -v /etc/nagios3/nagios.cfg

    If everything looks good, then restart Nagios

    # /etc/init.d/nagios3 stop
    # /etc/init.d/nagios3 start

    and view your changes in the Nagios web interface.

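Because the "members" value is one long comma-separated line, it is easy to
mistype. A minimal sketch for building it follows. members_line is a
hypothetical helper name, and its arguments (first PC number, last PC number,
your own PC number, then any extra host names) are our own convention. It
always starts the list with localhost and leaves out your own PC, as the note
above describes.

```shell
# Hypothetical helper: members_line FIRST LAST SELF [EXTRA_HOST...]
# Prints "localhost", the PCs FIRST..LAST except SELF, then the extra hosts,
# joined with commas and no spaces.
members_line() {
    n=$1
    last=$2
    self=$3
    shift 3
    line=localhost
    while [ "$n" -le "$last" ]; do
        if [ "$n" -ne "$self" ]; then
            line="$line,pc$n"
        fi
        n=$((n + 1))
    done
    for h in "$@"; do
        line="$line,$h"
    done
    printf '%s\n' "$line"
}

# Example: group 1, run from pc3, also listing rtr1 and the noc.
members_line 1 6 3 rtr1 noc
# prints: localhost,pc1,pc2,pc4,pc5,pc6,rtr1,noc
```

Only pass host names that you have already defined in pcs.cfg, switches.cfg or
routers.cfg, then paste the printed line after "members" in the hostgroup.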
To continue with hostgroups you can add additional groups for later use, such as all our virtual
routers. Go ahead and edit the file hostgroups_nagios2.cfg again:

     # editor hostgroups_nagios2.cfg

and add the following to the end of the file:

# A list of our virtual routers
define hostgroup {
        hostgroup_name  cisco7200
        alias           Cisco 7200 Routers
        members         rtr1,rtr2,rtr3,rtr4,rtr5
        }

Save and exit from the file. Verify that everything is OK:

    # nagios3 -v /etc/nagios3/nagios.cfg

    If everything looks good, then restart Nagios

    # /etc/init.d/nagios3 stop
    # /etc/init.d/nagios3 start

3. Check that http is running on all the classroom PCs.

    - This is almost identical to the previous exercise. Just make the change to the
      "http-servers" hostgroup, adding in each PC (no routers or switches). Remember, you
      don't need to add your machine as it is already defined as "localhost".

          Find the definition in hostgroups_nagios2.cfg:

                define hostgroup {
                hostgroup_name  http-servers
                alias           HTTP servers
                members         localhost
                }

          and after localhost, add all the PCs on your table.