Nagios Installation and Configuration

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

PART I
----------------

0. Log in to your virtual machine as the sysadm user.

1. Install Nagios Version 3
---------------------------

Become the root user and install the packages:

    $ sudo bash
    # apt-get install nagios3 nagios3-doc

During installation you will be prompted for the "Nagios web administration
password:". This is the password for the Nagios user "nagiosadmin". When
prompted, enter the password you are using for your sysadm account.

Note: if you have not already done so, you may be asked to configure
the Postfix Mail Transport Agent during the Nagios installation process.
Just accept the default "Internet Site".

2. See Initial Nagios Configuration
-----------------------------------

Open a browser, and go to your machine like this (replace "N" with the number
of your PC):

    http://pcN.ws.nsrc.org/nagios3/

At the login prompt, log in as:

    User Name: nagiosadmin
    Password:  <CLASS PASSWORD>

Click on the "Hosts" link on the left of the initial Nagios page to see what has
already been configured.

3. Enable External commands in nagios.cfg
-----------------------------------------

This change is required in order to allow users to "Acknowledge" problems with
hosts and services in the Web interface.

First, edit the file /etc/nagios3/nagios.cfg, and change the line:

    check_external_commands=0

to

    check_external_commands=1

Save the file and exit.

Then, as root, run the following commands to change the directory permissions
and to make the changes permanent:

    # /etc/init.d/nagios3 stop
    # dpkg-statoverride --update --add nagios www-data 2710 /var/lib/nagios3/rw
    # dpkg-statoverride --update --add nagios nagios 751 /var/lib/nagios3
    # /etc/init.d/nagios3 start


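If you would rather make the nagios.cfg change from the command line than in an
editor, something like the sketch below could work. The function name
enable_external_commands is our own invention (it is not part of Nagios), and it
assumes the line appears exactly as "check_external_commands=0" with no spaces.
Try it on a copy of the file first.

```shell
# Hypothetical helper (not part of Nagios): switch check_external_commands
# from 0 to 1 in the config file given as the first argument.
enable_external_commands() {
    cfg="$1"
    # Rewrite only the exact "check_external_commands=0" line and keep the
    # original file as "$cfg.bak" in case something goes wrong.
    sed -i.bak 's/^check_external_commands=0$/check_external_commands=1/' "$cfg"
    # Report success only if the setting is now enabled.
    grep -q '^check_external_commands=1$' "$cfg"
}
```

As root you would call it as "enable_external_commands /etc/nagios3/nagios.cfg".
You still need to run the stop, dpkg-statoverride and start commands shown above.
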
4. Update the File hostgroups_nagios2.cfg
-----------------------------------------

    # cd /etc/nagios3/conf.d
    # editor hostgroups_nagios2.cfg

Go to the bottom of the file and add the following entry (we STRONGLY encourage you
to COPY and PASTE!):

    define hostgroup {
        hostgroup_name  ping-servers
        alias           Pingable servers
        members         rtrX
    }

Where "rtrX" is the router for your group. That is, if you are in group 1, then
replace "rtrX" with "rtr1". Now save and exit from the file.


5. Add Routers, PCs and Switches
--------------------------------

We will create three files, routers.cfg, switches.cfg and pcs.cfg, and make
entries for the hardware in our classroom.

6a. Creating the switches.cfg file
----------------------------------

    # cd /etc/nagios3/conf.d          (just to be sure)
    # editor switches.cfg

In this file add the following entry (COPY and PASTE!):

    define host {
        use        generic-host
        host_name  sw
        alias      Backbone Switch
        address    10.10.0.253
    }

Save the file and exit.

6b. Creating the routers.cfg file
---------------------------------

We have up to 10 routers in total: rtr1-rtr9 and gw-rtr. We also have 1 or 2
wireless Access Points (ap1, ap2). We will define entries for each of these. If any
of these devices do not exist in your workshop, then do not include them. Remember,
COPY and PASTE!

    # editor routers.cfg


    define host {
        use        generic-host
        host_name  gw-rtr
        alias      Classroom Gateway Router
        address    10.10.0.254
    }

    define host {
        use        generic-host
        host_name  rtr1
        alias      Group 1 Gateway Router
        address    10.10.1.254
    }

    define host {
        use        generic-host
        host_name  rtr2
        alias      Group 2 Gateway Router
        address    10.10.2.254
    }

    define host {
        use        generic-host
        host_name  rtr3
        alias      Group 3 Gateway Router
        address    10.10.3.254
    }

    define host {
        use        generic-host
        host_name  rtr4
        alias      Group 4 Gateway Router
        address    10.10.4.254
    }

    define host {
        use        generic-host
        host_name  rtr5
        alias      Group 5 Gateway Router
        address    10.10.5.254
    }

    define host {
        use        generic-host
        host_name  rtr6
        alias      Group 6 Gateway Router
        address    10.10.6.254
    }

    define host {
        use        generic-host
        host_name  rtr7
        alias      Group 7 Gateway Router
        address    10.10.7.254
    }

    define host {
        use        generic-host
        host_name  rtr8
        alias      Group 8 Gateway Router
        address    10.10.8.254
    }

    define host {
        use        generic-host
        host_name  rtr9
        alias      Group 9 Gateway Router
        address    10.10.9.254
    }

    define host {
        use        generic-host
        host_name  ap1
        alias      Wireless Access Point 1
        address    10.10.0.251
    }

    define host {
        use        generic-host
        host_name  ap2
        alias      Wireless Access Point 2
        address    10.10.0.252
    }


Now save and exit from the file.


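The group router entries above differ only in the group number, so they could
also be generated rather than typed. The following is a minimal sketch, not part
of the official exercise: gen_router_hosts is a name we made up, and it assumes
that router rtrG always has the address 10.10.G.254, as in the entries above.
The entries for gw-rtr, ap1 and ap2 do not follow that pattern and still have to
be added by hand.

```shell
# Hypothetical helper: print one "define host" block per group router.
#   $1 = number of group routers to generate (defaults to 9)
# Assumes rtrG has the address 10.10.G.254, as in the entries above.
gen_router_hosts() {
    count="${1:-9}"
    g=1
    while [ "$g" -le "$count" ]; do
        printf 'define host {\n'
        printf '    use        generic-host\n'
        printf '    host_name  rtr%s\n' "$g"
        printf '    alias      Group %s Gateway Router\n' "$g"
        printf '    address    10.10.%s.254\n' "$g"
        printf '}\n\n'
        g=$((g + 1))
    done
}
```

For example, "gen_router_hosts 9 > /tmp/routers-generated.cfg" writes the nine
blocks to a scratch file that you can review before pasting into routers.cfg.
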
6c. Creating the pcs.cfg File
-----------------------------

Now we will create entries for all the Virtual Machines in our classroom. Below
we give you the first few entries. You should complete the file with as many PCs
as you wish to add. We recommend that you add, at a minimum, the 4 PCs that are
members of your group, an entry for the classroom NOC, and at least one PC from
another group (remember to COPY and PASTE!):

    # editor pcs.cfg


    define host {
        use        generic-host
        host_name  noc
        alias      Workshop NOC machine
        address    10.10.0.250
    }

    #
    # Group 1
    #

    define host {
        use        generic-host
        host_name  pc1
        alias      pc1
        address    10.10.1.1
    }

    define host {
        use        generic-host
        host_name  pc2
        alias      pc2
        address    10.10.1.2
    }

    define host {
        use        generic-host
        host_name  pc3
        alias      pc3
        address    10.10.1.3
    }

    define host {
        use        generic-host
        host_name  pc4
        alias      pc4
        address    10.10.1.4
    }

    #
    # Another PC (example only!)
    #

    define host {
        use        generic-host
        host_name  pc20
        alias      pc20
        address    10.10.5.20
    }

You can save and exit from the file now, or you can continue to add more PC entries.
If you have not added the PCs for your group, be sure to do that before you exit
from the file.



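If you want entries for many PCs, a small loop can print them for you. This is
only a sketch under an assumption we are inferring from the examples above (pc1
to pc4 in 10.10.1.x, pc20 at 10.10.5.20): that each group has 4 PCs and that pcN
has the address 10.10.G.N, where G is N divided by 4 and rounded up. Check this
against your network diagram before relying on it. The name gen_pc_hosts is
hypothetical.

```shell
# Hypothetical helper: print "define host" blocks for pcFIRST .. pcLAST.
#   $1 = first PC number, $2 = last PC number
# ASSUMPTION: 4 PCs per group and pcN lives at 10.10.G.N with G = ceil(N / 4).
gen_pc_hosts() {
    first="$1"
    last="$2"
    n="$first"
    while [ "$n" -le "$last" ]; do
        group=$(( (n + 3) / 4 ))    # integer form of "round N/4 upwards"
        printf 'define host {\n'
        printf '    use        generic-host\n'
        printf '    host_name  pc%s\n' "$n"
        printf '    alias      pc%s\n' "$n"
        printf '    address    10.10.%s.%s\n' "$group" "$n"
        printf '}\n\n'
        n=$((n + 1))
    done
}
```

For example, "gen_pc_hosts 5 8" prints the four entries for group 2. Review the
output before pasting it into pcs.cfg.
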
STEPS 7a - 7c SHOULD BE REPEATED WHENEVER YOU UPDATE THE CONFIGURATION!
=======================================================================

7a. Verify that your configuration files are OK
-----------------------------------------------

    # nagios3 -v /etc/nagios3/nagios.cfg


You should get some warnings like:

    Checking services...
        Checked 7 services.
    Checking hosts...
    Warning: Host 'gw-rtr' has no services associated with it!
    Warning: Host 'rtr1' has no services associated with it!
    Warning: Host 'rtr2' has no services associated with it!

    etc....
    ...
    Total Warnings: N
    Total Errors:   0

    Things look okay - No serious problems were detected during the check.

With these warnings Nagios is saying that it's unusual to monitor a device just
for its existence on the network, without also monitoring some service.


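When the output is long, the two summary lines are easy to miss. The sketch below
reads the check output and reduces it to a single line. It is our own
illustration: config_check_ok is a made-up name, and it relies only on the
"Total Warnings:" and "Total Errors:" labels shown in the sample output above.

```shell
# Hypothetical helper: read "nagios3 -v" output on stdin and succeed only
# when the summary reports zero errors. Warnings are printed but tolerated.
config_check_ok() {
    awk '
        /^Total Warnings:/ { warnings = $3 }
        /^Total Errors:/   { errors = $3; seen = 1 }
        END {
            if (!seen) exit 2    # no summary line found at all
            printf "warnings=%s errors=%s\n", warnings, errors
            exit (errors == 0 ? 0 : 1)
        }'
}
```

You would use it as "nagios3 -v /etc/nagios3/nagios.cfg | config_check_ok". A
result with errors=0 means the configuration is safe to load, even if the
warnings count is not zero.
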
7b. Reload/Restart Nagios
-------------------------

    # service nagios3 restart

HINT: You will be doing this a lot. If you do it all on one line, like this,
then you can hit cursor-up and rerun all in one go:

    # nagios3 -v /etc/nagios3/nagios.cfg && /etc/init.d/nagios3 restart

The '&&' ensures that the restart only happens if the config is valid.


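To see how '&&' behaves without touching Nagios, you can try it with harmless
commands. In this sketch verify_then_restart is a hypothetical wrapper of our
own; it runs its second command only when the first one exits successfully,
which is the same rule the one-line hint above relies on.

```shell
# Hypothetical wrapper: run the "verify" command, and run the "restart"
# command only if the verify command succeeded (exit status 0).
verify_then_restart() {
    verify_cmd="$1"
    restart_cmd="$2"
    sh -c "$verify_cmd" && sh -c "$restart_cmd"
}

# "true" always succeeds, so the second command runs.
verify_then_restart 'true' 'echo "restart would happen"'

# "false" always fails, so the second command is skipped.
verify_then_restart 'false' 'echo "restart would happen"' || echo "restart skipped"
```

With real commands this would be
"verify_then_restart 'nagios3 -v /etc/nagios3/nagios.cfg' 'service nagios3 restart'".
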
7c. Verify via the Web Interface
--------------------------------

Go to the web interface (http://pcN.ws.nsrc.org/nagios3) and check that the hosts
you just added are now visible in the interface. Click on the "Hosts" item on the
left of the Nagios screen to see this. A host may show "PENDING" status until its
first check has been carried out.


8. View Status Map
------------------

Go to http://pcN.ws.nsrc.org/nagios3

Click on the "Map" item on the left. You should see all your hosts with the Nagios
process in the middle. The "?" icons appear because we have not told Nagios what
type of host each item is (router, switch, AP, PC running Linux, etc.).



PART II
Configure Service check for the classroom NOC
-----------------------------------------------------------------------------

0. Configuring

Now that we have our hardware configured we can start telling Nagios what services to monitor
on the configured hardware, how to group the hardware in interesting ways, how to group
services, etc.

1. Associate a service check for our classroom NOC

    # editor hostgroups_nagios2.cfg

- Find the hostgroup named "ssh-servers". In the members section of the definition
  change the line:

    members localhost

  to

    members localhost,noc

Save the file and exit.

Verify that your changes are OK:

    # nagios3 -v /etc/nagios3/nagios.cfg

Restart Nagios to see the new service association with your host:

    # service nagios3 restart

Click on the "Services" link in the Nagios web interface to see your new entry - it should
say "noc SSH PENDING ...".



PART III
Defining Services for all PCs
-----------------------------------------------------------------------------

0. For services, the default normal_check_interval is 5 (minutes) in
   generic-service_nagios2.cfg. You may wish to change this to 1 to speed up
   how quickly service issues are detected, at least in the workshop.

1. Determine what services to define for what devices

- This is core to how you use Nagios and network monitoring tools in
  general. So far we are simply using ping to verify that physical hosts
  are up on our network and we have started monitoring a single service on
  a single host (your PC). The next step is to decide what services you wish
  to monitor for each host in the classroom.

- In this particular class we have:

    routers:  running ssh and snmp
    switches: running telnet and possibly ssh as well as snmp
    pcs:      All PCs are running ssh and http and should be running snmp
              The NOC is currently running an snmp daemon

  So, let's configure Nagios to check for these services for these
  devices.

| 410 | |
|---|
| 411 | 2.) Verify that SSH is running on the routers and workshop PCs images |
|---|
| 412 | |
|---|
| 413 | - In the file services_nagios2.cfg there is already an entry for the SSH |
|---|
| 414 | service check, so you do not need to create this step. Instead, you |
|---|
| 415 | simply need to re-define the "ssh-servers" entry in the file |
|---|
| 416 | /etc/nagios3/conf.d/hostgroups_nagios2.cfg. The initial entry in the file |
|---|
| 417 | looked like: |
|---|
| 418 | |
|---|
| 419 | # A list of your ssh-accessible servers |
|---|
| 420 | define hostgroup { |
|---|
| 421 | hostgroup_name ssh-servers |
|---|
| 422 | alias SSH servers |
|---|
| 423 | members localhost |
|---|
| 424 | } |
|---|
| 425 | |
|---|
| 426 | What do you think you should change? Correct, the "members" line. You should |
|---|
| 427 | add in entries for all the classroom pcs, routers and the switches that run ssh. |
|---|
| 428 | With this information and the network diagram you should be able complete this entry. |
|---|
| 429 | |
|---|
| 430 | The entry will look something like this: |
|---|
| 431 | |
|---|
| 432 | define hostgroup { |
|---|
| 433 | hostgroup_name ssh-servers |
|---|
| 434 | alias SSH servers |
|---|
| 435 | members localhost,pc1,pc2,pc3,pc4...,pc36,ap1,noc,rtr1,rtr2âŠrtr9,gw-rtr |
|---|
| 436 | } |
|---|
| 437 | |
|---|
| 438 | Note: leave in "localhost" - This is your PC and represents Nagios' network point of |
|---|
| 439 | view. So, for instance, if you are on "pc3" you would not include "pc3" in the list |
|---|
| 440 | of all the classroom pcs as it is represented by the "localhost" entry. |
|---|
| 441 | |
|---|
| 442 | The "members" entry will be a long line and will likely wrap on the screen. If you want to |
|---|
| 443 | start additional entries on newline then use "\" to indicate a newline like this: |
|---|
| 444 | |
|---|
| 445 | members localhost,pc1,pc2,pc3,pc4,pc5,pc6,pc7,pc8,pc9,pc10,pc11,pc12, \ |
|---|
| 446 | pc13,pc14...pc36,ap1,noc,rtr1,rtr2,rtr3...rtr9,gw-rtr |
|---|
| 447 | |
|---|
| 448 | Remember to include all your PCs and all your routers that you have defined. Do not |
|---|
| 449 | include any entries if they are not already defined in pcs.cfg, switches.cfg or |
|---|
| 450 | routers.cfg. |
|---|
| 451 | |
|---|
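Typing a long "members" line by hand invites typos, so you may prefer to generate
it. The sketch below is only an illustration: ssh_members is a name we made up,
and it assumes that ap1, noc and gw-rtr are all defined in your configuration
files. Leave out anything you have not defined.

```shell
# Hypothetical helper: print the comma-separated "members" value for ssh-servers.
#   $1 = number of your own PC (skipped, because "localhost" already covers it)
#   $2 = highest PC number defined in pcs.cfg (defaults to 36)
#   $3 = highest router number defined in routers.cfg (defaults to 9)
# ASSUMPTION: ap1, noc and gw-rtr are defined as hosts.
ssh_members() {
    own="$1"
    last_pc="${2:-36}"
    last_rtr="${3:-9}"
    list="localhost"
    n=1
    while [ "$n" -le "$last_pc" ]; do
        if [ "$n" -ne "$own" ]; then
            list="$list,pc$n"
        fi
        n=$((n + 1))
    done
    list="$list,ap1,noc"
    r=1
    while [ "$r" -le "$last_rtr" ]; do
        list="$list,rtr$r"
        r=$((r + 1))
    done
    printf '%s,gw-rtr\n' "$list"
}
```

If you are on pc3, "ssh_members 3" prints one line that starts with
"localhost,pc1,pc2,pc4" and can be pasted after the word "members".
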
- Once you are done, run the pre-flight check and restart Nagios:

    # nagios3 -v /etc/nagios3/nagios.cfg && /etc/init.d/nagios3 restart

  and view your changes in the Nagios web interface.

To continue with hostgroups you can add additional groups for later use, such as all our virtual
routers. Go ahead and edit the file hostgroups_nagios2.cfg again:

    # editor hostgroups_nagios2.cfg

and add the following to the end of the file (COPY and PASTE this):

    # A list of our virtual routers

    define hostgroup {
        hostgroup_name  routers
        alias           Cisco 7200 Routers
        members         rtr1,rtr2,rtr3,rtr4,rtr5,rtr6,rtr7,rtr8,rtr9
    }

Save and exit from the file. Verify that everything is OK:

    # nagios3 -v /etc/nagios3/nagios.cfg

If everything looks good, then restart Nagios:

    # service nagios3 restart

3. Check that http is running on all the classroom PCs.

- This is almost identical to the previous exercise. Just make the change to the
  HTTP service hostgroup, adding in each PC (no routers or switches). Remember, you
  don't need to add your machine as it is already defined as "localhost". Look for
  this hostgroup in the file hostgroups_nagios2.cfg and update the "members" line
  appropriately.

If you have questions or are confused, feel free to ask an instructor for help.

|---|