Nagios Installation and Configuration

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

PART I
----------------

1. Log in to your virtual machine as the sysadm user.

2. Install Nagios Version 3
---------------------------

    $ sudo apt-get install nagios3 nagios3-doc

During installation you will be prompted for the "Nagios web administration
password:". This is the password for the Nagios user "nagiosadmin". When
prompted, enter the password you are using for your sysadm account.

Note: if you have not already done so, you may be asked to configure
the Postfix Mail Transport Agent during the Nagios installation process.
Just accept the default "Internet Site".

3. See Initial Nagios Configuration
------------------------------------

Open a browser, and go to your machine like this:

    http://pcN.ws.nsrc.org/nagios3/

At the login prompt, log in as:

    User Name: nagiosadmin
    Password:  <CLASS PASSWORD>

Click on the "Hosts" link on the left of the initial Nagios page to see what has
already been configured.

4. Update the File hostgroups_nagios2.cfg
-----------------------------------------

    $ cd /etc/nagios3/conf.d
    $ sudo editor hostgroups_nagios2.cfg

Go to the bottom of the file and add the following entry (we STRONGLY encourage you
to COPY and PASTE!):

    define hostgroup {
        hostgroup_name  ping-servers
        alias           Pingable servers
        members         rtrX
    }

Where "rtrX" is the router for your group. That is, if you are in group 1, then
replace "rtrX" with "rtr1". Now save and exit from the file.

5. Add Routers, PCs and Switches
--------------------------------

We will create three files, routers.cfg, switches.cfg and pcs.cfg, and make
entries for the hardware in our classroom.

6a. Creating the switches.cfg file
----------------------------------

    $ cd /etc/nagios3/conf.d        (just to be sure)
    $ sudo editor switches.cfg

In this file add the following entry (COPY and PASTE!):

    define host {
        use        generic-host
        host_name  sw
        alias      Backbone Switch
        address    10.10.0.253
    }

Save the file and exit.

6b. Creating the "routers.cfg" file
-----------------------------------

We have up to 10 routers in total: rtr1-rtr9 and gw-rtr. We also have
1 or 2 wireless Access Points (ap1, ap2). We will define entries for some of
these. If any of these devices do not exist in your workshop, then do not
include them. Remember, COPY and PASTE!

    $ sudo editor routers.cfg

    define host {
        use        generic-host
        host_name  gw-rtr
        alias      Classroom Gateway Router
        address    10.10.0.254
    }

    define host {
        use        generic-host
        host_name  rtr1
        alias      Group 1 Gateway Router
        address    10.10.1.254
    }

    define host {
        use        generic-host
        host_name  rtr2
        alias      Group 2 Gateway Router
        address    10.10.2.254
    }

    # Note: you do not need to add definitions for all routers now - you can
    # always come back and add the rest later!

    define host {
        use        generic-host
        host_name  ap1
        alias      Wireless Access Point 1
        address    10.10.0.251
    }

    define host {
        use        generic-host
        host_name  ap2
        alias      Wireless Access Point 2
        address    10.10.0.252
    }

Now save the file and exit the editor.

6c. Creating the pcs.cfg File
-----------------------------

Now we will create entries for some of the Virtual Machines in our classroom.
Below we give you the first few entries. You should complete the file with as
many PCs as you wish to add. We recommend that you add at least the 4 PCs
that are members of your group, an entry for the classroom NOC, and
at least one PC from another group (remember to COPY and PASTE!):

    $ sudo editor pcs.cfg

    define host {
        use        generic-host
        host_name  noc
        alias      Workshop NOC machine
        address    10.10.0.250
    }

    #
    # Group 1
    #

    define host {
        use        generic-host
        host_name  pc1
        alias      pc1
        address    10.10.1.1
    }

    define host {
        use        generic-host
        host_name  pc2
        alias      pc2
        address    10.10.1.2
    }

    #
    # Another PC (example only!)
    #

    define host {
        use        generic-host
        host_name  pc20
        alias      pc20
        address    10.10.5.20
    }

You can save and exit from the file now. You can add more PC entries later.

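If typing the remaining PC entries by hand is tedious, a small shell loop can
print them for you. This is only a sketch: the function name gen_group_pcs is
made up for this example, and it assumes (based on the sample entries above,
not on any stated rule) that each group has 4 consecutively numbered PCs and
that pcN in group G has the address 10.10.G.N. Check the output against your
network diagram before pasting it into pcs.cfg.

```shell
# Print host definitions for the PCs of one group.
# Assumptions (inferred from the sample entries, not stated rules):
#   - each group has 4 PCs, numbered consecutively (group 1: pc1-pc4)
#   - pcN in group G has the address 10.10.G.N
gen_group_pcs() {
    group=$1
    first=$(( (group - 1) * 4 + 1 ))
    last=$(( group * 4 ))
    n=$first
    while [ "$n" -le "$last" ]; do
        printf 'define host {\n'
        printf '    use        generic-host\n'
        printf '    host_name  pc%s\n' "$n"
        printf '    alias      pc%s\n' "$n"
        printf '    address    10.10.%s.%s\n' "$group" "$n"
        printf '}\n\n'
        n=$(( n + 1 ))
    done
}

# Example: print the four entries for group 2 (pc5 to pc8).
gen_group_pcs 2
```

The function only prints to the screen, so nothing is changed until you copy
the entries into the file yourself.
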

STEPS 7a - 7c SHOULD BE REPEATED WHENEVER YOU UPDATE THE CONFIGURATION!
=======================================================================

7a. Verify that your configuration files are OK
-----------------------------------------------

    $ sudo nagios3 -v /etc/nagios3/nagios.cfg

You will get some warnings like the ones below. You can ignore them for
now.

    Checking services...
        Checked 7 services.
    Checking hosts...
    Warning: Host 'gw-rtr' has no services associated with it!
    Warning: Host 'rtr1' has no services associated with it!
    Warning: Host 'rtr2' has no services associated with it!

    etc....
    ...
    Total Warnings: N
    Total Errors:   0

    Things look okay - No serious problems were detected during the check.

With these warnings, Nagios is saying that it's unusual to monitor a device
just for its existence on the network, without also monitoring some service.


7b. Reload/Restart Nagios
-------------------------

    $ sudo service nagios3 restart

HINT: You will be doing this a lot. If you do it all on one line, like this,
then you can use arrow-up and call back the command:

    $ sudo nagios3 -v /etc/nagios3/nagios.cfg && sudo /etc/init.d/nagios3 restart

The '&&' ensures that the restart only happens if the config is valid.

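To see how '&&' behaves without touching Nagios, you can try it with stand-in
commands. The three functions below are hypothetical and exist only for this
demonstration: one pretends to be a check that passes, one a check that fails,
and one the restart.

```shell
# Stand-in functions (hypothetical) that mimic a passing check, a failing
# check and a restart, so the behaviour of '&&' can be tried safely.
check_ok()   { echo "config valid";   return 0; }
check_bad()  { echo "config invalid"; return 1; }
do_restart() { echo "restarting"; }

# The right-hand side runs because the left-hand side succeeded.
check_ok && do_restart

# The right-hand side is skipped because the left-hand side failed.
check_bad && do_restart
```

The first line prints both messages; the second prints only "config invalid",
which is the same protection you get when the real pre-flight check reports
errors.
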

7c. Verify via the Web Interface
--------------------------------

Go to the web interface (http://pcN.ws.nsrc.org/nagios3) and check that the hosts
you just added are now visible in the interface. Click on the "Hosts" item on the
left of the Nagios screen to see this. You may see them in "PENDING" status until
the checks are carried out.


8. View Status Map
--------------------

Go to http://pcN.ws.nsrc.org/nagios3

Click on the "Map" item on the left. You should see all your hosts with the Nagios
process in the middle. The "?" icons appear because we have not told Nagios what
type of host each item is (router, switch, AP, PC running Linux, etc.).



PART II
Configure Service check for the classroom NOC
-----------------------------------------------------------------------------

0. Configuring

Now that we have our hardware configured we can start telling Nagios what services to monitor
on the configured hardware, how to group the hardware in interesting ways, how to group
services, etc.

1. Associate a service check with our classroom NOC

    $ sudo editor hostgroups_nagios2.cfg

 - Find the hostgroup named "ssh-servers". In the members section of the definition
   change the line:

        members  localhost

   to

        members  localhost,noc

   Save the file and exit.

Verify that your changes are OK:

    $ sudo nagios3 -v /etc/nagios3/nagios.cfg

Restart Nagios to see the new service association with your host:

    $ sudo service nagios3 restart

In the Nagios web interface, find the "Services" link (left menu), and click
on it.

You should be able to find your recent change:

    noc    SSH    PENDING ...


PART III
Defining Services for all PCs
-----------------------------------------------------------------------------

Note: The default normal_check_interval is 5 (minutes) for checking services.
This is defined in "generic-service_nagios2.cfg". You may wish to change
this to 1 (1 minute) to speed up how quickly service issues are detected,
at least during this workshop.

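If you prefer not to make that change by hand, the edit can be sketched with
awk. The example below works on a scratch file, not on your real
configuration. The function name set_interval is made up for this example,
and the sample contents only imitate the directive named in the note; the
real file (assumed to live in /etc/nagios3/conf.d/) will contain more lines.

```shell
# Print a copy of the given file with normal_check_interval changed
# from 5 to 1. Every other line is passed through untouched.
set_interval() {
    awk '$1 == "normal_check_interval" && $2 == "5" { sub(/5/, "1") } { print }' "$1"
}

# Try it on a scratch file that imitates the relevant part of the config.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
define service {
    name                    generic-service
    normal_check_interval   5
}
EOF

set_interval "$tmp"
rm -f "$tmp"
```

Because the function only prints the result, you can compare it with the
original before deciding to edit the real file with "sudo editor".
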
1. Determine what services to define for what devices

 - This is a central concept in using Nagios and network monitoring tools
   in general. So far we are simply using ping to verify that physical hosts
   are up on our network, and we have started monitoring a single service
   (SSH) on the classroom NOC. The next step is to decide what services (web
   server, SSH, etc.) you wish to monitor for each host in the classroom.

 - In this particular class we have:

     routers:  running ssh and snmp
     switches: running telnet and possibly ssh as well as snmp
     pcs:      All PCs are running ssh and http and should be running snmp
               The NOC is currently running an snmp daemon

   So, let's configure Nagios to check for these services on these
   devices.

2. Verify that SSH is running on the routers and workshop PC images

 - In the file "services_nagios2.cfg" there is already an entry for the SSH
   service check, so you do not need to create one. Instead, you
   simply need to re-define the "ssh-servers" entry in the file
   /etc/nagios3/conf.d/hostgroups_nagios2.cfg. The initial entry in the file
   looked like:

     # A list of your ssh-accessible servers
     define hostgroup {
         hostgroup_name  ssh-servers
         alias           SSH servers
         members         localhost
     }

   What do you think you should change? Correct, the "members" line. You
   should add in entries for all the classroom pcs, routers and the
   switches that run ssh. With this information and the network diagram
   you should be able to complete this entry.

   The entry will look something like this:

     define hostgroup {
         hostgroup_name  ssh-servers
         alias           SSH servers
         members         localhost,pc1,pc2,...,ap1,noc,rtr1,rtr2,...,gw-rtr
     }

   Note: do not remove "localhost" - This is your PC and represents
   Nagios' network point of view. So, for instance, if you are on "pc3"
   you would NOT list "pc3" in the list of all the classroom pcs as
   it is represented by the "localhost" entry.

   The "members" entry will be a long line and will likely wrap on the
   screen. If you want to continue the entry on a new line, end the
   current line with "\".

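   A sketch of a continued "members" line is shown here. The host names are
   taken from the examples in this lab, so your own list will differ. It
   assumes the backslash is the very last character on the line and that the
   comma separating two hosts is kept before it:

```cfg
define hostgroup {
    hostgroup_name  ssh-servers
    alias           SSH servers
    members         localhost,pc1,pc2,noc,\
                    rtr1,rtr2,gw-rtr
}
```

   If the pre-flight check complains about this entry, put the whole list
   back on one line and try again.
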
   Remember to include all the PCs and routers that you have defined in
   the files "pcs.cfg", "switches.cfg" and "routers.cfg". Only add entries
   from these files (i.e.: don't add "pc8" in your hostgroup list if "pc8"
   isn't defined in "pcs.cfg" as well).

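   One way to avoid listing a host that is not defined is to let the shell
   collect the names for you. The helper below is a sketch only: the name
   list_members is made up for this example, and it assumes each host is
   declared on a line of the form "host_name <name>", as in the entries you
   pasted earlier.

```shell
# Print every host_name found in the given files as one comma-separated
# line, ready to paste after "members".
list_members() {
    awk '$1 == "host_name" { print $2 }' "$@" | paste -s -d, -
}

# Demonstration on a scratch file with two host entries.
tmp=$(mktemp)
printf 'define host {\n    host_name  pc1\n}\n' >  "$tmp"
printf 'define host {\n    host_name  noc\n}\n' >> "$tmp"
list_members "$tmp"    # prints: pc1,noc
rm -f "$tmp"
```

   In /etc/nagios3/conf.d you could run "list_members pcs.cfg routers.cfg
   switches.cfg". The result still needs editing by hand: keep "localhost"
   at the front, remove your own PC, and remove any device that does not run
   the service you are grouping.
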
 - Once you are done, run the pre-flight check and restart Nagios:

    $ sudo nagios3 -v /etc/nagios3/nagios.cfg && sudo /etc/init.d/nagios3 restart

   ... and view your changes in the Nagios web interface.

To continue with hostgroups you can add additional groups for later use, such as all our virtual
routers. Go ahead and edit the file hostgroups_nagios2.cfg again:

    $ sudo editor hostgroups_nagios2.cfg

and add the following to the end of the file (COPY and PASTE this):

    # A list of our virtual routers

    define hostgroup {
        hostgroup_name  routers
        alias           Cisco 7200 Routers
        members         rtr1,rtr2,...
    }

Only list the routers you have defined in "routers.cfg".

Save and exit from the file. Verify that everything is OK:

    $ sudo nagios3 -v /etc/nagios3/nagios.cfg

If everything looks good, then restart Nagios:

    $ sudo service nagios3 restart

3. Check that http is running on all the classroom PCs.

 - This is almost identical to the previous exercise. Find the hostgroup
   used for the HTTP service check in the file hostgroups_nagios2.cfg and
   add each PC (no routers or switches) to its "members" line. Remember,
   you don't need to add your own machine as it is already defined as
   "localhost".

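   As a rough guide, the finished entry might look like the sketch below.
   The group name "http-servers" and its alias are assumptions based on the
   stock Debian file, so use whatever name your own hostgroups_nagios2.cfg
   contains, and list only PCs that you have defined in "pcs.cfg":

```cfg
define hostgroup {
    hostgroup_name  http-servers
    alias           HTTP servers
    members         localhost,pc1,pc2,...
}
```

   As before, run the pre-flight check and restart Nagios after saving.
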
If you have questions or are confused please ask an instructor for help.