Track2Virt: ex-ganeti-failure-scenarios.htm

File ex-ganeti-failure-scenarios.htm, 51.1 KB (added by Brian Candler, 6 years ago)
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <title>Ganeti: failures and recovery scenarios</title>
  <style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<div id="header">
<h1 class="title">Ganeti: failures and recovery scenarios</h1>
</div>
<div id="TOC">
<ul>
<li><a href="#initial-setup"><span class="toc-section-number">1</span> Initial setup</a></li>
<li><a href="#scenario-planned-node-maintenance"><span class="toc-section-number">2</span> Scenario: Planned Node Maintenance</a><ul>
<li><a href="#step-0-mark-nodeb-as-drained"><span class="toc-section-number">2.1</span> Step 0: Mark nodeB as drained</a></li>
<li><a href="#step-1-migrate-primary-instances-away-from-nodeb"><span class="toc-section-number">2.2</span> Step 1: Migrate primary instances away from nodeB</a></li>
<li><a href="#step-2-move-secondary-instances-disks-on-nodeb-to-another-node"><span class="toc-section-number">2.3</span> Step 2: Move secondary instances' disks on nodeB to another node</a></li>
<li><a href="#step-3-move-plain-instances-away-from-nodeb-to-another-node."><span class="toc-section-number">2.4</span> Step 3: Move plain instances away from nodeB to another node.</a></li>
</ul></li>
<li><a href="#scenario-loss-of-a-slave-node"><span class="toc-section-number">3</span> Scenario: Loss of a Slave Node</a><ul>
<li><a href="#initial-state"><span class="toc-section-number">3.1</span> Initial state</a><ul>
<li><a href="#instance-recovery"><span class="toc-section-number">3.1.1</span> Instance recovery</a></li>
<li><a href="#re-adding-the-failed-node"><span class="toc-section-number">3.1.2</span> Re-adding the failed node</a></li>
</ul></li>
<li><a href="#completely-removing-nodeb-from-the-cluster"><span class="toc-section-number">3.2</span> Completely removing nodeB from the cluster</a></li>
</ul></li>
<li><a href="#scenario-planned-master-failover-node-maintenance"><span class="toc-section-number">4</span> Scenario: Planned master failover (node maintenance)</a></li>
<li><a href="#scenario-loss-of-master-node"><span class="toc-section-number">5</span> Scenario: Loss of Master Node</a><ul>
<li><a href="#promoting-slave"><span class="toc-section-number">5.1</span> Promoting slave</a></li>
</ul></li>
<li><a href="#additional-commands"><span class="toc-section-number">6</span> Additional commands</a><ul>
<li><a href="#moving-all-instances-away-from-a-node"><span class="toc-section-number">6.1</span> Moving all instances away from a node</a></li>
<li><a href="#making-a-node-online-after-it-has-been-marked-as-offline"><span class="toc-section-number">6.2</span> Making a node online after it has been marked as offline</a></li>
</ul></li>
</ul>
</div>
<p>We are going to simulate a number of failure situations and recover from them.</p>
<p>Try to replicate the scenarios on your cluster.</p>
<p><em>In this entire exercise you are working at the cluster level, not on individual nodes</em>. This means you will need to sit together and work as a group.</p>
<h1 id="initial-setup"><a href="#initial-setup"><span class="header-section-number">1</span> Initial setup</a></h1>
<ul>
<li>Cluster with 3 or more nodes</li>
<li>Master is up (nodeA)</li>
<li>Slaves are up (nodeB, nodeC, etc.)</li>
<li>DRBD instance &quot;debianX&quot; (dX) is primary on nodeA, secondary on nodeB</li>
<li>DRBD instance &quot;debianY&quot; (dY) is primary on nodeB, secondary on nodeC</li>
<li>Plain instance &quot;debianZ&quot; (dZ) is on nodeB</li>
</ul>
<pre><code>        node A                      node B                    node C
       (master)
  +-----------------+        +-----------------+        +-----------------+
  |                 |        |                 |        |                 |
  | +====+          |  drbd  | ......   +====+ |  drbd  | ......          |
  | | dX |.....................: dX :   | dY |............: dY :          |
  | +====+          |        | :....:   +====+ |        | :....:          |
  |                 |        |                 |        |                 |
  |                 |        |        +----+   |        |                 |
  |                 |        |  plain | dZ |   |        |                 |
  |                 |        |        +----+   |        |                 |
  +--------+--------+        +-------+---------+        +-------+---------+
           |                         |                          |
-----------+-------------------------+--------------------------+-----------</code></pre>
<p>The command <code>gnt-instance list -o name,pnode,snodes,status</code> is useful to see which instances you have running where.</p>
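<p>As a rough sketch (not part of the original exercise), the output of that command can be filtered to show what role one node plays for each instance. The function name <code>instances_on_node</code> is hypothetical. It assumes the column layout shown later in this document (name, primary node, optional secondary node, status), one header line, and that an instance without a secondary node is a plain instance.</p>

```shell
#!/bin/sh
# Hypothetical helper (not a Ganeti command): reads the output of
#   gnt-instance list -o name,pnode,snodes,status
# on standard input and prints the role that the given node plays for
# each instance. Assumes one header line and whitespace-separated columns,
# where a plain instance has no secondary node and so only three fields.
instances_on_node() {
    awk -v node="$1" '
        # Reduce a fully qualified name such as nodeB.virt.nsrc.org to nodeB.
        function short(h) { sub(/[.].*/, "", h); return h }
        BEGIN   { node = short(node) }
        NR == 1 { next }                      # skip the header line
        {
            primary   = short($2)
            secondary = (NF == 4) ? short($3) : ""
            if (primary == node) {
                print $1, (secondary == "" ? "plain" : "primary")
            } else if (secondary == node) {
                print $1, "secondary"
            }
        }'
}
```

<p>On a cluster it could be used as <code>gnt-instance list -o name,pnode,snodes,status | instances_on_node nodeB</code>. Instances that do not involve the node at all are not printed.</p>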
<p>Choose three of your existing instances to be dX, dY and dZ and, if necessary, move them around to match the diagram. Commands you may need include:</p>
<ul>
<li><code>gnt-instance migrate</code></li>
<li><code>gnt-instance replace-disks</code></li>
<li><code>gnt-instance move</code></li>
</ul>
<h1 id="scenario-planned-node-maintenance"><a href="#scenario-planned-node-maintenance"><span class="header-section-number">2</span> Scenario: Planned Node Maintenance</a></h1>
<p>Let's imagine that we want to take down nodeB for maintenance: more RAM, a disk replacement, etc.</p>
<p>You probably have many instances running on your cluster by now.</p>
<ul>
<li><p>We need to make sure nodeB is not hosting any instances: primary, secondary or plain.</p></li>
<li><p>We can use nodeA and nodeC to take over the instances that are moved away from nodeB.</p></li>
</ul>
<p>Here's the process:</p>
<ol start="0" style="list-style-type: decimal">
<li><p>Mark the node as &quot;drained&quot; to prevent new instances from being created on it.</p></li>
<li><p>Migrate DRBD instances for which nodeB is primary to their secondary node, so that nodeB is at most a secondary for any instance.</p></li>
<li><p>Move the disks of DRBD instances for which nodeB is secondary to another node. (If nodeA is primary for debianX, we move its secondary disks from nodeB to nodeC.)</p></li>
<li><p>Move plain instances running on nodeB to another node (nodeA or nodeC).</p></li>
</ol>
<p>Below are the commands we'll be using for each of the steps above.</p>
<h2 id="step-0-mark-nodeb-as-drained"><a href="#step-0-mark-nodeb-as-drained"><span class="header-section-number">2.1</span> Step 0: Mark nodeB as drained</a></h2>
<p>command: <code>gnt-node modify --drained=yes nodeB</code></p>
<p>check using: <code>gnt-node list -o name,drained</code></p>
<h2 id="step-1-migrate-primary-instances-away-from-nodeb"><a href="#step-1-migrate-primary-instances-away-from-nodeb"><span class="header-section-number">2.2</span> Step 1: Migrate primary instances away from nodeB</a></h2>
<p>command: <code>gnt-instance migrate</code></p>
<p>We've used this command before. If nodeB is primary for any instances, we have to migrate them to their secondary node.</p>
<p>In the example above, nodeB is primary for <code>dY</code>. Let's migrate it over to nodeC.</p>
<pre><code># gnt-instance migrate dY</code></pre>
<p>After this is done, nodeB runs only the plain instance <code>dZ</code> and holds the secondary disks for <code>dX</code> and <code>dY</code>:</p>
<pre><code>        node A                      node B                    node C
       (master)
  +-----------------+        +-----------------+        +-----------------+
  |                 |        |                 |        |                 |
  | +====+          |  drbd  | ......   ...... |  drbd  | +====+          |
  | | dX |.....................: dX :   : dY :............| dY |          |
  | +====+          |        | :....:   :....: |        | +====+          |
  |                 |        |                 |        |                 |
  |                 |        |        +----+   |        |                 |
  |                 |        |  plain | dZ |   |        |                 |
  |                 |        |        +----+   |        |                 |
  +--------+--------+        +-------+---------+        +-------+---------+
           |                         |                          |
-----------+-------------------------+--------------------------+-----------</code></pre>
<h2 id="step-2-move-secondary-instances-disks-on-nodeb-to-another-node"><a href="#step-2-move-secondary-instances-disks-on-nodeb-to-another-node"><span class="header-section-number">2.3</span> Step 2: Move secondary instances' disks on nodeB to another node</a></h2>
<p>command: <code>gnt-instance replace-disks</code></p>
<pre><code># gnt-instance replace-disks -n nodeC debianX</code></pre>
<p>If you prefer, you can let Ganeti's instance allocator choose the new secondary node for you (the dot means &quot;use the default instance allocator&quot;):</p>
<pre><code># gnt-instance replace-disks -I . debianX</code></pre>
<p>Repeat for debianY. Its primary is now nodeC, so its new secondary must be a different node (nodeA in this example), or you can let the instance allocator choose.</p>
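<p>When naming the new secondary by hand with <code>-n</code>, the constraint is that it must be neither the instance's primary node nor the node being emptied. A minimal sketch of that choice follows. The function <code>pick_secondary</code> is hypothetical and deliberately naive: it returns the first acceptable node from a list you supply and ignores free memory and disk space, so the instance allocator remains the better option on a real cluster.</p>

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate node that is neither the
# primary node of the instance nor the node being drained.
# Usage: pick_secondary PRIMARY DRAINED CANDIDATE...
pick_secondary() {
    primary="$1"
    drained="$2"
    shift 2
    for candidate in "$@"; do
        case "$candidate" in
            "$primary"|"$drained") continue ;;   # not allowed as secondary
        esac
        printf '%s\n' "$candidate"
        return 0
    done
    return 1                                     # no suitable node found
}
```

<p>For example, <code>pick_secondary nodeC nodeB nodeA nodeB nodeC</code> describes an instance whose primary is nodeC while nodeB is being drained. In a three-node cluster only one candidate is left, and in a two-node cluster the function fails because no valid secondary exists.</p>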
<h2 id="step-3-move-plain-instances-away-from-nodeb-to-another-node."><a href="#step-3-move-plain-instances-away-from-nodeb-to-another-node."><span class="header-section-number">2.4</span> Step 3: Move plain instances away from nodeB to another node.</a></h2>
<p>command: <code>gnt-instance move</code></p>
<p>Note that this requires shutting down the instance, as its disk(s) first have to be copied to nodeC before it can be restarted there.</p>
<pre><code># gnt-instance move -n nodeC debianZ

Instance debianZ will be moved. This requires a shutdown of the instance.
Continue?
y/[n]/?: y
Fri Sep 19 14:31:44 2014  - INFO: Shutting down instance debianZ on source node nodeB
Fri Sep 19 14:32:01 2014 disk/0 sent 450M, 77.2 MiB/s, 21%, ETA 21s
Fri Sep 19 14:32:37 2014 disk/0 finished receiving data
Fri Sep 19 14:32:37 2014 disk/0 finished sending data
Fri Sep 19 14:32:37 2014  - INFO: Removing the disks on the original node
Fri Sep 19 14:32:38 2014  - INFO: Starting instance debianZ on node nodeC</code></pre>
<p>nodeB is now ready to be shut down. <em>Don't do this!</em></p>
<p>Instead, let's imagine our maintenance is over and nodeB is ready for use again. Remove the &quot;drained&quot; flag so that it can accept instances again.</p>
<pre><code># gnt-node modify --drained=no nodeB</code></pre>
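<p>Steps 0 to 3 of this scenario can be summarised as a small planning sketch. It only prints the commands used in this section, in the order they should be run, and executes nothing. The function name <code>plan_drain</code> and its arguments are hypothetical. It assumes the <code>gnt-instance list -o name,pnode,snodes,status</code> layout with one header line, treats an instance without a secondary node as a plain instance, and leaves the choice of new secondary to the instance allocator.</p>

```shell
#!/bin/sh
# Hypothetical planner: prints (does not run) the commands needed to empty
# a node before maintenance.
# Usage: gnt-instance list -o name,pnode,snodes,status | plan_drain NODE TARGET
#   NODE   - node to be emptied, e.g. nodeB
#   TARGET - node that will receive plain instances, e.g. nodeC
plan_drain() {
    awk -v node="$1" -v target="$2" '
        # Reduce a fully qualified name such as nodeB.virt.nsrc.org to nodeB.
        function short(h) { sub(/[.].*/, "", h); return h }
        BEGIN   { node = short(node) }
        NR == 1 { next }                      # skip the header line
        {
            primary   = short($2)
            secondary = (NF == 4) ? short($3) : ""
            if (primary == node) {
                if (secondary == "") {
                    # Step 3: plain instance, must be shut down and copied.
                    step3 = step3 "gnt-instance move -n " target " " $1 "\n"
                } else {
                    # Step 1: live-migrate to the secondary. The node then
                    # holds the secondary disks, so Step 2 applies as well.
                    step1 = step1 "gnt-instance migrate " $1 "\n"
                    step2 = step2 "gnt-instance replace-disks -I . " $1 "\n"
                }
            } else if (secondary == node) {
                # Step 2: move the secondary disks elsewhere.
                step2 = step2 "gnt-instance replace-disks -I . " $1 "\n"
            }
        }
        END {
            # Step 0 comes first so that nothing new is placed on the node.
            printf "gnt-node modify --drained=yes %s\n", node
            printf "%s%s%s", step1, step2, step3
        }'
}
```

<p>Reviewing the printed plan before running anything is the point of the design: migrations are listed before disk replacements, and plain instances, which need downtime, come last.</p>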
<h1 id="scenario-loss-of-a-slave-node"><a href="#scenario-loss-of-a-slave-node"><span class="header-section-number">3</span> Scenario: Loss of a Slave Node</a></h1>
<h2 id="initial-state"><a href="#initial-state"><span class="header-section-number">3.1</span> Initial state</a></h2>
<ul>
<li>Now arrange that <code>debianX</code> (or whatever DRBD instance you are using) is running (primary) on nodeB, and that debianY is secondary on nodeB, so that it looks like this:</li>
</ul>
<pre><code># gnt-instance list -o name,pnode,snodes,status</code></pre>
<pre><code>Instance Primary_node           Secondary_Nodes        Status
debianX  nodeB.virt.nsrc.org    nodeA.virt.nsrc.org    running
debianY  nodeC.virt.nsrc.org    nodeB.virt.nsrc.org    running
debianZ  nodeC.virt.nsrc.org                           running</code></pre>
<blockquote>
<p>Work out for yourself what commands are necessary to do this. Ask for help if you need it.</p>
</blockquote>
<ul>
<li>Shut down (halt) nodeB (<em>make sure you run this on nodeB, the primary node for instance debianX!</em>)</li>
</ul>
<pre><code># halt -p</code></pre>
155<ul>
156<li>The VM goes down as a result (confirm this using ping / console)</li>
157</ul>
158<pre><code># gnt-instance list -o name,pnode,snodes,status
159
160Instance Primary_node           Secondary_Nodes        Status
161debianX  nodeB.virt.nsrc.org    nodeA.virt.nsrc.org    ERROR_nodedown
162debianY  nodeC.virt.nsrc.org    nodeB.virt.nsrc.org    running
163debianZ  nodeC.virt.nsrc.org                           running</code></pre>
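<p>On a cluster with many instances, the affected ones can be picked out by filtering the status column rather than reading the whole list. The following is only a sketch: <code>not_running</code> is a hypothetical helper name (it is not a Ganeti command), and it assumes two-column <code>name,status</code> rows like the ones shown above.</p>
<pre><code># Hypothetical helper: print every instance whose status is not "running".
# Reads "name status" rows on standard input.
not_running() {
    awk '$2 != "running" { print $1, $2 }'
}

# Possible usage on the master node (--no-headers drops the header row):
#   gnt-instance list --no-headers -o name,status | not_running</code></pre>
<p>With the listing above as input, only the <code>debianX</code> row would be printed.</p>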
164<ul>
165<li><p>Run <code>gnt-cluster verify</code> (will take a while), and look at the output.</p></li>
166<li><p>Run <code>gnt-node list</code>, and look at the output, too.</p></li>
167</ul>
168<p>As you will notice, things are quite slow. This is because Ganeti is trying to contact the <code>ganeti-noded</code> daemon on nodeB, and it's timing out.</p>
169<p>If this were a production environment, we'd have to examine nodeB, and determine whether nodeB was likely to come back online soon. If not, say, because of some hardware failure, we would decide to take the node &quot;offline&quot;, so Ganeti would stop trying to talk to it.</p>
170<p>Let's start by marking nodeB as offline:</p>
171<pre><code># gnt-node modify --offline=yes nodeB.virt.nsrc.org
172
173Modified node nodeB.virt.nsrc.org
174 - master_candidate -&gt; False
175 - offline -&gt; True</code></pre>
176<p>It will take a little while, but most commands will now run faster, because Ganeti stops trying to contact the offline node.</p>
177<p>Try running <code>gnt-instance list</code> and <code>gnt-node list</code> again.</p>
178<p>Also re-run <code>gnt-cluster verify</code></p>
179<h3 id="instance-recovery"><a href="#instance-recovery"><span class="header-section-number">3.1.1</span> Instance recovery</a></h3>
180<ul>
181<li>We cannot live-migrate the instance (nodeB is down), so we need to <em>failover</em></li>
182</ul>
183<p>If you attempt to migrate, you will be told:</p>
184<pre><code># gnt-instance migrate debianX
185
186Failure: prerequisites not met for this operation:
187error type: wrong_state, error details:
188Can't migrate, please use failover: Node is marked offline</code></pre>
189<ul>
190<li>Attempt failover</li>
191</ul>
192<pre><code># gnt-instance failover debianX</code></pre>
193<p>Hopefully you will see messages ending with:</p>
194<pre><code>...
195Sat Jan 18 15:58:11 2014 * activating the instance's disks on target node nodeA.virt.nsrc.org
196Sat Jan 18 15:58:11 2014  - WARNING: Could not prepare block device disk/0 on node nodeB.virt.nsrc.org (is_primary=False, pass=1): Node is marked offline
197Sat Jan 18 15:58:11 2014 * starting the instance on the target node nodeA.virt.nsrc.org</code></pre>
198<p>If so, skip to the step &quot;Confirm that the VM is now up on nodeA&quot;.</p>
199<p>If you see this message:</p>
200<pre><code>Sat Jan 18 20:57:55 2014 Failover instance debianX
201Sat Jan 18 20:57:55 2014 * checking disk consistency between source and target
202Failure: command execution error:
203Disk 0 is degraded on target node, aborting failover</code></pre>
204<p>... you will need to <em>force</em> the operation. This should normally not happen when the node is marked offline. However, if you do get the message:</p>
205<ul>
206<li>Read the man page for <code>gnt-instance</code> and find the section about <code>failover</code>:</li>
207</ul>
208<blockquote>
209<p>If you are trying to migrate instances off a dead node, this will fail. Use the --ignore-consistency option for this purpose. Note that this option can be dangerous as errors in shutting down the instance will be ignored, resulting in possibly having the instance running on two machines in parallel (on disconnected DRBD drives).</p>
210</blockquote>
211<ul>
212<li><p>We know that nodeB is down - we halted it ourselves! In a real-world scenario, you MUST verify that nodeB really is down. Otherwise you risk ending up with two running copies of the same VM (if someone force-starts it), and you will need to force a resolution.</p></li>
213<li><p>Re-run <code>gnt-instance failover</code> with the <code>--ignore-consistency</code> flag. We are in a situation that requires this (nodeB is down).</p></li>
214</ul>
215<pre><code># gnt-instance failover --ignore-consistency debianX</code></pre>
216<p>There will be much more output this time. You may see some warnings; these are normal, since nodeB is down and we have marked it as offline.</p>
217<pre><code>Sat Jan 18 21:03:15 2014 Failover instance debianX
218Sat Jan 18 21:03:15 2014 * checking disk consistency between source and target
219
220[ ... messages ... ]
221
222Sat Jan 18 21:03:27 2014 * activating the instance's disks on target node nodeA.virt.nsrc.org
223
224[ ... messages ... ]
225
226Sat Jan 18 21:03:33 2014 * starting the instance on the target node nodeA.virt.nsrc.org</code></pre>
227<ul>
228<li>Confirm that the VM is now up on nodeA:</li>
229</ul>
230<pre><code># gnt-instance list -o name,pnode,snodes,status
231
232Instance Primary_node           Secondary_Nodes        Status
233debianX  nodeA.virt.nsrc.org    nodeB.virt.nsrc.org    running
234debianY  nodeC.virt.nsrc.org    nodeB.virt.nsrc.org    running
235debianZ  nodeC.virt.nsrc.org                           running</code></pre>
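<p>In production, the &quot;verify that the node really is down&quot; step can be partly scripted before anyone reaches for <code>--ignore-consistency</code>. The sketch below assumes the node answers ICMP ping when healthy; <code>node_seems_down</code> is a hypothetical helper name. A silent host is only a hint, not proof: a node can be unreachable over the network and still be running VMs, so also check the console or power state.</p>
<pre><code># Hypothetical helper: succeed only if the host answers none of 3 pings.
node_seems_down() {
    if ping -q -c 3 "$1"; then
        return 1    # the host answered, so it is NOT down
    fi
    return 0        # no answer at all
}

# Possible usage on the master node:
#   if node_seems_down nodeB.virt.nsrc.org; then
#       gnt-instance failover --ignore-consistency debianX
#   fi</code></pre>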
236<h3 id="re-adding-the-failed-node"><a href="#re-adding-the-failed-node"><span class="header-section-number">3.1.2</span> Re-adding the failed node</a></h3>
237<p>Ok, let's say nodeB has been fixed.</p>
238<ul>
239<li><p>Restart nodeB. (Depending on the class setup, you may need to ask the instructor to do this for you).</p></li>
240<li><p>Make sure you can ping it and can log in to it</p></li>
241</ul>
242<p>We need to re-add it to the cluster. We do this using the <code>gnt-node add --readd</code> command on the cluster master node.</p>
243<p>From the <code>gnt-node</code> man page:</p>
244<blockquote>
245<p>In case you're readding a node after hardware failure, you can use the --readd parameter. In this case, you don't need to pass the secondary IP again, it will be reused from the cluster. Also, the drained and offline flags of the node will be cleared before re-adding it.</p>
246</blockquote>
247<pre><code># gnt-node add --readd nodeB.virt.nsrc.org
248
249[ ... question about SSH ...]
250
251Sat Jan 18 22:09:43 2014  - INFO: Readding a node, the offline/drained flags were reset
252Sat Jan 18 22:09:43 2014  - INFO: Node will be a master candidate</code></pre>
253<p>We're good! It could take a while to re-sync the DRBD data if a lot of disk activity (writing) has taken place on <code>debianX</code>, but this will happen in the background.</p>
254<p>Inspect the node list:</p>
255<pre><code># gnt-node list</code></pre>
256<p>Check the cluster configuration.</p>
257<pre><code># gnt-cluster verify</code></pre>
258<p>The DRBD devices on nodeB have probably not yet been activated by the master daemon. As a result you may see some errors about your instance's disk being degraded, similar to this:</p>
259<pre><code>Thu Sep 18 18:52:41 2014 * Verifying node status
260Thu Sep 18 18:52:41 2014   - ERROR: node nodeB: drbd minor 0 of instance debianX is not active
261Thu Sep 18 18:52:41 2014 * Verifying instance status
262Thu Sep 18 18:52:41 2014   - ERROR: instance debianX: disk/0 on nodeA is degraded
263Thu Sep 18 18:52:41 2014   - ERROR: instance debianX: couldn't retrieve status for disk/0 on nodeB: Can't find device &lt;DRBD8(hosts=03add4b7-d6d9-40d0-bf6e-74d1683aad49/0-93eef5d9-6b33-4c</code></pre>
264<p>Don't panic! This is normal, as it's possible the disks haven't been re-synchronized yet.</p>
265<p>If so, you can use the command <code>gnt-cluster verify-disks</code> to fix this:</p>
266<pre><code># gnt-cluster verify-disks
267
268Submitted jobs 78
269Waiting for job 78 ...
270Activating disks for instance 'debianX'</code></pre>
271<p>Wait a few seconds, then run:</p>
272<pre><code># gnt-cluster verify</code></pre>
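<p>To judge whether the cluster is clean without reading every line, you can count the ERROR lines in the verify output. This is a sketch only: <code>count_verify_errors</code> is a hypothetical helper name, and it assumes the messages use the <code> - ERROR: </code> format shown above and are written to standard output.</p>
<pre><code># Hypothetical helper: count the ERROR lines in "gnt-cluster verify"
# output read from standard input. Prints 0 when there are none.
count_verify_errors() {
    awk '/ - ERROR: / { n++ } END { print n + 0 }'
}

# Possible usage on the master node:
#   gnt-cluster verify | count_verify_errors</code></pre>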
273<p>When all is OK, let's try and migrate debianX back to nodeB:</p>
274<pre><code># gnt-instance migrate debianX</code></pre>
275<p>Test that the migration has worked.</p>
276<h2 id="completely-removing-nodeb-from-the-cluster"><a href="#completely-removing-nodeb-from-the-cluster"><span class="header-section-number">3.2</span> Completely removing nodeB from the cluster</a></h2>
277<p>Let's now imagine that the failure of nodeB wasn't temporary: it cannot be fixed quickly and needs to be completely replaced, so it won't be back online for a while. We could decide to remove nodeB from the cluster.</p>
278<p>To do this:</p>
279<ul>
280<li>If nodeB has been restarted, let's shut it down (to simulate a failure)</li>
281</ul>
282<p>Note: RUN THIS ON nodeB !!!</p>
283<pre><code># halt -p</code></pre>
284<ul>
285<li>On the master:</li>
286</ul>
287<p>Mark nodeB as offline:</p>
288<pre><code># gnt-node modify --offline=yes nodeB.virt.nsrc.org</code></pre>
289<p>Run <code>gnt-cluster verify</code>, and look at the output.</p>
290<pre><code>Sat Jan 18 21:31:56 2014   - NOTICE: 1 offline node(s) found.</code></pre>
291<ul>
292<li><p>We marked nodeB as offline. Let's assume it will stay down for some time while it's being fixed.</p></li>
293<li><p>We decide to remove nodeB from the cluster:</p></li>
294</ul>
295<pre><code># gnt-node remove nodeB.virt.nsrc.org
296
297Failure: prerequisites not met for this operation:
298error type: wrong_input, error details:
299Instance debianX is still running on the node, please remove first</code></pre>
300<p>OK, we are not allowed to remove nodeB, because Ganeti can see that we still have an instance (debianX) associated with it.</p>
301<p>This is different from simply marking the node offline, as it means we are permanently getting rid of nodeB, and we need to take a decision about what to do for DRBD instances that were associated with nodeB.</p>
302<!-- XXX: gnt-node failover goes here -->
303
304<pre><code># gnt-instance failover debianX
305
306Failover will happen to image debianX. This requires a shutdown of
307the instance. Continue?
308y/[n]/?: y
309Thu Sep 18 20:29:32 2014 Failover instance debianX
310Thu Sep 18 20:29:32 2014 * checking disk consistency between source and target
311Thu Sep 18 20:29:32 2014 Node nodeB.virt.nsrc.org is offline, ignoring degraded disk 0 on target node nodeA.virt.nsrc.org
312Thu Sep 18 20:29:32 2014 * shutting down instance on source node
313Thu Sep 18 20:29:32 2014  - WARNING: Could not shutdown instance debianX on node nodeB.virt.nsrc.org, proceeding anyway; please make sure node nodeB.virt.nsrc.org is down; error details: Node is marked offline
314Thu Sep 18 20:29:32 2014 * deactivating the instance's disks on source node
315Thu Sep 18 20:29:33 2014  - WARNING: Could not shutdown block device disk/0 on node nodeB.virt.nsrc.org: Node is marked offline
316Thu Sep 18 20:29:33 2014 * activating the instance's disks on target node nodeA.virt.nsrc.org
317Thu Sep 18 20:29:33 2014  - WARNING: Could not prepare block device disk/0 on node nodeB.virt.nsrc.org (is_primary=False, pass=1): Node is marked offline
318Thu Sep 18 20:29:33 2014 * starting the instance on the target node nodeA.virt.nsrc.org</code></pre>
319<p>Followed by:</p>
320<!-- XXX maybe we need to do a replace-disks instead -->
321
322<pre><code># gnt-node evacuate -s nodeB
323
324Relocate instance(s) debianX from node(s) nodeB?
325y/[n]/?: y
326Thu Sep 18 20:32:37 2014  - INFO: Evacuating instances from node 'nodeB.virt.nsrc.org': debianX
327Thu Sep 18 20:32:37 2014  - INFO: Instances to be moved: debianX (to nodeA.virt.nsrc.org, nodeC.virt.nsrc.org)
328...
329Thu Sep 18 20:32:38 2014 STEP 3/6 Allocate new storage
330Thu Sep 18 20:32:38 2014  - INFO: Adding new local storage on nodeC.virt.nsrc.org for disk/0
331...
332Thu Sep 18 20:32:41 2014 STEP 6/6 Sync devices
333Thu Sep 18 20:32:41 2014  - INFO: Waiting for instance debianX to sync disks
334Thu Sep 18 20:32:41 2014  - INFO: - device disk/0:  1.20% done, 1m 55s remaining (estimated)
335Thu Sep 18 20:33:41 2014  - INFO: Instance debianX's disks are in sync
336All instances evacuated successfully.</code></pre>
337<p>Ok, check out the instance list:</p>
338<pre><code># gnt-instance list -o name,pnode,snodes,status
339
340Instance  Primary_node        Secondary_Nodes      Status
341debianX   nodeA.virt.nsrc.org nodeC.virt.nsrc.org  running
342XXX</code></pre>
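<p>With more than a handful of instances, checking by eye that nothing still references nodeB is error-prone. The sketch below filters the same three columns; <code>uses_node</code> is a hypothetical helper name, and it assumes each instance has at most one secondary node (as DRBD instances do here).</p>
<pre><code># Hypothetical helper: print the instances that still use the given node
# as primary or secondary. Reads "name pnode snodes" rows on standard input.
uses_node() {
    awk -v node="$1" '$2 == node || $3 == node { print $1 }'
}

# Possible usage on the master node; empty output means nothing uses nodeB:
#   gnt-instance list --no-headers -o name,pnode,snodes | uses_node nodeB.virt.nsrc.org</code></pre>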
343<p>Perfect, nodeB is not used by any instance. We can now re-attempt to remove node nodeB from the cluster:</p>
344<pre><code># gnt-node remove nodeB.virt.nsrc.org</code></pre>
345<p>More WARNINGs! But did it work?</p>
346<pre><code># gnt-node list
347
348Node                DTotal DFree MTotal MNode MFree Pinst Sinst
349nodeA.virt.nsrc.org  29.1G 12.6G   995M  145M  672M     2     0
350nodeC.virt.nsrc.org  29.0G 12.7G   995M  137M  680M     0     1</code></pre>
351<p>Yes, nodeB is gone.</p>
352<p>Note: Ganeti will modify <code>/etc/hosts</code> on your remaining nodes, and remove the line for nodeB!</p>
353<p>We can restart our debianX instance, by the way! (This may have already happened if you called <code>gnt-instance failover</code>)</p>
354<pre><code># gnt-instance start debianX</code></pre>
355<p>Test that it comes up normally.</p>
356<h1 id="scenario-planned-master-failover-node-maintenance"><a href="#scenario-planned-master-failover-node-maintenance"><span class="header-section-number">4</span> Scenario: Planned master failover (node maintenance)</a></h1>
357<p>Let's imagine that we need to temporarily service the cluster master (in this case, nodeA). It's rather easy. Decide first which of the other nodes will become master.</p>
358<p>Read about <code>master-failover</code>: run <code>man gnt-cluster</code>, find the MASTER-FAILOVER section.</p>
359<p>Then, ON THE NODE YOU PICKED, run this command:</p>
360<pre><code># gnt-cluster master-failover</code></pre>
361<p>If everything goes well, after 5-10 seconds the node you ran this command on will be the new master.</p>
362<p>Test this! For example, if nodeB is your new master, run these commands on it:</p>
363<p>Verify that the cluster IP is now on this node:</p>
364<pre><code># ifconfig br-lan:0</code></pre>
365<p>Notice that the IP address in br-lan:0 is that of the cluster master.</p>
366<p>This means that next time you log on using SSH using the cluster IP, you will be logged on to nodeB.</p>
367<p>Check which node is the master (this is one of the few commands you can run on any node, not just the master):</p>
368<pre><code># gnt-cluster getmaster
369nodeB.virt.nsrc.org</code></pre>
370<p>All good!</p>
371<h1 id="scenario-loss-of-master-node"><a href="#scenario-loss-of-master-node"><span class="header-section-number">5</span> Scenario: Loss of Master Node</a></h1>
372<p>Let's imagine a slightly more critical scenario: the crash of the master node.</p>
373<p>Let's shut down the master node!</p>
374<p>On nodeB (it's now our master node, remember?):</p>
375<pre><code># halt -p</code></pre>
376<p>The node is now down. VMs still running on other nodes are unaffected, but you cannot make any changes (stop, start, modify, add VMs, change cluster configuration, etc.).</p>
377<h2 id="promoting-slave"><a href="#promoting-slave"><span class="header-section-number">5.1</span> Promoting slave</a></h2>
378<p>Let's assume that nodeB is not coming back right now, and we need to promote a master.</p>
379<p>You will first need to decide which of the remaining nodes will become the master. Let's pick nodeA.</p>
380<p>To promote the slave:</p>
381<ul>
382<li><p>Log on to the node that will become master (nodeA):</p></li>
383<li><p>Run the following command:</p></li>
384</ul>
385<pre><code># gnt-cluster master-failover</code></pre>
386<p>Note here that you will NOT be asked to confirm the operation!</p>
387<p>If you have 3 or more nodes in the cluster, the operation should be as smooth as in the previous section.</p>
388<p>On the other hand, if you only had 2 nodes in your cluster, you would have to specify <code>--no-voting</code><sup><a href="#fn1" class="footnoteRef" id="fnref1">1</a></sup> as an option. This is because, if one node is down, there is only one node left in the cluster, and no majority election can take place.</p>
389<p>At this point, the chosen node (nodeA) is now master. You can verify this using the <code>gnt-cluster getmaster</code> command.</p>
390<p>From this point, recovering downed machines is similar to what we did in the first scenario. But to be on the safe side:</p>
391<ul>
392<li><p>Restart nodeB, and log in to it as root</p></li>
393<li><p>Try and run <code>gnt-instance list</code></p></li>
394</ul>
395<p>Even though nodeB was down while the promotion of nodeA happened, the <code>ganeti-masterd</code> daemon running on nodeB was informed, on startup, that nodeB was no longer master. The above command should therefore fail with:</p>
396<pre><code>This is not the master node, please connect to node 'nodeA.virt.nsrc.org' and
397rerun the command</code></pre>
398<p>Which means that nodeB is well aware that nodeA is the master now.</p>
399<p>Once you have done this, you may find that nodeA and nodeB have different versions of the cluster database. Type the following on nodeA:</p>
400<pre><code># gnt-cluster verify
401...
402Sat Jan 18 16:11:12 2014   - ERROR: cluster: File /var/lib/ganeti/config.data found with 2 different checksums (variant 1 on nodeB.virt.nsrc.org, nodeC.virt.nsrc.org; variant 2 on nodeA.virt.nsrc.org)
403Sat Jan 18 16:11:12 2014   - ERROR: cluster: File /var/lib/ganeti/ssconf_master_node found with 2 different checksums (variant 1 on nodeB.virt.nsrc.org, nodeC.virt.nsrc.org; variant 2 on nodeA.virt.nsrc.org)</code></pre>
404<p>You can fix this by:</p>
405<pre><code># gnt-cluster redist-conf</code></pre>
406<p>which pushes out the config from the current master to all the other nodes.</p>
407<p>Re-run <code>gnt-cluster verify</code> to check everything is OK again.</p>
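<p>If you want to see the mismatch for yourself, you can compare a checksum of the file on each node. This is a sketch under assumptions: <code>distinct_checksums</code> is a hypothetical helper name, <code>md5sum</code> is assumed to be installed on every node, and root SSH between the nodes is assumed to work (Ganeti normally sets this up).</p>
<pre><code># Hypothetical helper: count how many different checksums appear in
# md5sum-style output ("checksum  filename") read from standard input.
distinct_checksums() {
    awk '!seen[$1]++ { n++ } END { print n + 0 }'
}

# Possible usage on the master node; a result of 1 means all copies agree:
#   for n in nodeA nodeB nodeC; do
#       ssh "$n" md5sum /var/lib/ganeti/config.data
#   done | distinct_checksums</code></pre>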
408<p>Then, to make nodeB take over the master role again, log in to nodeB and run:</p>
409<pre><code># gnt-cluster master-failover</code></pre>
410<h1 id="additional-commands"><a href="#additional-commands"><span class="header-section-number">6</span> Additional commands</a></h1>
411<p>For reference, here are some additional useful commands. You should try these out in a test environment before a real problem occurs.</p>
412<h2 id="moving-all-instances-away-from-a-node"><a href="#moving-all-instances-away-from-a-node"><span class="header-section-number">6.1</span> Moving all instances away from a node</a></h2>
413<p>command: <code>gnt-node evacuate</code></p>
414<p>Read the man page for <code>gnt-node</code> and look for the section about the <code>evacuate</code> subcommand.</p>
415<p>Note: for the time being, one needs to explicitly tell the evacuate command to move away either primary (<code>-p</code>) or secondary (<code>-s</code>) instances - it won't work for both at the same time.</p>
416<p>Assuming we have:</p>
417<ul>
418<li><p>debianX running as a DRBD instance, with nodeA as primary and nodeB as secondary.</p></li>
420<li><p>debianY running as a plain instance on nodeB.</p></li>
421</ul>
422<p>What happens if we do:</p>
423<pre><code># gnt-node evacuate -p nodeB
424
425Relocate instance(s) debianY from node(s) nodeB?
426y/[n]/?:</code></pre>
427<p><code>gnt-node evacuate</code> has figured out that the <code>plain</code> debianY instance needs to be moved away. Answer <code>y</code>.</p>
428<pre><code>Fri Sep 19 14:29:45 2014  - INFO: Evacuating instances from node 'nodeB': debianY
429Fri Sep 19 14:29:46 2014  - WARNING: Unable to evacuate instances debianY (Instances of type plain cannot be relocated)
430Failure: command execution error:
431Unable to evacuate instances debianY (Instances of type plain cannot be relocated)</code></pre>
432<p>Uh oh :(</p>
433<p>What about <code>gnt-node evacuate -s nodeB</code>?</p>
434<h2 id="making-a-node-online-after-it-has-been-marked-as-offline"><a href="#making-a-node-online-after-it-has-been-marked-as-offline"><span class="header-section-number">6.2</span> Making a node online after it has been marked as offline</a></h2>
435<p>When a node has been marked offline for a short period of time and no other cluster changes have taken place, it <em>is</em> possible just to mark it online again. You could simply do the following (DON'T DO THIS NOW!):</p>
436<pre><code># gnt-node modify --offline=no nodeB.virt.nsrc.org
437
438Sat Jan 18 22:08:45 2014  - INFO: Auto-promoting node to master candidate
439Sat Jan 18 22:08:45 2014  - WARNING: Transitioning node from offline to online state without using re-add. Please make sure the node is healthy!</code></pre>
440<p>If there is any doubt, use <code>gnt-node add --readd</code> instead.</p>
441<p>It's also a good idea to do a <code>gnt-cluster redist-conf</code> after bringing the node back online.</p>
442<div class="footnotes">
443<hr />
444<ol>
445<li id="fn1"><p>Similarly, if you have a 2 node cluster and one of them is down, and you reboot the single working node, the master daemon will fail to start as it is unable to confirm that it is definitely the master. Use <code>service ganeti status</code> to see what is running. Then as required:</p>
446<pre><code>/usr/lib/ganeti/daemon-util start ganeti-masterd --no-voting  # ganeti 2.11
447/usr/lib/ganeti/daemon-util start ganeti-wconfd --no-voting   # ganeti 2.12
448/usr/lib/ganeti/daemon-util start ganeti-luxid --no-voting</code></pre>
449<p>These commands will be rejected unless you also add <code>--yes-do-it</code> to the command lines. Do so, but only if you are <em>sure</em> that you need to force this node to become master. If another node is still running as master, or you later force the other node to become master too, then you could have a &quot;split brain&quot; scenario, i.e. two inconsistent masters.<a href="#fnref1"></a></p></li>
450</ol>
451</div>
452</body>
453</html>