\chapter{Troubleshooting}
\section{Terminating all active experiments}
To terminate all active experiments, use the clean-up script provided with the
IMUNES system. Invoke it by issuing the command \texttt{cleanupAll} with root
privileges:

\texttt{\# cleanupAll}

\subsection{Cleaning up hanging ZFS mounts}
If a machine running an experiment is rebooted, the experiment will not be
available after boot, but the ZFS mounts needed for its network nodes will
remain after the restart. The \texttt{cleanupAll} tool also destroys these
leftover ZFS mounts.
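
As a quick check, the datasets that remain in the pool can be listed before and
after running \texttt{cleanupAll}. This is a minimal sketch: it assumes that
the pool is named \texttt{vroot}, as the snapshot name
\texttt{vroot/vroot@clean} suggests, and that the per-node file systems are
created inside that pool.

\begin{verbatim}
# zfs list -r vroot
\end{verbatim}
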
\section{Restoring original ZFS snapshot}
\textbf{NOTE:} New versions of IMUNES use Unionfs instead of ZFS.

The ZFS snapshot that is replicated onto virtual nodes can become corrupted or
be deleted. To restore it to its original state, first destroy the existing
default snapshot named \texttt{vroot/vroot@clean}. Before destroying the
snapshot, make sure that no experiments are currently running: check with the
command \texttt{himage -l}, and terminate any that remain with the
\texttt{imunes -b eid} option or the \texttt{cleanupAll} tool. Remove the ZFS
snapshot by issuing the following command with root privileges:

\texttt{\# zfs destroy -fR vroot/vroot@clean}
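
To confirm that the snapshot is gone, the snapshots in the pool can be listed.
Once the command above has succeeded, \texttt{vroot/vroot@clean} should no
longer appear in the output. This check is an addition to the procedure and
assumes only the standard \texttt{zfs} utility:

\begin{verbatim}
# zfs list -t snapshot -r vroot
\end{verbatim}
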
After the snapshot has been destroyed, download the IMUNES tarball from the
IMUNES website (\url{http://imunes.tel.fer.hr/dl/imunes-1.0.tar.gz}) and unpack
it. Then enter the unpacked directory and run the command
\texttt{make vroot\_zfs} with root privileges to restore the ZFS snapshot. The
procedure is as follows:

\texttt{\# tar xf imunes-1.0.tar.gz}\\
\texttt{\# cd imunes\_\emph{version}}\\
\texttt{\# make vroot\_zfs}

\textbf{NOTE:} An Internet connection must be available to restore the
snapshot.

\section{Obtaining kernel panic traces}
If a specific experiment configuration, workload type, or hardware
configuration triggers operating system crashes, traces of such events can be
instrumental in debugging and resolving the underlying operating system bugs.
The following procedure is recommended for collecting kernel panic traces:

Add the following line to the \texttt{/etc/rc.conf} file:

\texttt{dumpdev="AUTO"}

Create a new file \texttt{/usr/local/etc/rc.d/textdump} and populate it with
the following script:

\begin{verbatim}
#!/bin/sh
#
# PROVIDE: textdump
# REQUIRE: LOGIN
#
ddb script kdb.enter.default="textdump set; capture on; bt;\
show reg; show pcpu; show vnets; call doadump; reset"
\end{verbatim}

Set the execute bit on the \texttt{/usr/local/etc/rc.d/textdump} file:

\begin{verbatim}
# chmod +x /usr/local/etc/rc.d/textdump
\end{verbatim}
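
After the next reboot, the setup can be checked with the stock FreeBSD
\texttt{dumpon} and \texttt{ddb} utilities. This is a sketch rather than part
of the required procedure: the first command should report the configured dump
device, and the second should print the script registered for
\texttt{kdb.enter.default}.

\begin{verbatim}
# dumpon -l
# ddb script kdb.enter.default
\end{verbatim}
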

After a reboot, the machine should be configured to automatically store kernel
panic traces in the \texttt{/var/crash} directory. The following listing shows
an example collection of kernel panic trace files:

\begin{verbatim}
# ls -l /var/crash/
total 96
-rw-r--r-- 1 root wheel 2 Sep 20 00:06 bounds
-rw------- 1 root wheel 440 Sep 19 23:57 info.0
-rw------- 1 root wheel 442 Sep 20 00:00 info.1
-rw------- 1 root wheel 442 Sep 20 00:06 info.2
-rw------- 1 root wheel 24576 Sep 19 23:57 textdump.tar.0
-rw------- 1 root wheel 39424 Sep 20 00:00 textdump.tar.1
-rw------- 1 root wheel 24576 Sep 20 00:06 textdump.tar.2
#
\end{verbatim}

Such ``textdump'' tarballs should be submitted along with any reports of
kernel crashes.
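
A dump can also be inspected locally, since each textdump is an ordinary tar
archive. The sketch below reuses the file names from the listing above and
assumes that the archive contains a member named \texttt{ddb.txt} holding the
captured debugger output, as described in the FreeBSD \texttt{textdump(4)}
manual page; member names may differ between releases.

\begin{verbatim}
# cat /var/crash/info.0
# tar -tf /var/crash/textdump.tar.0
# tar -xOf /var/crash/textdump.tar.0 ddb.txt | less
\end{verbatim}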