.TL
Rebuilding fossil from venti arenas
.AU
Steve Simon
.AI
steve \fBat\fI quintile \fBdot\fI net
.SH
Machine prerequisites
.LP
The machine used must have an ethernet card (though no active network is
required). The loopback ether device cannot be used as it is not currently
built into the 9pccd kernel. A spare disk partition is also needed, which
must be larger than the total size of the venti index to be built.
.LP
It is very useful to have hard copies of the following manual pages:
prep(8), plan9.ini(8), venti(8), ventiaux(8), fossilcons(8), fossil(4), and fs(3).
.SH
Example scenario
.LP
In this example the fossil and index disks are not damaged but are
being replaced. This is actually slightly more complex than simply
rebuilding a server, as the partitions and SCSI target numbers have to change.
.LP
The changes described below result from replacing
the two venti index disks with a single index disk, and creating a
secondary fossil filesystem that is not backed up; by convention
this is called \fBother\fR.
.LP
.TS
center box tab(@) ;
l s s s s s s
l l l l l l l
l|l|l|l|l|l|l .
Old layout
=
sd00@sd01@sd02@sd04@sd06@sd08@sd010
_
9fat@isect0@isect1@@arenas0@@
nvram@@@@@@
fossil@@@@@@
swap@@@@@@
_
.T&
l s s s s s s
l|l|l|l|l|l|l .
New layout
=
@@@9fat@arenas0@isect0@arenas0
@@@nvram@@other@mirror
@@@fossil@@@
@@@swap@@@
.TE
.NH
Boot from CD
.LP
The machine to be modified should be booted from CD.
.LP
Note: The x86 bootable ISO image distributed by Bell Labs expects
the CD drive to be attached to the secondary master IDE interface.
.NH 2
Get the last valid Venti archive score
.LP
To rebuild from a venti archive a score is needed. This must be either a score
as printed on the fossil console when the nightly \fBsnap -a\fR occurs,
or a score (as here) extracted from the fossil partition by \fBfossil/last\fR.
.LP
A score as produced by the \fBvac\fR command on the fossil console or
the \fBvac(1)\fR command line tool could be used; however, the
rebuilt fossil would then NOT contain the top-level
\fI/active\fR, \fI/archive\fR, and \fI/snapshot\fR directories,
so this is not recommended.
.DS
.CW
cpu% fossil/last /dev/sd00/fossil > /tmp/last.vac
.DE
.NH 2
Dump all VAC scores
.LP
If you do not have a recent vac score from which to reinitialise your fossil,
you can extract all of them using /sys/src/cmd/venti/dumpvacroots.
.IP
If you have an old boot CD you may need to build /sys/src/cmd/venti/8.printarenas
and edit dumpvacroots to set the IP address of your venti server. Newer CDs
have printarenas precompiled, and dumpvacroots uses the \fIventi\fR
environment variable instead.
.LP
Dumpvacroots prints the scores of all the recent venti archives in date
order; you most probably want the last one printed (i.e. the most recent).
It takes ten to fifteen minutes to run.
.DS
.CW
cpu% echo $venti
tcp!192.168.0.5!17034
cpu% cd /sys/src/cmd/venti
cpu% ./dumpvacroots | tail
vac:823732...
vac:5628943...
.DE
.NH
Modify the venti config
.LP
Venti's configuration must be changed to reflect the new disk layout.
.LP
Venti's configuration is conventionally stored in a block at the start
of the \fIarenas0\fR partition rather than in a file in the
filesystem. This allows the system to boot directly from fossil/venti.
.DS
.CW
cpu% venti/conf /dev/sd06/arenas0
index main
isect /dev/sd01/isect0
isect /dev/sd02/isect1
arenas /dev/sd06/arenas0
.SM
\fI # Dump old venti layout
.DE
.DS
.CW
cpu% venti/conf -w /dev/sd06/arenas0 <<EOF
index main
isect /dev/sd08/isect0
arenas /dev/sd06/arenas0
EOF
.SM
\fI # Write new venti layout
.DE
.NH
Initialise the fossil/nvram/9fat disk
.LP
The \fI9fat\fR partition will contain the low-level boot loader and machine
configuration file (plan9.ini). \fINvram\fR holds the machine's key, allowing
it to boot unattended. \fIFossil\fR is the write buffer for the filesystem, holding
snapshots and modified files not yet archived to venti.
.LP
By convention the \fI9fat\fR partition is the first partition on the
disk. Putting it further than about 8½GB into the disk can cause booting
problems, as the cylinder/head/sector addressing used by most BIOSes cannot
reach beyond this point; see the section on LBA in 9load(8).
This partition need only be about 100MB long.
.LP
The \fInvram\fR partition requires only a single 512 byte sector.
.LP
The \fIfossil\fR partition need only be big enough to hold the biggest file you will
write to the system; its size also limits the number of bytes you can write per day.
The latter is not strictly true, as multiple archival snapshots may be taken per day,
but it is a reasonable rule of thumb; fossil is typically between 2GB and 8GB.
.DS
.CW
cpu% disk/mbr -m /386/mbr /dev/sd04/data
cpu% disk/fdisk -baw /dev/sd04/data
cpu% disk/prep /dev/sd04/plan9
.SM
# see manual for usage of prep(8)
.DE
.NH
Initialise isect/other disk
.LP
Venti performance can be improved if the venti index is split
across several physical disks; however, this has not been done here.
The total size of all the index slices need only be about five percent
of the size of the venti arenas.
.DS
.CW
cpu% disk/mbr -m /386/mbr /dev/sd08/data
cpu% disk/fdisk -baw /dev/sd08/data
cpu% disk/prep /dev/sd08/plan9
.SM
# see manual for usage of prep(8)
.DE
.NH
Format each isect slice
.LP
Each slice must be branded with its name, usually the same
as the partition's name. This takes about ten minutes
per slice. Only one isect slice is used in this example.
.DS
.CW
cpu% venti/fmtisect isect0 /dev/sd08/isect0
.DE
.NH
Combine all isect slices into an index
.LP
All the index slices must now be combined into a single index,
and populated with references into the venti archive.
.DS
.CW
cpu% venti/fmtindex /dev/sd06/arenas0
.DE
.NH
Rebuild the index from the arenas
.LP
Here \fBother\fR's partition is used as temporary space for the index
rebuild; alternatively, another disk could have been added for the
duration of the rebuild. The partition used must be bigger than the
combined size of all the index slices. This process takes about 15
minutes.
.DS
.CW
cpu% venti/buildindex /dev/sd06/arenas0 /dev/sd08/other
.DE
.NH
Start ethernet
.LP
Fossil and venti communicate via TCP/IP, so the ethernet device
must be initialised.
.DS
.CW
cpu% ip/ipconfig ether /net/ether0 add 192.168.0.5 255.255.255.0
.DE
.NH
Start venti
.LP
The \fB-h\fR option starts the HTTP server built into venti.
This is needed only if you want to run dumpvacroots, described earlier.
.DS
.CW
cpu% venti/venti -h tcp!192.168.0.5!8000 -c /dev/sd06/arenas0
.DE
.NH
Load fossil's config
.LP
Fossil's configuration is conventionally stored in a block
at the start of the \fIfossil\fR partition rather than in a
file in the filesystem. As with \fIventi\fR, this allows the system
to boot from its own disks, rather than fossil starting
after the kernel has booted from another filesystem (kfs(4), or
via a network connection, for example).
.LP
.DS
.CW
cpu% fossil/conf -w /dev/sd04/fossil << EOF
fsys main config /dev/sd04/fossil
fsys other config /dev/sd08/other
fsys main open -c 14848
fsys other open -c 14848
fsys main snaptime -s 15 -a 0400 -t 3600
listen tcp!*!564
EOF
.DE
.NH
Initialise fossil data from venti.
.LP
Here the vac score saved earlier is used, after removing
the leading \fBvac:\fR string.
.LP
The file tree is not actually loaded into fossil; merely a reference to
the top of the tree is inserted, so this takes only a second.
.DS
.CW
cpu% score=`{sed 's/^vac://' /tmp/last.vac}
cpu% fossil/flfmt -h 192.168.0.5 -v $score /dev/sd04/fossil
.DE
.NH
Format other.
.LP
During the rebuild of venti's index, \fBother\fR was overwritten;
it now needs to be formatted for fossil.
.DS
.CW
cpu% fossil/flfmt /dev/sd08/other
.DE
.NH
Format and initialise the 9fat partition
.LP
Load a kernel, both bootstrap loaders, and plan9.ini into the 9fat partition.
.DS
.CW
cpu% disk/format -b /386/pbslba -d -r 2 /dev/sd04/9fat
/386/9load /386/9pcf /tmp/plan9.ini
.SM
\fI# This line was wrapped in formatting for this document
.DE
.NH
nvram partition
.LP
As the disk containing the nvram partition is now at target 4,
the kernel must be told where to find it by adding
the following to plan9.ini.
.DS
.CW
nvroff=0
nvrlen=512
nvram=#S/sd04/nvram
.DE
.LP
If these environment variables are also set in the current shell,
then \fIwrkey\fR can be used to set up the nvram; alternatively, \fIkeyfs\fR will
generate similar prompts if it discovers an invalid nvram
partition when the machine first boots.
.DS
.CW
cpu% auth/wrkey
auth id: bootes
auth dom: plan9.mydomain.dom
password: xyzzy1
secstore password: xyzzy2
.DE
.LP
If bootes's secstore holds keys, for example one for sources.cs.bell-labs.com,
they can be loaded into factotum at boot by /rc/bin/cpurc.
.DS
.CW
# This example is taken from a running system
cpu% grep factotum /bin/cpurc
auth/secstore -n -G factotum >> /mnt/factotum/ctl
cpu% grep outside /mnt/factotum/ctl
key proto=p9sk1 dom=outside.plan9.bell-labs.com user=stevesimon !password?
.DE
.NH
Reboot.
.bp
.SH
Appendix A
.PP
Converting Venti to a mirrored pair.
.LP
As the Venti arenas are the only parts of the system which cannot easily be regenerated,
it is prudent to protect them by mirroring with fs(3). Mirrored partitions must be the same size,
though the disks on which they reside need not be. Continuing the example above, we mirror
the entire venti disk /dev/sd06/data onto /dev/sd010/data. To hold the fs(3) configuration
a separate fscfg partition must be created; this is most easily done by taking a sector
from the swap partition, /dev/sd04/swap.
.NH 1
Reboot onto the CDROM
.LP
Although the mirrored disk can be copied live, as detailed in fs(3), other parts of the
configuration require a reboot, so it is safest to make the changes below while booted from
a standalone CDROM.
.NH 1
Create the fscfg partition
.LP
Use disk/prep to change the partition table for /dev/sd04/plan9, reducing the size of swap by
one 512-byte sector and creating a new fscfg partition in this space.
.NH 1
Update plan9.ini
.LP
Edit plan9.ini, changing all references to /dev/sd06/arenas0 to /dev/fs/arenas0, and add an
\fBfsconfig\fR variable naming the new partition. The boot process initialises the fs(3) driver
if it sees this definition in plan9.ini.
Note the spelling: the variable is \fBfsconfig\fR, while the partition is \fBfscfg\fR.
.DS
.CW
fsconfig=/dev/sd04/fscfg
.DE
.NH 1
Create a fscfg file
.LP
Ensure any
.B mirror
lines list the fastest disk(s) first, as reads are
always performed from the first disk listed (assuming
it returns no errors).
.DS
.CW
cpu% cat /tmp/fscfg.txt
fsdev:
mirror arenas0 /dev/sd06/arenas0 /dev/sd010/arenas0
.DE
.NH 1
Install fscfg
.LP
Write the fscfg contents into /dev/sd04/fscfg;
there is no dedicated utility to do this, but dd(1) will suffice.
.DS
.CW
cpu% dd -if /tmp/fscfg.txt -of /dev/sd04/fscfg -count 1
.DE
.NH 1
Edit venti config
.LP
Use venti/conf to read and write the configuration, replacing
all references to /dev/sd06/arenas0 with /dev/fs/arenas0.
.NH 1
Copy the disks
.DS
.CW
cpu% dd -if /dev/sd06/data -of /dev/sd010/data -bs 1024k
.DE
.NH 1
Reboot
.bp
.SH
Appendix B
.PP
On Venti and fossil cache sizes, by Russ Cox
.LP
.I
Suppose I have a fossil buffer of 1GB, 50GB of venti arenas and 0.75GB
of RAM, and I want the machine to be basically a file server, but still
be able to run rio and a few other things without running out of memory.
How do I use the memory I have in the most efficient way?
.R
.LP
First decide how much memory you want for interactive use.
Suppose this is 256MB. You probably want to set kernelpercent
down to something small given how much memory you have.
Suppose you set it to 20%. Then that leaves you 614MB. Suppose
you keep 102MB for yourself, leaving 512MB for fossil+venti.
.LP
Now the question is how to partition the 512.
If the Venti is used primarily for backing the fossil,
then it makes sense to give fossil most of the memory,
since fossil does its own caching of Venti reads/writes,
and reading even from the Venti cache is noticeably slower
than satisfying requests entirely from the fossil cache.
.LP
I would give 8MB to each of Venti's uses and leave
the rest for fossil:
.DS
.CW
venti -B 8M -C 8M -I 8M
open -c 62464
.DE
62464 is (512-8*3)*1024*1024/8192, assuming you
have an 8k block size. It is probably wrong that -c
takes a block count instead of bytes like the others.
.LP
I've been running with the config suggested in the wiki,
8M for each venti guy and also 8M (the default 1000 blocks)
for fossil. I have been meaning to switch to some small
amount of cache for Venti and more cache for fossil.
I think that will help things a bit.
.DS
.CW
venti -B 1M -C 1M -I 1M
open -c 3712
.DE
seems like a much better use of the 32MB.