<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">Ha!!<br>
<br>
My ultimate plan with the APS project was to cobble all the "waste machines" together into a system-wide distributed storage cluster and then run a single-system-image cluster tool like MOSIX or Kerrighed on top of it. That would make the entire school district one giant, single-instance supercomputer for K-12 students to use :-)<br>
<br>
Excellent work on the Hadoop cluster!<br>
<br>
I have my cluster nodes set to always PXE-boot, with the PXE default falling back to the local boot drive. That way I can drive a new install/rebuild just by twiddling a file on the DHCP server and rebooting a node. Eventually the nodes will use no local storage for the OS at all: every local drive will be reserved for /tmp (RAID 0 across all drives for speed), with an NFS-mounted root and remote logging. Basically a homegrown PaaS, driven by whatever the submitted jobs say they need. <br>
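For the curious, the PXE side of that is roughly the following (PXELINUX syntax; the file names and installer entry are illustrative rather than my exact setup):<br>
<pre>
# pxelinux.cfg/default -- what every node gets unless overridden: fall through to the local drive
DEFAULT local
PROMPT 0
TIMEOUT 30

LABEL local
  LOCALBOOT 0

# To force a rebuild of one node, drop a per-MAC override such as pxelinux.cfg/01-aa-bb-cc-dd-ee-ff
# pointing at the installer image, then reboot that node:
#
#   DEFAULT rebuild
#   LABEL rebuild
#     KERNEL installer/vmlinuz
#     APPEND initrd=installer/initrd.img
</pre>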
<br>
My current jobs are compiled MATLAB and custom Python. Hadoop is coming back, but I can't run Hadoop on the same nodes alongside the other work because it assumes full control of the system, and there isn't enough demand yet to justify a dedicated Hadoop stack. So rebooting a node into an NFS-root Hadoop cluster, with a temporary "offline" status in Torque, seems feasible: use Torque's pre-run (prologue) script to reboot the node into Hadoop mode and the post-run (epilogue) script to reboot it back to normal cluster mode (rough sketch of that idea at the bottom, below Jeff's message).<br><br><div class="gmail_quote">On September 7, 2018 11:54:04 PM EDT, Jeff Hubbs via Ale <ale@ale.org> wrote:<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="moz-cite-prefix">On 9/7/18 4:52 PM, dev null zero two
wrote:<br>
</div>
<blockquote type="cite" cite="mid:CABmokzAEgNTa99n0uA1xiS5nKC92MxZ5E4xRmmHHTYFaB_Mx0A@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<div>
<div dir="auto">that is pretty incredible (never thought to use
Gentoo for this purpose).</div>
</div>
</blockquote>
I use it for everything. :)<br>
<blockquote type="cite" cite="mid:CABmokzAEgNTa99n0uA1xiS5nKC92MxZ5E4xRmmHHTYFaB_Mx0A@mail.gmail.com">
<div dir="auto"><br>
</div>
<div dir="auto">have you thought about using orchestration tools
for this (Kubernetes etc.)?</div>
</blockquote>
I've been immersed in enough IRC/forum/StackExchange traffic to know that people do this, but so far I haven't seen a need, beyond the simple crafting of a single readily replicable Linux instance, that justifies the added complexity. I can make changes to that instance in chroot on the edge node and then reboot the workers and any other daemon-running machines into it (to facilitate this, the machines boot to disk first and PXE second, and a script "breaks" the first drive on each worker node by overwriting its GPT partition table and forcing a reboot, so the node falls through to PXE and reinstalls itself). I can also still "fan" changes across all nodes via ssh, serially or simultaneously, and the NFS mechanism remains available to me (right now it handles only the Hadoop workers file). <br>
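For reference, the "break the first drive" trick and the ssh fan-out are nothing fancier than something like this (device name, host list path, and the fanned command are purely illustrative):<br>
<pre>
# wipe the GPT on the first drive and force a reboot; with nothing bootable on disk,
# the node falls through to PXE and rebuilds itself from the edge node
ssh root@worker01 'sgdisk --zap-all /dev/sda &amp;&amp; reboot'

# or "fan" a change across every node in the workers file, in parallel
while read -r node; do
  ssh -n root@"$node" 'emerge --sync; emerge -uDN @world' &
done < /path/to/workers
wait
</pre>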
<br>
By the way, I've done this thing where the node setup script opens
the optical media tray, which then automatically closes after a few
minutes when the machine reboots. The on-disk instance is set to
open and immediately close the tray at boot, so if all the machines
have optical drives with trays that can open and close on command
there will be quite an entertaining racket when a whole cluster
starts up. <br>
<br>
Hey, Jim/Aaron - just think; I could have turned Sutton Middle
School into a 500-node Hadoop cluster! There's a 24-seat lab at work
that'd be good for about 3.3TiB of HDFS.<br>
<blockquote type="cite" cite="mid:CABmokzAEgNTa99n0uA1xiS5nKC92MxZ5E4xRmmHHTYFaB_Mx0A@mail.gmail.com">
<div><br>
<div class="gmail_quote">
<div dir="ltr">On Fri, Sep 7, 2018 at 4:46 PM Jeff Hubbs via
Ale <<a href="mailto:ale@ale.org" moz-do-not-send="true">ale@ale.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<p>For the past few months, I've been operating an Apache
Hadoop cluster at Emory University's Goizueta Business
School. That cluster is Gentoo-Linux-based and consists
of a dual-homed "edge node" and three 16-GiB-RAM
16-thread two-disk "worker" nodes. The edge node
provides NAT for the active cluster nodes and holds a
complete mirror of the Gentoo package repository that is
updated nightly. There is also an auxiliary edge node (a
one-piece Dell Vostro 320) with xorg and xfce that I
mostly use to display exported instances of xosview from
all of the other nodes so that I can keep an eye on the
cluster's operation. Each of the worker nodes carries a
standalone Gentoo Linux instance that was flown in via
rsync from another node while booted to a liveCD-style
distribution (SystemRescueCD, which happens to be
Gentoo-based). <br>
</p>
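<p>(The rsync "fly-in" is about as simple as it sounds; a sketch of that step, with the donor host, mount point, and exclude list being illustrative:)</p>
<pre>
# on the new worker, booted to SystemRescueCD, disks partitioned/formatted and mounted at /mnt/target
rsync -aHAX --numeric-ids \
    --exclude={"/proc/*","/sys/*","/dev/*","/run/*","/tmp/*"} \
    root@donor-node:/ /mnt/target/
# then: install GRUB, fix the hostname, and reboot into the copied instance
</pre>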
<p>I have since set up the main edge node to form a
"shadow cluster" in addition to the one I've been
operating. Via iPXE and dnsmasq on the edge node, any
x86_64 system that is connected to the internal cluster
network and allowed to PXE-boot will download a
stripped-down Gentoo instance via HTTP (served up by
nginx), boot to this instance in RAM, and execute a bash
script that finds, partitions, and formats all of that
system's disks, downloads and writes to those disks a
complete Gentoo Linux instance, installs and configures
the GRUB bootloader, sets a hostname based on the
system's first NIC's MAC address, and reboots the system
into that freshly-written instance. <br>
</p>
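<p>(Condensed, the flow of that script is roughly the following; every name, address, and partition number here is illustrative rather than the script verbatim:)</p>
<pre>
#!/bin/bash
# find the first disk, lay it out (the partition scheme is described further down),
# then pull the on-disk instance from the edge node and make it bootable
disk="/dev/$(lsblk -dno NAME,TYPE | awk '$2=="disk"{print $1; exit}')"
# ... sgdisk/mkfs steps go here ...
mount "${disk}3" /mnt/target
mkdir -p /mnt/target/boot && mount "${disk}2" /mnt/target/boot
curl -s http://10.0.0.1/worker.squashfs -o /tmp/worker.squashfs    # served by nginx on the edge node
unsquashfs -f -d /mnt/target /tmp/worker.squashfs
mac="$(cat /sys/class/net/*/address | grep -v '^00:00:00' | head -n 1)"
echo "hostname=\"worker-${mac//:/}\"" > /mnt/target/etc/conf.d/hostname
grub-install --target=x86_64-efi --efi-directory=/mnt/target/boot \
             --boot-directory=/mnt/target/boot --removable
reboot
</pre>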
<p>At present, there is only one read/write NFS export on
the edge node and it holds a flat file that Hadoop uses
as a list of available worker nodes. The list is
populated by the aforementioned node setup script after
the hostname is generated.</p>
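<p>(The registration step itself is essentially a one-liner at the end of the setup script; server address and export path are illustrative:)</p>
<pre>
# $hostname here is the MAC-derived name generated just before this
mount -t nfs 10.0.0.1:/export/hadoop /mnt/hadoop-conf
grep -qx "$hostname" /mnt/hadoop-conf/workers || echo "$hostname" >> /mnt/hadoop-conf/workers
umount /mnt/hadoop-conf
</pre>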
<p>Both the PXE-booted Gentoo Linux instance and the
on-disk instance are managed within a chroot on the edge
node in a manner not unlike how Gentoo Linux is
conventionally installed on a system. Once set up as
desired, these instances are compressed into separate
squashfs files and placed in the nginx doc root. In the
case of the PXE-booted instance, there is an
intermediate step where much of the instance is stripped
away just to reduce the size of the squashfs file, which
is currently 431MiB. The full cluster node distribution
file is 1.6GiB but I sometimes exclude the kernel source
tree and local package meta-repository to bring it down
to 1.1GiB. The on-disk footprint of the complete worker
node instance is 5.9GiB.</p>
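<p>(Packaging an instance is essentially a single mksquashfs call on the edge node; paths and the exclude list are illustrative:)</p>
<pre>
mksquashfs /srv/chroots/worker /var/www/localhost/htdocs/worker.squashfs \
    -comp xz -noappend \
    -e usr/src/linux usr/portage    # optional excludes that produce the smaller 1.1GiB file
</pre>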
<p>The node setup script takes the first drive it finds
and GPT-partitions it six ways: 1) a 2MiB "spacer" for
the bootloader; 2) 256MiB for /boot; 3) 32GiB for root;
4) 2xRAM for swap (this is WAY overkill; it's set by
ratio in the script and a ratio of one or less would
suffice); 5) 64GiB for /tmp/hadoop-yarn (more about this
later); 6) whatever is left for /hdfs1. Any remaining
disks identified are single-partitioned as /hdfs2,
/hdfs3, etc. All partitions are formatted btrfs with the
exception of /boot, which is vfat for UEFI compatibility
(a route I went down because I have one old laptop I
found that was UEFI-only and I expect that will become
more the case than less over time). A quasi-boolean in
the script optionally enables compression at mount time
for /tmp/hadoop-yarn. <br>
</p>
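<p>(In sgdisk terms, the layout comes out to roughly this; the device name and the fixed swap size are illustrative, since the script actually sizes swap by a RAM ratio:)</p>
<pre>
sgdisk --zap-all /dev/sda
sgdisk -n1:0:+2M   -t1:ef02 /dev/sda    # 1: bootloader spacer
sgdisk -n2:0:+256M -t2:ef00 /dev/sda    # 2: /boot (vfat, doubles as the EFI system partition)
sgdisk -n3:0:+32G  -t3:8300 /dev/sda    # 3: root (btrfs)
sgdisk -n4:0:+32G  -t4:8200 /dev/sda    # 4: swap (2xRAM in the script)
sgdisk -n5:0:+64G  -t5:8300 /dev/sda    # 5: /tmp/hadoop-yarn (btrfs)
sgdisk -n6:0:0     -t6:8300 /dev/sda    # 6: /hdfs1 takes whatever is left
mkfs.vfat -F 32 /dev/sda2
for p in 3 5 6; do mkfs.btrfs -f /dev/sda${p}; done
mkswap /dev/sda4
# when the compression quasi-boolean is on, /tmp/hadoop-yarn is mounted with e.g. -o compress=lzo
</pre>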
<p>One of Gentoo Linux's strengths is the ability to
compile software specifically for the CPU but the node
instance is set up with the gcc option -mtune=generic.
Another quasi-boolean setting in the node setup script
will change that to -march=native, but the change only takes effect when packages are built or rebuilt locally (as opposed to in chroot on the edge node, where
everything must be built generic). I can couple this
feature with another feature to optionally rebuild all
the system's binaries native but that's an operation
that would take a fair bit of time (that's over 500
packages and only some of them would affect cluster
operation). Similarly, in the interest of
run-what-ya-brung flexibility, I'm using Gentoo's
genkernel utility to generate a kernel and initrd
befitting a liveCD-style instance that will boot on
basically any x86-64 along with whatever NICs and disk
controllers it finds. <br>
</p>
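<p>(Concretely, the knobs involved are just the compiler flags in make.conf plus an optional full rebuild; a sketch rather than the script itself:)</p>
<pre>
# /etc/portage/make.conf on the node image
CFLAGS="-O2 -mtune=generic -pipe"
CXXFLAGS="${CFLAGS}"

# what the quasi-boolean effectively does when enabled:
sed -i 's/-mtune=generic/-march=native/' /etc/portage/make.conf
# optional follow-up: rebuild every installed package natively (500+ packages, many hours)
emerge --emptytree @world
</pre>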
<p>I am using the Hadoop binary distribution (currently
3.1.1) as distributed directly by Apache (no
HortonWorks; no Cloudera). Each cluster node has its own
Hadoop distribution and each node's Hadoop distribution
has configuration features both in common and specific
to that node, modified in place by the node setup
script. In the latter case, the amount of available RAM,
the number of available CPU threads, and the list of
available HDFS partitions on a system are flown into the
proper local config files. Hadoop services run in a Java
VM; I am currently using the IcedTea 3.8.0 source
distribution supplied within Gentoo's packaging system.
I have also run it under the IcedTea binary distribution
and the Oracle JVM with equal success. <br>
</p>
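<p>(The "flying in" amounts to the setup script computing a few local values and substituting them into templated config files. The placeholder tokens below are illustrative; the properties they feed are the real yarn.nodemanager.resource.memory-mb, yarn.nodemanager.resource.cpu-vcores, and dfs.datanode.data.dir:)</p>
<pre>
mem_mb=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) / 1024 ))
vcores=$(nproc)
datadirs=$(ls -d /hdfs* | paste -sd, -)
sed -i -e "s|@NM_MEM@|${mem_mb}|" -e "s|@NM_VCORES@|${vcores}|" \
    "${HADOOP_HOME}/etc/hadoop/yarn-site.xml"
sed -i "s|@DATA_DIRS@|${datadirs}|" "${HADOOP_HOME}/etc/hadoop/hdfs-site.xml"
</pre>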
<p>Hadoop is made up of three primary constructs.
HDFS (Hadoop Distributed File System) consists of a
NameNode daemon that runs on a single machine and
controls the filesystem namespace and user access to it;
DataNode daemons run on each worker node and coordinate
between the NameNode daemon and the local machine's
on-disk filesystem. You access the filesystem with
command-line-like options to the hdfs binary like -put,
-get, -ls, -mkdir, etc. but in the on-disk filesystem
underneath /hdfs1.../hdfsN, the files you write are cut
up into "blocks" (default size: 128MiB) and those blocks
are replicated (default: three times) among all the
worker nodes. My initial cluster with standalone workers
reported 7.2TiB of HDFS available spread across six
physical spindles. As you can imagine, it's possible to
accumulate tens of TiB of HDFS across only a handful of
nodes but doing so isn't necessarily helpful. <br>
</p>
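<p>(Day-to-day use looks like ordinary file commands aimed at the hdfs binary, e.g.:)</p>
<pre>
hdfs dfs -mkdir -p /user/jeff/input
hdfs dfs -put bigfile.csv /user/jeff/input/
hdfs dfs -ls /user/jeff/input
hdfs dfs -get /user/jeff/input/bigfile.csv restored.csv
hdfs dfsadmin -report    # capacity, live DataNodes, and block stats for the whole cluster
</pre>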
<p>YARN (Yet Another Resource Negotiator) is the construct
that manages the execution of work among the nodes. Part
of the whole point behind Hadoop is to <i>move the
processing to where the data is </i>and it's YARN
that coordinates all that. It consists of a
ResourceManager daemon that communicates with all the
worker nodes and NodeManager daemons that run on each of
the worker nodes. You can run the ResourceManager daemon
and HDFS' NameNode daemon on the same machines that act
as worker nodes but past a point you won't want to and
past <i>that</i> point you'd want to run each of
NameNode and ResourceManager on two separate machines.
In that regime, you'd have two machines dedicated to
those roles (their names would be taken out of the
centrally-located workers file) and the rest would run
both the DataNode and NodeManager daemons, forming the
HDFS storage subsystem and the YARN execution subsystem.</p>
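<p>(In Hadoop 3.x terms, that split boils down to which daemons you start where:)</p>
<pre>
# on the NameNode machine:         hdfs --daemon start namenode
# on the ResourceManager machine:  yarn --daemon start resourcemanager
# on every host in the workers file:
hdfs --daemon start datanode
yarn --daemon start nodemanager
# or, from one controlling node with passwordless ssh to everything:
start-dfs.sh; start-yarn.sh
</pre>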
<p>There is another construct, MapReduce, whose
architecture I don't fully understand yet; it comes into
play as a later phase in Hadoop computations and there
is a JobHistoryServer daemon associated with it.</p>
<p>Another place where the bridge is out with respect to
my understanding of Hadoop is coding for it - but I'll
get there eventually. There are other apps like Apache's
Spark and Hive that use HDFS and/or YARN that I have
better mental insight into, and I have successfully
gotten Python/Spark demo programs to run on YARN in my
cluster. <br>
</p>
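<p>(For instance, the stock Spark Pi estimator submitted to YARN; the path is the one inside a standard Spark distribution:)</p>
<pre>
spark-submit --master yarn --deploy-mode cluster \
    "${SPARK_HOME}/examples/src/main/python/pi.py" 100
</pre>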
<p>One thing I have learned is that Hadoop clusters do not
"genericize" well. When I first tried running the
Hadoop-supplied teragen/terasort example (goal: make a
file of 10^10 100-character lines and sort it), it
failed for want of space available in /tmp/hadoop-yarn
but it ran perfectly when the file was cut down to
1/100th that size. For my PXE-boot-based cluster, I gave
my worker nodes a separate partition for
/tmp/hadoop-yarn and gave it optional transparent
compression. There are a lot of parameters I haven't messed with yet, controlling things like the minimum size and size increment of memory containers and the JVM settings, but to optimize the cluster for a given job one would expect to.<br>
</p>
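<p>(For reference, the benchmark itself is just the examples jar shipped with Hadoop:)</p>
<pre>
jar="${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar"
hadoop jar "$jar" teragen 10000000000 /teragen-in     # 10^10 rows x 100 bytes, roughly 1TB; the run that choked
hadoop jar "$jar" terasort /teragen-in /terasort-out  # the 1/100-size retry used 100000000 rows instead
</pre>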
<p>What I have right now - basically, a single Gentoo
Linux instance for installation on a dual-homed edge
node - is able to generate a working Hadoop cluster with
an arbitrary number of nodes, limited primarily by
space, cooling, and electric power (the Dell Optiplex
desktops I'm using right now max out at about an amp, so
you have to be prepared to supply at least N amps for N
nodes). They can be purpose-built rack-mount servers, a
lab environment full of thin clients, or wire shelf
units full of discarded desktops and laptops. <br>
</p>
<p>- Jeff<br>
</p>
<p><br>
</p>
</div>
_______________________________________________<br>
Ale mailing list<br>
<a href="mailto:Ale@ale.org" target="_blank" moz-do-not-send="true">Ale@ale.org</a><br>
<a href="https://mail.ale.org/mailman/listinfo/ale" rel="noreferrer" target="_blank" moz-do-not-send="true">https://mail.ale.org/mailman/listinfo/ale</a><br>
See JOBS, ANNOUNCE and SCHOOLS lists at<br>
<a href="http://mail.ale.org/mailman/listinfo" rel="noreferrer" target="_blank" moz-do-not-send="true">http://mail.ale.org/mailman/listinfo</a><br>
</blockquote>
</div>
</div>
-- <br>
<div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">Sent from my mobile. Please
excuse the brevity, spelling, and punctuation.</div>
</blockquote>
<p><br>
</p>
</blockquote></div><br>
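As promised above, a rough sketch of the Torque prologue/epilogue idea. Nothing here is tested; the queue name, flag path, and image-switching mechanism are all placeholders:<br>
<pre>
#!/bin/bash
# mom_priv/prologue -- runs as root on the node before a job starts.
# Torque hands in the job id as $1 and the execution queue as $6.
if [ "$6" = "hadoop" ]; then
    pbsnodes -o -N "switching to hadoop image" "$(hostname -s)"   # mark the node offline in Torque
    touch "/nfs/pxe-flags/$(hostname -s).hadoop"                  # flag the PXE/DHCP side watches
    /sbin/reboot                                                  # comes back up on the NFS-root Hadoop image
fi
exit 0

# mom_priv/epilogue is the mirror image: remove the flag, reboot back to the normal
# cluster image, then "pbsnodes -c NODENAME" to clear the offline state.
</pre>
<br>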
-- <br>
Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity.</body></html>