<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 9/7/18 4:52 PM, dev null zero two

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CABmokzAEgNTa99n0uA1xiS5nKC92MxZ5E4xRmmHHTYFaB_Mx0A@mail.gmail.com">

      <meta http-equiv="content-type" content="text/html; charset=utf-8">

      <div>

        <div dir="auto">that is pretty incredible (never thought to use

          Gentoo for this purpose).</div>

      </div>

    </blockquote>

    I use it for everything. :)<br>

    <blockquote type="cite"

cite="mid:CABmokzAEgNTa99n0uA1xiS5nKC92MxZ5E4xRmmHHTYFaB_Mx0A@mail.gmail.com">

      <div dir="auto"><br>

      </div>

      <div dir="auto">have you thought about using orchestration tools

        for this (Kubernetes etc.)?</div>

    </blockquote>

    I have been immersed in enough IRC/forum/StackExchange traffic that

    I know that people do this, but thus far I haven't seen a need past

    the simple crafting of a single readily replicable Linux instance

    that justifies the added complexity. In addition to being able to

    make changes to that instance in chroot on the edge node and reboot

    the workers and any other daemon-running machines (to facilitate

    this, I set the machines to boot to disk first and PXE second and

    then I have a script that "breaks" the first drive on each worker

    node by overwriting the GPT partition table and forcing a reboot), I

    can still "fan" changes via ssh across all nodes serially or

    simultaneously and I still have the NFS mechanism available to me

    (right now, it handles only the Hadoop workers file). <br>

    <br>

    By the way, I've done this thing where the node setup script opens

    the optical media tray, which then automatically closes after a few

    minutes when the machine reboots. The on-disk instance is set to

    open and immediately close the tray at boot, so if all the machines

    have optical drives with trays that can open and close on command

    there will be quite an entertaining racket when a whole cluster

    starts up. <br>

    <br>

    Hey, Jim/Aaron - just think; I could have turned Sutton Middle

    School into a 500-node Hadoop cluster! There's a 24-seat lab at work

    that'd be good for about 3.3TiB of HDFS.<br>

    <blockquote type="cite"

cite="mid:CABmokzAEgNTa99n0uA1xiS5nKC92MxZ5E4xRmmHHTYFaB_Mx0A@mail.gmail.com">

      <div><br>

        <div class="gmail_quote">

          <div dir="ltr">On Fri, Sep 7, 2018 at 4:46 PM Jeff Hubbs via

            Ale <<a href="mailto:ale@ale.org" moz-do-not-send="true">ale@ale.org</a>>

            wrote:<br>

          </div>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div bgcolor="#FFFFFF" text="#000000">

              <p>For the past few months, I've been operating an Apache

                Hadoop cluster at Emory University's Goizueta Business

                School. That cluster is Gentoo-Linux-based and consists

                of a dual-homed "edge node" and three 16-GiB-RAM

                16-thread two-disk "worker" nodes. The edge node

                provides NAT for the active cluster nodes and holds a

                complete mirror of the Gentoo package repository that is

                updated nightly. There is also an auxiliary edge node (a

                one-piece Dell Vostro 320) with xorg and xfce that I

                mostly use to display exported instances of xosview from

                all of the other nodes so that I can keep an eye on the

                cluster's operation. Each of the worker nodes carries a

                standalone Gentoo Linux instance that was flown in via

                rsync from another node while booted to a liveCD-style

                distribution (SystemRescueCD, which happens to be

                Gentoo-based). <br>

              </p>

              <p>I have since set up the main edge node to form a

                "shadow cluster" in addition to the one I've been

                operating. Via iPXE and dnsmasq on the edge node, any

                x86_64 system that is connected to the internal cluster

                network and allowed to PXE-boot will download a

                stripped-down Gentoo instance via HTTP (served up by

                nginx), boot to this instance in RAM, and execute a bash

                script that finds, partitions, and formats all of that

                system's disks, downloads and writes to those disks a

                complete Gentoo Linux instance, installs and configures

                the GRUB bootloader, sets a hostname based on the

                system's first NIC's MAC address, and reboots the system

                into that freshly-written instance. <br>

              </p>

              <p>At present, there is only one read/write NFS export on

                the edge node and it holds a flat file that Hadoop uses

                as a list of available worker nodes. The list is

                populated by the aforementioned node setup script after

                the hostname is generated.</p>

              <p>Both the PXE-booted Gentoo Linux instance and the

                on-disk instance are managed within a chroot on the edge

                node in a manner not unlike how Gentoo Linux is

                conventionally installed on a system. Once set up as

                desired, these instances are compressed into separate

                squashfs files and placed in the nginx doc root. In the

                case of the PXE-booted instance, there is an

                intermediate step where much of the instance is stripped

                away just to reduce the size of the squashfs file, which

                is currently 431MiB. The full cluster node distribution

                file is 1.6GiB but I sometimes exclude the kernel source

                tree and local package meta-repository to bring it down

                to 1.1GiB. The on-disk footprint of the complete worker

                node instance is 5.9GiB.</p>

              <p>The node setup script takes the first drive it finds

                and GPT-partitions it six ways: 1) a 2MiB "spacer" for

                the bootloader; 2) 256MiB for /boot; 3) 32GiB for root;

                4) 2xRAM for swap (this is WAY overkill; it's set by

                ratio in the script and a ratio of one or less would

                suffice); 5) 64GiB for /tmp/hadoop-yarn (more about this

                later); 6) whatever is left for /hdfs1. Any remaining

                disks identified are single-partitioned as /hdfs2,

                /hdfs3, etc. All partitions are formatted btrfs with the

                exception of /boot, which is vfat for UEFI compatibility

                (a route I went down because I have one old laptop I

                found that was UEFI-only and I expect that will become

                more the case than less over time). A quasi-boolean in

                the script optionally enables compression at mount time

                for /tmp/hadoop-yarn. <br>

              </p>

              <p>One of Gentoo Linux's strengths is the ability to

                compile software specifically for the CPU but the node

                instance is set up with the gcc option -mtune=generic.

                Another quasi-boolean setting in the node setup script

                will change that to -march=native but that change will

                only effectuate when packages are built or rebuilt

                locally (as opposed to in chroot on the edge node, where

                everything must be built generic). I can couple this

                feature with another feature to optionally rebuild all

                the system's binaries native but that's an operation

                that would take a fair bit of time (that's over 500

                packages and only some of them would affect cluster

                operation). Similarly, in the interest of

                run-what-ya-brung flexibility, I'm using Gentoo's

                genkernel utility to generate a kernel and initrd

                befitting a liveCD-style instance that will boot on

                basically any x86-64 along with whatever NICs and disk

                controllers it finds. <br>

              </p>

              <p>I am using the Hadoop binary distribution (currently

                3.1.1) as distributed directly by Apache (no

                HortonWorks; no Cloudera). Each cluster node has its own

                Hadoop distribution and each node's Hadoop distribution

                has configuration features both in common and specific

                to that node, modified in place by the node setup

                script. In the latter case, the amount of available RAM,

                the number of available CPU threads, and the list of

                available HDFS partitions on a system are flown into the

                proper local config files. Hadoop services run in a Java

                VM; I am currently using the IcedTea 3.8.0 source

                distribution supplied within Gentoo's packaging system.

                I have also run it under the IcedTea binary distribution

                and the Oracle JVM with equal success. <br>

              </p>

              <p>Hadoop has three primary constructs that make it up.

                HDFS (Hadoop Distributed File System) consists of a

                NameNode daemon that runs on a single machine and

                controls the filesystem namespace and user access to it;

                DataNode daemons run on each worker node and coordinate

                between the NameNode daemon and the local machine's

                on-disk filesystem. You access the filesystem with

                command-line-like options to the hdfs binary like -put,

                -get, -ls, -mkdir, etc. but in the on-disk filesystem

                underneath /hdfs1.../hdfsN, the files you write are cut

                up into "blocks" (default size: 128MiB) and those blocks

                are replicated (default: three times) among all the

                worker nodes. My initial cluster with standalone workers

                reported 7.2TiB of HDFS available spread across six

                physical spindles. As you can imagine, it's possible to

                accumulate tens of TiB of HDFS across only a handful of

                nodes but doing so isn't necessarily helpful. <br>

              </p>

              <p>YARN (Yet Another Resource Negotiator) is the construct

                that manages the execution of work among the nodes. Part

                of the whole point behind Hadoop is to <i>move the

                  processing to where the data is </i>and it's YARN

                that coordinates all that. It consists of a

                ResourceManager daemon that communicates with all the

                worker nodes and NodeManager daemons that run on each of

                the worker nodes. You can run the ResourceManager daemon

                and HDFS' NameNode daemon on the same machines that act

                as worker nodes but past a point you won't want to and

                past <i>that</i> point you'd want to run each of

                NameNode and ResourceManager on two separate machines.

                In that regime, you'd have two machines dedicated to

                those roles (their names would be taken out of the

                centrally-located workers file) and the rest would run

                both the DataNode and NodeManager daemons, forming the

                HDFS storage subsystem and the YARN execution subsystem.</p>

              <p>There is another construct, MapReduce, whose

                architecture I don't fully understand yet; it comes into

                play as a later phase in Hadoop computations and there

                is a JobHistoryServer daemon associated with it.</p>

              <p>Another place where the bridge is out with respect to

                my understanding of Hadoop is coding for it - but I'll

                get there eventually. There are other apps like Apache's

                Spark and Hive that use HDFS and/or YARN that I have

                better mental insight into, and I have successfully

                gotten Python/Spark demo programs to run on YARN in my

                cluster. <br>

              </p>

              <p>One thing I have learned is that Hadoop clusters do not

                "genericize" well. When I first tried running the

                Hadoop-supplied teragen/terasort example (goal: make a

                file of 10^10 100-character lines and sort it), it

                failed for want of space available in /tmp/hadoop-yarn

                but it ran perfectly when the file was cut down to

                1/100th that size. For my PXE-boot-based cluster, I gave

                my worker nodes a separate partition for

                /tmp/hadoop-yarn and gave it optional transparent

                compression. There are a lot of parameters for

                controlling things like minimum size and minimum size

                increment of memory containers and JVM parameters that I

                haven't messed with but to optimize the cluster for a

                given job, one would expect to.<br>

              </p>

              <p>What I have right now - basically, a single Gentoo

                Linux instance for installation on a dual-homed edge

                node - is able to generate a working Hadoop cluster with

                an arbitrary number of nodes, limited primarily by

                space, cooling, and electric power (the Dell Optiplex

                desktops I'm using right now max out at about an amp, so

                you have to be prepared to supply at least N amps for N

                nodes). They can be purpose-built rack-mount servers, a

                lab environment full of thin clients, or wire shelf

                units full of discarded desktops and laptops. <br>

              </p>

              <p>- Jeff<br>

              </p>

              <p><br>

              </p>

            </div>

            _______________________________________________<br>

            Ale mailing list<br>

            <a href="mailto:Ale@ale.org" target="_blank"

              moz-do-not-send="true">Ale@ale.org</a><br>

            <a href="https://mail.ale.org/mailman/listinfo/ale"

              rel="noreferrer" target="_blank" moz-do-not-send="true">https://mail.ale.org/mailman/listinfo/ale</a><br>

            See JOBS, ANNOUNCE and SCHOOLS lists at<br>

            <a href="http://mail.ale.org/mailman/listinfo"

              rel="noreferrer" target="_blank" moz-do-not-send="true">http://mail.ale.org/mailman/listinfo</a><br>

          </blockquote>

        </div>

      </div>

      -- <br>

      <div dir="ltr" class="gmail_signature"

        data-smartmail="gmail_signature">Sent from my mobile. Please

        excuse the brevity, spelling, and punctuation.</div>

    </blockquote>

    <p><br>

    </p>

  </body>

</html>