<div><div dir="auto">that is pretty incredible (never thought to use Gentoo for this purpose).</div></div><div dir="auto"><br></div><div dir="auto">have you thought about using orchestration tools for this (Kubernetes etc.)?</div><div><br><div class="gmail_quote"><div dir="ltr">On Fri, Sep 7, 2018 at 4:46 PM Jeff Hubbs via Ale <<a href="mailto:ale@ale.org">ale@ale.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    <p>For the past few months, I've been operating an Apache Hadoop

      cluster at Emory University's Goizueta Business School. That

      cluster is Gentoo-Linux-based and consists of a dual-homed "edge

      node" and three 16-GiB-RAM 16-thread two-disk "worker" nodes. The

      edge node provides NAT for the active cluster nodes and holds a

      complete mirror of the Gentoo package repository that is updated

      nightly. There is also an auxiliary edge node (a one-piece Dell

      Vostro 320) with xorg and xfce that I mostly use to display

      exported instances of xosview from all of the other nodes so that

      I can keep an eye on the cluster's operation. Each of the worker

      nodes carries a standalone Gentoo Linux instance that was flown in

      via rsync from another node while booted to a liveCD-style

      distribution (SystemRescueCD, which happens to be Gentoo-based). <br>

    </p>

    <p>I have since set up the main edge node to form a "shadow cluster"

      in addition to the one I've been operating. Via iPXE and dnsmasq

      on the edge node, any x86_64 system that is connected to the

      internal cluster network and allowed to PXE-boot will download a

      stripped-down Gentoo instance via HTTP (served up by nginx), boot

      to this instance in RAM, and execute a bash script that finds,

      partitions, and formats all of that system's disks, downloads and

      writes to those disks a complete Gentoo Linux instance, installs

      and configures the GRUB bootloader, sets a hostname based on the

      system's first NIC's MAC address, and reboots the system into that

      freshly-written instance. <br>

    </p>

    <p>At present, there is only one read/write NFS export on the edge

      node and it holds a flat file that Hadoop uses as a list of

      available worker nodes. The list is populated by the

      aforementioned node setup script after the hostname is generated.</p>

    <p>Both the PXE-booted Gentoo Linux instance and the on-disk

      instance are managed within a chroot on the edge node in a manner

      not unlike how Gentoo Linux is conventionally installed on a

      system. Once set up as desired, these instances are compressed

      into separate squashfs files and placed in the nginx doc root. In

      the case of the PXE-booted instance, there is an intermediate step

      where much of the instance is stripped away just to reduce the

      size of the squashfs file, which is currently 431MiB. The full

      cluster node distribution file is 1.6GiB but I sometimes exclude

      the kernel source tree and local package meta-repository to bring

      it down to 1.1GiB. The on-disk footprint of the complete worker

      node instance is 5.9GiB.</p>

    <p>The node setup script takes the first drive it finds and

      GPT-partitions it six ways: 1) a 2MiB "spacer" for the bootloader;

      2) 256MiB for /boot; 3) 32GiB for root; 4) 2xRAM for swap (this is

      WAY overkill; it's set by ratio in the script and a ratio of one

      or less would suffice); 5) 64GiB for /tmp/hadoop-yarn (more about

      this later); 6) whatever is left for /hdfs1. Any remaining disks

      identified are single-partitioned as /hdfs2, /hdfs3, etc. All

      partitions are formatted btrfs with the exception of /boot, which

      is vfat for UEFI compatibility (a route I went down because I have

      one old laptop I found that was UEFI-only and I expect that will

      become more the case than less over time). A quasi-boolean in the

      script optionally enables compression at mount time for

      /tmp/hadoop-yarn. <br>

    </p>

    <p>One of Gentoo Linux's strengths is the ability to compile

      software specifically for the CPU but the node instance is set up

      with the gcc option -mtune=generic. Another quasi-boolean setting

      in the node setup script will change that to -march=native but

      that change will only effectuate when packages are built or

      rebuilt locally (as opposed to in chroot on the edge node, where

      everything must be built generic). I can couple this feature with

      another feature to optionally rebuild all the system's binaries

      native but that's an operation that would take a fair bit of time

      (that's over 500 packages and only some of them would affect

      cluster operation). Similarly, in the interest of

      run-what-ya-brung flexibility, I'm using Gentoo's genkernel

      utility to generate a kernel and initrd befitting a liveCD-style

      instance that will boot on basically any x86-64 along with

      whatever NICs and disk controllers it finds. <br>

    </p>

    <p>I am using the Hadoop binary distribution (currently 3.1.1) as

      distributed directly by Apache (no HortonWorks; no Cloudera). Each

      cluster node has its own Hadoop distribution and each node's

      Hadoop distribution has configuration features both in common and

      specific to that node, modified in place by the node setup script.

      In the latter case, the amount of available RAM, the number of

      available CPU threads, and the list of available HDFS partitions

      on a system are flown into the proper local config files. Hadoop

      services run in a Java VM; I am currently using the IcedTea 3.8.0

      source distribution supplied within Gentoo's packaging system. I

      have also run it under the IcedTea binary distribution and the

      Oracle JVM with equal success. <br>

    </p>

    <p>Hadoop has three primary constructs that make it up. HDFS (Hadoop

      Distributed File System) consists of a NameNode daemon that runs

      on a single machine and controls the filesystem namespace and user

      access to it; DataNode daemons run on each worker node and

      coordinate between the NameNode daemon and the local machine's

      on-disk filesystem. You access the filesystem with

      command-line-like options to the hdfs binary like -put, -get, -ls,

      -mkdir, etc. but in the on-disk filesystem underneath

      /hdfs1.../hdfsN, the files you write are cut up into "blocks"

      (default size: 128MiB) and those blocks are replicated (default:

      three times) among all the worker nodes. My initial cluster with

      standalone workers reported 7.2TiB of HDFS available spread across

      six physical spindles. As you can imagine, it's possible to

      accumulate tens of TiB of HDFS across only a handful of nodes but

      doing so isn't necessarily helpful. <br>

    </p>

    <p>YARN (Yet Another Resource Negotiator) is the construct that

      manages the execution of work among the nodes. Part of the whole

      point behind Hadoop is to <i>move the processing to where the

        data is </i>and it's YARN that coordinates all that. It

      consists of a ResourceManager daemon that communicates with all

      the worker nodes and NodeManager daemons that run on each of the

      worker nodes. You can run the ResourceManager daemon and HDFS'

      NameNode daemon on the same machines that act as worker nodes but

      past a point you won't want to and past <i>that</i> point you'd

      want to run each of NameNode and ResourceManager on two separate

      machines. In that regime, you'd have two machines dedicated to

      those roles (their names would be taken out of the

      centrally-located workers file) and the rest would run both the

      DataNode and NodeManager daemons, forming the HDFS storage

      subsystem and the YARN execution subsystem.</p>

    <p>There is another construct, MapReduce, whose architecture I don't

      fully understand yet; it comes into play as a later phase in

      Hadoop computations and there is a JobHistoryServer daemon

      associated with it.</p>

    <p>Another place where the bridge is out with respect to my

      understanding of Hadoop is coding for it - but I'll get there

      eventually. There are other apps like Apache's Spark and Hive that

      use HDFS and/or YARN that I have better mental insight into, and I

      have successfully gotten Python/Spark demo programs to run on YARN

      in my cluster. <br>

    </p>

    <p>One thing I have learned is that Hadoop clusters do not

      "genericize" well. When I first tried running the Hadoop-supplied

      teragen/terasort example (goal: make a file of 10^10 100-character

      lines and sort it), it failed for want of space available in

      /tmp/hadoop-yarn but it ran perfectly when the file was cut down

      to 1/100th that size. For my PXE-boot-based cluster, I gave my

      worker nodes a separate partition for /tmp/hadoop-yarn and gave it

      optional transparent compression. There are a lot of parameters

      for controlling things like minimum size and minimum size

      increment of memory containers and JVM parameters that I haven't

      messed with but to optimize the cluster for a given job, one would

      expect to.<br>

    </p>

    <p>What I have right now - basically, a single Gentoo Linux instance

      for installation on a dual-homed edge node - is able to generate a

      working Hadoop cluster with an arbitrary number of nodes, limited

      primarily by space, cooling, and electric power (the Dell Optiplex

      desktops I'm using right now max out at about an amp, so you have

      to be prepared to supply at least N amps for N nodes). They can be

      purpose-built rack-mount servers, a lab environment full of thin

      clients, or wire shelf units full of discarded desktops and

      laptops. <br>

    </p>

    <p>- Jeff<br>

    </p>

    <p><br>

    </p>

  </div>

_______________________________________________<br>

Ale mailing list<br>

<a href="mailto:Ale@ale.org" target="_blank">Ale@ale.org</a><br>

<a href="https://mail.ale.org/mailman/listinfo/ale" rel="noreferrer" target="_blank">https://mail.ale.org/mailman/listinfo/ale</a><br>

See JOBS, ANNOUNCE and SCHOOLS lists at<br>

<a href="http://mail.ale.org/mailman/listinfo" rel="noreferrer" target="_blank">http://mail.ale.org/mailman/listinfo</a><br>

</blockquote></div></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">Sent from my mobile. Please excuse the brevity, spelling, and punctuation.</div>