One of the reasons I set up my cluster was that I’m running out of space on my NAS. I don’t want to buy a whole new chassis, and while I could have put individual file shares on each cluster node, that would be inconvenient and wouldn’t provide any data redundancy without a lot of brittle home-rolled hacks to rsync data from node to node. And since distributed file systems are a thing, I’d rather not resort to hacks.

Alternatives Considered

To make a long story short, I considered Ceph, GlusterFS, LizardFS & MooseFS.

Ceph

Ceph’s published specs show that it basically won’t fit on my ODROID nodes: they say metadata servers need 1GB of RAM and OSDs need another 500MB each, but the HC2s in my cluster only have 2GB of onboard RAM and I’m running k8s on them too. Ceph is out.

GlusterFS

GlusterFS looked nice at first, but it’s both less flexible (you can’t add single drives to the storage cluster) and less performant (by a factor of 2 compared to MooseFS) on my HC2 hardware.

LizardFS

LizardFS is a fork of MooseFS, and when I was trying to install it in my cluster I had a hard time finding debs for it. When I was looking for them online, I found a lot of complaints about performance issues on ARM, so that eliminated it from consideration.

MooseFS

There were a lot of things to like about MooseFS for my use case on my hardware.

  • There were prebuilt ARM debs on their site, in a handy PPA.
  • It was twice as fast on my hardware as GlusterFS.
  • I can add individual drive bricks to the cluster, even though my storage policies (more on them below) all require multiple replicas of data in the filesystem they’re applied to.
  • The memory requirements are small enough that I can run it on the same nodes that I’m running kubernetes on.
  • It dynamically balances disk usage across the bricks - when I added a third brick server to my cluster, MooseFS shuffled replica chunks over to it until all three servers were at roughly equal usage percentages.
  • It allows custom storage policies:
    • You can label storage bricks (say SSD as label A, spinning disks as label B) and use the labels in policy definitions.
    • It’s flexible - you can create policies with different replication requirements and assign those on a per-directory or even per-file basis.
    • By referring to brick labels, you can do things like create a policy that requires one replica to be written to SSD and one to spinning disk at file creation (so the write can be reported back to the writing process quickly), and then, once that initial write is complete, have MooseFS add a third copy so that there are two copies on spinning disk and one on SSD.
    • You can make policies that change replication requirements after a user-specified amount of time - so maybe new files get one copy on SSD and two on spinning disk, but after 30 days they switch to one copy on a regular spinning disk and two on bigger, slower drives. There’s a sketch of what these policies look like just after this list.
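
To make that concrete, here’s a rough sketch of how those policies get defined with the MooseFS 3.x storage class tools. The label letters (A for SSD, B for spinning disk), the class name, and the directory are just illustrative, and I’m writing the flags from memory, so double-check them against mfsscadmin(1) and mfssetsclass(1):

# In /etc/mfs/mfschunkserver.cfg on each chunkserver, label its bricks
# (A = SSD-backed, B = spinning disk in this example)
LABELS = A

# From a machine with the filesystem mounted, define a storage class:
#   creation (-C): one copy on SSD (A) plus one on spinning disk (B)
#   keep (-K):     one copy on A plus two copies on B afterwards
mfsscadmin create -C A,B -K A,B,B fast_create

# Apply the class recursively to a directory
mfssetsclass -r fast_create /data/squirrel/important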

Installing

You’re going to need a MooseFS master, a chunkserver on each machine that hosts drives, and you should also run a metadata backup server. You may also want to run the CGI server to visualize the cluster status.

Pre-Requisites

  1. Static IPs for the cluster nodes, preferably with DNS entries. Setting this up is out of scope for this post.
  2. Decide which nodes will just be chunkservers, which will be the master, and optionally which are going to be metadata backup and CGI servers. You can run the master, metadata backup and CGI servers on machines that are also chunkservers.

All servers

Add the MooseFS PPA

  1. Add a file, /etc/apt/sources.list.d/moosefs.list, with the following contents:

deb http://ppa.moosefs.com/moosefs-3/apt/raspbian/jessie jessie main

  2. Run wget -O - https://ppa.moosefs.com/moosefs.key | sudo apt-key add - to add the MooseFS PPA key.

  3. Run apt-get update.
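
If you’d rather paste the whole thing in one go on each node, something like this should be equivalent (same raspbian/jessie PPA path as above - adjust for your distro and release):

echo "deb http://ppa.moosefs.com/moosefs-3/apt/raspbian/jessie jessie main" | sudo tee /etc/apt/sources.list.d/moosefs.list
wget -O - https://ppa.moosefs.com/moosefs.key | sudo apt-key add -
sudo apt-get update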

Master Server

Install the master server software on one of your nodes with apt install moosefs-master. Do this first; the chunkservers will need to communicate with it.

Optionally, install the CGI server with apt install moosefs-cgiserv.

Configure /etc/mfs/mfsmaster.cfg and /etc/mfs/mfsexports.cfg. Start by copying /etc/mfs/mfsmaster.cfg.sample and /etc/mfs/mfsexports.cfg.sample.
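
Concretely, that’s something like the steps below. The export line is only a sketch - a home LAN subnet with read-write access and root mapped to root - so check mfsexports.cfg(5) for the options that make sense for your network:

cd /etc/mfs
sudo cp mfsmaster.cfg.sample mfsmaster.cfg
sudo cp mfsexports.cfg.sample mfsexports.cfg

# Example mfsexports.cfg entry (assumes a 192.168.1.0/24 LAN):
# 192.168.1.0/24  /  rw,alldirs,maproot=0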

Set the master software to start on boot with systemctl enable moosefs-master. If you installed the cgi server, enable it too with systemctl enable moosefs-cgiserv.

Start the master and cgi server with systemctl start moosefs-master && systemctl start moosefs-cgiserv.
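
For the metadata backup server mentioned earlier, the PPA also has a moosefs-metalogger package; ideally it runs on a different node than the master. The shape of the setup is the same as the other daemons (the config file name and option are from memory - verify against the sample that ships with the package):

apt install moosefs-metalogger
# Copy /etc/mfs/mfsmetalogger.cfg.sample to /etc/mfs/mfsmetalogger.cfg
# and set MASTER_HOST = yourmaster.example.com
systemctl enable moosefs-metalogger && systemctl start moosefs-metalogger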

Chunkservers

For each of your chunkservers, take the following steps:

Install moosefs software

Install the software with apt install moosefs-chunkserver.

Configure which drives to use for storage

Make a directory to store the MooseFS data. On my HC2 instances, I mount the data drives on /mnt/sata and keep the raw mfs data in /mnt/sata/moosefs.
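
Assuming the drive is already mounted at /mnt/sata, that’s just the two commands below. The Debian packages run the chunkserver as the mfs user, so the data directory should be owned by it (adjust the user if your install differs):

sudo mkdir -p /mnt/sata/moosefs
sudo chown mfs:mfs /mnt/sata/moosefs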

Configure /etc/mfs/mfshdd.cfg. There’s an example in /etc/mfs/mfshdd.cfg.sample - add one line per directory to be shared.

In my case, I want to keep 50 gigs free on the drive, so my entry in mfshdd.cfg is

/mnt/sata/moosefs -50GB

Configure the chunkserver options

Copy /etc/mfs/mfschunkserver.cfg.sample to /etc/mfs/mfschunkserver.cfg and edit it to meet your needs - at a minimum, you’ll need to set MASTER_HOST = yourmaster.example.com
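
A minimal version of that file ends up looking roughly like this. The LABELS line only matters if you’re using labeled storage policies like the ones described earlier, and the option name is from memory, so confirm it against the sample file:

# /etc/mfs/mfschunkserver.cfg
MASTER_HOST = yourmaster.example.com
# Optional: label this chunkserver's storage for use in storage classes
# LABELS = B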

Enable and start the chunkserver

Set up the chunkserver to start at boot, and start it now -

systemctl enable moosefs-chunkserver && systemctl start moosefs-chunkserver

Mounting the filesystem

Now that the chunkservers are talking to the master, you can set up automounting of the filesystem on your nodes.

First, install the client software - sudo apt install -y moosefs-client

Second, make a mountpoint. On my nodes, I’m using /data/squirrel, so sudo mkdir -p /data/squirrel

Finally, create a systemd unit file so that the filesystem mounts on every boot. I want to be able to use hostPath directives in my kubernetes deployments, so I want it to start before docker and kubelet. Make a file, /etc/systemd/system/yourcluster-mfsmount.service, with the following content (replace /mountpoint with whatever mountpoint you’re using, and YOUR_MASTER_SERVER with your master’s hostname):

# Original source: https://sourceforge.net/p/moosefs/mailman/message/29522468/
[Unit]
Description=MooseFS mounts
After=syslog.target network.target ypbind.service moosefs-chunkserver.service moosefs-master.service
Before=docker.service kubelet.service

[Service]
Type=forking
TimeoutSec=600
ExecStart=/usr/bin/mfsmount /mountpoint -H YOUR_MASTER_SERVER
ExecStop=/usr/bin/umount /mountpoint

[Install]
WantedBy=multi-user.target

Enable it so it starts every boot:

systemctl enable yourcluster-mfsmount && systemctl start yourcluster-mfsmount
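
Once it’s up, a quick sanity check that the mount actually came through (substituting your mountpoint for my /data/squirrel):

df -h /data/squirrel
# The Filesystem column should show the master, something like yourmaster.example.com:9421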