Using a Standalone Elastic File System with EC2 Instances

May 17th, 2017

Amazon's Elastic File System (EFS) can be a very useful tool when deploying any sort of application stack that expects to have access to a persistent file system across deployments and restarts. This is true for small internal applications in which there is just one server instance behind an Auto Scaling Group and Elastic Load Balancer, set up to make sure that the instance will be replaced should it fall into a bad state. It is just as true for large, complex application stacks that automatically scale up and down, and where all of the instances maintain some portion of the application state in a common file system.

Setting up and maintaining your own network file system installation is a real pain if you're not very experienced in this particular area of operations, and I'd say that EFS is worth the cost for most organizations and use cases. It makes this portion of the application infrastructure a lot easier to deal with. That said, the documentation, like much of Amazon's documentation, is really only helpful in hindsight, after you have become familiar with EFS. Prior to that point, it can be a little challenging to come to an understanding of the best way to work with this tool.

Cautions on Costs and Limits

EFS rate-limits throughput, and the limit is determined by the amount of data stored in the file system. Something like, say, a WordPress or other PHP application webroot is tiny in the grand scheme of things, but needs to conduct a large number of file operations per request. A file system that small will not be granted enough throughput to support the application under most circumstances. The alternatives here are to accept that this use case falls outside the intended use of EFS, or to write a large enough amount of data (a terabyte) to the file system to bump the throughput limits into an acceptable tier. Sadly, this costs enough to make it worth using some other approach if you can't share space on an existing EFS containing a large amount of data.
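
To get a feel for the numbers, here is a rough sketch of the arithmetic. It assumes the bursting throughput model, in which baseline throughput scales at about 50 MiB/s per TiB stored. That rate is my assumption based on the EFS performance documentation, so check the current figures before relying on it. The function name is made up for illustration.

```shell
#!/bin/bash
# Rough estimate of EFS baseline throughput for a given amount of stored data.
# ASSUMPTION: baseline throughput is 50 MiB/s per TiB stored (bursting mode).
efs_baseline_mibps() {
  local size_gib="$1"
  awk -v gib="${size_gib}" 'BEGIN { printf "%.2f\n", gib * 50 / 1024 }'
}

efs_baseline_mibps 1     # a tiny webroot of about 1 GiB
efs_baseline_mibps 1024  # a full terabyte of stored data
```

Under that assumption a 1 GiB webroot earns a baseline of roughly 0.05 MiB/s, against 50 MiB/s for a full terabyte, which is why padding the file system with data changes the picture so much.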

EFS in its own CloudFormation Template

My preferred approach to EFS as a persistent file system for an application is to put its definition into a CloudFormation template on its own, and deploy and manage it separately from the rest of the application infrastructure. In fact, I like to do this for Elastic Load Balancers as well. In both cases it is a resource that should be specified in code and thus under version control, but that one is unlikely to redeploy frequently, if ever. If you do, that redeployment is as likely to be a manual and careful migration process as it is to be a straight update.

Here is an example template for a standalone EFS, assuming deployment into an existing VPC - which certainly should be the case if using AWS in earnest. It takes the required pieces from the perhaps overly comprehensive example application template provided by Amazon. For the sake of simplicity it only illustrates deployment of one mount target to a single VPC subnet, though in larger installations it is probable that there will be multiple subnets, all of which will need distinct mount targets for the EFS. As can be seen, it is quite short and simple by CloudFormation standards:

{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Application EFS",
  "Parameters": {
    "ApplicationId": {
      "Type": "String",
      "Description": "Used to name and tag application resources."
    },
    "SubnetId": {
      "Type": "String",
      "Description": "ID of the deployment subnet."
    },
    "VpcId": {
      "Type": "String",
      "Description": "ID of the deployment VPC."
    }
  },
  "Resources": {
    "ElasticFileSystem": {
      "Type": "AWS::EFS::FileSystem",
      "Properties": {
        "PerformanceMode": "generalPurpose",
        "FileSystemTags": [
          {
            "Key": "Name",
            "Value": {
              "Fn::Join": [
                "",
                [
                  {
                    "Ref": "ApplicationId"
                  },
                  "-efs"
                ]
              ]
            }
          },
          {
            "Key": "ApplicationId",
            "Value": { "Ref": "ApplicationId" }
          }
        ]
      }
    },
    "MountTargetSecurityGroup": {
      "Type": "AWS::EC2::SecurityGroup",
      "Properties": {
        "VpcId": { "Ref": "VpcId" },
        "GroupDescription": "Allow VPC traffic to the EFS mount target.",
        "SecurityGroupIngress": [
          {
            "IpProtocol": "tcp",
            "FromPort": "2049",
            "ToPort": "2049",
            "CidrIp": "10.0.0.0/8"
          }
        ]
      }
    },
    "MountTarget": {
      "Type": "AWS::EFS::MountTarget",
      "Properties": {
        "FileSystemId": { "Ref": "ElasticFileSystem" },
        "SubnetId": { "Ref": "SubnetId" },
        "SecurityGroups": [
          { "Ref": "MountTargetSecurityGroup" }
        ]
      }
    }
  }
}
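
For what it's worth, deploying this as its own stack might look something like the sketch below. It assumes the template has been saved as efs.json and that a reasonably recent AWS CLI is installed. The function name, the stack naming convention, and the IDs in the example invocation are placeholders of mine rather than anything the template requires.

```shell
#!/bin/bash
# Deploy the standalone EFS template as its own CloudFormation stack.
# ASSUMPTIONS: the template is saved as efs.json in the current directory,
# and the stack is named after the application with an "-efs" suffix.
deploy_efs_stack() {
  local application_id="$1"
  local vpc_id="$2"
  local subnet_id="$3"

  aws cloudformation deploy \
    --template-file efs.json \
    --stack-name "${application_id}-efs" \
    --parameter-overrides \
      "ApplicationId=${application_id}" \
      "VpcId=${vpc_id}" \
      "SubnetId=${subnet_id}"
}

# Example invocation with made-up IDs:
# deploy_efs_stack myapp vpc-0123456789abcdef0 subnet-0123456789abcdef0
```

Keeping the deployment in a small function like this makes it easy to keep under version control next to the template, while still being something that is run rarely and deliberately.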

Provision Instances with a Mount Script

An instance in the application stack should be provisioned with a script that mounts the appropriate EFS. That can be a part of the rest of the provisioning scripts and other resources loaded up into a new instance. This example requires an application identifier - that can be used to derive the name of the EFS - to be set as an environment variable, and the AWS CLI to be installed on the instance.

#!/bin/bash
#
# Create the mount point directory /mnt/efs and mount an Elastic File
# System (EFS) to it.
#
# The EFS instance is located by its name.
#
# This script can only run on an AWS instance. It should run as root, and be
# invoked at instance startup. The script expects the following environment 
# variable to be set:
#
# APPLICATION_ID - the identifier used to find resources for the application.
#

set -o errexit

# ----------------------------------------------------------------------------
# Create mount point directory
# ----------------------------------------------------------------------------

AVAILABILITY_ZONE=$(
  curl -sf http://169.254.169.254/latest/meta-data/placement/availability-zone/
)
REGION=$(
  echo "${AVAILABILITY_ZONE}" | sed -e 's:\([0-9][0-9]*\)[a-z]*$:\1:'
)
EFS_NAME="${APPLICATION_ID}-efs"
EFS_MOUNT_DIR="/mnt/efs"

mkdir -p "${EFS_MOUNT_DIR}"

EFS_ID=$(
  aws efs describe-file-systems \
    --query 'FileSystems[*].[Name,FileSystemId]' \
    --region "${REGION}" \
    --output text \
      | grep "${EFS_NAME}" \
      | head -n1 \
      | cut -f 2
)

if [ -z "${EFS_ID}" ]; then
  >&2 echo "Unable to find EFS instance."
  exit 1
fi

echo "Mounting EFS ${EFS_NAME} / ${EFS_ID} to ${EFS_MOUNT_DIR}..."

# Find the mount targets for the desired EFS. This will be a
# newline delimited list in the format "SubnetID(tab)IPs".
MOUNT_TARGETS=$(
  aws efs describe-mount-targets \
    --file-system-id ${EFS_ID} \
    --region ${REGION} \
    --output text \
    --query 'MountTargets[*].{ID:SubnetId,IP:IpAddress}'
)
# Turn it into a space-delimited list of subnet IDs.
SUBNET_IDS=$(
  echo "${MOUNT_TARGETS}" | awk '{printf "%s ",$1}'
)

# Obtain the availability zone for each subnet. The result is a newline
# delimited list in the format "SubnetAZ(tab)SubnetID", since text output
# orders the columns alphabetically by key (AZ before ID).
SUBNETS=$(
  aws ec2 describe-subnets \
    --subnet-ids ${SUBNET_IDS} \
    --region ${REGION} \
    --output text \
    --query 'Subnets[*].{ID:SubnetId,AZ:AvailabilityZone}'
)

# Pick the subnet in the same availability zone. If only
# setting up mount targets in some subnets / availability zones,
# then the deployment of instances has to be structured such that they
# are launched in those subnets / availability zones.
MOUNT_SUBNET=$(
  echo "${SUBNETS}" | grep ${AVAILABILITY_ZONE} | awk '{printf "%s",$2}'
)

if [ -z "${MOUNT_SUBNET}" ]; then
  >&2 echo "Unable to find mount target in the current availability zone."
  exit 1
fi

MOUNT_IP=$(
  echo "${MOUNT_TARGETS}" | grep ${MOUNT_SUBNET} | awk '{printf "%s",$2}'
)

echo "Using mount target with IP ${MOUNT_IP} from subnet ${MOUNT_SUBNET}..."

# In this case, mount read/write, and mount at the root of the elastic 
# file system rather than to a subdirectory. This is suitable for EFS that
# is dedicated to the application, and shared across all application
# instances.
mount -t nfs4 \
  -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
  "${MOUNT_IP}:/" \
  "${EFS_MOUNT_DIR}"

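The text-munging parts of the script are the easiest to get subtly wrong, and they can be tried out without touching AWS at all. The following is a minimal sketch of the region derivation and the two lookups, run against made-up sample data. The helper names and the sample subnet IDs and IPs are mine, and the subnet data is assumed to arrive with the availability zone in the first column, which is what the script's use of the second awk field implies.

```shell
#!/bin/bash
# Derive the region from an availability zone name by dropping the trailing
# zone letter, e.g. "us-east-1a" becomes "us-east-1".
region_from_az() {
  echo "$1" | sed -e 's:\([0-9][0-9]*\)[a-z]*$:\1:'
}

# Read "key<tab>value" lines on stdin and print the value from the first line
# whose key matches exactly.
lookup() {
  awk -v key="$1" '$1 == key { print $2; exit }'
}

# Made-up sample data in the shapes the AWS CLI text output is assumed to have.
SUBNETS=$(printf 'us-east-1a\tsubnet-aaaa\nus-east-1b\tsubnet-bbbb\n')
MOUNT_TARGETS=$(printf 'subnet-aaaa\t10.0.1.15\nsubnet-bbbb\t10.0.2.27\n')

# Same two-step selection as the mount script: AZ to subnet, subnet to IP.
MOUNT_SUBNET=$(echo "${SUBNETS}" | lookup "us-east-1b")
MOUNT_IP=$(echo "${MOUNT_TARGETS}" | lookup "${MOUNT_SUBNET}")

echo "$(region_from_az us-east-1b) ${MOUNT_SUBNET} ${MOUNT_IP}"
```

The exact-match lookup in awk is a small departure from the grep used in the script. It avoids accidental partial matches, at the cost of being a little less obvious at a glance.
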
Invoke the Mount Script on Startup

The easiest way to invoke the mount script at an appropriate time is to do so in the userdata script or other cloud-init configuration passed in to the instance. This is a partial example of an application CloudFormation template to illustrate the point:

{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "DefectDojo Server Stack.",

  "Parameters": {
    "ApplicationId": {
      "Type": "String",
      "Description": "Used to name and tag application resources."
    },
    "ImageId": {
      "Type": "String",
      "Description": "The AMI ID."
    },
    "InstanceType": {
      "Type": "String",
      "Description": "The instance type to use."
    },
    "KeyName": {
      "Type": "String",
      "Description": "The EC2 key pair used for instances."
    },

    ...

  },
  "Resources": {

    ...

    "LaunchConfiguration": {
      "Type": "AWS::AutoScaling::LaunchConfiguration",
      "Properties": {
        "KeyName": { "Ref": "KeyName" },
        "ImageId": { "Ref": "ImageId" },
        "IamInstanceProfile": ...,
        "InstanceType": { "Ref": "InstanceType" },
        "InstanceMonitoring": "false",
        "SecurityGroups": [
          ...
        ],
        "UserData":{
          "Fn::Base64": {
            "Fn::Join": [
              "\n",
              [
                "#!/bin/bash",
                "set -o errexit -o nounset -o xtrace",
                "# Export this environment variable for the mount script.",
                {
                  "Fn::Join": [
                    "",
                    [
                      "export APPLICATION_ID=\"",
                      { "Ref": "ApplicationId" },
                      "\""
                    ]
                  ]
                },
                "export PATH=\"${PATH}:/usr/local/bin\"",

                "# Mount the EFS.",
                "/root/deploy/mount-efs.sh",

                "# Now get on with the rest of provisioning...",

                ...

              ]
            ]
          }
        }
      }
    },

    ...

  }
}
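
The nested Fn::Join calls can be hard to read, so it may help to see roughly what the instance ends up receiving. This sketch prints the script that the UserData above should render to for a given ApplicationId. The function name and the "myapp" value are placeholders of mine, and the elided provisioning steps stay elided.

```shell
#!/bin/bash
# Print the userdata script that the Fn::Join above should produce for a
# given ApplicationId. The function name is hypothetical, for illustration.
render_userdata() {
  local application_id="$1"
  # \${PATH} is escaped so that it is expanded on the instance, not here.
  cat <<EOF
#!/bin/bash
set -o errexit -o nounset -o xtrace
# Export this environment variable for the mount script.
export APPLICATION_ID="${application_id}"
export PATH="\${PATH}:/usr/local/bin"
# Mount the EFS.
/root/deploy/mount-efs.sh
# Now get on with the rest of provisioning...
EOF
}

render_userdata myapp
```

Because errexit is set, a failure in the mount script stops the rest of provisioning, which is usually what you want for an application that cannot function without its file system.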