Friday, 21 October 2016

AWS CLI - Switching to and from regional EC2 reserved instances

AWS recently announced the availability of regional reserved instances. This post explains how to switch a reservation from AZ-specific to regional (and back) using the AWS CLI.

Step 1, find the reservation to modify

$ aws ec2 describe-reserved-instances --filters Name=state,Values=active
{
    "ReservedInstances": [
        {
            "ReservedInstancesId": "c416aeaf-fb64-4218-970f-7426f6f32377", 
            "OfferingType": "No Upfront", 
            "AvailabilityZone": "eu-west-1c", 
            "End": "2017-10-21T08:45:55.000Z", 
            "ProductDescription": "Linux/UNIX", 
            "Scope": "Availability Zone", 
            "UsagePrice": 0.0, 
            "RecurringCharges": [
                {
                    "Amount": 0.01, 
                    "Frequency": "Hourly"
                }
            ], 
            "OfferingClass": "standard", 
            "Start": "2016-10-21T08:45:56.708Z", 
            "State": "active", 
            "FixedPrice": 0.0, 
            "CurrencyCode": "USD", 
            "Duration": 31536000, 
            "InstanceTenancy": "default", 
            "InstanceType": "t2.micro", 
            "InstanceCount": 1
        }
    ]
}

The "Scope" field in the response shows that this reservation is currently specific to an Availability Zone, eu-west-1c in this case.
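
If you have many reservations, a query expression (illustrative, it simply trims the output to the fields relevant here) makes the scope easier to see at a glance:

$ aws ec2 describe-reserved-instances --filters Name=state,Values=active --query 'ReservedInstances[*].[ReservedInstancesId,Scope,AvailabilityZone,InstanceType,InstanceCount]' --output table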

Step 2, request the modification

$ aws ec2 modify-reserved-instances --reserved-instances-ids c416aeaf-fb64-4218-970f-7426f6f32377 --target-configurations Scope=Region,InstanceCount=1
{
    "ReservedInstancesModificationId": "rimod-aaada6ed-fec9-47c7-92e2-6edf7e61f2ce"
}

Scope=Region indicates that this reservation should be converted to a regional reservation; InstanceCount is a required parameter indicating the number of instances the target configuration applies to.
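
If a reservation covers more than one instance, the same command can split it across multiple target configurations; the instance counts just need to add up to the reservation's total. A hypothetical example (the reservation ID, counts and AZ below are placeholders):

$ aws ec2 modify-reserved-instances --reserved-instances-ids [reservation-id] --target-configurations Scope=Region,InstanceCount=2 Scope="Availability Zone",AvailabilityZone=eu-west-1c,InstanceCount=1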

Step 3, monitor progress

$ aws ec2 describe-reserved-instances-modifications
{
    "ReservedInstancesModifications": [
        {
            "Status": "processing", 
            "ModificationResults": [
                {
                    "ReservedInstancesId": "35f9b908-ae36-41ca-ac0b-4c67c887135b", 
                    "TargetConfiguration": {
                        "InstanceCount": 1
                    }
                }
            ], 
            "EffectiveDate": "2016-10-21T08:45:57.000Z", 
            "CreateDate": "2016-10-21T08:50:28.585Z", 
            "UpdateDate": "2016-10-21T08:50:31.098Z", 
            "ReservedInstancesModificationId": "rimod-aaada6ed-fec9-47c7-92e2-6edf7e61f2ce", 
            "ReservedInstancesIds": [
                {
                    "ReservedInstancesId": "c416aeaf-fb64-4218-970f-7426f6f32377"
                }
            ]
        }
    ]
}

The "Status" in the response will show "processing" until the modification has completed successfully, at which time it will change to "fulfilled":

$ aws ec2 describe-reserved-instances-modifications
{
    "ReservedInstancesModifications": [
        {
            "Status": "fulfilled", 
            "ModificationResults": [
                {
                    "ReservedInstancesId": "35f9b908-ae36-41ca-ac0b-4c67c887135b", 
                    "TargetConfiguration": {
                        "InstanceCount": 1
                    }
                }
            ], 
            "EffectiveDate": "2016-10-21T08:45:57.000Z", 
            "CreateDate": "2016-10-21T08:50:28.585Z", 
            "UpdateDate": "2016-10-21T09:11:33.454Z", 
            "ReservedInstancesModificationId": "rimod-aaada6ed-fec9-47c7-92e2-6edf7e61f2ce", 
            "ReservedInstancesIds": [
                {
                    "ReservedInstancesId": "c416aeaf-fb64-4218-970f-7426f6f32377"
                }
            ]
        }
    ]
}
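
If you are polling for completion, a query keeps the output down to just the status (using the modification ID returned in step 2):

$ aws ec2 describe-reserved-instances-modifications --reserved-instances-modification-ids rimod-aaada6ed-fec9-47c7-92e2-6edf7e61f2ce --query 'ReservedInstancesModifications[0].Status' --output text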

Step 4, success!

The new reservation is now regional (Scope=Region):

$ aws ec2 describe-reserved-instances --filters Name=state,Values=active
{
    "ReservedInstances": [
        {
            "ReservedInstancesId": "35f9b908-ae36-41ca-ac0b-4c67c887135b", 
            "OfferingType": "No Upfront", 
            "FixedPrice": 0.0, 
            "End": "2017-10-21T08:45:55.000Z", 
            "ProductDescription": "Linux/UNIX", 
            "Scope": "Region", 
            "UsagePrice": 0.0, 
            "RecurringCharges": [
                {
                    "Amount": 0.01, 
                    "Frequency": "Hourly"
                }
            ], 
            "OfferingClass": "standard", 
            "Start": "2016-10-21T08:45:57.000Z", 
            "State": "active", 
            "InstanceCount": 1, 
            "CurrencyCode": "USD", 
            "Duration": 31536000, 
            "InstanceTenancy": "default", 
            "InstanceType": "t2.micro"
        }
    ]
}

Switching back

Switching back follows the same process, with the added requirement of specifying which AZ the reservation should be linked to:

$ aws ec2 modify-reserved-instances --reserved-instances-ids 35f9b908-ae36-41ca-ac0b-4c67c887135b --target-configurations Scope="Availability Zone",InstanceCount=1,AvailabilityZone=eu-west-1b
{
    "ReservedInstancesModificationId": "rimod-9e490be9-55a3-48cf-81e9-2662b13db2f8"
}

$ aws ec2 describe-reserved-instances --filters Name=state,Values=active
{
    "ReservedInstances": [
        {
            "ReservedInstancesId": "df70d097-2f33-4962-bca6-37af15ca819e", 
            "OfferingType": "No Upfront", 
            "AvailabilityZone": "eu-west-1b", 
            "End": "2017-10-21T08:45:55.000Z", 
            "ProductDescription": "Linux/UNIX", 
            "Scope": "Availability Zone", 
            "UsagePrice": 0.0, 
            "RecurringCharges": [
                {
                    "Amount": 0.01, 
                    "Frequency": "Hourly"
                }
            ], 
            "OfferingClass": "standard", 
            "Start": "2016-10-21T08:45:58.000Z", 
            "State": "active", 
            "FixedPrice": 0.0, 
            "CurrencyCode": "USD", 
            "Duration": 31536000, 
            "InstanceTenancy": "default", 
            "InstanceType": "t2.micro", 
            "InstanceCount": 1
        }
    ]
}

Wednesday, 31 August 2016

AWS troubleshooting - Lambda deployment package file permissions

When creating your own Lambda deployment packages be aware of the permissions on the files before zipping them. Lambda requires the files to have read access for all users, particularly "other"; if this is missing you will receive a non-obvious error when trying to call the function. The fix is simple enough: perform a 'chmod a+r *' before creating your zip file. If the code is visible in the inline editor, adding an empty line and saving will also fix the problem, presumably because the file is rewritten with the correct permissions.
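
For example, packaging and uploading a function from the current directory might look like the following (the function and file names are placeholders); you can also inspect the permissions stored in an existing package with 'zipinfo function.zip':

$ chmod a+r *
$ zip -r function.zip .
$ aws lambda update-function-code --function-name my-function --zip-file fileb://function.zip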

Below are some examples of the errors you will see in the various languages if read permissions are missing. Hopefully this saves you some time debugging.

Java CloudWatch logs:
--
Class not found: example.Hello: class java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: example.Hello
at java.net.URLClassLoader$1.run(URLClassLoader.java:370)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
Caused by: java.io.FileNotFoundException: /var/task/example/Hello.class (Permission denied)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at sun.misc.URLClassPath$FileLoader$1.getInputStream(URLClassPath.java:1251)
at sun.misc.Resource.cachedInputStream(Resource.java:77)
at sun.misc.Resource.getByteBuffer(Resource.java:160)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:454)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
... 7 more
--

Java execution result (testing from console):
{
  "errorMessage": "Class not found: example.Hello",
  "errorType": "class java.lang.ClassNotFoundException"
}

Python CloudWatch logs:
--
Unable to import module 'python-hi': No module named python-hi
--

Python execution result (testing from console):
{
  "errorMessage": "Unable to import module 'python-hi'"
}

Node CloudWatch logs:
--
module initialization error: Error
    at Error (native)
    at Object.fs.openSync (fs.js:549:18)
    at Object.fs.readFileSync (fs.js:397:15)
    at Object.Module._extensions..js (module.js:415:20)
    at Module.load (module.js:343:32)
    at Function.Module._load (module.js:300:12)
    at Module.require (module.js:353:17)
    at require (internal/module.js:12:17)
--

Node execution result (testing from console):
{
  "errorMessage": "EACCES: permission denied, open '/var/task/node-hi.js'",
  "errorType": "Error",
  "stackTrace": [
    "Object.fs.openSync (fs.js:549:18)",
    "Object.fs.readFileSync (fs.js:397:15)",
    "Object.Module._extensions..js (module.js:415:20)",
    "Module.load (module.js:343:32)",
    "Function.Module._load (module.js:300:12)",
    "Module.require (module.js:353:17)",
    "require (internal/module.js:12:17)"
  ]
}

Friday, 12 August 2016

AWS Tip of the day: Tagging EC2 reserved instances

A quick post pointing out that EC2 reserved instances actually support tagging. This functionality is only available via the command line or the API, not the console, but it still allows you to tag your reservations, making it easier to keep track of why a reserved instance was purchased and which component it was intended for. Of course, the reservation itself is not tied to a running instance in any way; it is merely a billing construct applied to any matching instances running in your account. But if you are making architectural changes or considering different instance types for specific workloads or components, the tags allow you (and your team) to see why the reservation was originally purchased. For example, if you are scaling up the instance sizes of a specific component, say from m4.large to m4.xlarge, you can check your reserved instance tags and modify the reservations associated with that component to ensure you continue to benefit from the purchase.

Tagging reserved instances works the same as tagging other EC2 resources: use the AWS CLI's ec2 create-tags command and specify the reserved instance ID as the resource ID. You can find the reserved instance ID using the CLI's ec2 describe-reserved-instances command. Using an actual example, let's start by finding a reservation:

$ aws ec2 describe-reserved-instances
{
    "ReservedInstances": [
        {
            "ReservedInstancesId": "3d092b71-5243-4e5e-b409-86df342282ab", 
            "OfferingType": "No Upfront", 
            "AvailabilityZone": "eu-west-1c", 
            "End": "2017-08-12T04:48:58.000Z", 
            "ProductDescription": "Linux/UNIX", 
            "UsagePrice": 0.0, 
            "RecurringCharges": [
                {
                    "Amount": 0.01, 
                    "Frequency": "Hourly"
                }
            ], 
            "Start": "2016-08-12T04:48:59.763Z", 
            "State": "active", 
            "FixedPrice": 0.0, 
            "CurrencyCode": "USD", 
            "Duration": 31536000, 
            "InstanceTenancy": "default", 
            "InstanceType": "t2.micro", 
            "InstanceCount": 1
        }
    ]
}

Next, let's add a tag indicating that this reservation is intended for the "production" stack:
$ aws ec2 create-tags --resources 3d092b71-5243-4e5e-b409-86df342282ab --tags Key=Stack,Value=production


Checking the result:
$ aws ec2 describe-reserved-instances
{
    "ReservedInstances": [
        {
            "ReservedInstancesId": "3d092b71-5243-4e5e-b409-86df342282ab", 
            "OfferingType": "No Upfront", 
            "AvailabilityZone": "eu-west-1c", 
            "End": "2017-08-12T04:48:58.000Z", 
            "ProductDescription": "Linux/UNIX", 
            "Tags": [
                {
                    "Value": "production", 
                    "Key": "Stack"
                }
            ], 
            "UsagePrice": 0.0, 
            "RecurringCharges": [
                {
                    "Amount": 0.01, 
                    "Frequency": "Hourly"
                }
            ], 
            "Start": "2016-08-12T04:48:59.763Z", 
            "State": "active", 
            "FixedPrice": 0.0, 
            "CurrencyCode": "USD", 
            "Duration": 31536000, 
            "InstanceTenancy": "default", 
            "InstanceType": "t2.micro", 
            "InstanceCount": 1
        }
    ]
}

Great, we have a tag, but what if we have hundreds of reservations? A long list of reservations is not particularly useful for quickly identifying the ones related to a component or stack. The CLI's query and output functionality can help here:

$ aws ec2 describe-reserved-instances --query 'ReservedInstances[*].{AZ:AvailabilityZone,Type:InstanceType,Expiry:End,stack:Tags[?Key==`Stack`][?Value==`production`]}' --output=table
--------------------------------------------------------
|               DescribeReservedInstances              |
+-------------+----------------------------+-----------+
|     AZ      |          Expiry            |   Type    |
+-------------+----------------------------+-----------+
|  eu-west-1c |  2017-08-12T04:48:58.000Z  |  t2.micro |
+-------------+----------------------------+-----------+

Not quite the console view but easy enough to see that we have one reservation for the "production" Stack.
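
Alternatively, assuming your version of the CLI supports tag filters on describe-reserved-instances (most EC2 describe calls accept Name=tag:<key> filters), the filtering can be pushed to the API side:

$ aws ec2 describe-reserved-instances --filters Name=tag:Stack,Values=production --query 'ReservedInstances[*].{AZ:AvailabilityZone,Type:InstanceType,Expiry:End}' --output=table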

Tuesday, 7 June 2016

AWS Tip: Save S3 costs with abort multipart lifecycle policy

Introduction

S3 multipart uploads provide a number of benefits -- better throughput, recovery from network errors -- and a number of tools will automatically use multipart uploads for larger uploads. The AWS CLI cp, mv, and sync commands all make use of multipart uploads, and their documentation notes that "If the process is interrupted by a kill command or system failure, the in-progress multipart upload remains in Amazon S3 and must be cleaned up manually..."

The reason you would want to clean up these failed multipart uploads is that you are charged for the storage they use while waiting to be completed (or aborted). This post provides some detail on how to find the incomplete uploads and options for removing them to save storage costs.

Finding incomplete multipart uploads

If you have relatively few buckets, or only want to check your biggest buckets (CloudWatch S3 metrics are useful for finding these), the AWS CLI s3api list-multipart-uploads command is a simple check:
aws s3api list-multipart-uploads --bucket [bucket-name]
No output indicates that the bucket does not contain any incomplete uploads; see the list-multipart-uploads documentation linked above for an example of the output on a bucket that does contain one. A simple bash script to check all your buckets:
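
Something along these lines does the job (a sketch; error handling and pagination are left out):

#!/bin/bash
# List each bucket along with the first page of its in-progress multipart
# uploads (initiation date and key).
for bucket in $(aws s3api list-buckets --query 'Buckets[].Name' --output text); do
    # Look up the bucket region so requests are signed for the right endpoint
    # (needed for bucket names containing dots and for eu-central-1 / SigV4).
    region=$(aws s3api get-bucket-location --bucket "$bucket" --query 'LocationConstraint' --output text)
    if [ "$region" == "None" ]; then region="us-east-1"; fi
    echo "Bucket: $bucket ($region)"
    aws s3api list-multipart-uploads --bucket "$bucket" --region "$region" --query 'Uploads[].[Initiated,Key]' --output text
done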

This script will list your buckets and the first page (out of possibly many more) of incomplete multipart upload keys, along with the date they were initiated. The region lookup is required to handle bucket names containing dots and buckets in eu-central-1 (SigV4).

Cleaning up

Once you have identified the buckets containing incomplete uploads it is worth investigating some of the recent failed uploads to see whether there is an underlying issue that needs to be addressed, particularly if the uploads relate to backups or important log files. The most typical cause is instances being terminated before completing uploads (look at Lifecycle Hooks to fix this if you are using Auto Scaling), but they may also be the result of applications not performing cleanup on failures or not handling errors correctly.

A multipart upload can be aborted using the abort-multipart-upload s3api command in the AWS CLI, using the object key and upload ID returned by the list-multipart-uploads command. This can be scripted, but it will take time to complete for buckets containing large numbers of incomplete uploads; fortunately there is an easier way. S3 now supports a bucket lifecycle policy to automatically delete incomplete uploads after a specified period of time. Enabling the policy in the AWS console is fairly quick and easy; see Jeff's blog post for details. A rather messy boto3 example script for enabling the policy on all buckets can be found here; it should work with most bucket configurations, but it comes with no guarantees and you use it at your own risk.
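
For reference, the CLI equivalents look something like the following sketches (the bucket name, key, upload ID and the seven day window are placeholders to adjust):

$ aws s3api abort-multipart-upload --bucket [bucket-name] --key [object-key] --upload-id [upload-id]

$ aws s3api put-bucket-lifecycle-configuration --bucket [bucket-name] --lifecycle-configuration '{
    "Rules": [
        {
            "ID": "abort-incomplete-multipart-uploads",
            "Status": "Enabled",
            "Prefix": "",
            "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
        }
    ]
}'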

Conclusion (why you should do this)

If you are using S3 to store large (> 5 MB) objects and you are spending more than a few dollars a month on S3 storage, there is a fairly good chance that you are paying unnecessarily for failed/incomplete multipart uploads. It should only take a few minutes to review your buckets and could potentially yield significant monthly savings.

Thursday, 5 May 2016

Enabling longer AWS IDs in a region using an IAM role

AWS is moving towards longer EC2 and EBS IDs and you can enable them for an IAM user or at an account level using the root credentials. You can avoid using the root credentials by using an IAM role instead. This is a quick post to explain the steps needed to use an IAM role on an instance to enable the longer IDs at an account level.

Update: You can now use the --principal-arn argument to make this change for the root account; see the AWS support post here.

Update 2: Another alternative is to use this script.

1. Create a policy allowing modify and describe of ID format (IAM console -> Policies -> Create Policy -> Create Your Own Policy). The following policy document (originally from here, with the Deny changed to Allow) provides the necessary permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:ModifyIdFormat", "ec2:DescribeIdFormat"
      ],
      "Resource": "*"
    }
  ]
}

2. Create an EC2 role (IAM console -> Roles -> Create New Role -> [Pick a name] -> Amazon EC2). Select the policy name you created in step 1 and attach it to the role.

3. Launch an instance with the role. It is probably easiest to use an Amazon Linux AMI as the AWS CLI is installed by default. Instance size and region don't really matter.

4. SSH to the instance and use the CLI to enable the longer IDs in the region that you would like to test against:
$ aws --region eu-west-1 ec2 modify-id-format --resource instance --use-long-ids

If you don't specify a region the CLI will prompt you to run 'aws configure' to set a default. To check the status of the ID format you can use describe-id-format:
$ aws --region eu-west-1 ec2 describe-id-format

{
    "Statuses": [
        {
            "UseLongIds": false,
            "Resource": "reservation"
        },
        {
            "UseLongIds": true,
            "Resource": "instance"
        },
        {
            "UseLongIds": false,
            "Resource": "volume"
        },
        {
            "UseLongIds": false,
            "Resource": "snapshot"
        }
    ]
}

5. Longer IDs are now enabled for the account. Resources created by the root account (such as Auto Scaled instances) will be launched with longer IDs.

Wednesday, 4 May 2016

'Connection to s3.amazonaws.com timed out' using VPC S3 Endpoint

A quick post to point you in the right direction if you are getting connection timeout errors using the AWS CLI or Boto with AWS S3 VPC Endpoints in a private subnet. The two most likely causes of this are:

1. Your bucket name contains dots.
To work around this you can specify the region of the bucket, for example:
aws --region eu-west-1 s3 ls s3://bucket.with.dots/

2. You are trying to access a bucket in a different region.
This is not supported by VPC Endpoints, see the restrictions here.
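
To check the second case, the endpoint's service name includes its region, which you can compare against the bucket's location (illustrative commands):

$ aws ec2 describe-vpc-endpoints --query 'VpcEndpoints[*].[VpcEndpointId,ServiceName,State]' --output table
$ aws s3api get-bucket-location --bucket [bucket-name]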

Thursday, 18 February 2016

Memcached item memory usage

Introduction

I was recently involved with an investigation into a Memcached cluster running on AWS ElastiCache and discovered a few interesting details about Memcached. This post looks at the memory overhead of storing items in Memcached and demonstrates the impact of disabling CAS (check and set) on memory utilisation.

Background - Memcached item overhead

Before getting into the details some background might be useful. Starting with the basics, how much memory does Memcached actually need to store an item? Let's spin up a cluster to check. For simplicity I am using a single m3.large ElastiCache node (running memcached 1.4.24), which according to the AWS docs provides around 6.2 GB of memory for cache items.

The easiest way to look at our new cache is by telnetting to the cluster (from within the VPC) and running commands directly. To start, what does the slab allocation look like:

> stats slabs
< STAT active_slabs 0
< STAT total_malloced 0
< END

As expected, our empty cache has 0 active slabs and no memory allocated. Let's add a value and see how this changes things:

> set k 0 500 1
> v
< STORED

Above, we have set an item with a key of 'k', no flags, a 500 second expiry time and a 1 byte value of 'v'. Looking at the slab stats we now see:

> stats slabs
< STAT 1:chunk_size 96
< STAT 1:chunks_per_page 10922
< STAT 1:total_pages 1
< STAT 1:total_chunks 10922
< STAT 1:used_chunks 1
< STAT 1:free_chunks 10921
< STAT 1:free_chunks_end 0
< STAT 1:mem_requested 67
< STAT 1:get_hits 0
< STAT 1:cmd_set 1
< STAT 1:delete_hits 0
< STAT 1:incr_hits 0
< STAT 1:decr_hits 0
< STAT 1:cas_hits 0
< STAT 1:cas_badval 0
< STAT 1:touch_hits 0
< STAT active_slabs 1
< STAT total_malloced 1048512
< END

Lots of information here, but let's focus on the chunks and memory stats for the moment. We now have 1 active slab with a chunk size of 96 bytes, and one of these chunks (used_chunks) has been used to store our item; all as expected except the relatively large size of the chunk used. We can see the actual memory requested was 67 bytes, which means that 65 bytes of overhead are required to store the 2-byte item. Looking into the memcached code we can see that the item struct defined in memcached.h requires 48 bytes, plus another 8 bytes for CAS. The remaining 11 bytes consist of:
  • The null terminated key (2 bytes, 1 byte of overhead, "k\0" in our example)
  • The CRLF terminated data (3 bytes, 2 bytes of overhead, "v\r\n" in our example)
  • The item header, a formatted string containing the flags and the data length, terminated with CRLF (6 bytes, the minimum, in our example "_0_1\r\n" with underscores to make the spaces more visible)
The header is created in the item_make_header function in items.c, the relevant part being:

    *nsuffix = (uint8_t) snprintf(suffix, 40, " %d %d\r\n", flags, nbytes - 2);

The header formats the flags and data length as integers in a string, so this overhead will increase as the data size and flag values increase. For example, setting the flag to 10 will require an extra byte to store the extra digit in the header:

> set k 10 500 1
> v
< STORED
> stats slabs
< STAT 1:chunk_size 96
< STAT 1:used_chunks 1
< STAT 1:free_chunks 10921
< STAT 1:mem_requested 68
...
< STAT active_slabs 1
< STAT total_malloced 1048512
< END

So you can expect at least 65 bytes of overhead per item.

Disable CAS if you don't need it

CAS stands for 'Check And Set' which the Memcached protocol docs describe as: "store this data but only if no one else has updated since I last fetched it."

For a fair number of Memcached use cases CAS will not be used and as a result can be disabled with no impact. You can check your cluster using the stats command and looking at the cas_* values:

> stats
...
< STAT cas_misses 0
< STAT cas_hits 0
< STAT cas_badval 0
...
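
On ElastiCache, CAS is controlled by the cas_disabled parameter in the cluster's parameter group; something like the following sets it (the parameter group name is a placeholder), after which the nodes need a reboot to pick up the change:

$ aws elasticache modify-cache-parameter-group --cache-parameter-group-name my-memcached-params --parameter-name-values ParameterName=cas_disabled,ParameterValue=1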

Sounds simple enough; let's test it out. After setting cas_disabled to 1 on our cluster and rebooting for the change to take effect, we set our key again:

> set k 0 500 1
> v
< STORED
> stats slabs
< STAT 1:chunk_size 96
< STAT 1:used_chunks 1
< STAT 1:free_chunks 10921
< STAT 1:mem_requested 59
...
< STAT active_slabs 1
< STAT total_malloced 1048512
< END

Not much difference, except for the requested memory reducing by 8 bytes. Disabling CAS will NOT reduce the wasted memory on its own, but it will allow an item to contain an extra 8 bytes of data where previously it would have needed to use a larger chunk, which might be wasting significantly more memory. Let's look at an example.

Assume that you want to cache an item of 32 bytes: a 16-byte key and a 16-byte value. From the background section above we know that this will take 66 bytes of overhead (the value length now needs two digits in the header), pushing the total item size to 98 bytes, 2 bytes more than the chunk size of the first slab (using Memcached defaults). On a CAS-enabled node this shows:

> set 0123456789ABCDEF 0 500 16
> 0123456789ABCDEF
< STORED
> stats slabs
< STAT 2:chunk_size 120
< STAT 2:used_chunks 1
< STAT 2:free_chunks 8737
< STAT 2:mem_requested 98
...
< STAT active_slabs 1
< STAT total_malloced 1048560
< END

Whereas with CAS disabled the overhead is reduced to 58 bytes, allowing the item to fit into the original slab:

> set 0123456789ABCDEF 0 500 16
> 0123456789ABCDEF
< STORED
> stats slabs
< STAT 1:chunk_size 96
< STAT 1:used_chunks 1
< STAT 1:free_chunks 10921
< STAT 1:mem_requested 90
...
< STAT active_slabs 1
< STAT total_malloced 1048512
< END

This means that with CAS enabled we are wasting 18% of the memory in each chunk of this slab (22 out of 120 bytes lost per item), compared to just 6% (6 out of 96 bytes) lost with CAS disabled. If the entire cache consisted of items this size, it would result in more than 1 GB of wasted memory, compared to 372 MB wasted with CAS disabled. To put this differently, disabling CAS would allow you to store approximately 7 million additional 90-byte items, resulting in better cache hit rates and lower eviction rates. This is of course an artificial example, so let's look at a more realistic one where the cache item sizes are not quite so uniform.

The following stats are based on a production CAS-enabled node's stats, adjusted by a factor of ten to fit on an m3.large (the production node had around 59 million items in slab 11; this was scaled to 5.9 million for testing). The test code used to generate this data is relatively simple and distributes the items in a slab fairly evenly across the item sizes of the slab; each item's size is slightly weighted to try and match the average item size in production.

Slab | Chunk Size | Chunks Used | Memory Allocated | Memory Requested | Average Item Size | Memory Wasted
4 | 192 | 139,810 | 26,843,520 | 25,706,546 | 184 | 1,136,974
5 | 240 | 55,924 | 13,421,760 | 11,399,776 | 204 | 2,021,984
10 | 752 | 232,025 | 174,482,800 | 162,673,515 | 701 | 11,809,285
11 | 944 | 5,985,735 | 5,650,533,840 | 4,643,546,639 | 776 | 1,006,987,201
12 | 1,184 | 10,951 | 12,965,984 | 11,429,965 | 1,044 | 1,536,019
13 | 1,480 | 7,177 | 10,621,960 | 9,421,613 | 1,313 | 1,200,347
14 | 1,856 | 6,071 | 11,267,776 | 10,148,664 | 1,672 | 1,119,112
15 | 2,320 | 5,325 | 12,354,000 | 11,094,291 | 2,083 | 1,259,709
16 | 2,904 | 4,621 | 13,419,384 | 11,947,574 | 2,585 | 1,471,810
17 | 3,632 | 3,695 | 13,420,240 | 11,906,634 | 3,222 | 1,513,606
18 | 4,544 | 5,201 | 23,633,344 | 21,025,609 | 4,043 | 2,607,735
19 | 5,680 | 4,523 | 25,690,640 | 23,038,012 | 5,094 | 2,652,628
20 | 7,104 | 3,778 | 26,838,912 | 23,778,017 | 6,294 | 3,060,895
21 | 8,880 | 3,022 | 26,835,360 | 23,630,979 | 7,820 | 3,204,381
22 | 11,104 | 2,417 | 26,838,368 | 23,754,944 | 9,828 | 3,083,424
23 | 13,880 | 1,933 | 26,830,040 | 23,092,970 | 11,947 | 3,737,070
24 | 17,352 | 1,107 | 19,208,664 | 15,554,075 | 14,051 | 3,654,589
Total | | 6,473,315 | 6,115,206,592 | 5,063,149,823 | | 1,052,056,769
(all sizes and memory values are in bytes)

The table shows that we are caching around 6.4 million items, with the majority of them falling in slabs 4, 10, and 11, and that around 1 GB of memory is being wasted. Average item size is computed as memory requested / chunks used (rounded to the nearest byte) and is an indicator of the actual item sizes (including overhead) within each slab. An interesting point relevant to the CAS discussion is that the average item size for slab 11 is 776 bytes, which is only marginally bigger than the chunk size for slab 10.

Running the test code against a node with CAS disabled yields the following results:

Slab | Chunk Size | Chunks Used | Memory Allocated | Memory Requested | Average Item Size | Memory Wasted
4 | 192 | 165,990 | 31,870,080 | 29,471,729 | 178 | 2,398,351
5 | 240 | 29,744 | 7,138,560 | 6,068,721 | 204 | 1,069,839
10 | 752 | 3,365,926 | 2,531,176,352 | 2,496,542,132 | 742 | 34,634,220
11 | 944 | 2,853,079 | 2,693,306,576 | 2,261,103,795 | 793 | 432,202,781
12 | 1,184 | 10,331 | 12,231,904 | 10,910,829 | 1,056 | 1,321,075
13 | 1,480 | 6,603 | 9,772,440 | 8,703,301 | 1,318 | 1,069,139
14 | 1,856 | 6,116 | 11,351,296 | 10,202,507 | 1,668 | 1,148,789
15 | 2,320 | 5,469 | 12,688,080 | 11,429,195 | 2,090 | 1,258,885
16 | 2,904 | 4,651 | 13,506,504 | 12,137,620 | 2,610 | 1,368,884
17 | 3,632 | 3,629 | 13,180,528 | 11,834,384 | 3,261 | 1,346,144
18 | 4,544 | 5,089 | 23,124,416 | 20,661,849 | 4,060 | 2,462,567
19 | 5,680 | 4,557 | 25,883,760 | 23,299,194 | 5,113 | 2,584,566
20 | 7,104 | 3,786 | 26,895,744 | 23,983,965 | 6,335 | 2,911,779
21 | 8,880 | 3,056 | 27,137,280 | 24,146,469 | 7,901 | 2,990,811
22 | 11,104 | 2,376 | 26,383,104 | 23,654,235 | 9,955 | 2,728,869
23 | 13,880 | 2,297 | 31,882,360 | 28,479,830 | 12,399 | 3,402,530
24 | 17,352 | 616 | 10,688,832 | 8,733,548 | 14,178 | 1,955,284
Total | | 6,473,315 | 5,508,217,816 | 5,011,363,303 | | 496,854,513
(all sizes and memory values are in bytes)

As expected, the extra 8 bytes per item has allowed a large number of items to move into slabs with smaller chunk sizes, with more than half of the items in slab 11 now fitting into slab 10, saving almost 600 MB of memory. The average item size being closer to the slab chunk size is another indicator that we are making more efficient use of the memory in each slab.

For reference, the sizes of the items each slab can contain are below. The chunk sizes are displayed when running 'memcached -vv'; reformatting this into a table and calculating the maximum item size gives:

Slab | Chunk size | Max data | Min data
1 | 96 | 30 | 1
2 | 120 | 54 | 31
3 | 152 | 86 | 55
4 | 192 | 125 | 87
5 | 240 | 173 | 126
6 | 304 | 237 | 174
7 | 384 | 317 | 238
8 | 480 | 413 | 318
9 | 600 | 533 | 414
10 | 752 | 685 | 534
11 | 944 | 877 | 686
12 | 1184 | 1116 | 878
13 | 1480 | 1412 | 1117
14 | 1856 | 1788 | 1413
15 | 2320 | 2252 | 1789
16 | 2904 | 2836 | 2253
17 | 3632 | 3564 | 2837
18 | 4544 | 4476 | 3565
19 | 5680 | 5612 | 4477
20 | 7104 | 7036 | 5613
21 | 8880 | 8812 | 7037
22 | 11104 | 11035 | 8813
23 | 13880 | 13811 | 11036
24 | 17352 | 17283 | 13812
25 | 21696 | 21627 | 17284
26 | 27120 | 27051 | 21628
27 | 33904 | 33835 | 27052
28 | 42384 | 42315 | 33836
29 | 52984 | 52915 | 42316
30 | 66232 | 66163 | 52916
31 | 82792 | 82723 | 66164
32 | 103496 | 103426 | 82724
33 | 129376 | 129306 | 103427
34 | 161720 | 161650 | 129307
35 | 202152 | 202082 | 161651
36 | 252696 | 252626 | 202083
37 | 315872 | 315802 | 252627
38 | 394840 | 394770 | 315803
39 | 493552 | 493482 | 394771
40 | 616944 | 616874 | 493483
41 | 771184 | 771114 | 616875
42 | 1048576 | 1048505 | 771115

This table uses the default Memcached config settings (48-byte minimum item size, 1.25 growth factor, CAS enabled). The "Max data" column is the maximum number of bytes that can be used to store both the key and the value for an item in this slab. The "Min data" column is the minimum data size required for an item to be put in a slab, and is simply the maximum data size from the previous slab + 1. With CAS disabled the max and min data values increase by 8 bytes, with the exception of the minimum data for slab 1, which remains 1 byte. The following Python function (a sketch reconstructed from the overhead described above) can be used for calculating the maximum item size (key + value) per slab:
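
def max_item_size(chunk_size, cas_enabled=True):
    # Fixed per-item overhead as described above: 48-byte item struct,
    # optional 8-byte CAS value, the key's NUL terminator, the value's
    # trailing CRLF and the minimal " 0 \r\n" flags/length suffix.
    fixed = 48 + (8 if cas_enabled else 0) + 1 + 2 + 5
    data = chunk_size - fixed
    # The suffix also stores the value length as decimal digits, so shrink
    # the data size until the whole item fits in the chunk.
    while data > 0 and fixed + len(str(data)) + data > chunk_size:
        data -= 1
    return max(data, 0)

# Reproduce the table: the first chunk is 96 bytes (48-byte minimum item plus
# the 48-byte item struct), each following chunk grows by the 1.25 factor
# rounded up to 8-byte alignment, and the final slab is capped at the 1 MB
# maximum item size.
slab, chunk = 1, 96
while chunk <= 1048576 / 1.25:
    print(slab, chunk, max_item_size(chunk))
    slab, chunk = slab + 1, (int(chunk * 1.25) + 7) // 8 * 8
print(slab, 1048576, max_item_size(1048576))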



Conclusion

In this post we have had a look at the details of Memcached item overhead and identified disabling CAS as a simple mechanism for improving memory utilisation, which will reduce evictions and increase hit rates. In the next post we will look at how adjusting the growth factor and minimum item size can further improve memory utilisation.