Thursday, 5 December 2013

The MWEB uncapped ADSL myth

I was the unlucky recipient of an acceptable usage notification email from MWEB last night. It turns out that their 10Mbps uncapped account is actually a 125GB capped account and that R749 (my last MWEB bill) every month only actually buys a 1.5Mbps line. I can't say that I am terribly surprised: there is a trend among South African ADSL providers to clamp down on "high usage" users, and MWEB is merely notifying users via email when they have reached this lofty state.

My only real gripe is that I am being lumped into the probably-a-pirate-downloading-24-7 group and as a result my (legal) video streaming is throttled to Tetris block quality despite only using my line for a few hours a day. The current price war in the uncapped ADSL market is rather pointless if an uncapped account is actually subject to a throttling limit. As a user who is spending a fair amount of money on high speed internet access, I would prefer it if the ISPs kept their prices the same but increased their capacity. To put this in perspective, let's do some theoretical bandwidth calculations.

My ADSL line is rated as 10Mbps but in practice I only really get 8.5Mbps, which is generally sufficient for streaming HD content (obviously depending on a number of other factors). For simplicity's sake let's round this down to 8Mbps. Please note that the small b in Mb indicates megabit (transfer rate), not megabyte (storage). So:
8Mb per second * 60 seconds = 480Mb per minute
480Mb per minute * 60 minutes = 28,800Mb per hour
28,800Mb per hour / 1000Mb per Gb = 28.8Gb per hour
28.8Gb per hour * 24 hours = 691.2Gb per day
691.2Gb per day * 30 days = 20,736Gb per month
20,736Gb per month / 1000Gb per Tb = 20.736Tb per month

And finally converting bits (small b, data transmission) to bytes (big B, data storage) thanks to Google's handy converter:
20.736Tb per month * 0.125 TB per Tb = 2.592TB per month
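
The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the steps above: the function name is made up for the example, and it assumes decimal prefixes (1000Mb per Gb) and 8 bits per byte.

```python
# Theoretical monthly capacity of a line running flat out.
# Assumes decimal prefixes (1 Gb = 1000 Mb) and 8 bits per byte.
BITS_PER_BYTE = 8


def monthly_capacity_tb(rate_mbps, days=30):
    """Return the terabytes (TB) transferred in `days` at a constant rate."""
    megabits = rate_mbps * 60 * 60 * 24 * days   # Mb per month
    terabits = megabits / 1000 / 1000            # Mb -> Gb -> Tb
    return terabits / BITS_PER_BYTE              # Tb -> TB


print(round(monthly_capacity_tb(8), 3))  # 2.592
```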

So at 8Mbps the theoretical maximum I could download is roughly 2.6 terabytes of data every month. This is what a truly uncapped service would look like and, let's be honest, that is a massive amount of data. I am not expecting to be able to download this amount of data, and my maximum usage is unlikely to exceed 500GB per month, but I am definitely not going to get that at 10Mbps with MWEB. To be fair, MWEB does not cap the account; they just throttle it to such an extent that it is effectively useless for streaming video content.

Let's see what you are actually getting with an uncapped 10Mbps MWEB account. First a few assumptions:
  1. The MWEB throttle level is 125GB on a 10Mb line (in other words 4GB per day, supported by anecdotal evidence)
  2. The throttle speed will be 1.5Mbps
  3. The throttle will be removed when the rolling 30 day average drops below 100GB (no evidence for this but taking 80% of the throttle level as it makes a nice round number)
  4. Average daily usage of 4 hours (14.4GB at 28.8Gb/h)
  5. Starting at the beginning of the month with a 0GB 30 day average
Using these assumptions you will reach the throttle level after 9 days (8.68 rounded up). From that point onwards you are throttled to 1.5Mbps, resulting in a maximum download rate of 5.4Gb/h and data usage of 2.7GB per day (maintaining the 4 hours usage per day). At this rate the rolling average will reduce at 11.7GB per day (14.4GB - 2.7GB) and take just over 2 days to fall below the assumed throttle removal level. Reverting back to the non-throttled level, it will then take another 3 days before the throttle is reactivated. This cycle will repeat and effectively means that you are paying for a 10Mb line but getting a 1.5Mb line 40% of the time. I don't like those odds.
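
For anyone who wants to play with the assumptions, here is a rough sketch of the figures in that paragraph. The constants are the assumptions listed above (the 100GB release level in particular is a guess), the helper name is invented for the example, and the 2-day and 3-day cycle lengths are the rounded values from the text rather than a full rolling-window simulation.

```python
import math

THROTTLE_LEVEL_GB = 125.0   # assumption 1
RELEASE_LEVEL_GB = 100.0    # assumption 3 (a guess)
HOURS_PER_DAY = 4           # assumption 4


def gb_per_day(rate_mbps, hours):
    """Gigabytes transferred in `hours` at `rate_mbps` (decimal prefixes)."""
    return rate_mbps * 3600 * hours / 1000 / 8


full = gb_per_day(8, HOURS_PER_DAY)            # about 14.4 GB per day
throttled = gb_per_day(1.5, HOURS_PER_DAY)     # about 2.7 GB per day

days_to_throttle = math.ceil(THROTTLE_LEVEL_GB / full)        # 8.68 rounded up
daily_drop = full - throttled                                 # about 11.7 GB
days_throttled = (THROTTLE_LEVEL_GB - RELEASE_LEVEL_GB) / daily_drop

# Rounded cycle from the text: 2 days throttled, then 3 days at full speed.
throttled_share = 2 / (2 + 3)

print(days_to_throttle, round(days_throttled, 2), throttled_share)
```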

If you drop the 4 hour daily usage assumption and try to maximise data usage, the calculations are even bleaker. After two days at 86.4GB per day the account will be throttled; after that the maximum daily data transfer is limited to 16.2GB (again assuming the 1.5Mbps throttle stays constant). This means the total data capacity of the account is 626.4GB, made up of 172.8GB (2 days at 86.4GB per day) + 453.6GB (28 days at 16.2GB per day). Using the line capacity calculation detailed earlier, this means you can only actually get 23.58% (626GB out of 2654GB) of the "uncapped" capacity that you are paying for.
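
A small sketch of this worst-case calculation, under the same assumptions (8Mbps for the first 2 days, 1.5Mbps for the remaining 28). The function names are illustrative only. Note that the 2654GB figure corresponds to converting 2.592TB at 1024GB per TB; with the decimal 2592GB the share comes out slightly higher, so both are printed.

```python
def gb_per_full_day(rate_mbps):
    """Gigabytes transferred in 24 hours at `rate_mbps` (decimal prefixes)."""
    return rate_mbps * 86400 / 1000 / 8


def max_monthly_gb(full_mbps=8, throttled_mbps=1.5, full_days=2, days=30):
    """Total GB per month: `full_days` at full speed, the rest throttled."""
    return (full_days * gb_per_full_day(full_mbps)
            + (days - full_days) * gb_per_full_day(throttled_mbps))


total = max_monthly_gb()                    # about 626.4 GB
share_binary = total / (2.592 * 1024)       # against 2654 GB
share_decimal = total / (2.592 * 1000)      # against 2592 GB

print(round(total, 1))                      # 626.4
print(round(share_binary * 100, 1))         # 23.6
print(round(share_decimal * 100, 1))        # 24.2
```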

To summarise:
  • The MWEB 10Mb uncapped account is actually a throttled account with a maximum daily usage of roughly 4GB (125GB over 30 days).
  • The account will be throttled after 2 days of uninterrupted use at maximum capacity
  • The account only provides 24% of the capacity suggested by the "uncapped" label
  • The account will only provide full speed connectivity 60% of the time under normal usage (after the first month)
  • This account is not suited for consistent high quality video streaming
  • If you want to use your line at full speed for more than 2 hours a day this account is not for you
Needless to say I am in the process of cancelling my "uncapped" MWEB account.

Thursday, 28 November 2013

Java WAR deployment options on AWS Elastic Beanstalk

Introduction

Elastic Beanstalk deployment is pretty cool for interpreted languages like Python, Ruby, or PHP where all you need to do is a "git aws.push" to deploy the latest version of your application. The AWS Toolkit for Eclipse is also great for Java deployments but what if you want to deploy a manually built WAR? This post lists a few of the options available. It assumes that you have already created an Elastic Beanstalk application using these instructions. The example uses an application name of war-test and a WAR file named sample.war (kindly provided by Apache Tomcat here).

Option 1 - AWS Console

Pretty simple: select your Beanstalk application and click the "Upload and Deploy" button. Select the WAR file and give it a version name; a few minutes later your new version is up and running.

In the background the file is uploaded to an S3 bucket, a new application version is created that points to the bucket/object, and a deployment is triggered.

Option 2 - Command line

This is pretty much the same approach as for the AWS Console upload except that you need to manually perform each of the steps using the AWS CLI tools.

Step 1: Find the S3 bucket to upload the file to. This can be done as follows:

aws elasticbeanstalk describe-application-versions --application-name war-test

Which will print something like:
{
    "ApplicationVersions": [
        {
            "ApplicationName": "war-test",
            "VersionLabel": "git-db96ef73b33ba5ae515907c586d133b26b3489b6-1385637920942",
            "Description": "First commit",
            "DateCreated": "2013-11-28T11:25:21.596Z",
            "DateUpdated": "2013-11-28T11:25:21.596Z",
            "SourceBundle": {
                "S3Bucket": "elasticbeanstalk-us-east-1-XXXXXX",
                "S3Key": "git-db96ef73b33ba5ae515907c586d133b26b3489b6-1385637920942.zip"
            }
        },
        {
            "ApplicationName": "war-test",
            "VersionLabel": "Sample Application",
            "SourceBundle": {
                "S3Bucket": "elasticbeanstalk-us-east-1",
                "S3Key": "GenericSampleApplication"
            },
            "DateUpdated": "2013-11-28T11:25:12.781Z",
            "DateCreated": "2013-11-28T11:25:12.781Z"
        }
    ]
}


The bucket you are interested in is "elasticbeanstalk-us-east-1-XXXXXX" (where XXXXXX represents your bucket identifier); use this as the destination for your WAR file. You can probably also use your own (custom) bucket but I have not checked what permissions are needed to allow Elastic Beanstalk to access the files.
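
If you would rather not pick the bucket out by eye, the JSON can be parsed with a few lines of Python. This is only a sketch: the function name is made up, and taking the bucket of the most recently created version is a heuristic that happens to work for the output above, not something Elastic Beanstalk guarantees. The sample is a trimmed copy of that output; in practice you would pass in the real CLI output instead.

```python
import json

# Trimmed copy of the describe-application-versions output shown above.
SAMPLE_OUTPUT = """
{
    "ApplicationVersions": [
        {
            "VersionLabel": "git-db96ef73b33ba5ae515907c586d133b26b3489b6-1385637920942",
            "DateCreated": "2013-11-28T11:25:21.596Z",
            "SourceBundle": {
                "S3Bucket": "elasticbeanstalk-us-east-1-XXXXXX",
                "S3Key": "git-db96ef73b33ba5ae515907c586d133b26b3489b6-1385637920942.zip"
            }
        },
        {
            "VersionLabel": "Sample Application",
            "DateCreated": "2013-11-28T11:25:12.781Z",
            "SourceBundle": {
                "S3Bucket": "elasticbeanstalk-us-east-1",
                "S3Key": "GenericSampleApplication"
            }
        }
    ]
}
"""


def latest_source_bucket(describe_output):
    """Return the S3 bucket of the most recently created application version."""
    versions = json.loads(describe_output)["ApplicationVersions"]
    # Timestamps in this fixed ISO 8601 format sort correctly as plain strings.
    newest = max(versions, key=lambda v: v["DateCreated"])
    return newest["SourceBundle"]["S3Bucket"]


print(latest_source_bucket(SAMPLE_OUTPUT))  # elasticbeanstalk-us-east-1-XXXXXX
```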

Step 2: Copy your WAR file to the S3 bucket identified in Step 1. Using the CLI tools again:

aws s3 cp ./sample.war s3://elasticbeanstalk-us-east-1-XXXXXX/s3-sample.war

Step 3: Create an application version:

aws elasticbeanstalk create-application-version --application-name war-test --version-label s3-upload --source-bundle S3Bucket=elasticbeanstalk-us-east-1-XXXXXX,S3Key=s3-sample.war

Step 4: Identify the environment to update

aws elasticbeanstalk describe-environments --application-name war-test

Which returns something like:

{
    "Environments": [
        {
            "ApplicationName": "war-test",
            "EnvironmentName": "war-test-env",
            "VersionLabel": "git-dd7d815e8251acae3560158b169d652d66479bc1-1385644793798",
            "Status": "Ready",
            "EnvironmentId": "e-bwjydyebw9",
            "EndpointURL": "54.204.44.36",
            "SolutionStackName": "64bit Amazon Linux 2013.09 running Tomcat 7 Java 7",
            "CNAME": "war-test-env-bffkznzafh.elasticbeanstalk.com",
            "Health": "Green",
            "DateUpdated": "2013-11-28T13:20:51.828Z",
            "DateCreated": "2013-11-28T11:25:29.308Z"
        }
    ]
}


Note the value of the EnvironmentName (war-test-env in this example).

Step 5: Update the environment

aws elasticbeanstalk update-environment --environment-name war-test-env --version-label s3-upload

Which will return something like:


{
    "ApplicationName": "war-test",
    "EnvironmentName": "war-test-env",
    "VersionLabel": "s3-upload",
    "Status": "Updating",
    "EnvironmentId": "e-bwjydyebw9",
    "EndpointURL": "54.204.44.36",
    "SolutionStackName": "64bit Amazon Linux 2013.09 running Tomcat 7 Java 7",
    "CNAME": "war-test-env-bffkznzafh.elasticbeanstalk.com",
    "Health": "Grey",
    "DateUpdated": "2013-11-28T14:13:42.836Z",
    "DateCreated": "2013-11-28T11:25:29.308Z",
    "Resources": {}
}


Rerun step 4 until the Health reflects as "Green", which indicates your updated application has been deployed (but does not necessarily mean it is working).
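
The "rerun until Green" check can also be scripted. The sketch below only parses describe-environments output; it does not call AWS itself. The function name is invented, and the sample JSON is a hypothetical, trimmed version of what step 4 might return once the update has finished (same fields as the output above).

```python
import json

# Hypothetical, trimmed describe-environments output after the update completes.
SAMPLE_OUTPUT = """
{
    "Environments": [
        {
            "ApplicationName": "war-test",
            "EnvironmentName": "war-test-env",
            "VersionLabel": "s3-upload",
            "Status": "Ready",
            "Health": "Green"
        }
    ]
}
"""


def is_deployed(describe_output, environment_name, version_label):
    """True if the environment runs `version_label` and reports Ready/Green."""
    for env in json.loads(describe_output)["Environments"]:
        if env["EnvironmentName"] == environment_name:
            return (env.get("VersionLabel") == version_label
                    and env.get("Status") == "Ready"
                    and env.get("Health") == "Green")
    raise LookupError("no environment named %r" % environment_name)


print(is_deployed(SAMPLE_OUTPUT, "war-test-env", "s3-upload"))  # True
```

As with the manual check, a True result only says the version was deployed, not that the application is behaving correctly.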


Option 3 - git aws.push

This is a bit of a hack (storing compiled binary files in git) but is still quite cool:

Step 1: Unzip your WAR file into the directory that you ran "eb init" in

unzip sample.war -d war-test/

Step 2: Add and commit code to git (in the eb init directory again)

cd war-test
git add .
git commit -m "Some comment" .

Step 3: Deploy

git aws.push

Conclusion

There are probably other ways of doing this; feel free to add a comment if you find a more elegant solution. Some of the approaches may also work with .NET applications, although I have not tested them.

Wednesday, 27 November 2013

Self-contained JAR file creation using Maven Shade Plugin

Trick for the day. Adding the Maven Shade Plugin (as described here) to your pom.xml allows you to generate a self-contained executable JAR file including all libraries and dependencies.

"mvn package" to generate the JAR and then "java -jar mypackage.jar" to run. Useful for creating self-contained artifacts for offline distribution or for locking dependency versions.

Wednesday, 20 November 2013

Python cx_Oracle on Ubuntu 12.04

There are some older articles on getting cx_Oracle working using RPMs and alien but it seems Oracle are now providing non-RPM downloads. Below are the (quick and dirty) steps I followed to get it installed on Ubuntu 12.04 x64.

  1. Download both the "Instant Client Package - Basic Lite" and "Instant Client Package - SDK" ZIP files from the Oracle Instant Client download page (taking note of the version of Oracle you are connecting to)
  2. Unzip both files; they will create a directory corresponding to the Oracle version, instantclient_11_2 for example
  3. Change to the instantclient directory created in the previous step and:
    • Create symbolic links to the version specific files: 
      • libclntsh.so -> libclntsh.so.11.1 
      • libocci.so -> libocci.so.11.1
    • Create a lib directory (mkdir lib)
    • Move lib* to the lib directory (mv lib* lib)
    • Move header files from ./sdk/include to . (mv ./sdk/include/*.h .)
  4. Optionally move the instantclient directory to another location
  5. Sudo to root (sudo -s) and build the module:
    • Install the python-dev package (apt-get install python-dev)
    • Export environment variables: 
      • export ORACLE_HOME="path/to/instantclient"
      • export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$ORACLE_HOME/lib"
    • Install the cx_oracle module using pip (pip install cx_oracle)
    • Hopefully get a "Successfully installed cx-oracle" message
  6. Exit the root shell (Ctrl-D)
  7. Add the ORACLE_HOME and LD_LIBRARY_PATH environment variables to your profile (or just manually export them)
  8. Launch python and test:
>>> import cx_Oracle
>>> dsn = cx_Oracle.makedsn('host', port, 'sid')
>>> connection = cx_Oracle.connect('user','password',dsn)
>>> cursor = connection.cursor()
>>> results = cursor.execute("select id from table where ROWNUM <= 10")
>>> cursor.fetchall()
[(705,), (718,), (719,), (721,), (725,), (726,), (727,), (737,), (748,), (769,)]
>>> cursor.close()
>>> connection.close()

Thursday, 14 November 2013

Fixing compile error for AWS Java sample

If you are getting an error like:

~/aws-java-sample/src/main/java/com/amazonaws/samples/S3Sample.java:[94,31] error: for-each loops are not supported in -source 1.3

when trying to build the aws-java-sample from GitHub, you can fix it by setting the Maven compiler source and target (see link). Something like:

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
        </configuration>
      </plugin>

Fix submitted on GitHub by mvrueden.

Friday, 16 August 2013

Ruby DevKit compilation errors on Windows

Lesson for the day: when installing Ruby DevKit on Windows, ensure that the Ruby installation architecture matches the DevKit architecture.

ruby --version
ruby 2.0.0p247 (2013-06-27) [x64-mingw32]

Note the [x64-mingw32]: you will get some strange compilation errors trying to install gems using the 32-bit DevKit with 64-bit Ruby (and vice versa).

Wednesday, 26 June 2013

Yesterday Perl script

A quick script to display yesterday's date in a specified date format:

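#!/bin/perl -w

use strict;
use POSIX qw(strftime);

{
   # Subtract 24 hours (in seconds) from the current time
   my $yesterday = time() - (24 * 60 * 60);
   # Print as DDMMYYYY; change the format string as required
   print strftime("%d%m%Y", localtime($yesterday)), "\n";
}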
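
For comparison, here is a rough Python sketch of the same idea. It uses calendar arithmetic on the date rather than subtracting 86,400 seconds, which avoids landing on the wrong date around a daylight saving change. The function name and the optional `today` parameter are just for the example.

```python
from datetime import date, timedelta


def yesterday(fmt="%d%m%Y", today=None):
    """Return yesterday's date formatted with `fmt` (DDMMYYYY by default)."""
    today = date.today() if today is None else today
    # Calendar subtraction: always exactly one day earlier.
    return (today - timedelta(days=1)).strftime(fmt)


print(yesterday())
print(yesterday(today=date(2013, 6, 26)))  # 25062013
```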

Thursday, 9 May 2013

The dumpStack code monster - alive and scary

Tracing through some critical code in a core application I stumbled across the following:


    public boolean process(PSEvent event) {
        Thread.dumpStack();
        return process(event, true);
    }

    public boolean process(PSEvent event, boolean markAsProcessed) {
        boolean success = false;

        try {
            Thread.dumpStack();
            // Do some stuff...

        } catch (Exception e) {
            // Do some other stuff...
            success = false;
        }

        return success;
    }


The javadoc on Thread.dumpStack() is very helpful:
Prints a stack trace of the current thread to the standard error stream. This method is used only for debugging.

But wait, it gets better. Searching the production logs for just one of the ten or so processes that use this code:


$ grep -c "java.lang.Exception: Stack trace" pickme.log
230722

Yes, that is 230,722 stack traces generated for one process in one day. Is it any surprise that the load averages are 19.33, 16.91, 16.55?



Tuesday, 26 March 2013

Missing the A in Asynchronous - Part 1

Few things are more irritating in software development than when people use a kludge rather than taking the time to implement a properly designed and robust solution. The peeve I am exercising today relates to the code snippet that follows:


    private BOTransfer getTransferInefficiently(Trade trade) throws RemoteException {
        // There is some delay between when a trade and corresponding transfer is created
        BOTransfer transfer = null;
        int cnt = 0;
        do {
            try {
                Thread.sleep(500L);
            } catch (Exception e) {
                return null;
            }
            transfer = getFirstTransfer(trade);
        } while (cnt++ < 10 && transfer == null);
        return transfer;
    }

The code is pretty simple. Wait 500ms and try to load a transfer object from the trade; repeat until the transfer object is loaded or the number of load attempts reaches 10. In a single threaded or single user environment this code might be excusable. In a multi-user production trading system processing hundreds of transactions a second this implementation is totally unacceptable. Some well-meaning soul has added a comment further explaining the method:

    // This implementation is fundamentally broken in many ways. The transfer creation is
    // asynchronous and this logic should be triggered off the transfer creation event (instead of
    // looping and sleeping). This approach prevents this engine thread from processing any other
    // events and will not find the transfer in the case where the TransferEngine is under heavy
    // load. useInefficientWaitForTransfer method added to the interface to allow correct
    // implementation on new classes

This person at least understood that the code was a problem and tried to mitigate the risks by extracting it into a separate method and adding the explanatory comment. Unfortunately, for whatever reason (time constraints, lack of automated regression testing, indifference, ...), they only raised the visibility of the problem rather than fixing it entirely.

Issues

Increased system load - the call to getFirstTransfer actually does a remote call and a database query each time it is executed. This means that in the worst case (when the system is probably already overloaded) 10 database queries are executed at half-second intervals for 5 seconds (per thread), a great way to further degrade performance.

Poor thread pool usage - the getTransferInefficiently method is run from a fixed-size thread pool responsible for handling all requests queued for this process (typically between a few hundred and a few thousand per second). For each thread looping in this method there are numerous other requests that could have been processed instead. In the worst case, if all threads are handling requests requiring this method, then for a few seconds potentially thousands of requests will be queued unnecessarily.

Poor transparency - under heavy load most of the invocations of the method will return null as the transfer object will not have been created and consequently can't be loaded in the timeout (despite adding significant load to the already strained system). There is also no mechanism for measuring the average time taken for the object to be created and no backoff policy to try and reduce server load.

Solution

The correct implementation to achieve the desired behaviour asynchronously is relatively simple. A separate process should register/subscribe to BOTransfer events and, on successful transfer creation, perform the processing currently performed on the object returned by the getTransferInefficiently method.
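
To make the shape of that concrete, here is a minimal, language-neutral sketch in Python of the subscribe-and-react pattern. Everything in it is hypothetical: the EventBus class, the "transfer_created" event name and the handler stand in for whatever event mechanism the real system provides, and a real bus would deliver events asynchronously rather than in the publisher's thread.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process publish/subscribe bus, for illustration only."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # A real system would hand this off asynchronously.
        for handler in self._handlers[event_type]:
            handler(payload)


processed = []


def on_transfer_created(transfer):
    """Do the work that previously followed getTransferInefficiently()."""
    processed.append(transfer["trade_id"])


bus = EventBus()
bus.subscribe("transfer_created", on_transfer_created)

# No sleeping or polling: the handler runs only when a transfer actually exists.
bus.publish("transfer_created", {"trade_id": 42, "transfer_id": 1001})
print(processed)  # [42]
```

The point of the design is that no engine thread is ever parked waiting: work is done only in response to the creation event, so load on the remote service and database does not grow while the system is already struggling.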