In case you were wondering


Yes, I am aware that I have a tendency to raise new points in the summary section of my posts. This works for me.

author: Colin David Scott | posted @ Sunday, November 30, 2008 6:47 PM | Feedback (0)

Check Your Sanity


For any particular development task it is possible to classify it into one of two categories: tasks that you know how to do and just have to actually be done and tasks for which the solution is unknown. The later category is decidedly more prevalent although ideally all tasks eventually decompose to the former.

The implication of this is that you will be developing things that you do not a priori know how to build. This involves design and particularly making design decisions. These decisions will necessarily be made with incomplete knowledge. Unless you are particularly lucky (and lucky consistently) some of these decisions will be wrong. Even if the decisions are not explicitly wrong they may turn out to be sub-optimal and impose unnecessary costs upon development and maintenance.

There are a number of complementary methods for dealing with this including iterative development and refactoring. These are highly valuable techniques I strongly recommend. Additionally, I believe that periodically you need to step back and perform a sanity check upon your system. This is a process where you take a look at your approach and ensure that what you're doing can be justified. I like to look at major design decisions and in essence ask "Why the hell is it doing that?" and evaluating the answer relative to the consequence of the design decision.

As an example I've been experimenting with ASP.NET MVC recently. I've taken a design decision that except for specific excluded content items all string values will be stored HTML encoded in the database. This reduces the scope to forget proper encoding of user data. For this application I'm satisfied that this decision is suitable.

The next design decision relates to how the data will be encoded. Ideally I'd like to automate as much of it as possible. I'm using the UpdateModel infrastructure that ASP.NET provides to bind incoming data so plugging into this seems reasonable. My requirements are to bind some relatively simple classes composed primarily of inbuilt types (String, Int32 etc). As such I started writing a HtmlEncodingModelBinder that implements IModelBinder. This will by default HTML encode string values unless they're marked that they do not require encoding.

After some time building this class I was seeing the required complexity increase rapidly. I had already defined my own IModelUpdater interface plus a related IModelUpdaterFactory interface and was looking at using reflection to examine classes and generate IModelUpdater instances that used collections of delegates to perform the update. It was at about this point that I decided the complexity was getting out of control.

What I was attempting was to replace (in a less functional manner) a significant proportion of the ASP.NET MVC UpdateModel infrastructure. When I asked myself "Why the hell is it doing that?" the answer "so some strings can be automatically HTML encoded" didn't really seem to come close to justifying the complexity involved.

Having decided my approach was essentially untenable I could then use my new knowledge of the framework to come up with an alternate solution. What I've now got is a HtmlEncodedString class that wraps an encoded string. This is coupled with an (enormously simpler) IModelBinder implementation specific to this type, as well as an NHibernate IUserType implementation that allows it to be properly mapped to the database and back. This does have the notable design implication that the domain model now uses HtmlEncodedString which in some applications may be a concern that is not appropriate in this layer. In my application it's a tradeoff that I'm prepared to live with and is significantly better than the previous approach.

In summary:

  • Limited information requires you to make design decisions that may be sub-optimal.
  • You won't find out if your design decisions are sub-optimal until you try them (assuming no other potential information sources)
  • Design decisions need to be reviewed as new information becomes available.
  • When revisiting a design decision the costs of modification must be considered. Not every sub-optimal decision is worth changing.
  • Every decision has tradeoffs. Your job in making design decisions is to make the right tradeoffs not attempting to eliminate them.

author: Colin David Scott | posted @ Sunday, November 30, 2008 6:43 PM | Feedback (0)

Google Maps: Overly optimistic...


I'm heading off to a movie later and decided to use Google Maps to find out how to get there. The movie is at Hoyts Garden City, so I chose "garden city" as the destination. This is the result*:

image

I particular love that it wants me to paddle a kayak across the Pacific. Although it helpfully arranges stops in Japan and Hawaii. The total journey time is estimated at over 26000 km and 56 days which I'm reasonably sure means that I'd miss my movie. At least kayaking is a step up from previous recommendations to swim.

* The start address here is that for my employer, for whom I do not speak.

author: Colin David Scott | posted @ Saturday, November 22, 2008 10:21 AM | Feedback (0)

Why xkcd is the world's greatest webcomic


Because it produces things like this. With stick figures. It's genius in (mostly) monochrome.

author: Colin David Scott | posted @ Monday, November 17, 2008 8:03 PM | Feedback (0)

I have an exclusive new card to demonstrate my superiority


OK, all it does is demonstrate my ability to pay for a MSDN subscription. I'm trying to work out why they're giving them out. It has my subscriber ID and subscription expiry date, but I have them elsewhere. It also has my name and email address but I'm not immediately in danger of forgetting those. So my question is: what does this piece of plastic get me? And how can I use it to lord over those I consider beneath me?

If you have any suggestions post them as a comment. Comments on this blog are currently broken (one day I'll find a blog engine that works for me) but I do get them via email. I may post the best answer at a later date.

author: Colin David Scott | posted @ Saturday, November 08, 2008 4:00 PM | Feedback (0)

Stop looking for something once you've found it


The example I used in my previous post is a highly simplified case I found in actual code. If differs from the code I encountered in a couple of major ways, it's simplified, it deals in test classes not the domain classes from the example (which are not relevant here nor which I am at liberty to post) and that it doesn't contain a bug I find mildly amusing and somewhat informative. The bug is illustrated here:

public static BusinessObject FindByCriteria(IEnumerable<BusinessObject> businessObjects, DecisionCriteria decisionCriteria)
{
    if (businessObjects == null)
    {
        throw new ArgumentNullException("businessObjects");
    }
    if (decisionCriteria == null)
    {
        throw new ArgumentNullException("decisionCriteria");
    }

    BusinessObject result = null;

    foreach (BusinessObject businessObject in businessObjects)
    {
        if (result == null && businessObject.MatchesCriteria(decisionCriteria))
        {
            result = businessObject;
        }
    }

    return result;
}

Keen observers of this example and my previous post (all none of you) will see that the if clause within the foreach has been modified slightly and the break is gone. The method still returns the result as expected. But it's not quite right. Once the expected instance has been found there's no need to continue. We have our result and it's not going to change. The method should end at this point. This is not the behaviour of the above. Instead we continue to loop through every remaining item in the collection.

This is a relatively trivial error but I'm firmly of the opinion that letting the little things slide leads to the big things getting out of hand. If the collection contains a lot of items we will burn a lot of cycles uselessly looping through it. If we express the logic differently then the behaviour gets worse:

public static BusinessObject FindByCriteria(IList<BusinessObject> businessObjects, DecisionCriteria decisionCriteria)
{
    if (businessObjects == null)
    {
        throw new ArgumentNullException("businessObjects");
    }
    if (decisionCriteria == null)
    {
        throw new ArgumentNullException("decisionCriteria");
    }

    BusinessObject result = null;

    foreach (BusinessObject businessObject in businessObjects)
    {
        if (businessObject.MatchesCriteria(decisionCriteria) && result == null)
        {
            result = businessObject;
        }
    }

    return result;
}

This small change to the if statement ordering seems logically insignificant. However this now ensures that the MatchesCriteria method will be invoked on every item in the collection. In this example this may be trivial but this is not always the case. The code is potentially doing significantly more work that is justified.

It is possible that some of the checks have side effects and therefore need to be executed on every object in the collection. If this is the case then an implementation as above would be sub-optimal on a number of levels. This would be coupling two separate operations (apply logic, find instance) which is poor practice in itself. Additionally if the side effect having operation is invoked as part of an if clause changes to the clause may affect whether the logic is applied. This could manifest as a particularly nasty bug that is very difficult to track down. The multiple operations should be implemented separately.

In summary the problem with the method above is that it doesn't stop when it's finished. The implementor likely just forgot the break statement in this case but the use of a single exit point has led the possibility of this defect. Methods should be more efficient. Make sure yours admit when they're done.

author: Colin David Scott | posted @ Saturday, November 08, 2008 3:49 PM | Feedback (0)

The Myth of the Single Exit Point


There's a school of thought that methods should have only a single exit point. This school teaches that all the logic paths through the method should end in a single place from which there is a return of control and the method result (if any) to the caller. The arguments I have seen for this include alleged benefits in readability and correctness. I find these claims to be spurious.

If is my hypothesis (for which I've done no research whatsoever) that single exit point comes from unmanaged environments and this is where I first encountered it. Unmanaged environments require that resources be explicitly cleaned up and a single exit point in this situation means that the cleanup logic need not be duplicated. In .NET resources either do not need cleanup or better constructs (such as the using statement and Standard Dispose Pattern) exist that deal with much of the cleanup requirement in a fashion that does not require code duplication. Hence, much like Hungarian notation I feel single exit point to be a holdover from a previous environment who's (questionable) utility is long past.

In considering the disadvantages of the single exit point if must first be understood that a .NET method will have multiple exit points implicitly. Take for instance this rather simple example:

public static BusinessObject FindByCriteria(IEnumerable<BusinessObject> businessObjects, DecisionCriteria decisionCriteria)
{
    if (businessObjects == null)
    {
        throw new ArgumentNullException("businessObjects");
    }
    if (decisionCriteria == null)
    {
        throw new ArgumentNullException("decisionCriteria");
    }

    BusinessObject result = null;

    foreach (BusinessObject businessObject in businessObjects)
    {
        if (businessObject.MatchesCriteria(decisionCriteria))
        {
            result = businessObject;
            break;
        }
    }

    return result;
}
 

This example uses variable result to store the instance of BusinessObject that it finds. If no object matches the original value that the variable is set to (null) is used. In both cases the method returns from the last line. At least it does if we assume everything is OK.

If either of the parameters is null this method (which validates its arguments) with throw an ArgumentNullException. In these cases the method does not return from the exit point at the end. This is explicit and you could argue (not particularly well in my opinion) that this is an obvious special case. However there are still failure points in the method that are not covered. For instance if businessObjects contains a null value the method will fail with a NullReferenceException. We could guard against that by doing a null check in the if clause, but it may be that the list containing null is invalid, in which case we'd probably want to throw an exception explicitly, meaning another exit point. It is also possible that MatchesCriteria will throw an exception.

You could wrap the entire thing in a try-catch block and swallow any exceptions, but this is hardly the way to make robust software. Unless you want to return to the days of having result codes and checking them on every method call (some people do, please take their compilers away) exceptions are a reality of current development environments.

Even if we allow exceptions as a special case single exit point is still sub-optimal. I can tweak the method above to be:

public static BusinessObject FindByCriteria(IEnumerable<BusinessObject> businessObjects, DecisionCriteria decisionCriteria)
{
    if (businessObjects == null)
    {
        throw new ArgumentNullException("businessObjects");
    }
    if (decisionCriteria == null)
    {
        throw new ArgumentNullException("decisionCriteria");
    }

    foreach (BusinessObject businessObject in businessObjects)
    {
        if (businessObject.MatchesCriteria(decisionCriteria))
        {
            return businessObject;
        }
    }

    return null;
}
 

In this modified example we've made the method simpler. The variable is gone and in my opinion the behaviour is more explicit. This code clearly shows that as soon as we find something we return it to the caller and that null is the default return value (previously this was inferred from the setting of the variable value at declaration). It's not a huge change but not every improvement has to be, the effect of multiple changes is cumulative.

At this point someone will argue that if the method has additional logic then the variable will be required and I haven't improved anything. These people are wrong. This method is reasonably cohesive. It may be reused in any case where a BusinessObject matching a DecisionCriteria needs to be found. I can use the method in other logic and its name will make it apparent what is happening without me having to understand the logic of how the BusinessObject is found. Therefore arguing that the method may have additional logic in this example is wrong because adding additional logic to the method is itself wrong.

If you have multiple distinct paths through a method which you choose the correct option is not to have a nested method that gathers a result and returns it at the end of whichever path you choose. The correct option is to extract each path into a method then select a method to execute and return its result. There are multiple ways to do this, such as this.

Where methods have an execution path that diverges multiple times and includes common sections I'd argue even more strongly that logic needs to be broken into multiple methods (or otherwise refactored, for instance using strategies). The more paths there are through a method the more difficult it is to validate and the more likely you will experience bugs. Real world conditions are an excellent way to produce circumstances your consider impossible and a really terrible place to encounter them. It is by simplifying your logic by breaking it up into smaller more manageable pieces that we can best address this.

In summary, single exit point fails because:

  • It's not necessary anymore due to improvements in how we can clean up resources.
  • It doesn't match the reality of current managed environments.
  • It promotes more complex and involved code that is harder to understand and test and more likely to fail.

As a last note, I am aware that as of C# 3.0 the above method is obsolete. The following is how I would write this code today:

var result = businessObjects.FirstOrDefault(businessObject => businessObject.MatchesCriteria(decisionCriteria));

 

This uses FirstOrDefault from System.Linq.Enumerable to return a value or a default (null) if no matching value is found. The First method is useful if you wish to fail if the collection does not contain a matching value. This code is sufficiently simple that I'd likely use it inline but for more complicated predicates extracting a method to encapsulate it may be appropriate.

author: Colin David Scott | posted @ Saturday, November 08, 2008 3:20 PM | Feedback (0)

ASP.NET MVC doesn't like you reusing controller instances


I've learnt tonight that ASP.NET MVC tends to get annoyed when you reuse a controller instance. I have a (rather primitive) factory creating controllers and doing some simple dependency injection. This is a result of a medium trust target environment.

My first pass at a controller factory reused controller instances. This resulted in InvalidOperationExceptions with the message:

The parameters dictionary does not contain a valid value of type 'System.Int32' for parameter 'id' which is required for method 'System.Web.Mvc.ActionResult EditPosting(Int32)' in '<MyController>'. To make a parameter optional its type should either be a reference type or a Nullable type.

Altering the route so that the id property default was an int32 fixed the issue, but this was an ugly hack that fixed the specific manifestation without resolving the issue. The correct fix was to alter the factory to return a new instance every time. Reusing controller instances was a (slightly) simpler implementation so it's technically not premature optimisation at fault here. I shall remember that base types may have state in future.

author: Colin David Scott | posted @ Monday, November 03, 2008 11:30 PM | Feedback (0)

Small Code Improvements Part 2: Making methods fail fast


My discussion of failing fast as a development philosophy was relatively general in terms of the level to which it applied. It could be read as a suggestion that it is only inputs to a system that should be validated, but this would be inaccurate. It is my opinion that failing fast is a concept that should be applied at all levels of a system. One key area is in validation in methods.

There are a number of advantages to methods performing validation before execution. The reasons for failing fast I discussed in my previous post. Validation also has advantages to a method in terms of responsibility. By performing validation a method can make it clear that the responsibility for a failure lies with its caller. This seems like buck passing but it serves a valid purpose.

If a method is called by something that violates its preconditions then the fault lies with the caller. Having the method fail fast and state this explicitly makes the error easier to identify. In defect resolution the source of an error needs to be identified so that it may be fixed, not so that blame can be attributed.

There is an attitude that validation is generally not required for methods called by your own code. After all, you wrote both, so you know what the method needs. Validating it is useless overhead. This fails for a number of reasons:

  • In team environments others may invoke your code and may not understand all your preconditions, especially if you do not make them explicit.
  • Code written by you six months ago may as well have been written by someone else. Good luck remembering all the preconditions. This period may drop to six minutes if you're distracted.
  • Your code has bugs. Really, it does. Do what you can to protect against them.

Now that I've convinced you to apply validation in your methods (I'm going to assume that I have), there are two primary types of validation your method should perform.

Firstly a method should do is validate its arguments. Typical checks include ensuring that required arguments are not null and that values are within legal ranges. These validations are to check that the arguments are generally legal, even if they may not be legal given the current system state. They generally result in an exception because the caller is not meeting the requirements of the method. You can skip this step for methods without arguments.

Secondly a method should then validate that it is not being called inappropriately. Examples of checks at this point include method calls on disposed objects and attempting to remove an item from a collection when it is not in that collection. These checks go beyond validating that the argument are in legal ranges and check that the method invocation and arguments passed are valid in context. Where there is no state these checks are generally not necessary.

Combined these validations go a long way to protecting methods from their callers. This is not a cure-all, it is still possible for callers to use the methods incorrectly in ways that cannot be validated against. The validations should therefore form part of an overall strategy of protection against defects.

author: Colin David Scott | posted @ Sunday, November 02, 2008 6:18 PM | Feedback (0)

Development Philosophy: Fail Fast


Of the opinions I hold about software development probably the most counter-intuitive is that software that fails fast is more robust. After all, if the software breaks all the time it can hardly be considered robust. However there are a number of reasons I hold to the practice of failing fast and consider it to be a vital practice in building quality software.

The first thing to note is that there is a distinct difference between failing and breaking. Failing is being unable to complete an action because required conditions are not met. Failing is normal and expected. Your application will have to deal with failures cause by users, other systems and even itself. Breaking is different. An application that is broken may not appear incorrect but will produce incorrect behaviour. This incorrect behaviour may not be noticed for some time, at which point it may be extremely expensive or impossible to fix.

Failing fast is therefore about detecting errors as early as possible and dealing with them. Generally this is by preventing actions from being performed if pre-conditions are not met. Validation on forms is an excellent example of this. Attempts to submit the form when it is in an invalid state fail immediately and the user receives a notification. Also important in this example is that the software itself does not crash and the user is able to rectify issues and continue.

In most cases software will be able to deal with a failure case without terminating. In an interactive environment this can be by providing the user with notification and the ability to correct as in the example above. In service applications this can be by cleanly refusing to process an invalid request without affecting the ability of the service to continue processing valid requests. The invalid request may then be retried (to potentially rectify temporal related failures) or logged or otherwise captured for later rectification. Allowing an invalid request to proceed could corrupt the service state in ways that are not immediately apparent. For instance a request that incorrectly sets a negative tax rate could, if not detected and caused to fail, result in significant financial implications up to and including jeopardising the viability of the business.

Although the ideal case is that software recovers from failures and continues this is not always possible. There are times when the software gets to a state from which there is no legitimate recovery path. At this point clean termination of the software is the appropriate path. Once the software cannot be recovered further action will only result in incorrect behaviour. Users hate it when their software terminated. But they hate it far more when it lies to them about how it is behaving.

In my experience there are generally two sub-optimal practices used for dealing with failure conditions. The first is to just ignore the possibility of failure and proceed along the so-called "happy case". This is generally the most popular because it's easy, right up until the point where your assumptions fail and everything falls in a flaming heap. The second is to make your software far too desperate to stay alive. I've seen cases where there are multiple points build into the software where all exceptions are caught and ignored. Everything may be broken, but the users won't be told. They'll just have to find out when they find that the software has misbehaved and all their data is lost or corrupt. These practices are generally combined to produce extremely fragile software that looks solid if you never examine its results.

In my opinion there are generally three types of appropriate response to a failure:

  • If the error is in user input and is correctable then notify them and seek correct input. This may include cases where the input is invalid due to the current state of the system or where the input is invalid in all contexts.
  • Where a request to a service is invalid reject the request before processing it. This ensures the service can continue to process valid requests. Appropriate logging of the failure should be performed. This logging may be nothing more than an error result (for instance from a SQL Server when invalid SQL is provided) or might involve explicit notification and logging of the request (such as in a custom business service). The nature of the response is context dependent but the error should be made known in an explicit fashion.
  • Where software determines that it cannot continue it should terminate gracefully and provide notification. This is not the preferred option but it is important that this decision be made when required. Explicit failure at this point prevents incorrect behaviour. It also allows the error notification to be more explicit as to the cause of the termination which aids in defect resolution.

I am generally opposed to automatic resolution of errors. There are a number of reasons for this:

  • It disguises the fact that the defect exists, making detection and resolution more difficult.
  • In many cases automatic resolution will be a guess as to the correct result and this guess may be wrong. This could result in behaviour worse than the original error.
  • Errors often occur due to edge cases not anticipated in the main code. There is no reason to assume that they will be anticipated and resolvable in the resolution code.

In summary:

  • Errors detected earlier are easier to fix.
  • Detecting errors early prevents incorrect behaviour and data corruption.
  • Assuming everything is OK or ignoring errors produces bad outcomes.
  • Consider the possible failure cases and either prompt for correction, reject the request or terminate the software.

author: Colin David Scott | posted @ Sunday, November 02, 2008 8:08 AM | Feedback (0)