Some applications appear to have trouble accepting the inevitable. They will do almost anything to avoid crashing, including ignoring any and all errors that attempt to tell them that maybe it's time to let go. This seems to come from a developer mentality that assumes the end user would rather put up with an application that is wrong, consumes excessive system resources or breaks the system on which it runs than risk being separated from it. These developers really need to get over themselves; their application really isn't that important.
Sometimes when you're caught up in the minutiae of developing an application it can be difficult to accept that others, and especially your userbase, don't really care about it as much as you do. The application is not the end, it's a means to an end. If your application is failing to perform the task for which it is intended, your users are highly unlikely to care that it's pretending it did. Indeed, most users would rather know immediately that the application is incapable of performing the task than discover this much later, when the error has had a chance to be magnified.
I've seen too many systems, many of them business critical and handling large amounts of money, that catch all exceptions in critical areas of the code only to ignore them and continue working. As the end user is not aware of any issues, they will continue working as though everything succeeded. Depending on the nature of the system, they may not even have a mechanism to find out that the operation has failed.
Take for instance a hypothetical online store (this example is fictional and not based on any specific client or incident). The store takes orders, interacts with a payment gateway to process the customer credit card and writes the order to the database, where a fulfillment system will take care of it (this is a sub-standard architecture but not uncommon and suitable for this example).
A new version of the application is deployed to the server but the deployment has a flaw. There is a new column in the order table which the code expects but which was never created on the production server. This means no orders can be written to the database. However, the application's data access layer (DAL) implements a number of try-catch blocks that cause any errors in this module to be ignored. As a consequence the system will bill customers via the payment gateway but will not capture any of the order details. Even worse, in this hypothetical example the system captures user information against each order, so with no orders recorded we have only very limited information on who has been billed. The system has likely still sent confirmation emails to customers, who didn't see any errors because the DAL swallowed them. As there are no obvious signs of error, this may not be noticed until customer complaints start or someone notices the fulfillment system has nothing to do.
A failure of this type could seriously damage the reputation of a business and be extremely difficult and expensive to recover from. These costs will vastly outweigh the costs of informing the user of an error and establishing an error reporting system to detect issues.
These kinds of issues apply to GUI applications as well. An application that, for example, is unable to save the file being edited to disk needs to inform the end user of this. Such an error is aggravating to end users, but failure to report it can be significantly more problematic. A user who's lost hours or days of work because the application lied when it claimed it was saving it is going to be seriously unhappy (and possibly litigious).
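To make the failure concrete, here is a minimal sketch of the store in Python. Everything in it is invented for illustration: the `FakeGateway` class, the `DatabaseError` type, the `gift_note` column and the function names are assumptions, not taken from any real system. It contrasts a DAL that swallows the error with one that lets it reach the caller.

```python
class DatabaseError(Exception):
    """Stands in for whatever error a real database driver would raise."""


class FakeGateway:
    """Hypothetical payment gateway that records every charge it makes."""

    def __init__(self):
        self.charges = []

    def charge(self, customer, amount):
        self.charges.append((customer, amount))

    def refund(self, customer, amount):
        self.charges.remove((customer, amount))


def insert_order(order):
    # Simulates the deployment flaw: the code expects a column
    # (hypothetical name) that was never created in production.
    raise DatabaseError("no such column: orders.gift_note")


def save_order_swallowing(order):
    """The anti-pattern: the DAL catches everything and carries on."""
    try:
        insert_order(order)
    except Exception:
        pass  # the caller never learns that the write failed


def save_order_strict(order):
    """The alternative: let the failure propagate to the caller."""
    insert_order(order)


def checkout(gateway, save_order, customer, amount):
    gateway.charge(customer, amount)
    try:
        save_order({"customer": customer, "amount": amount})
    except DatabaseError:
        # Undo the charge so nobody is billed for a lost order,
        # then re-raise so the failure stays visible.
        gateway.refund(customer, amount)
        raise
    return "confirmation sent"
```

With `save_order_swallowing`, `checkout` returns its confirmation and the gateway still holds the charge even though no order exists. With `save_order_strict`, the same call raises `DatabaseError` and the charge is reversed, which is unpleasant for the customer but at least honest.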
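As a rough sketch of the save case in Python, assume a hypothetical `notify` callback that stands in for whatever dialog or status bar the GUI toolkit provides. The function names are made up for the example. What matters is that a failed write produces a visible message and a `False` result rather than a cheerful "Saved".

```python
def save_document(path, text):
    """Write text to path; any OSError is allowed to propagate."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def save_and_report(path, text, notify):
    """Try to save and tell the user what actually happened.

    `notify` is a hypothetical callback taking a message string.
    Returns True only if the file was written.
    """
    try:
        save_document(path, text)
    except OSError as exc:
        notify(f"Could not save {path}: {exc}")
        return False
    notify(f"Saved {path}")
    return True
```

A caller can use the return value to keep the document marked as unsaved, so the application does not later let the user close it without warning.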
Suppressing errors in an application can also lead to failure in other parts of the application. If the internal state of the application is corrupted by a failure this corruption may not be immediately apparent. It may manifest in other parts of the application resulting in errors that seem impossible in the code where the corruption becomes manifest.
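A small, contrived Python illustration of this: the `Inventory` class below is hypothetical, and its intended invariant is that stock plus reserved units always equals the five units it starts with. The swallowed `KeyError` leaves the update half done, and nothing complains at the point where the damage happens.

```python
class Inventory:
    """Hypothetical stock tracker; stock + reserved should always be 5."""

    def __init__(self):
        self.stock = {"widget": 5}
        self.reserved = 0

    def reserve(self, item, quantity):
        self.reserved += quantity  # first half of the update succeeds
        try:
            self.stock[item] -= quantity  # KeyError for an unknown item
        except KeyError:
            pass  # swallowed: reserved now counts stock that never existed

    def units_accounted_for(self):
        # Called from some distant report or audit, long after reserve().
        return sum(self.stock.values()) + self.reserved
```

After reserving an item that does not exist, `units_accounted_for()` no longer returns 5, and whoever investigates the report has no trail leading back to `reserve`.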
The lesson from this is that errors cannot be simply ignored. In the event of an error, you must do one (or more) of the following:
- Alert the user. This gives them an immediate opportunity to resolve the issue (if possible) or work around the failure. At a minimum it prevents them from losing additional work due to the error.
- Log the error. Logging the error doesn't help resolve the current instance but provides information that can be used to resolve the underlying problem that caused it.
- Repair the error. This may be possible by using an alternate mechanism if available, but more likely involves returning the system to a consistent state such that the error will not have other consequences.
- Terminate. Sometimes an application gets into a state from which there is no recovery. If the internal state is corrupted such that it cannot be repaired, or essential resources are not available, then continuing may not be viable. In this case the only responsible course is for the application to exit (preferably with logging).
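The four responses above can be combined in a single handler. The following Python sketch is one possible arrangement, not a prescription: `RecoverableError`, `FatalError`, `run_operation` and the `rollback` and `alert_user` callbacks are all hypothetical names chosen for the example, and only the standard `logging` and `sys` modules are used.

```python
import logging
import sys

logger = logging.getLogger("app")


class RecoverableError(Exception):
    """Hypothetical: the operation failed but state can be restored."""


class FatalError(Exception):
    """Hypothetical: internal state can no longer be trusted."""


def run_operation(operation, rollback, alert_user):
    """Run operation(); on failure log, repair, alert, or terminate."""
    try:
        return operation()
    except RecoverableError as exc:
        logger.exception("operation failed; rolling back")  # log
        rollback()  # repair: return to a consistent state
        alert_user(f"The operation failed and was undone: {exc}")  # alert
        return None
    except FatalError as exc:
        logger.critical("unrecoverable state; exiting", exc_info=True)  # log
        alert_user(f"The application must close: {exc}")  # alert
        sys.exit(1)  # terminate with a non-zero exit status
```

Note that there is no bare `except` here. Anything that is neither of the two known error types propagates unchanged, which is deliberate: an error nobody anticipated is exactly the kind that should not be quietly absorbed.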