Alibistic Error Messages

As it turns out, either "alibistic" is not really an English word or my spellcheckers and the internet have gone haywire. Either way, let me explain what I mean by alibistic:

PC LOAD LETTER [1]

If you don't know what the mumbo jumbo above means, just follow the footnote, I'll wait right here.

...

Are you still there? Alright, good.
Uhmm, so confusing error messages are just the tip of the iceberg. How about perfectly clear error messages that are so exceptionally unhelpful that you can feel your blood boiling?

Permission denied  

Seems like another perfect case of the relativity of wrong - a program exiting without doing what it was supposed to do is wrong, but a program exiting without doing what it was supposed to do and without telling the user what happened is even more wrong.

Consider this - an error message is whatever the user sees on their screen right before the application abruptly exits with an error code. Simple as that. Let's not delve into semantics and start distinguishing between error codes, exceptions, log messages etc., because all of these can be judged by one simple question - are they actually helpful to the user running, or trying to run, the program? Do they help the user narrow down the problem domain? Or are they simple "in your face" spews telling the user: it broke, deal with it!?

Context is the key

Error messages are presented to the user because the program ended up in a state from which it was unable to recover - most importantly, the user is expected to take action to fix the problem! Let's warp back to the example above and compare:

15:25:01 10/21/2016 Permission denied  
15:25:01 10/21/2016 Exit code 1  

Against:

15:25:01 10/21/2016 Unable to open /var/www/myapp.db: Permission denied  
15:25:01 10/21/2016 Exit code 1  

If we wanted to be super nice we could give even more context:

15:25:01 10/21/2016 Unable to open /var/www/myapp.db as O_RDONLY -> owner: www-data:www-data, access: 0600/-rw------- as mysql:mysql  
15:25:01 10/21/2016 Exit code 1  

I think the difference is clearly visible now. The first example doesn't tell you anything useful: was it a bad command line parameter? Was it a bad chmod on a configuration file? Or a completely different file? Wait, wasn't it a socket? Or ... I could go on, but I digress. The key point is that it forces you to start debugging the application in order to figure out what changes need to be made to the system itself to fix the issue. The second example narrows down the problem domain - the error happened because this exact file couldn't be opened - so now you can go and figure out which UID/GID owns the file, what the UID/GID of the executing process is, and what the access bits of the file are. The third gives you all this data upfront.
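To make the third variant a bit more concrete, here is a minimal sketch of how such a message could be assembled in Python on a POSIX system. The helper names `describe_file` and `open_readonly` are made up for this illustration, and the owner is printed as numeric UID/GID rather than resolved to names like www-data:

```python
import os
import stat


def describe_file(path):
    """Return owner and mode details for path, or a note if it cannot be inspected."""
    try:
        info = os.stat(path)
    except OSError as exc:
        return "stat failed: {}".format(exc.strerror)
    return "owner: {}:{}, access: {:04o}/{}".format(
        info.st_uid,
        info.st_gid,
        stat.S_IMODE(info.st_mode),   # permission bits only, e.g. 0600
        stat.filemode(info.st_mode),  # human-readable form, e.g. -rw-------
    )


def open_readonly(path):
    """Open path read-only; on failure, say what was attempted and by whom."""
    try:
        return os.open(path, os.O_RDONLY)
    except OSError as exc:
        message = "Unable to open {} as O_RDONLY: {} -> {}, running as {}:{}".format(
            path, exc.strerror, describe_file(path), os.geteuid(), os.getegid()
        )
        # Chain the original exception so the low-level details are not lost.
        raise RuntimeError(message) from exc
```

The extra `os.stat` call only happens on the failure path, so the happy path costs nothing more than the plain `os.open`.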

What information should be included?

Anything that is relevant to the current problem domain. Every exception or error is raised based on some condition, think:

if i >= len(array):  
    raise IndexError

IndexError: index was out of range  

What an alibistic error message indeed. Let's make it a little bit better:

if i >= len(array):  
    raise IndexError("Index: {} out of range: 0-{}".format(i, len(array)-1))

IndexError: Index: 9 out of range: 0-8  

Apart from including the reason why the specific exception was raised, another useful datapoint might be the self/this reference, if the error is thrown by an object:

if not self.parent:  
    raise OrphanError

OrphanError: current object has no parent  

That really begs the question: what is the damned current object?

if not self.parent:  
    raise OrphanError("Object with name: {}, id: {} has no parent".format(self.name, self.id))

OrphanError: Object with name: orphan, id: 0 has no parent  
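One way to avoid repeating that formatting at every raise site is to let the exception build its own message from the offending object. This is only a sketch: the `Node` class and its `path` method are hypothetical stand-ins for whatever object raises the error, and this `OrphanError` constructor is my assumption, not something defined above:

```python
class OrphanError(Exception):
    """Hypothetical exception that records which object had no parent."""

    def __init__(self, obj):
        # Keep a reference so handlers can inspect the object, not just the text.
        self.obj = obj
        super().__init__(
            "Object with name: {}, id: {} has no parent".format(obj.name, obj.id)
        )


class Node:
    """Hypothetical object with an optional parent."""

    def __init__(self, name, node_id, parent=None):
        self.name = name
        self.id = node_id
        self.parent = parent

    def path(self):
        if not self.parent:
            raise OrphanError(self)
        return "{}/{}".format(self.parent.name, self.name)
```

Calling `Node("orphan", 0).path()` raises an `OrphanError` whose message names the object, and a handler can still reach the object itself through the `obj` attribute.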

Semantic Pondering

Consider this - why does a piece of code return an error code or raise an exception? Because the current execution environment or context has encountered an invalid state from which the code was unable to recover.
Each context can be generalized into a simple subject and object relation, i.e. what went wrong while doing what. Correctly distinguishing the various subjects and objects also requires correctly specifying the problem domain, which means breaking up a single check into multiple ad-hoc checks:

if self.name is None or self.email is None or self.context is None:  
    raise UserError

UserError: Invalid user  

The check is too broad - we can't figure out anything useful from it without further debugging. Consider:

invalid_fields = [self.name is None, self.email is None, self.context is None]  
if any(invalid_fields):  
    raise UserError("User with id: {} has empty fields: name={} email={} context={}".format(self.id, *invalid_fields))

UserError: User with id: 1 has empty fields: name=False email=False context=True  

Which looks a bit ugly, but we can fix that with a convenience function:

def invalid_fields(obj, fields, check_func=lambda x: x is None):
    # Return the names of the fields whose values fail the check.
    # A list (not a lazy map/filter) so that an empty result is falsy.
    return [field for field in fields if check_func(getattr(obj, field))]

invalid = invalid_fields(self, ["name", "email", "context"])  
if invalid:  
    raise UserError("User with id: {} has empty fields: {}".format(self.id, ", ".join(invalid)))

UserError: User with id: 1 has empty fields: context  

So, uhm, yeah, being non-alibistic may entail writing some more code, but since it's essentially write-once code that can be shared via a common helper library, it's not really a problem.

Next up

In the next installment of the Error Messages series I'm going to take a look at output format consistency, expectations, and the subtleties of working with error messages in an automated fashion.