Integrity Best Practice
Best Practices For Expression Validation
There are a number of practices that will help you build expression validations; each is described below.
Practice good Ruby style
As with any programming endeavour, the most important part of writing an expression validation is keeping the code maintainable. The best expression validations therefore use idiomatic Ruby style. There are a number of style guides in existence, but my favourite is [Noel Rappin's ruby style guide]. The time taken to write good Ruby comes back in spades because, six or more months after you write a validation, you are likely to need to cover changing requirements or difficult corner cases. At that point you will be re-reading the validation, and working out both what you did and why you did it is hard unless the code is easy to read.
Order multi-part validations by complexity
When you're writing expression validations, the best way to think of a validation is as a filter. The validation is evaluated from top to bottom, so aim to fail as quickly as possible by putting the simplest elements at the top. For instance, the following example shows a two-part validation: the first part is a presence check and the second part is a comparator. Presence is easy and cheap to detect, so incomplete records fail early, before the comparison is ever attempted.
code sample here
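As a minimal sketch of the ordering described above, written in plain Ruby rather than against Integrity's own API: the method name valid_order_quantity?, the "quantity" column and the Hash-of-strings row shape are all assumptions made for illustration. The cheap presence check comes first, and the comparator only runs when there is a value to compare.

```ruby
# Hypothetical row shape: a Hash of column name => String.
# The method and column names are illustrative, not part of Integrity.
def valid_order_quantity?(row)
  quantity = row["quantity"]

  # Part 1: presence check. Cheap, and it rejects incomplete records early.
  return false if quantity.nil? || quantity.strip.empty?

  # Part 2: comparator. Only reached when a value is present.
  # Integer(..., exception: false) returns nil for non-numeric strings.
  number = Integer(quantity, 10, exception: false)
  !number.nil? && number > 0
end
```

A call such as valid_order_quantity?({ "quantity" => "" }) returns false at the first step without ever attempting the comparison.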
Caching is not supported
Do not attempt to carry accumulators along with an expression validation. Each validation runs in its own closure, and the VM garbage collector can step in at any point. If you need to store state, use an aggregate validation instead.
Be careful what you lookup
Expression validations perform more poorly than other parts of Integrity because of their innate complexity. One thing that drags their performance down further is database lookups. If you need to select staged data from Integrity, do not perform overly broad searches. In particular, the lookup hash and lookup set methods are costly when the dataset being examined contains a large number of records.
Do you always need to coerce that type?
Integrity brings back row data as strings. This is an extremely flexible format; for instance, it is easy to detect patterns in strings. This has consequences for other matters, such as parsing. It is perfectly acceptable, for example, to use regular expressions to test the validity of dates. This gives important and quick feedback without converting the value to the type Integrity believes it to be by use of the