I was speaking with a colleague the other day, and he was talking about how he constantly has to check the data feeds he gets from another consultancy working on the same project.
Schema validation and the like run transparently, of course, but what he was talking about was whether the data is actually correct once it has been transformed into the right format for loading into his database.
We started talking about "known and expected values" and being able to write unit tests against them to prove that the upstream transformations are putting the right values into the right fields.
I think there is some mileage in this approach. Obviously it might not suit your type of project or data (too volatile?). However, if you receive regular data updates, it's worth writing some simple sanity-check unit tests to cut off at the legs any time wasted tracking down a bug under the assumption that it can't be the data.
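To make the idea a bit more concrete, here is a rough sketch in Python of what such a sanity check might look like. Everything in it is made up for illustration: the field names (`customer_id`, `country`, `currency`), the sample rows, and the `find_mismatches` helper are assumptions, not anything from my colleague's project. In practice the rows would come from wherever the transformed feed lands before it is loaded.

```python
import unittest

# Hypothetical sample of an already-transformed feed. In a real project these
# rows would be read from the staging table or file the other team delivers.
FEED_ROWS = [
    {"customer_id": "C001", "country": "GB", "currency": "GBP", "balance": 120.50},
    {"customer_id": "C002", "country": "FR", "currency": "EUR", "balance": 0.0},
]

# "Known and expected values": a handful of records whose correct contents
# we are certain of, keyed by their identifier (illustrative data only).
KNOWN_VALUES = {
    "C001": {"country": "GB", "currency": "GBP"},
}


def find_mismatches(rows, known):
    """Return (id, field, expected, actual) tuples for every known value
    that the feed gets wrong, including known records that are missing."""
    by_id = {row["customer_id"]: row for row in rows}
    problems = []
    for customer_id, expected in known.items():
        actual = by_id.get(customer_id)
        if actual is None:
            problems.append((customer_id, "<missing record>", None, None))
            continue
        for field, want in expected.items():
            got = actual.get(field)
            if got != want:
                problems.append((customer_id, field, want, got))
    return problems


class FeedSanityTest(unittest.TestCase):
    def test_known_records_have_expected_values(self):
        # An empty list means every known value landed in the right field.
        self.assertEqual(find_mismatches(FEED_ROWS, KNOWN_VALUES), [])
```

Saved as a test module, this can be run with `python -m unittest` after each delivery. When a test fails, the mismatch tuples show which field the upstream transformation got wrong, which is exactly the question you want answered before you start hunting for a bug in your own code.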