User Data Validation

I’m surprised at the number of coders who simply trust the data they are given as “VALID”, without running any tests on it at all.

Or, more to the point, those who simply don’t check data completely.

I was having a chat with someone today about our trivia bot, which we are converting to Perl and enhancing drastically.
One of the topics that came up was that the answers needed to be checked.

But on that point alone, I got a better idea. Searching for the answer inside the guess, rather than requiring an exact match, means that any message containing the answer counts as a guess. Users can’t hand the answer to other users, and we stop that form of cheating completely. They can’t type the answer without guessing it, no matter how much they pad it.
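Here is a rough sketch of that check. It is in Python purely for illustration (the bot itself is going to Perl), and the names `normalize` and `guess_contains_answer` are made up for this example, not taken from the bot. I am also assuming we want whole-word matches, so that "comparison" doesn’t count as a guess of "paris".

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and turn every run of non-alphanumerics into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def guess_contains_answer(guess: str, answer: str) -> bool:
    """Return True if the answer appears, as whole words, anywhere in the guess."""
    wanted = normalize(answer)
    if not wanted:
        # An empty answer would match everything, so refuse it outright.
        return False
    # Padding both sides with spaces forces the match onto word boundaries.
    return f" {wanted} " in f" {normalize(guess)} "
```

With this, "psst, the answer is Paris!!" counts as a guess of "paris", so whoever tries to leak the answer has simply answered the question themselves.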

We also took it further, and looked at just how much data validation the big players do.

AOL was brought up: it recently said passwords could be 16 characters long. Unfortunately, it chopped them to 8 characters, no doubt causing users a lot of hassle.
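The fix for that kind of thing is to reject input that doesn’t fit rather than quietly cutting it down. A minimal Python sketch, assuming a 16 character limit (borrowed from the figure above) and a hypothetical `check_password` function; this is not how AOL or anyone else actually does it:

```python
MAX_PASSWORD_LENGTH = 16  # assumed limit, borrowed from the figure quoted above


def check_password(password: str) -> str:
    """Return the password unchanged, or raise if it is empty or too long."""
    if not password:
        raise ValueError("password must not be empty")
    if len(password) > MAX_PASSWORD_LENGTH:
        # Reject loudly instead of silently keeping a shortened version.
        raise ValueError(
            f"password must be at most {MAX_PASSWORD_LENGTH} characters"
        )
    return password
```

The point is that the user finds out at signup that their password is too long, not weeks later when half of it turns out to have been thrown away.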

And this favourite came to mind: Big Brother’s GET issue. User IDs were attached to the URL and never validated to check that they belonged to the right user, which was a rather funny happening for Big Brother 2007.
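To show the kind of check that was missing, here is a small Python sketch. Everything in it is an assumption made for illustration: the `user_id` parameter name, the `Forbidden` exception, and the idea that the server already knows who is logged in via `session_user_id`. It says nothing about how the real site was built.

```python
import re
from urllib.parse import parse_qs, urlparse


class Forbidden(Exception):
    """Raised when the URL asks for a user other than the one logged in."""


def requested_user_id(url: str) -> int:
    """Pull user_id out of the query string, insisting on exactly one plain number."""
    values = parse_qs(urlparse(url).query).get("user_id", [])
    if len(values) != 1 or not re.fullmatch(r"[0-9]+", values[0]):
        raise ValueError("user_id must be a single non-negative integer")
    return int(values[0])


def authorize(url: str, session_user_id: int) -> int:
    """Return the requested ID only if it matches the logged-in user."""
    user_id = requested_user_id(url)
    if user_id != session_user_id:
        # The ID in the URL is just a claim; the session is what we trust.
        raise Forbidden(f"user {session_user_id} may not view user {user_id}")
    return user_id
```

Two separate checks happen here: first that the value is well formed, then that this particular user is allowed to ask for it. Skipping the second one is how you end up with people reading each other’s data by editing a number in the address bar.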

Anyway, the clear lesson for many programmers is that user data is the dirtiest data your application will ever get. It’s dirtier than a binary memory dump. It’s insecure, and it’s the absolute worst data you can deal with.

Anything the user provides MUST be checked, validated, and confirmed. Otherwise you run a huge risk of your script breaking, or of the user compromising data, accessing data they aren’t supposed to, and so forth.

There’s never a time when you shouldn’t validate user data in one way or another, and simply checking that it exists and how long it is isn’t where it should stop. Wherever you can, check that the data is exactly what you expect, and if it isn’t, throw an error. Throw up all over the user. When they are done wiping the vomit off, and you get correct data, you can go on processing.
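As a sketch of "exactly what you expect", here is how a nickname check might look in Python. The rules (3 to 16 characters, starts with a letter, then letters, digits, or underscores) and the `validate_nickname` name are invented for this example; swap in whatever your application really expects.

```python
import re

# Hypothetical rule: a letter first, then 2 to 15 letters, digits, or underscores.
NICKNAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{2,15}")


def validate_nickname(raw: object) -> str:
    """Return the nickname if it matches the expected shape, otherwise raise."""
    if not isinstance(raw, str):
        raise ValueError("nickname must be text")
    # fullmatch means the whole value must fit the pattern, not just part of it.
    if NICKNAME_PATTERN.fullmatch(raw) is None:
        raise ValueError(
            "nickname must be 3 to 16 characters: a letter first, "
            "then letters, digits, or underscores"
        )
    return raw
```

Note that it describes what is allowed and rejects everything else, instead of trying to list every bad thing a user might send. Only once this returns do you go on processing.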

That way, you protect the services behind the user data, you are less prone to having issues, and the data you stick in the database, or use in any other way, is usable data.
