I've been working on a regex-based UK postcode validation and format correction tool, with the aim of creating a list of postcodes that can be readily geocoded. Does anyone have any suggestions for improving the efficiency of this process? My correction tool is designed to cope with a number of mistakes commonly made when a postcode is entered as free text, as shown here:

pc

| postcode | true | original_validate | correct_pc | correct_validate |
|----------|-------|-------------------|------------|------------------|
| GIR 0AA | TRUE | TRUE | GIR 0AA | TRUE |
| M2 0AB | TRUE | TRUE | M2 0AB | TRUE |
| M2 OAB | FALSE | FALSE | M2 0AB | TRUE |
| M2 0ab | FALSE | FALSE | M2 0AB | TRUE |
| M1 1AA | TRUE | TRUE | M1 1AA | TRUE |
| M11AA | FALSE | FALSE | M1 1AA | TRUE |
| M60 1NW | TRUE | TRUE | M60 1NW | TRUE |
| M6O 1NW | FALSE | FALSE | M60 1NW | TRUE |
| M601NW | FALSE | FALSE | M60 1NW | TRUE |
| CR2 6XH | TRUE | TRUE | CR2 6XH | TRUE |
| CR26XH | FALSE | FALSE | CR2 6XH | TRUE |
| DN55 1PT | TRUE | TRUE | DN55 1PT | TRUE |
| DN551PT | FALSE | FALSE | DN55 1PT | TRUE |
| W1A 1HQ | TRUE | TRUE | W1A 1HQ | TRUE |
| W1A1HQ | FALSE | FALSE | W1A 1HQ | TRUE |
| w1a 1hq | FALSE | FALSE | W1A 1HQ | TRUE |
| EC1A 1BB | TRUE | TRUE | EC1A 1BB | TRUE |
| EC1A1BB | FALSE | FALSE | EC1A 1BB | TRUE |

Whilst this does work fine, as you can see it takes a slightly tortuous route to get there!
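For readers who want something concrete to compare against, here is a minimal sketch of the kind of corrections listed in the table (lower case, a missing space, and the letter O typed for a zero). It is written in Python purely for illustration and is not the poster's code; the names `is_valid` and `correct` are hypothetical, and the pattern is a simplified assumption rather than the full Royal Mail rule set.

```python
import re

# Simplified format check (an assumption for illustration): either GIR 0AA,
# or 1-2 area letters, a digit, an optional second digit or letter, one
# space, then a digit and two letters. The optional letter excludes
# I, L, O, Q and Z, which this sketch assumes are not used in that position.
POSTCODE_RE = re.compile(
    r"(?:GIR 0AA|[A-Z]{1,2}[0-9](?:[0-9]|[A-HJKMNPR-Y])? [0-9][A-Z]{2})"
)


def is_valid(postcode: str) -> bool:
    """Check the format only; this says nothing about whether it exists."""
    return POSTCODE_RE.fullmatch(postcode) is not None


def correct(postcode: str) -> str:
    """Apply the fixes from the table: case, spacing, and O typed for 0."""
    # Upper-case and strip all whitespace so the space can be re-inserted.
    compact = re.sub(r"\s+", "", postcode.upper())
    if len(compact) < 5:
        return compact  # too short to split into outward and inward parts
    outward, inward = compact[:-3], compact[-3:]
    # The inward part starts with a digit, so a leading O is read as 0.
    if inward[0] == "O":
        inward = "0" + inward[1:]
    # An O straight after a digit at the end of the outward part
    # (M6O) is assumed to be a mistyped zero (M60).
    outward = re.sub(r"(?<=[0-9])O$", "0", outward)
    return f"{outward} {inward}"


if __name__ == "__main__":
    for raw in ["M2 OAB", "M2 0ab", "M11AA", "M6O 1NW", "w1a 1hq", "EC1A1BB"]:
        fixed = correct(raw)
        print(raw, is_valid(raw), fixed, is_valid(fixed))
```

Each input is cleaned once and validated once, with no intermediate columns, which is roughly the shape a vectorised version would take as well.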
Indeed, there is arguably no need for a data.frame until you want to see the results in a nice format, at the very end.
Keeping things in vectors avoids repeatedly referring to the data frame.
Beautifully done, thanks! Nearly all my work is done in data frames, so I don't tend to think of other data structures potentially making my life easier!
Quick add on - I have been working on an extended version of your code encompassing more common mistakes (will post as answer when complete)...
As I haven't seen anything like this available as a completed function, my intention is to publish it somewhere like R-Bloggers for future use (Code Review doesn't get very high footfall).
The United Kingdom postcode database is owned by the Royal Mail, who charge high fees for licences.
Resellers such as Postcode Anywhere provide pay-as-you-go access, but even this is an unnecessary expense if only basic validation is required.
The alternative presented here is a PHP class that can validate postcode formats, extract parts of postcodes, and determine the post towns corresponding to postcodes.
Download the two files below and upload them to your web server.
The database is only required for the getPostTown function.
You should periodically update your copy of the database to the latest version.
Note that this function only validates the format of a postcode and does not validate that the postcode exists; this would require access to the Royal Mail database.
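To illustrate what format-only validation and part extraction look like in practice, here is a minimal sketch. It is written in Python rather than PHP and is not the class's own code: the function name `split_postcode` and the regular expression are assumptions made for this example, and the special case GIR 0AA is deliberately left out.

```python
import re
from typing import Optional

# Simplified postcode shape (an assumption for illustration): area letters,
# district, a single space, sector digit, unit letters. This checks the
# format only and cannot tell whether the postcode actually exists.
PARTS_RE = re.compile(
    r"(?P<area>[A-Z]{1,2})(?P<district>[0-9][0-9A-Z]?) "
    r"(?P<sector>[0-9])(?P<unit>[A-Z]{2})"
)


def split_postcode(postcode: str) -> Optional[dict]:
    """Return the parts of a well-formed postcode, or None if malformed."""
    match = PARTS_RE.fullmatch(postcode)
    if match is None:
        return None
    parts = match.groupdict()
    # The outward code is area + district; the inward code is sector + unit.
    parts["outward"] = parts["area"] + parts["district"]
    parts["inward"] = parts["sector"] + parts["unit"]
    return parts


if __name__ == "__main__":
    print(split_postcode("EC1A 1BB"))
    print(split_postcode("EC1A1BB"))  # None: the space is missing
```

Any string of the right shape is accepted here, whether or not it is a real postcode, which is exactly the limitation described above.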