Tip #4 - Validating an Email Address with Ruby on Rails
April 14th, 2008
Did you know that Rails has a strong email handling library built in, called (ahem) TMail? I happen to maintain it now (Minero Aoki wrote it), and it gives you a great way to validate email addresses…
If you have ever tried to handle the validation of email addresses in your Rails app, you have probably ended up trying to use something like this to validate the address:
    validates_format_of :email, :with => /^([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})$/i
Now, this works… in most cases, but there are a few valid addresses it won’t accept.
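To make that concrete, here is a small sketch that runs the simple pattern against a handful of addresses. The helper name `simple_match?` and the sample list are mine, chosen for illustration; only the regular expression comes from the post.

```ruby
# The simple pattern from the post, unchanged.
SIMPLE_EMAIL = /^([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})$/i

# Hypothetical helper: true when the simple pattern matches the string.
def simple_match?(address)
  !(address =~ SIMPLE_EMAIL).nil?
end

# Sample addresses picked for illustration.
SAMPLES = [
  "user@example.com",               # ordinary address
  '"john smith"@example.com',       # quoted local part, allowed by RFC 822
  "user@[192.168.0.1]",             # domain literal, allowed by RFC 822
  "a..b@example.com",               # consecutive dots, not allowed by RFC 822
  "user@example.com\nsecond line"   # extra line after the address
]

SAMPLES.each do |address|
  puts "#{address.inspect} => #{simple_match?(address)}"
end
```

The ordinary address matches, while the quoted local part and the domain literal are rejected even though RFC 822 permits them. The last two samples match even though neither is a usable address: the pattern does not care about doubled dots, and in Ruby `^` and `$` anchor to lines rather than to the whole string, so trailing lines slip through. Using `\A` and `\z` instead would close that last gap.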
So some of you out there decide that you want a REAL email validation process and so you go for this regular expression instead:
    EmailAddress = begin
      qtext = '[^\\x0d\\x22\\x5c\\x80-\\xff]'
      dtext = '[^\\x0d\\x5b-\\x5d\\x80-\\xff]'
      atom = '[^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-' +
        '\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+'
      quoted_pair = '\\x5c[\\x00-\\x7f]'
      domain_literal = "\\x5b(?:#{dtext}|#{quoted_pair})*\\x5d"
      quoted_string = "\\x22(?:#{qtext}|#{quoted_pair})*\\x22"
      domain_ref = atom
      sub_domain = "(?:#{domain_ref}|#{domain_literal})"
      word = "(?:#{atom}|#{quoted_string})"
      domain = "#{sub_domain}(?:\\x2e#{sub_domain})*"
      local_part = "#{word}(?:\\x2e#{word})*"
      addr_spec = "#{local_part}\\x40#{domain}"
      pattern = /\A#{addr_spec}\z/
    end
And then your validates_format_of becomes:
    validates_format_of :email, :with => EmailAddress
Which is neat, but you have to stuff that regex away somewhere. When I used this, by the way, I would make a file in /lib called ‘rfc822.rb’ and put this in it:
    #
    # RFC822 Email Address Regex
    # --------------------------
    #
    # Originally written by Cal Henderson
    # c.f. http://iamcal.com/publish/articles/php/parsing_email/
    #
    # Translated to Ruby by Tim Fletcher, with changes suggested by Dan Kubb.
    #
    # Licensed under a Creative Commons Attribution-ShareAlike 2.5 License
    # http://creativecommons.org/licenses/by-sa/2.5/
    #
    module RFC822
      EmailAddress = begin
        qtext = '[^\\x0d\\x22\\x5c\\x80-\\xff]'
        dtext = '[^\\x0d\\x5b-\\x5d\\x80-\\xff]'
        atom = '[^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-' +
          '\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+'
        quoted_pair = '\\x5c[\\x00-\\x7f]'
        domain_literal = "\\x5b(?:#{dtext}|#{quoted_pair})*\\x5d"
        quoted_string = "\\x22(?:#{qtext}|#{quoted_pair})*\\x22"
        domain_ref = atom
        sub_domain = "(?:#{domain_ref}|#{domain_literal})"
        word = "(?:#{atom}|#{quoted_string})"
        domain = "#{sub_domain}(?:\\x2e#{sub_domain})*"
        local_part = "#{word}(?:\\x2e#{word})*"
        addr_spec = "#{local_part}\\x40#{domain}"
        pattern = /\A#{addr_spec}\z/
      end
    end
And then in my model:
    include RFC822
    validates_format_of :email, :with => EmailAddress
Which is getting better. But the problem with that now is that you need to maintain that email address regular expression… which is quite… umm… “complex” I think is a good word :)
Enter TMail!
TMail has an “Address” class. It will throw an invalid address exception if given an address it can’t handle (and it has about 2,000 test cases of email addresses it can handle, so you are pretty safe.)
Plus, as the maintainer, whenever I find a valid email that TMail can not handle, I fix it, and then you benefit at the next gem install tmail… so it all turns out good :)
To use it, dump this in your user model somewhere (assuming your user model has the attribute ‘email’):
    def valid_email?
      begin
        TMail::Address.parse(email)
      rescue
        errors.add_to_base("Must be a valid email")
      end
    end
Then your validates_format_of becomes:
    validate :valid_email?
Which is even nicer.. as in… less code == nicer :)
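If you want to see the control flow without loading Rails or TMail, here is a rough sketch of the same "parse it, and add an error if the parser raises" pattern. Everything in it is a stand-in I made up for illustration: `StubAddressParser` plays the role of TMail::Address (its rule of rejecting blank strings and whitespace is far cruder than a real parser), and `StubUser` plays the role of the model, with a plain array in place of the Rails errors object.

```ruby
# Hypothetical stand-in for TMail::Address: raises when it cannot parse.
class StubAddressParser
  class ParseError < StandardError; end

  def self.parse(text)
    str = text.to_s
    # Crude illustrative rule only: blank input or any whitespace is rejected.
    if str.strip.empty? || str.match?(/\s/)
      raise ParseError, "unparseable address: #{str.inspect}"
    end
    str
  end
end

# Hypothetical stand-in for the user model: base errors live in an array.
class StubUser
  attr_reader :email, :base_errors

  def initialize(email)
    @email = email
    @base_errors = []
  end

  # Same shape as valid_email? in the post: parse, record an error on failure.
  def valid_email?
    StubAddressParser.parse(email)
    true
  rescue StubAddressParser::ParseError
    base_errors << "Must be a valid email"
    false
  end
end

user = StubUser.new("sdf sdf")
puts user.valid_email?
puts user.base_errors.inspect
```

One design choice worth noting: the sketch rescues the parser's own error class rather than using a bare rescue, so an unrelated bug inside the method is not silently reported as "Must be a valid email".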
Next tip => How to validate the address fully before accepting it…
blogLater
Mikel
May 12th, 2008 at 05:00 AM
What advantages does the TMail Address class have over the RFC regexp shown in terms of validation? The format of email addresses hasn’t changed in years, and is unlikely to change for the foreseeable future, so for me maintenance of the regexp isn’t an issue.
The only thing I can see the regexp missing is that it doesn’t check the domain’s TLD to see if it’s one of the valid ones issued by ICANN.
May 12th, 2008 at 03:01 PM
Dan, I think the biggest thing is that you don’t have to maintain or keep your own RFC on it.
TMail handles some nasty email addresses that can trip up other regexes, as it uses a parser to figure out what is good and bad.
You could use the RFC regex, I do on one ruby script where I don’t want to include TMail, but if you are using other mail functions of TMail, the free verification stuff in TMail works and works well.
May 12th, 2008 at 10:26 PM
Can TMail operate behind Gmail? I want to be able to track all the email that the app sends by passing it through Gmail. If I do this, I get all the cool Gmailness that allows me to have conversations.
Thanks!
May 12th, 2008 at 10:38 PM
@James: Sure, TMail just handles the Raw email. You need to use something like Net::POP3 or Net::IMAP etc to get the email down, and then you can parse it with TMail.
Let me know how you go with it :)
Also, you can try the TMail Talk mailing list (go to the tmail.rubyforge.org website to find it). Bunch of hardcore TMail users there.
Mikel
May 16th, 2008 at 01:18 AM
What would be the best way to use TMail and also validate that the email address contains only A-Za-z0-9+._- ?
Because TMail accepts accented characters.
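One possible approach to the question above, sketched in plain Ruby: run a strict character check alongside whatever parser you use. The constant `PLAIN_ADDRESS` and the method `plain_ascii_email?` are hypothetical names of mine, and the pattern assumes you want only the listed characters on both sides of a single ‘@’.

```ruby
# Only A-Z, a-z, 0-9, plus, dot, underscore and hyphen around a single "@".
# \A and \z anchor the whole string, so extra lines cannot sneak through.
PLAIN_ADDRESS = /\A[A-Za-z0-9+._-]+@[A-Za-z0-9+._-]+\z/

# Hypothetical helper: true when the address uses only the allowed characters.
def plain_ascii_email?(address)
  PLAIN_ADDRESS.match?(address.to_s)
end

["first.last+tag@example.com", "josé@example.com", "no-at-sign"].each do |address|
  puts "#{address.inspect} => #{plain_ascii_email?(address)}"
end
```

In a model this could sit in the same custom validation method as the parser call, adding its own error when the check fails.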
May 29th, 2008 at 09:43 PM
Hi,
Nice tip on using TMail, thanks Mikel.
Note you don’t need to wrap the rescue in a begin/end block. You can just do this:
def valid_email? TMail::Address.parse(email) rescue errors.add_to_base(“Must be a valid email”) end
Two lines fewer!
Cheers, Dave
May 29th, 2008 at 09:56 PM
Gah! Sorry, it’s eaten the linebreaks and I don’t know what formatting engine you’re using. Try this:
    def valid_email?
      TMail::Address.parse(email)
    rescue
      errors.add_to_base("Must be a valid email")
    end
June 20th, 2008 at 01:02 PM
nice tip, thanks!
June 27th, 2008 at 03:43 AM
I have tmail installed and have followed the instructions above and it seems to be working for detecting some invalid addresses, but not all. For instance it knows that ‘sdf sdf’ is an invalid email address, but ‘sdfsdf’ is passed through as valid.
Shouldn’t anything without exactly 1 ’@’ and at least 1 ’.’ be rejected as invalid? Am I missing something?
Thanks.
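For the rule described in the last comment (exactly one ‘@’ and at least one ‘.’ in the domain), a minimal sketch of an extra check that could run next to the parser follows. The method name `looks_like_internet_address?` is hypothetical, and the rule is deliberately stricter than RFC 822: it rejects bare local names and also quoted local parts that contain an ‘@’.

```ruby
# Hypothetical helper: requires exactly one "@", a non-empty local part,
# and a domain that contains a dot but does not start or end with one.
def looks_like_internet_address?(address)
  return false unless address.is_a?(String)
  return false unless address.count("@") == 1

  local, domain = address.split("@", 2)
  return false if local.empty?

  domain.include?(".") && !domain.start_with?(".") && !domain.end_with?(".")
end

["sdfsdf", "sdf sdf", "user@localhost", "user@example.com"].each do |address|
  puts "#{address.inspect} => #{looks_like_internet_address?(address)}"
end
```

Only the last sample passes. Note that this check says nothing about whitespace or other illegal characters, so it complements a parser rather than replacing one.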