Ouch, my brain hurts

Email, Programming No Comments »

OK, so a few days ago I decided that the current IMAP servers out there (namely Courier) were a pain in the ass, so being a maniac, I decided to write my own. I figured if I implemented it in Python, did a minimum approach to security, made the server read-only, and a few other shortcuts, I could get the server done rather quickly, and I could read my e-mail with any client I wish.

Well, the protocol is complicated enough that it is quite hard to do any serious processing with simple regular expressions. I first tried incrementally building regular expressions (like the following):

reloginargs = re.compile(_loginargs)
_loginargs = r"(?:%s %s)" % (_userid, _password)
_userid = r"(?:%s)" % (_astring)
_astring = r"(?:%s|%s)" % (_atom, _string)
_atom = r"(?:%s+)" % (_ATOM_CHAR)
_ATOM_CHAR = r"(?:%s)" % (_atom_specials)
_atom_specials = r"(?:[a-zA-Z0-9/])" # HACK
...

This worked reasonably well, except for extracting the data from the matched string was very difficult. I finally decided to bite the bullet and try to use a full parser generator. I initially tried to get reaccustomed to lex/yacc, just to see if it was possible, but I couldn’t quite get my brain around them. Eventually I found a LL(1) parser generator Yapps (Yet Another Python Parser System), which is written in python and produces python code. After looking at the examples for a little bit, along with some experimentation, I was finally able to start on the IMAP grammar file. I spent a few hours, learnt some LL(1) tricks, and I actually have a grammar which understands a good portion of the IMAP protocol (as much or more as the regular expression-based one, and certainly more RFC compliant). It is especially cool because the grammar closely mirrors the EBNF in the “Formal Syntax” portion of the IMAP RFC.

Since I use various clients (Outlook Express and mutt primarily) to test the current state of the IMAP server (and also to see what commands they use, to minimize the commands I must implement), and I’ve pretty much exhausted simply parsing the commands I know, I must move on to actually acting on a parsed command. This should be fairly simple once I think about the right way to do it.

Well, even if this is a big waste of time, I think I’ve learned quite a bit.

Using lex/yacc or a LL parser to parse RFCs?

Programming No Comments »

I spent (wasted?) quite a bit of time today trying to figure out if I can use either lex/yacc or a LL(1) parser generator to parse the IMAP protocol as specified by the EBNF in the RFC. It sure would beat my massive regexps (which actually aren’t that bad to construct, but extracting the data out from matched commands sucks). It would also be useful because this technique may be applicable to other protocols as well.

Unfortunately, I’m rusty with lex/yacc, and I never was very good at them to begin with.

WP Theme & Icons by N.Design Studio
Entries RSS Comments RSS Log in