[SGVLUG] E-mail archiving -- looking for ideas & techniques

Chris Smith cbsmith at gmail.com
Tue Nov 29 18:12:00 PST 2005


On 11/29/05, Emerson, Tom <Tom.Emerson at wbconsultant.com> wrote:
> > Behalf Of Chris Smith
> >
> > What mail client and server pair are you using? It shouldn't strictly
> > be necessary for the "comparing notes" time to increase as you get
> > more messages in the folder.
>
> The server is Courier-imap on my "visible" machine [most of the "holes" poked through my
> firewall go to this machine, specifically for things like imap...]  This machine has dual
> 500mhz P-III processors and at least 512meg of memory (might actually be a gig...)  From
> external locations, this passes through a 384kbit/sec DSL (same speed both ways, though
> still technically an "aDSL" setup...)

Courier uses the maildir format, IIRC. That format, while better than
mbox format, has scaling problems with thousands of messages and
fundamentally restricts some of the things that Courier can do to
improve performance. You might want to look at Cyrus, whose mailstore
is kind of maildir++ (it basically creates some additional files to
cache header info and similar bits).
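
To make the "cache header info" idea concrete, here is a minimal
Python sketch. It is not Cyrus's actual on-disk format or code: the
function names (build_header_cache, load_header_cache) and the single
JSON cache file are assumptions chosen purely for illustration. The
point is only that listing a plain maildir means opening one file per
message, while a cache can answer the same question from one read.

```python
import json
import mailbox


def build_header_cache(maildir_path, cache_path,
                       wanted=("From", "Subject", "Date")):
    """Read every message once and store selected headers in one file.

    Hypothetical helper: the JSON layout is an illustration only and is
    unrelated to the cache files a real server such as Cyrus keeps.
    """
    box = mailbox.Maildir(maildir_path, create=False)
    cache = {}
    for key in box.keys():
        # This is the expensive part: one file open per message.
        msg = box.get_message(key)
        cache[key] = {name: str(msg.get(name, "")) for name in wanted}
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(cache, fh)
    return cache


def load_header_cache(cache_path):
    """Return the cached headers with a single file read."""
    with open(cache_path, encoding="utf-8") as fh:
        return json.load(fh)
```

A folder listing can then be served from load_header_cache() alone.
The sketch ignores the hard part, which is keeping the cache in step
with the folder as messages arrive and are deleted.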

> At home I'm using kMail (part of KDE).  Since this is "inside" the firewall, the connection is
> 100mbit/sec.  The client in this case is a 64-bit AMD at 2.??? ghz (AMD 3200, I think, or
> possibly 3400...)  When I click on the "sgvlug discussion list" folder, kmail takes at least 5
> seconds to synchronize

Last I checked, KMail's approach to IMAP wasn't much more intelligent
than Evolution's (which isn't a good thing). I've heard it's improved,
but you may get better results using Thunderbird.

> When "out in the wild", I generally use thunderbird, though the other night I had the
> opportunity to use the native Mac OS-X e-mail client (simply known as "mail")  I think in its
> "default" configuration, it was attempting to "download" all messages (even though I had
> specified "IMAP" as the server type -- fundamental misunderstanding of why someone
> would use IMAP, I suspect, though in Apple's defense I believe it was loading a copy of
> every message so that it could perform content-based searches locally...)

Yeah, the default config for the OS-X client is to download messages
smaller than a certain size. The thinking is that for small messages
the extra download time won't matter. This tends not to be quite so
true when you have thousands of messages. ;-)

Also, the rationale for downloading the messages isn't so much local
content-based searches, but rather to allow for better off-line
reading capability. One of the key points of IMAP is that it allows
you to sync with a server and then disconnect and manage your e-mail
while offline. Sadly, few mail clients really take advantage of this
as much as they could.

> I "just now" ran thunderbird and it took 12+ seconds to "open" the sgvlug folder.  There are
> 7112 messages in that folder, 3 of which were "unread".  another folder with 8000+
> "unread" messages is still loading as I write this (it took 30-ish seconds between when I
> pressed "get mail" and when it started retrieving the headers...)

So, to give you a perspective, I use Cyrus and Thunderbird primarily.
My UUASC folder has over 12,000 messages in it. I opened it up in
Thunderbird and it took about 18 seconds to sync. During the sync it
received information about 87 new headers. If I go back to my inbox
and then back to UUASC it syncs in ~1 second. The machine it was
syncing with is a *laptop* with a 700MHz PIII w/192MB of memory and a
painfully slow 4500rpm laptop hard drive with the kind of access times
that only a laptop user could love. The hard drive on it is not only
painfully slow, but it is actually *failing*, and I often get IO
errors that require a retry (yes, I need to get my data off there
right quick). Indeed, my bet is that the bulk of my wait time when
touching the UUASC folder was spent on the IO timeouts (which would
go away on subsequent attempts because the relevant data would be
cached in memory).

In general, the sync times seem to scale roughly with the number of
message headers I don't already have on the client.
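
A rough way to picture that scaling, as a Python sketch: IMAP gives
each message in a folder a UID, so a client that remembers which UIDs
it has already cached only needs headers for the rest. The function
names below (uids_to_fetch, uids_to_expunge) are hypothetical, and
this is not a description of how Thunderbird or any particular client
actually implements its sync.

```python
def uids_to_fetch(cached_uids, server_uids):
    """Return UIDs on the server that the client has no headers for."""
    return sorted(set(server_uids) - set(cached_uids))


def uids_to_expunge(cached_uids, server_uids):
    """Return UIDs the client cached that the server no longer has."""
    return sorted(set(cached_uids) - set(server_uids))


# Illustrative numbers only: a 12,000-message folder where the client
# already holds all but the newest 87 headers.
cached = range(1, 11914)   # UIDs 1..11913
server = range(1, 12001)   # UIDs 1..12000
missing = uids_to_fetch(cached, server)
print(len(missing))        # 87 headers to download, not 12,000
```

With a warm cache the work is proportional to len(missing), which
matches the roughly one-second resync of a folder that was just
opened.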

Anyway, I can't swear to you that Cyrus is going to universally
perform better than Courier, but I suspect it will.

--
Chris

