Discussion:
[python-win32] Possible memory leak in pywin32
Kapil Dolas
2015-01-16 10:43:21 UTC
Hi,

I am using pywin32's mapi module to read data from PSTs. I have shared my
program, which reads email and attachment data, here:
http://pastebin.com/2AXy3BVH. Currently, the program does not store any of
the data it reads. Still, I can see a gradual increase in memory usage when
I run the program over a large PST. That PST contains around 9000 emails
and holds 9 GB of data. The largest email is only 24 MB. For this PST, the
program's initial memory usage is about 10 MB, but it gradually increases
and reaches 40-45 MB. I don't know why memory usage climbs to this value. I
have tried using pympler to find the root cause, but without any success.
It appears that the memory increase is not due to Python objects. Can you
point out the reason behind the gradual increase in memory usage? Is it due
to a memory leak in pywin32, or to mistakes in my program?

Regards,
Kapil Dolas
Tim Roberts
2015-01-16 17:41:08 UTC
MAPI is the poor neglected stepchild in the Windows world. It is
functional, but it has received relatively little optimization
attention. My guess is you're just seeing memory being used by MAPI
itself, perhaps building an index of your mammoth PST.

For what it's worth, 45 MB is nothing. Thunderbird balloons to 400 MB on
my machine.
--
Tim Roberts, ***@probo.com
Providenza & Boekelheide, Inc.
Mark Hammond
2015-01-21 04:09:09 UTC
A leak in pywin32 is certainly possible, but tracking one down from such
scant information is not really feasible. If you can tweak your program
to narrow down the leak, we might have more luck. For example, add
pointless loops that repeat the same operation a thousand times in various
places, see how they change the leak behaviour, then rinse and repeat until
you can see a significant leak from a single MAPI operation repeated that
way.

Mark
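
A minimal sketch of the "repeat one operation a thousand times" approach described above. It is not from the thread: the helper names working_set_bytes and growth_per_call are hypothetical, and the working set is only one possible memory metric (private bytes might be a better indicator). The memory reader uses the Win32 GetProcessMemoryInfo call through ctypes, so it only works on Windows. The reader is passed in as a parameter so that another one can be substituted.

```python
import ctypes
import gc
import sys


def working_set_bytes():
    """Return the current process working set in bytes (Windows only)."""
    if sys.platform != "win32":
        raise OSError("working_set_bytes() is only implemented for Windows")
    from ctypes import wintypes

    class ProcessMemoryCounters(ctypes.Structure):
        # Mirrors the Win32 PROCESS_MEMORY_COUNTERS structure.
        _fields_ = [
            ("cb", wintypes.DWORD),
            ("PageFaultCount", wintypes.DWORD),
            ("PeakWorkingSetSize", ctypes.c_size_t),
            ("WorkingSetSize", ctypes.c_size_t),
            ("QuotaPeakPagedPoolUsage", ctypes.c_size_t),
            ("QuotaPagedPoolUsage", ctypes.c_size_t),
            ("QuotaPeakNonPagedPoolUsage", ctypes.c_size_t),
            ("QuotaNonPagedPoolUsage", ctypes.c_size_t),
            ("PagefileUsage", ctypes.c_size_t),
            ("PeakPagefileUsage", ctypes.c_size_t),
        ]

    kernel32 = ctypes.WinDLL("kernel32")
    psapi = ctypes.WinDLL("psapi")
    kernel32.GetCurrentProcess.restype = wintypes.HANDLE
    psapi.GetProcessMemoryInfo.argtypes = [
        wintypes.HANDLE,
        ctypes.POINTER(ProcessMemoryCounters),
        wintypes.DWORD,
    ]
    psapi.GetProcessMemoryInfo.restype = wintypes.BOOL

    counters = ProcessMemoryCounters()
    counters.cb = ctypes.sizeof(counters)
    ok = psapi.GetProcessMemoryInfo(
        kernel32.GetCurrentProcess(), ctypes.byref(counters), counters.cb
    )
    if not ok:
        raise ctypes.WinError()
    return counters.WorkingSetSize


def growth_per_call(operation, repeats=1000, read_memory=working_set_bytes):
    """Call operation() `repeats` times; return average memory growth per call."""
    gc.collect()  # drop collectable Python garbage so it does not skew the reading
    before = read_memory()
    for _ in range(repeats):
        operation()
    gc.collect()
    after = read_memory()
    return (after - before) / repeats
```

The idea would be to wrap one MAPI call at a time in a small function, pass it as operation, and compare the per-call growth between candidates. A value that stays clearly positive as repeats grows points at that call.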
Nick Czeczulin
2015-01-21 09:34:26 UTC
You can try these changes and see if they help mitigate some of the bloat
you are experiencing in the script you linked:

1. Only call MAPIInitialize() once, on the main thread, before you make any
other MAPI calls. Then call MAPIUninitialize() before exiting the
process. It's been documented that the Outlook MAPI DLL intentionally
leaked the heap as a workaround in previous versions. If you are
creating/destroying multiple instances of MAPIReadTest(), that may be
contributing to some of the bloat you are seeing.

2. Instead of walking the folder hierarchy and opening/caching multiple
folder entries, you can also try using
GetHierarchyTable(mapi.CONVENIENT_DEPTH) to get the entry IDs and process
them in sequence.

hth,
-nick
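
The two suggestions above could be sketched roughly as follows. This is an illustration under assumptions, not code from the thread or from the linked script: the helper names mapi_initialized and iter_subfolder_ids are hypothetical, and the mapi module and property tags are passed in as parameters so the helpers do not import pywin32 themselves unless asked to. The pywin32 calls used are MAPIInitialize, MAPIUninitialize, GetHierarchyTable and HrQueryAllRows.

```python
from contextlib import contextmanager


@contextmanager
def mapi_initialized(mapi_module=None):
    """Initialize MAPI once, yield the mapi module, uninitialize on exit."""
    if mapi_module is None:
        # Imported lazily: win32com.mapi needs pywin32 on Windows.
        from win32com.mapi import mapi as mapi_module
    mapi_module.MAPIInitialize(None)
    try:
        yield mapi_module
    finally:
        # Assumption: all MAPI objects have been released by this point.
        mapi_module.MAPIUninitialize()


def iter_subfolder_ids(root_folder, mapi_module, entryid_tag, name_tag):
    """Yield (entry_id, display_name) for every folder below root_folder."""
    # CONVENIENT_DEPTH requests one flat table covering all nesting levels,
    # so no recursive walk (and no cache of opened folders) is needed.
    table = root_folder.GetHierarchyTable(mapi_module.CONVENIENT_DEPTH)
    rows = mapi_module.HrQueryAllRows(
        table, (entryid_tag, name_tag), None, None, 0
    )
    # Each row is a sequence of (property_tag, value) pairs in request order.
    for (_, entry_id), (_, name) in rows:
        yield entry_id, name
```

On Windows, the whole run would sit inside a single `with mapi_initialized() as mapi:` block on the main thread. The tags would presumably be mapitags.PR_ENTRYID and mapitags.PR_DISPLAY_NAME_A from win32com.mapi. Each yielded entry ID can then be opened, processed and released in turn.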
