PHP memory management is bloated

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at:

PHP memory management is weak compared to something like the JVM. Perhaps that is because PHP scripts usually execute in a few seconds and then stop, whereas JVM software often has to maintain 99.99% uptime?

In this post I want to investigate the memory usage of PHP arrays (and values in general) using the following script as an example, which creates 100000 unique integer array elements and measures the resulting memory usage:

$startMemory = memory_get_usage();
$array = range(1, 100000);
echo memory_get_usage() - $startMemory, ' bytes';
How much would you expect it to be? Simple, one integer is 8 bytes (on a 64 bit unix machine and using the long type) and you got 100000 integers, so you obviously will need 800000 bytes. That’s something like 0.76 MBs.

Now try and run the above code. You can do it online if you want. This gives me 14649024 bytes. Yes, you heard right, that’s 13.97 MB - eightteen times more than we estimated.

So, where does that extra factor of 18 come from?

For those who don’t want to know the full story, here is a quick summary of the memory usage of the different components involved:

| 64 bit | 32 bit
zval | 24 bytes | 16 bytes
+ cyclic GC info | 8 bytes | 4 bytes
+ allocation header | 16 bytes | 8 bytes
zval (value) total | 48 bytes | 28 bytes
bucket | 72 bytes | 36 bytes
+ allocation header | 16 bytes | 8 bytes
+ pointer | 8 bytes | 4 bytes
bucket (array element) total | 96 bytes | 48 bytes
total total | 144 bytes | 76 bytes
The above numbers will vary depending on your operating system, your compiler and your compile options. E.g. if you compile PHP with debug or with thread-safety, you will get different numbers. But I think that the sizes given above are what you will see on an average 64-bit production build of PHP 5.3 on Linux.

If you multiply those 144 bytes by our 100000 elements you get 14400000 bytes, which is 13.73 MB. That’s pretty close to the real number - the rest is mostly pointers for uninitialized buckets, but I’ll cover that later.

Now, if you want to have a more detailed analysis of the values mentioned above, read on :)

The zvalue_value union
First have a look at how PHP stores values. As you know PHP is a weakly typed language, so it needs some way to switch between the various types fast. PHP uses a union for this, which is defined as follows in zend.h#307 (comments mine):

typedef union _zvalue_value {
long lval; // For integers and booleans
double dval; // For floats (doubles)
struct { // For strings
char *val; // consisting of the string itself
int len; // and its length
} str;
HashTable *ht; // For arrays (hash tables)
zend_object_value obj; // For objects
} zvalue_value;
If you don’t know C, that isn’t a problem as the code is pretty straightforward: A union is a means to make some value accessible as various types. For example if you do a zvalue_value->lval you’ll get the value interpreted as an integer. If you use zvalue_value->ht on the other hand the value will be interpreted as a pointer to a hashtable (aka array).

But let’s not get too much into this here. Important for us only is that the size of a union equals the size of its largest component. The largest component here is the string struct (the zend_object_value struct has the same size as the str struct, but I’ll leave that out for simplicity). The string struct stores a pointer (8 bytes) and an integer (4 bytes), which is 12 bytes in total. Due to memory alignment (structs with 12 bytes aren’t cool because they aren’t a multiple of 64 bits / 8 bytes) the total size of the struct will be 16 bytes though and that will also be the size of the union as a whole.

So now we know that we don’t need 8 bytes for every value, but 16 – due to PHP’s dynamic typing. Multiplying by 100000 values gives us 1600000 bytes, i.e. 1.53 MB. But the real value is 13.97 MB, so we can’t be there yet.