Pat Shaughnessy dissects how much work Ruby has to do to give you a string

(written by Lawrence Krubner; however, indented passages are often quotes). You can contact Lawrence at: lawrence@krubner.com

Easy is difficult, and this is a great look at how much work Ruby has to do so that you, the software developer, can change your mind about what kind of string you want:

The standard and most common way for Ruby to save string data is in the “heap.” The heap is a core concept of the C language: it’s a large pool of memory that C programmers can allocate from and use via a call to the malloc function. For example, this line of C code allocates a 100 byte chunk of memory from the heap and saves its memory address into a pointer:

char *ptr = malloc(100);

Later, when the C programmer is done with this memory, she can release it and return it to the system using free:

free(ptr);

Avoiding the need to manage memory in this very manual and explicit way is one of the biggest benefits of using any high level programming language, such as Ruby, Java, C#, etc. When you create a string value in Ruby code like this, for example:

str = "Lorem ipsum dolor sit amet, consectetur adipisicing elit"

Ruby then creates a C structure called RString to represent it. The RString structure contains two values, ptr and len, but not the actual string data itself. Ruby saves the string character values themselves in some memory allocated from the heap, and then sets ptr to the location of that heap memory, and len to the length of the string.

Here’s a simplified version of the C RString structure:

struct RString {
    long len;
    char *ptr;
};

I’ve simplified this a lot; there are actually a number of other values saved in this C struct. I’ll discuss some of them next, and others I’ll skip over for today. If you’re not familiar with C, you can think of struct (short for “structure”) as an object that contains a set of instance variables, except in C there’s no object at all – a struct is just a chunk of memory containing a few values.

I refer to this type of Ruby string as “Heap String,” since the actual string data is saved in the heap.
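
One rough way to see this from Ruby itself, assuming CRuby and its objspace extension, is to compare what ObjectSpace.memsize_of reports for a very short string and a very long one. The exact numbers vary by Ruby version and platform, so this sketch only compares them; the variable names are mine, not Pat's.

```ruby
require 'objspace'

# A tiny string and one that is assumed to be far too large to fit
# inside the string object itself on any CRuby version.
short = "a" * 3
long  = "a" * 1000

short_size = ObjectSpace.memsize_of(short)
long_size  = ObjectSpace.memsize_of(long)

# The long string should report more memory, because its character
# data lives in a separately allocated buffer.
puts "short: #{short_size} bytes, long: #{long_size} bytes"
```

The reported size of the long string grows with its length, which is consistent with the character data living outside the RString structure.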

This is the overall work that Ruby has to do for you:

Whenever you create a string value in your Ruby 1.9 code, the interpreter goes through an algorithm similar to this:

Is this a new string value? Or a copy of an existing string? If it’s a copy, Ruby creates a Shared String. This is the fastest option, since Ruby only needs a new RString structure, and not another copy of the existing string data.

Is this a long string? Or a short string? If the new string value is 23 characters or fewer, Ruby creates an Embedded String. While not as fast as a Shared String, it’s still fast because the 23 characters are simply copied right into the RString structure and there’s no need to call malloc.

Finally, for long string values, 24 characters or more, Ruby creates a Heap String – meaning it calls malloc and gets some new memory from the heap, and then copies the string value there. This is the slowest option.
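
The three questions above can be sketched as a small Ruby method. This is only a model of the decision, not the interpreter's real code: string_kind, EMBED_LIMIT, and the copy_of_existing flag are names I made up for illustration, and I am assuming the 23-character limit is counted in bytes, which is the same thing for plain ASCII text.

```ruby
# Hypothetical model of the decision described above; not Ruby's real source.
EMBED_LIMIT = 23 # taken from the quoted text; assumed here to be counted in bytes

# Returns :shared, :embedded, or :heap for a new string value.
# `copy_of_existing` is a made-up flag standing in for "is this a copy?".
def string_kind(value, copy_of_existing: false)
  # A copy only needs a new RString pointing at the existing data.
  return :shared if copy_of_existing
  # Short data fits inside the RString itself, so no malloc is needed.
  return :embedded if value.bytesize <= EMBED_LIMIT
  # Everything else needs a malloc'd buffer and a copy of the data.
  :heap
end

puts string_kind("Lorem ipsum")
puts string_kind("Lorem ipsum dolor sit amet, consectetur adipisicing elit")
puts string_kind("Lorem ipsum", copy_of_existing: true)
```

The order of the checks mirrors the order of the questions in the quote: the cheapest case is tested first, and the malloc path is the fallback.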

Source