July 10, 2021

Comparison of garbage collection algorithms

Overview

In this article we will compare various garbage collection algorithms and give a brief description of the algorithm used in Starlight.

Root set

Before we start looking at algorithms we should know what the 'root set' is. The root set in Starlight, as in most other JS engines, includes:

  • Any object references in the local variables and stack of any stack frame, and any object references in the global object.
  • Any object references that were explicitly rooted using letroot! in Starlight.

Basic algorithms

1. Reference counting

The reference counting garbage collection algorithm stores a reference count for each object on the heap; when an object's reference count reaches zero it is considered dead and is freed immediately.

The count is changed when:

  • When an object is first created, it has a reference count of one.
  • When any other variable is assigned a reference to that object, the object's count is incremented.
  • When an object reference goes out of the current scope or is assigned a new value, its reference count is decremented.
  • When an object is garbage collected, any objects that it refers to have their reference counts decremented.
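
The bookkeeping above can be sketched as a toy reference-counting heap (the `retain`/`release` names are illustrative, not Starlight API):

```javascript
// A toy reference-counting heap: objects are tracked in a map
// together with their current reference count.
class RcHeap {
  constructor() {
    this.counts = new Map();
    this.freed = [];
  }
  // A freshly created object starts with a count of one.
  alloc(obj) {
    this.counts.set(obj, 1);
    return obj;
  }
  // A new variable now references the object: increment.
  retain(obj) {
    this.counts.set(obj, this.counts.get(obj) + 1);
  }
  // A reference went out of scope or was overwritten: decrement,
  // and free the object immediately when the count hits zero.
  release(obj) {
    const n = this.counts.get(obj) - 1;
    if (n === 0) {
      this.counts.delete(obj);
      this.freed.push(obj); // freed on the fly, no GC cycle needed
    } else {
      this.counts.set(obj, n);
    }
  }
}
```

Note how freeing happens inside `release` itself: no separate collection cycle ever runs.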

Pros:

  • A reference counting collector can run in small cycles, or not trigger GC cycles at all and free dead objects on the fly while the mutator runs.
  • A reference counting collector is very simple to implement and does not require advanced knowledge from the developer.

Cons:

  • Reference counting does not detect cycles: two or more objects that refer to each other. A simple example of a cycle in JS code:
let object = {}; // reference count of object is 1
object.field = object; // reference count of object is 2
object = null; // reference count of object is 1

As we can see, there is no way the RC collector can decrement the object's reference count after we assign null to the variable, so the object is kept on the heap, leaking memory.

  • Reference counting carries the overhead of incrementing and decrementing the reference count on every reference write.

Because of these disadvantages it is not widely used in JS engines and other VMs. To my knowledge, only QuickJS and CPython use reference counting (although both have cycle collectors that run occasionally).

2. Tracing

Tracing garbage collectors actually trace out the graph of references starting with the root set. Objects that are encountered during the trace are marked in some way. After the trace is complete, unmarked objects are known to be unreachable and can be garbage collected.

The basic tracing algorithm is called mark and sweep. In Starlight the sweep phase must include running destructors of objects.
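
The two phases can be sketched as follows (a toy heap of plain objects; the `destructor` hook stands in for the destructor calls Starlight runs during sweep):

```javascript
// A toy mark-and-sweep collector over a heap of plain objects.
// Each object may reference others through a `refs` array.
function collect(heap, roots) {
  // Mark phase: trace the reference graph from the root set.
  const marked = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const obj = stack.pop();
    if (marked.has(obj)) continue;
    marked.add(obj);
    for (const ref of obj.refs || []) stack.push(ref);
  }
  // Sweep phase: walk the whole heap and free unmarked objects,
  // running their destructors first (as Starlight must).
  const live = [];
  for (const obj of heap) {
    if (marked.has(obj)) live.push(obj);
    else if (obj.destructor) obj.destructor();
  }
  return live;
}
```

Notice that marking visits only reachable objects, while sweeping has to visit the entire heap; this asymmetry drives several of the variants discussed below.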

Algorithms & Implementation strategies

Moving & non-moving collectors

Once all the reachable objects are marked, the GC might return all the unreachable objects' memory to the OS, put it on its own free list, or copy some or all of the reachable objects into a new area of memory, updating all references to those objects as needed. These are called "non-moving" and "moving" garbage collectors, respectively.

Pros of moving collector:

  • No additional work is required to reclaim the space freed by dead objects; the entire region of memory from which reachable objects were moved can be considered free space. In contrast, a non-moving GC must visit each unreachable object and record that the memory it occupied is available.
  • When entire large chunks of memory can be considered "free", the GC can allocate objects very quickly by simply incrementing a pointer. A non-moving GC, in contrast, needs free lists for allocation, and long-running programs can suffer high heap fragmentation, leading to slow allocation and frequent collection cycles.
  • If a specific traversal order is used, objects can be moved close to each other in memory, which leads to more effective use of the CPU cache and OS paging.

Cons of moving collector:

  • A moving collector requires every object reference to be reachable by the GC. This means some technique must be used to track references on the thread stack so the GC can update them properly.

Pros of non-moving collector:

  • Predictable performance: if your program does not churn the heap and does not fragment it, a non-moving collector performs well.
  • There is no need for every single reference to be directly reachable by the GC, so a non-moving GC can be conservative.

The cons of a non-moving collector are pretty much the mirror image of the pros of a moving collector.

Compacting vs copying vs mark&sweep vs mark&lazy sweep

Not only do collectors differ in whether they are moving or non-moving, they can also be categorized by how they treat object references during the cycle and how their cycle works.

The simplest approach is the semi-space copying collector. In this moving collector, the heap is separated into equally sized "from space" and "to space" halves. Initially, objects are allocated in the "to space" until it becomes full and a collection cycle is triggered. At the start of the cycle, the "to space" becomes the "from space", and vice versa. The objects reachable from the root set are copied from the "from space" to the "to space". These objects are scanned in turn, and all objects that they point to are copied into the "to space", until all reachable objects have been copied. Once the program continues execution, new objects are again allocated in the "to space" until it is full and the process repeats.

This approach is very simple, but since only one semi-space is used for allocating objects, memory usage is twice as high as in other algorithms.
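
The copy-and-scan loop can be sketched in a Cheney-style toy model, where objects are plain JS values and `forward` stands in for the forwarding pointer (not a real engine's layout):

```javascript
// A toy Cheney-style semi-space collector. Survivors are copied to
// `toSpace` and given a `forward` pointer so references can be
// updated as they are encountered.
function semispaceCollect(roots) {
  const toSpace = [];
  function copy(obj) {
    if (obj.forward) return obj.forward;      // already copied
    const clone = { refs: obj.refs.slice() }; // copy into to-space
    obj.forward = clone;
    toSpace.push(clone);
    return clone;
  }
  // Copy the root set first...
  const newRoots = roots.map(copy);
  // ...then scan to-space with a wavefront pointer, copying children.
  let scan = 0;
  while (scan < toSpace.length) {
    const obj = toSpace[scan++];
    obj.refs = obj.refs.map(copy);
  }
  return { toSpace, roots: newRoots };
}
```

Unreachable objects are simply never copied; no sweep is needed, which is exactly the first pro of moving collectors listed above.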

The second approach is the mark-compact collector. Mark-compact algorithms can be regarded as a combination of the mark-sweep and copying algorithms. First, reachable objects are marked; then a compacting step relocates the reachable (marked) objects towards the beginning of the heap area. Such an algorithm is used in Hermes to collect old space objects. This algorithm is quite complex, has several possible implementations, and I recommend reading the Wikipedia article about it.

Pros of mark-compact collector:

  • Memory is compacted in the same way as in a copying collector.
  • The heap is not separated into a "to space" and a "from space"; instead, all of the heap memory can be used for allocation.
  • Fast allocation, just like in a copying collector, by simply incrementing a pointer.

Cons of mark-compact collector:

  • A slow garbage collection cycle, because of the complexity of compacting the heap. In the simplest implementation of mark-compact the entire heap might be iterated four times!

Because of the slowness of the GC cycle, compaction is usually paired with mark-sweep to compact highly fragmented heaps. For example, V8 might compact the old space when it becomes fragmented.
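
As a rough illustration of the compacting step itself, here is a toy three-pass, Lisp-2-style sketch over a heap modeled as a JS array, with indices standing in for addresses (real implementations such as Hermes's are considerably more involved):

```javascript
// A toy Lisp-2 style compaction over a heap modeled as an array of
// fixed-size objects whose `refs` hold heap indices. Pass 1 computes
// forwarding addresses for marked objects; pass 2 updates references;
// pass 3 slides objects towards the beginning of the heap.
function compact(heap) {
  // Pass 1: assign each marked (live) object its new index.
  let free = 0;
  for (const obj of heap) {
    if (obj.marked) obj.forward = free++;
  }
  // Pass 2: rewrite every reference to point at the new index.
  for (const obj of heap) {
    if (obj.marked) obj.refs = obj.refs.map(r => heap[r].forward);
  }
  // Pass 3: slide marked objects down, dropping the dead ones.
  const compacted = [];
  for (const obj of heap) {
    if (obj.marked) {
      delete obj.marked;
      delete obj.forward;
      compacted.push(obj);
    }
  }
  return compacted;
}
```

Each pass walks the whole heap, which is exactly why the cycle is slow compared to plain sweeping.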

The mark-and-sweep approach. Such a collector keeps a bit or two in the GC object header to record whether an object is alive. Note that these bits might instead be stored in a separate bitmap, removing the per-allocation memory overhead. Since we described mark-sweep in the basic algorithms section, I'll move straight to the pros and cons:

Pros:

  • Once live objects are marked, either a moving or a non-moving algorithm can be used, depending on the memory available to the runtime.
  • Allows for fast concurrent collection by requiring only write barriers from the user, without read barriers (we will discuss barriers later).

Cons:

  • Might lead to a fragmented heap (this highly depends on the allocation algorithm used).
  • The sweep phase has to walk the entire heap to free dead objects.

Mark-and-don't-sweep, aka mark-and-lazy-sweep. Such a collector simply moves the sweeping phase into the allocator and frees objects on demand: whenever the program tries to allocate memory and the free list is empty, the allocator can sweep an unswept chunk of memory.
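
The on-demand sweeping can be sketched like this (a toy model where blocks are arrays of cells with a mark flag; sizes and names are illustrative):

```javascript
// A toy lazy-sweep allocator: blocks are left unswept after marking,
// and are swept only when the free list runs dry during allocation.
function makeLazyHeap(blocks) {
  const unswept = blocks.slice(); // blocks awaiting their sweep
  const freeList = [];
  return {
    alloc() {
      // Sweep unswept blocks on demand until a free cell turns up.
      while (freeList.length === 0 && unswept.length > 0) {
        const block = unswept.pop();
        for (const cell of block) {
          if (!cell.marked) freeList.push(cell); // dead cell, reuse it
          cell.marked = false; // reset for the next GC cycle
        }
      }
      return freeList.pop(); // undefined when the heap is exhausted
    },
  };
}
```

The GC cycle itself only marks; the sweeping cost is amortized across subsequent allocations.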

Pros:

  • The GC cycle consists only of marking.
  • Has high space efficiency.

Cons:

  • The entire heap must be walked to reset mark bits at the start of the GC cycle. Note that this is faster than sweeping during the GC cycle, but it still takes time. If a bitmap is used, this cost does not apply.

Such algorithm is used in JavaScriptCore.

Generational GC

It has been empirically observed that in many programs, the most recently created objects are also those most likely to become unreachable quickly (known as the generational hypothesis). A generational GC (also known as ephemeral GC) divides objects into generations and, on most cycles, will place only the objects of a subset of generations into the initial white (condemned) set. Furthermore, the runtime system maintains knowledge of when references cross generations by observing the creation and overwriting of references. When the garbage collector runs, it may be able to use this knowledge to prove that some objects in the initial white set are unreachable without having to traverse the entire reference tree. If the generational hypothesis holds, this results in much faster collection cycles while still reclaiming most unreachable objects.

In this approach, the heap is divided into two or more sub-heaps, each of which serves one “generation” of objects.

  • The youngest generation is garbage collected most often. As most objects are short-lived, only a small percentage of young objects are likely to survive their first collection.
  • Once an object has survived a few garbage collections as a member of the youngest generation, the object is promoted to the next generation: it is moved to another sub-heap.
  • Each progressively older generation is garbage collected less often than the next younger generation. As objects “mature” (survive multiple garbage collections) in their current generation, they are moved to the next older generation.

The generational collection technique can be applied to mark and sweep algorithms as well as copying algorithms. In either case, dividing the heap into generations of objects can help improve the efficiency of the basic underlying garbage collection algorithm.
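
The cross-generation bookkeeping can be sketched with a remembered set filled by a write barrier (a toy model; the `generation` tag and function names are illustrative, not any engine's API):

```javascript
// A toy remembered set: the write barrier records old-generation
// objects that have been made to point at young-generation objects,
// so a minor collection can treat them as extra roots without
// scanning the rest of the old generation.
const rememberedSet = new Set();

function writeField(object, fieldName, value) {
  // Barrier: an old -> young store creates a cross-generation edge.
  if (object.generation === "old" && value.generation === "young") {
    rememberedSet.add(object);
  }
  object[fieldName] = value;
}

function minorRoots(stackRoots) {
  // A minor GC traces from the stack roots plus the remembered set.
  return [...stackRoots, ...rememberedSet];
}
```

This is the "knowledge of when references cross generations" mentioned above: the barrier observes the store, so the young-generation trace never needs to walk the old generation.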

Conservative vs precise vs mostly-precise

GC can be conservative or precise or mostly precise. Some collectors can correctly identify all pointers (references) in an object; these are called precise (also exact or accurate) collectors, the opposite being a conservative or mostly-precise collector. Conservative collectors assume that any bit pattern in memory could be a pointer if, interpreted as a pointer, it would point into an allocated object.

Precise

A precise collector knows about every reference in the program, which allows a moving implementation. There is one downside, though: these references have to be tracked somehow. SpiderMonkey uses a "shadow stack": a simple linked list that holds all the GC references on the C++ side. In V8, before the switch to CPPGC, special handles allocated in local heaps were used. These ways of handling references introduce double indirection, decrease performance, and increase memory usage.

Conservative

A conservative collector has no information about references except their size. It simply assumes that anything that looks like a GC pointer is a GC pointer, which allows for a simpler FFI between GCed and non-GCed languages. Conservative collectors use mark-and-sweep, since there is no way to move objects when pointers are identified conservatively.

Pros:

  • Simple to use: does not require the user to provide special functions for tracing objects.
  • Does not have the overhead of tracking references on the stack (i.e. does not use handles).
  • Allows for pointer arithmetic (although some collectors do not allow this).

Cons:

  • The collector might identify a random integer as a reference to a GC object, which can keep an actually dead object alive.
  • Pointer identification might be costly in some cases.

Mostly-precise

A mostly-precise collector combines the benefits of precise and conservative collectors. It allows moving objects around without requiring handles to keep track of them. Usually, to move objects, all the objects identified conservatively on the thread stack are "pinned" to their locations (typically using an additional bit in the header) and are not moved during the garbage collection cycle.

Pros:

  • Does not make use of handles for tracking references.
  • Allows unpinned objects to be moved around during collection, which can decrease fragmentation.
  • Allows for FFI between GCed and non-GCed languages.

Cons:

  • Might identify a random integer as a pointer, although this is very rare on 64-bit platforms.
  • Pointer identification might be costly in some cases.

Concurrent and incremental collectors

Concurrent and incremental collectors do not stop the program for a whole GC cycle; instead they perform work in parallel with the program, or stop it only for small increments of work, respectively. To achieve this, write and read barriers are used.

Write and read barriers

1. Write barrier: A write barrier is used when a concurrent or incremental collector marks objects while new objects are being allocated. For the GC to see these new objects, write barriers catch stores of unmarked objects into already marked objects. Here's a simple implementation of a write barrier:

function writeBarrier(object, field) {
    // the target object is already marked but the newly stored
    // value is not: mark the value so the collector does not miss it
    if (isMarked(object) && isNotMarked(field))
        gcMark(field);
}

2. Read barrier: Read barriers are used when the collector is moving. They help obtain the correct reference to an object while a collection is running:

function readBarrier(object) {
    // if the GC copied the object, return its new location
    if (copied(object)) return newLocationOf(object);
    return object;
}

Incremental collector

An incremental collector does not run in parallel with the program; instead it stops the program periodically to perform a small amount of work. This sometimes allows hard real-time requirements to be met. A typical run of an incremental GC looks like this:

Program -> GC start -> Program -> marking -> Program -> marking -> Program -> Sweep -> Program -> Sweep -> GC end 
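
The sliced marking in the diagram above can be sketched as a marker that processes a bounded amount of work per step (a toy model; the `budget` knob is illustrative):

```javascript
// A toy incremental marker: each step marks at most `budget` objects,
// then yields back to the program; marking resumes where it left off.
function makeIncrementalMarker(roots, budget) {
  const marked = new Set();
  const workList = [...roots];
  return function step() {
    let work = budget;
    while (work-- > 0 && workList.length > 0) {
      const obj = workList.pop();
      if (marked.has(obj)) continue;
      marked.add(obj);
      for (const ref of obj.refs || []) workList.push(ref);
    }
    // Returns the mark set when done, or null if more slices are needed.
    return workList.length === 0 ? marked : null;
  };
}
```

The program runs between calls to `step()`, which is where the write barrier from the previous section becomes necessary: without it, a store performed between slices could hide an object from the marker.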

Pros:

  • Allows for hard real-time systems to be implemented
  • Great for games and other real-time tasks

Cons:

  • Collects garbage very slowly and tends to have higher memory usage.
  • Still stops the program.

Concurrent collector

A concurrent collector is an upgraded incremental collector: it runs in parallel with the program and stops it only when starting GC, switching GC states, and ending GC.

Pros:

  • Does not interrupt program for long periods of time.
  • Very good for games.
  • Might run in multiple threads at the same time, increasing throughput.

Cons:

  • Requires additional threads, which might increase memory usage.
  • The program might allocate garbage faster than the collector collects it; some collectors then decide to fully stop the program (e.g. JavaScriptCore).
  • Requires the use of safepoints, which are hard to implement and can waste CPU cycles.

What does Starlight use?

Starlight uses a mostly-precise mark-and-sweep collector. To mitigate the cons of the mark-and-sweep and mostly-precise approaches we have:

Simple segregated storage

Simple segregated storage is used to allocate small objects (<=8KB) in 16KB blocks. Each block has its own cell size, which allows for very fast allocation through a free list and fast iteration for sweeping. Large objects are allocated using mimalloc.
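
A sketch of the size-class arithmetic behind such an allocator (the block size matches the text, but the cell-size table is illustrative, not Starlight's actual one):

```javascript
// A toy size-class table for simple segregated storage: each block
// holds cells of a single size, so allocation is a free-list pop and
// sweeping can walk a block at a fixed stride.
const BLOCK_SIZE = 16 * 1024; // 16KB blocks, as in the text
const CELL_SIZES = [16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192];

function sizeClassFor(bytes) {
  // Round the request up to the smallest cell size that fits.
  for (let i = 0; i < CELL_SIZES.length; i++) {
    if (bytes <= CELL_SIZES[i]) return i;
  }
  return -1; // > 8KB: falls through to the large-object allocator
}

function cellsPerBlock(sizeClass) {
  return Math.floor(BLOCK_SIZE / CELL_SIZES[sizeClass]);
}
```

Because every cell in a block has the same size, the sweeper never needs per-object size information to iterate a block.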

Bitmap marking

Bitmap marking allows us to not waste bits in the GC header (although we keep tri-color marking for a future implementation of a concurrent collector). Bitmap marking also helps us identify references conservatively: we maintain two bitmaps, a mark bitmap and a live bitmap. The live bitmap is updated on each allocation, and we test pointers found on the stack by simply checking the bitmap.
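
The bitmap test can be sketched as follows (cell size, block address, and bitmap size are illustrative, not Starlight's actual layout):

```javascript
// A toy live bitmap for a block of fixed-size cells: one bit per cell.
// A candidate pointer found on the stack tests "live" only if it falls
// inside the block on a cell whose bit is set, which keeps the
// conservative pointer check cheap.
const CELL_SIZE = 32;                  // illustrative cell size
const blockStart = 0x10000;            // illustrative block address
const liveBitmap = new Uint8Array(64); // 512 bits -> 512 cells

function cellIndex(addr) {
  return (addr - blockStart) / CELL_SIZE | 0;
}
function setLive(addr) {
  // Called on allocation: flip the bit for the freshly allocated cell.
  const i = cellIndex(addr);
  liveBitmap[i >> 3] |= 1 << (i & 7);
}
function isLive(addr) {
  // Conservative test: anything outside the block, or on a cell
  // whose bit is clear, is not a GC pointer.
  const i = cellIndex(addr);
  if (i < 0 || i >= liveBitmap.length * 8) return false;
  return (liveBitmap[i >> 3] & (1 << (i & 7))) !== 0;
}
```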

Conclusion

There is no collector that is good for everyone, but some perform acceptably in most programs, e.g. a mostly-precise concurrent collector. If you're just starting your journey into VMs and GCs, I would recommend implementing a precise mark-and-sweep collector or a precise copying collector.

Further reading

Introducing Riptide: WebKit's Retreating Wavefront Concurrent Garbage Collector

High-performance garbage collection for C++

Incremental Garbage Collection in Ruby 2.2