A simple semi-space collector – wingolog (2022)

42 points by whereistimbo 2 days ago | 7 comments

ogogmad 2 days ago |
This approach makes sense if you have access to either all available memory or a fixed quantity of memory, half of which would go to the fromspace and the other half to the tospace. What I don't understand is: When should the heap be made bigger? What about smaller?
rurban 2 days ago |
Making it bigger requires a stop the world, and moving some objects.
You'll never need to make it smaller, but this requires the same efforts.
But good to see someone finally explaining the simplest and fastest GC, because the objects are compacted, usually in the same cache line. MMTk is way underused
burnt-resistor 2 days ago |
Tiny nit observed intended to be helpful feedback: "Efforts" implies multiple separate projects doing the same thing or many failed projects by one or many groups or people that were abandoned without success. The word "effort" customarily indicates a single project done by one or many people that maybe one or more tasks and may or may not be complete. In the sense of describing vague work, "effort" would be the better word choice. This can be a confusing but offers subtle nuance.
The intent is to help improve writing clarity because, to lawyers and us amateur unsolicited grammar "mall security guards", words and word choices can vary in accuracy and precision based on how they were chosen to increase or decrease confusion. Clarity is important because being imprecise or inaccurate imposes a costlier burden on multiple people other than yourself. English is a difficult and messy language, with slightly different interpretations conventions British and American flavoⓊrs, seemingly arbitrary odd rules, and much imprecision; it's basically Indo-Greek-Latin-Germanic "Creole gumbo". (I can't even learn Spanish, much less Arabic, Hindi, Korean, Mandarin, Russian, or Urdu.) And beware of singulare tantum without plural forms like "feedback", for they are sneaky too.
p_l a day ago |
I believe the original two-space aka semispace collector actually used all of the memory as main memory... then it used a tape as temporary buffer where it wrote "live" objects, and restored from tape again
davexunit 2 days ago |
If you like reading about GC implementation than keep reading wingo's blog. He's currently being funded by NLnet to finish the Whippet GC library.
burnt-resistor 2 days ago |
Since we're talking GCs:
C4 2011 [PDF] https://www.azul.com/sites/default/files/images/c4_paper_acm...
ORCA 2013 [PDF] https://www.ponylang.io/media/papers/opsla237-clebsch.pdf
Pony+ORCA 2017 [PDF] https://www.ponylang.io/media/papers/orca_gc_and_type_system...
Generational GC (as seen in Erlang BEAM) 2008-2010? [PDF, presentation not paper] https://www.cs.purdue.edu/homes/hosking/661-12/20120216.pdf
[PDF, different presentation] https://pdfs.semanticscholar.org/737d/041822cca60a341e4058ba...
Also: https://gchandbook.org
vanderZwan 2 days ago |
Wingolog blog posts are always such a joy to read (for me at least). The combination of low-level fundamental stuff presented in a simple and accessible way that gets the core idea across is a rare quality.
An immediate question I have is whether one could extend compacting GCs to also "sort" or "cluster" the live objects for better cache access in some way (in a practical sense I mean, where the extra runtime overhead is worth it). Also what kind clustering would actually work in that case.
I'm guessing naively tracking how often each object is accessed in a certain window of time and sorting by that wouldn't work, since the most accessed objects could be used at completely different times - some sort of "accessed together" algorithm probably makes more sense. But then we're in the territory of a combinatorial explosion of options if we're not careful.
Maybe something really stupid would work:
1. use a single global integer that increases by one with each heap memory access, 2. assign the counter's value to GC metadata of the accessed object in question as a sort of "time stamp" 3. during compacting, "sort" all living objects by whatever they have as their value (this is probably really hard) 4. after compacting, reset counter and time stamps
Worst case this would put objects right next to at least two objects that were accessed around the same time.
I presume that this question must have been researched to death, does anyone have any suggestions?