Kolja Sam Pluemer

In search of a better Spaced Repetition algorithm #1

Thoughts on how to optimize my flash-card based learning


I have a problem: Right now, my learning software tells me that I have 5,239 flash cards to review today. That’s not ideal.

Why is that a problem?

My self-made learning and productivity software (https://kaado.io) uses Spaced Repetition (SR). It looks like this:

screenshot of a flash card in kaado

Now, one core tenet of SR is that learning items are shown just before the brain would forget them. If they are shown after that point in time, something bad happens. I am not sure what exactly, to be fair. That’s one of the reasons this article exists.

Anyways, I obviously cannot get through five thousand cards with Active Recall today. Doesn’t work. Thus, we can presume that my learning system is currently running very much below its potential efficiency.

What determines the number of due learning items and why is it so high?

Let’s do some cause analysis on how I got into this situation:

1. I have a lot of topics and learning items relating to those

The obvious answer as to why I have so many damn cards due is that I am using my software to try and learn loads of stuff. World capitals, Python, plants, design rules, Italian. It adds up.

However, I like it like that. I am in this for the long run, so I don’t mind the volume. This paragraph was mostly to get the obvious duh answer out of the way. SR can be, and is, used to learn complex and sprawling topics, so quantity should not be a problem. It just has to be handled well. My software currently doesn’t handle it well, and that is what this post is about.

2. Quality of the Spaced Repetition algorithm

Another thing about SR is that the interval between reviews of a given learning item grows the better you have memorized it. For a well-designed item, this means that the interval goes up over time: the longer an item lives in the system, the larger its interval. On average, that is.

Calculating this interval is the job of an algorithm. The word algorithm is not a euphemism for shitty AI here; it just means something that takes an input (for example: when did I last look at this item, and did I remember it just now?) and spits out an output (like: this item shall be reviewed again in 27 days). This logic can indeed include neural networks, or be so simple that you can do it by hand (see, for example, the Leitner Box).
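To make the input/output idea concrete, here is a minimal Python sketch of a Leitner-style scheduler. The five boxes and their intervals (1, 2, 4, 8 and 16 days) are my own illustrative assumption. Physical Leitner boxes get set up in many different ways. The names `LeitnerCard` and `review` are hypothetical.

```python
from dataclasses import dataclass

# Illustrative review interval, in days, for each box (an assumption, not a standard).
BOX_INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class LeitnerCard:
    box: int = 0  # index into BOX_INTERVALS; new cards start in the first box


def review(card: LeitnerCard, remembered: bool) -> int:
    """Input: did I remember the card just now? Output: days until the next review."""
    if remembered:
        # Promote the card one box, but never past the last one.
        card.box = min(card.box + 1, len(BOX_INTERVALS) - 1)
    else:
        # A forgotten card goes all the way back to the first box.
        card.box = 0
    return BOX_INTERVALS[card.box]


card = LeitnerCard()
print([review(card, ok) for ok in (True, True, True, False, True)])  # prints [2, 4, 8, 1, 2]
```

Even this tiny rule already shows the key behaviour: intervals grow with every successful recall and collapse after a failure.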

Now, this has an important implication:

The better the Spaced Repetition algo works, the faster card intervals become so large that they stop clogging up the system.

As in:

  • If I have 365 cards and my system is so bad that I will never memorize them, I will have 365 due cards every day.
  • If I have 365 cards and my system works well, I quickly get to the point where intervals between reviews are very large (say, 1 year), meaning I have only 1 card due per day on average, with that value continually falling.
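The arithmetic behind these two bullets is a steady-state approximation. A card with an interval of I days comes up roughly 1/I times per day, so the expected daily load is the sum of the inverse intervals over all cards. If I read the supermemo wiki correctly, this is also how SuperMemo computes what it calls Burden. A quick Python check (the function name is my own):

```python
def expected_daily_reviews(intervals_in_days):
    """Average reviews per day if every card keeps its current interval."""
    return sum(1 / interval for interval in intervals_in_days)


# 365 cards that never stick: every card is due every day.
print(expected_daily_reviews([1] * 365))  # 365.0
# 365 cards with one-year intervals: about one review per day.
print(round(expected_daily_reviews([365] * 365), 6))  # 1.0
```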

3. Method of introducing new cards into the system

The last component that determines daily due cards is how (and specifically how many) new cards are introduced and when.

Example: I commit to learning French and determine that knowing the ten thousand most common words would be helpful. So I have 10k cards, or 20k when learning bidirectionally. Obviously, reviewing a gazillion cards on your very first day of learning is somewhere between boring, frustrating and impossible, so you don’t. Instead, you gradually introduce them. But how?

We will discuss approaches later. For now, let’s just note this as the second important pillar determining the number of items we have to review on a given day.
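This toy Python simulation gives a feel for how the introduction rate alone drives the daily load. It makes two unrealistic assumptions. First, every card is always remembered. Second, a card's interval simply doubles after each review (1, 2, 4, 8 days and so on). Neither assumption comes from SM-2 or from my app, and `simulate_due` is a hypothetical helper.

```python
from collections import defaultdict


def simulate_due(new_per_day: int, days: int) -> list[int]:
    """Number of cards due on each day under a toy doubling scheduler."""
    schedule = defaultdict(list)  # day -> current intervals of cards due that day
    due_counts = []
    for day in range(days):
        # Newly introduced cards count as due on the day they arrive.
        schedule[day].extend([1] * new_per_day)
        due_today = schedule.pop(day)
        due_counts.append(len(due_today))
        for interval in due_today:
            # Perfect recall: reschedule after `interval` days, then double it.
            schedule[day + interval].append(interval * 2)
    return due_counts


print(simulate_due(new_per_day=10, days=8))  # [10, 20, 20, 30, 30, 30, 30, 40]
```

Even in this best case, the load keeps creeping up (roughly with the logarithm of the number of days) for as long as new cards keep arriving. In a real system, forgotten cards restart at short intervals and push the numbers up further.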

Check: Preliminary goal and how to get there

With all this in mind, let’s quickly review:

  • We want to fix the problem of having too many daily due cards
  • We know of two things we can improve:
    • Quality of SR algorithm, so cards ‘leave’ the daily rotation quicker
    • Quality of card introduction algorithm, so the system doesn’t get overwhelmed by new cards

Current system

Let me quickly share how my system currently works:

My Spaced Repetition algorithm

My SR algorithm is simply SM-2. Why? It’s open, it’s obvious, and I found a JavaScript library for it. Good choice at the time, far from ideal from a learning standpoint.

For this research, I finally gritted my teeth through the supermemo wiki long enough to gain a rough understanding of what SM-2 actually does, so let’s take a step back first:

How SM-2 works (roughly)

To be clear, this is nothing but a rough sketch of the same information detailed on the supermemo wiki and probably better explained elsewhere. Anyways, my understanding:

  • SM-2 is an algorithm that calculates when a given item is to be reviewed again
  • The first review interval is always one day, and the second one six days
  • After that, it calculates the next interval by multiplying the last interval by a per-item easiness factor, which in turn is adjusted after every review based on your own judgement of how well you remembered (a grade from 0 to 5)

I think this simplification will work fine for now. We are going to look at problems and alternatives later.
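For reference, here is a minimal Python sketch of SM-2 as it is commonly described. Grades go from 0 to 5, and a grade below 3 counts as a failed recall. Every item carries an easiness factor (EF) that starts at 2.5 and never drops below 1.3. Implementations differ in details, for example whether the EF is also adjusted after a failed recall (this sketch adjusts it after every review). So treat it as an illustration, not as a drop-in replacement for the JavaScript library my app uses. The names `SM2State` and `sm2_review` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SM2State:
    repetitions: int = 0  # successful reviews in a row
    interval: int = 0     # current interval in days
    ease: float = 2.5     # easiness factor (EF)


def sm2_review(state: SM2State, grade: int) -> SM2State:
    """Apply one review graded 0 (blackout) to 5 (perfect) and return the new state."""
    if not 0 <= grade <= 5:
        raise ValueError("grade must be between 0 and 5")
    if grade >= 3:
        # Successful recall: 1 day, then 6 days, then previous interval times EF.
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = round(state.interval * state.ease)
        repetitions = state.repetitions + 1
    else:
        # Failed recall: start the item over with a 1-day interval.
        repetitions, interval = 0, 1
    # EF update: grade 5 adds 0.1, grade 4 keeps it, lower grades reduce it.
    ease = state.ease + 0.1 - (5 - grade) * (0.08 + (5 - grade) * 0.02)
    return SM2State(repetitions, interval, max(ease, 1.3))


state = SM2State()
for grade in (5, 5, 5, 2):
    state = sm2_review(state, grade)
    print(state.interval, round(state.ease, 2))  # intervals 1, 6, 16, then 1 after the failure
```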

My introduction algorithm

In my app, new cards are introduced as follows:

  1. A random number is generated so that…
    • …in 90% of cases a learning card that already occurred is picked
    • …in 10% of cases a new learning card (never reviewed before) is picked
  2. If we want a novel card but there is none, we get an ‘old’ one instead, and vice versa
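The two steps above fit in a few lines. This Python sketch is not the actual kaado code. The function name, the idea of passing in two plain lists (already-seen cards and never-seen cards) and picking at random within a pool are all illustrative assumptions.

```python
import random


def pick_next_card(old_cards, new_cards, new_probability=0.1, rng=None):
    """Pick the next card: usually an already-seen one, sometimes a new one.

    Returns None when both pools are empty.
    """
    if rng is None:
        rng = random.Random()
    wants_new = rng.random() < new_probability  # True in about 10% of cases
    preferred, fallback = (new_cards, old_cards) if wants_new else (old_cards, new_cards)
    # If the preferred pool is empty, fall back to the other one (and vice versa).
    pool = preferred or fallback
    return rng.choice(pool) if pool else None


rng = random.Random(42)
picks = [pick_next_card(["old"] * 5, ["new"] * 5, rng=rng) for _ in range(10_000)]
print(picks.count("new"))  # roughly 1,000
```

Because the coin flip happens on every single pick, the number of introduced cards automatically scales with how many cards the user reviews.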

Now, this may seem like a weird choice compared to, for example, simply introducing 10 new cards every day. These are my reasons for going this way:

  • it’s fairly easy to implement
  • it produces a predictable and fairly desirable outcome for a multitude of scenarios:
    • no new cards → only old cards are shown until you’re done
    • no old cards because you are just starting out → you get to flow through your fancy new learning deck until you run out of motivation
    • the number of introduced cards scales with the amount of time the user spends learning (more cards reviewed, more cards introduced)
  • it does not require storing extra values somewhere or messing around with datetimes, which is always a pain

Problems with my current system

Now that we know the symptom (so many due cards!) and a little about the underlying systems, let’s get to the juicy part: what’s going wrong?

Disadvantages of SM-2

There is really just one big one: It’s over thirty-five years old!

Now of course, math usually doesn’t spoil, but in this case we are talking about a computational approximation of a psychological phenomenon. And that very much improves with time and research invested.

Piotr Wozniak, the inventor of SM-2, and his company SuperMemo have advanced in leaps and bounds over the last three decades. The fact that their current SR algo is called SM-18 attests to this. Even SM-5, also born in the eighties, already seems to be well ahead of SM-2 (see comparison).

That raises the question: Why not just use SM-18?

Short answer: Because the algorithm isn’t open, and using SuperMemo (the software) is not an option for me. More on that later.

Disadvantages of my introduction algorithm

The problems with my “every tenth card is a new card” approach are:

  • It evidently clogs the system with way too many daily required reviews
  • More subtly, it also has no concept of a new learning card that isn’t due yet: if you add 500 new cards, you have 500 more due cards, then and there. That doesn’t really matter for the math, but it matters for morale.

Alternatives, solutions and ideas

Now, let there be sun. This is going to be the least organized section of this post, where I discuss different approaches and concepts that may help me (and maybe you) with all this.

What even is a good SR algo?

I figured that if I were to swap out my SR algorithm, I should have a way of evaluating whether it was a good move. I haven’t dug very deep into this yet, so here are some pointers:

  • There is this article in the supermemo wiki discussing means of comparing Spaced Repetition algorithms
    • It mentions Forgetting Index as a favored yet flawed value to base judgements of SR algos on
    • Internally, supermemo uses the R-Metric to compare algorithm performance, which is calculated from the difference between the predicted recall and the learner’s rating of how well they remembered an item
    • There is also the B-W metric, which does some math based on the difference between retrievability and the learner’s score. I am not quite sure how that differs.
  • The supermemo wiki also briefly defines the concept of Burden, which may be useful for my specific problem
  • There is plenty of opportunity to get very meta here: the paper “Comparing spaced repetition algorithms for legal digital flashcards” finds differences in retention depending on the SR algorithm used, but not in actual applied performance
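I have not reimplemented any of the SuperMemo metrics. The idea they seem to share is comparing what the algorithm predicted with what actually happened. As a generic stand-in for that idea, here is a sketch with two standard scores: log loss and root-mean-square error. Both compare the predicted probability of recall at review time with the actual outcome (1 for remembered, 0 for forgotten). The sketch assumes the algorithm outputs a recall probability for each review, which SM-2 does not do out of the box. `score_predictions` is a hypothetical name.

```python
import math


def score_predictions(predicted_recall, remembered):
    """Return (log loss, RMSE) of recall predictions; lower is better for both.

    predicted_recall: probabilities in [0, 1] that the item would be recalled.
    remembered: actual outcomes, 1 if recalled and 0 if forgotten.
    """
    if not remembered or len(predicted_recall) != len(remembered):
        raise ValueError("need two non-empty sequences of equal length")
    eps = 1e-9  # keep the logarithms finite for predictions of exactly 0 or 1
    log_loss = squared_error = 0.0
    for p, r in zip(predicted_recall, remembered):
        p = min(max(p, eps), 1 - eps)
        log_loss -= r * math.log(p) + (1 - r) * math.log(1 - p)
        squared_error += (p - r) ** 2
    n = len(remembered)
    return log_loss / n, math.sqrt(squared_error / n)


outcomes = [1, 1, 0, 1]
print(score_predictions([0.9, 0.8, 0.3, 0.95], outcomes))  # good predictions: lower scores
print(score_predictions([0.5, 0.5, 0.5, 0.5], outcomes))   # always 50%: higher scores
```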

Alternative SR algorithms

So, what algorithm shall one move to then? I am not sure yet. The considerations discussed above turned out a bit stickier than I expected, so I haven’t researched that broadly yet. Again, some pointers:

  • While of course everything SuperMemo writes about their own software must be considered an ad in some capacity, SM-18 does indeed appear to be very good. However, it seems to require recursive reading and understanding of the wiki articles of both every predecessor and a wide range of fundamental concepts. Furthermore, I am unsure of the legality of actually implementing SM-18.
  • Anki uses SM-2 and also doesn’t really consider anything but SM algos. Their reasons for not “upgrading” to a newer version seem similar to mine - organizational, legal and product reasons trumping the advantage of superior learning.
  • A random Reddit comment claims that SM-15 is actually quite straightforward to implement. May be worth researching.
  • Wikipedia’s Spaced Repetition article brings up neural networks, interestingly citing yet another SuperMemo post from 1998 describing such an approach. I am sure there are plenty of papers attempting this.
  • As a definite to-do: check the Google Scholar results for implementations of alternative SR algos

Alternative introduction algorithms

Back to how many new learning items a day should be introduced. Now, this is a frustrating one.

Everybody just seems to be guessing.

On Reddit, one can peruse /r/Anki to find threads of everyone just happily sharing their arbitrary magic numbers for new-cards-a-day, which is a user setting in Anki.

While I am all for user customization, surely there must be a mathematical optimum here, right? I can’t imagine that letting my irrational brain guesstimate a good number is the solution here.

Other apps aren’t much better. Duolingo is similarly user-controlled, just less directly: users choose whether they want to repeat old lessons or move on to new ones, with fans creating complex systems like the Waterfall Method to balance these choices. Interesting, but hardly a silver bullet.

Lacking both a Windows system and an explanatory article, I couldn’t find out how SuperMemo does it. From the wiki, it seems that what they call overflow (having an excess of reviews) is tolerated or even encouraged, and then dealt with using a variety of tools.

Currently, I am considering building a system similar to Anki’s: releasing a set number of new cards per day, but adapting this load to how many cards the user usually reviews on a given day, maybe combined with projections of how many cards will be due in the future based on how review intervals are developing.

Sounds painful.
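A first, very rough Python sketch of that idea might look like this. Everything in it is hypothetical. The function name is mine, and so is the choice to use the plain average of recent days as the user's capacity. The cap of 20 new cards per day is also my assumption. A real version would need proper projections of future due cards, not just a list of already-scheduled counts.

```python
from statistics import mean


def new_card_budget(recent_reviews, upcoming_due, max_new=20):
    """Hypothetical daily new-card budget.

    recent_reviews: how many reviews the user actually completed on recent days.
    upcoming_due: how many reviews are already scheduled for the next few days.
    """
    if not recent_reviews:
        return max_new  # no history yet, so fall back to the fixed cap
    capacity = mean(recent_reviews)  # what the user usually manages per day
    expected_load = mean(upcoming_due) if upcoming_due else 0
    spare = capacity - expected_load
    # Only fill spare capacity: never exceed the cap, never go negative.
    return max(0, min(max_new, int(spare)))


print(new_card_budget([100, 120, 110], [90, 95, 100]))  # 15
print(new_card_budget([50, 60, 40], [5239]))            # 0: clear the backlog first
```

One side effect I like: with a backlog like my current 5,239 cards, the budget drops to zero until reviews catch up. That is exactly the brake my 10% rule lacks, since it keeps mixing in new cards no matter how many old ones are due.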

Maybe some papers with standalone implementations of pure SR algorithms will give me some inspiration.

Closing words

I hope this rambling article brings some value, if only to my future self. Reading papers it is. But first, let me create a Flags of the World deck anyways. Cheerz!