Schema for extracting language-learning data from Wikitravel phrasebooks

  • Wikitravel phrasebooks (example) are a rich, openly available source of language-learning data

  • because the formatting is not consistent across phrasebooks, we will probably need an LLM to extract the data into a form that a learning algorithm can use

  • this is a sketch of a data format: an LLM reads a phrasebook page and outputs JSON in this format, which any flashcard or adaptive language-learning app can then consume
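A minimal sketch of what a consumer of that JSON might look like, assuming the LLM returns a JSON array of phrase entries. The PhraseEntry fields (section, phrase, translation, pronunciation) and the helpers parse_llm_output and to_flashcards are hypothetical illustrations, not the schema this document defines. The point is that a fixed output shape lets an app validate the LLM's output and skip malformed entries before turning them into cards.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical entry shape: field names are illustrative only,
# not the schema this document defines.
@dataclass
class PhraseEntry:
    section: str                          # phrasebook section, e.g. "Basics"
    phrase: str                           # phrase in the target language
    translation: str                      # English meaning
    pronunciation: Optional[str] = None   # pronunciation guide, if the page gives one

REQUIRED = ("section", "phrase", "translation")

def parse_llm_output(raw: str) -> list[PhraseEntry]:
    """Parse the LLM's JSON array, dropping entries that lack required string fields."""
    entries = []
    for item in json.loads(raw):
        if not isinstance(item, dict):
            continue  # skip anything that is not an object
        if not all(isinstance(item.get(k), str) and item[k].strip() for k in REQUIRED):
            continue  # skip malformed entries instead of rejecting the whole page
        entries.append(PhraseEntry(
            section=item["section"],
            phrase=item["phrase"],
            translation=item["translation"],
            pronunciation=item.get("pronunciation"),
        ))
    return entries

def to_flashcards(entries: list[PhraseEntry]) -> list[tuple[str, str]]:
    """Turn entries into (front, back) pairs: English on the front, target phrase on the back."""
    cards = []
    for e in entries:
        back = e.phrase if e.pronunciation is None else f"{e.phrase} ({e.pronunciation})"
        cards.append((e.translation, back))
    return cards

# Usage with a hand-written stand-in for LLM output; the second entry lacks
# a translation and is dropped.
raw = ('[{"section": "Basics", "phrase": "Hola", "translation": "Hello"},'
       ' {"section": "Basics", "phrase": "Gracias"}]')
print(to_flashcards(parse_llm_output(raw)))  # [('Hello', 'Hola')]
```

Dropping bad entries rather than raising keeps one garbled table row from discarding a whole page of otherwise usable phrases.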