Commit Graph

10 Commits

Author SHA1 Message Date
9845da5a7e Update learning code.
Now it tracks distance by character, and determines whether to show
furigana based on how long it's been since the last time a word was
shown with furigana rather than the last time a word was shown at all.

Also includes some minor performance improvements.
2024-09-20 07:38:02 +02:00
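
The behaviour described in this commit can be sketched roughly as follows. This is only an illustration, not the repository's code: the class `FuriganaScheduler`, its methods, the `base_gap` parameter, and the doubling rule are all assumptions made for the example. The only parts taken from the commit message are that distance is counted in characters and that it is measured from the last time a word was shown with furigana.

```python
class FuriganaScheduler:
    """Hypothetical sketch: decide per word whether to show furigana."""

    def __init__(self, base_gap=100):
        self.base_gap = base_gap  # assumed: initial allowed gap, in characters
        self.position = 0         # running character offset into the text
        self.last_shown = {}      # word -> offset where it last appeared WITH furigana
        self.gap = {}             # word -> current allowed gap, in characters

    def skip(self, text):
        """Advance past text that is not tracked (kana, punctuation, ...)."""
        self.position += len(text)

    def see(self, word):
        """Record one occurrence of `word`; return True if furigana should be shown."""
        here = self.position
        self.position += len(word)  # distance is measured in characters, not words
        if word not in self.last_shown:
            show = True
            self.gap[word] = self.base_gap
        else:
            # Compare against the last occurrence shown WITH furigana,
            # not the last occurrence of the word in general.
            show = here - self.last_shown[word] >= self.gap[word]
            if show:
                self.gap[word] *= 2  # assumed: the gap doubles after each reminder
        if show:
            self.last_shown[word] = here
        return show
```

In this sketch, an occurrence shown without furigana does not reset the timer, so a word that keeps reappearing still gets a reminder once enough characters have passed since its last annotated appearance.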
ba5fea6e0a Attach pitch accent indicators in a more reasonable way.
We give it a class so CSS styling can be used on it more easily.
2024-09-18 15:08:53 +02:00
adb58983a7 Add option to include pitch accent information with the furigana. 2024-09-18 12:10:22 +02:00
7361240e49 Add option to use hiragana instead of katakana for the generated furigana. 2024-09-17 08:32:51 +02:00
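
As an illustration of what such an option involves, assuming the readings arrive in katakana as this commit implies: the standard katakana letters U+30A1 to U+30F6 sit exactly 0x60 code points above their hiragana counterparts in Unicode, so the conversion can be a simple offset. The function name `katakana_to_hiragana` is hypothetical and not taken from the repository.

```python
def katakana_to_hiragana(text: str) -> str:
    """Convert katakana letters to hiragana, leaving everything else untouched."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0x30A1 <= code <= 0x30F6:
            # Katakana ァ..ヶ map onto hiragana ぁ..ゖ at a fixed offset.
            out.append(chr(code - 0x60))
        else:
            # The long-vowel mark ー, kanji, punctuation, etc. pass through.
            out.append(ch)
    return "".join(out)
```

For example, `katakana_to_hiragana("フリガナ")` gives `"ふりがな"`, while the long-vowel mark in `"コーヒー"` is left unchanged.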
0266341f99 Fix stupid bug in furigana application.
It would sometimes result in characters getting swapped.
2024-09-16 08:28:11 +02:00
4b48f86824 Rework of various things.
This way the main `FuriganaGenerator` can be shared among multiple
threads.

This also adds substitutions for words where the tokenizer insists on
using a less common pronunciation.
2024-09-15 08:55:03 +02:00
d79cc60a48 Tweak the learning algorithm.
It was both too conservative and not conservative enough in different
circumstances.
2024-09-11 13:25:10 +02:00
44cb2b8bda Make building faster. 2024-09-11 11:22:14 +02:00
ecbac83e26 Add function to get word stats after processing. 2024-09-11 11:14:12 +02:00
1c3afed157 First commit.
A furigana generator that can do "spaced repetition" style reduction
of furigana over the course of a text.
2024-09-10 18:45:58 +02:00