Generating test data from ReMarkable
For a while I’ve wanted to train a neural net on my own handwriting, as part of a larger test project that involves a drawing interface. To keep errors to a minimum I need the highest recognition rate possible, so using a generic dataset like IAM is not necessarily the best choice. Plus, I don’t currently know enough about machine learning to attempt adaptive networks that start from a generic dataset and gradually adapt to the writer’s style.
I used to contemplate developing an HTML5 app, connected to a backend, that would record writing on my iPhone. Although that would have allowed for on-line input, it was less straightforward than I wanted for a hobby project. An alternative was to design a paper template with a grid and scan the filled-in sheets, but this was also time-consuming. Luckily, I have a ReMarkable tablet to speed up the process!
The ReMarkable has a grid template that makes it easy to enter characters en masse:
The ReMarkable desktop app lets you easily export this writing to a folder of PNGs. After just two days I already have, for instance, 329 images of the letter “x” in my handwriting (a Python script splits each exported page into separate images, one per character). My task does not necessarily require word input (yet), so I’m going to make do with what I have and build the model from here.
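To give an idea of what such an extraction script can look like, here is a minimal sketch, not the exact script I used. It assumes that each exported page holds a single letter, that the grid is evenly spaced over the page (with an optional uniform margin), and that the export has dark ink on a light, opaque background. The names `grid_boxes` and `split_page`, the row and column counts, and the blank-cell threshold are all made up for illustration. The cropping step relies on the Pillow library.

```python
from pathlib import Path


def grid_boxes(page_w, page_h, rows, cols, margin=0):
    """Return (left, upper, right, lower) crop boxes, row by row.

    Assumes an evenly spaced grid inside a uniform margin (in pixels).
    """
    cell_w = (page_w - 2 * margin) / cols
    cell_h = (page_h - 2 * margin) / rows
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = round(margin + c * cell_w)
            upper = round(margin + r * cell_h)
            right = round(margin + (c + 1) * cell_w)
            lower = round(margin + (r + 1) * cell_h)
            boxes.append((left, upper, right, lower))
    return boxes


def split_page(png_path, out_dir, label, rows, cols, margin=0, blank_above=200):
    """Crop every non-empty grid cell of one page into its own PNG.

    Returns the number of cell images written.
    """
    from PIL import Image  # Pillow (third-party), imported only when cropping

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = 0
    with Image.open(png_path) as page:
        gray = page.convert("L")  # 8-bit grayscale: 0 = black, 255 = white
        boxes = grid_boxes(gray.width, gray.height, rows, cols, margin)
        for i, box in enumerate(boxes):
            cell = gray.crop(box)
            darkest, _ = cell.getextrema()
            # Treat a cell with no dark pixels as empty and skip it.
            # The threshold is a guess; grid lines may require a lower value.
            if darkest > blank_above:
                continue
            cell.save(out_dir / f"{label}_{i:04d}.png")
            saved += 1
    return saved
```

Calling something like `split_page("export/page-1.png", "chars/x", "x", rows=12, cols=9)` (path and grid size are placeholders) would then fill `chars/x` with one image per written cell. Keeping the box arithmetic in its own function makes it easy to check the grid geometry without opening any images.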
One of the things I realized as I was drawing was how similar certain letters are to each other (and how different others are). “a” and “d”, for instance, look similar in my handwriting, the only difference being the stem on the “d”. “D” and “O” could also be confused, as could “U” and “V”. Using this knowledge, we can restrict ourselves to a subset of the Latin alphabet whose letters are all pretty mutually distinct, like X, Y, and Z.














