I saw a hyped-up science news article about this paper and got briefly nerd sniped trying to figure out what was going on. I still don’t know.
Both the news article and the paper itself make it sound like this is some … fully general neural net approach for solving PDEs, that works for any PDE, and is fast and accurate … and doesn’t even need to know the PDE, it just learns it from solution data, and then after you “train” it on one solution it knows the PDE, and can produce other solutions.
And I’m like, that can’t be real, right? You can’t learn an infinite-dimensional operator from a finite sample. They must be choosing to prefer some operators over others, all else being equal. Also, what does this look like formally as a statistics problem, what measure are the operators being sampled from …
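To make that worry concrete, here is a minimal sketch (my own toy setup, not anything from the papers): two different linear operators that agree exactly on a single "training" function and disagree badly on another. The grid size, the choice of operators, and all names are hypothetical. Without some prior preference between them, one solution sample can't tell them apart.

```python
import numpy as np

# Hypothetical setup: periodic functions on [0, 2*pi), sampled at n grid points.
n = 64
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
basis = np.sin(x)

def op_identity(u):
    """Candidate operator A: the identity map."""
    return u

def op_project(u):
    """Candidate operator B: orthogonal projection onto span{sin(x)}."""
    return basis * (basis @ u) / (basis @ basis)

u_train = np.sin(x)        # the single "training" input
u_test = np.sin(2.0 * x)   # an input never seen during "training"

# Largest pointwise disagreement between the two candidate operators.
train_gap = np.max(np.abs(op_identity(u_train) - op_project(u_train)))
test_gap = np.max(np.abs(op_identity(u_test) - op_project(u_test)))

print(f"disagreement on training input: {train_gap:.2e}")
print(f"disagreement on unseen input:   {test_gap:.2e}")
```

Both operators fit the training pair perfectly, so whatever picks one over the other has to come from the method's built-in bias, not from the data.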
I found this near-contemporaneous paper which goes into more detail, but still doesn’t resolve my confusion. I also found this paper, cited as a competing method (although it shares several co-authors), which goes into much more mathematical detail and proves a general approximation theorem.
I don’t have the energy and interest to read through all these and figure out exactly what they’re actually doing, especially since I suspect it’s not that interesting.
(If PDEs are “generically learnable from finite samples under reasonable conditions” in some non-trivial way, that’s very interesting, but seems like something one could discover with pen and paper, and then go on to win prizes for discovering, without even needing a computer. I wouldn’t expect such a discovery to look like these papers.)
But if anyone else feels like reading these papers, let me know what you find out!
I’ve been reading the Li papers. They are pretty standard incremental-progress-in-NN stuff. True, finding optimal infinite-dimensional operators is even nastier than finding finite-dimensional maps, in principle. But practically, as in much NN research, we are far from the regime of proving ultimate awesomeness and actually in the regime of going “oh cool, this architecture works surprisingly well in practice”. The architecture in this case is a Frankenstein mashup of kernel learning/basis-function decomposition smashed onto (1,1) convolutions, which has an efficient implementation in practice. As for what types of PDEs this biases us towards, the jury is still out, but heuristically, some low-frequency waves plus some wiggly bits at each layer sounds a lot like it might encode classic PDE solvers. If you are prepared to give up the resolution-“independence” of the Li papers, there is a classic series of papers by Haber and Ruthotto that develops a sort-of-duality between ResNets and PDE solvers, claiming that you can translate between the two viewpoints and between different resolutions, sort of: http://arxiv.org/abs/1703.02009 and http://arxiv.org/abs/1804.04272 are a good entry point. Practically, I quite like the Li papers and will probably use them, especially if I can find a Bayesian interpretation, but I don’t think there is anything Next Level here, just a Sweet Hack. I blogged some more on this theme.
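For anyone who wants to see the "low-frequency waves plus wiggly bits" idea in code, here is a rough sketch of one such layer in 1D. It is my own reading of the design, not the papers' reference implementation: the function name `fourier_layer`, the array shapes, the ReLU, and the random weights are all assumptions made for illustration. The spectral path keeps only the lowest few Fourier modes and mixes channels per mode; the pointwise path is the (1,1) convolution, i.e. the same small matrix applied at every grid point.

```python
import numpy as np

def fourier_layer(v, spectral_w, pointwise_w, bias):
    """One hypothetical layer: low-mode spectral multiply plus a pointwise path.

    v:           (n_points, channels) real samples on a uniform periodic grid
    spectral_w:  (modes, channels, channels) complex weights, one matrix per kept mode
    pointwise_w: (channels, channels) real weights shared by all grid points
    bias:        (channels,) real bias
    """
    n = v.shape[0]
    modes = spectral_w.shape[0]
    # norm="forward" scales by 1/n, so coefficients do not depend on grid size.
    v_hat = np.fft.rfft(v, axis=0, norm="forward")
    out_hat = np.zeros_like(v_hat)
    # Keep only the lowest `modes` frequencies and mix channels within each mode.
    out_hat[:modes] = np.einsum("kc,kcd->kd", v_hat[:modes], spectral_w)
    smooth = np.fft.irfft(out_hat, n=n, axis=0, norm="forward")
    # The (1,1)-convolution path: identical linear map at every grid point.
    local = v @ pointwise_w + bias
    return np.maximum(smooth + local, 0.0)  # ReLU, chosen here for simplicity

rng = np.random.default_rng(0)
channels, modes = 2, 4
spectral_w = (rng.normal(size=(modes, channels, channels))
              + 1j * rng.normal(size=(modes, channels, channels)))
pointwise_w = rng.normal(size=(channels, channels))
bias = rng.normal(size=channels)

def sample_input(n):
    """Band-limited test input sampled on an n-point periodic grid."""
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.stack([np.sin(x), np.cos(2.0 * x)], axis=1)

# Same weights, two resolutions; compare on the grid points they share.
coarse = fourier_layer(sample_input(64), spectral_w, pointwise_w, bias)
fine = fourier_layer(sample_input(128), spectral_w, pointwise_w, bias)
resolution_gap = np.max(np.abs(fine[::2] - coarse))
print(f"max difference across resolutions: {resolution_gap:.2e}")
```

The weights live in mode space rather than on the grid, which is where the resolution-“independence” comes from. In this toy it holds to rounding error only because the input is band-limited well below both grids' Nyquist frequencies; rougher inputs would alias differently at each resolution.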













