Zeno's Notes @zenoga - Tumblr Blog

With the keyword search you can look up the following information: "Badestelle" (bathing site), "Bezirk" (district), and "Badegewässerprofil" (bathing water profile).

Water quality website for Berlin. Important in summer!

Interesting blog post with an interesting idea and interesting pointers ...

A very nice example of a refined workflow for solving a quite specific problem. It encourages me to use Inkscape a bit more in the near future ...

Really high-quality Python 3 cheat sheet.

Now on GitHub: Generic Qualitative Reasoner

Before I committed myself to the field of machine learning more than 10 years ago, I worked on symbolic artificial intelligence, in particular qualitative spatial and temporal reasoning. The software package I started back then was continued by other people, and its source code was moved to GitHub one year ago: https://github.com/m-westphal/gqr

#ai freesoftware opensource foss floss

E-Mail and RSS are having a comeback

... and I like it:

https://www.wired.com/story/rss-readers-feedly-inoreader-old-reader/

https://delta.chat/

Koloboke - Java Collections till the last breadcrumb of memory and performance

This is a public service announcement in the interest of more robust numerical calculations. Like matrix inverse, exponentiation is bad news. It’s prone to overflow or underflow. Just try this in R:

> exp(-800)
> exp(800)

That’s not rounding error you see. The first one evaluates to zero (underflows) and the second to infinity (overflows). …
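The same effect can be reproduced in Python. The excerpt above is cut off before any remedy is given, so the log-sum-exp helper below is only one commonly used workaround and not necessarily what the linked post recommends; the function name `log_sum_exp` is made up for this sketch.

```python
import math


def log_sum_exp(log_values):
    """Return log(sum(exp(v) for v in log_values)) without overflowing."""
    m = max(log_values)
    if math.isinf(m):
        # Either all inputs are -inf (the sum is 0) or some input is +inf.
        return m
    # After shifting by the maximum, every exponent is <= 0,
    # so math.exp() cannot overflow.
    return m + math.log(sum(math.exp(v - m) for v in log_values))


# math.exp underflows silently to 0.0, just like R.
print(math.exp(-800))

# Unlike R, Python raises OverflowError instead of returning infinity.
try:
    math.exp(800)
except OverflowError as err:
    print("overflow:", err)

# Staying in log space keeps both extremes finite.
print(log_sum_exp([800.0, 800.0]))    # equals 800 + log(2)
print(log_sum_exp([-800.0, -800.0]))  # equals -800 + log(2)
```

The design choice is simply to never materialize the huge or tiny number itself, only its logarithm.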

#underflow

In my experience, mathematics suffers from a diminishing return effect. The basics are very, very useful… but more advanced mathematics often turns into solutions seeking problems.

To be smart, work on problems you care about – Daniel Lemire's blog

Berlin Buzzwords 2016

After skipping last year, it was time again to visit this nice conference. Here is a brief subjective overview of some talks I saw.

Edit: Added video links. Just click on the bold-face titles.

Acceptably inaccurate: Probabilistic data structures. James Stanier presented Bloom filters (for approximating set membership), count-min sketch (frequency counting), and HyperLogLog (estimating the number of distinct elements in a multiset). The latter was new to me. For each case, James showed Java code for the exact and the approximate solution side by side, and compared memory usage for typical problem sizes. Through this I also learned that Guava now (well ... since 2011) has a Bloom filter implementation. Count-min sketch and HyperLogLog implementations are available in stream-lib.
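To make the Bloom filter idea concrete, here is a minimal sketch in Python. It is not the code from the talk, nor Guava's or stream-lib's implementation; the class name, the default sizes, and the choice of salted SHA-256 as hash family are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive one bit position per hash function by salting SHA-256
        # with the index of the hash function.
        data = str(item).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # An item is "possibly present" only if all of its bits are set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter()
for word in ["lucene", "cassandra", "guava"]:
    seen.add(word)

print("lucene" in seen)  # added items are always reported as present
print("kafka" in seen)   # usually absent, but a false positive is possible
```

The memory footprint is fixed at `num_bits` bits no matter how many items are added; what grows instead is the false-positive rate.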

Edit: A reader pointed out that Adrien Grand talked about HyperLogLog++ (among other things) at Berlin Buzzwords 2015.

Shipping at Scale: ChatOps at GitHub by Georgi Knox. Instead of working from the command line, GitHub operates its servers via chatbots. Think of it as a shared shell -- everybody sees what you are doing, and its consequences. This helps people collaborate on ops issues, especially in a distributed work environment.

ExpAn - A Python library for advanced statistical analysis of A/B tests. Dominic Heger from Zalando presented a library for analyzing A/B tests. It is based on numpy, scipy, and Pandas. Of course it does not do advanced consistency checking, nor will it automagically connect with the software system that you are trying to improve, or even come up with the next A/B test you should run -- even if some in the audience were clearly asking for those features ;-).

Apache Lucene 6: What's coming next? We use Lucene quite a bit at HERE Maps Search. Uwe Schindler’s talks are always fun and interesting for me, given that he usually talks about Lucene details, features, and internals. This talk not only covered new features of Lucene 6 (like the multi-point value reimplementation, immutable Query objects), but also of Lucene 5 (like configurable analyzers, getting rid of Filters, two-phase iterators). I also learned about the Accountable interface, which lets objects report their heap usage.

Wikimedia Content API: A Cassandra Use-case. Eric Evans showed how Wikimedia uses Cassandra to store the HTML renderings for the MediaWiki visual editor. Eric worked on Cassandra before, and he reflected on the experience of now being a user and on the insights he gained from that, such as what he would do differently today as a developer.

Fast Cars, Big Data - How Streaming Can Help Formula 1. Ted Dunning’s presentations are always entertaining. They built a showcase architecture to collect high-throughput event logs from racing cars. To be able to play around, they get the data from a simulated environment. Looks like Ted continues to have fun at MapR ;-). Compared to his past talks, I could not take away as much useful information.

Learning to Rank: where search meets machine learning was a nice introduction to the topic by Andrew Clegg from Etsy, using the RankSVM approach as a running example. Andrew recommended a monograph on the topic. I also learned about another simple trick to get rid of the position bias (a kind of presentation bias) when using click data for learning: encode the document position as a feature, train the model, and set the feature to zero at prediction time. This way, a (linear) model should roughly learn that bias and thus be able to ignore it later. I wonder whether someone has practical experience with this trick. The “standard” way (as far as I am aware) is to only use pairs for learning where the not-clicked document was presented above the first clicked one.
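Both ways of dealing with position bias can be sketched in a few lines of Python. This is my own reading of the two ideas, not code from the talk: the function names, the 0-based position convention, and the decision to pair every clicked document with the skipped ones are assumptions.

```python
def skip_above_pairs(ranking, clicked):
    """Return (preferred, less_preferred) pairs from one result list.

    ranking: document ids in the order they were shown.
    clicked: set of document ids the user clicked.
    Only unclicked documents shown above the first clicked one are used.
    """
    first_click = next(
        (i for i, doc in enumerate(ranking) if doc in clicked), None
    )
    if first_click is None:
        return []  # no click, no preference evidence
    skipped = ranking[:first_click]
    clicked_in_order = [doc for doc in ranking if doc in clicked]
    return [(c, s) for c in clicked_in_order for s in skipped]


def with_position_feature(features, position, training):
    """Append the display position while training; use 0 at prediction time."""
    return list(features) + [float(position) if training else 0.0]


pairs = skip_above_pairs(["a", "b", "c", "d"], {"c", "d"})
print(pairs)

print(with_position_feature([0.3, 1.2], position=4, training=True))
print(with_position_feature([0.3, 1.2], position=4, training=False))
```

The pairs would feed a pairwise learner such as RankSVM, while the position feature variant keeps a pointwise or linear model and relies on it to attribute part of the click probability to the position weight.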

BM25 demystified. Britta Weber explained why BM25, which replaces TF/IDF as the default retrieval model in Lucene 6, is called a probabilistic model. Details can be found in this paper. A nice reminder for me to have a closer look at this stuff again.
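For reference, the textbook BM25 score of a single query term can be written down in a few lines. This is a sketch of the standard formula, not Lucene's actual implementation; the function and parameter names are made up, the IDF shown is one widely used non-negative variant, and k1 = 1.2 and b = 0.75 are common default choices.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, num_docs, doc_freq,
                    k1=1.2, b=0.75):
    """Score one query term for one document.

    tf: occurrences of the term in the document.
    doc_len, avg_doc_len: document length and average length in the corpus.
    num_docs: number of documents in the corpus.
    doc_freq: number of documents containing the term.
    """
    # Rare terms get a high weight, frequent terms a low (but positive) one.
    idf = math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Longer-than-average documents are penalized, controlled by b.
    length_norm = 1.0 - b + b * doc_len / avg_doc_len
    # The tf part saturates: it approaches (k1 + 1) as tf grows.
    return idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)


for tf in (1, 2, 5, 50):
    print(tf, bm25_term_score(tf, doc_len=100, avg_doc_len=100,
                              num_docs=1000, doc_freq=10))
```

Unlike plain TF/IDF, repeating a term many times yields diminishing returns: the score is bounded by `idf * (k1 + 1)`.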

staiger

No solutions, no alternative, just a bit of politically incorrect grumbling: the AfD

No solutions, no alternative, just a bit of politically incorrect grumbling, and always good for a provocation: the AfD

My friend Bushido doesn't give a damn and wants to vote for them anyway. Just like that.

You can do that. But you don't have to:

A short letter on why the so-called protest party does not even work as a protest, conformist to the system as it is.

(Plus a short remark on the Gauland-FAS dispute: that many Germans like to watch a Boateng on television but do not want to live next door to a Boateng may be a fact that Mr. Gauland rightly pointed out. The question is what follows from that. Do you work on getting people to cope with more Boatengs, or on having fewer Boatengs move in next door? In the case of the AfD, I suspect they are working hard on the latter.)

(via Song: Erdowie, Erdowo, Erdogan | extra 3 | NDR - YouTube)

... of DPLL fame, among many other things ...
