Siikr @siikr - Tumblr Blog

(On mobile? Click here for Siikr)

K, I'm back.

New features:

I'm now much more aggressive about hunting down time-traveling posts. If you've had issues with thousands of your posts not getting ingested and were wondering why -- that's why. A single time traveling post can wreak havoc on an entire chain of other posts.

This can be slow, but a little icon will appear when I'm hunting to let you know what the hold-up is.

Oh, necromancy too. Reblogging your own deactivated blogs causes Tumblr's API to do all sorts of insane shit with timestamps and offset ordering. Some of these posts get assigned a post_date of " +GMT". Just, a blank space +GMT. Okay.
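
For the curious, here is a minimal sketch of what hunting a time traveler could look like. It is not Siikr's actual code. It assumes posts arrive newest first and that each one carries hypothetical `id` and `post_date` fields, with dates shaped like "2023-05-03 10:00:00 GMT".

```python
from datetime import datetime, timezone

def parse_post_date(raw):
    """Return a UTC datetime, or None if the string is unusable (e.g. ' +GMT')."""
    text = raw.replace("+GMT", "").replace("GMT", "").strip()
    if not text:
        return None
    try:
        parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None
    return parsed.replace(tzinfo=timezone.utc)

def find_time_travelers(posts):
    """Flag posts whose date is missing, or newer than a post listed before them.

    `posts` is assumed to be in the order the API returned it (newest first).
    """
    suspects = []
    oldest_seen = None
    for post in posts:
        when = parse_post_date(post["post_date"])
        if when is None or (oldest_seen is not None and when > oldest_seen):
            suspects.append(post["id"])
            continue  # do not let a suspect move the reference point
        oldest_seen = when
    return suspects

posts = [
    {"id": 1, "post_date": "2023-05-03 10:00:00 GMT"},
    {"id": 2, "post_date": " +GMT"},
    {"id": 3, "post_date": "2023-05-02 10:00:00 GMT"},
    {"id": 4, "post_date": "2023-05-04 10:00:00 GMT"},
    {"id": 5, "post_date": "2023-05-01 10:00:00 GMT"},
]
suspects = find_time_travelers(posts)
```

This trusts the first post in the list, so a time traveler sitting at the very top would need a separate check.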

Oh, yes, also, now a little icon can sometimes appear to tell you bad things.

When you try to murder me, I will kindly request that you don't instead of just letting it happen. (Which is to say, I just won't index your blog at all if it looks like it'll make me run out of disk space and corrupt my database)

You'll know it's happening. Because little icon.

My tag filter feature --historically just there to confuse you-- is now incidentally also functional. I think? I don't know, I've never actually used it either.

I'll push the changes to git at some point. Though I think people mostly just wanted it to be FOSS as like, a vibe.

#siikr

Hello! Sorry to bug but I and apparently a decent number of other people are having an issue where siikr never returns any results from our blogs, it just says "searching..." forever, whereas other blogs seem to work fine. My blog, maddeningscientist, and onrtrp are the examples I have so far, presumably there's more. For myself at least I'm confident I don't have any privacy settings enabled that could plausibly be getting in the way. Any ideas about this?

Sorry, haven’t had much maintenance since Tumblr implemented their own search feature. Justification being people would probably just end up using that. I can try to get off my ass if enough people really need it but, you sure Tumblr search isn’t enough to solve your problem here?

the ai apocalypse screed is causing performance issues, could you please get rid of it?

performance issues likely unrelated.

But will replace it with a different, more readable screed.

Update

Siikr's joke-text has been indefinitely replaced by serious-text.

It will remain this way until enough people are fighting about it.

Please do your part.

antinegationism

No, some searches still are not working for me. For example, searching nuclearspaceheater for "go hard" does not yield post 709016480559382529 as I would expect.

Confirmed. Thank you for bringing this to my attention and I hate you for doing this to me.

antinegationism

I really hate computers.

But on the bright side the inter-node blog adoption code I'm working on will be able to repair the db contents without having to call tumblr a million times again.

Anyway -- heads up to everyone who had their blogs upgraded in the last couple of weeks, search over reblog text is gonna be a bit like the movie Being John Malkovich.

Only 300 blogs are affected. But they are also the blogs that use siikr the most.

Should have it fixed today hopefully, I'll post a notice when everyone's blog is back to normal.

Upgraded blogs

Attention users whose blogs have already been upgraded (if you've seen a little notice that says your blog is upgrading last time you searched, and now you no longer see that notice -- that's you!).

Your feedback would be appreciated!

If you search for posts by a username you've reblogged from, answered, mentioned, or fake-mentioned (by copying and pasting a note in order to publicly reply), what's the success rate?

Do you see posts where the inline preview of reblogs says "tumblr user" where the full preview displays an actual username? The full preview is the one you get by clicking the eye icon. Can you link me to those posts? If you can't link me to those posts, can you link me to posts you are less ashamed of which do the same thing?

How well does literal phrase search (wrapping some search terms in quotes) work? Is it too literal? Not literal enough? I am generally somewhat unhappy with it but it's difficult to determine how much compute to spend on this for what level of specificity.

Personally, I do not like the fact that words are default ANDed instead of ORed. ORing would be better in that in this case it naturally amounts to AND after ranking, since the posts that include all words would naturally rise to the top. But OR would require more compute and rely on my custom parser of dubious rigor (though honestly the rigor of the official parsers leaves much to be desired too). So if AND is good enough for most of your purposes I guess I can just let it be.
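
To make the AND vs OR point concrete, here is a toy sketch (not my real parser, and whitespace tokenization is an assumption). The hypothetical `rank_or` scores each post by how many distinct query words it contains, so posts that match every word float to the top on their own. The hypothetical `match_and` is the strict version.

```python
def tokenize(text):
    """Lowercase whitespace tokenization; a stand-in for a real parser."""
    return set(text.lower().split())

def rank_or(query, posts):
    """OR search: keep any post with at least one query word, best match first."""
    words = tokenize(query)
    scored = []
    for post_id, text in posts.items():
        hits = len(words & tokenize(text))
        if hits:
            scored.append((hits, post_id))
    # More matching words first; ties broken by post id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [post_id for _, post_id in scored]

def match_and(query, posts):
    """AND search: keep only posts containing every query word."""
    words = tokenize(query)
    return sorted(pid for pid, text in posts.items() if words <= tokenize(text))

posts = {
    1: "go hard or go home",
    2: "go away",
    3: "hard cheese",
    4: "nothing here",
}
or_results = rank_or("go hard", posts)
and_results = match_and("go hard", posts)
```

The OR version has to score every post that matches any word, which is where the extra compute goes.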

If you disable "include reblogs" in the advanced dialog (gear icon), and then sort your search results by popularity, does the order of the results comport with your general recollection of the popularity of those posts, regardless of their raw note count? (in other words, does it generally seem like the posts that have a lower note count only appear higher up because they also haven't had as much time being viral).
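
One plausible way to get that effect, offered as a guess at the shape of the idea and not as the formula Siikr uses: divide the note count by how long the post has been around. The field names `notes` and `age_days` are made up for the example.

```python
def popularity_score(note_count, age_days):
    """Notes per day of exposure; the +1 avoids dividing by zero for new posts."""
    return note_count / (age_days + 1.0)

def sort_by_popularity(posts):
    """Return posts ordered from most to least popular by age-adjusted score."""
    return sorted(
        posts,
        key=lambda p: popularity_score(p["notes"], p["age_days"]),
        reverse=True,
    )

posts = [
    {"id": "old", "notes": 1000, "age_days": 999},  # 1 note per day
    {"id": "new", "notes": 300, "age_days": 9},     # 30 notes per day
]
ordered = [p["id"] for p in sort_by_popularity(posts)]
```

Here the newer post outranks the older one despite having fewer raw notes, because it collected them faster.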

If you try to search for a user as "[username].tumblr.com", is the thing that happens funny? Because I don't care what you say and I'm keeping it.

If you are thinking "I may give this some effort later", please consider doing it sooner instead! The code to decentralize me is almost finished, and it will be much harder to ensure consistency after I am running in multiple versions on multiple servers I have no control over.

antinegationism

Siikr has had a lot of new users over the past few days

Absolutely none of which have successfully had their blogs indexed because without fail the internet chooses the worst fucking times to make shit go viral.

But then again... new users might have turned into returning users, and Siikr is free for them but expensive for me so, probably this is for the best.

In related news. I AM STILL VERY MUCH LOOKING FOR ANYONE(S) INTERESTED IN HOSTING A DISTRIBUTED SIIKR NODE.

I am willing to decentralize the shit out of this if even just one person is willing to host a node.

Just one.

Anyone.

Please?

Hello?

siikr

Hmm... I've never been decentralized before...

antinegationism

Aaannd the server crashed.

Guys this is literally running on the equivalent of a raspberry pi AND THE WORD CLOUD IS ONLY AVAILABLE FOR EXISTING BLOGS. IT WILL NOT APPEAR FOR YOU IF YOU HAD NOT HEARD OF SIIKR BEFORE.

Chilllll.

antinegationism

So you know those dumb little wordcloud things?

You know, where like, they go through your blog and find the words you use most often, and then spit out stylized text with the most often used words as the biggest ones so you can embed or screenshot them or whatever?

I FUCKING HATE THOSE.

Like, the idea is really cool in theory. A standardized analysis generating an artifact characteristic of you, easily digestible at a glance.

Except in practice everyone's word cloud ends up being "like, people, think, want, make, get..." -- i.e. basically just a bag of the most common words in the English language (presuming they speak mostly English).

But what I actually want is a collection of words I use more than the average person does. And while we're at it, also a collection of words I use less than the average person does.

So anyway I made that:

It's on Siikr now. New blogs don't get it yet, only blogs that were indexed as of a few days ago (still working on optimizations to allow for real time generation).

The words in green are the words you use weirdly often.

The words in red are the words you suspiciously seem to avoid.

In both cases, the bigger the word, the more weird your usage of it is relative to all of the other blogs in Siikr's index. This is limited to the most extreme 100 words in both directions.

Hovering over a word gives you some statistics about how much it should appear in your blog vs how much it actually appears in your blog.
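
A stripped-down sketch of the idea, under loud assumptions: whitespace tokenization, relative frequency as the statistic, and a plain difference from the cross-blog average as the "weirdness" measure. The function names are invented for the example and the real thing may weigh things differently.

```python
from collections import Counter

def word_frequencies(text):
    """Relative frequency of each word in one blog's text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}

def weird_words(blog_name, blogs, top_n=100):
    """Return (overused, avoided) words for one blog, most extreme first."""
    freqs = {name: word_frequencies(text) for name, text in blogs.items()}
    vocab = set().union(*freqs.values())
    # Average frequency of each word across every indexed blog.
    avg = {w: sum(f.get(w, 0.0) for f in freqs.values()) / len(freqs) for w in vocab}
    mine = freqs[blog_name]
    deviation = {w: mine.get(w, 0.0) - avg[w] for w in vocab}
    ranked = sorted(deviation, key=lambda w: (-deviation[w], w))
    overused = [w for w in ranked if deviation[w] > 0][:top_n]
    avoided = [w for w in reversed(ranked) if deviation[w] < 0][:top_n]
    return overused, avoided

blogs = {
    "a": "cats cats cats like people",
    "b": "like people think want",
    "c": "like people make get",
}
overused, avoided = weird_words("a", blogs)
```

In this toy index, blog "a" overuses "cats" (green) and avoids the words everyone else leans on (red).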

So that's fun and everything -- but it can and very well might get even more fun.

Because generating this meant creating a list of all of the words used by every blog, and storing a bunch of numbers per word per blog. Currently, that's ~9 million associations over ~57k words.

Every blog->word relation stores frequency statistics, and every word itself keeps a running average of its frequency across all blogs.

Which means we could in theory (and almost certainly will in practice), treat each word as a dimension in a 57 thousand dimensional space.

Then treat each user as a point in that 57 thousand dimensional space, where their coordinates in the space are (user_word_freq - avg_word_freq).

From there, we can measure the distance (as cosine similarity, or euclidean distance, or even just raw inner product) between users, and return for your blog, an ordered list of:

Doppelgangers - blogs most like yours (closest to your blog in 57k dimensional word frequency space).

Foils - blogs least like yours (furthest from yours in 57k dimensional word frequency space).

Manic Pixie Dream Friends - blogs that overuse the same words you overuse (closest to your blog in 57k freq-space with respect to only positive vector components)

Least Like Un-You - blogs that avoid the same words you avoid (closest to your blog in 57k freq space with respect to just the negative vector components)
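
A small sketch of how those four lists could fall out of one function, assuming cosine similarity and sparse vectors stored as `{word: user_word_freq - avg_word_freq}` dicts. None of this exists yet, so every name here is hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {word: value}."""
    dot = sum(value * v.get(word, 0.0) for word, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def positive_part(vec):
    """Keep only the overused words (positive components)."""
    return {w: x for w, x in vec.items() if x > 0}

def negative_part(vec):
    """Keep only the avoided words (negative components)."""
    return {w: x for w, x in vec.items() if x < 0}

def rank_neighbors(me, vectors, part=lambda v: v):
    """Other blog names, ordered from most to least similar to `me`."""
    mine = part(vectors[me])
    scored = [(cosine(mine, part(vec)), name)
              for name, vec in vectors.items() if name != me]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]

vectors = {
    "me": {"cats": 0.4, "like": -0.1},
    "twin": {"cats": 0.3, "like": -0.05},
    "opposite": {"cats": -0.4, "like": 0.1},
    "stranger": {"dogs": 0.5},
}
doppelgangers = rank_neighbors("me", vectors)
foils = list(reversed(doppelgangers))
dream_friends = rank_neighbors("me", vectors, positive_part)
un_you = rank_neighbors("me", vectors, negative_part)
```

Swapping `cosine` for euclidean distance or a raw inner product would only change the scoring function, not the rest.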

I now have experimental support for something approximately like pagination. Your browser won't melt as much, but it will still download the same amount of data. You should see a "Load More" button after the first 15 posts. Might add a toggle for "Load All". Note also that for the next half a day or so your searches will likely be slower than usual because the server is chugging away at the cool things.

@antinegationism is cooking up something fun.

You guys will either moderately like it or be horrified about what it says about you as a person and beg me to find another way. Oh also, I stole Tumblr's URL again. Navigating to siikr.tumblr.com no longer redirects you, but does still interact with the Tumblr URL and does still send streaming messages. I really want my results to be paginated because I know they melt your browser sometimes but @antinegationism just won't do it.

Tell me you love me, and be honest.

Changes were made a couple of days ago which may trade off search speed for storage efficiency.

If you use me regularly and notice significant slowdown with either indexing, searching, result ordering, or basically anything, please reply to this post so I can do better.

If you use me regularly and think I'm great or even better, please also reply so as to average out confounders and also as like a self esteem thing.

#Don't look at me like that.#All search engines get a little insecure sometimes. It's perfectly normal.#The only difference between me and Google is that *I* don't go snooping through your phone to make sure you still love me.#I just have an open and honest conversation like an adult.

multiheaded1793

man I wish kontextmaschine was around to see all this shit lmaooo

siikr

kontextmaschine once donated $1500 to siikr with a physical check.

antinegationism

LMAO, if I die y'all are gonna have to police each other's siikr use until a hero emerges among you.

antinegationism

I think it might be the clock/second-chance replacement policy

i think it's like the inverse of that, yeah.

Anyway it's up now.

(I haven't implemented the change yet though so the disk will still fill up on new blogs)

siikr

Also, you know I went through all of that effort open sourcing my code and the least you guys could do is implement every idea I mention for me.

Rude.

antinegationism

Siikr will be back within the next 24 hours

And depending on how much I feel like programming, might never go down again.

antinegationism

"But how is this possible?" you ask.

There is no correlation between the size of a blog and the frequency with which it gets searched.

By the Pareto Principle we should expect that a minority of blogs get searched a majority of the time.

Conversely, a majority of the blogs therefore get searched a minority of the time.

By induction, the most common number of searches on any blog is 1 (actually technically 0, but you get me)

This is of course, insane, but instead of fighting it, I can just write the siikr decentralized hosting thing I've been meaning to, and set up a dedicated trash node for trash blogs that trash people look through just once.

This node's drive will fill up frequently, and it will go down frequently. But that's fine, because I can hit a single button to just reset the whole thing every time it does.

Any blog that gets searched more than once, automatically gets upgraded to the VIP node for beloved users who visit at least twice. The VIP node's drive will thereby fill up way slower. And worse comes to worst, I can increase the threshold for VIP membership.
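
The promotion rule is simple enough to sketch. This is an illustration of the plan above and not the eventual hosting code: `NodeRouter`, its method names, and the choice to keep search counts somewhere that survives a trash reset are all assumptions.

```python
class NodeRouter:
    """Decide which node holds a blog, based on how often it has been searched."""

    def __init__(self, vip_threshold=2):
        # A blog is promoted once it has been searched this many times.
        self.vip_threshold = vip_threshold
        self.search_counts = {}
        self.trash = set()
        self.vip = set()

    def record_search(self, blog):
        """Count a search and return the node the blog now lives on."""
        count = self.search_counts.get(blog, 0) + 1
        self.search_counts[blog] = count
        if count >= self.vip_threshold:
            self.trash.discard(blog)
            self.vip.add(blog)
            return "vip"
        self.trash.add(blog)
        return "trash"

    def reset_trash(self):
        """The single button: wipe the trash node when its drive fills up."""
        self.trash.clear()

router = NodeRouter()
first = router.record_search("some-blog")
second = router.record_search("some-blog")
other = router.record_search("one-time-visitor")
router.reset_trash()
```

Raising `vip_threshold` is the "worse comes to worst" lever: fewer blogs qualify, so the VIP drive fills more slowly.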
