Discover Top Posts Tagged with #hashing

look, it's the gang! the animals! the gays! the genders! the constructs of beings! @ayviedoesthings

#art #digital art #therian #bunnies #dragon #dragon art #fanart #bunny #cats #artists on tumblr #hashing #sketches #my art #doodle #drawing

I'm afraid I've been thinking.

A dangerous pastime, I know.

In Animal Crossing, items are identified by a 16-bit code. -1 means no item, and (at least in New Horizons) -2 on the map means it's part of a larger item.

These values are sequential. The axe, the regular one, is item #2100 for example, and this is the value stored in your save game.

Being not meant to be modded, this works fine. But then you get intricate projects like Jubbsticks and suddenly the door is opened for conflicts. Jubbsticks is built on version 2.0, and no more official content is likely to ever be released, so the item IDs are effectively frozen on that end and more can be assigned in mods.

But what if you have more than one mod that adds new items? Aside from the whole "having a single huge CSV table for almost all item data" thing making merging nearly impossible were it not for this thing, you now have the risk of one mod's item IDs conflicting with another. And that's not something the mod manager can resolve.

(Fun fact: Starbound used to have the same problem with species, but that requirement was lifted. Now, it has the same problem with materials, so the Starbounder Wiki has a page where you can register ranges of material IDs for your projects in an attempt to prevent conflicts.)

With that in mind, since Project Special K is specifically meant to be mod-friendly in almost every way Animal Crossing isn't, its save data doesn't use these 16-bit ID numbers. Placed furniture and dropped items are in fact not saved numerically at all, but as ID strings. That is, if you drop or store an axe, it's a line in a JSON file that says "acnh:axe" or whatever.

ID numbers are still used, though. Because comparing numbers is faster than comparing strings, when the game loads it hashes all these identifiers like "acnh:axe" and stores the result alongside it for quick lookup. Then when asked to load an "acnh:axe/14", it takes that first part, hashes it, and uses that to quickly look through what could be hundreds of items. Finding the axe, it can now instantiate one with a wear level of 14.
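The load-time hashing and lookup described above can be sketched roughly like this. This is a minimal Python illustration, not the game's actual code: `ItemDef`, `ITEM_DEFS`, `hash_id`, and `load_item` are hypothetical names, and treating a missing "/14" part as a wear level of 0 is an assumption.

```python
import zlib
from dataclasses import dataclass


def hash_id(identifier: str) -> int:
    """CRC32 of the UTF-8 bytes of an ID string, as an unsigned 32-bit int."""
    return zlib.crc32(identifier.encode("utf-8"))


@dataclass
class ItemDef:
    identifier: str
    name: str


# Hypothetical item definitions; a real game would load these from its assets.
ITEM_DEFS = [ItemDef("acnh:axe", "Axe"), ItemDef("acnh:shovel", "Shovel")]

# Built once at load time: numeric hash -> definition.
ITEMS_BY_HASH = {hash_id(d.identifier): d for d in ITEM_DEFS}


def load_item(saved: str) -> tuple[ItemDef, int]:
    """Turn a saved string such as "acnh:axe/14" into (definition, wear)."""
    identifier, _, extra = saved.partition("/")
    definition = ITEMS_BY_HASH[hash_id(identifier)]
    wear = int(extra) if extra else 0  # assumption: no suffix means unworn
    return definition, wear


definition, wear = load_item("acnh:axe/14")
print(definition.name, wear)
```

Building the table this way also makes collisions easy to spot at load time: if two different ID strings produce the same key, the second silently replaces the first unless you check for it.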

Right now, those hashes are CRC32. Half because it's simple, half because I already have Minizip involved in my asset handler, and half because ACNH uses CRC32 to obfuscate and parse string values in those CSV files.

At some point I considered storing dropped and placed stuff the way ACNH does, so just for fun I did a little test: I'd take the entire corpus of ACNH items and use CRC16 instead. Then I'd look for collisions.

The axe became 0x7CFD, which is the same as Maddie's photo. There are 252 such pairs out of 5207 items total, not even counting clothing.

With CRC32, there were none.

But I did just say "not even counting clothing" just now, didn't I? Let me just run that check again with the clothing included...

░▒▓█ Half an hour later... █▓▒░

Out of 6785 items, including tools, furniture, and clothing, I found one CRC32 hash collision. You can't even call it that really: the tank object has the same ID as the tank clothing.

This is what I have my handy dandy cheat file for. Not just for injecting romantic/sexual preferences and gender identities into villager data.

Fixing this one little thing aside...

What do you suppose are the chances of some potential future mod causing a CRC32 collision? Is it worth the effort to use some other algorithm?
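One way to put a rough number on that question is the birthday bound. The sketch below assumes the hash behaves like a uniformly random function over the ID strings, which is only an approximation for CRC32 on short, similar-looking names; the function names are made up for illustration.

```python
import math


def expected_colliding_pairs(n_items: int, bits: int) -> float:
    """Expected number of colliding pairs among n_items random bits-bit hashes."""
    return n_items * (n_items - 1) / (2 * 2 ** bits)


def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance that at least one pair of hashes collides."""
    return 1.0 - math.exp(-expected_colliding_pairs(n_items, bits))


# Item counts from the post, plus two hypothetical heavily-modded totals.
for n in (6785, 20_000, 100_000):
    print(n, f"{collision_probability(n, 32):.4f}")

# The 16-bit experiment from the post, for comparison.
print(f"{expected_colliding_pairs(5207, 16):.1f}")
```

Under that assumption, 6785 items in a 32-bit space give roughly a half-percent chance of any accidental collision, while 5207 items in a 16-bit space should produce on the order of 200 colliding pairs, which is in the same ballpark as the 252 found.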

CRC16 may be out, but surely something like simply adding up all the character values would be so much worse.
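To see why a plain sum is so much worse, here is a small comparison. The two IDs are hypothetical; the point is that a sum ignores character order, so any two IDs made of the same characters collide, whereas CRC32 depends on position.

```python
import zlib


def additive_hash(identifier: str) -> int:
    """Sum of the byte values, kept to 16 bits. Character order is lost."""
    return sum(identifier.encode("utf-8")) & 0xFFFF


def crc32_hash(identifier: str) -> int:
    """CRC32 of the UTF-8 bytes of the identifier."""
    return zlib.crc32(identifier.encode("utf-8"))


# Hypothetical IDs: same letters, different order.
a, b = "mod:tab", "mod:bat"

print(additive_hash(a) == additive_hash(b))  # True: anagrams always collide
print(crc32_hash(a) == crc32_hash(b))        # False for this pair
```

A sum also wastes most of its range: seven or eight ASCII characters can only add up to a few hundred distinct values, so short IDs end up crowded into a tiny slice of the available numbers.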

#project special k #hashing

Those stairs? They've always been there

#mbq art #art #ink #markerart #drawing #markers #ink markers #ink en #hashing #horror #nightmare #traditional art

Huge post about perceptual hashing

There's this service called StopNCII, NCII being "Non-Consensual Intimate Images," which is a more "professional" way of saying revenge porn. Anyone anywhere in the world can submit intimate (sexual, nude, etc.) images depicting themselves and the service provides a secure way for other platforms to determine if that image is being shared, so that they can prevent the transmission. Spoiler alert: the actual images never reach the StopNCII servers; the service uses secure image-free methods to detect images.

There's some justifiable skepticism about this service. You should be skeptical about services like this. Personally I think this service is likely to work for the people's best interest, and almost certainly takes actual security measures to prevent exfiltration of intimate images. I'll go over the technology used and its potential issues, but if you believe you are likely to have your (real or fake/altered) intimate images exposed on the internet, you should use this tool, and refrain from saying that you did to anyone you know. The second bit is explained at the very end of this post.

StopNCII.org is operated by the Revenge Porn Helpline, which is part of SWGfL, a charity that believes that all should benefit from technology.

First off, of course, you cannot remove images off the entire internet, in the same way you can't remove printed photographs off the face of the planet. However, the big social media services (Facebook, Instagram, Threads, Twitter/X, TikTok, Reddit, Google, Bing, and various adult platforms) all participate, meaning that if the image is shared on, for instance, Facebook, it will be detected and fail to post/send, potentially leading to the user being suspended if Facebook staff confirms the violation.

Images aren't shared to the service

A core part of how these sorts of services work is that they do not store actual images. This is for a number of reasons, one being that images are large and comparing images is expensive, but the most important for this use case is privacy. A service which aims to prevent the distribution of illegal sexual material (like revenge porn) ought to not be in the position to itself distribute the material (like through data leaks).

Instead, any images "submitted" to the service are hashed on your personal device, then that hash is uploaded to the service.

Perceptual hashing

A hash is an algorithm which takes in data and produces a (very long) number which represents the data. The most important features of a hash are that the number is always at most a specific number of digits (usually measured in bits) and a given input will always produce the same output. If you hash a screenshot of this page you will get some number, then if you hash the same screenshot again, the same number will be spit out. If you hashed a different image, the number would (almost certainly) be different.

There are three main types of hashes but for this explanation only one matters: the perceptual hash. Perceptual hashes can be thought of as a distilled representation of an image, capturing the core details of an image which distinguish it from other images. Perceptual hashes work in such a way that the numbers they spit out will be similar for similar images. Below is an image and its hash, as well as modifications of the image and their hashes. The hashes are computed using the same algorithm used by StopNCII (called PDQ).
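To make "similar images give similar numbers" concrete, here is a toy example. It is not PDQ: it uses a much simpler technique usually called an average hash, on made-up 8x8 grayscale grids, purely to show how similarity between hashes is measured (by counting differing bits, the Hamming distance).

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Toy perceptual hash: one bit per pixel, set when the pixel is brighter
    than the image's mean. Expects a small grayscale grid (here 8x8)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic "images": a left-to-right gradient, a brightened copy of it with
# one altered pixel, and an unrelated pattern of horizontal stripes.
gradient = [[x * 32 for x in range(8)] for _ in range(8)]
brighter = [[min(255, p + 10) for p in row] for row in gradient]
brighter[0][0] = 255
stripes = [[255 if y % 2 else 0 for _ in range(8)] for y in range(8)]

print(hamming(average_hash(gradient), average_hash(brighter)))  # 1
print(hamming(average_hash(gradient), average_hash(stripes)))   # 32
```

The edited copy lands one bit away from the original, while the unrelated image differs in half of the 64 bits, which is about what you would expect from two hashes that have nothing to do with each other.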

Fun fact: these sorts of hash algorithms are used in image searching.

The way perceptual hashes are calculated usually involves several steps, like scaling and color conversion. Here is the process for PDQ:

As the image states, the image is scaled down to 64x64 pixels in the middle of the process.

Images courtesy https://github.com/darwinium-com/pdqhash

Upsides and downsides of perceptual hashing

The upside of this scheme is that it's possible to identify and flag potentially abusive images without ever distributing them between services. The user can hash their image, send the hash to the StopNCII service, then participating platforms can hash images sent on their services and check the database of hashes (not images) without anyone having ever exchanged an actual image between platforms.
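The flow described above might look something like the following in outline. Everything here is a stand-in: `HashRegistry` is a hypothetical class, the 64-bit hash values are invented, and the distance threshold is arbitrary rather than whatever StopNCII or its partners actually use.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


class HashRegistry:
    """Stand-in for the shared database: it only ever holds hashes."""

    def __init__(self, max_distance: int) -> None:
        self.max_distance = max_distance
        self._hashes: set[int] = set()

    def submit(self, image_hash: int) -> None:
        """Called with a hash that was computed on the user's own device."""
        self._hashes.add(image_hash)

    def matches(self, candidate: int) -> bool:
        """Called by a platform with the hash of an image being posted."""
        return any(hamming(candidate, h) <= self.max_distance
                   for h in self._hashes)


registry = HashRegistry(max_distance=4)      # threshold chosen arbitrarily
registry.submit(0x0F0F0F0F0F0F0F0F)          # the image itself is never sent

print(registry.matches(0x0F0F0F0F0F0F0F0E))  # True: only 1 bit away
print(registry.matches(0xFFFF0000FFFF0000))  # False: 32 bits away
```

A linear scan like this is fine for a sketch; a database with many hashes would need some kind of index to search by distance efficiently.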

There are a few downsides though. One is that perceptual hashing is not infallible: it is possible to fool it by extreme cropping or editing, sophisticated noise methods (called evasion in the literature), or by encryption. Good end-to-end security employing encryption in general makes these sorts of systems infeasible and ineffective. End-to-end security is only really viable for small invite-only groups (like DMs) though, so this scheme is still viable for public sharing, as well as unsecured DMs (like those on the partnered platforms).

I think most would expect that sort of limitation. What people might be more concerned with is whether the hashes could somehow be reverse engineered back into the original image. This process is called inversion in the context of perceptual hashes (and a preimage attack for cryptographic hashes).

Reverse engineering perceptual hashes

Most perceptual image hashes actually do store some sort of image data. For PDQ, they store the low frequency data of a scaled down 64x64 version of the input image. If we take the hash bits, reverse their ordering, then arrange them in a 16x16 grid in the upper left corner of a blank image, we can use the inverse discrete cosine transform to convert the hash back into an image:

Truth be told, I don't even know if I actually did this process correctly, hence why there are two reversed images. One of them is correct (I'm pretty sure), but it's really impossible to tell because the reversed hashes are unrecognizable even next to the original image. While yes, there is some actual image data in a PDQ hash, it's totally unrecognizable.

This is a pretty general property of perceptual hashes. These hashes have to be tiny: PDQ hashes are 256 bits long. Any image that the PDQ algorithm receives gets smashed down into just 32 bytes (the same amount of data as 8 raw RGBA pixels), so a lot of data gets thrown out.

Because of this, the best perceptual hash inversion methods use machine learning techniques, and an example of an inversion attack against PhotoDNA (which is similar to PDQ but a different algorithm) can be seen here:

This is from a 2024 paper (pre-print) which investigates evasion and inversion attacks against perceptual hashing, and proposes some solutions. The leftmost image is the original, the second leftmost is the reversal of the hash. As you can likely see, the reversed image is extremely blurry and almost unintelligible, certainly unrecognizable. The images to the right are reconstructed from perturbed hashes, which are simply hashes with some of the digits messed around. The reversed perturbed hashes are even blurrier.

That still seems pretty spooky, right? Couldn't StopNCII just train a massive reverser and get my images? Well, that figure above is the absolute best case scenario for machine learning assisted reversing. The researchers could only get meaningful (still very blurry) images when they used extremely specific datasets; the reverser for that figure is only trained on celebrity head shots. As soon as they tried to train the reverser on more varied images, it failed and couldn't produce anything sensible. Furthermore, the reverser couldn't successfully reverse hashes from PDQ (the algorithm used by StopNCII), just PhotoDNA. From the paper:

In general, hash-inversion failed over STL-10 [varied image dataset] while some level of success could be seen in MNIST [hand drawn numbers] and CelebA [celebrity faces] over PhotoDNA hash only. The reason is that the MNIST and CelebA images have similar or regular formation. The reconstructed images tend to converge to such common formation when the hash is long enough. However, even in this case, the reconstructed CelebA images could not be used to recognize the original face.

This means that a hypothetical inversion attack for StopNCII hashes would have to consist of many specially trained hash reversal models: one specifically for close-ups of breasts, one specifically for high-angle nude selfies, one specifically for skimpy cowboy outfits, ... Assuming this was done, the results of the reversal would still be very blurry and unidentifiable, with each model likely producing its own distinct "reasonable" reversal for the same hash, because remember: the process of generating the hash rescales the image to a minuscule 64x64 pixels, then throws even more data away. Even if there were some super dataset and some super inversion model, it could only ever reliably manage to recover at most a very blurry black and white 64x64 image. Any additional "data" is pure guesswork.

What if they're lying about hashing?

StopNCII could feasibly lie about hashing the images and simply store them instead. This would be an extremely bad idea for them, though. Many jurisdictions (including the UK, where it is based) have laws about disseminating the exact sort of content StopNCII exists to thwart. If StopNCII transmitted and stored actual images, the service would be liable to heavy fines and its operators to imprisonment.

Of course, liars exist everywhere, but this would be something which would be very easy to verify by anyone. Simply submit an image while the browser console's network tab is open. If a large transfer the size of the image you submitted is sent to the service, it means that they are lying about hashing. If not, well then they were telling the truth.

You can also read the JavaScript code of the service, which your browser uses to perform the hashing. I have not reverse-engineered the code to definitively say whether the hashing is done or not, but the code includes variables and functions relating to "pdq" and hashing, which we would expect if it were hashing the images.

A note about partner platforms

A key component of this system and other perceptual hashing or automated content detection systems is the platform. Platforms are the ones who actually enforce the deletion, suspension, etc. processes.

In the case of StopNCII I personally don't have much concern over how the platform side is operated since platforms can be quite conservative with regards to punishing users who send images which appear to match hashes. Matching images can simply be automatically deleted, with repeat apparent violations causing an account to be put under monitoring.

This sort of perceptual hashing system also exists for detecting child sexual abuse material (CSAM) and with that there's a much more delicate balance that platforms have to strike. Platforms don't want to store that content even for review since it is highly illegal, extremely so, the world over. Automated suspensions are a common way platforms handle these cases. Since hashes are liable to collisions, which is when two different things hash to the same number, fully automated suspensions can cause (and have caused) innocent users to be suspended. This has been a problem on, for instance, Discord.

In the case of Facebook with StopNCII, they explicitly say that they do not deliver automated suspensions when revenge porn is detected (source). Though of course take the article with a grain of salt, as it says that "machine learning" is used in the process (PDQ is an entirely conventional algorithm) and Facebook is not the most trustworthy entity out there. Still, I doubt they would outright suspend a user if they send an image which matches a hash in the StopNCII database, they likely just delete the image and escalate if it keeps happening. The article says that a human will review each match and manually delete it, but I doubt that. Automated silent deletion escalating to manual review seems more likely, at least to me.

A limitation (for videos only)

This post has so far covered exclusively the image portion of StopNCII, but the service also works for videos. How do they implement support for videos?

There are perceptual hash schemes for videos, and they could likely have used the technology, but they have instead used MD5, which is a cryptographic hash (though insecure). Perceptual hashes aim to have similar inputs produce similar outputs, whereas cryptographic hashes aim to have similar inputs produce vastly different outputs.

Cryptographic hashes are useful if you want to ensure that nobody can ever guess what was hashed to produce the identifying number (like for password verification), but they aren't the best for detecting abuse like this. This is because a single bit being changed in the file will completely change the hash. If the video is downscaled, has its audio altered, anything like that, the hash will be completely different and the video will not be detected.
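The "one bit changes everything" behaviour is easy to demonstrate with Python's standard `hashlib`. The byte string below is just a stand-in for a video file's contents.

```python
import hashlib


def md5_digest(data: bytes) -> bytes:
    """MD5 digest of data (used here for identification, not for security)."""
    return hashlib.md5(data, usedforsecurity=False).digest()


original = bytes(range(256)) * 4   # stand-in for a video file's bytes
altered = bytearray(original)
altered[100] ^= 0x01               # flip a single bit

h1 = md5_digest(original)
h2 = md5_digest(bytes(altered))

# Count how many of the 128 output bits differ between the two digests.
differing_bits = sum(bin(x ^ y).count("1") for x, y in zip(h1, h2))

print(h1.hex())
print(h2.hex())
print(differing_bits, "of 128 bits differ")
```

With a perceptual hash, a tiny edit like this would typically move the hash by a few bits at most. Here the two digests have no useful relationship at all, so an exact-match lookup is the only comparison that makes sense.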

Videos are pretty expensive to deal with, with regards to modification. Really, videos are expensive to do anything with; I think that's a major reason why StopNCII doesn't use video perceptual hashes, since it would take a long time to compute them on user devices. Images are comparatively cheap to compress, alter, hash, whatever, so platforms will often downscale and recompress images, but videos will usually be unchanged. So while the MD5 solution isn't the best, it isn't as bad as one might think. It should still detect a large portion of offending videos.

So should I use this service?

Because it is technically infeasible to reverse a hashed image into its original form, and it would be legally incompetent to not hash the images submitted as the site says, I'm confident that not only does the service hash images on your device before submitting, but also that they will never be able to actually invert the hashes back into the original images.

All in all, if you ever suspect that someone may nonconsensually share intimate images of you (real or fake/altered) on public social media platforms, in DMs on those platforms, or on any of the other StopNCII partnered platforms, you should use the service. The service is free for anyone in the world to use.

While someone could still share your images on other platforms, or heavily modify the images to circumvent the system, by using the service you would make it much more difficult for people to share your intimate images without your consent since unedited (and most forms of edited) images would still trigger the system and prevent sharing. I am not sure how exactly platforms implement support, but repeated violations likely flag accounts for review, which may lead to suspension for those illegally sharing your images.

To entirely avoid people circumventing the system, don't tell people you use it. Submit your images and tell no one. That way, if someone does try to share your images, they won't even know there's a system to circumvent.

Kuebiko Folks

Kuebiko - a state of exhaustion inspired by acts of senseless violence, which force you to revise your image of what can happen in this world

#lukekoiwai #artists on tumblr #my art #art #character design #oc #oc art #original character #digital art #fish girl #rose tinted glasses #flower pattern #fish #cigarmen #hashing #monochrome #ocs #my ocs

See?

#yano kun no futsuu no hibi #mr. yano's ordinary days #tsuyoshi yano #hashing

fox, furze, flowers, flour

brush-tails rush trails

#fox #furze #gorse #flour #flower #running #hashing #commission