Technology used in backend logic #247
AccessViolation95
started this conversation in
General
Replies: 1 comment
-
This sounds like a cool and challenging idea. I haven't explored malware detection, but fuzzy hashing and Bloom filters might be useful for this.
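For anyone unfamiliar with Bloom filters, here is a minimal sketch of the idea in Python using only the standard library. It is an illustration, not the backend's code: the class name, the bit and hash counts, and the attribute strings are all made up for the example. A Bloom filter answers "have I possibly seen this item before?" in fixed memory, with no false negatives and a small, tunable chance of false positives.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive one bit position per hash by salting SHA-256 with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True only if every bit for this item is set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical attribute strings, purely for illustration.
added = ["import:CreateRemoteThread", "section:.upx0", "packer:upx"]
seen = BloomFilter()
for attr in added:
    seen.add(attr)

print("section:.upx0" in seen)  # added items are always reported as present
```

Lookups for items that were never added usually return False, but can return True once the filter fills up, so a hit is best treated as "worth a closer look" rather than a confirmed match.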
Some bits of what goes on in the backend:
-
Hi there!
I'm a huge fan of your work (I'm at least partly responsible for the hundreds of visits from Tor Browser; I love playing around with settings and seeing how they affect the detection), and I'm fascinated by your backend logic, in particular the automatic grouping of similar fingerprints and the ability to tell which properties are and aren't relevant for different fingerprints.
I'm working on a project with the goal of rapidly categorizing strains of malware and deciding whether newly uploaded files are likely to belong to a certain malware strain. The goal is that I can dump 100 different executable files of the same malware strain or family into the service, and it will parse the files into structured data containing many attributes and create a single fingerprint that represents that malware strain. New samples of the same malware would match that fingerprint. Ideally the grouping of submitted files would be largely automatic, and I would just need to label certain created fingerprints as belonging to a certain malware family.
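A rough sketch of the workflow described above, assuming each parsed file reduces to a set of attribute strings. The function names, the attribute strings, and the 0.8 threshold are all hypothetical and only illustrate the shape of the problem: keep the attributes shared by most samples of a family, then score a new sample by how many of them it contains.

```python
from collections import Counter


def build_fingerprint(samples: list[set[str]], min_share: float = 0.8) -> set[str]:
    """Keep attributes that occur in at least min_share of the samples."""
    counts = Counter(attr for sample in samples for attr in sample)
    needed = min_share * len(samples)
    return {attr for attr, n in counts.items() if n >= needed}


def match_score(fingerprint: set[str], sample: set[str]) -> float:
    """Fraction of fingerprint attributes that the sample also has."""
    if not fingerprint:
        return 0.0
    return len(fingerprint & sample) / len(fingerprint)


# Hypothetical parsed attributes for three files of one family.
family_samples = [
    {"packer:upx", "import:WinExec", "str:beacon", "size:small"},
    {"packer:upx", "import:WinExec", "str:beacon", "size:large"},
    {"packer:upx", "import:WinExec", "str:beacon"},
]
fingerprint = build_fingerprint(family_samples)

# A new upload that shares two of the three stable attributes.
new_sample = {"packer:upx", "import:WinExec", "str:other"}
score = match_score(fingerprint, new_sample)
print(sorted(fingerprint), round(score, 2))
```

Attributes that vary within the family (here the size bucket) drop out of the fingerprint on their own, which is one simple way of separating relevant from irrelevant properties. Automatic grouping of unlabeled files would need a clustering step on top of this, which the sketch does not attempt.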
Another side project of mine that these techniques could be useful for is my network location service (an alternative to GPS that uses nearby emitters such as Bluetooth and Wi-Fi devices), where the task is to detect whether Wi-Fi routers are likely mobile or stationary before observing evidence that they've moved, based on previously collected data and the attributes in the announce packets broadcast by routers.
I expect there to be some overlap between what I'm describing and how your backend works. I know it's closed source, but are there things you can point me to? Research papers, talks, blog posts, or maybe a general overview of the technology used for processing the attributes of submitted data?
Thanks!