Million Browser Botnet
Jeremiah Grossman and Matt Johansen from WhiteHat Security do not have good news for us. Attendees were reminded that when you visit a web page (yes, this one too), you are essentially granting total control over your browser to the operator of the site. Last year, they took us through an alphabet soup of rotten stuff that can happen to Aunt Susie when she browses to a page: CSRF, XSS, clickjacking (more on that below) and on and on. Today’s session was all about taking these attacks to the next level, and on a distributed scale. What would it cost to borrow the combined computing and network power of, say, a million web browsers?
Without hacking. As the presenters pointed out, “The web is supposed to work this way.” Bummer.
End-To-End Analysis of a Domain Generating Algorithm Malware Family
Jason Geffner from CrowdStrike took us through an analysis of a piece of malware that has its origins in a money-mule campaign that’s at least five years old. Geffner shared some interesting details of a new-ish obfuscation technique that relies not on obscure packers and hard-to-find decryption stubs, but on nonsense code interspersed in-line. The challenge with this approach is that when one disassembles and decompiles the code, one finds a bunch of extra junk, making it very difficult to understand the nature and behavior of the malware. Geffner released a Hex-Rays plug-in called “CrowdDetox” to the community that can help identify and delete this mush from reversed code.
Then Geffner turned to the Domain Generating Algorithm (DGA) used by this particular specimen. The whole DGA idea stems from the premise that if the good guys can discover the IP addresses or domain names associated with a malware campaign, it’s not complicated to get those assets shut down or seized. If I’m an attacker, I need a way for my malware to “phone home” to a server I control that will be unknown to defenders. DGAs solve this problem. For example, take the words in this paragraph, choose two at random, add “.net” to the end, and that’s a randomly generated domain name: behaviormoney.net, seizedcommunity.net, and so on. Add to this algorithm a series of time windows for using each domain, use the algorithm to select and then register specific domains a day or two before the attack, and presto: a shifting target for defenders. This is a similar approach to the one taken by the malware Geffner described at Black Hat.
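To make the idea concrete, here’s a minimal, hypothetical DGA sketched in Python. The word list, the MD5-based seeding, and the domain count are all invented for illustration and are not the actual algorithm from the malware Geffner analyzed; the point is just that malware and operator, running the same code on the same day, derive the same rendezvous domains.

```python
# Hypothetical date-seeded DGA in the spirit of the two-words-plus-".net"
# example above. Word list and seeding scheme are invented for illustration.
import hashlib
from datetime import date

WORDS = ["behavior", "money", "seized", "community", "window", "target",
         "shifting", "defender", "server", "register"]

def domains_for(day, count=3):
    """Derive `count` two-word .net domains deterministically from a date."""
    domains = []
    for i in range(count):
        # Hash the date plus a counter, so the malware and its operator,
        # running the same code on the same day, compute identical domains.
        digest = hashlib.md5(f"{day.isoformat()}-{i}".encode()).digest()
        first = WORDS[digest[0] % len(WORDS)]
        second = WORDS[digest[1] % len(WORDS)]
        domains.append(f"{first}{second}.net")
    return domains

print(domains_for(date(2013, 8, 1)))
```

The operator pre-registers a day’s domains shortly before they come into use; defenders would have to predict (or reverse) the algorithm to get ahead of it.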
Clickjacking Revisited: A Perceptual View of UI Security
Where do hacking and neuroscience intersect? With Devdatta Akhawe, a grad student at UC Berkeley. Dev showed proof-of-concept demonstrations of five methods for attacking human perception to trick users into clicking something against their will: say, a “pay now” or a “like us on Facebook” button.
The proposed techniques have interesting names like “destabilizing pointer perception,” “attacking peripheral vision,” “motor adaptation,” “fast motion mislocation,” and “visual cues & click timing.” The takeaway for attackers is that it’s not too hard to manipulate the human brain into doing stuff it shouldn’t. The takeaway for defenders is that we need our UI designers to take the nuances of human perception (and our brains’ vulnerabilities) into account when designing products. Dev’s site may someday be updated with more on these techniques.
BinaryPig – Scalable Malware Analytics in Hadoop
The guys at Endgame have, like the other players in the defensive community, assembled a vast corpus of malware. By the way, when Zachary Hanif used the throwaway phrase “an embarrassment of malware data,” I felt the need to tell the world that if it’s a troop of monkeys and a murder of crows, it’s definitely got to be an embarrassment of malware samples. No doubt. Where was I? Oh yeah: a vast embarrassment indeed. In Endgame’s case, we’re talking about twenty million malware samples, occupying 10 TB of disk space.
How does one analyze this stuff and find patterns and attributes that will help identify and defend against future artifacts? How does one scale this activity and mine this corpus for useful tidbits? It’s a hard problem. The first approach one might take is to put it all on a file server, run scripted analysis tasks against it, and store the results in a relational database. The problem is that the file server isn’t reliable, the tasks take too long (and are ponderous to repeat when you have new analysis ideas), and the data generated isn’t structured enough to fit a rigid database schema. You end up with a giant ugly mess.
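That “file server plus scripts” baseline might be sketched like this; the hashing task here is a stand-in for whatever analysis one would actually run, and the whole thing is hypothetical. It works, but every new analysis idea means another slow serial pass over the entire corpus.

```python
# The naive baseline described above: walk a directory of samples and run
# an analysis task on each file serially, collecting loose dictionaries.
# Fine for hundreds of files; painful at twenty million samples.
import hashlib
import os

def analyze(path):
    """One 'analysis task': here, just the size and SHA-256 of a sample."""
    with open(path, "rb") as f:
        data = f.read()
    return {"path": path,
            "size": len(data),
            "sha256": hashlib.sha256(data).hexdigest()}

def analyze_corpus(root):
    """Serially apply analyze() to every file under `root`."""
    results = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            results.append(analyze(os.path.join(dirpath, name)))
    return results
```

Distributing exactly this kind of per-sample work across a cluster, and keeping the loosely structured results queryable, is the gap BinaryPig aims to fill.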
So the Endgame guys came up with an approach built on HDFS, Hadoop and Apache Pig, coupled with a modular set of analysis and query tools one can extend on one’s own. They call it BinaryPig, and they’ve released it to the community. It’s no panacea, and it won’t turn any Joe with a directory full of malware into an elite analyst, but it presents a promising framework for scaling a wide variety of analysis techniques. If they can attract community interest, they may be onto something.
The Factoring Dead: Preparing for the Cryptopocalypse
No big deal or anything, but it’s just that the ENTIRE GLOBAL ECONOMY is hanging from the thread represented by a couple of obscure mathematical oddities.
Presenters Alex Stamos, Tom Ritter, Thomas Ptacek and Javed Samuel started by outlining the current state of cryptography and trust on the Internet. It really all boils down to RSA and Diffie-Hellman (DH). The former is an asymmetric encryption algorithm (asymmetric means it involves private keys and public keys; asymmetric methods are almost always used just for securely communicating symmetric keys, because symmetric encryption is less computationally expensive than asymmetric encryption), and the latter is a method for two parties to securely agree on a shared key even when a third party might be eavesdropping on them. The importance of these 35-year-old cryptographic standards to the operation of the Internet and the conduct of commerce cannot be overstated.
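To make the DH side concrete, here’s a toy key agreement in Python with deliberately tiny, insecure parameters; real deployments use primes thousands of bits long. Each side combines its own secret with the other’s public value and arrives at the same shared key, while an eavesdropper sees only the public values.

```python
# Toy Diffie-Hellman key agreement (illustration only: the parameters
# are deliberately tiny and offer no security whatsoever).
import random

p, g = 23, 5                     # public parameters: small prime and generator

a = random.randrange(1, p - 1)   # Alice's secret exponent
b = random.randrange(1, p - 1)   # Bob's secret exponent

A = pow(g, a, p)                 # Alice sends A over the open channel
B = pow(g, b, p)                 # Bob sends B over the open channel

# Both sides now compute the same shared key; an eavesdropper sees only
# p, g, A, and B, and recovering a or b from those is the discrete log problem.
shared_alice = pow(B, a, p)
shared_bob = pow(A, b, p)
assert shared_alice == shared_bob
```

The shared value would then seed a symmetric cipher for the actual traffic, which is exactly the division of labor described above.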
RSA and DH are based on what are known as trapdoor functions. A trapdoor function is a mathematical transformation that is easy to compute in one direction but difficult or impossible to reverse. It turns out that multiplying two large prime numbers is such a transformation. If I gave you the number 323 and asked you to produce its factors, you’d have to do some trial and error before eventually arriving at its prime factors 17 and 19. Now suppose I asked you to factor 3785934577. If 3785934577 were the product of two large prime numbers (it’s not; I just mashed my keyboard), there would be no known efficient way to find those factors. The RSA algorithm is built upon this presumed fact. DH is based on a similar, yet more mathematically challenging, trapdoor called the discrete logarithm problem. Quoting Wikipedia, “if g and h are elements of a finite cyclic group G then a solution x of the equation g^x = h is called a discrete logarithm to the base g of h in the group G.” Understanding the math isn’t required: the point is that if you have values g and h, it’s “probably but not provably” very difficult to compute x. If it were easy, DH wouldn’t work and trust on the Internet would be, in Stamos’ words, toast.
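The asymmetry is easy to demonstrate in a few lines of Python. The forward operations (one multiplication, one modular exponentiation) are instant, while the reverse directions require search; the numbers here are toy-sized for illustration, which is precisely why the searches below still succeed.

```python
# Toy illustration of the trapdoor asymmetry behind RSA and DH.
# All numbers are tiny; at real key sizes the reverse steps are infeasible.

def trial_factor(n):
    """Factor n by trial division: fine for 323, hopeless at RSA scale."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    return n, 1

assert 17 * 19 == 323                  # forward: one multiplication
assert trial_factor(323) == (17, 19)   # reverse: search for the factors

# Discrete log version: given g, h, and prime p, find x with g**x % p == h.
p, g = 101, 2                          # tiny cyclic group (2 generates it)
x = 47                                 # the "secret" exponent
h = pow(g, x, p)                       # forward: fast modular exponentiation
recovered = next(e for e in range(p) if pow(g, e, p) == h)
assert recovered == x                  # reverse: brute-force over exponents
```

Scale p up to thousands of bits and the brute-force loop becomes the open problem the presenters are worried about.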
Academics are working hard on these two problems, and have made more progress in the past six months than in the preceding 35 years. The presenters are gravely concerned that a) the two problems are mathematically related, so a breakthrough on one may carry over to the other, and b) a solution to one or both may be found within years rather than generations. If (or when) this happens, it will take mere hours for the encryption systems built on RSA and DH (by which I mean nearly all of them) to be exploited on a wide scale, resulting in unbelievable chaos.
It’s not the end of the world: there is another class of trapdoor function that is believed to be much more mathematically sound, and it has given rise to elliptic curve cryptography (ECC). Here’s Wikipedia’s brief summary: “it is assumed that finding the discrete logarithm of a random elliptic curve element with respect to a publicly known base point is infeasible.” Again, it’s not required that one thoroughly grok the foregoing; the point is that RSA and DH don’t have to be the only game in town: well-established cryptographic algorithms based on ECC exist and have been used for several years to safeguard a relatively narrow range of communications -- mainly on BlackBerry handhelds and in classified US Government traffic. The presenters provided an overview of these algorithms and suggested we (all of us: developers, consumers, those responsible for the Internet’s infrastructure -- everybody) need to work to update our cryptographic ecosystem so the eventual fall of DH and RSA doesn’t have cataclysmic results.