A depressingly true adage in the security industry goes something like this: “Security is an afterthought for every new IT fad.” It couldn’t be more true for Big Data. In fact, we wrote about this in our Security Trends 2012 Report (it’s Trend #5: NoSQL = No Security?). Only a few of the leading platforms have any built-in security options, and third-party solutions for securing Big Data deployments are scarce; most of them focus on masking or encrypting the data going in. (I don’t want to give too much away, but Imperva is on track to start changing this soon.)
Now, I don’t have anything against encryption and masking, but I don’t understand why anyone believes they are the right starting point for Big Data security. In the regular data world (i.e., RDBMS and old-school data warehousing), deploying database encryption is still pretty rare. There are two reasons for this. First, performance and application functionality are very hard to preserve when data is encrypted. Second, encryption is best suited to a simpler access control use case than a database application needs. In other words, it’s good for binary (all-or-nothing) access to a file, but not great for contextualized access to the kind of complex data you have in a database. As a result, we see a lot more use of disk/file encryption underneath the database to prevent bulk theft of data via the underlying DB files themselves. Imperva’s partner Vormetric has quite a few successful deployments of this kind that work in conjunction with SecureSphere.
Think about this in the case of Big Data. The fundamental concepts are:
1) Very, very large data sets (i.e., bigger than an RDBMS can practically handle)
2) Distributed processing of jobs (making key management even more complicated)
3) Complex processing of the data itself (meaning you’d have to decrypt the massive data set to run the already performance-intensive job)
That doesn’t make for smooth sailing in real-world Big Data encryption deployments. Instead, as in the regular data world, I’d expect to see more focus on monitoring and access controls than on encryption and masking. The reason is that you can monitor and control without digging into the application or changing the data, so performance and stability are much easier to preserve.
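To make the "monitor and control without changing the data" idea concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the `MonitoredStore` class, the `allowed` map, and the sample data set names are assumptions, not part of any real product or Big Data platform. It simply shows a layer that sits in front of a data store, records every access attempt, and enforces a per-user rule, while the stored data stays exactly as it was.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class MonitoredStore:
    """Hypothetical wrapper: audits and checks reads, never alters the data."""
    read: Callable[[str], str]            # underlying read function, left untouched
    allowed: Dict[str, Set[str]]          # user -> data set names that user may read
    audit_log: List[Tuple[str, str, bool]] = field(default_factory=list)

    def get(self, user: str, dataset: str) -> str:
        permitted = dataset in self.allowed.get(user, set())
        # Record the attempt whether or not it is permitted.
        self.audit_log.append((user, dataset, permitted))
        if not permitted:
            raise PermissionError(f"{user} may not read {dataset}")
        return self.read(dataset)


# Illustrative, made-up data sets stored in the clear.
data = {"clicks": "raw click stream", "payroll": "salary records"}
store = MonitoredStore(read=data.__getitem__, allowed={"analyst": {"clicks"}})
```

Calling `store.get("analyst", "clicks")` returns the data and logs the access; calling `store.get("analyst", "payroll")` raises `PermissionError` and still leaves an audit record. The job reading the data needs no decryption step, which is the performance argument made above.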
But back to the point: many industry professionals bemoan this security-is-an-afterthought state of affairs. I don’t. It’s natural that security comes after the primary use cases for any new technology. And because it is external to the underlying technology, security can and should cut across platforms. In the regular data world, for example, a very important requirement is the ability to have consistent policy across different RDBMS systems, a requirement that fundamentally can’t be met by the built-in audit capability of any single RDBMS.
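As a rough illustration of why an external layer can offer consistency that per-platform auditing can't, consider the Python sketch below. The two record formats ("platform A" and "platform B"), their field names, and the policy itself are invented for this example and do not correspond to any real database's audit output. The point is only that once events are normalized outside the platforms, one policy definition covers all of them.

```python
from typing import Dict

Event = Dict[str, str]


def normalize_platform_a(rec: Dict[str, str]) -> Event:
    # Hypothetical platform A uses upper-case field names and values.
    return {"user": rec["DBUSER"].lower(),
            "object": rec["OBJ_NAME"].lower(),
            "action": rec["ACTION"].lower()}


def normalize_platform_b(rec: Dict[str, str]) -> Event:
    # Hypothetical platform B uses different keys for the same facts.
    return {"user": rec["account"].lower(),
            "object": rec["table"].lower(),
            "action": rec["op"].lower()}


# One policy, defined once, outside both platforms (assumed rule).
SENSITIVE_OBJECTS = {"payroll"}
APPROVED_USERS = {"hr_app"}


def violates_policy(event: Event) -> bool:
    """Flag any access to a sensitive object by a non-approved user."""
    return (event["object"] in SENSITIVE_OBJECTS
            and event["user"] not in APPROVED_USERS)


event_a = normalize_platform_a(
    {"DBUSER": "JSMITH", "OBJ_NAME": "PAYROLL", "ACTION": "SELECT"})
event_b = normalize_platform_b(
    {"account": "hr_app", "table": "payroll", "op": "select"})
```

Here `violates_policy(event_a)` flags the access and `violates_policy(event_b)` does not, even though the two events came from differently shaped audit records. Adding a third platform means writing one more normalizer, not rewriting the policy.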