Friday, January 16, 2009

Data security, privacy, regulatory compliance, auditing, searching...

I've been doing a little research into these general areas, because that is where I want to head for my thesis. So this is a chance for me to write down a clear road map of what I have gathered so far.

In the past decade, data security requirements in the IT field have been toughened by scores of regulations, such as SOX, HIPAA, FISMA and tonnes more. These are all US regulations (SOX, for example, applies to all publicly listed companies operating in the US), but regulations in other countries generally have similar requirements. They apply to different industries and enforce different rules, and they are almost always technology-neutral: none of them prescribes a specific technology for implementing it. I would divide the underlying implementation aspects into 3 main areas:
1. Tamper-proof data retention (WORM, insider attacks, privacy)
2. Auditable trail of data (COW, versioning, sanitized audit logs)
3. Searchability of data across versions (robust indexing)
A lot of research has already been done, and is still being done, in these areas: Write-Once-Read-Many (WORM) data stores (many of them tape-based; optical media are also used, but reportedly cost more), Ext3COW (a copy-on-write versioning file system that came out of research at Johns Hopkins University), jump indexing, and so on (there are simply too many for me to have read them all, yet...). My general focus is "insider tampering" and auditing, but I'll have to talk to my almost-adviser first.
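To make the "tamper-proof retention" and "auditable trail" ideas a bit more concrete, here is a minimal sketch (not taken from Ext3COW or any of the other systems above) of a hash-chained, append-only audit log in Python. The names `AuditLog` and `entry_hash` are hypothetical. Each entry's SHA-256 hash covers the previous entry's hash plus the record itself, so quietly editing an old record breaks verification from that point on.

```python
import hashlib
import json


def entry_hash(prev_hash: str, record: dict) -> str:
    # Hash the previous entry's hash together with this record, so each
    # entry commits to the entire history before it.
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """Hypothetical append-only, hash-chained log (illustration only)."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []  # list of (record, hash) pairs

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        h = entry_hash(prev, record)
        self.entries.append((record, h))
        return h

    def verify(self) -> bool:
        # Recompute the chain from the start; any edited record or
        # stored hash causes a mismatch.
        prev = self.GENESIS
        for record, h in self.entries:
            if entry_hash(prev, record) != h:
                return False
            prev = h
        return True


if __name__ == "__main__":
    log = AuditLog()
    log.append({"user": "alice", "op": "write", "file": "report.txt"})
    log.append({"user": "bob", "op": "read", "file": "report.txt"})
    print(log.verify())  # True
    log.entries[0][0]["user"] = "mallory"  # simulate an insider edit
    print(log.verify())  # False
```

Chaining alone only makes tampering *evident*: an insider who can rewrite the whole log could simply recompute every hash. One common way to close that gap (an assumption about deployment, not something this sketch does) is to periodically write the latest hash to WORM storage, where it can't be overwritten.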

So, that's the overall, simplified picture of my research to date. I am looking into Ext3COW at the moment, hoping to understand its implementation better, which means I have to learn more about Linux (its file systems and Linux programming in general). This scares me but at the same time makes me eager, because I have been envying those uber-geeky people who talk in C/C++ in the land of Unix/Linux (I googled "C/C++ for Java programmers" and someone said I don't wanna go there). I know it will be hard, but I suppose it is a chance for me to really get into it. I have been reading Unix books and the like, but unfortunately even the simplest OS concepts seem to be beyond my grasp. So I am going to audit (that's what people call it here: basically, I'm going to attend the class without enrolling, which I can't do anyway because I'm a graduate student) one of the undergraduate architecture/systems courses that cover the basics. Come to think of it, I did take the CS210 Computer Systems paper back home, but we spent a third of the course learning T-code. I think that's what it was called; try googling it, but you won't find out what it is. From what I remember, it is a compression algorithm developed by the lecturer who taught the paper... So, anyway, if nothing else, I'd at least have been exposed to Linux.

So... big hopes and dreams for the coming semester. It will be tough with 2 courses of my own, 1 seminar course, research, auditing a class and 20 hours/week of TA-ing, but I am excited. Hopefully I'll find a nice internship for the summer break too. Fingers crossed ;)
