
Sample Work of AIT580

This is a sample of my AIT580 group work. It describes the security solutions for our big data project. The goal of the project is to tell New York City drivers which trip is likely to come next and where it will come from, as Uber does. Our big data team has various roles, such as manager, users, analysts, and DBA. I took the developer role. The developer is a very important role, involved in many processes such as data collection, storage, and curation. This role was a challenge for me because I have less technical experience than many of my classmates, but the result shows that I am good at it and could be a competitive candidate in the future. That is not because I am a genius; it is because I am a quick learner and a logical thinker.

 Security

We consider security from two perspectives: data and system. All of the data we collect comes from public sources and is not sensitive. However, once the data is analyzed and transformed into information and knowledge, it becomes sensitive and needs to be protected against malicious attacks. The analysis and prediction system should be robust enough to withstand potential attacks as well as technical or man-made glitches; it should keep running without interruption in these situations. To protect our data and system, we will apply the following security models and procedures:

  • All data flows and communications will be encrypted with SSL.
  • A gateway, proxy, and load balancer will sit between developers, data sources, users, and our system.
  • Well-designed, granular access permissions will be assigned to accounts.
  • Data replication.
  • System health monitoring tools.
  • Timely security patch updates.
  • A comprehensive logging system. The logs will be used to monitor and analyze potential risks, and the security structure and procedures will be reviewed based on the results of the log analysis.

Below is the structure of our security models and procedures:

[Figure: security structure diagram]

 

SSL-encrypted communications:

All data flows and communications, whether external or internal, should be encrypted with SSL. This protects our data against interception and tampering in transit and makes session hijacking much harder.
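As a rough illustration of what "encrypted with SSL" could mean on the client side, here is a minimal sketch using Python's standard ssl module. The function name make_client_context is ours and purely illustrative; the choice of TLS 1.2 as the minimum version is an assumption, not something our project specified.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side context that verifies the server certificate."""
    # create_default_context() turns on certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Assumption: refuse legacy protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


context = make_client_context()
```

A socket can then be protected with context.wrap_socket(sock, server_hostname=host) before any data is sent.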

Gateway, proxy and load balancer:

Our system will not be exposed to external networks directly. All communications from the outside world must first land on our gateway, proxy, and load balancer, where the firewall, ACLs, and network rules are enforced. Developers and administrators must SSH (secure shell) to the gateway first and then SSH from the gateway to our system. Data coming from multiple data sources (data APIs, web scrapers, etc.) will be quarantined and validated at the gateway and then put into AWS Firehose in our system. Users’ requests will be load balanced, and malicious requests such as DoS attacks should be detected and filtered by the firewall.
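The "quarantine and validate" step at the gateway could look roughly like the sketch below. The field names, the bounding box around New York City, and the function names are all hypothetical; they only illustrate the idea of splitting incoming records into accepted and quarantined ones before anything is forwarded.

```python
from datetime import datetime

# Hypothetical schema for an incoming trip record.
REQUIRED_FIELDS = {"pickup_datetime", "pickup_longitude", "pickup_latitude"}
# Approximate bounding box around New York City, for illustration only.
NYC_LON = (-74.3, -73.6)
NYC_LAT = (40.4, 41.0)


def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record looks valid."""
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        return [f"missing fields: {sorted(missing)}"]
    problems = []
    try:
        datetime.fromisoformat(record["pickup_datetime"])
    except (TypeError, ValueError):
        problems.append("pickup_datetime is not an ISO 8601 timestamp")
    checks = (("pickup_longitude", NYC_LON), ("pickup_latitude", NYC_LAT))
    for name, (low, high) in checks:
        value = record[name]
        if not isinstance(value, (int, float)) or not low <= value <= high:
            problems.append(f"{name} is out of range")
    return problems


def triage(records):
    """Split records into those to forward and those to hold in quarantine."""
    accepted, quarantined = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            quarantined.append((record, problems))
        else:
            accepted.append(record)
    return accepted, quarantined
```

Only the accepted list would be passed on to the ingestion stream; quarantined records keep their list of problems so they can be reviewed later.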

Permission control:

The permissions for each account should be designed carefully and reviewed periodically. Root permission is highly sensitive, so the number of accounts with root permission should be tightly restricted, and those accounts should be protected with the highest priority. Other accounts should be grouped based on their roles. The commands that each account or account group is allowed to execute in our system should be whitelisted. Furthermore, the maximum computing resources available to an account should be limited. That way, whether through a malicious attack or simple carelessness, no one can exhaust the system's computing resources and block everyone else's work by submitting one big job.
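A minimal sketch of role-based command whitelisting with resource caps is shown below. The roles, commands, and limits in POLICIES are made-up examples, and is_allowed is a hypothetical helper; in a real deployment these rules would more likely be enforced by the operating system or the cluster scheduler.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class RolePolicy:
    allowed_commands: frozenset
    max_cpu_cores: int
    max_memory_gb: int


# Hypothetical roles and limits, for illustration only.
POLICIES = {
    "analyst": RolePolicy(frozenset({"spark-submit", "ls", "cat"}), 8, 32),
    "developer": RolePolicy(frozenset({"spark-submit", "ls", "cat", "git"}), 16, 64),
}


def is_allowed(role, command_line, cpu_cores=1, memory_gb=1):
    """Default deny: allow only whitelisted commands within the role's limits."""
    policy = POLICIES.get(role)
    if policy is None:
        return False  # unknown roles get no access
    try:
        argv = shlex.split(command_line)
    except ValueError:
        return False  # malformed quoting is rejected outright
    if not argv or argv[0] not in policy.allowed_commands:
        return False
    return cpu_cores <= policy.max_cpu_cores and memory_gb <= policy.max_memory_gb
```

Everything that is not explicitly listed is refused, which matches the whitelist approach described above.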

Data replication:

In addition to the failover and replicas that AWS provides under the hood, we also replicate data ourselves by synchronizing and copying it across multiple S3 buckets. This makes our system resilient to data loss.
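Copying an object to several buckets could be sketched as follows. The function replicate_object and the bucket and key names are hypothetical. The s3_client argument is assumed to be a boto3 S3 client, whose copy_object call takes Bucket, Key, and CopySource; it is passed in so that the sketch does not depend on how credentials are configured.

```python
def replicate_object(s3_client, key, source_bucket, replica_buckets):
    """Copy one object from source_bucket into each replica bucket."""
    copied = []
    for bucket in replica_buckets:
        # Server-side copy: the data does not pass through this machine.
        s3_client.copy_object(
            Bucket=bucket,
            Key=key,
            CopySource={"Bucket": source_bucket, "Key": key},
        )
        copied.append(bucket)
    return copied
```

With boto3 installed, this would be called as replicate_object(boto3.client("s3"), "trips/2016-01.csv", "primary-bucket", ["replica-a", "replica-b"]). Error handling and very large objects are left out of this sketch.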

System monitoring:

We need monitoring tools such as Amazon CloudWatch to monitor system health, so that any health issue can be detected and handled promptly.

Logging, risk analysis, ongoing adjustment and improvement:

We need to log all administrator and developer actions, analysis jobs, user requests, and data ingestions. The security logs can be used to analyze potential risks and to support intrusion and penetration testing. The whole security structure should be reviewed and adjusted based on the analysis and test results.
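Logs are easier to analyze when every entry has the same machine-readable shape. Below is a minimal sketch of a JSON audit log built on Python's standard logging module. The logger name "audit" and the actor and action fields are our own illustrative choices, not part of the project design.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "time": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            # Hypothetical audit fields, supplied through the `extra` argument.
            "actor": getattr(record, "actor", None),
            "action": getattr(record, "action", None),
        }
        return json.dumps(entry)


def build_audit_logger(stream=None):
    """Create a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger("audit")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

An action would then be recorded with build_audit_logger().info("job submitted", extra={"actor": "dev01", "action": "submit_job"}), and the resulting lines can be filtered by actor or action during risk analysis.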