Analytics: Achieving 'Privacy by Design'

  • 20th September 2018
  • Security
  • Antony Heljula

Organisations have recently been putting in considerable effort to comply with the European General Data Protection Regulation (GDPR).

Whilst the majority of this effort has been spent updating company policies and ensuring both business processes and practices are fully compliant, it is important not to overlook the requirements in relation to privacy by design.

Privacy by design poses a major challenge to companies since IT systems now have to be designed with security in mind. 

Where policies and processes are about how people work, privacy by design is more about technology. It is about making sure your technology is designed to achieve maximum levels of protection against security breaches and data loss, and to minimise the impact of any such incidents, should they occur.

Data Storage

When it comes to protecting data, the obvious place to start is data encryption, since stolen data is essentially worthless if it is strongly encrypted (for example with 256-bit AES) and the encryption keys have not been stolen along with it.

Encrypting your data is easiest to achieve if you centralise all your storage into a dedicated Storage Area Network (SAN), since most modern SANs will provide built-in “at rest” data encryption. Job done.

Data Scrambling and Masking

As a general rule, data outside of your live environments should be irreversibly scrambled so that all personal data is permanently anonymised. There are a variety of techniques that can be adopted to maintain data integrity and to keep data relatively meaningful even when totally scrambled.
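One such technique can be sketched in Python. This is a minimal illustration, not a recommendation of a specific product or method: it assumes a keyed hash (HMAC-SHA256) with a random key that is thrown away after the scrambling run, so the original values cannot be recomputed from the scrambled data, while the same input always maps to the same token so that joins between tables still work. The table contents, field names and the `make_scrambler` helper are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_scrambler():
    """Return a function that maps a value to a stable, meaningless token."""
    # The key exists only in memory for this run. Once it is discarded,
    # the mapping cannot be rebuilt or reversed from the scrambled data.
    key = secrets.token_bytes(32)

    def scramble(value: str, prefix: str = "ID") -> str:
        digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        return f"{prefix}-{digest[:12]}"

    return scramble


# Hypothetical production rows copied to a test environment.
customers = [{"email": "alice@example.com", "name": "Alice"}]
orders = [{"customer_email": "alice@example.com", "amount": 120.0}]

scramble = make_scrambler()
scrambled_customers = [
    {"email": scramble(c["email"], "CUST"), "name": scramble(c["name"], "NAME")}
    for c in customers
]
scrambled_orders = [
    {"customer_email": scramble(o["customer_email"], "CUST"), "amount": o["amount"]}
    for o in orders
]
```

Because both tables are scrambled with the same in-memory key, an order can still be joined to its customer, which is what keeps the test data meaningful.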

By scrambling non-production data, your developers and testers can be given greater flexibility to do their jobs without any fear of data loss. Theoretically speaking, you don’t need to have such strict controls in place for environments where there is no fear of losing sensitive data.

You should also consider scrambling sensitive data before making it available to data scientists who wish to perform data profiling activities such as machine learning.

The scrambling of data is important even on aggregated data. For example, if a risk report identifies that all team members of a certain ethnicity are high-risk, then it could be easy to identify individuals if that ethnicity is in a minority.
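One common way to reduce this risk is small-group suppression, which is a different technique from scrambling individual values: any group below a minimum size is merged into a catch-all bucket before the report is published. The sketch below is illustrative only; the threshold of 5, the label and the `suppress_small_groups` helper are assumptions, not values taken from any regulation.

```python
from collections import Counter

MIN_GROUP_SIZE = 5  # hypothetical threshold; set this according to your own policy


def suppress_small_groups(rows, group_key, min_size=MIN_GROUP_SIZE):
    """Count rows per group, merging groups smaller than min_size."""
    counts = Counter(row[group_key] for row in rows)
    report = {}
    for group, size in counts.items():
        label = group if size >= min_size else "Other (suppressed)"
        report[label] = report.get(label, 0) + size
    return report


# Hypothetical high-risk team members, grouped by a sensitive attribute.
rows = [{"group": "A"}] * 6 + [{"group": "B"}] * 2 + [{"group": "C"}]
report = suppress_small_groups(rows, "group")
```

Note that the merged bucket can itself end up below the threshold, in which case it should be withheld from the report altogether.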

Networking and Communications

The infrastructure behind an analytics application is usually divided up into 3 separate tiers:

  • Web server
  • Application server
  • Database

It is advised to place these components into separate network VLANs with firewall rules in between to restrict traffic. This is so that one tier can only talk to the next tier over the specific ports and protocols (e.g. HTTPS) necessary to allow the application to function, which should exclude command-line access.
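The idea of "default deny" between tiers can be modelled in a few lines of Python. This is only a sketch of the rule logic, not a firewall configuration: the tier names are hypothetical, and the ports (443 for HTTPS, 1521 for an Oracle database listener) are assumed defaults that may differ in your environment.

```python
# Hypothetical allow-list: (source tier, destination tier, destination port).
ALLOWED_FLOWS = {
    ("web", "app", 443),        # web tier -> application tier over HTTPS
    ("app", "database", 1521),  # application tier -> database listener
}


def is_allowed(source: str, destination: str, port: int) -> bool:
    """Default deny: traffic passes only if it matches an explicit rule."""
    return (source, destination, port) in ALLOWED_FLOWS
```

With these rules the web tier cannot reach the database directly, and SSH (port 22) is blocked between every pair of tiers because no rule permits it.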

By ‘listening’ in on network traffic, hackers could however still gain access to data even if they have not managed to reach the database tier. To protect against this, you should consider configuring SSL/TLS encryption (Secure Sockets Layer, now superseded by Transport Layer Security) for all application traffic flowing between VLANs. This is, of course, in addition to enabling HTTPS (where the ‘S’ stands for ‘Secure’, meaning HTTP carried over SSL/TLS) for all traffic flowing between the end user and your web tier.
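As a small illustration of what "encrypted traffic between tiers" means on the client side, the sketch below uses Python's standard `ssl` module to build a context that verifies the server certificate and refuses protocol versions older than TLS 1.2 (TLS being the successor to SSL). The `make_client_context` helper and the choice of TLS 1.2 as the floor are assumptions for this example; your own minimum version is a policy decision.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side context for connections to the next tier."""
    # The default context verifies certificates and checks host names.
    context = ssl.create_default_context()
    # Refuse legacy protocol versions (SSLv3, TLS 1.0 and TLS 1.1).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
```

The resulting context can be passed to standard library clients, for example as the `context` argument of `http.client.HTTPSConnection`.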

Continuous Threat Detection

Having a centralised threat detection system is a valuable asset. These systems typically deliver 4 important functions:

  1. Host intrusion detection (HIDS)

An agent runs on each server (or VM) and performs continuous behavioural monitoring.

  2. Network intrusion detection (NIDS)

Continuous monitoring of communications to identify network threats and breaches.

  3. Vulnerability scans (penetration tests)

Continuous scanning of all assets on a network to search for security flaws.

  4. Data loss prevention (DLP)

Applications used for email, collaboration and file sharing are monitored to identify instances where personal or sensitive data are being shared. 
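To give a feel for the kind of check a DLP tool performs, here is a deliberately simple Python sketch that scans outgoing text for two patterns: email addresses and 16-digit card-like numbers. The patterns and the `find_sensitive` helper are illustrative assumptions; commercial DLP products use far richer detection than two regular expressions.

```python
import re

# Illustrative patterns only; real tools combine many detectors.
PATTERNS = {
    "email address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card-like number": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}


def find_sensitive(text: str) -> dict:
    """Return every match found in text, keyed by pattern label."""
    hits = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits


message = "Please refund jane.doe@example.com, card 4111 1111 1111 1111."
hits = find_sensitive(message)
```

A message with no matches returns an empty dictionary, so the caller can treat any non-empty result as a reason to block or flag the message.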

Centralised Auditing and Logging

Oracle Analytics provides standard features for auditing all application access. The ‘usage tracking’ feature will record a history of user activity within the application.

It is important, however, to implement centralised auditing of any access to the back-end operating systems and databases:

  • Who has accessed the OS/database?
  • The commands issued

Linux comes with a built-in auditing service for logging all OS access and commands issued. The Oracle Database also comes with its own auditing facility.   

It is a good idea to store all the OS/database audit logs in a central location, as this will give you improved threat detection capabilities since you can analyse patterns of behaviour across multiple hosts.  A centralised audit store also makes it easier to deliver compliance reports and alerting.
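The benefit of a central store can be shown with a short Python sketch. It assumes the audit logs have already been collected and parsed into simple records (the event fields, host names and user names below are hypothetical) and flags any user with failed logins on several different hosts, a pattern that no single host's log would reveal.

```python
from collections import defaultdict

# Hypothetical, already-parsed audit events gathered from several hosts.
events = [
    {"host": "web01", "user": "jsmith", "action": "login", "success": False},
    {"host": "app01", "user": "jsmith", "action": "login", "success": False},
    {"host": "db01", "user": "jsmith", "action": "login", "success": False},
    {"host": "db01", "user": "akhan", "action": "login", "success": False},
    {"host": "web01", "user": "akhan", "action": "login", "success": True},
]


def users_failing_on_multiple_hosts(events, min_hosts=2):
    """Map each user to the hosts where their logins failed, if >= min_hosts."""
    hosts_by_user = defaultdict(set)
    for event in events:
        if event["action"] == "login" and not event["success"]:
            hosts_by_user[event["user"]].add(event["host"])
    return {
        user: sorted(hosts)
        for user, hosts in hosts_by_user.items()
        if len(hosts) >= min_hosts
    }


suspicious = users_failing_on_multiple_hosts(events)
```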

2-factor Authentication

The accidental or deliberate sharing of passwords is ultimately the easiest method via which a security breach can occur. As a first line of defence, 2-factor authentication should be required prior to any network or systems access.  

With 2-factor authentication, the user’s password changes with each access and a separate device (e.g. a mobile phone app) is required to obtain the user’s ‘single use’ password. Only a single device can provide a user’s password and a security PIN has to be manually entered first.
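The article does not name a particular scheme, but one widely used way of generating such single-use passwords is the time-based one-time password algorithm (TOTP, RFC 6238), which builds on the counter-based HOTP algorithm (RFC 4226). The sketch below is a minimal Python version using HMAC-SHA1 and a 30-second time step; the shared secret is the published RFC test value, not something to use in practice.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based one-time password (RFC 4226)."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


# Shared secret from the RFC test vectors, for demonstration only.
secret = b"12345678901234567890"
code_now = totp(secret)
```

The server and the user's device both hold the secret and compute the same code independently, so the code itself never needs to be stored and expires when the time step rolls over.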

Virtual Desktops

To perform effective support and maintenance, it is obvious that developers, support teams and database administrators are going to need some form of direct access to the live application or its backend infrastructure. At the same time, you have to prevent them from downloading any files, reports or data that they may be able to access. All Analytics platforms have a ‘Download to Excel’ button!

The best method of locking down access is to provide your support staff with ‘virtual desktops’ such as Windows Remote Desktop or Linux VNC with the file transfer and copy/paste functions disabled. They can then access the application with all the required support tools, but have no ability to transfer data out of the live system.

Segregation of Duties

In addition to technology considerations, it is important to adopt a team structure that would theoretically prevent any single person from being able to perform more than one of the following functions:

  • Access the live application and infrastructure
  • Transfer files/data out of the infrastructure
  • Modify system permissions and privileges
  • Access or change system passwords
  • Approve requests such as modifying privileges, granting access or file transfers

The aim here is that any security breach would require at least 2 people from separate teams working in collaboration. And because of previous measures taken to deliver centralised auditing and logging, all their actions would be recorded anyway.
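A rule like this is easy to check automatically. The following Python sketch assumes you can export a mapping of people to the functions they hold; the function identifiers, the names and the `find_violations` helper are all hypothetical. It reports anyone who holds more than one of the five functions listed above.

```python
# Hypothetical identifiers for the five functions listed above.
FUNCTIONS = {
    "access_live",
    "transfer_data",
    "modify_privileges",
    "manage_passwords",
    "approve_requests",
}


def find_violations(assignments: dict) -> dict:
    """Return the people who hold more than one restricted function."""
    violations = {}
    for person, held in assignments.items():
        restricted = set(held) & FUNCTIONS
        if len(restricted) > 1:
            violations[person] = sorted(restricted)
    return violations


# Hypothetical export of who can do what.
assignments = {
    "dba_anna": {"access_live"},
    "sec_ben": {"modify_privileges", "approve_requests"},
    "ops_cara": {"transfer_data"},
}
violations = find_violations(assignments)
```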

Mobile Device Management (MDM)

It is evident that employees are increasingly working from home and being granted access to company files and emails via ‘bring your own’ portable devices (BYOD).

With both internally and externally (cloud) hosted platforms, it is important to make sure:

  • Only company-approved devices can gain access
  • All mobile devices are PIN/password protected and have encrypted storage
  • You maintain the ability to remote-wipe sensitive data and disable applications (even on BYOD)

There are a variety of vendors that deliver Mobile Device Management (MDM) solutions that provide the above restrictions and capabilities.
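The checks an MDM solution applies before granting access can be pictured with a small Python sketch. The `Device` fields and the `compliance_failures` helper are hypothetical and do not correspond to any vendor's API; they simply mirror the three requirements listed above.

```python
from dataclasses import dataclass


@dataclass
class Device:
    device_id: str
    approved: bool
    has_passcode: bool
    storage_encrypted: bool
    remote_wipe_enabled: bool


def compliance_failures(device: Device) -> list:
    """Return the reasons a device should be refused access (empty if none)."""
    checks = [
        (device.approved, "device is not approved"),
        (device.has_passcode, "no PIN/password set"),
        (device.storage_encrypted, "storage is not encrypted"),
        (device.remote_wipe_enabled, "remote wipe is not enabled"),
    ]
    return [reason for passed, reason in checks if not passed]


# Hypothetical BYOD phone that has not been enrolled for remote wipe.
phone = Device("byod-0042", approved=True, has_passcode=True,
               storage_encrypted=True, remote_wipe_enabled=False)
failures = compliance_failures(phone)
```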

In Summary

Achieving ‘privacy by design’ in the world of analytics can often seem like a daunting challenge due to the wealth of data that can be accessed, downloaded or profiled without consent.   

But with the right choices in software and infrastructure it can be a lot easier than you may think.  

Remember also that analytics cloud service providers will implement a range of these measures by default!

For help with ensuring your managed analytics strategy is aligned with the appropriate levels of privacy, get in touch with our team.

