Jusletter IT

Towards Visualisation and End-User Verification in Data Protection Law

  • Authors: Thomas Kurz / Christoph Rücker / Bibi van den Berg / Thomas Lampoltshammer
  • Citation: Thomas Kurz / Christoph Rücker / Bibi van den Berg / Thomas Lampoltshammer, Towards Visualisation and End-User Verification in Data Protection Law, in: Jusletter IT 29 February 2012
The paper at hand presents an approach to visualisation and end-user verification in the area of online data privacy and data protection. The emphasis is on increasing transparency and ease of use for both end-users and small and medium-sized enterprises (SMEs) within the European Union. The end-user, in this case the data subject, should be able to request, edit and delete the data he or she provides to online services run by SMEs, and be able to understand for which purposes companies are using these data. This should be represented in an understandable and visually appealing way. SMEs, in turn, should be supported in complying with international and national data protection laws, and legal experts should be enabled to edit data access rules without technical knowledge or assistance from technical personnel. This paper shows how such a framework is currently implemented and describes the main components of its architecture.

Table of Contents

  • 1. Introduction
  • 2. Social Requirements
  • 3. Architecture, Design and Implementation
  • 3.1. End-User Verification Tool
  • 3.2. Privacy Rule Definition Language
  • 4. Conclusions
  • 5. Acknowledgment
  • 6. Literature

1.

Introduction ^

[1]
The starting point of the concerns elaborated in this paper is a simple online registration process of a so-called end-user at any kind of web service provided by private or public authorities or enterprises. This could be a social network [1], an online marketplace or a web service for healthcare offered by an insurance company. When end-users register at such an online service, they provide non-sensitive and/or sensitive personal data to the authority [2]. The end-user, or data subject, ideally reads the privacy statements on the website and gives consent to the storage and processing of the data objects provided [3]. Although this is an everyday process, it raises several data privacy issues. We intend to support both the data subject and the SME or public authority offering the online service: the former in enforcing, and the latter in ensuring, compliance with data protection law.
[2]
Although data privacy statements should be a fundamental part of a registration process, research reveals that all too often users do not read these statements, because they are lengthy, consist of too much legal jargon, and are hard to understand. Even if they exist and the user finds them, the first questions are: how long are end-users willing to read such statements, and which type of visualisation is suitable for them [4]? Furthermore, electronic support to add, change and, above all, delete data in a trustworthy way would help data subjects enforce their data privacy rights.
[3]
On the other hand, especially small and medium-sized enterprises (SMEs) in the European Union face the issue that the European Data Protection Directive only provides a legal framework, while the national and even regional implementations of this Directive differ depending on the national legislation [5, 6]. SMEs that are active on the international market therefore need to comply with different levels of stringency, depending on the national implementation of the Data Protection Directive. Furthermore, small enterprises might have no legal experts who are familiar with international data protection law and could therefore benefit from a) a user-guided system to identify compliance based on data protection decision trees and b) a set of data access rules in natural language that are understandable by non-technical users, automatically machine-processable and compliant with the law.
[4]
Parts of the work presented in this paper are funded by and carried out within a European project named ENDORSE [7], which is concerned with providing a legal-technical framework for privacy-preserving data management (http://ict-endorse.eu). It aims at developing an open source toolset to support data subjects as well as data controllers in handling data in a compliant manner.
[5]
Section 2 presents the results of a social analysis of the requirements and already provides starting points for the implementation of an end-user verification tool. Section 3 explains the main components of such a system, describes the design of each component and provides implementation details. Section 4 concludes with the main findings and future plans of this work.

2.

Social Requirements ^

[6]
Within the ENDORSE project, the social requirements for the aforementioned end-user verification tool were collected [8]. This task concerned eliciting the socio-cultural requirements for the end-user components that allow for policy inspection and facilitate subject access rights. It surveyed end-user concerns, attitudes, understanding, and requirements regarding online privacy policies and subject access rights. Particular attention was given to application areas in which sensitive personal data are dealt with, such as healthcare and insurance.
[7]
The objective of this activity was to understand how end-users could be helped to fully understand their rights with respect to the processing of their personal data. This includes providing service subscribers with insight into the meaning of the privacy policies provided by service providers, into what happens with their data, and into who accesses and uses these data. Furthermore, it includes empowering users to effectuate their subject access rights.
[8]
An online survey was carried out by S. van der Hof and B. van den Berg, based on three principles: transparency, accountability and trust. The survey focused on privacy protection (informing users of which data about them are collected and/or processed), privacy perception (privacy concerns beyond compliance with the law) and privacy timing (differentiation according to when information is anticipated or wanted and how this information is communicated). The results of the quantitative research yielded a set of requirements for the end-user verification tool. Information should be simple, sufficiently specific, relevant and clear. It had been anticipated that a graphical representation of information, as in [19], would be created. However, the participants of the survey valued the representation of privacy statements and the respective legal knowledge in everyday language, i.e. text, very highly in comparison to visualisations and icon-based approaches. Moreover, two thirds of the respondents want information from the first moment they enter into a relationship with a company (online shop or social network site).
[9]
In recent years several research groups have turned their attention to communicating privacy-related information regarding the data collection and processing practices of websites through the use of icons (for a complete overview of these different initiatives, please see [11]). The central idea is that icons and symbols are easy to understand and provide information at a single glance, rather than through the use of (long lines of) text. One of the earliest attempts to capture privacy-related information in icons comes from Mary Rundle, who created a set of seven icons which companies could post on their websites, for example to communicate to users that they would not use their data for marketing purposes or that they would not trade or sell users' data [18]. In the PrimeLife project [20], an EU FP7 project on privacy and identity in the online world, a much larger set of icons was developed and tested among a user base [19]. This was a layered system: users could hover over or click the icons to receive more information and access the legal text relating to the protection of their personal data. The icons in the PrimeLife set communicate, for example, whether or not a website tracks users' behaviour, whether it facilitates anonymisation, and whether or not the data are passed on to third parties. Moreover, even such complex issues as whether or not data are aggregated with personalised third-party information, and whether or not the processing practices of the company fall under EU law or equal protection, were captured in icons [11].
[10]
When reviewing the results of these attempts to make privacy-related information more accessible to end-users, it turns out that capturing such complex, detailed material as data protection legislation in one single image, or a (relatively) limited set of images, is incredibly difficult. For one, the icons that were developed in the PrimeLife project consist of circles that often contain several rather small elements. The 'tracking' icon taken from [19] is a case in point: in order to express the issue at stake accurately, a rather complicated drawing is required (just as one cannot explain the issues surrounding the tracking of personal data in one or two words either). Almost all of the icons in this set, and for that matter those in the other sets presented in Hansen's article as well [11], unfortunately suffer from the same problem: they attempt to communicate such fine, detailed information that the resulting image becomes too complex to understand at a single glance. Apparently, communicating legal issues through simplified visualisations such as symbols and icons is not an easy task. In light of the findings from the user survey, which show that users prefer to receive information in everyday language and/or through the use of examples rather than through icons, we decided to take another approach to improving the accessibility of the data collection and processing practices communicated to end-users.

3.

Architecture, Design and Implementation ^

[11]
The basic architecture of the suggested end-user verification tool comprises several components, which are loosely coupled, rely on standard interfaces and therefore allow for a modular architecture.
[12]
The main component and user interface for the data subject is the web-based end-user verification tool. Its requirements are derived mainly from the social requirements summarised in Section 2. The privacy statements as well as the underlying legal obligations are encoded in a so-called Privacy Rule Definition Language (PRDL), which represents legal texts and natural-language privacy statements in structured natural language and is at the same time automatically processable by a rule engine in the backend system [9].
[13]
Additionally, a simple web-based editor for defining expressions in PRDL should be provided, which supports the user with links to, e.g., legal provisions and ontologies of common purposes for data processing. The editor should be usable by end-users for modifying and even withdrawing their consent to data usage, and by data controllers at companies to formulate their privacy statements [10]. The backend system consists of data repositories for the user data, a repository for the PRDL rules and a rule engine, which is capable of granting access to the user data based on the given PRDL rules. Furthermore, the data controller could be supported by a wizard-guided system for stepping through compliance decision trees, helping to identify sensitive data according to legal restrictions.
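To make the interplay between the PRDL rule repository and the rule engine more concrete, the following minimal Java sketch shows how a backend component might delegate a data access decision to the stored rules. The type names (AccessRequest, PrdlRule, PrdlRuleRepository, AccessDecisionService) are illustrative assumptions only and do not reflect the ENDORSE implementation.

import java.util.List;

// Illustrative sketch only: names are assumptions, not part of the ENDORSE specification.

// A request by a data controller to perform an action on a data object for a stated purpose.
record AccessRequest(String dataController, String action, String dataObject, String purpose) {}

// A single PRDL rule in its machine-processable form.
record PrdlRule(String dataController, String modality, String action,
                String dataObject, String purpose) {}

// Repository of PRDL rules, e.g. backed by the XML schema described in Section 3.2.
interface PrdlRuleRepository {
    List<PrdlRule> findRulesFor(String dataController);
}

// The rule engine component: grants or denies access based on the stored PRDL rules.
class AccessDecisionService {
    private final PrdlRuleRepository rules;

    AccessDecisionService(PrdlRuleRepository rules) {
        this.rules = rules;
    }

    boolean isPermitted(AccessRequest req) {
        return rules.findRulesFor(req.dataController()).stream()
                .anyMatch(r -> r.modality().equalsIgnoreCase("MAY")
                        && r.action().equalsIgnoreCase(req.action())
                        && r.dataObject().equalsIgnoreCase(req.dataObject())
                        && r.purpose().equalsIgnoreCase(req.purpose()));
    }
}

In such a design, the editor described above would only need to modify entries in the rule repository; the decision logic itself remains untouched, which is what allows legal experts to adjust data access without programming.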

3.1.

End-User Verification Tool ^

[14]
The end-user verification tool is a GUI-driven web interface that is capable of demonstrating policy compliance to the end-user. The tool gives an overview of all relevant data privacy settings and implements an interface to the underlying PRDL repositories in order to give timely feedback on the current and past status of user data and privacy policies. This 'transparency' tool is intended to foster trust in the privacy protection environment. It is also intended to be a tool by which the data subject can intuitively check the effectiveness of privacy statements and thus 'learn' to formulate best-suited privacy preferences. The tool needs to be installed and integrated with a company's backend systems by the company itself. With this installation and integration, an SME can earn the ENDORSE seal, which involves a certification process in which the correct implementation is checked by independent authorities. The detailed certification process is currently under development by the ENDORSE consortium.
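The feedback on the current and past status of user data suggests a small query interface between the verification tool and the PRDL repositories. The following Java sketch outlines what such an interface might look like; all names (PrivacyStatusService, StoredDataItem) are assumptions for illustration and not part of the ENDORSE implementation.

import java.time.Instant;
import java.util.List;

// Illustrative sketch: a query interface the verification tool could use towards the backend.
// All names are assumptions for illustration, not part of the ENDORSE implementation.
interface PrivacyStatusService {

    // The privacy statements (in PRDL) currently applying to the data subject.
    List<String> currentPolicies(String dataSubjectId);

    // The policies that applied at a given point in the past, enabling feedback on past status.
    List<String> policiesAt(String dataSubjectId, Instant when);

    // The personal data items stored for the data subject, together with their processing purposes.
    List<StoredDataItem> storedData(String dataSubjectId);
}

// A stored data item with its purpose, as it would be shown to the end-user.
record StoredDataItem(String name, String value, String purpose, Instant storedAt) {}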
[15]
One of the interesting findings of the survey discussed in Section 2 was that the informational wishes of end-users align neatly with the requirements laid down in data protection law: end-users tend to want to be informed about the same information processing issues (passing information on to third parties, processing purposes, etc.) that the law requires companies to address. On some level, of course, this is not surprising: if all goes well, legal requirements mirror, or at least align with, the demands of the people they aim to protect. However, other attempts at improving the accessibility of privacy statements or of companies' data collection and processing practices have never started from this finding.
[16]
This led us to the idea of going back to the origins of (almost all of) the data protection legislation that is available today: the OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data, composed in 1980 and also known as the Fair Information Principles [12].
[17]
The 'Fair Information Principles' form the basis of the European Data Protection Directive, along with most of the data protection legislation of the Member States. There are eight basic principles in the OECD Guidelines, ranging from the Collection Limitation Principle (also known as the data minimisation principle, which stipulates that one may only collect those data one needs to complete a certain action, and no more than that) and the Data Quality Principle (data should be accurate and up to date) to the Purpose Specification Principle (data may only be collected and processed for specified purposes).
[18]
The survey revealed that users look for precisely these types of information when engaging with companies that set out to collect and process their data. This is why we have rephrased the key principles of the set in everyday language and placed them on the spokes of a wheel. Clicking on the spokes enables users to access second and even third layers of information, where they receive progressively more in-depth information about each specific aspect of the processing. The spokes have the following labels: limited collection, data quality, clear purposes, limited use, safe & secure, consent, third parties and hold us accountable (see Figure 1).
[19]
A possible visualisation of the end-user verification "entrance" at a website could therefore be a Flash visualisation, as in Figure 1, in the corner of an ENDORSE-certified website. By clicking on the wheel, the end-user can access further information about the privacy settings at this particular company.
[20]
Following this very broad and general entry point to the end-user verification tool, the data privacy statements will be visualised by tag clouds (see Figure 2). Here, the most frequently used terms are presented in natural language, and colour schemes can be introduced to emphasise purposes and critical terms such as "marketing" or "third party usage". By clicking on the terms in the tag clouds, users can see the respective sentences in the privacy statements and can therefore more easily access the parts of the policies that interest them. In order to be able to automatically derive data access rules for the user data, the data privacy statements of companies will be formulated in PRDL, which is described below. This also helps in processing the privacy statements automatically and links the legal provisions with the automatic data access. Additionally, graphs [24] could be introduced to show the dependencies of legal provisions across countries.
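As an illustration of how the tag-cloud view could be derived from a privacy statement, the following Java sketch counts term frequencies and flags a configurable list of critical terms; the stop-word list, the critical-term list and the class name are assumptions for this example, not part of the ENDORSE toolset.

import java.util.*;
import java.util.stream.*;

// Illustrative sketch: derive tag-cloud weights from a privacy statement.
// The stop-word and critical-term lists are assumptions, not ENDORSE specifications.
public class TagCloudBuilder {

    private static final Set<String> STOP_WORDS =
            Set.of("the", "and", "for", "with", "may", "our", "your");
    private static final Set<String> CRITICAL_TERMS =
            Set.of("marketing", "third", "party", "transfer", "profiling");

    // Returns each term with its frequency; critical terms could be rendered in a warning colour.
    public static Map<String, Long> termFrequencies(String privacyStatement) {
        return Arrays.stream(privacyStatement.toLowerCase().split("[^a-z]+"))
                .filter(t -> t.length() > 2 && !STOP_WORDS.contains(t))
                .collect(Collectors.groupingBy(t -> t, Collectors.counting()));
    }

    public static boolean isCritical(String term) {
        return CRITICAL_TERMS.contains(term);
    }

    public static void main(String[] args) {
        String statement = "We may share your data with third party partners for marketing purposes.";
        termFrequencies(statement).forEach((term, count) ->
                System.out.printf("%s (%d)%s%n", term, count, isCritical(term) ? " [critical]" : ""));
    }
}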

3.2.

Privacy Rule Definition Language ^

[21]
Enterprise privacy policies are usually aimed at covering internal usage only. However, several key aspects suggest a shift towards a standardisation of such policies and the processes they include. One aspect relates to the technical implications embedded in the process of assembling regulations, which could be dealt with in this way. Furthermore, heterogeneous data sources within the organisation would become manageable in the sense that, as a first step, complete enforcement of the internal policies on all these repositories could be guaranteed. Last but not least, globalisation progresses further and data exchange between global players becomes part of daily business. As a consequence, the legal aspects regarding the use and distribution of the exchanged data have to be covered at all times, which implies the implementation of homogeneous enforcement mechanisms for all parties involved in the business process [13]. The data, the related preferences as well as the governing policies may be subject to change due to transparent processes [15] that empower the customer to retain full control over his or her own data. Therefore, the enforcement implementations and tools have to be able to cover a lifecycle, not only regarding the data they protect, but also regarding their own repositories and properties.
[22]
In order to cover the above-mentioned aspects, the Privacy Rule Definition Language (PRDL) is being developed. For the representation of privacy policies in PRDL, the language itself has to provide enough expressivity to incorporate the major aspects of data privacy as they are stated in the European Directives governing the use of personal data [6, 22, 23]. At the same time, however, it should avoid large vocabularies in order not to introduce further complexity. One of the main objectives is to bring more transparency to the data subject (the entity whose data is processed by the system). Furthermore, legal compliance should be ensured.
[23]
In order to cope with these issues, the language has to strike the right balance between the expressive power needed to formulate large parts of the relevant regulation and efficiency in terms of usability and applicability, so that it can be utilised in all areas of SME environments and, ultimately, by all types of users. Therefore, PRDL has to consider user-focused facets as well as extensibility, comprehensibility and usability [16]. In addition, the handling of legal aspects derived from the regulation has to be covered as well, and so the Privacy Rule Definition Language has to incorporate the notion of purpose [17] into its structure.
[24]
The current PRDL development encompasses an XML schema providing a basic structure. Furthermore, natural-language rule templates were derived to cover usability aspects. Elements in [] within the syntax template represent rule elements in natural language, elements contained in {} stand for single-choice elements that are part of PRDL, and elements in () serve the use of comprehensible natural language. At the current stage of development, templates are available for normal access rules with a constraint, data access rules with time constraints, as well as normal privacy data access rules. The following template represents a normal privacy data access rule.

[DataController] {MUST, MAY} {VIEW, ADD, DELETE, MODIFY, STORE, ACTION} [Data Object] (FOR) [Purpose]

[25]
A practical use case of the above-mentioned template, expressed in the Drools rule language, is shown in the following.

Rule 3: EurA MAY process name, address, mobile number, FOR customer identification

rule "Access to name, address and mobile number"

when

$r : Request (dataprocessor.getAuth == "EurA"

&& modality.get() == "may"

&& requesteddata.isEqual("name,address,mobilenr")

&& purpose.getPurpose() == "customer identification")

then

$r.grantAccess();

end

[26]
As can be seen, the rule is on the one hand readable by a human being and on the other hand can be processed automatically, such that human intervention is only necessary for further conflict resolution purposes.
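To illustrate this link between the human-readable PRDL statement and its machine-processable counterpart, the following Java sketch generates a Drools rule of the form shown above from the elements of a PRDL statement. The class and the translation scheme are illustrative assumptions, not the ENDORSE implementation; in particular, the constraint syntax simply mirrors the example rule above.

import java.util.List;

// Illustrative sketch: generating a Drools rule from the elements of a PRDL statement.
// The translation scheme mirrors the example rule above but is an assumption, not the ENDORSE implementation.
public class PrdlToDrools {

    public static String toDroolsRule(String controller, String modality,
                                      List<String> dataObjects, String purpose) {
        String data = String.join(",", dataObjects);
        return String.format(
            "rule \"Access to %s\"%n" +
            "when%n" +
            "    $r : Request( dataprocessor.getAuth == \"%s\"%n" +
            "                  && modality.get() == \"%s\"%n" +
            "                  && requesteddata.isEqual(\"%s\")%n" +
            "                  && purpose.getPurpose() == \"%s\" )%n" +
            "then%n" +
            "    $r.grantAccess();%n" +
            "end%n",
            data, controller, modality.toLowerCase(), data, purpose);
    }

    public static void main(String[] args) {
        // Rule 3 from above: EurA MAY process name, address, mobile number FOR customer identification
        System.out.println(toDroolsRule("EurA", "may",
                List.of("name", "address", "mobilenr"), "customer identification"));
    }
}

Running the sketch with the elements of Rule 3 prints a rule equivalent to the one listed above, which could then be loaded into the rule engine of the backend system.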

4.

Conclusions ^

[27]
The consideration of a so-called end-user verification tool, outlined in this paper, started from the need for more transparency, accountability and trust in the interaction with online services such as social networks and online marketplaces. A survey was conducted which showed that the wishes of end-users in online interaction match the well-known principles of the protection of privacy and transborder flows of personal data considerably well. Additionally, the representation of privacy statements in an icon-based approach may lack information the end-user needs to evaluate whether a website is trustworthy or not. Besides the needs of the end-users of such services, the issues of companies on the international market were also considered. Data controllers at small and medium-sized enterprises (SMEs) in particular need support in setting up law-compliant IT services.
[28]
The three main components described in this paper will help end-users gain more transparency about the usage, processing and storage of their personal data. They help SMEs demonstrate their legal compliance to their customers, which should increase trust in online services, and they support data controllers in their daily work. By introducing a Privacy Rule Definition Language (PRDL), which is understandable by end-users and data controllers without programming skills but also directly affects data access in the backend systems, small changes can be made without changing the programming of the IT system itself.
[29]
As a next step, PRDL should be extended to cover business processes and thus be able to describe and execute compliance decision trees. In many legislations such trees are available to guide data controllers through a compliance process. By reusing the PRDL rule set, automated wizards should be developed that guide a data controller through, e.g., an applicability process with links to the respective data privacy laws in different countries.

5.

Acknowledgment ^

[30]
This work was funded in part by the European Union's 7th Framework Programme within the ENDORSE project, no. 257063.

6.

Literature ^

[1] Boyd, D. M., & Ellison, N. B., Social network sites: Definition, history, and scholarship, Journal of Computer-Mediated Communication, 13(1), article 11 (2007).

[2] Årnes, A., Skorstad, J., Paarup Michelsen, L. H., Social Network Services and Privacy, A case study of Facebook, (2011).

[3] Pollach, I., The scope and depth of privacy statements on business-to-consumer websites, Proceedings of the IADIS International Conference WWW/Internet, Pages: 1171–1174, (2003).

[4] McDonald, A.M., Cranor, L.F., The Cost of Reading Privacy Policies, Volume: 4, Issue: 3, Pages: 1-22 (2008).

[5] Anderson, R., It’s a Jungle Out There, Data IQ Journal, Summer (2011).

[6] Directive 2002/58/EC of the European Parliament and of the Council of 12 July 2002 concerning the processing of personal data and the protection of privacy in the electronic communications sector (Directive on privacy and electronic communications)

[7] ENDORSE Project, Legal Technical Framework for Privacy Preserving Data Management, FP7, http://ict-endorse.eu/, (2012).

[8] Van der Hof, S., Van den Berg, B., D2.2, Social Requirements and Implications, ENDORSE Project, (2011).

[9] Kurz, T., Rücker C., Lampoltshammer T., Heistracher, T., D3.2, Privacy Rule Definition Language - Preliminary Specification, ENDORSE Project, (2011).

[10] Kurz, T., Rücker C., Lampoltshammer T., Heistracher, T., D4.3, Rule Engine – Preliminary Implementation, ENDORSE Project, (2011).

[11] Hansen, M., Putting privacy pictograms into practice: A European perspective. In GI Jahrestagung, edited by Fischer, S., Maehle, E., and Reischuk, R. GI, Volume 154: 1703-1716, (2009).

[12] OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data, http://www.oecd.org/document/18/0,3343,en_2649_34255_1815186_1_1_1_1,00.html, (1980).

[13] Backes, M., Pfitzmann, B., Schunter, M., A toolkit for managing enterprise privacy policies, European Symposium on Research in Computer Security (ESORICS), pages 101–119. Springer Lecture Notes in Computer Science 2808 (2003).

[14] Jones, R., Tahri, D., An overview of EU data protection rules on use of data collected online, Computer Law & Security Review 27 (6), P. 630–636, (2011).

[15] Ardagna, C.A., Cremonini, M., De Capitani di Vimercati, S., Samarati, P., A privacy-aware access control system, Journal of Computer Security, 16(4):369-397, (2008).

[16] Yague, M.I., Survey on XML-Based Policy Languages for Open Environments, Journal of Information, Assurance and Security 1, P. 11-20, (2006).

[17] Al-Fedaghi, S., Dismantling the Twelve Privacy Purposes, IFIP International Federation for Information, Processing, Volume 238, Pages 207-222, (2007).

[18] Rundle, M., International Data Protection and Digital Identity Management Tools, Presentation given at the Privacy Workshop I of IGF 2006, Athens (2006).

[19] Holtz, L. E., Nocun, K., Hansen, M., Towards Displaying Privacy Information with Icons, Privacy and Identity Management for Life, IFIP Advances in Information and Communication Technology, Volume 352/2011, 338-348, (2011).

[20] PrimeLife – Privacy and Identity Management in Europe for Life, http://www.primelife.eu/, last accessed on 22/06/2011.

[21] Amazon Privacy Notice, last accessed on 10.01.2012, http://www.amazon.com/gp/help/customer/display.html/ref=hp_551434_privacy?nodeId=468496.

[22] Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data

[23] Directive 2009/136/EC of the European Parliament and of the Council of 25 November 2009 amending Directive 2002/22/EC on universal service and users’ rights relating to electronic communications networks and services.

[24] TouchGraph, http://www.touchgraph.com/navigator, last accessed on 10.01.2012.