Jusletter IT

PRDLWorkflow – A Language-based Expert System for Data Privacy

  • Authors: Thomas Lampoltshammer / Thomas J. Heistracher / Sandro Kapeller
  • Category: Articles
  • Region: Austria
  • Field of law: Advanced Legal Informatics Systems and Applications
  • Collection: Tagungsband-IRIS-2013
  • Citation: Thomas Lampoltshammer / Thomas J. Heistracher / Sandro Kapeller, PRDLWorkflow – A Language-based Expert System for Data Privacy, in: Jusletter IT 20 February 2013
Democratic societies and self-determined individuals claim their right to data privacy. The European Commission and its associated regulations endorse this right. These regulations cause several implementation issues because each Member State of the European Union has to implement its own national version, which results in divergent, non-comparable and incomplete realizations. As a consequence, internationally operating small and medium-sized enterprises (SMEs) struggle to comply with these heterogeneous laws. One way of formally encoding law-compliant Enterprise Privacy Policies (EPPs) is the Privacy Rule Definition Language (PRDL), an artificial language developed within the ENDORSE project, which is funded under the 7th Framework Programme of the European Commission. The aim of the work at hand is to tackle compliance issues by introducing a wizard-based expert system that assists data controllers in writing their EPPs. The wizards are based on workflow definitions, which are encoded by means of PRDLWorkflow, one of the three dialects of PRDL. The system interprets these workflow definitions and turns them into browser-based wizards. It is envisioned that these wizards enable legal experts to create, adapt and modify EPPs without expert knowledge in software engineering or expert systems design. This newly created environment shall foster the correct adoption of regulations within enterprises, draw attention to regulation conflicts and guarantee uniformity throughout all subsidiaries.

Table of contents

  • 1. Introduction
  • 2. The ENDORSE Project
  • 3. PRDL Dialects
  • 4. PRDLWorkflow and Existing Law Information Systems
  • 5. PRDLWorkflow Schema
  • 6. PRDLWorkflow Language Elements
  • 7. PRDLWorkflow Architecture
  • 8. Conclusion
  • 9. Acknowledgment
  • 10. References

1.

Introduction

[1]
Various publications and surveys show that the volume of electronically analysed data is increasing rapidly. Enterprises are increasingly interested in storing their customers’ data and using them, for example, for marketing purposes. Even the outright sale of such data is common and has developed into a lucrative business field. According to a Eurobarometer survey (footnote 1), about 70% of people within the European Union (EU) are afraid of their data being handled incorrectly. As a consequence, the demand for transparency and for law-compliant processing of personal data is rising.
[2]
A major issue regarding the regulation of data privacy is its lack of currency. The underlying EU directive on data protection is far from being completely implemented by the majority of EU Member States. This also holds true for Austria, which was sued by the European Commission (EC) in 2010 on the grounds that its legal and organisational implementation of the directive was not appropriate (footnote 2). These implementation issues arise because each Member State of the EU has to implement its own national version, which results in divergent, non-comparable and incomplete realizations. As a consequence, internationally operating enterprises struggle to comply with these heterogeneous regulations.
[3]
Based on these circumstances, the EC put forward a new proposal that is explicitly oriented towards the new digital age. It takes the form of a regulation that is directly and uniformly applicable in every Member State. However, the new superordinate regulation is only one side of the new concept; in addition, self-regulating approaches shall be implemented. The declared aim of this research work is to foster transparency and to improve the current situation.

2.

The ENDORSE Project

[4]
The ENDORSE project was funded within the 7th Framework Programme of the European Union. The consortium comprises academic computer science partners, data protection legal experts, software engineers and industry players from across Europe (Italy, Ireland, UK, The Netherlands, Spain and Austria). The project’s main concern lies in the development of a legal-technical framework dedicated to privacy-preserving data management. The technical outcomes of this project form a set of open-source tools. These tools are intended to empower data controllers as well as data subjects (namely the end users) to handle personal data in a law-compliant way. Furthermore, a certification methodology is suggested to increase trust in ICT products and services with regard to data privacy and data protection.
[5]
With the introduction of the Privacy Rule Definition Language (PRDL) (Kurz et al. 2011, Malone et al. 2010), the ENDORSE project took the first step towards establishing a language for the formalization of data protection policies. This enables stakeholders to formulate policies in a clear and understandable way. Owing to this formalization, the policies are machine-readable and can therefore be integrated into established IT infrastructures. In turn, this benefits SMEs that want to run automated compliance checks (Marcon et al. 2008). During the development of PRDL it became obvious that it is not feasible to integrate all required functionality into one single language. Therefore, PRDL was split into three distinct dialects.

3.

PRDL Dialects

[6]
To better focus on the relevant purposes, three dialects of the underlying natural language-based notation are introduced:
[7]
PRDLepp offers a language dialect for operators of online services. This dialect enables the operators to formulate Enterprise Privacy Policies (EPPs) in a clear and understandable way. During the development phase of this dialect, several established data protection policies of business partners were analysed. The result is a simple grammar and a reasonable number of templates, suitable for most EPPs of SMEs. PRDLepp is intentionally kept simple so that persons without expert experience can create and interpret EPPs as well. In addition, the simple design allows for a transformation into machine-readable rules. Consequently, EPPs can be checked by a rule engine at runtime. The EPPs themselves are created with an EPP editor provided by the ENDORSE project.
[8]
The second dialect, PRDLWorkflow, is dedicated to the description of workflows within PRDL. The overall aim is to create guided user dialogs and integrate them into the EPP editor. These user dialogs are intended to directly support users during the creation of law-compliant EPPs. The prototype and the software components for interpreting and executing PRDLWorkflow definitions are the core of this paper. Several application scenarios are possible for the workflows. They could serve as an alternative entry point to the EPP editor: by means of intelligent questionnaires, the context of an EPP can be derived and editing can start at the right point within the editor. Another possibility would be a smart intervention while the EPP is being written. For example, if a user enters the keyword «consent» or «minors» into the editor, additional questions are presented to the user, paired with excerpts of relevant legal texts. In turn, additional context-relevant rules could be added to the editor automatically.
[9]
Before the EPPs are finally stored within the system, a compliance check is performed. This functionality is realized with the third dialect, PRDLlex. It consists of a set of rules, stored in the backend of the ENDORSE system, that verify country-specific requirements.

4.

PRDLWorkflow and Existing Law Information Systems

[10]

Law information systems can be described as systems dedicated to searching law-related content. This content may comprise items such as legal texts, verdicts and administrative regulations. An Austrian example of such a system is the «Rechtsinformationssystem des österreichischen Bundeskanzleramtes (RIS)» (footnote 3). German and EU counterparts exist as well, namely the «Juristische Informationssystem für die Bundesrepublik Deutschland (JURIS)» (footnote 4) and the «EUR-Lex» system (footnote 5). The data volume of such systems is ever increasing and, as a consequence, new methods for automated, machine-learning-based categorization need to be applied. Text classification methods can group already known documents into classes and build models based on them. As a next step, these models can be used to classify new documents. Examples of such implementations can be found in the projects «Know-Center» or «Lexis Nexis». To improve the already promising results, statistical classification methods may be combined with expert knowledge in the form of predefined rule sets (Rapp et al. 2012).

[11]
In the following, general requirements (R1 to R9) for the operational aspects of PRDLWorkflow are described:
[12]
R1. Several requirements for the PRDLWorkflow implementation arise from the general aim of the ENDORSE project and from the main criterion that non-expert users must be able to use the system as well.
[13]
R2. Normally, workflows are used to confront the user with specified questions. Therefore, user interaction is an important criterion. The user should be able to enter simple textual data into the system, e.g. answers to questions. In addition, it should be possible to choose from static options, for example «yes» or «no», in response to the questions presented. Moreover, dynamic options could be offered as well, making it possible to include options from the ENDORSE database at runtime. In general, the user’s input should be saved during a workflow so that it can be displayed at a later point in time.
[14]
R3. The workflow itself has to support branches in order to build decision trees. The flow is controlled via user input, which is comparable to the IF and ELSE constructs in programming.
[15]
R4. The display of simple text is required in order to deliver excerpts from law texts to the user.
[16]
R5. To foster the reusability of workflows, sub-workflows should be supported. These workflows can be integrated into other workflows. As a consequence, every workflow, including one that contains sub-workflows, may itself be used as a sub-workflow at some point. This concept reduces both the effort for workflow development and redundancy.
[17]
R6. To boost the accessibility of the system itself, a Web-based solution is preferred. This way, no client installation is necessary and the system can be accessed from virtually everywhere as long as there is network or Internet connectivity.
[18]
R7. The workflows themselves have to be generic in the sense that they are created solely via PRDLWorkflow. The creation or adaptation of workflows must not require any changes to the system itself. This makes it possible for non-expert users to work with the system. In daily practice this means that privacy officers, instead of IT personnel, work with the system and create and maintain workflows.
[19]
R8. PRDLWorkflow extends the existing EPP editor with an expert system. Therefore, interfaces have to be defined to enable the two separate applications to interact and exchange data.
[20]
R9. Last but not least, the whole solution shall build only on open-source tools, as it is one of the main objectives of ENDORSE to provide access to the results to anyone interested.
[21]
Considering these requirements, the XML-based implementation of the workflow language is described next.

5.

PRDLWorkflow Schema

[22]
PRDLWorkflow represents workflows in terms of juridical processes, which in turn serve as a basis for the creation of user dialogs. In the final step, these dialogs are integrated into the EPP editor in the form of wizards. A standard workflow normally consists of several tasks, each of which is represented by a user dialog. The language used for describing workflows is the Extensible Markup Language (XML). Accordingly, an XML schema defines the syntax and extent of the language. An example of such a schema can be seen in Lst. 1.

Listing 1: Simple PRDLWorkflow schema

[23]
The structure is mainly divided into two parts: i) the definition of the steps within the workflow (tasks), and ii) the definition of the transitions. This structure is based on existing standards for modelling business processes, such as BPMN (footnote 6) and BPEL (footnote 7). In contrast to these standards, PRDLWorkflow is used not only for modelling the process itself but also for describing the user interface, which is generated from the description. There exist different types of tasks within a workflow. Depending on the type of the task, several additional XML elements are available which influence the processing and presentation of the user interface. After the steps of the workflow have been defined, the sequence between the individual steps is determined via transitions. If necessary, conditions may be stated as well to restrict transitions to certain cases. This enables the control of processes via user input. The XML schema itself was designed hierarchically via XML inheritance techniques. Thus it becomes possible to describe elements and groups of elements only once, even if they are used in many places.
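The two-part structure of tasks and transitions can be illustrated with a minimal sketch. The element and attribute names used here (Tasks, Transitions, Message, Option, source, destination, condition) and the example workflow are assumptions made for illustration and are not taken from the actual PRDLWorkflow schema. Python is used only for brevity, whereas the ENDORSE tooling itself is Java-based.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow definition; element and attribute names are
# illustrative assumptions, not the real PRDLWorkflow schema.
WORKFLOW_XML = """
<PRDLWorkflow id="consent-check">
  <Tasks>
    <StartTask id="start"/>
    <StaticSelectionTask id="ask_minors" variable="minors">
      <Message>Does the service address minors?</Message>
      <Option>yes</Option>
      <Option>no</Option>
    </StaticSelectionTask>
    <UserResponseTask id="show_law">
      <Message>Parental consent may be required.</Message>
    </UserResponseTask>
    <UserResponseTask id="done">
      <Message>No additional rules needed.</Message>
    </UserResponseTask>
  </Tasks>
  <Transitions>
    <Transition source="start" destination="ask_minors"/>
    <Transition source="ask_minors" destination="show_law"
                condition="minors == 'yes'"/>
    <Transition source="ask_minors" destination="done"
                condition="minors == 'no'"/>
  </Transitions>
</PRDLWorkflow>
"""


def load_workflow(xml_text):
    """Split a workflow definition into its tasks and its transitions."""
    root = ET.fromstring(xml_text.strip())
    # Part i): the steps of the workflow, keyed by id, with their task type.
    tasks = {task.get("id"): task.tag for task in root.find("Tasks")}
    # Part ii): the transitions; the condition is None when unrestricted.
    transitions = [
        (t.get("source"), t.get("destination"), t.get("condition"))
        for t in root.find("Transitions")
    ]
    return root.get("id"), tasks, transitions


workflow_id, tasks, transitions = load_workflow(WORKFLOW_XML)
```

In this sketch the two conditional transitions leaving `ask_minors` form a simple branch: which task follows depends on the value the user selected.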

6.

PRDLWorkflow Language Elements

[24]
In order to provide a better understanding of the language elements included in PRDLWorkflow, this section describes the main elements in more detail (see list of elements in Lst. 2).

Listing 2: Extended PRDLWorkflow schema

  • PRDLWorkflow represents the root element and therefore a workflow definition. For the sake of referencing, every definition is associated with an ID.
  • Every workflow features exactly one StartTask. This task initiates the sequence of the workflow.
  • The CallWorkflowTask enables the call of sub-workflows. In this way, frequently occurring workflows, e.g. finding the right jurisdiction, can be represented as base workflows. Such a base workflow can then be included in different workflows as a sub-workflow. When a sub-workflow is finished, the normal sequence of the calling workflow is resumed.
  • If no user input is required and only text shall be displayed within the wizard, a UserResponseTask is used. The Message tag contains the text to be displayed. Furthermore, additional details may be added with the DetailedMessage tag.
  • The InputTask provides basic input capability for simple texts. This input is, like all inputs, stored in a variable.
  • The StaticSelectionTask provides simple choice options within the user dialogs. These options are static and do not change during runtime.
  • The options of a DynamicSelectionTask may be gathered from the ENDORSE database during runtime. Which options are fetched is decided based on parameters and variables.
  • Transitions connect the tasks of a workflow and set up the sequence and flow. This is done via source and destination parameters. If necessary, conditions may be included as well via Boolean expressions.
  • All user inputs are stored in Variables and can be accessed throughout the whole workflow lifetime.
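How these elements interact can be sketched with a small interpreter. The dictionary layout, the task ids, the fixed start id `start` and the simplified condition form (a pair of variable name and expected value instead of a general Boolean expression) are assumptions made for illustration; they are not part of PRDLWorkflow.

```python
# Hypothetical in-memory form of two workflows; the dictionary layout
# is an illustrative assumption, not the PRDLWorkflow format itself.
WORKFLOWS = {
    "jurisdiction": {
        "tasks": {
            "start": {"type": "StartTask"},
            "country": {"type": "InputTask", "variable": "country"},
        },
        "transitions": [("start", "country", None)],
    },
    "main": {
        "tasks": {
            "start": {"type": "StartTask"},
            "sub": {"type": "CallWorkflowTask", "workflow": "jurisdiction"},
            "minors": {"type": "StaticSelectionTask", "variable": "minors",
                       "options": ["yes", "no"]},
            "warn": {"type": "UserResponseTask",
                     "message": "Parental consent may be required."},
        },
        "transitions": [
            ("start", "sub", None),
            ("sub", "minors", None),
            ("minors", "warn", ("minors", "yes")),
        ],
    },
}


def next_task(transitions, current, variables):
    """Return the first transition target whose condition holds, else None."""
    for source, destination, condition in transitions:
        if source != current:
            continue
        if condition is None or variables.get(condition[0]) == condition[1]:
            return destination
    return None


def run(workflow_id, answers, variables=None, shown=None):
    """Walk one workflow; `answers` stands in for the user's dialog input."""
    variables = {} if variables is None else variables
    shown = [] if shown is None else shown
    workflow = WORKFLOWS[workflow_id]
    current = "start"  # assumed id of the single StartTask
    while current is not None:
        task = workflow["tasks"][current]
        kind = task["type"]
        if kind == "CallWorkflowTask":
            # Run the sub-workflow, then resume the calling workflow.
            run(task["workflow"], answers, variables, shown)
        elif kind == "UserResponseTask":
            shown.append(task["message"])
        elif kind in ("InputTask", "StaticSelectionTask"):
            value = answers[task["variable"]]
            if kind == "StaticSelectionTask" and value not in task["options"]:
                raise ValueError(f"invalid option: {value!r}")
            variables[task["variable"]] = value  # every input becomes a variable
        current = next_task(workflow["transitions"], current, variables)
    return variables, shown


variables, shown = run("main", {"country": "AT", "minors": "yes"})
```

Here the sub-workflow `jurisdiction` shares the variable store with its caller, so the value entered for `country` remains available after the sub-workflow has finished. Whether sub-workflows in the real system share or isolate variables is not stated in the text; sharing is an assumption of this sketch.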


After this short description of details regarding the language elements, the next section will discuss the key components of the PRDLWorkflow architecture.

7.

PRDLWorkflow Architecture

[25]
The expert system itself is a Web-based application that offers a variety of wizards. These wizards may then be included in the EPP editor to support the end user. The most prominent feature of the system is that the wizards are not «programmed manually» but are generated from the workflow definitions. This generation does not take place at runtime. Instead, a so-called wizard generator prepares the wizards at the point in time when the workflow descriptions are added to the system. An overview of the architecture can be seen in Fig. 1.
[26]
This type of architecture offers a number of benefits. Because the wizards are generated in advance, performance at runtime is higher. Moreover, the creation of the user dialogs is also «outsourced» to the wizard generator, which results in additional performance gains.
[27]
The wizard generator is a Java-based command-line application that generates a wizard from the corresponding PRDLWorkflow definition, together with the associated user dialogs in the form of JavaServer Pages (JSPs). The JSPs are stored in a folder in the file system, which is then linked to the Web application. The left block of Fig. 1 depicts the wizard generator’s architecture. Its main component is the wizard generator controller, which coordinates the processing of the PRDLWorkflow descriptions and also serves as the entry point of the application. Processing proceeds as follows: first, the Workflow Definition Reader reads the queued XML-based workflow descriptions via JAXB (footnote 8). Subsequently, the controller iterates over all tasks included in the workflow description. Each task is processed by its own TaskHandler, which is obtained via the TaskHandlerFactory. The TaskHandler collects all necessary data from the workflow description and invokes the TemplatingEngine to generate the wizard. The TemplateRepository provides a specific template for every task, containing the program logic. The templates themselves are JSP files containing placeholders, which are filled in with data from the workflow description.
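The generation procedure described above can be sketched as follows. This is not the ENDORSE implementation, which is written in Java and produces JSP files: Python's `string.Template` stands in for the TemplatingEngine, two dictionaries stand in for the TemplateRepository and the TaskHandlerFactory, and the placeholder names, HTML fragments and task fields are assumptions made for illustration.

```python
from html import escape
from string import Template

# Stand-in for the template repository: one template per task type.
# Placeholder names and HTML fragments are illustrative assumptions.
TEMPLATES = {
    "UserResponseTask": Template("<p>$message</p>"),
    "InputTask": Template('<label>$message <input name="$variable"></label>'),
}


def handle_user_response(task):
    """Collect the data a UserResponseTask page needs."""
    return {"message": escape(task["message"])}


def handle_input(task):
    """Collect the data an InputTask page needs."""
    return {"message": escape(task["message"]),
            "variable": escape(task["variable"])}


# Stand-in for the TaskHandlerFactory: look up a handler by task type.
HANDLERS = {
    "UserResponseTask": handle_user_response,
    "InputTask": handle_input,
}


def generate_wizard(tasks):
    """Generate one page per task ahead of time, keyed by file name."""
    pages = {}
    for task in tasks:
        handler = HANDLERS.get(task["type"])
        if handler is None:
            raise ValueError(f"no handler for task type {task['type']}")
        data = handler(task)
        # Fill the placeholders of the task-specific template.
        pages[f"{task['id']}.jsp"] = TEMPLATES[task["type"]].substitute(data)
    return pages


pages = generate_wizard([
    {"id": "t1", "type": "UserResponseTask", "message": "Hello"},
    {"id": "t2", "type": "InputTask", "message": "Country:",
     "variable": "country"},
])
```

Because all pages are produced in one pass before any user opens a wizard, the runtime only has to serve the stored files, which mirrors the pre-generation idea of the architecture.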

Figure 1: PRDLWorkflow architecture

[28]
The runtime environment for the wizards (see Fig. 1, right block) is a Web application. Before the wizards can be run by users, they have to be generated and included in the application. Besides the wizards themselves, the Web application provides all functionality necessary for running the wizards and for interacting with other system components. With the help of the wizards, guided user dialogs can be included in the EPP editor. These dialogs are usually interactive and receive user input, which influences further data processing.
[29]
The Connect Servlet serves as the interface for calling wizards from the EPP editor. The editor determines via request parameters which wizard is called. The same interface is also implemented within the editor itself, so that user input can be forwarded from the instantiated wizards back to the editor. The wizard pages are stored within the Page Container, a file structure enclosing the wizards. For each wizard there is a folder containing all user dialogs in the form of JSP files.
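The lookup from a request parameter to a wizard folder in the Page Container can be sketched as follows. The parameter name `wizard`, the root folder `/wizards` and the registry of known wizards are hypothetical; the text does not specify how the Connect Servlet names its parameters or organises its folders.

```python
from pathlib import PurePosixPath
from urllib.parse import parse_qs

# Hypothetical root folder of the Page Container.
PAGE_CONTAINER = PurePosixPath("/wizards")

# Hypothetical registry: wizard id -> ordered list of generated pages.
KNOWN_WIZARDS = {
    "consent-check": ["start.jsp", "ask_minors.jsp"],
}


def resolve_page(query_string):
    """Map the request parameters to the first page of the requested wizard."""
    params = parse_qs(query_string)
    wizard = params.get("wizard", [""])[0]
    # Only registered wizard ids are accepted, so arbitrary paths
    # cannot be requested through the parameter.
    if wizard not in KNOWN_WIZARDS:
        raise KeyError(f"unknown wizard: {wizard!r}")
    first_page = KNOWN_WIZARDS[wizard][0]
    return PAGE_CONTAINER / wizard / first_page


page = resolve_page("wizard=consent-check")
```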
[30]
The AJAX Servlet provides an interface for requesting data from the server asynchronously. This enables dynamic selection options that are optionally loaded from the ENDORSE database. All methods for retrieving data via this interface are collected in the Service Facade.
[31]
The expert system as such does not need a database to be functional. However, if DynamicSelectionTasks are used, the options are read from the ENDORSE database. The connection itself is established via Java Database Connectivity (JDBC) and is encapsulated in a Data Access Object (DAO) within the system.
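The DAO idea can be sketched as follows. The real system uses JDBC against the ENDORSE database; here Python's `sqlite3` module with an in-memory database serves as a stand-in, and the table `selection_option` with its columns `category` and `label` is hypothetical.

```python
import sqlite3


class SelectionOptionDao:
    """Data Access Object that hides the SQL behind a single lookup method."""

    def __init__(self, connection):
        self._connection = connection

    def find_options(self, category):
        # Parameterised query; table and column names are hypothetical.
        rows = self._connection.execute(
            "SELECT label FROM selection_option "
            "WHERE category = ? ORDER BY label",
            (category,),
        )
        return [label for (label,) in rows]


# In-memory database standing in for the ENDORSE database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE selection_option (category TEXT, label TEXT)")
connection.executemany(
    "INSERT INTO selection_option VALUES (?, ?)",
    [("purpose", "marketing"), ("purpose", "billing"), ("country", "Austria")],
)

options = SelectionOptionDao(connection).find_options("purpose")
```

Keeping the query inside the DAO means that a DynamicSelectionTask only asks for the options of a category and does not need to know how or where they are stored.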
[32]
Based on the technical infrastructure described in this section, the consortium’s industry partners have implemented first scenarios in their own IT infrastructures together with respective rule engines.

8.

Conclusion

[33]
The right to data privacy is fundamental to democratic societies and self-determined individuals. Transparent technical solutions to enforce high levels of data privacy are still in their early stages. That is why the specific notation of a Privacy Rule Definition Language (PRDL) and its dialects has been proposed to better mediate between the interests of the core stakeholders, which are legal bodies, end users (data subjects), and companies (data controllers). Data subjects as well as data controllers have an interest in law-compliant processes within enterprises. PRDLWorkflow offers an integrated approach in order to serve both parties at the same time, which is in line with the «strategic» plans and intentions of the European Commission regarding data protection.
[34]
The presented workflow concept, in conjunction with the software tools discussed in this paper, offers a high degree of integration at industry level, as PRDL can be parsed by machines and is therefore compatible with existing rule engines. On the other hand, the descriptive concept makes the workflows readable for humans as well. As a result, the expert system can be used by people from the legal domain without the need for additional IT experts. Overall, the PRDLWorkflow language, paired with the whole ENDORSE concept, provides one important step towards transparent specification and rule-based enforcement of the privacy of personal data.

9.

Acknowledgment

[35]
This work was funded in part by the European Union’s 7th Framework Programme in the ENDORSE project, no. 257063.

10.

References

Kurz, T., Rücker, C., Lampoltshammer, T.J., Heistracher, T. (2011) Privacy Rule Definition Language – A Multistakeholder Approach to ENDORSE Privacy. In: Atkinson, K.M. (ed.), Legal Knowledge and Information Systems, JURIX 2011: The Twenty-Fourth Annual Conference, Frontiers in Artificial Intelligence and Applications, Volume 235, Dec. 14-16, 2011, Vienna, Austria: 135-139.

Malone, P., McLaughlin, M., Leenes, R., Ferronato, P., Lockett, N., Guillen, P.B., Heistracher, T., Russello, G. (2010) ENDORSE: A Legal Technical Framework for Privacy Preserving Data Management. Proc. Annual Computer Security Applications Conference 2010, Dec. 6-10, Austin, TX, USA: 27-34.

Marcon, G., Okada, H., Heistracher, T., Corallo, A., De Tommasi, M. (2008) Software engineering within a dynamic business ecosystem. Int. Journal of Business Process Integration and Management 3(4): 239-247.

Rapp, S., Kienreich, W., Lex, E. (2012) Maschinelles Lernverfahren für die Automatische Klassifizierung von Juristischen Dokumenten. In: Transformation juristischer Sprachen / Transformation of Legal Languages, Tagungsband des 15. Internationalen Rechtsinformatik Symposions IRIS 2012, Feb. 23-25, 2012, Salzburg, Austria.

 


 

Thomas J. Lampoltshammer, School of Information Technology and Systems Management, Salzburg University of Applied Sciences.

 

Thomas J. Heistracher, School of Information Technology and Systems Management, Salzburg University of Applied Sciences.

 

Sandro Kapeller, School of Information Technology and Systems Management, Salzburg University of Applied Sciences.

 


  1. European Commission, Special Eurobarometer 359, http://ec.europa.eu/public_opinion/archives/ebs/ebs_359_en.pdf (last access: 08.01.2013).
  2. Heise online, http://www.heise.de/newsticker/meldung/Datenschutz-EU-Kommission-verklagt-Oesterreich-vor-dem-EuGH-1128034.html (last access: 09.01.2013).
  3. RIS, http://www.ris.bka.gv.at/ (last access: 08.01.2013).
  4. JURIS, http://www.juris.de/jportal/index.jsp (last access: 08.01.2013).
  5. EUR-Lex, http://eur-lex.europa.eu/ (last access: 08.01.2013).
  6. BPMN, http://www.bpmn.org/ (last access: 08.01.2013).
  7. BPEL, http://docs.oasis-open.org/wsbpel/2.0/OS/wsbpel-v2.0-OS.html (last access: 08.01.2013).
  8. JAXB, http://jaxb.java.net/ (last access: 08.01.2013).