Jusletter IT

Open Government Data Licenses Framework for a Mashup Model

  • Author: Martynas Mockus
  • Category: Articles
  • Region: Italy
  • Field of law: Data Protection
  • Collection: Tagungsband IRIS 2014
  • Citation: Martynas Mockus, Open Government Data Licenses Framework for a Mashup Model, in: Jusletter IT 20 February 2014
This article analyses the problems arising from the intellectual work of mashing up open government data, defines and describes open data principles, investigates the licenses of open government data, and proposes using technology as a tool to manage these problems and to enrich the basic legal principles used in the copyright of databases. As an example, the CC-BY v. 4.0 licenses are analysed.

Table of Contents

  • 1. Introduction
  • 2. Definitions and principles
  • 3. Variety of open government data licenses
  • 4. Creative Commons licenses v. 4.0
  • 5. How to make licences more suitable for mashup models?
  • 6. Conclusion and Perspectives for the Future
  • 7. Acknowledgements
  • 8. References

1.

Introduction

[1]
The most significant value of open government data can probably be extracted when the data are merged, connected, combined, mixed or otherwise enriched and analysed; however, legal problems prevent this from being done smoothly.
[2]
Open data licenses are not unified, so every developer must analyse the licenses in depth before connecting different datasets in a mashup model. It is not clear which new license could apply to mixed datasets with new added value. Copyright laws are not designed to support the fluent development of intellectual work on mashups [1]. The legal protection of datasets connected and enriched by AI tools is also a «grey area» of the legal framework.
[3]
The objective of this research is to analyse the legal framework of the licenses most frequently adopted in the global open government data scenario and to create a theoretical model that could inspire an automatic or semi-automatic computational model for checking the compatibility among different licenses. Moreover, the legal framework should establish basic legal principles for defining the license of a new dataset resulting from a mashup or from a significant update.
[4]
The research started with a survey of the definitions and principles of open data, and continued with a survey of the licenses used by different governments around the world in their open data portals (the research was limited to N. America, S. America, Europe and Australia). The most used licenses were identified and are presented in this article. These licenses were analysed, and the following critical questions were formulated: (1) Is it possible to find a model for comparing the licenses in order to discover which are compatible with each other? (2) Which permissions are allowed in case of compatibility or in case of conflict? (3) Which license should the new dataset have when datasets with different licenses are mashed up? (4) How should «derivative work» be defined, and what happens to the ownership of the dataset? (5) How can the evolution of a dataset over time be modelled from the licensing point of view, so as to indicate appropriately the contributions of the several actors?
[5]
An informal logical analysis of the norms of each of the most used licenses is in progress, and the available results are presented in this article. In future work, the licenses will also be analysed from the perspectives of the different actors involved in the open dataset ecosystem (e.g. producer, user, enricher, enabler), to detect critical issues and weaknesses.
[6]
The results presented in this article could be useful for other researchers in the open data and copyright law domains, and for representatives of public administrations seeking a deeper understanding of the problem posed by the multiplicity of open data licenses.
[7]
The article is organised as follows. Section 2 describes definitions and principles of open data and open government data. Section 3 presents the results of a survey of the global open government data scenario. Section 4 analyses the most popular Creative Commons licenses and presents the results of the investigation of license compatibility. Section 5 introduces technology as a tool and presents an example of the formalization of licenses. Section 6 presents conclusions and perspectives for the future.

2.

Definitions and principles

[8]

Open data definitions come from different sources. E.g. Wikipedia uses this definition: «Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control» [2]. This definition is very ambitious and expresses the main idea of the freedom of data, but in real life the situation is more complicated. The Open Knowledge Foundation proposes a more realistic description: «Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike» [3]. This definition expresses the common legal conditions, e.g. attribution or share-alike, that come from the legal domain of open data and are expressed through licences. The definition of open data contains these principles:

  1. Availability and Access – the data should be available in a convenient and modifiable form and downloadable over the internet at all times;
  2. Re-use and Redistribution – the data must be provided under terms that permit re-use and redistribution, including intermixing with other datasets, free of charge (no levy, no closed paid format of the data);
  3. Universal Participation – the conditions of use, re-use and redistribution of the data should not be restricted, and use should be allowed for everyone and for all purposes (e.g. commercial re-use);
  4. Interoperability – it should be possible to interoperate, or intermix, different datasets in terms of both technical and legal conditions [3].
[9]

An open government data definition comes from the EU definition of public sector information (PSI) re-use. Directive 2003/98/EC establishes a main principle: «Member States shall ensure that, where the re-use of documents held by public sector bodies is allowed, these documents shall be re-usable for commercial or non-commercial purposes». Directive 2003/98/EC defines PSI re-use as follows: «re-use means the use by persons or legal entities of documents held by public sector bodies, for commercial or non-commercial purposes other than the initial purpose within the public task for which the documents were produced». Directive 2003/98/EC started a process of sharing PSI with private bodies, but in the Member States practice developed differently from its main principle: e.g. fees were imposed for using data of the Real Property Register and Cadastre in Lithuania, and re-using and re-selling data to third parties was even forbidden [4]. These cases led the European Commission to review Directive 2003/98/EC, and in 2013 the European Parliament updated it by adopting the new Directive 2013/37/EU. In the revised Directive the definition of PSI re-use comes closer to the open data definition through (1) minimal restrictions on re-use (3p.), (2) interoperability (20p.), (3) machine-readable formats (21p.) and (4) open licences (26p.), although charges for PSI re-use are still allowed.

[10]
To sum up, open government data in the EU could be defined as public sector information offered, for a fee or free of charge, for commercial and non-commercial re-use, available in a machine-readable format, interoperable, and likely covered by open licences or subject to minimal restrictions on re-use. On the other hand, information that is paid for, released in a proprietary format and covered by a special licence contradicts the open data definition. From this point of view, not all PSI can be defined as open government data, but only the PSI that corresponds to the open data definition.
[11]
Moreover, this definition is still largely theoretical, and in practice government data «as open data» is not widely available, because Member States have not yet implemented the updated version of the Directive. It must be implemented by 18 July 2015 and reviewed before 18 July 2018.
[12]
In the U.S., Open Government Data is defined as information of the Federal Government's executive departments and agencies that is publicly available and structured in a way that enables the data to be fully discoverable and usable by end users. Open Government Data follows these principles: (1) public: all data should be available except data whose publication is restricted by law; (2) accessible: machine-readable, indexed, open, non-proprietary formats; (3) described: everything the re-user needs should be provided, e.g. how to process the data, limitations, security requirements; (4) reusable: an open licence guarantees that there are no restrictions on re-use; (5) timely: data should be released as quickly as possible; (6) managed post-release: a service should be provided to respond to complaints [5]. The U.S. and EU open government data definitions are not consistent; the U.S. definition is closer to the open data definition than the EU definition.
[13]
According to Tim Berners-Lee [6], it is important not only to open the data but also to connect the data with links, creating linked open data (LOD). His 5-star scheme for the development of open data complements the open government data definitions presented above from a developer's point of view: (1) available on the web (in whatever format) but with an open licence, to be open data; (2) available as machine-readable structured data (e.g. Excel instead of an image scan of a table); (3) as (2), plus a non-proprietary format (e.g. CSV instead of Excel); (4) all the above, plus use of open W3C standards (RDF and SPARQL) to identify things, so that people can point at your stuff; (5) all the above, plus links from your data to other people's data to provide context [6].
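The cumulative character of the 5-star scheme can be illustrated with a minimal Python sketch. The `Dataset` fields and the `stars` function are hypothetical names introduced only for illustration; each flag records whether one criterion of the scheme is met, and a level counts only if every lower level is also met.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical descriptor: one flag per criterion of the 5-star scheme.
    on_web: bool
    open_licence: bool
    machine_readable: bool      # structured data, e.g. a spreadsheet rather than a scanned image
    non_proprietary: bool       # e.g. CSV rather than Excel
    uses_w3c_standards: bool    # RDF/SPARQL identifiers so that others can point at the data
    linked_to_other_data: bool  # links to other people's data for context


def stars(d: Dataset) -> int:
    """Return the star level; each star requires all the previous ones."""
    criteria = [
        d.on_web and d.open_licence,  # star 1: on the web with an open licence
        d.machine_readable,           # star 2
        d.non_proprietary,            # star 3
        d.uses_w3c_standards,         # star 4
        d.linked_to_other_data,       # star 5
    ]
    level = 0
    for met in criteria:
        if not met:
            break  # a missing criterion stops the count
        level += 1
    return level
```

For example, a CSV file published on the web under an open licence, but without RDF identifiers, reaches three stars.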
[14]
The Open Knowledge Foundation suggests the following definition of open government data: «Data produced or commissioned by government or government controlled entities and it can be freely used, reused and redistributed by anyone». The following benefits of open government data are expected: transparency, the release of social and commercial value, and the development of participatory governance [7].
[15]
In conclusion, there is no unified definition of open government data: it is understood differently by government institutions, in different regions and within the community of open data app developers. This leads to different expectations among open data app developers and public bodies, creates pluralism in the legal framework of open government data, and leaves unsolved the legal problems arising from mashups of open government data.

3.

Variety of open government data licenses

[16]
The primary target of this work is to investigate the legal framework of licensing of open government data by collecting and comparing the different licenses and analyzing the possibilities of the different licence combinations.
[17]
In October and November 2013 a selective analysis of open data portals was conducted. The methodology was as follows: 30 to 50 different datasheets were randomly selected from the open government data portals, and the type of license covering each datasheet was identified. The results show that the following kinds of licenses are used to protect open government data:
  1. CC0 licenses are used by public administration institutions in Italy, The Netherlands and Uruguay;
  2. CC-BY licenses are used by public administration institutions in Australia (CC BY 3.0 AU), Austria (CC BY 3.0 AT), Chile (CC BY 3.0 CL), Germany, Italy, Portugal (CC BY 3.0 PT), New Zealand (CC BY 3.0 NZ) and also by the municipality of Bahía Blanca (Argentina) (CC BY 2.5 AR);
  3. CC-BY-SA licenses are used by public administration institutions in Brazil (CC BY-SA 3.0) and Greece (CC BY-SA 3.0 GR);
  4. CC-BY-NC licenses are used by public administration institutions in Italy and Germany (CC BY-NC 2.0);
  5. ODbL licenses are used by public administration institutions in Argentina, Brazil and Chile;
  6. PDDL licenses are used by the municipality of Bahía Blanca (Argentina), Costa Rican public administration institutions and the Metropolitan Municipality of Lima;
  7. Local licences are used by public administration institutions in Brazil (not openly licensed), Belgium, Canada (Open Government Licence), France (Open Licence), Italy (IODL v1.0, IODL v2.0), Germany, Norway (NLOD), Moldova, Spain, UK (OGL), Uruguay (Uruguay Open Data Licence), U.S. and also by the European Commission and the municipality of Rotterdam (The Netherlands) (Open licence);
  8. Some datasheets without any license were found in the U.S. and Uruguay;
  9. Licensing depends on the provider of the datasheet in Brazil, Belgium, Chile, Italy, Germany, U.S., Uruguay and also in the municipality of Bahía Blanca (Argentina);
  10. Only a single kind of license was found in Australia, Austria, Canada, Costa Rica, France, Greece, Moldova, New Zealand, Norway, Portugal, UK, Spain, the municipality of Lima, the municipality of Rotterdam and the European Commission (legal notice);
  11. Central government and municipalities use different kinds of licenses in Argentina (e.g. the government uses ODbL, the Bahía Blanca municipality uses CC BY 2.5 AR and PDDL) and The Netherlands (the government uses CC0, the Rotterdam municipality uses a local license).
[18]
The results of this survey call for further comparison of the licenses and of their effect on datasets coming from different jurisdictions with different permissions to use data. They also show that open data principles are not consistently respected, e.g. when datasheets are restricted by the attribute «non-commercial use only». The most popular licenses come from Creative Commons.

4.

Creative Commons licenses v. 4.0

[19]
At the end of 2013, the Creative Commons Corporation released version 4.0 of its licenses. These licenses also cover sui generis database rights and are better adapted for international use. This survey compares the CC-BY v. 4.0 licenses by their main terms. The CC-BY 4.0 license respects the open data principles the most and the CC-BY-NC-ND 4.0 license the least. The survey largely confirms the findings of Morando [8] on the previous versions of the Creative Commons licenses, with one addition: datasets/databases covered by the CC-BY-ND 4.0 and CC-BY-NC-ND 4.0 licenses cannot be used in mashup models, because no modification is allowed, including arrangement with additional datasets. The results of the survey are:
  1. The licenses that restrict use of the licensed material to non-commercial purposes are CC-BY-NC 4.0, CC-BY-NC-SA 4.0 and CC-BY-NC-ND 4.0.
  2. The licenses that allow reproducing and sharing the licensed material, as a whole or in part, are CC-BY 4.0, CC-BY-SA 4.0, CC-BY-ND 4.0, CC-BY-NC 4.0, CC-BY-NC-SA 4.0 and CC-BY-NC-ND 4.0.
  3. The licenses that do not allow sharing adapted material (modifications) are CC-BY-ND 4.0 and CC-BY-NC-ND 4.0.
  4. The licenses that require modifications to be shared under the same terms and conditions are CC-BY-SA 4.0 and CC-BY-NC-SA 4.0.
  5. CC-BY 4.0 can be used for a mashup with other licenses, except those that do not allow modification: CC-BY-NC-ND 4.0 and CC-BY-ND 4.0.
  6. The CC-BY-SA 4.0 license requires the same BY-SA terms for modifications, so extra terms such as non-commercial or no modifications cannot be added. The CC-BY-SA 4.0 license can therefore be combined only with the CC-BY 4.0 and CC-BY-SA 4.0 licenses.
  7. The CC-BY-ND 4.0 and CC-BY-NC-ND 4.0 licenses cannot be used for a mashup with other licenses, because they do not allow modifications.
  8. The CC-BY-NC 4.0 license can be used for a mashup with the licenses that allow the extra non-commercial term to be added or already contain it: CC-BY 4.0, CC-BY-NC 4.0 and CC-BY-NC-SA 4.0.
  9. The CC-BY-NC-SA 4.0 license can be used for a mashup with the licenses that allow the extra non-commercial and share-alike terms to be added or already contain them: CC-BY 4.0, CC-BY-NC 4.0 and CC-BY-NC-SA 4.0.
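The pairwise results in items 5 to 9 can be encoded as a lookup table, so that checking a planned mashup becomes mechanical. The following Python sketch assumes that items 5 to 9 are the only applicable rules; the table encodes the survey's conclusions rather than an official Creative Commons compatibility chart, and `COMPATIBLE_WITH` and `can_mashup` are hypothetical names.

```python
from itertools import combinations

# Pairwise mashup compatibility as stated in items 5-9 of the survey
# (the article's conclusions, not an official Creative Commons chart).
# ND licences are compatible with nothing, not even themselves,
# because sharing adapted material is not allowed.
COMPATIBLE_WITH = {
    "CC-BY 4.0": {"CC-BY 4.0", "CC-BY-SA 4.0", "CC-BY-NC 4.0", "CC-BY-NC-SA 4.0"},
    "CC-BY-SA 4.0": {"CC-BY 4.0", "CC-BY-SA 4.0"},
    "CC-BY-ND 4.0": set(),
    "CC-BY-NC 4.0": {"CC-BY 4.0", "CC-BY-NC 4.0", "CC-BY-NC-SA 4.0"},
    "CC-BY-NC-SA 4.0": {"CC-BY 4.0", "CC-BY-NC 4.0", "CC-BY-NC-SA 4.0"},
    "CC-BY-NC-ND 4.0": set(),
}


def can_mashup(licences: list[str]) -> bool:
    """True if the licences of all datasets can be combined in one mashup."""
    unknown = [lic for lic in licences if lic not in COMPATIBLE_WITH]
    if unknown:
        raise ValueError(f"no rule for licence(s): {unknown}")
    # A licence that is incompatible with itself (ND) rules out the mashup.
    if any(lic not in COMPATIBLE_WITH[lic] for lic in licences):
        return False
    # Every pair of licences must be mutually compatible.
    return all(b in COMPATIBLE_WITH[a] for a, b in combinations(licences, 2))
```

Under this encoding, `can_mashup(["CC-BY-SA 4.0", "CC-BY-NC 4.0"])` is `False`, reflecting item 6.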
[20]
From the perspective of a mashup of different datasets/databases, the survey shows that not all CC-BY v. 4.0 licenses are compatible. E.g. if content covered by the CC-BY 4.0 license is mixed with content covered by the CC-BY-NC-SA 4.0 license, the new content must respect all the conditions of the CC-BY-NC-SA 4.0 license. Moreover, the general terms of the CC-BY v. 4.0 licenses require that the original license conditions continue to apply to the original content taken, and that this content be identified. Thus, in the example, the mixed content should be covered by the CC-BY-NC-SA 4.0 license; in addition, the datasets/databases that were covered by different licenses must be identified, and information about the conditions of the primary license applying to each identified piece of content must be provided.
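The example can be generalised into a simple rule over license elements (NC, SA, ND): the mashup takes the union of the elements, any ND element blocks the mashup, and an SA license forces the result to carry exactly its own elements. This rule is our own simplification, chosen because it reproduces items 5 to 9 of the survey; the Python sketch below, with the hypothetical functions `elements` and `mashup_licence`, returns the resulting license together with one identification notice per source dataset, or `None` when the licenses conflict.

```python
from typing import Optional


def elements(licence: str) -> frozenset[str]:
    """Extract licence elements, e.g. 'CC-BY-NC-SA 4.0' -> {'NC', 'SA'} (BY is implicit)."""
    name = licence.split()[0]           # e.g. 'CC-BY-NC-SA'
    return frozenset(name.split("-")[2:])  # drop the 'CC' and 'BY' parts


def mashup_licence(sources: dict[str, str]) -> Optional[tuple[str, list[str]]]:
    """Map dataset name -> licence to (resulting licence, notices), or None on conflict."""
    elems = [elements(lic) for lic in sources.values()]
    if any("ND" in e for e in elems):
        return None  # ND: adapted material may not be shared
    union = frozenset().union(*elems)
    sa_sets = {e for e in elems if "SA" in e}
    # ShareAlike: the adaptation must carry exactly the elements of the SA licence,
    # so two different SA licences, or extra elements beside one, are a conflict.
    if sa_sets and (len(sa_sets) > 1 or next(iter(sa_sets)) != union):
        return None
    order = ["NC", "SA"]  # Creative Commons naming order, as in CC-BY-NC-SA
    name = "-".join(["CC-BY"] + [e for e in order if e in union]) + " 4.0"
    # One identification notice per original dataset and its primary licence.
    notices = [f"{ds}: originally licensed under {lic}" for ds, lic in sorted(sources.items())]
    return name, notices
```

Applied to the example above, a CC-BY 4.0 dataset mixed with a CC-BY-NC-SA 4.0 dataset yields `CC-BY-NC-SA 4.0` plus two notices identifying the original licenses.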

5.

How to make licences more suitable for mashup models?

[21]
In theory, mixing content under two compatible CC-BY v. 4.0 licenses is not a problem, but the practical use of mashup models reveals the following problems:
  1. According to the general terms of the CC-BY v. 4.0 licenses, every dataset requires identification and a link to its original (primary) license. E.g. a new database containing 100 datasets from 100 different sources should include in its new license 100 links to the original licenses, with a precise identification of each dataset taken. In practice, the end user can hardly read and understand such terms. With smart technologies based on the semantic web, the number of datasets could be «uncountable», which makes the terms of use incomprehensible to end users, re-users, providers (e.g. application creators) and related third parties (e.g. Apple Inc.).
  2. It is easy to determine whether CC-BY v. 4.0 licenses can be combined, but it is hard to determine whether other licenses, produced not by the Creative Commons Corporation but by governments, NGOs, etc., can be combined. Language barriers, different legal systems and different regulations could make a mashup very difficult and costly.
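Both problems suggest delegating the bookkeeping to software. The following Python sketch, with a hypothetical `Source` record and `attribution_notice` function, groups any number of sources by license into a single notice and flags licenses outside the six CC-BY v. 4.0 licenses for manual legal review. It assumes that each dataset carries machine-readable metadata (title, creator, URI, license) and covers only creator identification and the URI, not the full set of notices required by the general terms.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical metadata record for one dataset taken into a mashup."""
    title: str
    creator: str
    uri: str
    licence: str


# Licences whose rules are assumed to be encoded already (the six CC-BY v. 4.0 licences).
KNOWN_LICENCES = frozenset({
    "CC-BY 4.0", "CC-BY-SA 4.0", "CC-BY-ND 4.0",
    "CC-BY-NC 4.0", "CC-BY-NC-SA 4.0", "CC-BY-NC-ND 4.0",
})


def attribution_notice(sources: list[Source]) -> tuple[str, list[str]]:
    """Return one notice grouping sources by licence, plus licences needing manual review."""
    by_licence: dict[str, list[Source]] = defaultdict(list)
    for src in sources:
        by_licence[src.licence].append(src)
    lines: list[str] = []
    for lic in sorted(by_licence):
        lines.append(f"Licensed under {lic}:")
        lines.extend(f"  - {s.title} by {s.creator} <{s.uri}>" for s in by_licence[lic])
    # Non-CC licences (government, NGO, etc.) cannot be checked automatically here.
    needs_review = [lic for lic in sorted(by_licence) if lic not in KNOWN_LICENCES]
    return "\n".join(lines), needs_review
```

With 100 sources under two CC licenses, the notice still contains 100 entries, but under only two license headings, which a machine can generate and check without manual effort.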
[22]
Some answers to these problems already exist. Gangadharan G.R. et al. [9] state that «representing license terms in a machine interpretable way is a first step towards resolving the intellectual rights in mashups». Governatori G. et al. [10] describe a model in which licenses formalized in deontic logic are composed with two heuristics, AND and OR, to automatically generate a set of licenses. In short, technology could be used as a tool for solving problems raised by the legal framework.
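To make the idea of deontic formalization concrete, the following Python sketch represents a license as sets of permitted, obligatory and prohibited actions and composes licenses conjunctively: an action remains permitted only if every license permits it and none prohibits it, while obligations and prohibitions accumulate. This is a simplified illustration under our own assumptions, not a reproduction of the AND and OR heuristics of Governatori G. et al. [10]; `Licence`, `compose_and` and the action names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    """Deontic content of a licence over hypothetical action names."""
    permissions: frozenset[str] = frozenset()
    obligations: frozenset[str] = frozenset()
    prohibitions: frozenset[str] = frozenset()


def compose_and(*licences: Licence) -> Licence:
    """Conjunctive composition: keep a permission only if all licences grant it
    and none prohibits it; accumulate every obligation and prohibition."""
    if not licences:
        raise ValueError("at least one licence is required")
    obligations = frozenset().union(*(lic.obligations for lic in licences))
    prohibitions = frozenset().union(*(lic.prohibitions for lic in licences))
    permissions = frozenset.intersection(*(lic.permissions for lic in licences)) - prohibitions
    return Licence(permissions, obligations, prohibitions)


# Simplified, illustrative readings of three CC-BY v. 4.0 licences.
BY = Licence(frozenset({"share", "adapt", "commercial_use"}), frozenset({"attribute"}))
NC = Licence(frozenset({"share", "adapt"}), frozenset({"attribute"}), frozenset({"commercial_use"}))
ND = Licence(frozenset({"share", "commercial_use"}), frozenset({"attribute"}), frozenset({"adapt"}))
```

Under this reading, composing CC-BY with CC-BY-ND removes `adapt` from the permissions, which mirrors item 7 of Section 4.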
[23]
Following the idea of Governatori G. et al. [10], the CC-BY v. 4.0 licenses can be formalized through an informal logical analysis, whose results are presented below.
[24]
The CC-BY v. 4.0 licenses have common (general) rules, which apply to all kinds of CC-BY v. 4.0 licenses, and specific rules, which differ between licenses. The following general CC-BY v. 4.0 rules can be extracted:
  1. IF You Share the Licensed Material OR a Modification, AND IF the Licensor has NOT requested You to remove any of the information, THEN You must provide the identification of the creator(s) AND a copyright notice AND a notice that refers to the Primary License AND a notice that refers to the disclaimer of warranties AND the URI or hyperlink to the Licensed Material, to the extent reasonably practicable.
  2. IF You Share a Modification THEN You must indicate Your modifications AND retain an indication of any previous modifications.
  3. IF You Share the Licensed Material THEN You must indicate that the Licensed Material is licensed under the Primary License, AND include the text of, or the URI or hyperlink to, the Primary License.
  4. IF You Share the Licensed Material THEN every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of the Primary License.
  5. IF You Share the Licensed Material THEN You may NOT offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material.
  6. IF You fail to comply with the Primary License THEN Your rights under the Primary License terminate automatically. However, terminated rights are reinstated: automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or upon express reinstatement by the Licensor.
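General rules 1 to 3 and rule 6 can be read as executable conditions. The Python sketch below is a minimal, hypothetical encoding: `obligations` returns the duties triggered by a sharing action, and `rights_reinstated` applies the 30-day cure period of rule 6. Rules 4 and 5 are not modelled, and the duty strings are paraphrases of the list above, not the license text.

```python
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Action(Enum):
    SHARE_LICENSED_MATERIAL = "share licensed material"
    SHARE_MODIFICATION = "share modification"


# Rule 1 notices (paraphrased), required for either action.
ATTRIBUTION_NOTICES = frozenset({
    "identify creator(s)",
    "copyright notice",
    "notice referring to Primary License",
    "notice referring to disclaimer of warranties",
    "URI or hyperlink to Licensed Material",
})


def obligations(action: Action, removal_requested: bool = False) -> set[str]:
    """Duties triggered by general rules 1-3 for a given sharing action."""
    duties: set[str] = set()
    if not removal_requested:  # rule 1 applies unless the Licensor asked for removal
        duties |= ATTRIBUTION_NOTICES
    if action is Action.SHARE_MODIFICATION:       # rule 2
        duties.add("indicate and retain previous modifications")
    if action is Action.SHARE_LICENSED_MATERIAL:  # rule 3
        duties.add("indicate Primary License with its text, URI or hyperlink")
    return duties


def rights_reinstated(discovered: date, cured: Optional[date],
                      express_reinstatement: bool = False) -> bool:
    """Rule 6: terminated rights are reinstated if the violation is cured within
    30 days of its discovery, or upon express reinstatement by the Licensor."""
    if express_reinstatement:
        return True
    return cured is not None and cured - discovered <= timedelta(days=30)
```

For instance, a violation discovered on 1 January and cured on 31 January falls within the 30-day window, while a cure on 1 February does not.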
[25]

Specific CC-BY v. 4.0 rules are presented in Table 1 (X = the rule applies to the license; - = it does not):

Specific CC-BY v. 4.0 rule | CC-BY 4.0 | CC-BY-SA 4.0 | CC-BY-ND 4.0 | CC-BY-NC 4.0 | CC-BY-NC-SA 4.0 | CC-BY-NC-ND 4.0
Reproduce and Share the Licensed Material, in whole or in part, for any purpose | X | X | X | - | - | -
Reproduce and Share the Licensed Material, in whole or in part, for non-commercial purposes | X | X | X | X | X | X
Produce, Reproduce, and Share the Adapted Material | X | X | - | X | X | -
IF You Share the Modification THEN 1) the Adapter's License You apply must be a Creative Commons license with the same License Elements, this version or later, or a Primary License Compatible License, AND 2) You must include the text of, or the URI or hyperlink to, the Adapter's License You apply (You may satisfy this condition in any reasonable manner based on the medium, means, and context in which You Share the Adapted Material), AND 3) You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Adapted Material that restrict exercise of the rights granted under the Adapter's License You apply | - | X | - | - | X | -

Table 1: Comparison of CC-BY v. 4.0 licenses by informal logical analysis of norms
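Table 1 can also be stored directly as data and queried. The following Python sketch, in which `Rules`, `TABLE_1`, `may_share` and `share_alike_required` are hypothetical names, answers whether sharing is permitted for a given license, purpose and kind of material; it encodes only the four rows of the table and ignores the general rules.

```python
from typing import NamedTuple


class Rules(NamedTuple):
    """One column of Table 1 (True = the rule applies to the licence)."""
    share_any_purpose: bool      # row 1
    share_non_commercial: bool   # row 2
    share_adapted: bool          # row 3
    share_alike: bool            # row 4 (ShareAlike condition on modifications)


TABLE_1 = {
    "CC-BY 4.0":       Rules(True,  True, True,  False),
    "CC-BY-SA 4.0":    Rules(True,  True, True,  True),
    "CC-BY-ND 4.0":    Rules(True,  True, False, False),
    "CC-BY-NC 4.0":    Rules(False, True, True,  False),
    "CC-BY-NC-SA 4.0": Rules(False, True, True,  True),
    "CC-BY-NC-ND 4.0": Rules(False, True, False, False),
}


def may_share(licence: str, commercial: bool, adapted: bool) -> bool:
    """Whether sharing original or adapted material for the given purpose is permitted."""
    rules = TABLE_1[licence]
    purpose_ok = rules.share_any_purpose if commercial else rules.share_non_commercial
    return purpose_ok and (rules.share_adapted or not adapted)


def share_alike_required(licence: str) -> bool:
    """Whether shared modifications must carry the same licence elements (row 4)."""
    return TABLE_1[licence].share_alike
```

A design note: keeping the table as data rather than as nested conditionals means that adding a new license is a one-line change.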

[26]
With deeper formalization of the rules, enrichment of the semantics, inclusion of more and more kinds of licenses, and transformation of licenses into a machine-readable format, e.g. XML, the idea of multi-agent systems automatically operating on license terms could be realised. Scholars are already working on these issues: G. Governatori, A. Rotolo, S. Villata and F. Gandon [11-12] are developing a preliminary copyright ontology and a deontic logic semantics for open data licensing; Palmirani [13-16], an expert in legal XML and LegalRuleML, works on open data projects and has demonstrated in several papers how to use LegalRuleML to model the rules of legal documents. The related «ENGAGE» project is also funded by the EU [17].
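As a minimal sketch of what a machine-readable rule might look like, the following Python code serialises one IF-THEN rule with the standard `xml.etree.ElementTree` module. The function `rule_to_xml` and the element names `rule`, `if`, `condition`, `then` and `obligation` are hypothetical and do not belong to LegalRuleML or any other standard vocabulary; an actual implementation would more likely follow an established schema such as LegalRuleML.

```python
import xml.etree.ElementTree as ET


def rule_to_xml(rule_id: str, conditions: list[str], obligations: list[str]) -> str:
    """Serialise one IF-THEN licence rule (hypothetical element names, not LegalRuleML)."""
    rule = ET.Element("rule", id=rule_id)
    if_el = ET.SubElement(rule, "if")
    for text in conditions:
        ET.SubElement(if_el, "condition").text = text
    then_el = ET.SubElement(rule, "then")
    for text in obligations:
        ET.SubElement(then_el, "obligation").text = text
    return ET.tostring(rule, encoding="unicode")
```

For example, general rule 2 could be serialised with one condition («You Share the Modification») and two obligations, which a software agent could then parse back with `ET.fromstring`.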

6.

Conclusion and Perspectives for the Future

[27]
The creation of a European digital single market without language barriers, high expectations of the economic value of re-using public sector information in Europe, and the practical situation of the open data domain, which has to deal with problems arising from mashups, force us to look for a more innovative adaptation of the legal framework derived from copyright and sui generis database rights, which are used to ensure the protection of creators' rights, innovation and economic growth.
[28]
The licensing of open government data shows that open data principles are not respected enough and that extra terms are included in licenses. The preliminary investigation shows that Creative Commons licenses are the most popular in the global open government data scenario, but not all CC-BY licenses can be combined.
[29]
Making licenses understandable to, and processable by, the machines through which datasets/databases are used could help to solve these problems. Formalizing the rules in a machine-understandable language could establish a requirement to use technologies designed to respect those rules. Moreover, this could energize the open data domain by saving investigation time and reducing the costs of creating mashups.
[30]
Future work could investigate changes in the global legal framework of open government data, look for alternative ways of using technology to solve these problems, and enrich the proposed approach by formalizing more popular licenses, developing a computational ontology, and creating the theoretical model and legal principles for the mashup of open government data.

7.

Acknowledgements

This research is funded by the ERASMUS MUNDUS program LAST-JD, Law, Science and Technology [18] and supervised by Prof. Monica Palmirani.

8.

References

[1] Arase M.T., Digital sampling & the mash-up: a «grey» area for copyright law. The University of Arizona, James E. Rogers College of Law. SSRN-id2160020. http://ssrn.com/abstract=2160020 last accessed 1 January 2013 (2012).

[2] Auer, S. R.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z., «DBpedia: A Nucleus for a Web of Open Data». The Semantic Web. Lecture Notes in Computer Science 4825, doi:10.1007/978-3-540-76298-0_52. ISBN 978-3-540-76297-3. pp. 722 (2007).

[3] Open Data handbook, http://opendatahandbook.org/en/what-is-open-data/ last accessed 1 December 2013 (2013).

[4] National Audit Office of Lithuania, Report on Activities of the State Enterprise Centre of Registers in Providing Public Services. http://vkontrole.lt/failas.aspx?id=486, last accessed 1 December 2013 (2004).

[5] Executive office of the President, Open Data policy. Managing information as an asset. May 9, 2013. http://www.whitehouse.gov/sites/default/files/omb/memoranda/2013/m-13-13.pdf last accessed 1 December 2013 (2013).

[6] Berners-Lee, T., Linked Data. http://www.w3.org/DesignIssues/LinkedData.html last accessed 1 December 2013 (2013).

[7] What is Open Government Data, http://opengovernmentdata.org/ last accessed 1 December 2013 (2013).

[8] Morando, F., Legal Interoperability: Making Open (Government) Data Compatible with Businesses and Communities. http://leo.cilea.it/index.php/jlis/article/view/5461/7927 last accessed 1 December 2013 (2013).

[9] Gangadharan G.R. et al: Exploring the intellectual rights in the mashup ecosystem, Virtual Goods, Poznan University of Economics Publishing House, pp. 13-22, 2008.

[10] Governatori G. et al., Heuristics for Licenses Composition, Legal Knowledge and Information systems, JURIX 2013, IOS Press, pp. 77-86, 2013.

[11] Governatori, G., Rotolo, A., Villata, S., Gandon, F., One License to Compose Them All: A Deontic Logic Approach to Data Licensing on the Web of Data. http://www.nicta.com.au/pub?doc=7079 last accessed 1 December 2013 (2013).

[12] Rotolo, A., Villata, S., Gandon, F., A Deontic Logic Semantics for Licenses Composition in the Web of Data. http://www-sop.inria.fr/members/Serena.Villata/Resources/icail2013.pdf last accessed 1 December 2013 (2013).

[13] Brighi, R., Palmirani, M., Legal text analysis of the modification provisions: a pattern oriented approach, in: Proceedings of the 12th International Conference on Artificial Intelligence and Law, New York, NY, ACM, 2009, pp. 238–239 (proceedings of: International Conference on Artificial Intelligence and Law – ICAIL 09, Barcelona, Spain) (2009).

[14] Biasotti A., Francesconi E., Palmirani M., Sartor G., Vitali F., Legal Informatics and Management of Legislative Documents, Rome, United Nations Press, (Global Centre for ICT in Parliament Working Paper), pp. 89 (2008).

[15] Oliveira Lima, J. A., Palmirani, M., Vitali, F., Moving in the time: An Ontology for Identifying Legal Resources, in: Computable Models of the Law: Languages, Dialogues, Games, Ontologies, Heidelberg, Springer, (Lecture Notes in Artificial Intelligence 4884) pp. 71–85 (2008).

[16] Riveret, R., Palmirani, M., Rotolo, A., Legal Consolidation Formalised in Defeasible Logic and Based on Agents, in: Proceedings of the V Legislative XML Workshop, Florence, European Press Academic Publishing, (proceedings of: V Legislative XML Workshop, San Domenico di Fiesole, Florence, Italy, 14–16 June 2006), pp. 117–135 (2007).

[17] ENGAGE project. http://www.engage-project.eu, http://www.engagedata.eu last accessed 1 December 2013 (2013).

[18] Palmirani M., Joint International Doctoral (Ph.D.) Degree in Law, Science and Technology. http://www.last-jd.eu last accessed 1 December 2013 (2013).

[19] The Creative Commons, About The Licenses. http://creativecommons.org/licenses/ last accessed 1 December 2013 (2013).

[20] Patry W., How to Fix Copyright, Oxford University Press, 2012.


 

Martynas Mockus

Erasmus Mundus Research Fellow, University of Bologna, CIRSFID
Via Galliera 3, 40121 Bologna, IT
Martynas.Mockus2@unibo.it; http://www.last-jd.eu/