1.
Introduction
Legal data analysis is still insufficiently developed. Experts do not yet use such tools to produce works of synthesis such as commentaries. This paper describes the potential and the state of the art of legal data science and identifies existing strengths and lacunae.
2.
Legal system & textual representation
3.
Goals of legal data science: creating 8 views of the legal system
The goal of legal data science is to complement the existing methodology of law with new computer-based methods and to place it in a theoretical framework. For a long time, work on legal knowledge was a concern only of legal theory and information science. Only since the late 1950s, with the start of research on legal information retrieval, has appreciable progress been made (term retrieval, text retrieval, metadata, citations, search technologies, user interface, telecommunications, etc.). The theory distinguishes 8 views: text view, metadata view, network view, user view, logic view, ontological view, visualisation view and argumentation view.
In recent decades, text corpora with search engines have become the standard form of knowledge representation, but they are now considered insufficient. The «views theory» of Lu and Conrad6 (documents view, annotation view, citation view and user view) describes this enlargement and constitutes a basis for this work. However, it should be extended by four further views: legal logic, legal ontologies, legal visualisation and legal argumentation. Further, the works of Sowa7, Fiedler8, Zeleznikow/Hunter9, the authors of the book collection of Yearwood/Stranieri10 as well as our own work on knowledge representation of the law11 have been taken into account.
Knowledge representation of law is not just about the documentation itself; each view provides further insights into the law. The use of all available media and methods (language, meta-knowledge, visualisation, structure, mathematics and statistics, logic, ontologies, formal structures, etc.) also implies that legal «knowledge sources» must be integrated into this knowledge structure while keeping the picture holistic and harmonious. Language, visualisation, structure, etc. are also essential methods of human thinking itself. The guiding principle is that a greater number of methods and analyses ultimately yields higher quality. The more comprehensively all media, factors and methods are used, the better the structural analysis of the law.
The description view contains the metadata of the law (Lu/Conrad: annotation view). Information science has developed a good methodology for describing materials in order to classify and summarize important content or extract relevant information. The quality of this description is highly variable (e.g. the high-level West’s Key Number System14 or the formidable CELEX metadata system15). In general, metadata is only useful for those sufficiently familiar with it. However, AI & law methods allow intelligent use of metadata, for instance by setting the proper context of the communication (persons, roles, purpose, topic, etc.) without much user input.
The citation network (Lu/Conrad: citation network view) describes the relations between the documents. It has always been a key topic of legal documentation and will remain indispensable. Formally, it is documented whether a document cites others (out-bound, i.e. cited, sources) and whether it is cited by others (in-bound, i.e. citing, sources). It is important to specify citations precisely, e.g. by referring also to the structuring elements of a document such as articles, sections, paragraphs and lists. The main types of citations are: basis of the act, acts cited in the document, citations in the operative part of the judgment, document amending other documents, document amended by other acts, etc.16
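The in-bound and out-bound relations described above can be illustrated with a small data structure. The following is only a minimal sketch: the class names, the document identifiers and the citation type labels (`legal_basis`, `cited_act`, `amends`) are hypothetical and chosen for illustration, not taken from any existing documentation system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    source: str       # citing document (hypothetical identifier)
    target: str       # cited document (hypothetical identifier)
    target_part: str  # structuring element referred to, "" for the whole document
    kind: str         # citation type label (hypothetical vocabulary)


class CitationNetwork:
    """Stores each citation twice: by citing and by cited document."""

    def __init__(self):
        self._out = defaultdict(list)  # document -> citations it makes
        self._in = defaultdict(list)   # document -> citations it receives

    def add(self, citation):
        self._out[citation.source].append(citation)
        self._in[citation.target].append(citation)

    def outbound(self, doc_id):
        """Citations made by doc_id (its cited sources)."""
        return list(self._out.get(doc_id, []))

    def inbound(self, doc_id):
        """Citations received by doc_id (its citing sources)."""
        return list(self._in.get(doc_id, []))


network = CitationNetwork()
network.add(Citation("judgment-A", "act-X", "Art. 5(2)", "cited_act"))
network.add(Citation("act-Y", "act-X", "", "amends"))
network.add(Citation("act-X", "treaty-T", "Art. 114", "legal_basis"))

print([c.target for c in network.outbound("act-X")])
print([c.source for c in network.inbound("act-X")])
```

Keeping the structuring element (`target_part`) on every citation reflects the point that a reference to a whole document is usually less useful than a reference to a specific article or paragraph.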
The conceptual representation can draw on the experience of «concept jurisprudence» (Begriffsjurisprudenz), which dealt intensively with conceptualisation and systematisation and was dominant in the 19th century. Conceptual jurisprudence is a tool of thought for a better understanding of the legal system. The legal ontology, however, adds the option of automation and a much more powerful systematisation. Since the 1990s, many legal ontologies have been developed21; strong standards now exist with LRI Core, LKIF22 and LegalRuleML23. Much of this research has been reused for a more convenient representation of legal knowledge24, 25. The respective elements of the concepts have to be transposed into a computer-readable structure, e.g. header, definition, relations (upper/lower term, antonym, related term, synonym, homonym, polyseme, etc.), pre-subsumption (relation between the legal concept and the fact element) and comments. Facts, e.g. a world ontology, can be taken from projects like Cyc. This project aims to provide automated applications with a formal knowledge base of «common sense» knowledge. Currently, more than three million facts and rules are formally represented in the Cyc knowledge base, provided as OpenCyc.26 A major advantage of an ontology is the easier representation of relations between facts and legal concepts in the form of pre-subsumption.
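As an illustration of such a computer-readable structure, the elements listed above (header, definition, relations, pre-subsumption, comments) can be sketched as a simple record. This is a minimal sketch under stated assumptions: the class `LegalConcept`, its field names, the relation labels and the sample concept are hypothetical and do not follow LKIF, LegalRuleML or any other standard.

```python
from dataclasses import dataclass, field


@dataclass
class LegalConcept:
    header: str
    definition: str
    # relation type (e.g. "upper_term", "synonym") -> list of related terms
    relations: dict = field(default_factory=dict)
    # fact elements that are linked to the concept in advance
    pre_subsumption: list = field(default_factory=list)
    comments: str = ""

    def relate(self, relation_type, term):
        """Record a typed relation to another term."""
        self.relations.setdefault(relation_type, []).append(term)

    def covers_fact(self, fact_element):
        """True if the fact element has been pre-subsumed under the concept."""
        return fact_element in self.pre_subsumption


# Hypothetical sample concept, for illustration only.
vehicle = LegalConcept(
    header="vehicle",
    definition="Placeholder definition used only for this example.",
)
vehicle.relate("lower_term", "motor vehicle")
vehicle.relate("related_term", "means of transport")
vehicle.pre_subsumption.extend(["car", "motorcycle"])

print(vehicle.covers_fact("car"))
print(vehicle.relations)
```

A plain membership test is of course far weaker than reasoning over an ontology; the sketch only shows how the named elements could sit together in one record.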
4.
Methodology
Producing this enormous structural analysis manually requires substantial resources, e.g. for searching, reading, interpreting and understanding the legal text corpus. The standard methodology is to locate, read, interpret and understand the «legal stuff», taking into account the legal interpretation and reasoning methods in a dynamic world of concepts. Other elements are also considered, e.g. social context, legal authorities, sophisticated methods of interpretation, etc. At present, financial constraints restrict deep analysis to areas of legal dispute or theoretical interest. Stronger use of AI & law can be the solution to this knowledge acquisition problem.
AI & law methodology allows a representation according to the different needs of the users: Handbook, Dynamic Electronic Legal Commentary (DynELC), citizen information or case-related synthesis. AI & law methods can produce information elements for a comprehensive picture. Starting from the text corpus in XML format, legal data science draws its methods from many fields within the broad areas of mathematics, statistics and information technology: pattern recognition and learning, machine learning, probability models, statistical learning, visualization, data warehousing, etc. AI & law has developed and refined these methods for legal output. It has to be noted that the accuracy required for practical use has very often not yet been achieved.
4.1.
Information elements
Temporal relations: The automatic generation of temporal relationships analyses the temporal dimensions of existence, force, efficacy and applicability. It is standard in information retrieval but not sufficiently implemented in the AI & law environment. A concept is available in the work of Scharf33.
Document description and summary: A lexical ontology, e.g. a thesaurus, remains critical for a condensed short content description. Multilingual versions like Eurovoc or the Swiss legal thesaurus are very helpful for a first handling of documents in a foreign language. Much work was already done in the 1990s (e.g. FLEXICON and KONTERM; for an analysis, see34).
User assessment: Modern search engines can (semi-)automatically generate user perspectives.36
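The temporal dimensions named under «Temporal relations» can be made concrete with a small sketch. It assumes, purely for illustration, that each of the four dimensions (existence, force, efficacy, applicability) can be modelled as a date interval; this simplification, the class names and all dates are hypothetical and are not taken from the work of Scharf.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Interval:
    start: date
    end: Optional[date] = None  # None means the interval is still open

    def contains(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day <= self.end)


@dataclass(frozen=True)
class TemporalProfile:
    """One interval per temporal dimension (simplifying assumption)."""
    existence: Interval
    force: Interval
    efficacy: Interval
    applicability: Interval

    def status_on(self, day: date) -> dict:
        """Report which dimensions hold on the given day."""
        return {
            "existence": self.existence.contains(day),
            "force": self.force.contains(day),
            "efficacy": self.efficacy.contains(day),
            "applicability": self.applicability.contains(day),
        }


# Hypothetical provision: published in January 2014, in force and
# effective from July 2014, applicable to cases from January 2015.
provision = TemporalProfile(
    existence=Interval(date(2014, 1, 1)),
    force=Interval(date(2014, 7, 1)),
    efficacy=Interval(date(2014, 7, 1)),
    applicability=Interval(date(2015, 1, 1)),
)
status = provision.status_on(date(2014, 8, 1))
print(status)
```

Querying a date between entry into force and applicability shows why the dimensions have to be kept apart: the provision exists and is in force, but is not yet applicable.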
4.2.
AI & law methods
The standard representation of documents in XML format should be enriched using a standard part-of-speech tagger and a parser. Pattern matching, machine learning and natural language processing37 can thereby be strongly improved for document categorisation, document segmentation, citation analysis38, temporal relations39 and thesaurus generation.
The output is presented in a pre-defined formal model (e.g. types of citations with relevance ranking). Further, this semi-automatic approach also delivers relevant feedback from authors and users for improving the knowledge base40. Research on the quality of legislation is also relevant41. Strong support can be expected from the study of natural language features42.
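The pattern-matching step for citation analysis on an XML corpus can be sketched as follows. This is a minimal illustration, not the method of the cited works: the XML element names (`document`, `para`), the sample text and the regular expression are assumptions, and no part-of-speech tagging or parsing is performed.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are assumptions.
SAMPLE = """<document id="doc-1">
  <para n="1">This Regulation is based on Article 114 of the Treaty.</para>
  <para n="2">Member States shall apply Article 5 and Article 7 accordingly.</para>
  <para n="3">No reference appears in this paragraph.</para>
</document>"""

# Deliberately simple pattern: the word "Article" followed by a number.
ARTICLE_REF = re.compile(r"\bArticle\s+(\d+)")


def extract_article_references(xml_text):
    """Return (paragraph number, article number) pairs found in the text."""
    root = ET.fromstring(xml_text)
    found = []
    for para in root.iter("para"):
        text = para.text or ""
        for match in ARTICLE_REF.finditer(text):
            found.append((para.get("n"), match.group(1)))
    return found


references = extract_article_references(SAMPLE)
print(references)
```

A real citation recogniser would need many more patterns (abbreviations, ranges, references to other acts) and would benefit from the linguistic annotation mentioned above; the sketch only shows how structured XML input and pattern matching fit together.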
5.
Conclusions
- 1 E. Schweighofer, From Information Retrieval and Artificial Intelligence to Legal Data Science. In Proceedings of the ICAIL MWAIL2015 Workshop, pages 13–24, OCG, Vienna 2015.
- 2 L. Allen, Symbolic Logic: A Razor-Edged Tool for Drafting and Interpreting Legal Documents. In The Yale Law Journal 66, 833–879, 1957.
- 3 R. v. Jhering, Der Kampf ums Recht, Vortrag (The fight for law), Vienna 1872. Schutterwald/Baden 1997.
- 4 E. Schweighofer, Indexing as an ontological-based support for legal reasoning. In: J. Yearwood and A. Stranieri (eds.), Technologies for Supporting Reasoning Communities and Collaborative Decision Making: Cooperative Approaches, pages 213–236, IGI Global Publishers, Hershey 2011.
- 5 H. Kelsen, Reine Rechtslehre (Pure Theory of Law), 2nd ed., pages 196 et seq., 1960.
- 6 Q. Lu, J. Conrad, Next generation legal search – it’s already here. In: Cornell Legal Information Institute, VoxPopuLII, https://blog.law.cornell.edu/voxpop/2013/03/28/next-generation-legal-search-its-already-here/, 2013 (last accessed: 12 January 2016).
- 7 J. F. Sowa, Knowledge representation: logical, philosophical, and computational foundations. Course Technology, Boston 2000.
- 8 H. Fiedler, Modell und Modellbildung als Themen der juristischen Methodenlehre (Model and modeling as subjects of legal methodology). In: Proceedings of the International Legal Informatics Conference IRIS2006, pages 275–281, OCG, Vienna 2006.
- 9 J. Zeleznikow, D. Hunter, Building Intelligent Legal Information Systems, Representation and Reasoning in Law, Computer Law Series 13, Kluwer, Deventer 1994.
- 10 J. Yearwood and A. Stranieri (eds.), Technologies for Supporting Reasoning Communities and Collaborative Decision Making: Cooperative Approaches, IGI Global Publishers, Hershey 2011.
- 11 See footnote 4.
- 12 The ICAIL proceedings are published with ACM Publishers, New York and available in the ACM Digital Library. Latest edition: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Law, Rome 2015: http://dl.acm.org/citation.cfm?id=2514601 (last accessed: 12 January 2016).
- 13 Website listing JURIX proceedings: http://jurix.nl/proceedings/ (last accessed: 12 January 2016).
- 14 Westlaw, West Key Number System® on WestlawNext®: thomsonreuters.com/pdf/wln2/L-374484.pdf (last accessed: 12 September 2015).
- 15 E. Schweighofer, Wissensrepräsentation in Information Retrieval-Systemen am Beispiel des EU-Rechts, Dissertation, Universität Wien 1995, (knowledge representation in information retrieval systems on the example of EU law, PhD thesis, University of Vienna 1995, published in extended version WUV publishers, Vienna 2000).
- 16 A. Berger, The development of references in the legislative documentation, Verlag Dokumentation, Pullach near Munich 1971.
- 17 J. Conrad, presentation (Skype) at the OCG digital2014 Conference in November 2014 in Vienna.
- 18 J. Scharf, Wissensrepräsentation und automatisierte Entscheidungsfindung am Beispiel des KOVG (Knowledge representation and automated decision making), PhD thesis, University of Vienna 2015.
- 19 Idea of Erich Schweighofer, based on relevant thoughts of C. Reed, You Talkin’ to Me? In: Jon Bing, en hyllest, a tribute, pages 154–171, 2014.
- 20 A prime example is the Australian company SoftLaw; this was subsequently acquired by Oracle; the application itself is available as Oracle Business Rules.
- 21 G. Sartor, P. Casanovas, M. A. Biasiotti, M. Fernández-Barrera, Approaches to Legal Ontologies: Theories, Domains, Methodologies, Springer, Dordrecht/Heidelberg/London/New York 2011.
- 22 R. Hoekstra, J. Breuker, M. De Bello, A. Boer, The LKIF Core Ontology of Basic Legal Concepts. In: P. Casanovas, M.A. Biasiotti, E. Francesconi, M.T. Sagri (eds.) Proceedings of LOAIT 07, II. Workshop on Legal Ontologies and Artificial Intelligence Techniques, pages 43–64:
- 23 Website OASIS LegalRuleML, https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=legalruleml (last accessed: 12 January 2016).
- 24 N. Casellas, E. Francesconi, R. Hoekstra, S. Montemagni (eds.), Proceedings of LOAIT 2009, 3rd Workshop on Legal Ontologies and Artificial Intelligence Techniques joint with 2nd Workshop on Semantic Processing of Legal Text. IOT Series, Barcelona 2009.
- 25 P. Casanovas, M.A. Biasiotti, E. Francesconi, M.T. Sagri (eds.) Proceedings of LOAIT 07, II. Workshop on Legal Ontologies and Artificial Intelligence Techniques: http://www.ittig.cnr.it/loait/LOAIT07-Proceedings.pdf, 2007.
- 26 Wikipedia EN, Cyc, https://en.wikipedia.org/wiki/Cyc (last accessed: 12 January 2016).
- 27 T. Berners-Lee et al., The Semantic Web, Scientific American 284 (5), pages 34–53, 2001.
- 28 C. Brunschwig, Multisensory Law and Legal Informatics – A Comparison of How these Legal Disciplines Relate to Visual Law. In: A. Geist, C. R. Brunschwig, F. Lachmayer, G. Schefbeck (eds.), Strukturierung der Juristischen Semantik – Structuring Legal Semantics, Festschrift für Erich Schweighofer, pages 573–668, Editions Weblaw, Bern 2011.
- 29 K. D. Ashley, Modeling Legal Argument. Reasoning with Cases and Hypotheticals, MIT Press, Cambridge 1990.
- 30 R. Alexy, Theorie der juristischen Argumentation (Theory of Legal Argumentation), Frankfurt am Main 1983.
- 31 J. K. Vinjumur, Evaluating Expertise and Sample Bias Effects for Privilege Classification in E-Discovery. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Law ICAIL 2015, pages 119–128, ACM, New York 2015.
- 32 Australasian Legal Information Institute AustLII: http://www.austlii.edu.au/ (last accessed: 12 January 2016).
- 33 See footnote 18.
- 34 E. Schweighofer, Legal Knowledge Representation. Kluwer Law International 1999.
- 35 P. Wahlgren, Legal Risk Analysis. A Proactive Legal Method, Stockholm 2013.
- 36 Q. Lu, J. G. Conrad, M. Dahn, W. M. Keenan, Systems, methods, and interfaces for extending legal search results. United States Patent No. 9,177,050, USPTO Patent Full-Text and Image Database: http://patft.uspto.gov (last accessed: 5 February 2016), 2015.
- 37 See footnote 34.
- 38 E. Schweighofer, D. Scheithauer, The Automatic Generation of Hypertext Links in Legal Documents. In Proceedings of Database and Expert Systems Applications, 7th International Conference, DEXA’96, Zurich 1996, Lecture Notes in Computer Science 1134, pages 889–898, Springer, Berlin 1996.
- 39 See footnote 18.
- 40 See footnotes 4 and 34.
- 41 M. Curtotti et al., Machine Learning for Readability of Legislative Sentences. In: Proceedings of the Fifteenth International Conference on Artificial Intelligence and Law, ICAIL2015, pages 53–62, ACM, New York 2015.
- 42 Bernhard Waltl, Florian Matthes, Supporting the Legal Subsumption Process: Determination of Concreteness and Abstractness in German Laws using Lexical Knowledge, In: Proceedings of MWAIL2015, pages 75–88, OCG, Vienna 2015.
- 43 H.-G. Fill, Bridging Pictorial and Model-based Creation of Legal Visualizations: The PICTMOD Method. In: E. Schweighofer, F. Kummer, W. Hötzendorfer (eds.), Proceedings of International Legal Informatics Conference IRIS2015, pages 443–447, OCG, Vienna 2015.
- 44 A. Geist, Relevanzsortierung bei Rechtsdatenbanken (relevance ranking in legal databases), Ph.D thesis (to be submitted), University of Vienna 2016.
- 45 S. Dayal, M. Harmer, P. Johnson, D. Mead, Beyond Knowledge Representation: Commercial Uses for Legal Knowledge Bases. In: Proceedings of the Fourth International Conference on Artificial Intelligence and Law ICAIL1993, pages 167–174, ACM, New York 1993.
- 46 M.D. Islam, G. Governatori, RuleOMS: A Rule-Based Online Management System. In: Proceedings of the Fifteenth International Conference on Artificial Intelligence and Law ICAIL 2015, pages 187–191, ACM, New York 2015.
- 47 E.g. J. Park, C. Blake, C. Cardie, Toward Machine-assisted Participation in eRulemaking: An Argumentation Model of Evaluability. In: Proceedings of the Fifteenth International Conference on Artificial Intelligence and Law, ICAIL2015, pages 206–210, ACM, New York 2015; M. Dragoni, G. Governatori, S. Villata, Towards Automated Rules Generation from Natural Language Legal Texts. In: ICAIL 2015 Workshop on Automated Detection, Extraction and Analysis of Semantic Information in Legal Texts, 2015; G. Governatori, Thou Shalt is not You Will. In: Proceedings of the Fifteenth International Conference on Artificial Intelligence and Law ICAIL 2015, pages 63–68, ACM, New York 2015; L. Al-Abdulkarim, K. Atkinson, T. Bench-Capon, Factors, Issues and Values: Revisiting Reasoning with Cases. In: Proceedings of the Fifteenth International Conference on Artificial Intelligence and Law ICAIL 2015, pages 3–12, ACM, New York 2015.
- 48 J. Scharf, rOWLer – A hybrid rule engine for legal reasoning. In: E. Schweighofer, F. Kummer, W. Hötzendorfer (eds.), Proceedings of International Legal Informatics Conference IRIS2015, pages 155–162, OCG, Vienna 2015.