Jusletter IT

Legislative Time Machine – Explore Law in the Past, Present, and Future

  • Author: Grant Vergottini
  • Category: Short Articles
  • Region: USA
  • Field of law: Electronic Lawmaking
  • Collection: Conference proceedings IRIS 2011
  • Citation: Grant Vergottini, Legislative Time Machine – Explore Law in the Past, Present, and Future, in: Jusletter IT 24 February 2011
Although California's law is consolidated into codes, determining what the law was at a point in the past is often quite difficult. Likewise, predicting what it might be in the near future is challenging. The codes released by the State of California only reflect the law at the current time. Even when the historical records of California legislation are made public, the information is often fragmented, incomplete, or ambiguous. We are developing a holistic solution to this problem in the form of a legislative time machine that can present the law as it looked at a date in the past, as it looks at the present time, as it will look in the future based on statutes already passed, and as it might look in the future based on pending legislation. This system is built on semantic web technology.

Table of Contents

  • 1. Prior Work
  • 2. Problem Statement
  • 3. Project Concept
  • 4. Understanding California’s Codes and Statutes
  • 5. Legislative Information Model
  • 5.1. Versioning
  • 5.2. Identifiers
  • 5.3. Referencing
  • 6. Time-Based Calculation of the Codes
  • 6.1. Examples
  • 7. Sources of Information Integrated
  • 8. Current Implementation
  • 9. Future Goals
  • 10. References

1. Prior Work

[1]
The author’s prior experience comes from the field of computer-aided design (CAD). Earlier projects involved developing data management systems for electronic systems design that included sophisticated versioning and configuration management tools for design documents. In 2001, the author was introduced to the field of legislation by obtaining information about the EnAct system¹ used by the Tasmanian Legislature in Australia. This led to an early prototype of the system described in this paper. That prototype allowed the user to view the United States Constitution as it existed at any date in the past. As a result of the prototype, a contract was secured to develop the XML information model for the State of California’s legislation and to build a bill drafting and publishing system. This system, built on an information model we named CAML², was deployed in 2004. Initially it implemented many of the core features of the EnAct system, and it has grown to include more EnAct features over time. The latest update includes facilities to automate the amendment generation process.
[2]
In addition to the work for the State of California, the author has developed a standalone bill tracking system. This system, known as LegisWeb³, is based on a second generation legislative information model structured around a semantic web (Tim Berners-Lee, 2001). The work described in this paper is an outgrowth of that system.

2. Problem Statement

[3]
California has a long history of consolidating statutory law into codes. The initial four codes were established in 1872. Starting in 1929 and for the next 30 years, most of California’s statutory law was consolidated into additional codes. Since the 1950s, most legislation has been drafted as code amendments. A few subjects, primarily dealing with gambling and water rights, remain unconsolidated (California Law, 2010) (E. Dotson Wilson, 2006).
[4]
While the California Legislature does not officially publish the codes, an online version has been available since 1994⁴. However, this representation only includes current law and is published as basic and somewhat unstructured text, with all historical and annotative information omitted. Much historical and future information can nevertheless be derived from data in other related repositories and databases that the State of California makes available to the public. The problem is that the fragmented nature of these data sources makes it difficult to research California’s Codes. Anecdotally, even third-party publications contain many errors, as the publishers have to rely on their own historical tracking records. This problem is large enough that several businesses dedicated to legislative research have sprung up in California.

3. Project Concept

[5]
Our project publishes an open and freely available representation of all California codes, statutes, and pending legislation in the form of an integrated semantic web. Information that is fragmented in the State’s official sources is integrated into a single model so as to facilitate research. The «legislative time machine», known as Legix, allows one to explore the historical, proposed, and pending codes in a simple time-based manner.

4. Understanding California’s Codes and Statutes

[6]
California’s codes are organized into up to five hierarchical heading levels: parts, titles, divisions, chapters, and articles. The arrangement of these levels varies amongst the different codes. Code sections appear at the lowest level of the hierarchy. A level, its heading, sub-levels, and all the sections contained within are together known as a «spread». The individual sections within the code hierarchy follow a consecutive numbering mechanism through the entire code. This means that all sections of any code can be referred to by number alone (multiple versions of the same section are discussed later in this paper). An exception is the California Constitution, which restarts the section numbering within each article.
[7]
The amending rules in California simplify the management of the codes. The amending rules allow a spread to be added or repealed. However, spreads cannot be amended or renumbered as a whole. Only individual sections and individual heading levels (the text and numbering of the level’s title) can be amended in a single action. Furthermore, with the exception of the State Constitution, sub-sections and sub-paragraphs cannot be amended. These amending rules simplify the information model by providing a clear model for versioning.

5. Legislative Information Model

[8]
After learning much from CAML, we have developed a second generation XML information model called SLIM⁵. Now in its third iteration, SLIM is designed to offer a very simple and flexible model for marking up legislative information to the degree that is necessary to adequately handle legislation. Unlike CAML, SLIM is not restricted to California data. However, much like CAML, SLIM is a compound information model that relies on other models such as XHTML to provide markup in areas where an existing model is sufficient. Basic paragraphs and tables employ XHTML for their markup.
[9]
SLIM is a model to represent three distinct types of information:

1. The metadata about the documents and the legislative process: SLIM’s model for metadata such as authors, events, subject, and identity follows the conventions for RDF (Dave Beckett, 2004) and maps easily into XML representations of RDF. This provides the basis for the semantic web upon which the system is built. Note that SLIM models not only the metadata within the document text, but also the ancillary information related to the document text such as the legislative history and all house and committee votes.

2. The hierarchical structure of codes and legislation: SLIM’s model for the hierarchical structure of bills, statutes, and codes is quite flexible. This is to allow the varying structure of legislation and codes to be modeled. A total of 18 different document types are encompassed within the SLIM model and more can be easily added.

3. The versioning and referencing mechanism: SLIM’s versioning model conforms to most common legislative amending conventions. Together with the referencing mechanism, it makes it possible to manage the temporal aspects of the legislation following a rigorous configuration management regime even if the source data does not follow the same rigor in practice.
[10]
For point-in-time calculations, the most important aspects are versioning and referencing.
[11]
We are considering a migration from SLIM to AKOMA NTOSO⁶ (Monica Palmirani, 2010) to achieve a more broadly accepted model. The two information models are quite similar and share common goals. In the meantime, an AKOMA NTOSO representation can be retrieved by means of a simple transformation.

5.1. Versioning

[12]
Versioning in the information model is necessary in three places:

1. The first notion of versioning is the bills themselves as they progress through the legislative process to emerge as statutes upon passage and signing. This is simple document versioning – which can handle quirks, such as in California, where the initial version number is 99 and subsequent version numbers decrement downward.

2. The text of level headings and the sections within a level can be amended and are thus subject to versioning. The versioning model can be quite complex. Two aspects of how legislation is amended cause this complexity. The first is that multiple versions may be effective simultaneously. Later on we will discuss how operative dates and conditions apply when there are multiple effective versions of the same section. The second is renumbering. When a heading or section is renumbered, its location and identity in the code are both potentially affected. Keeping track of the history of a heading or section that is renumbered is important.

3. Unlike the codes, the California Constitution may be amended at the sub-section level in addition to the section level. This practice implies a hierarchical versioning model.
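
The descending version numbers described in the first point can be handled with very little code. The following is a minimal sketch only; the function names are hypothetical and are not part of SLIM or Legix. It assumes that the introduced version is numbered 99 and that each subsequent version is one lower.

```python
from typing import Iterable

# California numbers the introduced version 99; later versions count down.
INTRODUCED_VERSION = 99


def amendment_ordinal(version_number: int) -> int:
    """Return 0 for the introduced version, 1 for the next version, and so on.

    Hypothetical helper: it only inverts the descending numbering scheme.
    """
    if not 1 <= version_number <= INTRODUCED_VERSION:
        raise ValueError(f"unexpected version number: {version_number}")
    return INTRODUCED_VERSION - version_number


def latest_version(version_numbers: Iterable[int]) -> int:
    """With descending numbering, the latest version has the lowest number."""
    return min(version_numbers)
```

Under these assumptions, a bill with versions 99, 98, and 97 has version 97 as its latest text, and version 97 is the second amended version.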

5.2. Identifiers

[13]
Many constructs within the legislative information model are given identifiers to ease retrieval. There are two basic notions of identifier: static and dynamic identifiers (Francisconi, 2010). Simply put, static identifiers refer to immutable objects such as a specific version of a code section, while dynamic identifiers refer to time-varying objects such as the code section as a whole – independent of time. In practice, this means that static identifiers refer to specific versions of objects while dynamic identifiers refer to the objects as a whole.
[14]
Identifiers in the Legix system are always constructed as URNs (K. Sollins (MIT/LCS), 1994) and thus are universally unique rather than simply being unique within the containing document itself. This is important as many aspects of legislative information can appear in multiple synthesized documents at the same time. For instance, think of the consolidated code as a synthesized «configuration» of sections collected from the statutes. Thus, the code becomes a complex collection of pointers. While California does not think of the Codes as synthesized configurations, managing them as such facilitates much of the point-in-time computation.
[15]
Identifiers are computed in a number of ways. Identifiers are usually human readable and convey useful information about the item being identified. In the case of versions of sections in the codes, this useful information includes the origin of the version with reference to the originating statute. Within the official databases at the State of California, versions of sections are identified by a string of text known as the «LAF Text» (Last Amended Form). This string of text identifies the statute and bill section number that the version being identified originated from. The Legix system uses a variant of this technique. When LAF Text is available, it is normalized into a URN form and used as the identifier for the version.
[16]
In some cases, the identifier includes information about the location of the item in the code hierarchy. When renumbering moves the item, the identifier is adjusted and historical information about the prior identifier is kept. In cases where information is missing or not available, identifiers are globally unique identifiers expressed in the form of a URN.
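
The fallback and renumbering behavior can be illustrated with a short sketch. This is an illustration under stated assumptions, not the Legix implementation: the class and function names are hypothetical, the identifiers used with it are placeholders, and the fallback simply uses a standard UUID URN as one possible form of a globally unique identifier.

```python
import uuid
from typing import List, Optional


def make_version_id(normalized_laf_urn: Optional[str] = None) -> str:
    """Prefer an identifier derived from LAF Text; otherwise fall back to a UUID URN.

    The normalization of LAF Text itself is not shown here.
    """
    if normalized_laf_urn:
        return normalized_laf_urn
    return uuid.uuid4().urn  # e.g. "urn:uuid:..."


class IdentifierHistory:
    """Keeps the current identifier of an item and the identifiers it had before."""

    def __init__(self, current: str) -> None:
        self.current = current
        self.previous: List[str] = []

    def renumber(self, new_identifier: str) -> None:
        # Record the old identifier so earlier references can still be resolved.
        self.previous.append(self.current)
        self.current = new_identifier
```

Keeping the prior identifiers in order is one simple way to let a reference made before a renumbering still find the item afterwards.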

5.3. Referencing

[17]
The key to the point-in-time system is referencing. A well thought out referencing system simplifies the composition of information coming from multiple sources. The referencing mechanism is based on the FRBR (Functional Requirements for Bibliographic Records) (Francisconi, 2010) (IFLA Study Group on the Functional Requirements for Bibliographic Records, 2009) (Tillett, Barbara, 2003) hierarchy, but is enhanced to allow various types of references to be made.
[18]
References to information known within the Legix system are always expressed in the form of a URN with an optional fragment identifier or xpointer expression (Steve DeRose, 2002). References to external information are always expressed as URLs with optional fragment identifiers. For Legix-based references, it is possible to refer to individual actual documents and to synthetic composed documents. References may be to whole documents or to fragments of documents using either fragment identifiers (i.e. #{id}) or xpointer expressions (i.e. #xpointer({xpath})).
[19]
All documents within Legix are identified by URNs that follow the FRBR conventions. In FRBR, there are four levels in the identification hierarchy. We use three of those levels:

1. Work: In Legix, the work level identifies the basic document such as a bill, a statute, a code, or a hierarchical level within a code.

2. Expression: In Legix, the expression level identifies various facets (or aspects) of the document. This is actually the heart of the system. The «expression» can identify a metadata facet of the document such as the status, history, or related votes as well as a data facet such as the text of the document. While the default variant of any facet retrieved is usually the most recent version of that facet, it is possible to specify a prior variant using either versioning nomenclature or a specific date. For instance, the text can be specified to a specific version of the document or the document as it existed at a point in time. As already mentioned, documents in California are identified by a version number starting at 99 and decrementing downwards. So while an expression of «doc» will retrieve the latest version of the text, «doc-v99» will retrieve the earliest version. Similarly, votes are identified by a date and sequence number. Therefore, «vote» will retrieve the most recent vote on the bill but «vote-d20090512-1» will retrieve the first vote on May 12th of 2009.

3. Manifestation: Today, SLIM is the native format for all Legix documents. However, a number of alternate representations (or manifestations) can be retrieved using transformations specified at the end of the URN. Text intended to be read by a reader is transformed into XHTML. Metadata meant to be processed by semantic web engines can be extracted as RDF XML. Lists of documents can be transformed into RSS or ATOM feeds for consumption by feed readers. Also, all query results can be retrieved as ATOM feeds with the additional A9 query extensions.

6. Time-Based Calculation of the Codes

[20]
When it comes to time traveling California’s Codes, there are three questions one can ask:
1. What was the code at a specific date in the past?
2. What laws have been passed that amend the code and become effective in the future?
3. What laws have been proposed that may change the code?
[21]
Answering these questions isn’t always easy. The reason is that there isn’t always a single answer to any of these questions. There may be a difference between when a law becomes effective and when it becomes operative. A law that is effective can only be enforced when it is operative. Typically a law becomes operative when it becomes effective. In California, laws become effective on January 1 of the following year, or if they are urgency bills, immediately upon the governor’s approval. However, it is possible for a statute to specify an operative date that differs from the effective date. This operative date may be in the future or it may be in the past if the law is to be retroactive. Additionally, operative conditions may also be specified which spell out conditions under which the law is to be operative. Operative dates and operative conditions create a number of complexities to deal with.
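
The default rules for effective and operative dates can be written down compactly. The sketch below is a simplification under stated assumptions: the function names are hypothetical, "the following year" is taken to mean the year after the governor's approval, and operative conditions are not modeled at all.

```python
from datetime import date
from typing import Optional


def effective_date(approved: date, urgency: bool = False) -> date:
    """Urgency bills take effect on approval; others on January 1 of the next year.

    Assumption: "the following year" is the year after the approval date.
    """
    if urgency:
        return approved
    return date(approved.year + 1, 1, 1)


def operative_date(effective: date, stated_operative: Optional[date] = None) -> date:
    """A law is operative when effective unless the statute states another date.

    The stated date may be later than, or (if retroactive) earlier than, the
    effective date.
    """
    return stated_operative if stated_operative is not None else effective
```

For example, under these assumptions a non-urgency bill approved on September 30, 2010 would be effective on January 1, 2011 and, absent a stated operative date, operative on the same day.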
[22]
First, it is possible that multiple versions of a single section of code be effective at the same time – parallel versions. One of the versions will be operative at any one time. It is possible that the law would revert to the alternate version when an operative condition is no longer satisfied or an operative end date occurs.
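
Parallel versions can be modeled by giving every version both an effective interval and an operative interval. The following sketch is illustrative only: the class, field, and function names are assumptions, end dates are treated as exclusive, and operative conditions (as opposed to dates) are left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SectionVersion:
    """One version of a code section (all field names are illustrative)."""
    version_id: str
    effective_from: date
    effective_until: Optional[date] = None  # exclusive; None = still effective
    operative_from: Optional[date] = None   # None = operative once effective
    operative_until: Optional[date] = None  # exclusive; None = no operative end


def is_effective(v: SectionVersion, on: date) -> bool:
    return v.effective_from <= on and (
        v.effective_until is None or on < v.effective_until
    )


def is_operative(v: SectionVersion, on: date) -> bool:
    # A version can only be operative while it is effective.
    if not is_effective(v, on):
        return False
    starts = v.operative_from or v.effective_from
    return starts <= on and (v.operative_until is None or on < v.operative_until)


def versions_on(
    versions: List[SectionVersion], on: date
) -> Tuple[List[SectionVersion], List[SectionVersion]]:
    """Return (all effective versions, the subset that is operative) on a date."""
    effective = [v for v in versions if is_effective(v, on)]
    operative = [v for v in effective if is_operative(v, on)]
    return effective, operative
```

With two versions that are both effective from the same date, one operative until January 1, 2012 and the other operative from that date, a query for mid-2011 returns both as effective but only the first as operative.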
[23]
Second, there is considerable complexity in instances where there exist multiple effective spreads which internally contain multiple effective sub-spreads or sections. This creates the situation in which determining what law is effective and operative needs to be hierarchically computed. While this situation is very rare, it does occur. A current example of this situation (in 2011) is found in the California Penal Code⁷.
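
One way to picture the hierarchical computation is as a recursive walk that prunes everything not effective on the requested date. This is a sketch under assumptions, not the Legix algorithm: the Node type and its fields are hypothetical, end dates are exclusive, and parallel effective nodes are all kept, leaving the question of which one is operative to a separate step.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Node:
    """A spread (heading level) or a section version; names are illustrative."""
    label: str
    effective_from: date
    effective_until: Optional[date] = None  # exclusive; None = still effective
    children: List["Node"] = field(default_factory=list)


def is_effective(node: Node, on: date) -> bool:
    return node.effective_from <= on and (
        node.effective_until is None or on < node.effective_until
    )


def snapshot(node: Node, on: date) -> Optional[Node]:
    """Return a copy of the tree containing only nodes effective on the date."""
    if not is_effective(node, on):
        return None  # an ineffective spread hides everything beneath it
    kept = [s for s in (snapshot(c, on) for c in node.children) if s is not None]
    return Node(node.label, node.effective_from, node.effective_until, kept)


def labels(node: Node) -> List[str]:
    """Flatten a tree into a pre-order list of labels, for inspection."""
    out = [node.label]
    for child in node.children:
        out.extend(labels(child))
    return out
```

Because the check is applied at every level, a section version is only returned when it and every spread containing it are effective on the date in question.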
[24]
Third, there is little formalization in how operative dates or conditions are specified within the legislation. While the rules for effective dates are clear and precise, this is not the case for operative dates. This makes automation of operative dates and conditions troublesome. Often, pattern matching text in the last «miscellaneous» bill section of the statutes can be used to tease out the operative dates, but this approach is quite hit-or-miss. Determining how to represent or calculate operative conditions is an even larger problem. Today, the Legix system tries to find operative dates, but does not apply operative dates and conditions. The legislative model has facilities that allow the information that can be found to be represented.
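
The pattern matching mentioned above might look like the following sketch. The phrasing it searches for ("operative on" followed by a date) is an assumed example of how a statute could word an operative date, not a documented pattern used by Legix, and the function name is hypothetical. Any real statute text may use other wording that this pattern misses.

```python
import re
from datetime import date
from typing import List

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Assumed phrasing: "... operative on July 1, 2011 ...".
OPERATIVE_ON = re.compile(
    r"\boperative\s+on\s+(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s+(\d{4})"
)


def find_operative_dates(text: str) -> List[date]:
    """Return every date that follows the assumed 'operative on' phrasing."""
    return [
        date(int(year), MONTHS[month], int(day))
        for month, day, year in OPERATIVE_ON.findall(text)
    ]
```

A result from such a heuristic is best treated as a candidate to be recorded in the metadata rather than as a fact to be applied automatically.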

6.1. Examples

[25]
As stated earlier, the key to the Legix system is referencing. This referencing mechanism dictates how items are composed and retrieved. All California references are of the form:
urn:legix:us-ca:{work}:{expression}:{manifestation}
[26]
Here are a few examples:
[27]
urn:legix:us-ca:20090ab0011:doc:html – This is a simple reference to the latest version of 2009 Assembly Bill 11 in an HTML format. The work specified is «20090ab0011» which decodes as – 2009 (2009-2010 session) – 0 (regular session) – ab (assembly bill) – 0011 (bill number 11). The expression is «doc» which simply decodes as the latest version of the document text. The manifestation is «html» which requests transformation into HTML.
[28]
urn:legix:us-ca:20090ab0011:doc-v99:rdf – This is a variation of the previous reference, this time asking for the metadata found within the text of the original introduced version. The expression asks for «doc-v99» which decodes as – doc (document text) – v99 (introduced version). As the metadata is desired, the information is to be manifested as RDF XML.
[29]
urn:legix:us-ca:20090ab0011:history:html – This variation requests the «history» aspect of the aforementioned bill in an HTML format.
[30]
urn:legix:us-ca:evid:doc:html#xpointer(//slim:Section[@num="100"]) – This is a reference to the current version of section 100 of the Evidence Code. The work specified is «evid» which is the abbreviation for the Evidence Code. The expression requested is the current version of the text. The manifestation requested is a transformation into HTML. The appended xpointer requests Section 100 of the SLIM-based document. It is important to note that the xpointer expression is evaluated on the original SLIM document prior to transformation into HTML.
[31]
urn:legix:us-ca:evid:doc-d20090512:html#xpointer(//slim:Level[@type="chapter" and @num="5"]) – This is a reference to the spread of Chapter 5 of the Evidence Code as it was effective on May 12, 2009. All sections contained within Chapter 5 will be returned, calculated as they were effective on the date in question. In cases where multiple versions of a section are effective on a particular date, all effective versions are returned with «operative codes» in the metadata as attributes to specify the dates and conditions upon which each is operative.
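
The references above can be taken apart mechanically. The following sketch infers the layout from the examples alone (work, expression, and manifestation separated by colons, with an optional fragment after «#»); the type and function names are hypothetical and do not describe the actual Legix resolver.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class LegixReference:
    jurisdiction: str        # e.g. "us-ca"
    work: str                # e.g. "20090ab0011" or "evid"
    facet: str               # e.g. "doc", "history", "vote"
    variant: Optional[str]   # e.g. "v99" or "d20090512-1"; None = latest
    manifestation: str       # e.g. "html", "rdf"
    fragment: Optional[str]  # text after "#", if any


def parse_reference(ref: str) -> LegixReference:
    # Split off the fragment first: an xpointer may itself contain colons.
    urn, _, fragment = ref.partition("#")
    parts = urn.split(":")
    if len(parts) != 6 or parts[0] != "urn" or parts[1] != "legix":
        raise ValueError(f"not a Legix reference: {ref!r}")
    jurisdiction, work, expression, manifestation = parts[2:]
    facet, _, variant = expression.partition("-")
    return LegixReference(
        jurisdiction, work, facet, variant or None, manifestation, fragment or None
    )


# Bill works as decoded in the examples: year, session digit, type, number.
BILL_WORK = re.compile(r"(\d{4})(\d)([a-z]+)(\d{4})")


def decode_bill_work(work: str) -> Optional[Dict[str, Union[int, str]]]:
    """Decode a bill work such as "20090ab0011"; return None for other works."""
    match = BILL_WORK.fullmatch(work)
    if match is None:
        return None
    year, session, measure, number = match.groups()
    return {
        "session_year": int(year),
        "session_type": int(session),
        "measure_type": measure,
        "number": int(number),
    }
```

Splitting on «#» before splitting on colons matters because namespace prefixes such as slim: inside an xpointer expression would otherwise be mistaken for URN separators.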

7. Sources of Information Integrated

[32]
California releases all the information that we use to the public via the leginfo legislative portal site – albeit in a much more fragmented form. We integrate the following data sources:

1. The current text of the codes: This is published following the hierarchy of the codes, but with little structural markup and without any historical notes or annotations.

2. Prior year statutes dating from 1993 onwards: These contain much of the information necessary to derive earlier text and structure of the codes.

3. Current session statutes: These contain information regarding pending statutory law.

4. Current session legislation coupled with a publication known as TOSA⁸. These documents contain proposed changes to the law.
[33]
In addition, a nightly update provides historical notes necessary to reconstruct the history of the codes when the statutes are not available electronically – currently anything prior to 1993.

8. Current Implementation

[34]
Currently, Legix is implemented with a flat file repository of XML documents that are indexed using a commercial search engine. Metadata contained within the documents is cached in a relational database for easy lookup when composing request responses. The goal, however, is to move to a full XML repository that provides inherent XML indexing capabilities and metadata management when such a repository proves viable.
[35]
We are also looking to reconstruct earlier legislative history in Legix. Today we are limited to statutes going as far back as 1993 and metadata reaching back to 1989. However, earlier bill and statute texts do exist and will become available electronically in the near future.

9. Future Goals

[36]
Our intent is to develop Legix as a proof-of-concept demonstrator of the power of using this XML-based approach to manage legislation. We would also like to open the infrastructure up so that it can become a platform for legislation in California and beyond, and to re-implement our own LegisWeb bill tracking system on top of it. The technologies we have chosen are compatible with new developments in the semantic web.

10. References

IFLA Study Group on the Functional Requirements for Bibliographic Records. (2009, February). Functional Requirements for Bibliographic Records. Retrieved January 14, 2011, from IFLA.org: www.ifla.org/en/publications/functional-requirements-for-bibliographic-records

California Law. (2010, December). Retrieved January 14, 2011, from Wikipedia: http://en.wikipedia.org/wiki/California_law

Dave Beckett, B.M. (2004, February 10). RDF/XML Syntax Specification (Revised). Retrieved January 14, 2011, from W3C: www.w3.org/TR/REC-rdf-syntax/

E. Dotson Wilson, B.S. (2006). California's Legislature. Sacramento: California's Office of State Publishing.

Francisconi, E. (2010). Naming Legislative Resources. In E.F. Mariangela Biasiotti, Managing Legal Resources in the Semantic Web. University of Bologna.

K. Sollins (MIT/LCS), L.M. (1994, December). Functional Requirements for Uniform Resource Names. Retrieved January 14, 2011, from IETF.org: http://tools.ietf.org/html/rfc1737

Monica Palmirani, F.V. (2010). Akoma Ntoso for Legal Documents. In E.F. Mariangela Biasiotti, Managing Legal Resources in the Semantic Web. University of Bologna.

Steve DeRose, R.D. (2002, August 16). XML Pointer Language (XPointer). Retrieved January 14, 2011, from W3C: www.w3.org/TR/xptr/

Tillett, Barbara. (2003, September). What is FRBR? A Conceptual Model for the Bibliographic Universe. Retrieved January 14, 2011, from Library of Congress: www.loc.gov/cds/downloads/FRBR.PDF

Tim Berners-Lee, J.H. (2001). The Semantic Web. Scientific American Magazine.



Grant Vergottini, Chief Architect, Xcential Group, LLC., 841 Second Street, Encinitas, CA 92124, U.S.A., grant.vergottini@xcential.com


  1. www.thelaw.tas.gov.au/about/index.w3p
  2. CAML – California Markup Language.
  3. www.legisweb.com
  4. http://leginfo.ca.gov
  5. SLIM – Simplified Legislative Information Model.
  6. http://akomantoso.org/
  7. Section 12021.5 of the California Penal Code, effective as of January 2011.
  8. TOSA – Table of Sections Affected.