1.1.
Daydream believers
«generated and deployed 97.42% of all possible useful texts of ten to 400 words in length (the remaining 2.58% has already been deployed in the last 2000 years).»
«[…] Qentis is responsible for over 97 percent of all feasible text that can be created in English, German, French, Russian, Polish, Portuguese, Italian and Spanish. Qentis aims to create 99.2% percent of all target-length Internet text, making it by far the largest copyright holder in the world.»
«The Qentis Corporation works with a powerful network of international law firms that represent our clients. The law firms notify authors, bloggers, news corporations, publishers and website owners whenever we feel they have breached the copyrights of our clients. As Qentis approaches 100 percent of content generation, all content owners will eventually have to pay royalties to our clients or face massive lawsuits.»3
If these claims were credible – and Qentis claims in particular to have generated the lyrics to Lady Gaga’s «Applause» four years before she did – then the consequences would be dramatic. No genuinely «new» works could be created in the future because everything that can possibly be expressed in the languages covered by Qentis' technology has already been said. Nor is this approach restricted to text. Musical works follow essentially the same idea. Even more ambitious is Qentis' claim to have generated images and even 3D objects. It claims that, since 2007, it has generated 3.23% of all possible images with dimensions up to 1000×800 pixels. Its stated aim is to have generated all conceivable images by the end of 2020.
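To give a sense of the scale of the image claim, the following minimal sketch counts the decimal digits in the number of distinct images at the single largest size mentioned, 1000×800 pixels. The colour depth of 24 bits per pixel is our assumption (Qentis does not specify one), and the function name is purely illustrative. Under that assumption the count is a number with roughly 5.8 million decimal digits.

```python
import math


def decimal_digits_of_image_count(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Number of decimal digits in 2 ** (width * height * bits_per_pixel).

    bits_per_pixel=24 is an assumed colour depth, not a figure from Qentis.
    """
    total_bits = width * height * bits_per_pixel
    # A positive integer x has floor(log10(x)) + 1 decimal digits.
    return math.floor(total_bits * math.log10(2)) + 1


# Only the largest size (1000x800) is counted; smaller sizes would add to the total.
print(decimal_digits_of_image_count(1000, 800))
```

Any fixed percentage of such a number, including 3.23%, still has millions of digits.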
1.2.
A barrel full of monkeys
Qentis is based on a simple idea: although the recursive nature of natural language allows for an infinite number of sentence constructions (Chomsky 2002), if we limit the permissible length of a piece of text then only a finite set of strings or texts can be generated. This idea has been famously generalised in the «infinite monkey theorem», first used explicitly by the French mathematician Émile Borel (Borel 1913). The well-known thought experiment illustrates a special case of the Borel–Cantelli lemma in probability theory that is useful for the proof of the Law of Large Numbers: a thousand monkeys, hitting keys at random on a typewriter keyboard for an infinite amount of time, will «almost surely» produce the complete works of William Shakespeare at some point. Some variations of the theorem use larger numbers of monkeys (up to infinitely many), but a single monkey with infinite time is all that is needed. For illustration, if we assume that a typewriter has 50 keys, and that every one of them has the same chance of being pressed by the monkey, then after 35,977,876,623 hits (35,977,876,618 + 5), there will be a 90% chance that the monkey has typed the word «Hamlet». The longer the monkey types, the higher the probability becomes, approaching 1 in the case of an infinite number of keystrokes. Paradoxically, this does not mean that it is impossible for the monkey never to type «Hamlet», even with infinite time – it could, for example, hit the letter «g» for all eternity. But the nature of infinity is such that the monkey will nevertheless «almost surely» type «Hamlet» eventually. The same holds true for any string of letters of finite length, including the entirety of the play Hamlet.
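The order of magnitude of the keystroke figure can be checked with a short calculation. The sketch below assumes, as the illustration does, 50 equally likely keys. It additionally treats each six-letter window as an independent trial, which is an approximation, since overlapping windows are not strictly independent. The function name and its parameters are ours and serve only as illustration. It yields roughly 3.6 × 10^10 keystrokes.

```python
import math


def keystrokes_for_probability(word: str, keys: int, target: float) -> int:
    """Keystrokes after which `word` has appeared with probability >= target.

    Assumes uniform random key presses and (approximately) independent windows.
    """
    # Chance that one window of len(word) keystrokes spells the word.
    p_window = keys ** -len(word)
    # Solve (1 - p_window) ** windows <= 1 - target for the number of windows.
    windows = math.ceil(math.log(1.0 - target) / math.log1p(-p_window))
    # The first window needs len(word) keystrokes; each later one adds a single keystroke.
    return windows + len(word) - 1


n = keystrokes_for_probability("Hamlet", keys=50, target=0.9)
print(f"{n:,} keystrokes")
```

Raising `target` towards 1 makes the required number of keystrokes grow without bound, which is the finite counterpart of the «almost surely» statement.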
From a legal perspective, this also means that the owner of the monkey’s output could prove, for probabilistic reasons – that is, not just on the balance of probabilities but beyond a reasonable doubt – that a copy of Hamlet is in his possession, without any need to actually check the output of the typing simians. This idea is key to Qentis' «business model».
«He who believes this may as well believe that if a great quantity of the one-and-twenty letters, composed either of gold or any other matter, were thrown upon the ground, they would fall into such order as legibly to form the Annals of Ennius. I doubt whether fortune could make a single verse of them.» (Cicero 1961)
«Everything would be in its blind volumes. Everything: the detailed history of the future, Aeschylus' The Egyptians, the exact number of times that the waters of the Ganges have reflected the flight of a falcon, the secret and true nature of Rome, my dreams and half-dreams at dawn on August 14, 1934, the proof of Pierre Fermat’s theorem, […], […] Everything: but for every sensible line or accurate fact there would be millions of meaningless cacophonies, verbal farragoes, and babblings. Everything: but all the generations of mankind could pass before the dizzying shelves — shelves that obliterate the day and on which chaos lies — ever reward them with a tolerable page.»
While the infinite monkey theorem originates in mathematics and centres around the mathematical concept of infinity, many researchers have speculated whether it is possible to construct finite but physically realistic models. Richard Dawkins, for example, uses a similar idea in The Blind Watchmaker to illustrate the power of evolutionary processes. Dawkins' «Methinks it is like a weasel» program starts from a randomly typed string of symbols, the «parent». New «generations» are created by replacing letters randomly, keeping only those letters that match the «methinks» target phrase from Hamlet.
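The procedure as described in the preceding paragraph can be sketched in a few lines. This is a minimal illustration that follows that description (letters matching the target are kept, all others are re-drawn at random); it is not Dawkins' original program. The target is the phrase from Hamlet, «Methinks it is like a weasel»; the alphabet of 26 capital letters plus space, the fixed random seed and the function name are our assumptions.

```python
import random
import string

ALPHABET = string.ascii_uppercase + " "  # assumed alphabet: 27 symbols
TARGET = "METHINKS IT IS LIKE A WEASEL"


def evolve(target: str = TARGET, seed: int = 0) -> int:
    """Return the number of generations needed to reach `target`."""
    rng = random.Random(seed)
    # The "parent": a randomly typed string of the same length as the target.
    parent = [rng.choice(ALPHABET) for _ in target]
    generations = 0
    while "".join(parent) != target:
        # Selection step: matching letters are kept, all others are re-drawn.
        parent = [c if c == t else rng.choice(ALPHABET)
                  for c, t in zip(parent, target)]
        generations += 1
    return generations


print(evolve(), "generations")
```

Without the selection step, a random 28-character string would match the target with probability 27^-28 per attempt; with it, the target is typically reached within a few hundred generations.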
Dawkins' approach remains a purely abstract realisation of the monkey theorem for finite applications. Unlike Qentis' approach, it requires «selection» towards a predefined goal, which speeds up the generation of meaningful strings by several orders of magnitude. A very similar approach by Jesse Anderson, which uses Amazon’s EC2 cloud computing system to increase the number of monkeys into the millions, has already typed up the whole of A Lover’s Complaint.5 The system generates random strings of nine characters, which are then matched against Shakespeare’s oeuvre and kept when a match is found. Both Dawkins' and Anderson’s approaches thus require human intervention and ingenuity to model the «information creating» aspect of natural selection. In this crucial respect they differ from Qentis' «brute force» approach. Below we discuss the copyright status of the works these approaches create.
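The matching scheme described above (random fixed-length strings, kept only when they occur in a reference text) can be illustrated as follows. This is a hypothetical sketch, not the project's actual code: it runs on a single machine, uses a chunk length of two rather than nine so that it finishes quickly, and all function names, the seed and the `max_tries` cap are our own choices.

```python
import random
import string


def normalise(text: str) -> str:
    """Lower-case the text and drop everything except the letters a-z."""
    return "".join(ch for ch in text.lower() if ch in string.ascii_lowercase)


def monkey_cover(text: str, chunk_len: int, seed: int = 0,
                 max_tries: int = 1_000_000) -> tuple[float, int]:
    """Fraction of the text's distinct chunks found by random typing, and tries used.

    Assumes the normalised text is at least `chunk_len` letters long.
    """
    rng = random.Random(seed)
    letters = normalise(text)
    # Every distinct substring of length chunk_len is a chunk to be "typed".
    wanted = {letters[i:i + chunk_len]
              for i in range(len(letters) - chunk_len + 1)}
    found = set()
    tries = 0
    while found != wanted and tries < max_tries:
        guess = "".join(rng.choice(string.ascii_lowercase)
                        for _ in range(chunk_len))
        tries += 1
        if guess in wanted:  # selection: keep only strings that occur in the text
            found.add(guess)
    return len(found) / len(wanted), tries


coverage, tries = monkey_cover("From off a hill whose concave womb reworded", 2)
print(f"coverage {coverage:.0%} after {tries:,} random strings")
```

With nine-character chunks there are 26^9 (about 5.4 × 10^12) candidate strings, which is why a scheme of this kind relies on large-scale parallel computing.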
2.
Monkey business
In Borges’ story, even though they have the entire library at their disposal, including all correct future predictions, its inhabitants fail to benefit from the works in it – either because they cannot find the ones they need, or because they cannot determine which of the texts is a truthful account of the external world. Both problems would also arise for Qentis, if its business model were to produce content that someone actually wants to read or finds useful. Since the overwhelming majority of the works created by its algorithm would be unintelligible, it would take many lifetimes for a user to find the right text for a given task, even if we could know a priori that it must be somewhere in the collection. But Qentis' business model is different. It does not claim to produce useful works. Rather, all the computer generated work is sold in bulk to commercial outfits aiming to use it solely for copyright litigation. They need not check so much as a single document in their collection: probability theory means that a given work is «almost surely» in their collection, and this is all that is needed to discharge the civil burden of proof that requires them to have acquired the rights to that work. Qentis is the platonic ideal of the copyright troll: a right holder who cannot even in theory access the work they own, let alone use it for a meaningful purpose, but who can nevertheless use the knowledge of that ownership to extract monetary benefit from it while remaining purely passive.8 This is the political message the Qentis thought experiment delivers: copyright is now so far removed from the notion of societal usefulness that, at least in principle, something like Qentis has become conceivable.
A greater problem for Qentis is directly related to copyright law. Even if Qentis were to own the copyright in all possible future works, this alone does not of course mean that it could prevent people from writing and selling the product of their own creative work. Unlike patent law, copyright law permits parallel creation, so in addition to having the copyright in all possible works, it would have to show in each individual instance that the other author actively copied from its database. But as Borges’ story teaches us, it is not possible in the library of Babel to find any specific work. The noise always drowns out the signal. For the infinite monkey theorem, we can rigorously prove this using probability theory. Thus, just as pure mathematics tells us a priori and beyond reasonable doubt that Qentis has indeed created a copy of the specific work that is the subject of litigation, the defendant can also prove a priori, for the same reasons and beyond a reasonable doubt, that he never saw this work in Qentis' database, since it would take many times the lifetime of the universe to locate it there.
«But the interest of the suggestion lies in the revelation of the mental state of the person who can identify the «work» of Shakespeare with the series of letters printed on the pages of a book bearing that phrase as its title: and thinks, if he can be said to think at all, that an archaeologist of 10,000 years hence, recovering a complete text of Shakespeare from the sands of Egypt but unable to read a single word of English would possess Shakespeare’s dramatic and poetic works.»
At least for US law, therefore, the news is bad for both infinite and individual monkeys. We note, however, that the example given by the US Copyright Office mentions outputs that «lack discernible patterns». The infinite monkey theorem, by contrast, states that random processes do produce discernible patterns, and Dawkins' application in particular aims to dispel the lay perception that random processes are incapable of creating order. Furthermore, his implementation of the theorem does involve human intervention to add an element of «natural selection» to the process. Is this sufficient to argue that this is not a process without any human intervention?
3.
Conclusion: throwing a monkey wrench into copyright’s machinery
The idea of computer creativity and computer generated works is not new (see for example Gelernter 1994 and Schank and Cleary 1995). Questions regarding the copyright status of such works are almost as old as the first prototypes of computer creativity (for a discussion see Bridy 2012). What has changed over recent years, however, is that we now have viable business models that are able to utilise computer generated works. While earlier works aimed at «high art», contemporary applications focus on lesser examples of human creativity: the «small coinage» of German IP law. Short, technical articles and notes for online publication, data-driven journalism and summarisation services are most likely to avail themselves of this technology. A typical application could harvest customer reviews about their holiday in city X from the Internet and rewrite the information into a Wikipedia entry on X, or take business data and statistics and turn these into a report for shareholders. Services such as Narrative Science12 or Automated Insights13 focus on this segment of the market (for a scientific discussion see, for example, Lee et al 2012). Qentis' most realistic aspect is its restriction to texts of 400 words or fewer – it is indeed this size of article that is most likely to become generable by computers in the near future. This technology threatens established business models in the creative economy and will devalue certain forms of human creativity.14 It also disrupts the legal regulatory machinery. As we have seen, core concepts of copyright law fail to express adequately the issues that are at stake. The focus in the academic debate has been on the concept of «author», but our discussion indicates that the «idea vs expression» dichotomy is at least as problematic.
Unlike Qentis, these systems will not simply generate random texts, but learn from and incorporate text written by others. The need for such an approach was also recognised in the analysis of the infinite monkey theorem. Qentis' n-gram approach fails for the same reason that Dawkins' monkeys fail to be a proper analogue to evolution. In either case, only one letter at a time is typed/changed, independently of the other letters and without an evaluation of past experience. Hugh Petrie argues that, similarly, an account of the evolution of written ideas must follow biological evolution in accounting for this historical context, and that we should equip the monkey with not just a typewriter, but what we would today call an expert system that incorporates «whole Elizabethan sentences and thoughts. It would have to include Elizabethan beliefs about human action patterns and the causes, Elizabethan morality and science, and linguistic patterns for expressing these» (Petrie 1981, p.132). This approach comes much closer to what working text generation systems attempt. They combine rules of composition distilled from past experience with texts and text fragments written by others. From a copyright perspective, we therefore face not just one but two questions: who owns the IP in the computer generated work, and were they permitted to use the work of others to generate it? The second question does not apply to Qentis' probabilistic ex nihilo creation, but it is also not a straightforward question of impermissible copying: the input texts are not reproduced in any recognisable form; rather, their logical structure is analysed and reconstructed beyond recognition by purely mechanical means. The process is similar to that of a human reader who grasps the idea underlying a text and then expresses it entirely in her own words – an unproblematic and legally permissible process.
But the distinction between idea and expression, per Searle or Collingwood, does not apply to computers, demonstrating how these technologies disrupt established regulatory ideas.
Secondly, we recall that Qentis fails as a business because the readable works it generates are drowned out by excess noise. Dawkins addresses this in his simulation of the infinite monkey theorem by introducing a pre-defined goal. Applications of computer generated text will also typically require a user to define permissible outcomes and to act as the equivalent of «natural selection». This insight allows us to think of the reader as a co-creator of computer generated works, an idea proposed from the perspective of literary theory in Gracia’s discussion of the infinite monkey theorem (Gracia 1996, pp. 122–125). Selecting goals, finding the right answer amongst the noise and acting as «natural selection» may well fulfil the minimum requirement of «creative human input» required by US law, though the type of creativity is different from that of a traditional writer and includes creative search strategies that are absent in Qentis.
4.
References
Borel, Émile, Mécanique Statistique et Irréversibilité. Journal Phys. 5e série, 3, (1913), pp 189–196.
Borges, Jorge Luis, The Total Library, Penguin (2007).
Bridy, Annemarie, Coding Creativity: Copyright and the Artificially Intelligent Author, Stan. Tech. L. Rev. (2012), pp 1–28.
Chomsky, Noam, Syntactic structures. Walter de Gruyter, (2002).
Cicero, Marcus Tullius, De natura deorum, Academica. Vol. 268. Harvard University Press, (1961).
Collingwood, Robin George, The Principles of Art, Galaxy Books (1958).
Curran, Luke S., Copyright Trolls, Defining the Line between Legal Ransom Letters and Defending Digital Rights: Turning Piracy into a Business Model or Protecting Creative from Internet Lawlessness, Marshall Rev. Intell. Prop. L. [v] (2013–2014), pp 170–202.
DeBriyn, James, Shedding Light on Copyright Trolls: An Analysis of Mass Copyright Litigation in the Age of Statutory Damages, UCLA Entertainment Law Review 19.1 (2012).
Gracia, Jorge, Texts: Ontological Status, Identity, Author, Audience. SUNY Press (1996).
Gelernter, David, The muse in the machine: Computerizing the poetry of human thought. Simon and Schuster (1994).
McCutcheon, Jani, Curing the Authorless Void: Protecting Computer-Generated Works Following IceTV and Phone Directories. Melbourne ULR 37 (2013), pp 46–232.
Petrie, Hugh G., The dilemma of enquiry and learning. Living Control Systems Publ (2011).
Schank, Roger C., Cleary, Chip, Making Machines Creative. In: S Smith, T B Ward & R A Finke (eds.) The Creative Cognition Approach. MIT Press (1995), pp 229–247.
Searle, John, Minds, Brains and Programs, Behavioral and Brain Sciences 3 (3) (1980), pp 417–457.
Wershler-Henry, Darren Sean, The iron whim: A fragmented history of typewriting. Cornell University Press (2005).
David Komuves, PhD Fellow, CREATe and University of Edinburgh, SCRIPT Centre for IT and IP Law, Old College, EH8 9YL Edinburgh, UK, s1268366@sms.ed.ac.uk; http://www.create.ac.uk
Jesus Niebla Zatarain, PhD Researcher, University of Edinburgh, SCRIPT Centre for IT and IP Law, Old College, EH8 9YL Edinburgh, UK, j.niebla@ed.ac.uk; http://www.law.ed.ac.uk/research/students/viewstudent?ref=264
Burkhard Schafer, Professor of Computational Legal Theory and Director, SCRIPT Centre for IT and IP Law, University of Edinburgh, Old College, EH8 9YL Edinburgh, UK, b.schafer@ed.ac.uk; http://www.law.ed.ac.uk/people/burkhardschafer
Laurence Diver, Research Assistant, CREATe and University of Edinburgh, SCRIPT Centre for IT and IP Law, Old College, EH8 9YL Edinburgh, UK, laurence.diver@ed.ac.uk; http://www.law.ed.ac.uk/people/laurencediver
- 1 See for example https://torrentfreak.com/copyright-apocalypse-trolls-attack-the-net-from-the-future-140928/, last accessed 21 January 2015.
- 2 http://www.qentis.com, last accessed 21 January 2015.
- 3 http://www.qentis.com/work/work-13/, last accessed 21 January 2015.
- 4 http://www.artmarcovici.com/BIOGRAPHY, last accessed 21 January 2015.
- 5 http://www.jesse-anderson.com/2011/09/a-few-million-monkeys-randomly-recreate-shakespeare/, last accessed 21 January 2015.
- 6 The outcome and images of the monkeys at work can be seen at https://web.archive.org/web/20130120215600/http://www.vivaria.net/experiments/notes/publication/NOTES_EN.pdf, last accessed 21 January 2015.
- 7 http://www.create.ac.uk/blog/2014/08/07/quit-playing-around-monkey-stirs-up-copyright-controversy-with-selfie-guestpost-by-emily-goodhand/, last accessed 21 January 2015.
- 8 For messier real-life instantiations of this platonic ideal of copyright trolling as a business strategy for law firms, see for example DeBriyn (2012) or Curran (2013–2014).
- 9 http://copyright.gov/comp3/chap300/ch300-copyrightable-authorship.pdf, last accessed 21 January 2015.
- 10 http://www.theguardian.com/technology/2014/aug/22/monkey-business-macaque-selfie-cant-be-copyrighted-say-us-and-uk, last accessed 21 January 2015.
- 11 In, for example, the field of DNA-based computing. See for example Kahan et al 2008.
- 12 http://www.slate.com/articles/technology/future_tense/2012/03/narrative_science_robot_journalists_customized_news_and_the_danger_to_civil_discourse_.single.html, last accessed 21 January 2015.
- 13 http://towcenter.org/blog/automated-stories-using-algorithms-to-craft-news-content/, last accessed 21 January 2015.
- 14 http://www.cbsnews.com/news/this-post-was-written-by-a-human/, last accessed 21 January 2015.