How Google Book Search transformed from impossible to inevitable

Google Digitization signs are all over the Michigan engineering library. (Photo credit: Wikipedia)

In a widely reported copyright fair use decision, Judge Denny Chin ruled that the Google Books program constituted fair use, denying claims of the Authors Guild that the scanning of 20 million library books and posting snippets of those works online infringed the rights of authors.

The litigation history reflects the transformation that has taken place on the internet in the past decade. In 2004, Google entered into an agreement with several universities, beginning with the University of Michigan.

Google began the process of digitizing books at the nation’s great libraries, starting at the University of Michigan, the alma mater of company co-founder Larry Page. “Even before we started Google, we dreamed of making the incredible breadth of information that librarians so lovingly organize searchable online,” said Page. A 2005 lawsuit resulted in three years of negotiation and a proposed settlement in 2008. That settlement collapsed amid antitrust concerns and questions about the fairness of representation for the plaintiffs’ sub-classes.

As the Google Books program evolved, two discrete projects operated. In the Partner Program “works are displayed with the permission of the rights holder.” The rights holders had the ability to opt out of the scanning, but in 2011 the Association of American Publishers settled with Google. According to the decision, “As of early 2012, the Partner Program included approximately 2.5 million books, with the consent of some 45,000 rights holders.” The participation suggests an industry voting with its feet.

Under the publisher agreement, Google stopped displaying ads with the publishers’ books; in turn, the publishers provided Google with the books. This settlement, even more than the two district court decisions, effectively ended the dispute – leaving the two lawsuits as mop-up activities.

In the HathiTrust litigation, Judge Harold Baer determined that Google’s Library Project partners, the libraries comprising the HathiTrust partnership, were entitled to fair use protection for digitizing the 20 million volumes copied and used by the libraries. The decision highlighted the benefits to visually impaired students and researchers, who gained access to content not previously available through audio readers or braille; the benefits of digital search functionality; and the importance of protecting the library collections from physical harm and deterioration.

In both opinions, the courts highlighted the new research opportunities created by the digital database:

Mass digitization allows new areas of non-expressive computational and statistical research, often called “textmining.” One example of text mining is research that compares the frequency with which authors used “is” to refer to the United States rather than “are” over time. Quoting the brief of the Digital Humanities amicus, “it was only in the latter half of the Nineteenth Century that the conception of the United States as a single, indivisible entity was reflected in the way a majority of writers referred to the nation.”
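To make the text-mining example concrete, here is a minimal sketch of the “is” versus “are” computation. The corpus format (year, text pairs) and the toy data are assumptions for illustration only; nothing here is drawn from the court record or the amicus brief.

```python
import re
from collections import defaultdict

# Hypothetical corpus: an iterable of (publication_year, full_text) pairs.
# How the texts are loaded is an assumption; any digitized collection would do.
def singular_vs_plural_by_decade(corpus):
    counts = defaultdict(lambda: {"is": 0, "are": 0})
    pattern = re.compile(r"\bthe United States (is|are)\b", re.IGNORECASE)
    for year, text in corpus:
        decade = (year // 10) * 10
        for match in pattern.finditer(text):
            counts[decade][match.group(1).lower()] += 1
    return dict(counts)

# Toy example:
corpus = [
    (1850, "The United States are a young federation."),
    (1875, "The United States is a single, indivisible entity."),
]
print(singular_vs_plural_by_decade(corpus))
# {1850: {'is': 0, 'are': 1}, 1870: {'is': 1, 'are': 0}}
```

The point of the example is that the research reads nothing expressive from any individual book; it aggregates counts across millions of them.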

The Google decision followed the same path, highlighting the benefits of digital search, the limits placed on commercial exploitation by Google, and the pro-market effects agreed to by the publishers. “Google Books expands access to books.” With this simple sentence, the court highlights the essence of the eight years of litigation. In looking at the transformative nature of the fair use test, the court explained, “Google Books does not supersede or supplant books because it is not a tool to be used to read books.”

The court does not discuss the tremendous value the Google Books program provides to Google’s search engine, speech recognition, and other algorithms. It also dismisses concerns about the intermediate copying, treating it as a necessary step in enabling the research and archival functions. But it does highlight that Google “does not run ads on the About the Book pages that contain snippets” and that Google “does not engage in the direct commercialization of copyrighted works.”

Google’s settlements and its decisions not to commercialize the Google Books program likely tipped the scales with the publishers and may have strongly influenced the courts. Unlike Judge Baer, Judge Chin does not even discuss a potential market for licensing the digitized database to Google; Baer had rejected that licensing market as speculative. Moreover, since new works are added through voluntary participation by the publishers, licenses for those works are already in place.

The decision reads like a simple fair use summary, one that could lead casual observers to wonder why it required eight years of litigation. But changes in the conduct of both parties are what made such a simple decision possible. Google adapted its behavior to limit its commercialization of the works. Publishers shifted from demanding opt-in, ex ante control to recognizing that the opt-out partnership met their needs. And eight years of experience produced no significant evidence of authors being harmed by snippet searches replacing library purchases of academic texts.

In addition, the role of digital texts has changed. The Amazon Kindle and Apple iPad have paved the way for a fundamental shift in the relationship authors have with electronic texts. Market forces proved that Google correctly anticipated a fundamentally restructured book industry; Google was only one of the players bringing about this change.

Both the HathiTrust litigation and the Authors Guild v. Google litigation will likely be appealed, but there is little appeal in undoing the transformations to publishing that the Google Books program began.


Industrial Internet reshapes the “Internet of Things”

The Internet of Things, a term coined in 1999, refers to a world in which all objects are connected wirelessly to the Internet and therefore to each other. The model requires each device to have RFID or other near-field communications technology to communicate, sharing information about the identity, status, activities, and other attributes of the device. Partnered with big data analytics, the information from these devices can paint a robust picture of how objects interact in the world and how people interact with them.

This week, the model was supercharged. According to a report in the New York Times, General Electric hopes to transform this model with what it terms the “Industrial Internet.”

The so-called Industrial Internet involves putting different kinds of sensors, sometimes by the thousands, in machines and the places they work, then remotely monitoring performance to maximize profitability. G.E., one of the world’s biggest makers of equipment for power generation, aviation, health care, and oil and gas extraction, has been one of its biggest promoters. … The executive in charge of the project for G.E. … said that by next year almost all equipment made by the company will have sensors and Big Data software.

Emerging technology allows devices to distribute usage and telemetry data, to receive instructions, to interact with other equipment, and to serve as communications bridges that extend coverage, so that the devices themselves expand the network on which the equipment communicates. The implications are quite interesting.
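To make the “remote monitoring to maximize profitability” idea concrete, here is a minimal sketch of the monitoring half of that loop. The device names, metrics, and thresholds are invented for illustration and do not describe GE’s actual systems.

```python
from dataclasses import dataclass

@dataclass
class Telemetry:
    device_id: str    # which machine reported
    metric: str       # e.g., "vibration_mm_s" or "exhaust_temp_c" (hypothetical names)
    value: float
    timestamp: float  # seconds since epoch

# Hypothetical maintenance thresholds per metric; illustrative numbers only.
THRESHOLDS = {"vibration_mm_s": 7.1, "exhaust_temp_c": 650.0}

def flag_maintenance(readings):
    """Return (device, metric, value) tuples whose reading exceeds its threshold."""
    alerts = []
    for r in readings:
        limit = THRESHOLDS.get(r.metric)
        if limit is not None and r.value > limit:
            alerts.append((r.device_id, r.metric, r.value))
    return alerts

readings = [
    Telemetry("turbine-042", "vibration_mm_s", 7.8, 1384000000.0),
    Telemetry("turbine-042", "exhaust_temp_c", 612.0, 1384000000.0),
]
print(flag_maintenance(readings))  # [('turbine-042', 'vibration_mm_s', 7.8)]
```

The commercial value comes from running checks like this continuously across thousands of sensors, so that maintenance happens before failure rather than after.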

Perhaps the most important aspect of the development affects critical infrastructure – the fundamental systems operating our water, power, rail, and telecom networks. Properly secured and interactive, the elements of our aging infrastructure could begin to spot trouble and eventually perform small repairs without the need for 24-hour crews.

GE’s present equipment tends to be large devices, ranging from jet engines to MRI machines. But the concept could well extend to automobiles, bicycles, phones, cameras, and even clothing. Equipped automobiles, for example, could report mechanical efficiency for every system in the car. They could also share vehicle telemetry, providing a real-time map of how each car was driving in relation to every other car on the road. The information could be used to alert a driver to road hazards, to dangerous weather conditions, or to the driver’s weaving. The information could alert police to the same conditions and behaviors.

In the workplace, the Industrial Internet will improve automation, which helps retain U.S. manufacturing but probably at the cost of fewer workers doing more specialized work. It should also be employed to improve worker safety, but it could easily be adapted to create a workplace in which every movement is tracked. With Industrial Internet name badges, doors would lock and unlock in response to the presence of authorized personnel, but the data analytics would also be able to see which employees spent the most time with which of their peers and correlate such interactions with post-interaction productivity. Schools could similarly track student movements and behaviors, identifying which resources and faculty were actually utilized and which of those affected learning outcomes – for better or worse.

Existing rules for workplace and education environments do not take the pervasive nature of the Industrial Internet into account. The assumption that privacy is a zone around one’s home and person has little relevance to a cloud of data points broadcasting a picture of each person and how that person interacts.

The FTC has taken small steps to explore these issues and regulate obvious abuses, but legislators need to do much more. Absent legislation, current NSA practices will vacuum this data into the agency’s Orwellian data trove.

The Industrial Internet promises to translate the Internet of Things into very practical, valuable industrial improvements. Safer planes, smarter cars, and more efficient homes all improve people’s lives. Proper regulation will encourage those uses while protecting civil liberties and privacy and guarding against overreach. Perhaps we can craft the policies to avoid the outrage rather than in response to it.

One year later – DRM-free ebooks hugely positive for Tor

New York Times technology columnist David Pogue discussed last year’s decision by Tor Books UK and US to drop copy protection. Tor has just released a statement on the effect of DRM-free ebooks after one year.

His column deftly discusses the tension between consumers, who want the inconvenience of encryption eliminated, and concerns that DRM targets lawful consumers far more than those acquiring illegally distributed copies. Although he does not address the plethora of DRM-free versions on BitTorrent sites, he notes that dropping DRM from commercial products might affect the rate of piracy, but not its existence.

The Tor announcement highlighted a few other features of its strategy. First, the strategy was about its authors and their goal of engaging more effectively with readers. Second, as a science fiction imprint, its readership is among the most capable of finding DRM-free copies, so the publisher needs to make the consumer happy more than it needs to protect itself from the consumer. And finally, the decision to eliminate DRM does not mean the works are free; they remain for-profit, on-sale copies. This statement captures many of Tor’s concerns:

We had discussions with our authors before we made the move and we considered very carefully the two key concerns for any publisher when stripping out the DRM from ebooks: copyright protection and territoriality of sales. Protecting our author’s intellectual copyright will always be of a key concern to us and we have very stringent anti-piracy controls in place. But DRM-protected titles are still subject to piracy, and we believe a great majority of readers are just as against piracy as publishers are, understanding that piracy impacts on an author’s ability to earn an income from their creative work. As it is, we’ve seen no discernible increase in piracy on any of our titles, despite them being DRM-free for nearly a year.

Pogue suggests, but does not state outright, that DRM is an ineffective strategy for reducing piracy. But he is very explicit that the point of an anti-piracy policy is to increase sales and revenue. DRM-free does not mean without cost; iTunes still sells its music even though it dropped DRM. He also points out that his own books have fared similarly in the market.

If book consumers thought that everyone in the household could easily read the same book (in the manner that a family can share a physical copy), they might be more willing to spend money to own the ebook. For works that have no physical cost, the increase in unauthorized copies is not the right metric. The right question is whether more customers will purchase the work. If more copies are sold, the work is more successful, even if more copies are also pirated.

Pogue makes another strong point: the ease of the transaction directly impacts sales. “Friction also matters. That’s why Apple and Amazon have had such success with the single click-to-buy button. To avoid piracy, it’s not enough to offer people a good product at a fair price. You also have to make buying as effortless as possible.” High transaction costs are reasonable only for expensive, infrequent purchases. As in physics, friction scales with weight; only weighty purchases should carry much friction.

Finally, Pogue addresses the pricing of ebooks. Frankly, he is more generous to the publishers than I would be on this issue, acknowledging the costs associated with “author advance, editing, indexing, design, promotion, and so on”; but, as in the music industry, those investments are declining. The public is likely to judge a fair price for an ebook as a percentage of its physical counterpart. If the physical copy has a secondary market in the used bookstore, then the loss of resale also needs to be factored in for the consumer. Otherwise the consumer is only paying for the convenience of instant access, and if that instant access is undermined by kludgy DRM, there is no value to be had.

Tor heard this from its constituents:

But the most heartening reaction for us was from the readers and authors who were thrilled that we’d listened and actually done something about a key issue that was so close to their hearts. They almost broke Twitter and facebook with their enthusiastic responses. Gary Gibson, author of The Thousand Emperors tweeted: “Best news I’ve heard all day.” Jay Kristoff, author of Stormdancer, called it “a visionary and dramatic step . . . a victory for consumers, and a red-letter day in the history of publishing.”

Tor never says it has become more profitable, but the company does relish the role it is taking in leading the publishing industry towards a more consumer-friendly business model.

The move has been a hugely positive one for us, it’s helped establish Tor and Tor UK as an imprint that listens to its readers and authors when they approach us with a mutual concern—and for that we’ve gained an amazing amount of support and loyalty from the community. And a year on we’re still pleased that we took this step with the imprint and continue to publish all of Tor UK’s titles DRM-free.

So the lesson from Tor is simple – for low-cost impulse purchases, DRM doesn’t add value. High-quality, fairly priced, easy-to-access works will continue to attract a growing market. These are the points of emphasis and differentiation for the marketplace. DRM may be a legal solution, but it is not a sound business strategy.

Cyber Defense Strategies and Responsibilities for Industry Call for Papers Now Open

The Northern Kentucky Law Review and Salmon P. Chase College of Law seek submissions for the third annual Law + Informatics Symposium on February 27-28, 2014.

2014 Law + Informatics Symposium on

Cyber Defense Strategies and Responsibilities for Industry

 The focus of the conference is to provide an interdisciplinary review of issues involving business and industry responses to cyber threats from foreign governments, terrorists, and corporate espionage. The symposium will emphasize the role of the NIST Cybersecurity Framework and industries providing critical infrastructure.

The symposium is an opportunity for academics, practitioners, consultants, and students to exchange ideas and explore emerging issues in cybersecurity and informatics law as they apply to corporate strategies and the obligations of business leaders. Interdisciplinary presentations are encouraged. Authors and presenters are invited to submit proposals on topics relating to the theme, such as the following:

Cyber Warfare

  • Rules of Engagement
  • Offensive and defensive approaches
  • Responses to state actors
  • Engagement of non-state actors
  • Distinguishing corporate espionage from national defense
  • Proportionality and critical infrastructure
  • Cyber diplomacy
  • Cold War footing and concerns of human rights implications

Front Lines for Industry

  • Role of regulators such as FERC
  • Legacy systems and modern threats
  • NIST guidelines
  • NIST Cybersecurity Framework
  • Engaging Dept. of Homeland Security
  • Implications for various industries (electric power, telecommunications and transportation systems, chemical facilities)
  • Health and safety issues

Global Perspectives

  • Concepts of cyber engagement in Europe
  • Perception of the Internet and social media as threats to national sovereignty
  • Rules of engagement outside U.S. and NATO
  • Implications for privacy and human rights
  • Stuxnet, Duqu, Gauss, Mahdi, Flame, Wiper, and Shamoon
  • Cyber engagement in lieu of kinetic attacks or as a component of kinetic engagement

Corporate Governance

  • Confidentiality and disclosure obligations
  • Responsibilities of the board of directors
  • Staffing, structures and responses
  • Data protection & obligations regarding data breaches
  • Corporate duty to stop phishing and other attacks for non-critical industries
  • Investment and threat assessment
  • Litigation and third party liability

Other Issues

  • Executive orders and legislative process
  • Lawyer responsibility in the face of potential threats
  • Practical implications of government notices
  • Perspective on the true nature of the threat

Submissions & Important Dates: 

  • Please submit materials to Nkylrsymposium@nku.edu
  • Submission Deadline for Abstracts: September 1, 2013
  • Submission Deadline for First Draft of Manuscripts: January 1, 2014
  • Submission Deadline for Completed Articles: February 1, 2014
  • Symposium Date: February 27-28, 2014

Law Review Published Article:  The Northern Kentucky Law Review will review, edit and publish papers from the symposium in the 2014 spring symposium issue.  Papers are invited from scholars and practitioners across all disciplines related to the program. Please submit a title and abstract (of 500-1,000 words) or a draft paper for works in progress. Abstracts or drafts should be submitted by September 1, 2013. Submissions may be accepted on a rolling basis after that time until all speaking positions are filled.

Presentations (without publication) based on Abstracts:  For speakers interested in presenting without submitting a publishable article, please submit an abstract of the proposed presentation. Abstracts should be submitted by September 1, 2013. Submissions may be accepted on a rolling basis after that time until all speaking positions are filled.

Publication of Corporate Handbook on Cyber Defense: The Law + Informatics Institute may edit and publish a handbook for corporate counsel related to the topics addressed at the symposium. Scholars and practitioners interested in authoring book chapters are invited to submit their interest by September 1, 2013 which may be in addition to (or as an adaptation of) a submitted abstract for The Northern Kentucky Law Review. Submissions may be accepted on a rolling basis after that time until all chapter topics are filled.

About the Law and Informatics Institute:  The Law + Informatics Institute at Chase College of Law provides a critical interdisciplinary approach to the study, research, scholarship, and practical application of informatics, focusing on the regulation and utilization of information – including its creation, acquisition, aggregation, security, manipulation and exploitation – in the fields of intellectual property law, privacy law, evidence (regulating government and the police), business law, and international law.

Through courses, symposia, publications and workshops, the Law + Informatics Institute encourages thoughtful public discourse on the regulation and use of information systems, business innovation, and the development of best business practices regarding the exploitation and effectiveness of the information and data systems in business, health care, media, and entertainment, and the public sector.

For More Information Please Contact:

  • Professor Jon M. Garon, symposium faculty sponsor and book editor: garonj1@nku.edu or 859.572.5815
  • Lindsey Jaeger, executive director: JaegerL1@nku.edu or 859.572.7853
  • Aaren Meehan, symposium editor, meehana2@mymail.nku.edu or 859-912-1551

Beyond Google’s Looking Glass – The Internet of Things is Already Here

Seal of the United States Federal Trade Commission (photo: Wikipedia)

Perhaps triggered by the New York Times coverage of Google Glass, the FTC announced both a call for submissions and a workshop related to the Internet of Things and its implications for privacy, fair trade practices, and the security of both data and people. The FTC announcement highlights both the benefits and risks of device connectivity.

Connected devices can communicate with consumers, transmit data back to companies, and compile data for third parties such as researchers, healthcare providers, or even other consumers, who can measure how their product usage compares with that of their neighbors.  The devices can provide important benefits to consumers:  they can handle tasks on a consumer’s behalf, improve efficiency, and enable consumers to control elements of their home or work environment from a distance. At the same time, the data collection and sharing that smart devices and greater connectivity enable, pose privacy and security risks.

The issue is not new. The ITU released a 2005 study discussing the implications of the Internet of Things. The ITU described a near technological future in which “industrial products and everyday objects will take on smart characteristics and capabilities. … Such developments will turn the merely static objects of today into newly dynamic things, embedding intelligence in our environment, and stimulating the creation of innovative products and entirely new services.”

I have previously described some of these concerns in an article, Mortgaging the Meme.[1]

In each of these situations, an automated and consumer-defined relationship will replace the pre-existing activities. In many situations, this will create efficiency and convenience for the consumer, but it will also reduce the opportunities for human interaction and subtly rewrite the engagement between customer and company. Those that understand this change will adjust their technologies to improve the service and increase the customer’s reliance on its systems. Companies that do not understand how this engagement will occur, risk alienating customers and losing markets quickly.

Beyond consumer interactions, other uses may arise. Ethical and privacy concerns regarding misuse tend to focus on government, business and organized crime. These include unwarranted surveillance, profiling, behavioral advertising and target pricing campaigns. As a result, as companies increasingly rely on these tools, they also bear a responsibility to do so in a socially positive manner that increases the public’s estimation of the company.

The FTC submissions and workshop are overdue. Reading the New York Times quote regarding app developers, there is a sense that, unlike the technology giants such as Microsoft and Google, the developers are thinking more about the technology’s potential than its potential impact. One such example from the Times: “‘You don’t carry your laptop in the bathroom, but with Glass, you’re wearing it,’ said Chad Sahlhoff, a freelance software developer in San Francisco. ‘That’s a funny issue we haven’t dealt with as software developers.’”

Many fields will benefit from increased device connectivity. Just a few:

  • Public transportation systems designed around real-time usage and traffic patterns.
  • Prescription monitoring to help patients take the right medications at the correct time.
  • Fresher, healthier produce.
  • Protection of pets and children.
  • Social connectivity, with photo-tagging and group-meeting moving into the real world.
  • Interactive games played on a real-world landscape.

There are also law enforcement uses that must be carefully considered. After the Boston Marathon attack, for example, calls for public surveillance will undoubtedly increase, including calls for adding seismic devices and real-time echo-location. Gunshots, explosions, and even loud arguments could become self-reporting.

Common household products sometimes become deadly in large quantities. RFID technology could be used to monitor the quantity and concentration of potentially lethal materials and provide that data to the authorities.

The consumer use, public use, and law enforcement use must be thoughtfully reviewed to balance the benefits of the technology with the intrusions into privacy and the legacy of retrievable information that such technology creates.

FTC staff will accept submissions through June 1, 2013, electronically through iot@ftc.gov or in written form. The workshop will be held on November 21st. These are the questions posed by the FTC thus far:

  • What are the significant developments in services and products that make use of this connectivity (including prevalence and predictions)?
  • What are the various technologies that enable this connectivity (e.g., RFID, barcodes, wired and wireless connections)?
  • What types of companies make up the smart ecosystem?
  • What are the current and future uses of smart technology?
  • How can consumers benefit from the technology?
  • What are the unique privacy and security concerns associated with smart technology and its data?  For example, how can companies implement security patching for smart devices?  What steps can be taken to prevent smart devices from becoming targets of or vectors for malware or adware?
  • How should privacy risks be weighed against potential societal benefits, such as the ability to generate better data to improve healthcare decision making or to promote energy efficiency?
  • Can and should de-identified data from smart devices be used for these purposes, and if so, under what circumstances?

While the FTC has asked some good questions, they are only the beginning. Please submit your thoughts and join the FTC conversation.


[1] Jon M. Garon, Mortgaging the Meme: Financing and Managing Disruptive Innovation, 10 NW. J. TECH. & INTELL. PROP. 441 (2012).

Remote Proctoring for the MOOC – an opening for the next wave in privacy excess

For those who herald such things, 2012 was the year of the MOOC – the massive open online course. Most MOOCs are free, though some providers are attempting to monetize the offerings. The Chronicle of Higher Education reports that Coursera, the leading provider, has exceeded one million students, while Udacity is nearing that mark.

The MOOC movement represents a highly disruptive innovation in education. Content is provided for free (or low cost) to the public on a massive scale. While some courses are little more than correspondence programs, others are highly interactive – with student projects, effective feedback, and measurable learning outcomes.

Successful educational institutions will still sell academic degrees as well as more intimate experiential learning opportunities. Universities that are struggling financially tend to see MOOCs as threats to revenue, while critics raise concerns about rigor and engagement.

Ironically, the MOOC’s open access raises concerns about reliably authenticating the test taker. If the certification is valuable, then perhaps one can hire a stand-in to take the course and pass the exam. According to the Washington Post, “security measures suggest that people sometimes cheat in MOOCs, even when there are no course credits or money at stake.”

To expand its business model and improve the reliability of MOOC participation, Coursera has launched a “pilot project to check the identities of its students and offer ‘verified certificates’ of completion, for a fee. A key part of that validation process will involve what Coursera officials call ‘keystroke biometrics’—analyzing each user’s pattern and rhythm of typing to serve as a kind of fingerprint.”

Keystroke biometrics are recognized for distinguishing automated computer responses from human ones, which makes them quite useful for separating human users from computer bots. They are less commonly used as an identity credential.
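What a “pattern and rhythm of typing” means in practice can be sketched in a few lines. The example below is purely illustrative and is not Coursera’s method; the feature set (average hold and flight times), the tolerance value, and the sample data are all assumptions.

```python
# Each keystroke event: (key, press_time, release_time), times in seconds.
def rhythm_features(events):
    """Average hold time per key and flight time between successive presses."""
    holds = [release - press for _, press, release in events]
    flights = [events[i + 1][1] - events[i][1] for i in range(len(events) - 1)]
    avg = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return avg(holds), avg(flights)

def matches_profile(sample_events, enrolled, tolerance=0.05):
    """Accept the sample if both averages fall within tolerance of the enrolled profile."""
    hold, flight = rhythm_features(sample_events)
    return abs(hold - enrolled[0]) <= tolerance and abs(flight - enrolled[1]) <= tolerance

# Enrollment: averages taken from a typing sample at registration (hypothetical values).
enrolled_profile = (0.11, 0.28)
attempt = [("t", 0.00, 0.10), ("h", 0.25, 0.37), ("e", 0.55, 0.66)]
print(matches_profile(attempt, enrolled_profile))  # True for this toy sample
```

A production system would compare per-key-pair timings with a trained classifier rather than two averages, but the principle is the same: the credential is the timing pattern, not the text typed.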

Keystroke biometrics are only part of the Coursera approach. Coursera will also collect photographs of the student’s ID and of the student, taken with the computer’s camera, to be compared by hand.

The most common way for online courses to be verified is for the student to take the exam at a test center. Such facilities exist throughout the country, and universities sometimes offer this service to each other as an accommodation for traveling students.

Using ineffective technologies will make a joke of MOOC certification’s credibility. While the risk of being caught will deter some potential cheaters, weak protections will invite others to work around them and will harm the credibility of these programs.

Inevitably, the next step in student monitoring will be to remotely capture photos, video, or audio of students while they are engaged in the course. Products that remotely control onsite computers, such as Apple Remote Desktop, LanSchool, and Net Orbit, can be adapted to the student’s home computer. In 2010, for example, a suburban Philadelphia school district was sued for spying on its students without any prior notification.

Perhaps the use of live biometric voice recognition would improve the reliability and avoid the risk that the system could capture data surreptitiously, but such steps should be taken with caution.

Until the MOOC certificate is part of a college transcript, there is no reason to worry about verification. Schools offering college credit for these courses should extend their academic standards and honor codes to the courses.

Any monitoring of online students should be done in a manner that requires the student to log into the system and complete verification steps. It should not allow the system to reach into the student’s computer or turn on monitoring devices – including keystroke monitors, microphones, or cameras. Any system that allows the school to choose when to monitor the student is likely to become intrusive and to glean inappropriate information.

There are many effective ways to verify the work of students – computer monitoring should not be one of them.