Digital Governance Book Review: Data Cartels (2023)

Lamdan, Sarah (2023). Data Cartels. The Companies That Control and Monopolize our Information. Stanford: Stanford University Press, 203 pages.

By Matthias Finger

This extremely well-documented and well-researched book introduces the reader to a novel area of platformization, namely the platformization of actors that manage information, knowledge and intelligence. The author, a law professor, shares her unique insider understanding of companies such as Bloomberg L.P., RELX (Reed, Elsevier, LexisNexis) and Thomson Reuters. In particular, we get to see how these digital platforms, aka “data analytics” companies, work, along with the new business models they operate with. We also learn how governments, especially the US government, collaborate with and actually need these companies, thus jeopardizing some of their essential public service functions.

The main strength of this book lies in the unique insights it provides into a segment of the digital platforms world that has so far remained in the dark, the segment in which the data cartels operate. These insights are furthermore well captured in catchy and very commonsensical formulas, which make the book very pleasant to read. Even though the book is weak on theory and remedies (something it does not even aspire to provide), many novel understandings emerge.

The book itself is structured around five types of activities performed by data analytics companies, mainly Thomson Reuters and RELX, in the areas of data brokering (chapter 2), academic research (3), legal information (4), financial information (5) and news (6). What makes these two companies a “cartel” – in purely technical terms the word “duopoly” would probably be more appropriate until proven guilty – is the fact that they dominate most of these markets and, moreover, that they are the only ones to combine all five of them. In this review, I will highlight the activities of these data analytics companies in the less well-known areas of academic research, as well as legal and financial information. The first and the last chapters are a bit more conceptual in nature, the first one showing how these “data octopuses” link all these activities together and operate as a cartel. The last chapter is about remedies, in which the author builds on her rather intuitive idea of public services and how to continue to provide them even though the cartels now rule over government.

This review is structured as follows: I will first discuss some of the book’s chapters, mainly with the aim of helping readers understand what digital platforms do in the areas of academic research, legal, and financial information. In a second step, I will crystallize what we can learn from this excellent and so far unique book.

Insights into the hitherto unknown activities of data analytics firms

Academic research

Elsevier (the “E” in RELX) and Thomson Reuters sell both academic content and the analyses (Scopus for Elsevier, Clarivate for Thomson Reuters) they make of the people and organizations that create and use that content. Together they publish and analyze the majority of English-language scientific publications, a typical global cartel. Elsevier alone owns 25,000 journals and publishes half a million scientific articles annually. “Among librarians and scholars, RELX and Thomson Reuters aren’t known as benevolent – they are known as bullies. The companies exploit scholars’ free labor and then steal the fruits of their research, paywalling their work so that only subscribers can see it” (p. 5).

They are no longer publishing houses but academic data analytics companies. They have reached that position in the science publishing market thanks to digitalizing academic content on the one hand and to extracting information from those who produce that content on the other. As with other well-known digital platforms such as Amazon, they operate as intermediaries between those who (must) produce content, those who (must) consume that content, and those who hire and employ the producers of that content, a typical three-sided market with very potent direct, indirect, and algorithmic network effects.

Their business model furthermore consists of extending the value chain to pre- and post-publication activities. They have also successfully imposed a series of academic impact metrics (impacts of articles, journals, authors, etc.) as the(ir) criteria for academic “quality”. By now researchers, universities, and funding agencies are so hooked on these metrics and the corresponding platforms that academic research has become a self-referential system, an echo chamber fueled by Elsevier’s and Thomson Reuters’ algorithms. This is of course not good for science as an endeavor that should innovate and help humanity solve its problems.

But there are other problems as well: as a cartel, RELX and Thomson Reuters charge outrageous prices for access to knowledge that has been funded by the public sector and produced by cheap and oftentimes exploited academic labor. In other words, they have privatized information and knowledge that is fundamentally public, something the UN calls a human right. This of course fuels inequality: only rich universities and their libraries can afford access to that knowledge. Unsurprisingly, then, the best-ranked universities are the ones whose researchers have unrestricted access to the RELX and Thomson Reuters databases … and they are not in Africa.

In other words, publicly funded knowledge that should be publicly accessible is being transformed “into fodder for RELX’s data analytics software” (p. 51), which turns it into an abusive 30% profit margin. Reading a journal article typically costs USD 30, even though it has basically been produced for free. In other words, “the academic information industry is built on unfair contracts and free labor” (p. 62). It transforms services to society and humanity into commodities and by doing so undermines the very idea of knowledge as a public service. And the author does not even talk about the biases that the algorithms have introduced into research agendas, scientific opinions and the very direction of scientific research, an area that would certainly be worthwhile writing about as well.

Legal information

This may be a purely American phenomenon, but the same duopoly of RELX (through its subsidiary LexisNexis, the “LX” in RELX) and Thomson Reuters (through its subsidiary Westlaw) also prevails in the American legal information market. Legal information is produced not by researchers but by legislative, judicial, and administrative actors. The two platforms digitalize this information and make it available again to case instructors, judges, policy-makers, lawyers and even citizens, if they can afford the hefty prices. They also combine it with digitalized information from “secondary” law publications, i.e., writings that explain the law. Sometimes, LexisNexis and Westlaw are the only available access points to such information. The US government does have its own system, but it is so poor that hardly anybody can use it, so that even the government’s legal system relies on the platforms. In addition, the strength of these platforms lies in their analytics software, which creates predictive and sometimes even prescriptive information, such as how cases will fare in various courts, so-called “data-driven legal insights”.

Again, this is a typical two-sided market – the producers and the consumers of legal information – with some actors (for example judges) finding themselves on both sides of it. The strength of these legal information companies is of course the wealth of information they have accumulated over time, but even more so their algorithms, which make sense of this wealth of information. Needless to say, the search behavior of lawyers, judges, and interestingly even inmates – who are sometimes offered access so as to be “able” to defend themselves in the absence of a lawyer – is analyzed in turn and fed into the algorithms. These algorithms (and the duopoly platforms) not only streamline the legal process in the US, they actually shape the law. What is referenced and prioritized by Westlaw and LexisNexis (just as in Google’s search algorithms) will get cited and used, leading, as in academic research, to a self-referential system, an echo chamber of lawyers and judges, ultimately becoming the prevalent legal doctrine.

And of course, all this is originally public legal information that has been privatized to the benefit of the lawyers, or rather the law firms (and their clients), that can afford it. While it should be available to everyone, so to speak as a public service, the digital platforms turn it into a privatized commodity for the few.

Financial information

Financial information is a little different from academic research and legal information: it is less of a public service, even though financial information platforms may have huge consequences for how the stock market evolves and how your retirement fares; it is also somewhere in between data brokering and intelligence. But the financial analytics companies – and there are mainly three: RELX’s LexisNexis, Refinitiv Eikon (a former Thomson Reuters company) and Bloomberg L.P. – basically analyze financial market data along with financial analyses made by others, pushing both through their algorithms so as to make predictions and recommendations.

Again, this is a two- or even a multi-sided market with platforms in the middle: just as with academic researchers, analysts are both the consumers of the analyses provided by the digital platforms and the fodder for these same platforms, as their behavior on the platform is itself analyzed by the algorithms and turned into sellable knowledge, sometimes sold back to the very analysts who provided the information to begin with. Says the author: “In the same year Thomson Reuters sold information before it went public, Bloomberg News reporters were caught snooping on investment bankers’ activities on their Bloomberg terminals” (p. 109). In the US (and in most other countries), the SEC (Securities and Exchange Commission) mandates and itself produces a lot of information which, because of its sheer volume and legalistic jargon, becomes impossible to understand, even by the ones who have produced it, hence their growing reliance on financial information curated by algorithms. And just as with legal information, there is a danger that financial analysis becomes the echo chamber of finance and the world economy. According to Lamdan, “[…] the financial information products aren’t powerful because they tell the financial sector what to do, and the financial sector listens. They’re not just predictive, they’re self-fulfilling financial prophecies” (p. 96), something that is quite worrisome given that the global economy is now essentially driven by financiers’ herding behavior on the almost fully automated trading floor.

And all this is of course not cheap, meaning that only the most affluent traders (and their clients) can afford the information needed, while the lay trader is left to his or her intuition. The lack of regulation of these platforms is equally problematic: “There are regulations to prevent insider trading. But there are no rules that make it illegal for financial information companies to disclose information before it goes public. That means data companies can control the flow of financial information in ways that even the publicly traded companies themselves can’t” (p. 109).

What can we learn from this book?

While these three chapters on academic research, legal, and financial information are clearly the most novel contributions, the book also contains chapters about digitalized news and data brokering, as both RELX and Thomson Reuters are also active in the news information and data brokering markets. Actually, Reed – the “R” in RELX – started out as a newsprint company, and both Thomson and Reuters were originally news enterprises. With digitalization, both evolved from news companies into data companies. But there is more to it than digitalization; the rise of these companies is also the result of government retreat: “When government didn’t provide high-quality access to academic research, financial information, and law codification, the data companies figured out how to privatize and monetize public information resources” (p. 117), a process that is admittedly more advanced in the US than in Europe.

But government did not only retreat from its public service obligations during the glorious times of neo-liberalism, it also largely missed digitalization, maybe precisely because it was defunded during those times. This in turn opened the door to institutional data brokering, i.e., to companies such as RELX and Thomson Reuters that sell data back to government so that government can do its job, especially the job that involves people’s rights and privileges: “Companies like RELX and Thomson Reuters participate in peoples’ surveillance, arrest, and deportation. […] They help government, your employer, your landlords, your insurance companies and all sorts of other big decision makers to spy on you. […] They are ‘big brother’s little helper’” (p. 28). RELX sells data brokering products to more than 7,500 federal, state, and local government agencies, including 2,100 police departments and 955 sheriff’s departments, not to mention the Department of Defense, the Department of Justice, and the intelligence agencies (p. 31).

Written by a law professor, this book contains a lot of useful information and some quite original insights into an area of the digital world that is still largely unknown. In particular, it shows how two companies – RELX and Thomson Reuters – have become what are probably the two global giants of data analytics, “the informational equivalent of Nestlé, the giant food company that owns everything from Cheerios to Hot Pockets and Perrier” (p. 3). Combined, the markets in which these two companies operate “comprise much of the information that people need to make critical legal, financial and science-based decisions” (p. 3), at least in the United States. But while the author shows how these companies operate in and dominate the different areas, she still fails to convince the reader of how they can and do capitalize on the synergies they derive from operating in these diverse areas.

Another very important and novel insight the author provides pertains to the co-dependency of RELX and Thomson Reuters with the US government, something that is probably quite unique worldwide, but might have its equivalent in China, if only we knew more: “Federal, state and local agencies themselves rely on the data companies’ products. Without Thomson Reuters and RELX databases, the legal information that lawmakers, courts and agencies rely on would stop flowing. Government surveillance programs wouldn’t work. […] they also would lose valuable revenue they get from selling our data” (p. 22). In short, “the (US) government is a data company partner, not a data company regulator” (p. 22), something that probably also applies to other, better-known globally operating digital platforms, not least Facebook.

But when it comes to remedies (see the conclusion), the book is quite weak, not to say naïve, and in any case very American. Overall, the author argues that government – the US government, that is – should consider and regulate the data analytics markets and companies with the public interest in mind, treating them as “public data infrastructure” (p. 129). Government should stop falling for “tech exceptionalism” (p. 130) and treat data companies just as it treats brick-and-mortar public utilities working in the public interest. More concretely, these companies should be considered “information fiduciaries” (p. 141) with legally enforceable duties vis-à-vis the citizens. The essential information they have come to provide over time should be treated as a public resource and be unbundled from their business activities. Furthermore, “data analytics octopuses” (p. 132) such as RELX and Thomson Reuters should not be allowed to blur the lines between the different informational markets, such as financial, health care, legal and research information. “There should be protective barriers between research and data analytics”, for example (p. 132). “The companies that funnel our personal data to the police and the FBI should not be the same companies that we depend on for our critical informational resources” (p. 141).

This edition of the Digital Governance Book Review was authored by: Matthias Finger, C4DT

Image credit: Cover of Data Cartels. The Companies That Control and Monopolize our Information by Sarah Lamdan, published by SUP.