Most people look at artificial intelligence (AI) from a data-centric perspective. Tech giants often show their strength by exhibiting machines that challenge humans at games and defeat them. Technology thus seems powerful, even magical, because everything happens instantly. The trick is the infrastructure they rely on. From a legal standpoint, that infrastructure is both heavy (1) and soft (2). It is important to note that AI does not necessarily rely on the massive infrastructures often called Big Data, and Big Data can be used without AI, as in weather forecasting. This is why heavy and soft are treated separately in this blog post, even though it is possible to combine both to achieve better results.
1 Heavy
A heavy infrastructure is useful to manage the data load (1.1) and to analyse adjacent markets (1.2).
1.1 Managing the load
Any heavy infrastructure is designed to process large quantities of something. Cortés, Bonnaire et al. have highlighted several issues related to massive data processing. According to these authors, quantity may vary over time, and the infrastructure should therefore be able to cope with peaks in a reasonable time. Moreover, things processed in bulk are more heterogeneous.1 When one compares manual harvesting to mechanical harvesting, one understands that the two methods cannot achieve the same level of homogeneity. The same is true of data.
1.2 Analysing adjacent markets
Competitors have been analysing adjacent markets since before the introduction of AI. This is easily noticeable in anti-competitive behaviour. The OECD has paid great attention to anti-competitive behaviours that have an effect on an adjacent market.2 Let us take legal cases to see what adjacent may mean. The first case is known as Windows Media Player.
The Court would observe that several internal Microsoft documents in the file confirm that Microsoft made use, by leveraging, of its dominant position on the client PC operating systems market to strengthen its position on the work group server operating systems market.3
An excerpt of an e-mail written by one of the company's senior directors clearly indicates that it was trying to dominate the enterprise market first to dominate the Internet afterwards.4
When one looks at the audio player issue, one realises that the background of this matter goes back to the late 1990s.5 Let us now look at a recent case to observe an ongoing trend that has attracted greater attention since the release of ChatGPT.
The case known as Google Shopping is interesting: massive data processing combined with AI enables a company to put two markets next to each other by creating places and inciting a potential customer to move from one place to another.
In summary, in the contested decision, the Commission sought to demonstrate that Google was positioning and promoting its comparison shopping service on its general results pages more favourably than competing comparison shopping services (Section 7.2.1 of the contested decision); that significant traffic, in other words, a high number of visits, was essential for comparison shopping services (Section 7.2.2 of the contested decision); that Google’s conduct increased traffic to its comparison shopping service and decreased traffic to competing comparison shopping services (Section 7.2.3 of the contested decision); that traffic from Google’s general results pages accounted for a large proportion of the traffic of those competing comparison services and could not be effectively replaced by other sources of traffic (Section 7.2.4 of the contested decision); that the conduct at issue could result in Google’s dominant position being extended to markets other than the market on which that position was already held, namely the markets for specialised comparison shopping search services (Section 7.3.1 of the contested decision); that even if comparison shopping services were included in wider markets also encompassing the services of online sales platforms, the same anticompetitive effects would be felt in the segment of those markets covering comparison shopping services (Section 7.3.2 of the contested decision); and that that conduct also protected Google’s dominant position on the markets for general search services (Section 7.3.3 of the contested decision). In particular, the Commission drew attention to the harm that could be caused to consumers as a result of the situation. It contested the arguments put forward by Google in challenging that analysis, to the effect that the legal criteria used were wrong (Section 7.4 of the contested decision). 
The Commission also rejected the reasons put forward by Google to demonstrate that its conduct was not abusive (Section 7.5 of the contested decision), whereby Google claimed that it was objectively necessary or that any resulting restrictions of competition were offset by efficiency gains benefiting consumers.6
Data aggregation and cross-use make this strategy, often called envelopment, efficient. This is why several countries restrict advanced data use.7 These issues are related to data because Internet traffic is mainly a matter of data. During the 1990s, traffic data were essential to put road signs where needed while newly connected households were gazing around, waiting for pages to load. It made sense to build a heavy infrastructure, mainly dedicated to ranking, that could cope with the increasing number of webpages. It seems that today Tech giants no longer focus only on the volume and nature of the data processed.
Let us now see that an efficient infrastructure processing a large amount of data does not only have to be heavy; it also has to be soft.
2 Soft
A soft item can change its shape to envelop different things. Conceiving a soft infrastructure requires breaking things down (2.1) in order to disrupt any market (2.2).
2.1 Breaking things down
When it comes to AI, there is no such thing as a thing.8 How many teddy bears did you see before you started recognising teddy bears? A machine has to analyse more pictures than a human before it can recognise a new thing because it does not see anything; it merely calculates and processes data. A training phase enables a machine to compute a kind of pattern that matches the training examples and is therefore likely to produce good results in real life, since the training data come from real life. This is an issue that the trust and estate practitioner has to address, because digital assets may reduce digital autonomy, as seen in an earlier post.9 I fear that the trust and estate practitioner may become a compliance officer in charge of personal data and taxation regimes.
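The point that training merely computes a pattern, rather than seeing anything, can be illustrated with a deliberately tiny sketch. Everything below is invented for illustration: each "image" is just a short list of numbers, and the "pattern" is an average (a nearest-centroid classifier in miniature), not any real system.

```python
# Toy illustration: "training" is arithmetic over numbers, not seeing.
# Each "image" is a list of pixel intensities (invented data).
teddy_bears = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.8], [0.7, 0.8, 0.9]]
chairs      = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.2], [0.1, 0.1, 0.3]]

def centroid(examples):
    """Average each feature across examples: the 'pattern' the machine computes."""
    n = len(examples)
    return [sum(ex[i] for ex in examples) / n for i in range(len(examples[0]))]

def train(classes):
    """Training phase: compute one stored pattern per labelled class."""
    return {label: centroid(examples) for label, examples in classes.items()}

def classify(model, image):
    """Prediction is only a distance calculation to each stored pattern."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: dist(model[label], image))

model = train({"teddy bear": teddy_bears, "chair": chairs})
print(classify(model, [0.85, 0.8, 0.75]))  # a new, unseen "image"
```

The machine never encounters a teddy bear; it measures which average a new list of numbers sits closest to, which is why it needs far more examples than a human to become reliable.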
Once things have been broken down and are no longer related to any specific market or thing, the resulting data can be used to disrupt any market.
2.2 Disrupting any market
It has been seen that Alphabet could conquer the price-comparison market by matching relevant data from web search to direct traffic from one service to the other. This is a data-centric approach. Let us now examine ChatGPT, a very powerful toy, yet a toy that is pre-trained. It is powerful out of the box, i.e., without any further training. End users can obtain relevant results simply by supplying their own data to the machine. ChatGPT may produce very interesting results under human supervision and is more than a search engine.

Data aggregation is interesting in the context of AI training. Building tools upon ChatGPT does not require retraining it, and data aggregation therefore becomes less relevant. This issue is not entirely new, since techniques such as fingerprinting make tracing on the Internet less dependent on the storage of personal data. I wrote earlier this year that digital assets might reduce autonomy, unlike tangible ones.10 It seems to me, now that ChatGPT is becoming more and more popular, that pre-trained engines may simply be forked to match new needs. Forking is easier and cheaper than conceiving anything different. It does, however, increase technological dependence on the forked technology. The infrastructure that supports this technology and its distribution is more important than the data that it processes.

Should forking become common practice, competition on digital markets would be less dependent on data aggregation. This is a challenge as far as competition is concerned, because its regulation still follows a data-centric approach. Focusing on the infrastructure or organisation required to train or to distribute pre-trained machines is sensible because it reflects the difference between software development and AI. Once a machine has been trained, code and data matter less than the infrastructure or organisation that supports training. Data supplied to a pre-trained machine usually come from a customer, who may be the end user.
Focusing on a customer or an end user does not help to identify the source of an anticompetitive effect. A data-centric approach is thus less relevant in the context of pre-trained engines.
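The economics of forking sketched above can be made concrete with a toy model. Nothing here stands for any real system: `pretrained_sentiment` is an invented stand-in for an expensively trained upstream model, and the "fork" is a thin wrapper that adapts it to a new need without any retraining.

```python
# Toy sketch of forking a pre-trained component (all names invented).
# The expensive part -- training -- happened once, upstream.
def pretrained_sentiment(text):
    """Stands in for a large pre-trained model: a fixed, already-computed rule."""
    positive_words = ("good", "great")
    return "positive" if any(w in text.lower() for w in positive_words) else "negative"

def forked_sentiment(text):
    """A 'fork': cheap new behaviour layered on top, no retraining involved."""
    if not text.strip():
        return "neutral"               # behaviour added by the fork
    return pretrained_sentiment(text)  # everything else is inherited

print(forked_sentiment("a great result"))  # inherited from upstream
print(forked_sentiment(""))                # fork-specific behaviour
```

The fork costs a few lines, but every ordinary call still flows through `pretrained_sentiment`: the dependence on the forked technology, and on the infrastructure behind it, is built in rather than visible in the data exchanged.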
Let us recall that once a thing has been broken down to produce data, these pieces of data can be used in ways that have nothing to do with the original thing. They can be used to disrupt any market that, from the data perspective, resembles one that is well known. A firm can take advantage of this new perspective to conquer a market that once seemed very different.
In brief, greater attention should be paid to the organisation or infrastructure required to conceive, produce, and distribute AI services.
1. Cortés, Bonnaire, et al., « Stream Processing of Healthcare Sensor Data: Studying User Traces to Identify Challenges from a Big Data Perspective », Procedia Computer Science, vol. 52, 2015, p. 1007 and f., https://doi.org/10.1016/j.procs.2015.05.093.
2. OECD (2021), Ex ante regulation of digital markets, OECD Competition Committee Discussion Paper, §5.2.2, https://www.oecd.org/daf/competition/ex-ante-regulation-and-competition-in-digital-markets.htm.
3. EUCFI, Microsoft Corp v Commission of the European Communities, Case T-201/04, 17 September 2007, at 1347, https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:62004TJ0201.
4. Ibidem at 1348.
5. Ibidem at 1046.
6. EUCFI, Google LLC, formerly Google Inc, and Alphabet, Inc v European Commission, Case T-612/17, 10 November 2021, at 68, https://eur-lex.europa.eu/legal-content/en/TXT/?uri=CELEX%3A62017TJ0612.
7. OECD (2022), Analytical note on the G7 inventory of new rules for digital markets, §2.7, https://www.oecd.org/competition/analytical-note-on-the-g7-inventory-of-new-rules-for-digital-markets.pdf.
8. See Artificial intelligence and business models at 1.2.
9. See Sorting apples or turning homes into castles at 2; Vulnerability and data.
10. See Digital autonomy.
