AI and Copyright: MEITY Sub-Committee Report

In India, there has been significant discourse lately surrounding copyright concerns in the development of Generative AI models. The most recent contribution is the MEITY Sub-Committee’s Report on AI Governance in India, which declares that storing and copying works to create datasets for training foundation models constitutes infringement, and that such use is not protected under Section 52(1)(a)(i) of the Copyright Act.

While I have written extensively about these issues elsewhere, this piece focuses on what I believe is a fundamental misdirection in this debate on both sides: those claiming that use for training purposes is infringement, and those arguing that it constitutes “fair use.” Let us not even touch fair use. Training models using copyrighted works (including storing or making copies of them for training a model) does not infringe any exclusionary right provided under Section 14, period.

The MEITY Sub-Committee’s broad conclusion that models infringe copyright holders’ exclusive rights simply by storing and making training copies of publicly available copyrighted works is deeply problematic. This stance, if accepted, would fundamentally overturn our understanding of copyright law. Here’s why:

Consider the implications of this statement. If the mere act of making and storing a copy constitutes copyright infringement, wouldn’t you be liable for printing or saving an article from my blog to read later? Could I legitimately sue you for that? If you showed it to someone else or uploaded it on a public drive, then maybe, but otherwise, could I?

The essence of copyright, whether it is reproduction, distribution, performance, or any other right, lies in the exclusive ability to express one’s original expression, which translates into a right to stop someone else from expressing that original expression. It is crucial to understand that expression is fundamentally a relational concept involving two human beings: the human “expressor” and the human “consumer” of that expression. In respect of publicly available works, copyright claims are available under law only if one has taken the place of the expressor (by becoming the expressor of someone else’s original expression), not if one is a mere consumer of the expression. This relation does not exist in AI training, which merely involves consumption of the original expression by the model so that it can learn and train itself.

What’s missing from the current debate is a crucial understanding: copyright protects against unauthorized sharing of my work with others, which could deprive me of the credit or economic compensation I could have received by sharing it with them myself. In simpler terms, while I cannot express your original expression without your permission, I can certainly consume your publicly available original expression without it (possibly, though not certainly, barring paywall circumvention, which isn’t part of the current debate). The law focuses on unauthorized expression of publicly available original content, not its unauthorized consumption, since making content public already waives any claim over its consumption.

This is why I struggle to understand how storing or copying for purposes that do not involve sharing or expressing the original expression, or a substantial part of it, with third parties (what academics often call non-expressive, consumptive copying) could be considered infringement at all. This question needs to be addressed before we even enter the fair use debate, which becomes relevant only after prima facie infringement has been established. If such copying were illegal, simply printing publicly available web pages for one’s own learning or consumption would constitute copyright infringement. If I store content for learning, which I might later use to produce a potentially competing article, is that infringement? By this logic, academia (a commercial enterprise), which more often than not requires storing and printing publicly available articles to learn the ideas embedded within them, would amount to an enterprise built on copyright infringement. Fortunately (and thank god for that!), that is not the case.

Developers of models aren’t exposing any humans to the expression of the inputted works—they’re creating an alternate expression. If this alternate expression substantially resembles the original expression used for learning, that will indeed constitute infringement, but that’s fundamentally different from claiming that storing and copying for model training purposes is inherently infringing.

In short: (i) no, copyright is not the answer to your existential crises, and (ii) this is a “scope of rights” issue, not one concerning a backend defense of fair use.

The sooner we understand this and get over copyright, the sooner we will look for other arenas that actually resolve the existential concerns.

I welcome your thoughts on this perspective.

Indian Copyright Law and Generative AI: Part 2- Transformative and Extractive Use

Co-Authored with Sneha Jain

Having first considered whether storing copyrightable works for training purposes is reproduction amounting to copyright infringement under Section 51 of the Indian Copyright Act, 1957, in this second post of the series we look specifically at transformative and extractive uses, the applicability of exceptions and limitations under Indian copyright law, and the implications of anti-circumvention law.

Transformative Use

India does not recognize a transformative use exception to copyright infringement within the parameters of Section 52 of the Copyright Act. However, the Division Bench of the Delhi High Court in University of Cambridge v. BD Bhandari [2011 SCC OnLine Del 3216][i] held that using a work to make a guidebook served a substantially different purpose from the purpose for which the Plaintiff’s original work was made. The Court recognised this to be a transformative purpose, which did not impinge upon the expressive purpose in respect of which the Plaintiff held an exclusive reproduction right. The scope of the reproduction right was thus, arguably, restricted by the Court to the expressive purpose for which the original work was curated.

Can a similar analogy be extended to use for training GenAI models, where GenAI developers argue that not a single human being is exposed to the expressive content of the work? Not even the Large Language Model (LLM) reads or experiences the work in its expressive sense; storage of a single copy merely enables the foundation model to discern, among other things, the “structure, syntax, and semantics of language,” including “grammar, sentence construction, and how words and phrases are related to each other,” in order to facilitate the generation of “coherent and contextually appropriate output”[ii].

Unlike the United States, where the statute itself draws the contrast (the Copyright Act, 1976 both provides for transformed forms of works to be protectable derivative works and exempts fair transformative use from infringement), the Indian statute is not clear on whether use of a work for an expressively different purpose, or indeed for a non-expressive purpose, falls within the domain of the creator’s market. The Division Bench of the Delhi High Court in University of Cambridge (supra) recognised that if the use of a work is of a “transformative character,” i.e., the purpose served by the use is different from the purpose for which the work was made, this operates as a limitation on copyright protection or its subject matter. The Court also held guidebooks to be transformed works that did not amount to reproduction of the original. The Division Bench of the Calcutta High Court in Barbara Taylor Bradford v. Sahara Media Entertainment [2004 ILR (1) Cal 15] has also recognised that where a work is taken and then used to produce a subsequent work so changed and mutated as to be transformed into a different work altogether, the owner has no actionable claim.

This line of decisions raises an important question. Is use for training purposes, which enables the GenAI model to produce accurate responses to user queries, part of the expressive purpose for which the work was originally created? Or is it a transformed purpose that lies beyond the circumscribed domain of exclusionary rights granted to the copyright owner? Where a work is primarily expressive and meant to be consumed expressively, is its use for non-expressive training infringing? Answering this requires an analysis of what really comprises the subject matter of protection for the owner (their primary and secondary markets), and how much of it is linked directly to the purpose for which the work was created: an expressive purpose or a training purpose? In other words, does use of a copyrighted work for a non-expressive, non-consumptive purpose amount to copyright infringement, or is it a distinct and transformative purpose outside copyright’s boundaries or scope of protection?

Extractive Use

A distinct question concerns the use and copying of even protected material for the purpose of extracting unprotectable elements that could not otherwise be extracted. The affirmative essence of such use is to extract from copyrighted works elements that are not the subject matter of copyright protection.

In Akuate Internet Services Pvt. Ltd. v. Star India Pvt. Ltd [2013 SCC OnLine Del 3344][iii], the Division Bench of the Delhi High Court has recognised that copyright’s balance is maintained by ensuring that information, facts and knowledge embedded within expression cannot be monopolized using Copyright law. The Court has further held that protection cannot be extended to information and facts embedded in protectable works, even under the premise of unfair competition. Extending the same would inevitably restrict the ability to extract and disseminate information which is a critical component of Article 19(1)(a) of the Constitution of India. Thus, Indian Copyright jurisprudence clearly recognizes that information embedded within expression is not protectable and no monopoly can be extended in respect thereof. The said rationale of balancing copyright protection with access to unprotected information for the purposes of furthering expressive and speech values has also been recognised by the Division Bench of the Delhi High Court in Wiley Eastern Ltd. v. Indian Institute of Management [61(1996)DLT 281].

This is reinforced by the idea-expression dichotomy, which is widely accepted in Indian copyright jurisprudence. Useful information contained in an expressive work is not protected; only the form in which that information is contained or presented is protectable expression for the purposes of copyright law. This is in line with the fundamental purpose of copyright law, which is to reward and incentivize (or enable) the production of creative expressive forms that disseminate useful information. As Prof. Molly S. Van Houweling recognizes, this is not because information and facts are not valuable enough to justify copyright, but rather because they are so valuable that they belong in the public domain, for everyone to access.[iv]

For instance, where a poem expresses certain thoughts, copyright in the poem gives no monopoly over the ideas or facts expressed by its words, but only over the arrangement of the words used to express those thoughts. Others have a right to discern and exploit the information within, provided they do not substantially reproduce, adapt, or communicate to the public the concrete form in which the ideas have been arranged or put into shape.

The basic rationale for protecting uses of copyrighted expression that do not reproduce the expression or expressive form, but merely extract the ideas or unprotectable elements embedded within it, flows from this idea-expression dichotomy. For extraction, however, it is arguably necessary, and may be essential, to access the whole copyrighted expression and even store it, without exposing it in its expressive form to a single human being, which is exactly what GenAI systems often do. Without such access to the complete work, extraction of embedded information becomes impossible, inevitably extending copyright protection to unprotectable elements. That, of course, is not a desired outcome of copyright policy. In other words, copyright does not give a “right to control access” to a work so as to prevent the extraction of unprotectable elements (anti-circumvention provisions do, and are dealt with below). It merely gives the right to exclude reproduction, adaptation, or communication of the expressive form of the work. (No wonder Section 14 of the Copyright Act does not include a “right to control access” among its sub-provisions.)

Even well-recognised doctrinal principles such as the merger and scènes à faire doctrines in copyright law provide scope for extractive uses of seemingly expressive elements. These doctrines recognize that unprotectable ideas, facts, stock characters, incidents, images and themes sometimes do not lend themselves to a wide variety of expression. They therefore deny protection to seemingly expressive elements that represent one of only a few limited ways of expressing certain ideas. If these seemingly expressive elements, which have merged inseparably with the ideas they express, could not be extracted and used, the idea-expression dichotomy and the merger doctrine would be rendered illusory.

The analysis may thus focus on the nature of the expression used and the purpose of storing that seemingly expressive (but merged) expression: is it to extract informational content from it, or to reproduce it expressively? Often, we will realize that without accessing, copying and using the entire protected expressive form, extracting unprotectable ideas from such expressions would be impossible.

Codified Exceptions and Limitations

Under Section 52 of the Indian Copyright Act, fair dealing for the purposes of private or personal use, including research, is permissible. An important question that Courts will have to grapple with, as they deal with the extension of legal personality to Artificial Intelligence technologies (separate article soon!), is whether use by AI systems for training, so that their models can learn, amounts to private or personal use that does not expose the expression to a single human being apart from the AI system. Whether private use by a corporate entity like OpenAI for its own learning and development (for its models) is permissible, even if that learning leads to a competitive product, will also have to be examined. Would the defense of private or personal use under Section 52(1)(a)(i) of the Copyright Act extend only to humans, or also to corporates and other juristic persons?

On the side of research use, it is arguable that use for the purpose of extracting information embedded in expressions, without exposing a single individual to the expression, could amount to research use protected under Section 52(1)(a)(i) of the Copyright Act. Importantly, the Explanation to Section 52(1)(a) also provides that storage for fair dealing for private or personal use, including research, is not infringing.

These back-end questions, however, will arise only if Courts first hold such storage and use for training purposes to fall within the subject matter of protection under Section 14 of the Copyright Act.

Anti-Circumvention and the Training stage (Para-copyright right to “control access”)

Anti-circumvention provisions in copyright laws are essentially meant to prevent unauthorized access to copyrighted works that are safeguarded in the digital realm by measures such as, inter alia, paywalls. In the United States, the New York Times, in its complaint against OpenAI, has alleged that OpenAI trained its model by circumventing paywalls and accessing, without authorization, copyright-protected articles that sit behind technological protection measures. In substance, the allegation is that OpenAI circumvented security measures put in place to prevent access, for the purpose of training its model. Would a similar act be actionable under Indian copyright law?

Section 65A(1) of the Copyright Act forbids the circumvention of a technological protection measure. It is the only provision that controls “access” to copyrighted digital works, and is a para-copyright measure ensuring that even unauthorized access is actionable. Importantly, however, Section 65A(2) specifically provides that technological protection measures can be circumvented for purposes that are legal, or not expressly prohibited by the Act. This provision was inserted specifically keeping in mind the importance of access for permitted purposes. The Parliamentary Standing Committee that examined the Copyright (Amendment) Bill, 2010, which became the Copyright (Amendment) Act, 2012, specifically noted that without a provision allowing circumvention of technological protection measures for purposes permitted under the Act, access to works for such purposes would be impossible, and the exceptions and limitations in the Copyright Act would be rendered redundant: “In the absence of the owner of the works providing key to enjoy fair use, the only option was to circumvent the technology to enjoy fair use of works.”[v]

Thus, if Courts find use for training purposes transformative, extractive or outside the subject matter of protection, or for that matter, permitted under Section 52 of the Copyright Act, circumventing technological protection measures to enable extraction would be permissible under the Copyright Act.

Section 65A(2), however, comes with a condition: every person facilitating the circumvention of a technological protection measure (the “hacker”) must maintain a complete record of the name, address and all relevant particulars of the person for whom it is facilitated (the “fair dealer/user”), as well as the purpose for which it has been facilitated. So long as the hacker maintains this record, Section 65A(2) allows circumvention of technological protection measures. Importantly, this also ensures that a record is kept of every protected work accessed for training purposes, which can technologically facilitate attribution, a desirable goal of copyright policy.

In the next part of this series, we will move from the training stage to the output stage, to analyze whether outputs produced by GenAI systems violate the owner’s reproduction or adaptation/derivative rights.


[i] Special Leave Petition before the Supreme Court bearing – SLP(C) No. 029951 / 2011, dismissed vide order dated 27th January 2016

[ii] Understanding Generative AI and its relationship to Copyright, Written Testimony of Christopher Callison-Burch before the U.S. House of Representatives Judiciary Committee Subcommittee on Courts, Intellectual Property, and the Internet Hearing on Artificial Intelligence and Intellectual Property: Part I - Interoperability of AI and Copyright Law, available at <https://docs.house.gov/meetings/JU/JU03/20230517/115951/HHRG-118-JU03-Wstate-Callison-BurchC-20230517.pdf>

[iii] SLP(C) No. 029629 / 2013 pending before the Supreme Court.

[iv] Molly S. Van Houweling, The Freedom to Extract in Copyright Law, (unpublished draft on file with the author)

[v] Standing Committee Report on the Copyright Amendment Bill 2010, available at https://prsindia.org/billtrack/the-copyright-amendment-bill-2010

Copyright Infringement is not a Cognizable and Non-bailable offense. It can never be. Period!

Note: This post, for a change, is in the context of a judgment rendered by the Karnataka High Court in ANI Technologies Private Limited v. State of Karnataka, holding copyright infringement to be a cognizable and non-bailable offense under Indian law. However, this post is much more than a legal comment on the interpretation adopted by the High Court. Read on to find out more:

In a recent precedent from the Karnataka High Court, copyright infringement involving an element of mens rea and falling within the contours of Section 63 of the Indian Copyright Act, 1957 has been held, once again, to be a cognizable and non-bailable offense. Legally, this judgment runs in the teeth of the Delhi High Court’s ruling (per J. Bakhru) in Anurag Sanghi v. State, which clearly held that these offenses have to be treated as non-cognizable and bailable. Even if one disagrees, that conclusion is compelled by the binding Supreme Court precedent in Avinash Bhosale v. Union of India (2007) 14 SCC 325, which held that “up to 3 years”, because it includes offenses punishable for less than 3 years, has to be read as falling in Item III of Part II of the First Schedule to the CrPC. Any other interpretation would render non-bailable even an offense for which the prescribed punishment is less than 3 years, which the stipulation in the Schedule cannot permit. The Karnataka HC has ignored this precedent, and its application by the Delhi HC in both Anurag Sanghi (supra) and GNCTD v. Naresh Kumar Garg, rendering the decision per incuriam, i.e., in ignorance of binding law. In any case, raising the offense of copyright infringement to the level of a non-bailable offense, in spite of other offenses in Part I of the First Schedule to the CrPC that are punishable with 3 years (Sl. Nos. 181, 193) being bailable, runs in the teeth of the rationale adopted by the Karnataka HC. Further, in case of ambiguity in the statute (the Copyright Act does not specifically state whether the offense is cognizable, and provides punishment of up to 3 years, which is something of a “no-man’s land” and a cause of confusion), the rule of lenity requires an interpretation in favour of the accused. Therefore, legally speaking, the judgment is clearly flawed.

However, this post is not about that.

This post is about how this judgment is extremely unmindful and ignorant of the scheme of users’ rights and limitations to copyright present within the Copyright Act itself. It is about reiterating that copyright is not a natural monopoly, but rather a carefully constrained legal monopoly, which is neither unconditional nor a ground to curb the liberty of citizens.

Judgments holding copyright infringement (whether or not conclusively determined by a Court of law) to be a cognizable and non-bailable offense under Section 63 of the Act have created considerable uncertainty for those who seek to use copyrighted works in ways that are protected, recognized as limitations to copyright, and fundamental to speech. They deter such users from engaging in permitted speech, in effect producing a chilling effect on culture. After all, the police, while arresting (and curbing liberties), cannot be expected to figure out what is permitted under Section 52 of the Act and what is not, right? How are the police to determine whether borderline uses or dealings fall within the limitations to infringement, or constitute infringement? Given that Courts have held that even commercial uses can qualify as fair use (Super Cassettes v. Hamar Television), there is no reasonable way for the police to determine, even prima facie, whether the alleged offense is statutorily protected or not, a protection that has its genesis in other fundamental constitutional obligations (reiterated in Wiley v. IIM).

I ask myself (as is colloquially said in courts!): can the liberty of an individual practicing legitimate speech be statutorily curbed despite there being a chance that the speech is protected under the very same statute? What if the unlicensed use of copyrighted content by Ola Cabs in the case before the Karnataka HC fell within the domain of Section 52 of the Act? Could the police determine that? Would arresting an individual, despite the possibility that the use falls within the contours of Section 52 (a question for the Court to determine), be justifiable in any case?

In effect, in a country like India, with an indigenous culture that is primarily derivative and dependent on existing inputs to develop further cultural outputs, both as a mode of learning and as a mode of cultural practice (be it with musical works, the guru-shishya parampara, or many other forms where transformativeness and derivative usage are at the core of cultural performance, much like sampling in hip-hop culture), and even more so in the case of religious cultural outputs (which, shockingly, are in fact protectable), people who engage in such practices may be arrested, their liberty given less weight than overarching proprietary claims that often rest with corporate entities which did not put in any “skill and judgment” to deserve such statutory incentives.

Could this ever be the intent of the law?

It is extremely important for Courts in India to recognize the “EQUAL EXISTENCE” of Section 52 in the very same Copyright Act. Yes, the same Act provides for certain dealings and uses of works to be exempt from being termed infringing or, as the Supreme Court of India has held, not to be reproductions that are infringing for the purposes of Section 51 of the Act. Unless and until a Court of law or judicial authority clearly concludes, at least on a prima facie basis, that the allegedly infringing use or dealing does not fall within one of the limitations, arresting anybody, or a police officer (not a judicial authority) taking cognizance on the basis of an FIR, would be completely contrary to the very purpose of Section 52 within the scheme of the Act. It is also important for Courts to realize that copyright is a statutory monopoly conferred on someone who invests their skill and judgment. As the Supreme Court has held, such a monopoly runs against the general course of our constitutional schema, which discourages monopolies. To regulate this monopoly and “create a balance”, regulatory safeguards in the form of limitations have been prescribed under Section 52.

Merely because companies that now own these copyrights (through assignments or employment contracts) suffer an economic loss, possible infringements [without a clear legal determination by a judicial court of law (at least a magistrate)] cannot ever be a ground to deny people their liberty and protection for permitted speech, especially when penal consequences remain possible without any need to curb personal liberty. Moreover, it is important to realize that companies which bank on copyright transfers as the genesis of their business models are, in fact, legally de-risked and placed on a higher pedestal than any other business dealing in any other normal (or essential) commodity. This is merely because of their “investment” in products of skill and judgment, not their creation of them. Therefore, for them to cry out loud at every ‘possible’ infringement, and to ask courts to arrest individuals for acts which may or may not be infringement (until determined by courts), can never be desirable policy.

Section 52 is an equal right or freedom, and not something to be overlooked. In fact, it is the section that renders the Copyright Act constitutional and saves it from vulnerability to unconstitutionality under the Indian constitutional schema. Courts ought to be mindful of this while dealing with cases of alleged infringement under the Copyright Act, 1957.