Publishing Aggregated Information from Scientific Papers


#1

Dear all,
I have a more general question about publishing (open) data, but I feel that this category fits. Feel free to redirect me, though.

I’m involved in a project that has collected data for use in heat modelling. A lot of this information came from scientific papers. Single values and tables were taken from a range of papers and rearranged into a new set of tables with a distinct structure. The origin was recorded with great care (as it should be in any scientific research), and the source of every item is given in an extra column.

My question is the following: are we even allowed to publish this data?

Technically, what we created is a kind of remix of snippets of information from different papers, which is what science does. Were we to write a new scientific paper, say a meta-study, it would be correct and good practice to cite all the sources and go ahead with publishing it through a peer-reviewed process. It meets the required level of scientific originality and creates overall value.

Copyright law appears to stand in our way, however. From what I understand, the arrangement of information in a table is protected by copyright. If you take a significant part of any given table and reprint it in another context without asking for permission in advance, that is copyright infringement.

This leaves us in a kind of legal limbo. If we wanted to publish our data, we could probably do so with some scientific publisher. Most of the data comes from other papers, which in turn took numbers (partly) from yet more papers. It is common practice.

Publishing the same data in a paper under an open license is somewhat of a legal gray area, because (low likelihood aside) we would be risking lawsuits from the publishers of the original information wherever the source was a non-open publication. Publishing the data itself in a database is likely to be even riskier. Finally, publishing and licensing everything under an open license (our ideal scenario) would put us in a pretty dark legal gray zone, as far as we can tell.

This problem presumably affects most scientific research that publishes open data, so I’d like to hear how all of you are dealing with it. Thank you in advance for any thoughts.


#2

Hi @christian.hofmann, sorry for taking a while to reply. Here are a few thoughts. I presume you are interested in German law. Let’s briefly look at the general case and then concentrate on non‑commercial scientific research. As indicated later, I will return to the general case in a future posting because it allows for open licensing of content.

Release 02
Changes: improved material on payments, added expiry date

General use cases

This section applies if your work falls outside the exemptions provided under the UrhG and UrhWissG (described later) and/or if you want to attach an open license.

Wikipedia is generally highly averse to the risk of copyright violation, so a hunt through its contributor guidelines might yield something useful. Wikipedia EN operates under United States law, specifically 17 USC (US Government 2016) regarding copyright. I guess Wikipedia DE operates under German law, specifically the German Copyright Act or Urheberrechtsgesetz (UrhG) (Juris 2018).

To be developed in a future posting.

Non‑commercial scientific research use cases

Copyright law in Germany offers a set of statutory exemptions, several of which cover scientific research. In contrast, the United States provides a number of positive fair use defenses, which have been progressively refined by case law.

The German Urheberrechtsgesetz or UrhG is somewhat different from copyright law in other countries in that authors are accorded more protection and exposed to fewer exceptions. That used to be set against a higher threshold for creativity, but international harmonization has meant that the hurdles for attracting copyright in Germany have dropped substantially (Davidson 2008). Limits and exemptions to the UrhG are summarized on Wikipedia DE.

The UrhG was recently amended to cover text and data mining under §60d. Your use case does not fall under this provision. Rather, that section covers the storage (but not the republication) of large text and data corpora, typically harvested by scraping the web.

There is also a new act: the Urheberrechts-Wissensgesellschafts-Gesetz (UrhWissG) or Act to Align Copyright Law with the Current Demands of the Knowledge‑based Society (Bundestag 2017) which entered into force on 1 March 2018. The UrhWissG reforms the terms‑of‑use of copyright protected works in the fields of research and education within Germany. There is only a small stub on this legislation on Wikipedia DE at the moment.

The UrhWissG amends the UrhG and other statutes. Most measures expire on 1 March 2023 unless extended by parliament following a mandatory review.

Regarding the UrhWissG, Stary (2018) reports on a legal blog (emphasis added):

Thus, up to 15 percent of a work may now be reproduced, distributed and made available to the public for the purpose of non‑commercial scientific research (UrhG §60c) and illustration for teaching in educational establishments (§60a). In addition, for personal scientific research, the part that may be reproduced amounts to 75 percent. Moreover, restrictive explicit references to means of communication, such as ‘by mail or fax transmission’ or ‘exclusively as graphical data’ have been removed.

Another novelty is the permission to use illustrations, individual contributions, other small‑scale works and out‑of‑print works to their full extent for the purpose of teaching and scientific research.

The phrase “reproduced, distributed and made available to the public” is extremely significant. It covers all use cases related to republication, and prior consent is not required. The more generous allowance for “personal” scientific research relative to the previous UrhG is also significant (I never did understand what “personal” was getting at; perhaps individual professorial chairs?).

Note that “non‑commercial scientific research” includes research supported by public funding agencies but excludes university work commissioned by third parties who could benefit (this matter will be covered in a BMWi‑sponsored report on energy data scheduled for release at year end).

The “out‑of‑print works” provision goes far beyond that currently provided for orphan works more generally.

Exemptions for quoting now apply to illustrations as well (Pachali 2018).

It is possible that an “isolated article” from a “professional or scientific journal” might also qualify as a “small‑scale” work here; it does, at least, under UrhG §60c.2.

Because no license, recognizable or otherwise, is associated with this process, it is up to the recipient of such material to know which exemptions apply and how they may exploit them. This is poor signaling in my view (who had even heard of the UrhWissG until now?).

Licensing

The question of additional licensing but not re‑licensing is covered in the UrhWissG (Stary 2018) (emphasis added):

Concerning the relationship between statutorily permitted use and licensing agreements, the new law provides for the predominance of the statutorily permitted use, as long as a licensing agreement on the use of the protected work has not been concluded. This means in practice that licensing agreements can be concluded for every use exceeding the permitted percentage of a work to be used without prior consent of its author. Where a licensing agreement has been concluded, the remuneration is paid in accordance with the licensing agreement; the remuneration system provided for in the law does not apply. At the same time, the new law does not preclude the possibility for authors to allow the cost‑free use of their work via open access.

Note that the phrase “open access” spans a range of contexts, with cost‑free downloads at one end of the spectrum and genuine open content at the other.

Open licensing

The question of whether one can add a CC‑BY‑4.0 license to works covered by UrhG/UrhWissG exemptions — or any derivatives thereof you might create — upon distribution remains unexamined (as best I can tell). A CC‑BY‑4.0 license of course adds attribution conditions that fall within the norms for science in any case, whilst noting that academic crediting and legal attribution are not entirely equivalent (Fisk 2006).

But there is surely another problem in attaching a CC‑BY‑4.0 license to original or derived works covered under an UrhG/UrhWissG exemption for science: a CC‑BY‑4.0 license removes the restriction to non‑commercial scientific research and makes all domains of application acceptable. The Open Definition (Open Knowledge International, no date), to which both CC‑BY‑4.0 and the CC0‑1.0 waiver conform, requires that:

§2.1.8 Application to any purpose

The license must allow use, redistribution, modification, and compilation for any purpose. The license must not restrict anyone from making use of the work in a specific field of endeavor.

Therefore, to openly license a derivative work (as the original question proposed), one will need to argue that the facts and data extracted from the scientific record are not protected by copyright, where they are not already suitably licensed.

Payment

There is another potential catch too (Stary 2018) (emphasis added):

The various permitted uses of works without the prior consent of their author finally all give rise to an equitable remuneration in an amount to be determined on a flat‑rate basis or via a usage‑related calculation, based on a representative sample of usage. The adequate remuneration has to be collected by a collecting society.

Collecting societies, such as VG Wort, will continue to collect from universities using flat‑rate formulas and sampled estimates of usage, with the details to be agreed between the education ministry and the academic publishing houses.

Conclusion

My assessment is that one can republish as follows:

  • Publish a paper in a scientific journal containing material presumed to be under copyright, relying on the statutory exemptions provided by the UrhG/UrhWissG. In that case, the previous authors retain copyright to their contributions and you gain copyright to yours. Science can proceed but openness cannot.
  • Establish that the extracted data does not attract copyright. In that case, you are free to include it in your own work as you see fit and to publish under the conditions that best suit your purpose, including a CC‑BY‑4.0 license or a CC0‑1.0 waiver.

I’ll return to this second point in a follow‑up posting. It involves looking at the general use case in more detail. There does not seem to be much legal analysis or case law on this, though. The treatment of numerical data in intellectual property jurisprudence is notable mostly for its absence.

Closure

I recommend reading Stary (2018) in full. There is material on teaching and on institutional libraries that is not directly relevant to the original question but nonetheless useful.

The best solution (indeed the only viable solution if public transparency is also a criterion) is for science to move to CC‑BY‑4.0 licensing if legal attribution is required and to CC0‑1.0 otherwise. Normative scientific crediting can apply irrespective of the license type.

The underlying problem with the UrhG and the UrhWissG — and also the EU database directive 96/9/EC — is that use cases considered in these statutes and the realities encountered in the wild are utterly divergent. Moreover, legislators don’t seem to be able to bring themselves to understand the characteristics, applications, and advantages of genuine open licensing, both for science and elsewhere.

Comments and corrections welcome. HTH, R.

References

Bundestag (30 July 2017). Gesetz zur Angleichung des Urheberrechts an die aktuellen Erfordernisse der Wissensgesellschaft (Urheberrechts-Wissensgesellschafts-Gesetz — UrhWissG) [Act to align copyright law with the current demands of the knowledge‑based society (Urheberrechts-Wissensgesellschafts-Gesetz — UrhWissG)] (in German). Köln, Germany: Bundesanzeiger Verlag. ISSN 0720-2946. No official translation into English.

Davidson, Mark J (January 2008). The legal protection of databases. Cambridge, United Kingdom: Cambridge University Press. ISBN 978-0-521-04945-0. Paperback edition.

Fisk, Catherine L (2006). “Credit where it’s due: the law and norms of attribution”. Georgetown Law Journal. 95: 49–117. ISSN 0016-8092.

Juris (2018). Act on Copyright and Related Rights (Urheberrechtsgesetz, UrhG) — Amendments to 1 September 2017 — Official translation. Saarbrücken, Germany: Juris.

Open Knowledge International (no date). Open Definition 2.1 — Defining open in open data, open content and open knowledge. Open Knowledge International (OKI). Cambridge, United Kingdom.

Pachali, David (1 March 2018). Neues Urheberrecht für Bildung und Wissenschaft: Das gilt ab dem 1. März [New copyright for education and science: this applies from 1 March] (in German). iRights — Kreativität und Urheberrecht in der digitalen Welt. Berlin, Germany.

Stary, Catherine (15 January 2018). German reform on the use of copyright protected works in the fields of education and research will come into force soon. Kluwer Copyright Blog. Alphen aan den Rijn, the Netherlands.

US Government (December 2016). Copyright Law of the United States and Related Laws Contained in Title 17 of the United States Code. Washington DC, USA: United States Copyright Office.


#3

Wow. Thank you @robbie.morrison for this comprehensive yet specific response, complete with background information and a clear display of the imponderables, and topped off with sources and a conclusion. This is really helpful. We’ll take some time to let it sink in.