Breakout group on openmod working paper

I’d like to run a breakout group to continue / revive the openmod working paper:

https://openmod-initiative.github.io/openmod-working-paper/

I will put together some notes on how I think we can most effectively get this going, and we can discuss here in advance. Stay tuned and feel free to add ideas/comments/expressions of interest here.

Sounds like a good plan! How is the paper to be differentiated from this one:

http://www.sciencedirect.com/science/article/pii/S0301421516306516

Not having seen this draft before, I read it with a fresh pair of eyes.

I am not sure who the intended audience is or what the ultimate objective is. So my comments may reflect my own misinterpretations in this regard.

The early part of the document seems to dwell rather heavily on problems associated with sourcing and cleaning data, whereas in practice a number of projects are now successfully tackling these issues. The document could be a bit more upbeat here.

Moreover, when discussing case studies, some additional energy database projects (perhaps OPSD) should be included.

The first half makes a number of more-or-less unsupported statements about open data and open-source development, while the second half discusses specific projects and their experiences. The two need to be better integrated, though I am not sure of the best way to tackle this.

A few comments:

  • there is a need to separate the concepts of data availability (a retrieval issue) and data use and reuse (a licensing issue)
  • the section on data processing should make reference to the need to establish community data standards and perhaps ultimately published norms
  • the distinction between optimization models and simulation models is too polarized: LP and MIP models contain elements of simulation, and simulation models can use optimization to resolve processes like market clearing
  • more focus on real-world examples may be needed early on, especially in relation to the open-source ideals of cooperation and community development
  • the discussion on mathematical programming is too technical and contributes little or nothing
  • the release date and details of the Open Government Licence (OGL) for the UKTM example are needed (I was not aware that UKTM has been published, and it is now 2017; correct me if I am wrong)
  • more analysis of real-world license choice issues would help: for instance, how has license choice affected the success of various projects? (I tend to the view that the licensing of energy models is not particularly significant; we are not talking about systems programming or library development here)
  • licensing issues could be accorded their own section
  • given that a fair chunk of model development is undertaken by PhD students, some discussion on multiply-authored code in this context is probably warranted
  • in the discussion, the list of benefits accruing from the UKTM being open-source needs to be backed up with secondary sources

And a few trivial points:

  • the “NaN” term is rather obscure; knowledge of floating-point arithmetic should not be a requirement for the reader
  • incomplete references make some parts hard to follow
  • the term “open code” might be better than “open source” in some places, including figure 1
  • I have no idea what this is getting at: “Reproducibility is not guaranteed due to possible changes in the back-end software.”
  • figure 2 should note that hybrid typologies exist
  • if you are going to discuss mathematical programming languages, you should mention GLPK MathProg
  • why do open-source platforms “regularly lack … sound support”? My experience is quite the opposite
  • I don’t agree with the statement “the most important point is that publishing something is better than publishing nothing”; the best open-source projects have a tremendous commitment to software quality (for just one example, the GLPK solver)
  • describing copyleft licenses as “viral” is somewhat pejorative and unnecessary
  • the GPL-licensing of GAMS code is described as “incompatible”; this needs a reference
  • I don’t get the point about the use of commercial solvers being problematic; surely the only issue is cost, as there should be no problem linking CPLEX to your own open-source C++ model (or am I missing something?)

To address Tom’s question about how this differs from Pfenninger et al. (2017): the statement offers a number of case studies that the paper does not. The statement is directed toward practitioners (I would guess), whereas the paper is more general. That said, there are some distinct similarities.

Another key difference is that the statement covers data and code licensing in substantially more detail. (I presume that is a central objective of the statement.) However, that treatment would benefit from being pulled out of the main thread and given dedicated sections on licensing and copyright for data and code. Database rights should also be mentioned. A paragraph on the licensing of documentation (under either a GNU documentation license or a Creative Commons license) would complete the discussion.

Finally, and without being too fat-headed, the two Wikipedia pages I started on open data and open code contain useful background and references which could contribute to this statement. Ditto for the openmod article. On that note, which license is going to be used for this document? I suppose CC BY 4.0 (like the openmod wiki), in which case material cannot simply be copied from Wikipedia.

Stefan suggested I post a further thought that I mentioned to him in passing.

The question of metered energy data being made available to the account holder might also warrant a mention (depending on where the article heads). Some jurisdictions do not mandate that electricity companies pass back the information that their smart meters collect. This is because (somewhat ridiculously) the household profile is considered proprietary information that might give an alternative supplier a head start if shared. This real-time metered data should of course be unencumbered. Home automation is beginning to gain traction and is a central component of the nascent smart grid concept.

Thanks for the extensive comments @robbie.morrison.

Differentiation from the paper posted by @tom_brown could come from a focus on the conceptual model of the “modeling cycle” and on the case studies, with the lessons learned from them.

I mentioned to Stefan a long while ago (probably at the London workshop) that I’d be happy to help out with this. We ended up discussing some of these questions yesterday, so I thought I’d note them here:

To expand on how to differentiate this… I see the Energy Policy paper (and Stefan’s more recent Nature comment) as rallying calls: stating why open modelling and data are important. They state why; perhaps this longer paper could be the bible on how to do it. I for one would value an easy-to-digest overview of the various considerations: what license to use, what the different licenses even mean, where you can host your code, repositories you can use to make your model more visible, etc.

I see the target audience as PhD students and other early-stage researchers (the ones with open hearts and minds) who are developing datasets and code, and want to know what to do with it to help benefit the wider community and gain some recognition. I think it is still by chance that many people find this community, and thus a resource on how to open up their models (at least, that was my experience). If we plant a big flag in the sand saying “this is what you could do to improve the situation” hopefully more will come.

Stefan and I thought about the paper structure as follows:

1 Introduction / Motivation would be a short summary of the ‘wonderful reasons for openness’ that are in the existing paper and comment

2 Modelling Process needs some consideration in light of Robbie’s comments

3 Key Choices in Going Open – I see this as the real value in this paper: combining the legal and practical knowledge of this group and distilling it in a way that others can easily understand and follow

4 Case Studies – this is merged in with 3 at the moment, but I think it should be separate and perhaps come earlier (as Robbie suggests). UKTM is perhaps not the greatest example; as far as I’m aware it is still not actually available outside UCL… Stefan and I wondered whether we should have a diverse range of models, both covering different paradigms and bringing in people from around the world. An initial suggestion would be case studies on Calliope, Temoa, OSeMOSYS, Renewables.ninja, and Open Power System Data…