Pioneering the Linked Open Research Cloud

License
CC BY 4.0

Summary

The first Pioneering the Linked Open Research Cloud (Decentralisation Scholarly Communication) session took place at the Linked Data on the Web workshop at WWW2017, on 3rd April 2017, from 15:30 to 16:30.

Participants

Speakers of the minutes:

  • RV: Ruben Verborgh
  • JD: John Domingue
  • SC: Sarven Capadisli (moderator)
  • BR: Blake Regalia
  • CD: Christophe Debruyne
  • IH: Ivan Herman
  • JL: Jens Lehmann
  • AG: Amy Guy (scribe)
  • DS: Daniel Schwabe
  • A: Armin Haller
  • SD: Stefan Dietze
  • SN: Sebastian Neumaier

Complete transcript of talks and discussions

RV
The magic word is "incentives"
  • incentives for the community
    • practical applications of Linked Data do not get academic visibility/credit
  • individual incentives
    • “people do it just because they are the kind of people who want to do the right thing, without immediate reward” ~ timbl's TED talk
    • for RV personally: making this data visible helped (cfr. his presentation)
JD
Two strong things happening. Career progression, and funding bodies that give away money. REF in UK determines the quality of outputs of departments at universities, which ultimately is turned into money by the govt. The cost of the last REF was 1/4 billion GBP. Panel members reading 1500 papers each. How does the technology we have relate to and support career progression? And can we really lower the process cost in the evaluation? If you're responsible for evaluating a department, because of the sheer volume, you use the publication venue as a proxy for quality. That lowers the process cost. Just supporting sparql queries is a long way from someone getting promoted. Maybe more people would have a higher citation count from the LOD cloud, but until that shows...
SC
we're given a package by the publisher of how to submit research. Still we want incentives and quality. Just because we want self publishing doesn't mean we want to give up quality.
JL
if the publishers would agree to have submissions in formats which allow annotations we would like to have and we could integrate them into the process, we wouldn't need to overhaul the publishing model. Can we ease into this with other technology, eg. semantic annotations in PDF?
SC
This community is ahead of the curve, it's not like we don't have expertise or tooling. Just because we can, but there's also nobody asking for it. You're waiting for your publication to actually appear in some publication, why bother? For this event we said you can still do PDF/latex but you can also do HTML/RDFa. The end result is that the proceedings are at CEUR. It's not like there is any dependency on a publisher to have it in a certain format. So why did we have 10 submissions made in PDF and 1 in HTML/linked data friendly format? Even though it was open.
A
Agree we need incentives. It's not just about people like us who like it. Search engines are doing a great job at improving search results for schema.org.. we need to convince the major publishers that we get a benefit from embedding RDFa. Better visibility, quicker resolution of entities. The two main publishers are the ones who provide incentives.
A
We produce pdfs because they ask for it. Producing HTML/RDFa we don't get anything in return. But if publishers made use of structured data, your dblp entry could be resolved better or something, that would be an incentive. Publishers need to resolve and use this data correctly.
RV
I also have doubts about the perceived quality that institutions put into their publishers. They shouldn't have all the trust. There's some correlation with quality but it isn't always the case. Putting too much weight into the publishers is too much, it's definitely a complex problem with different angles. Springer does a great job of generating HTML, but first they need a PDF. Incentives for them.. what's their incentive to do it? Springer has the scigraph thingy now. They say if you publish you get more value, they're getting the idea, but it took a lot of internal pressure to get to there. If we only have to wait for the publishers, we can wait forever. We need to attack on multiple fronts.
??
We deal with raw specimen collections, instruments, and we have a registry of datasets and reports - research assets of the organisation. We see the value of having this as LD with persistent identifiers. We are using it for physical objects. The discovery of research assets in the organisation. That's how we see the value.
SC
What are the incentives for any of these bubbles to get their stuff out there as LOD? A lot of funding went into the LOD Cloud... Some of them due to open data initiatives in some countries, health data, whatever. We can agree that there needs to be some incentives. Whether those subject to particular group or company to provide. Who do we wait for? There's a question of having it accessible. We're often subject to giving exclusive rights to these institutions. Even if the data is available as LD, can I access it if it's behind a paywall? It's only available to a small set of people. How do we approach the problem from both directions?
CD
A lot of our research is funded with public money so why aren't we outputting this way anyway? That could maybe be the ideal compromise, publishing results but not papers.
SD
Two very different questions in the end. The Open access problem, is entirely disconnected from the format.
SC
If it's going to end up in the LOD Cloud, it's got to be human and machine readable. We have a challenge of separating the issues themselves. if we have to deal with one we have to deal with the others.
JD
Maybe it's three issues. The third might be trying to link a modern technology like LOD to a 15th century medium of rooms of paper. At some point you may have to give up. Why is it that we can do these wonderful things with LD but then say "a paper is only 15 pages long with an abstract and references". That's another thing.. it's almost the analogy of when film making was invented.. the first thing people did for decades was to film plays. Took 50 years for editing to start. It's taking the old process and adding LOD as a layer on top, rather than saying what does it mean to share human knowledge?
SC
Nobody told Darwin to limit his ideas on origin of species to 5 pages. We're here today because some of these constraints were not in place throughout history. Now we have reviews, volumes, publishers, has to fit within their boundaries. You have to have all of these checkboxes ticked before it is considered a contribution to the body of human knowledge. You need funding to attend a conference, you have to get an okay from reviewers, give rights to publisher. You can always argue what's a good way of representing something vs having this media which gives us the possibility to have a more interactive or social angle. It's about communicating and educating at the end of the day. If people can't access or read my output then what good does it serve? Great that it's 12 pages, but if nobody can read it..
JD
There are many constraints. The order of the authors... that can affect your career! I've seen blood spilled. Either you have to be the first author or the last. Why can't we have something like a film credit with people's contributions? Why do we have to stick that just because a paper has authors with no metadata? It's mad.
SC
It's social, these things can change. It's arbitrary.
RV
Back to incentives... if we did that, some authors would eventually disappear. If their name isn't there, they don't get money for the lab. For ISWC I'll submit in latex/pdf because that's the currency there. We need to be pragmatic as well. Trying to do a bit of both.
JD
Maybe academia is one of the few areas where we say we do it this way because it has been this way for 100 years. Imagine if your doctor said that.
RV
We're supposed to be advancing the state of the art.. it doesn't make any sense as web researchers. Every once in a while you should just publish something as Linked Research.
SC
As far as the LDOW workshop goes, it's reputation or influence. The whole idea is can we pioneer this? Can we break out of the mould and get things moving? It has influence on other LD/SW events. We can always wait for this whole thing to be set up, infrastructure, publishers, institutions on board, people having their own websites... we can wait for all that... OR we can take incremental steps on any part, pick a point and start. What do we have to do to make those steps forward? When I do it and some of my friends do it and some people want to do it that's not critical, it's a hobby. How do we get to a point where next year all the LDOW publications, the articles and the reviews, are in the LOD Cloud? Ruben just presented his work on how he does it and how it can be queried. There's no particular reason why no-one else does that.
RV
If LDOW next year makes this mandatory.. would fewer people submit or is it feasible? Who is willing to try it?
JL
Many people maybe take existing work... if they have to switch formula they might not..
SD
Many people might like the idea. I think it would be interesting. It might even make the workshop more interactive.
IH
The problem we should not underestimate is that most of the submissions and authors use tools like ms word or pages or whatever and they are not willing to get away from that. That's because the tooling to produce high quality html in an easy way is very poor. That's the problem. We're having discussions with publishers to publish in proper HTML and do 21st C publications. But it was very clear that if we discuss that we have to have an environment where the publishers do conversion because there will be submissions coming in in ms word (as the most widespread, some communities latex). Even with this workshop the problem will arise. There will be people who say yeah I can't do a really nice HTML because I don't author HTML. You can't refuse them. You have to have a pipeline which converts from latex and word to HTML. The quality of word to html conversion... it's not an easy thing to do.
RV
I follow you in general but for this specific community.....
IH
I'm happy to bet with you. My bet is that even in this community, even in SW people, they may be very good in OWL whatever but they have never ever authored HTML.
SD
It only works if we as conference organisers take care of the transformation.
RV
I beg to differ. It's about incentives... we have this project of creating slides in HTML. We got people in veterinary and psychology writing slides in HTML because they see the benefits. It's a leap you have to take. It's not rocket science it's HTML.
SC
Hands up if you want your work to be accessible and discoverable? We all want that. What if we raise the bar? This is a very specific community. We're specifically about advancing linked data on the Web. From my pov the bar for participating, just like any other requirements - number of pages, review - it doesn't take much to say sorry but if you want your stuff out there it has to be a bit webby. However you get the HTML or any other linked data friendly thing.. the question is whether we can raise the bar and it's fine for this community. doesn't have to apply to everything.
IH
There are no authoring tools that would also do adding RDFa or microdata into HTML. That has to be done by hand. I must admit, I know RDFa and adding it by hand into an HTML file is a pain. We had to learn over the years, we were a bit naive that it was an easy thing to do. No it's a headache. It's a huge problem. I know we do not agree on this. I wouldn't want authors to put in RDFa. Turtle or JSON-LD maybe..
SC
I agree it's painful to handcode. It's not about the syntax, it's about whether it's webby. Do whatever you want. Can I query for your hypothesis? You can do all the conneg you want... can I at least get ahold of it. I completely agree that HTML+RDFa.. I know it's painful, I'm working on tooling because I don't want people to handcode.
JL
Do we have to force HTML? Can we use latex plus annotations?
SC
The challenges I was doing with LinkedResearch stuff.. if you can do it with a jpeg file, go for it. Can I run some sort of a query to find your research results? Can I build a citation network out of this?
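The "citation network" question can be made concrete. The following is a minimal sketch, not a description of any tooling mentioned in the session: it assumes the articles' RDF has already been parsed into (subject, predicate, object) triples, and it uses the CiTO `cites` property as an illustrative choice of vocabulary. The paper IRIs and function names are hypothetical.

```python
from collections import defaultdict

# CiTO "cites" property, used here as an assumed vocabulary choice.
CITES = "http://purl.org/spar/cito/cites"

# Hypothetical triples, as they might be harvested from published articles.
TRIPLES = [
    ("https://example.org/paper/a", CITES, "https://example.org/paper/b"),
    ("https://example.org/paper/a", CITES, "https://example.org/paper/c"),
    ("https://example.org/paper/b", CITES, "https://example.org/paper/c"),
    ("https://example.org/paper/a", "http://purl.org/dc/terms/title", "Paper A"),
]


def citation_network(triples):
    """Map each citing article to the set of articles it cites."""
    graph = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == CITES:
            graph[subject].add(obj)
    return dict(graph)


def citation_counts(graph):
    """Count incoming citations for every cited article."""
    counts = defaultdict(int)
    for cited_articles in graph.values():
        for cited in cited_articles:
            counts[cited] += 1
    return dict(counts)


network = citation_network(TRIPLES)
counts = citation_counts(network)
```

Because the sketch only looks at triples, it does not matter whether they came from HTML+RDFa, Turtle, or JSON-LD, which is the point SC makes about the syntax being secondary.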
JL
Latex to html plus semantic annotations might not be so hard.
SC
It's on you to do whatever you have to do to make people find your work. If latex plus annotations works then go for it. Is your latex going to end up in this cloud?
RV
Clear need for tools. Why are we not building tools? There's no incentives to build tools.. this is not what gets us into conferences.
SD
We need to solve the OA problem to provide an incentive for people to publish in HTML. For the time being it's still the publishers. I guess the importance is to get the publishers on board. They are already using things like schema.org. If people would actually work with them they might be open to doing it properly.
SN
I don't really see writing HTML as a barrier. From a student perspective, at some point you have to start writing, and maybe you're not used to writing latex so you have this barrier of writing latex. Can be tricky. I think it's just if you would ask people to submit in HTML they have the same barrier. It's not as high a barrier as latex.
SC
Who used latex when you started your academic career?
BR
Talking about incentives... look at stackoverflow. There are people answering questions for free all the time. Why? It maybe is because their ego perhaps.. they get reputation, points, If you really want an incentive for people to submit papers in this format, maybe just show as an immediate incentive how many more citations you're likely to get or how quickly you can be discovered. Now it's not so obvious. If there were more discoverability interfaces out there that would streamline this process...
JL
Closely related to what JD said. For publishing it's very indirect. You don't know exactly the benefit.
IH
I don't disagree, but there is more to it. If you look at journals that have gone down the proper 21st century publishing way. PeerJ Computer Science: if I am simply a reader it's incomparably more pleasant than reading in a PDF file. It gives me a user interface it gives me adaptation to my mobile device, it gives me an easy way of finding references. A lot of tools because it uses the possibilities of the Web. If this workshop does something in this direction. It shouldn't just say "I published HTML" that by itself is nice but we should really exploit the web, the fact that at that time I have a paper on the Web and again it's worth looking at what PeerJ does. There are a number of these startup publishers that add a number of facilities that suddenly makes journal reading a completely different experience. That would be a real win.
SC
And it's not just the articles themselves, it's the reviews. Giving credit and attribution to people who have put time into all that. We don't know how that will unfold. There are services like pubpeer or publons, which are trying to move in that direction so that people who put in their time get credit.
IH
Science.ai doing that too.
IH
In a sense, the paper you wrote remains a living thing, it's not frozen. You can reply to comments that come years after you published. That's what it gives you.
JL
As a side effect you become more frequently cited.
SC
It opens up the conversations. Not just authors and reviewers, anyone who comes across this work. Normally what we do is have 2-5 reviews, and then we assume that it is worthwhile contribution and there are enough checks in place. But why stop there? Anyone could potentially say something useful. I don't have to sell Web Annotations, but solutions for this exist, eg. hypothesis. We could get to the point where we have better tooling and so on.
JL
How many people are interested in following the HTML-only model?
AG
EDSC workshop at ESWC requires native web submissions
various
you might have some problems
AG
We'll take the hit for progress..
RV
People need guidance. It's new for publishing. Secondly, you could try a two step model. If you're submitting HTML this is your word limit, if it's PDF you have a page limit. You still give people a fallback option. But strongly incentivise them to go for HTML. Going all the way is too strict, but with a backup option.
JL
For workshop papers page count is not such a big issue.
RV
Or a different deadline. Just giving an example.
A
You need a stylesheet, and a template already with annotations in there.
RV
Even better can we give them an editor?
SC
Poll, who would welcome HTML next year?... who objects? No objections.
RV
Just try it. LDOW is about advancing linked data on the Web. If you can't do it nobody can. If it doesn't work, that's a very important sign for the LD community. An interesting lesson learned in any case.
JL
Exactly. What are we doing the workshop for?
IH
I'm not sure I fully understood your question. There are two sides. One is the workshop papers are all published in HTML, with some stylesheets etc. To which I say yes. The other question is all submissions must be in HTML full stop? That's when I'm saying I'm not sure it would work.
SC
the HTML should have some connection to RDF.
IH
Which format should it be submitted in? If I want to write my paper in Word and I give you a turtle file because I have the metadata. Is this acceptable?
RV
people who write in word will also write turtle files?
JL
Someone needs to do the work.
SC
Is some tool or source code or something considered as a submission? If we're getting into pure data which doesn't have a human interface, is that okay? If the majority felt comfortable with HTML we stick to that.
RV
For some papers there's extra data in turtle. We should be open to accepting that. Makes sense to accept that.
JL
Could be difficult to have too much diversity.. if we want a nice interface.
RV
If you have lots of diversity you can still query it, shows the strength of linked data.
SC
I would say no to the msword+turtle question... someone said earlier the submission guidelines are already too open. If we go into this turtle thing or whatever that's as wide as it gets and it doesn't have an interface. It's not a human friendly interface. It should be human and machine friendly. We should consider human friendly as perhaps more html, javascript.
IH
I'm just trying to be realistic. In my view a workshop like this has to accept word. If you just want to ask people to send HTML I'm very pessimistic about that. It's not the linked data side, it's publishing in general. Just asking people to submit HTML, I don't believe it works. I would love if I was wrong. That's where the web conference failed in its earlier attempts. We have learned the hard way that most authors want to submit word or latex. Two or three formats you are prepared to accept, and you have to have the tools to convert.
SC
If we can indicate what people need to include so that we can create HTML out of it, then it's fine. It's irrelevant what the webpage structure is in the end.
IH
You can put the RDF content into a script element.
DS
Ultimately people just want the tools they are used to. If we can provide tools that people can use comfortably and we can extract what we want, then that has a better chance of being accepted. There are tools out there, we need to look around. It's not THAT bad.
IH
You don't have to start from scratch. Pandoc cannot handle word files, but it may be able to handle latex... it's not like you start from scratch.
SC
There may be another layer in there that gives us added RDFa on it. That would give us plain HTML but would not give us one more step.
RV
Smaller steps.
IH
If you can generate the RDF that's what you want, then not necessarily RDFa, and put it in a script element, you got it.
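A minimal sketch of the script-element approach, under stated assumptions: the RDF is serialised as JSON-LD inside a `<script type="application/ld+json">` element, and a consumer pulls it back out with Python's standard library only. The article page, the IRIs, the choice of schema.org terms, and the names `JsonLdExtractor` and `extract_jsonld` are all hypothetical illustrations, not something agreed in the session.

```python
import json
from html.parser import HTMLParser

# Hypothetical article page: prose for humans, JSON-LD for machines.
ARTICLE_HTML = """<!DOCTYPE html>
<html>
<head>
<title>An Example Paper</title>
<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@id": "https://example.org/papers/1",
  "@type": "ScholarlyArticle",
  "name": "An Example Paper",
  "author": {"@type": "Person", "name": "A. Author"},
  "citation": [{"@id": "https://example.org/papers/0"}]
}
</script>
</head>
<body><article><h1>An Example Paper</h1><p>Text for human readers.</p></article></body>
</html>"""


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return a list of JSON-LD documents embedded in the given HTML."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


documents = extract_jsonld(ARTICLE_HTML)
```

The author's visible HTML stays untouched, so whatever tool produced it (Word export, a LaTeX converter, markdown) does not need to know about RDFa attributes; only the data block has to be generated or supplied.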
A
There are many more people in the world who can write HTML than latex. People who are writing in word are not writing latex. In this group, we are Computer Scientists or related fields. Everyone can write HTML. What's the barrier there?
IH
I write markdown and convert to HTML
RV
So as a community we list tooling and options that people can generate HTML -> see https://linkedresearch.org/resources (AG)
RV
We should write this down for the CfP for next year
SC
I can send out a rough thing people can give feedback on. It needs to reflect what the community wants not what one or two people want. I'm glad most of us are on board with moving towards this, whatever is. We agree that tooling is a challenge there, not so much the format. I don't particularly care how we get to the end result. I'm working on tooling myself to solve some of these problems and half the time I'm handcoding because the tooling can only cover a certain range of things. There's a bit of a tradeoff. Just as we have latex editing interfaces, I'm sure that most of us are familiar with handcoding latex because there are some things that the tooling will not give us. Same with HTML.
JD
I think you'll have to think about your evaluation criteria for accepting papers. Just as you now think about how clear and legible the English language is, you will also have to assess how well HTML has been used.
DS
And the quality of the linking.
SC
it's up to us where we set the bar. We can say to submit it has to somehow end up in the LORCloud. You can publish it on your institution's website, your personal website or so on. As long as we can get ahold of it. We're turning the other way and saying we don't care about the format...
JD
What happens if I write semantic statements that have no meaning? What happens if I write statements that contradict the text of my paper?
AG
if we get people writing contradictory RDF to mess with us, we're doing okay?
SC
If people can publish their contributions, we can work with it
JL
If it's in the evaluation criteria, people will do it rather than being rejected
SC
We can provide a template
JL
We should have follow up discussions and we can make some decisions as the organisation committee.
SC
I don't want to send out individual emails. Start with public-lod? I'll start a thread.
JD
Probably because of you [SC] we may well have a panel at ESWC on open publication. We'll get someone from Springer, Elsevier. We'll try to get an editor in chief from some journal, and someone like [Sarven]...
SC
Publishers vs. the little guy...