Note

Created: 2021-02-20 10:27:27 Platform: Email Email
Title Analysing the Requirements for an Open Research Knowledge Graph: Use Cases, Quality Requirements and Construction Strategies
Note:
#knowledgegraph #kg #openscience #graph

In this paper, we aim to transcend this limited perspective and present a comprehensive analysis of requirements for an Open Research Knowledge Graph (ORKG) by (a) collecting and reviewing daily core tasks of a scientist, (b) establishing their consequential requirements for a KG-based system, (c) identifying overlaps and specificities, and their coverage in current solutions. As a result, we map necessary and desirable requirements for successful KG-based science
Tags:
URLs in Note:
DOIs in Note:

Referred Content

Platform: Referred Content
Referred Content Text
Tags:
URLs in Note:
Attachments
Title
Author
URLs and Resources in Attachments
References

1. Ammar, W., Groeneveld, D., Bhagavatula, C., Beltagy, I., Craw-
ford, M., Downey, D., Dunkelberger, J., Elgohary, A., Feldman,
S., Ha, V., Kinney, R., Kohlmeier, S., Lo, K., Murray, T., Ooi,
H., Peters, M.E., Power, J., Skjonsberg, S., Wang, L.L., Wil-
helm, C., Yuan, Z., van Zuylen, M., Etzioni, O.: Construction
of the literature graph in semantic scholar. In: S. Bangalore,
J. Chu-Carroll, Y. Li (eds.) Proceedings of the 2018 Conference
of the North American Chapter of the Association for Compu-
tational Linguistics: Human Language Technologies, NAACL-
HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Vol-
ume 3 (Industry Papers), pp. 84–91. Association for Computa-
tional Linguistics (2018). DOI 10.18653/v1/n18-3011. URL
https://doi.org/10.18653/v1/n18-3011

2. Aryani, A., Wang, J.: Research graph: Building a distributed
graph of scholarly works using research data switchboard.
In: Open Repositories Conference (2017). DOI
10.4225/03/58c696655af8a. URL
https://figshare.com/articles/Research_Graph_Building_a_Distributed_Graph_of_Scholarly_Works_using_Research_Data_Switchboard/4742413

3. Auer, S., Mann, S.: Towards an open research knowledge graph.
The Serials Librarian 76(1-4), 35–41 (2019). DOI 10.1080/
0361526X.2019.1540272. URL https://doi.org/10.
1080/0361526X.2019.1540272

4. Augenstein, I., Das, M., Riedel, S., Vikraman, L., McCallum, A.:
Semeval 2017 task 10: Scienceie - extracting keyphrases and re-
lations from scientific publications. In: S. Bethard, M. Carpuat,
M. Apidianaki, S.M. Mohammad, D.M. Cer, D. Jurgens (eds.)
Proceedings of the 11th International Workshop on Semantic
Evaluation, SemEval@ACL 2017, Vancouver, Canada, August
3-4, 2017, pp. 546–555. Association for Computational Lin-
guistics (2017). DOI 10.18653/v1/S17-2091. URL https:
//doi.org/10.18653/v1/S17-2091

5. Badie, K., Asadi, N., Mahmoudi, M.T.: Zone identification based
on features with high semantic richness and combining results of
separate classifiers. J. Inf. Telecommun. 2(4), 411–427 (2018).
DOI 10.1080/24751839.2018.1460083. URL https://doi.
org/10.1080/24751839.2018.1460083

6. Balog, K.: Entity-oriented search. Springer (2018). DOI 10.
1007/978-3-319-93935-3. URL https://eos-book.org

7. Bechhofer, S., Buchan, I.E., Roure, D.D., Missier, P., Ainsworth,
J.D., Bhagat, J., Couch, P.A., Cruickshank, D., Delderfield, M.,
Dunlop, I., Gamble, M., Michaelides, D.T., Owen, S., New-
man, D.R., Sufi, S., Goble, C.A.: Why linked data is not enough
for scientists. Future Gener. Comput. Syst. 29(2), 599–611
(2013). DOI 10.1016/j.future.2011.08.004. URL https://
doi.org/10.1016/j.future.2011.08.004

8. Beel, J., Gipp, B., Langer, S., Breitinger, C.: Research-paper
recommender systems: a literature survey. Int. J. Digit. Libr.
17(4), 305–338 (2016). DOI 10.1007/s00799-015-0156-0. URL
https://doi.org/10.1007/s00799-015-0156-0

9. Beltagy, I., Lo, K., Cohan, A.: SciBERT: A pretrained language
model for scientific text. In: K. Inui, J. Jiang, V. Ng, X. Wan
(eds.) Proceedings of the 2019 Conference on Empirical Methods
in Natural Language Processing and the 9th International
Joint Conference on Natural Language Processing, EMNLP-IJCNLP
2019, Hong Kong, China, November 3-7, 2019, pp.
3613–3618. Association for Computational Linguistics (2019).
DOI 10.18653/v1/D19-1371. URL
https://doi.org/10.18653/v1/D19-1371

10. Bizer, C.: Quality-Driven Information Filtering in the Context
of Web-Based Information Systems. VDM Verlag, Saarbrücken,
DEU (2007)

11. Bodenreider, O.: The unified medical language system (UMLS):
integrating biomedical terminology. Nucleic Acids Res.
32(Database-Issue), 267–270 (2004). DOI 10.1093/nar/gkh061.
URL https://doi.org/10.1093/nar/gkh061

12. Bollacker, K.D., Evans, C., Paritosh, P., Sturge, T., Taylor, J.:
Freebase: a collaboratively created graph database for structuring
human knowledge. In: J.T. Wang (ed.) Proceedings of the ACM
SIGMOD International Conference on Management of Data,
SIGMOD 2008, Vancouver, BC, Canada, June 10-12, 2008, pp.
1247–1250. ACM (2008). DOI 10.1145/1376616.1376746. URL
https://doi.org/10.1145/1376616.1376746

13. Booch, G., Rumbaugh, J., Jacobson, I.: Unified Modeling Lan-
guage User Guide, The (2nd Edition) (Addison-Wesley Object
Technology Series). Addison-Wesley Professional (2005)

14. Bornmann, L., Mutz, R.: Growth rates of modern science: A
bibliometric analysis based on the number of publications and
cited references. J. Assoc. Inf. Sci. Technol. 66(11), 2215–2222
(2015). DOI 10.1002/asi.23329. URL https://doi.org/
10.1002/asi.23329

15. Brack, A., D’Souza, J., Hoppe, A., Auer, S., Ewerth, R.: Domain-
independent extraction of scientific concepts from research arti-
cles. In: J.M. Jose, E. Yilmaz, J. Magalhães, P. Castells, N. Ferro,
M.J. Silva, F. Martins (eds.) Advances in Information Retrieval
- 42nd European Conference on IR Research, ECIR 2020, Lis-
bon, Portugal, April 14-17, 2020, Proceedings, Part I, Lecture
Notes in Computer Science, vol. 12035, pp. 251–266. Springer
(2020). DOI 10.1007/978-3-030-45439-5_17. URL
https://doi.org/10.1007/978-3-030-45439-5_17

16. Brack, A., Hoppe, A., Stocker, M., Auer, S., Ewerth, R.: Re-
quirements analysis for an open research knowledge graph. In:
M.M. Hall, T. Mercun, T. Risse, F. Duchateau (eds.) Digital
Libraries for Open Knowledge - 24th International Conference
on Theory and Practice of Digital Libraries, TPDL 2020, Lyon,
France, August 25-27, 2020, Proceedings, Lecture Notes in Com-
puter Science, vol. 12246, pp. 3–18. Springer (2020). DOI
10.1007/978-3-030-54956-5_1. URL
https://doi.org/10.1007/978-3-030-54956-5_1

17. Brack, A., Müller, D., Hoppe, A., Ewerth, R.: Coreference reso-
lution in research papers from multiple domains. In: Proceedings
of ECIR 2021 (accepted for publication) (2021)

18. Braun, R., Benedict, M., Wendler, H., Esswein, W.: Proposal for
requirements driven design science research. In: B. Donnellan,
M. Helfert, J. Kenneally, D.E. VanderMeer, M.A. Rothenberger,
R. Winter (eds.) New Horizons in Design Science: Broadening
the Research Agenda - 10th International Conference, DESRIST
2015, Dublin, Ireland, May 20-22, 2015, Proceedings, Lecture
Notes in Computer Science, vol. 9073, pp. 135–151. Springer
(2015). DOI 10.1007/978-3-319-18714-3_9. URL
https://doi.org/10.1007/978-3-319-18714-3_9

19. Brodaric, B., Reitsma, F., Qiang, Y.: Skiing with DOLCE: to-
ward an e-science knowledge infrastructure. In: C. Eschen-
bach, M. Grüninger (eds.) Formal Ontology in Information
Systems, Proceedings of the Fifth International Conference,
FOIS 2008, Saarbrücken, Germany, October 31st - Novem-
ber 3rd, 2008, Frontiers in Artificial Intelligence and Appli-
cations, vol. 183, pp. 208–219. IOS Press (2008). DOI 10.
3233/978-1-58603-923-3-208. URL https://doi.org/
10.3233/978-1-58603-923-3-208



Requirements Analysis for an Open Research Knowledge Graph 13
Table 2: Characteristics of datasets and performance measures for sentence classification in research papers.

Dataset | Domains | # Papers | Coverage | Sentence Classes | Inter-coder Agreement | Performance
PubMed-20k [31] | Biomedicine | 20,000 | abstracts | Background, Objective, Methods, Results, Conclusion | n/a | 92.9% F1 [24]
NICTA-PIBOSO [59] | Biomedicine | 1,000 | abstracts | Background, Intervention, Study, Population, Outcome, Other | 62.0% κ | 84.7% F1 [24]
CSABSTRUCT [24] | Computer Science | 2,189 | abstracts | Background, Objective, Method, Result, Other | 75.0% κ | 83.1% F1 [24]
CS-Abstracts [47] | Computer Science | 654 | abstracts | Background, Objective, Methods, Results, Conclusions | n/a | 74.6% F1 [47]
Emerald 100k [98] | Management, Information Science, Engineering | 103,457 | abstracts | Purpose, Design/methodology/approach, Findings, Originality/value, Social implications, Practical implications, Research limitations/implications | n/a | n/a
MAZEA [28] | Physics, Engineering, Life and Health Sciences | 1,335 | abstracts | Background, Gap, Purpose, Method, Result, Conclusion | 59.4% κ | 66.0% accuracy [28]
Safder et al. [93] | Computer Science | 92 | fulltext | Algorithmic Efficiency, Dataset Description, Algorithmic Time Complexity, Other | n/a | 78.5% accuracy [93]
Dr. Inventor [42] | Computer Graphics | 40 | fulltext | Background, Challenge, Approach, Outcome, Future Work | 66.7% κ | 72.5% accuracy [5]
ART/CoreSC [68] | Chemistry, Computational Linguistics | 225 | fulltext | Background, Motivation, Goal, Hypothesis, Object, Model, Method, Experiment, Result, Observation, Conclusion | 57.0% κ | 51.6% F1 [67]


14 Brack et al.

Table 3: Characteristics of datasets and performance measures for binary and n-ary relation extraction in research papers. * For SOFC-Exp corpus, performance values were obtained with ground truth concept mentions.

Dataset | Domains | # Papers | Coverage | Cardinality | Relation Types | Scope | Inter-coder Agreement | # Relations | Performance
SemEval17 [4] | Computer Science, Material Sciences, Physics | 500 | abstract | binary | synonym-of, hyponym-of | intra-sentence | 60.0% κ | 672 | 28.0% F1 [4]
SemEval18 [89] | Comp. Linguistics | 500 | abstract | binary | usage, result, model, part-whole, topic, comparison | intra-sentence | 90.8% F1 | 1,595 | 49.3% F1 [89]
ChemProt [63] | Biomedicine | 2,482 | abstract | binary | UPREGULATOR, ACTIVATOR, DOWNREGULATOR, INHIBITOR, AGONIST, ANTAGONIST, SUBSTRATE | intra-sentence | n/a | 10,031 | 83.64% F1 [9]
SciERC [70] | Art. Intelligence | 500 | abstract | binary | hyponym-of, compare, part-of, conjunction, evaluate-for, feature-of, used-for | cross-sentence | 67.8% F1 | 4,716 | 39.3% F1 [70]
PWC [58] | Art. Intelligence | 731 | fulltext | n-ary | (Task, Dataset, Metric, Score) | document-level | n/a | 2,295 | 28.7% F1 [58]
CKB [56] | Biomedicine | 343 | fulltext | n-ary | (Drug, Gene, Mutation) | document-level | n/a | 2,025 | 52.8% F1 [56]
SOFC-Exp [43] | Material Sciences | 45 | fulltext | n-ary | (AnodeMaterial, CathodeMaterial, Device, ElectrolyteMaterial, FuelUsed, InterlayerMaterial, OpenCircuitVoltage, PowerDensity, Resistance, WorkingTemperature) | document-level | n/a | n/a | 56.4% F1* [43]




Table 4: Characteristics of datasets and performance measures for scientific concept extraction in research papers. * For SOFC-Exp corpus, performance values were obtained with ground truth sentences describing experiments.

Dataset | Domains | # Papers | # Concepts | Coverage | Concept Types | Inter-coder Agreement | Performance
SemEval17 [4] | Computer Science, Material Sciences, Physics | 500 | 9,946 | abstract | Process, Task, Material | 60.0% κ | 56.9% F1 [80]
STM [15] | Agriculture, Astronomy, Biology, Chemistry, Computer Science, Earth Science, Engineering, Materials Science, Mathematics, Medicine | 110 | 6,127 | abstract | Process, Method, Material, Data | 76.0% κ | 65.5% F1 [15]
SciERC [70] | Art. Intelligence | 500 | 8,089 | abstract | Task, Method, Metric, Material, Other, Generic | 76.9% κ | 75.2% F1 [80]
ACL2 [89] | Comp. Linguistics | 300 | 6,818 | abstract | Method, Tool, Language Resource (LR), LR product, Model, Measures/Measurements, Other | 63.0% F1 | 69.9% F1 [80]
BC5CDR [66] | Biomedicine | 1,500 | 28,785 | abstract | Chemical, Disease | 91.8% F1 | 88.9% F1 [9]
NCBI-disease [34] | Biomedicine | 793 | 6,892 | abstract | Disease | 88.0% F1 | 96.9% F1 [9]
SOFC-Exp [43] | Material Sciences | 45 | 4,004 | fulltext | Material, Device, Value | 95.8% F1 | 81.5% F1* [43]




20. Burton, A., Aryani, A., Koers, H., Manghi, P., Bruzzo, S.L.,
Stocker, M., Diepenbroek, M., Schindler, U., Fenner, M.: The
scholix framework for interoperability in data-literature infor-
mation exchange. D Lib Mag. 23(1/2) (2017). DOI 10.1045/
january2017-burton. URL https://doi.org/10.1045/
january2017-burton

21. Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Jr., E.R.H.,
Mitchell, T.M.: Toward an architecture for never-ending lan-
guage learning. In: M. Fox, D. Poole (eds.) Proceedings of
the Twenty-Fourth AAAI Conference on Artificial Intelligence,
AAAI 2010, Atlanta, Georgia, USA, July 11-15, 2010. AAAI
Press (2010). URL
http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/1879

22. CB Insights: The data flywheel: How enlightened
self-interest drives data network effects.
https://www.cbinsights.com/research/team-blog/data-network-effects/.
Accessed: 2020-11-10

23. Cohan, A., Ammar, W., van Zuylen, M., Cady, F.: Structural
scaffolds for citation intent classification in scientific publica-
tions. In: J. Burstein, C. Doran, T. Solorio (eds.) Proceed-
ings of the 2019 Conference of the North American Chap-
ter of the Association for Computational Linguistics: Human
Language Technologies, NAACL-HLT 2019, Minneapolis, MN,
USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp.
3586–3596. Association for Computational Linguistics (2019).
DOI 10.18653/v1/n19-1361. URL https://doi.org/10.
18653/v1/n19-1361

24. Cohan, A., Beltagy, I., King, D., Dalvi, B., Weld, D.S.: Pre-
trained language models for sequential sentence classification.
In: K. Inui, J. Jiang, V. Ng, X. Wan (eds.) Proceedings of the 2019
Conference on Empirical Methods in Natural Language Process-
ing and the 9th International Joint Conference on Natural Lan-
guage Processing, EMNLP-IJCNLP 2019, Hong Kong, China,
November 3-7, 2019, pp. 3691–3697. Association for Compu-
tational Linguistics (2019). DOI 10.18653/v1/D19-1383. URL
https://doi.org/10.18653/v1/D19-1383

25. Cohen, K.B., Lanfranchi, A., Choi, M.J., Bada, M., Jr., W.A.B.,
Panteleyeva, N., Verspoor, K., Palmer, M., Hunter, L.E.: Coref-
erence annotation and resolution in the colorado richly anno-
tated full text (CRAFT) corpus of biomedical journal articles.
BMC Bioinform. 18(1), 372:1–372:14 (2017). DOI 10.1186/
s12859-017-1775-9. URL https://doi.org/10.1186/
s12859-017-1775-9

26. Consortium, T.G.O.: The gene ontology resource: 20 years and
still going strong. Nucleic Acids Res. 47(Database-Issue),
D330–D338 (2019). DOI 10.1093/nar/gky1055. URL https:
//doi.org/10.1093/nar/gky1055

27. Constantin, A., Peroni, S., Pettifer, S., Shotton, D.M., Vitali, F.:
The document components ontology (doco). Semantic Web 7(2),
167–181 (2016). DOI 10.3233/SW-150177. URL https://
doi.org/10.3233/SW-150177

28. Dayrell, C., Jr., A.C., Lima, G., Jr., D.M., Copestake,
A.A., Feltrim, V.D., Tagnin, S.E.O., Aluísio, S.M.: Rhetorical
move detection in english abstracts: Multi-label sentence
classifiers and their annotated corpora. In: N. Calzolari,
K. Choukri, T. Declerck, M.U. Dogan, B. Maegaard,
J. Mariani, J. Odijk, S. Piperidis (eds.) Proceedings of the
Eighth International Conference on Language Resources and
Evaluation, LREC 2012, Istanbul, Turkey, May 23-25, 2012,
pp. 1604–1609. European Language Resources Association
(ELRA) (2012). URL
http://www.lrec-conf.org/proceedings/lrec2012/summaries/734.html

29. Degbelo, A.: A snapshot of ontology evaluation criteria and
strategies. In: R. Hoekstra, C. Faron-Zucker, T. Pellegrini,
V. de Boer (eds.) Proceedings of the 13th International Conference
on Semantic Systems, SEMANTICS 2017, Amsterdam,
The Netherlands, September 11-14, 2017, pp. 1–8. ACM (2017).
DOI 10.1145/3132218.3132219. URL
https://doi.org/10.1145/3132218.3132219

30. Degtyarenko, K., de Matos, P., Ennis, M., Hastings, J., Zbinden,
M., McNaught, A., Alcántara, R., Darsow, M., Guedj, M., Ash-
burner, M.: Chebi: a database and ontology for chemical enti-
ties of biological interest. pp. 344–350 (2008). DOI 10.1093/
nar/gkm791. URL https://doi.org/10.1093/nar/
gkm791

31. Dernoncourt, F., Lee, J.Y.: Pubmed 200k RCT: a dataset for se-
quential sentence classification in medical abstracts. In: G. Kon-
drak, T. Watanabe (eds.) Proceedings of the Eighth Interna-
tional Joint Conference on Natural Language Processing, IJC-
NLP 2017, Taipei, Taiwan, November 27 - December 1, 2017,
Volume 2: Short Papers, pp. 308–313. Asian Federation of
Natural Language Processing (2017). URL
https://www.aclweb.org/anthology/I17-2052/

32. Dessì, D., Osborne, F., Recupero, D.R., Buscaldi, D., Motta,
E., Sack, H.: AI-KG: an automatically generated knowledge
graph of artificial intelligence. In: J.Z. Pan, V.A.M. Tamma,
C. d’Amato, K. Janowicz, B. Fu, A. Polleres, O. Seneviratne,
L. Kagal (eds.) The Semantic Web - ISWC 2020 - 19th Inter-
national Semantic Web Conference, Athens, Greece, Novem-
ber 2-6, 2020, Proceedings, Part II, Lecture Notes in Com-
puter Science, vol. 12507, pp. 127–143. Springer (2020). DOI
10.1007/978-3-030-62466-8_9. URL
https://doi.org/10.1007/978-3-030-62466-8_9

33. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-
training of deep bidirectional transformers for language under-
standing. In: J. Burstein, C. Doran, T. Solorio (eds.) Proceed-
ings of the 2019 Conference of the North American Chap-
ter of the Association for Computational Linguistics: Human
Language Technologies, NAACL-HLT 2019, Minneapolis, MN,
USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp.
4171–4186. Association for Computational Linguistics (2019).
DOI 10.18653/v1/n19-1423. URL https://doi.org/10.
18653/v1/n19-1423

34. Dogan, R.I., Leaman, R., Lu, Z.: NCBI disease corpus: A re-
source for disease name recognition and concept normalization.
J. Biomed. Informatics 47, 1–10 (2014). DOI 10.1016/j.jbi.
2013.12.006. URL https://doi.org/10.1016/j.jbi.
2013.12.006

35. Dong, X., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Mur-
phy, K., Strohmann, T., Sun, S., Zhang, W.: Knowledge vault: a
web-scale approach to probabilistic knowledge fusion. In: S.A.
Macskassy, C. Perlich, J. Leskovec, W. Wang, R. Ghani (eds.)
The 20th ACM SIGKDD International Conference on Knowl-
edge Discovery and Data Mining, KDD ’14, New York, NY,
USA - August 24 - 27, 2014, pp. 601–610. ACM (2014). DOI
10.1145/2623330.2623623. URL https://doi.org/10.
1145/2623330.2623623

36. D’Souza, J., Hoppe, A., Brack, A., Jaradeh, M.Y., Auer, S., Ew-
erth, R.: The STEM-ECR dataset: Grounding scientific entity
references in STEM scholarly content to authoritative encyclo-
pedic and lexicographic sources. In: N. Calzolari, F. Béchet,
P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isa-
hara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk,
S. Piperidis (eds.) Proceedings of The 12th Language Resources
and Evaluation Conference, LREC 2020, Marseille, France, May
11-16, 2020, pp. 2192–2203. European Language Resources
Association (2020). URL
https://www.aclweb.org/anthology/2020.lrec-1.268/

37. Färber, M.: The microsoft academic knowledge graph: A linked
data source with 8 billion triples of scholarly data. In:
C. Ghidini, O. Hartig, M. Maleshkova, V. Svátek, I.F. Cruz,
A. Hogan, J. Song, M. Lefrançois, F. Gandon (eds.) The
Semantic Web - ISWC 2019 - 18th International Seman-
tic Web Conference, Auckland, New Zealand, October 26-
30, 2019, Proceedings, Part II, Lecture Notes in Computer
Science, vol. 11779, pp. 113–129. Springer (2019). DOI
10.1007/978-3-030-30796-7_8. URL
https://doi.org/10.1007/978-3-030-30796-7_8

38. Färber, M., Bartscherer, F., Menne, C., Rettinger, A.: Linked data
quality of dbpedia, freebase, opencyc, wikidata, and YAGO. Se-
mantic Web 9(1), 77–129 (2018). DOI 10.3233/SW-170275.
URL https://doi.org/10.3233/SW-170275

39. Fathalla, S., Vahdati, S., Auer, S., Lange, C.: Towards a knowl-
edge graph representing research findings by semantifying sur-
vey articles. In: J. Kamps, G. Tsakonas, Y. Manolopoulos, L.S.
Iliadis, I. Karydis (eds.) Research and Advanced Technology for
Digital Libraries - 21st International Conference on Theory and
Practice of Digital Libraries, TPDL 2017, Thessaloniki, Greece,
September 18-21, 2017, Proceedings, Lecture Notes in Com-
puter Science, vol. 10450, pp. 315–327. Springer (2017). DOI
10.1007/978-3-319-67008-9_25. URL
https://doi.org/10.1007/978-3-319-67008-9_25

40. Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database.
Language, Speech, and Communication. MIT Press, Cambridge,
MA (1998)

41. Fink, A.: Conducting Research Literature Reviews: From the In-
ternet to Paper. SAGE Publications (2014)

42. Fisas, B., Saggion, H., Ronzano, F.: On the discoursive struc-
ture of computer graphics research papers. In: A. Meyers, I. Re-
hbein, H. Zinsmeister (eds.) Proceedings of The 9th Linguis-
tic Annotation Workshop, LAW@NAACL-HLT 2015, June 5,
2015, Denver, Colorado, USA, pp. 42–51. The Association for
Computer Linguistics (2015). DOI 10.3115/v1/w15-1605. URL
https://doi.org/10.3115/v1/w15-1605

43. Friedrich, A., Adel, H., Tomazic, F., Hingerl, J., Benteau, R.,
Marusczyk, A., Lange, L.: The sofc-exp corpus and neural ap-
proaches to information extraction in the materials science do-
main. In: D. Jurafsky, J. Chai, N. Schluter, J.R. Tetreault (eds.)
Proceedings of the 58th Annual Meeting of the Association
for Computational Linguistics, ACL 2020, Online, July 5-10,
2020, pp. 1255–1268. Association for Computational Linguistics
(2020). DOI 10.18653/v1/2020.acl-main.116. URL https:
//doi.org/10.18653/v1/2020.acl-main.116

44. Gábor, K., Buscaldi, D., Schumann, A., QasemiZadeh, B.,
Zargayouna, H., Charnois, T.: Semeval-2018 task 7: Seman-
tic relation extraction and classification in scientific papers.
In: M. Apidianaki, S.M. Mohammad, J. May, E. Shutova,
S. Bethard, M. Carpuat (eds.) Proceedings of The 12th Interna-
tional Workshop on Semantic Evaluation, SemEval@NAACL-
HLT 2018, New Orleans, Louisiana, USA, June 5-6, 2018,
pp. 679–688. Association for Computational Linguistics (2018).
DOI 10.18653/v1/s18-1111. URL https://doi.org/10.
18653/v1/s18-1111

45. Galárraga, L., Razniewski, S., Amarilli, A., Suchanek, F.M.: Pre-
dicting completeness in knowledge bases. In: M. de Rijke,
M. Shokouhi, A. Tomkins, M. Zhang (eds.) Proceedings of the
Tenth ACM International Conference on Web Search and Data
Mining, WSDM 2017, Cambridge, United Kingdom, February
6-10, 2017, pp. 375–383. ACM (2017). DOI 10.1145/3018661.
3018739. URL https://doi.org/10.1145/3018661.
3018739

46. Galárraga, L.A., Teflioudi, C., Hose, K., Suchanek, F.M.: AMIE:
association rule mining under incomplete evidence in ontologi-
cal knowledge bases. In: D. Schwabe, V.A.F. Almeida, H. Glaser,
R. Baeza-Yates, S.B. Moon (eds.) 22nd International World Wide
Web Conference, WWW ’13, Rio de Janeiro, Brazil, May 13-17,
2013, pp. 413–422. International World Wide Web Conferences
Steering Committee / ACM (2013). DOI 10.1145/2488388.
2488425. URL https://doi.org/10.1145/2488388.
2488425

47. Gonçalves, S., Cortez, P., Moro, S.: A deep learning classifier for
sentence classification in biomedical and computer science ab-
stracts. Neural Comput. Appl. 32(11), 6793–6807 (2020). DOI
10.1007/s00521-019-04334-2. URL https://doi.org/
10.1007/s00521-019-04334-2

48. Groza, T., Handschuh, S., Möller, K., Decker, S.: SALT - seman-
tically annotated latex for scientific publications. In: E. Fran-
coni, M. Kifer, W. May (eds.) The Semantic Web: Research and
Applications, 4th European Semantic Web Conference, ESWC
2007, Innsbruck, Austria, June 3-7, 2007, Proceedings, Lecture
Notes in Computer Science, vol. 4519, pp. 518–532. Springer
(2007). DOI 10.1007/978-3-540-72667-8_37. URL
https://doi.org/10.1007/978-3-540-72667-8_37

49. Hars, A.: Structure of scientific knowledge, pp. 83–185. Springer
Berlin Heidelberg, Berlin, Heidelberg (2003). DOI
10.1007/978-3-540-24737-1_3. URL
https://doi.org/10.1007/978-3-540-24737-1_3

50. Hevner, A.R., March, S.T., Park, J., Ram, S.: Design
science in information systems research. MIS Q.
28(1), 75–105 (2004). URL
http://misq.org/design-science-in-information-systems-research.html

51. Hoppe, A., Hagen, J., Holzmann, H., Kniesel, G., Ewerth,
R.: An analytics tool for exploring scientific software and re-
lated publications. In: E. Méndez, F. Crestani, C. Ribeiro,
G. David, J.C. Lopes (eds.) Digital Libraries for Open Knowl-
edge, 22nd International Conference on Theory and Practice
of Digital Libraries, TPDL 2018, Porto, Portugal, Septem-
ber 10-13, 2018, Proceedings, Lecture Notes in Computer Sci-
ence, vol. 11057, pp. 299–303. Springer (2018). DOI 10.
1007/978-3-030-00066-0_27. URL
https://doi.org/10.1007/978-3-030-00066-0_27

52. Horvath, I.: Comparison of three methodological approaches of
design research. In: S.n. (ed.) Proceedings of the 16th Interna-
tional Conference on Engineering Design, ICED’07, pp. 1–11.
Ecole Central Paris (2007). Conference date: 28-08-2007
through 30-08-2007

53. Hou, Y., Jochim, C., Gleize, M., Bonin, F., Ganguly, D.: Identifi-
cation of tasks, datasets, evaluation metrics, and numeric scores
for scientific leaderboards construction. In: A. Korhonen, D.R.
Traum, L. Màrquez (eds.) Proceedings of the 57th Conference of
the Association for Computational Linguistics, ACL 2019, Flo-
rence, Italy, July 28- August 2, 2019, Volume 1: Long Papers, pp.
5203–5213. Association for Computational Linguistics (2019).
DOI 10.18653/v1/p19-1513. URL https://doi.org/10.
18653/v1/p19-1513

54. Jain, S., van Zuylen, M., Hajishirzi, H., Beltagy, I.: Scirex: A
challenge dataset for document-level information extraction. In:
D. Jurafsky, J. Chai, N. Schluter, J.R. Tetreault (eds.) Proceed-
ings of the 58th Annual Meeting of the Association for Com-
putational Linguistics, ACL 2020, Online, July 5-10, 2020, pp.
7506–7516. Association for Computational Linguistics (2020).
DOI 10.18653/v1/2020.acl-main.670. URL https://doi.
org/10.18653/v1/2020.acl-main.670

55. Jaradeh, M.Y., Oelen, A., Prinz, M., Stocker, M., Auer, S.:
Open research knowledge graph: A system walkthrough. In:
A. Doucet, A. Isaac, K. Golub, T. Aalberg, A. Jatowt (eds.) Dig-
ital Libraries for Open Knowledge - 23rd International Confer-
ence on Theory and Practice of Digital Libraries, TPDL 2019,
Oslo, Norway, September 9-12, 2019, Proceedings, Lecture
Notes in Computer Science, vol. 11799, pp. 348–351. Springer
(2019). DOI 10.1007/978-3-030-30760-8_31. URL
https://doi.org/10.1007/978-3-030-30760-8_31

56. Jia, R., Wong, C., Poon, H.: Document-level n-ary relation ex-
traction with multiscale representation learning. In: J. Burstein,
C. Doran, T. Solorio (eds.) Proceedings of the 2019 Conference
of the North American Chapter of the Association for Compu-
tational Linguistics: Human Language Technologies, NAACL-
HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1
(Long and Short Papers), pp. 3693–3704. Association for Com-
putational Linguistics (2019). DOI 10.18653/v1/n19-1370. URL
https://doi.org/10.18653/v1/n19-1370

57. Kannan, A.V., Fradkin, D., Akrotirianakis, I., Kulahcioglu, T.,
Canedo, A., Roy, A., Yu, S., Malawade, A.V., Faruque, M.A.A.:
Multimodal knowledge graph for deep learning papers and code.
In: M. d’Aquin, S. Dietze, C. Hauff, E. Curry, P. Cudré-Mauroux
(eds.) CIKM ’20: The 29th ACM International Conference on
Information and Knowledge Management, Virtual Event, Ire-
land, October 19-23, 2020, pp. 3417–3420. ACM (2020). DOI
10.1145/3340531.3417439. URL https://doi.org/10.
1145/3340531.3417439

58. Kardas, M., Czapla, P., Stenetorp, P., Ruder, S., Riedel, S.,
Taylor, R., Stojnic, R.: Axcell: Automatic extraction of results
from machine learning papers. In: B. Webber, T. Cohn, Y. He,
Y. Liu (eds.) Proceedings of the 2020 Conference on Empir-
ical Methods in Natural Language Processing, EMNLP 2020,
Online, November 16-20, 2020, pp. 8580–8594. Association
for Computational Linguistics (2020). DOI 10.18653/v1/2020.
emnlp-main.692. URL https://doi.org/10.18653/
v1/2020.emnlp-main.692

59. Kim, S., Martínez, D., Cavedon, L., Yencken, L.: Automatic
classification of sentences to support evidence based
medicine. BMC Bioinform. 12(S-2), S5 (2011). DOI 10.1186/
1471-2105-12-S2-S5. URL https://doi.org/10.1186/
1471-2105-12-S2-S5

60. Kitchenham, B.A., Charters, S.: Guidelines for per-
forming systematic literature reviews in software engi-
neering. Tech. Rep. EBSE 2007-001, Keele Univer-
sity and Durham University Joint Report (2007). URL
https://www.elsevier.com/__data/promis_
misc/525444systematicreviewsguide.pdf

61. Klampanos, I.A., Davvetas, A., Koukourikos, A., Karkaletsis,
V.: ANNETT-O: an ontology for describing artificial neural
network evaluation, topology and training. Int. J. Metadata
Semant. Ontologies 13(3), 179–190 (2019). DOI 10.1504/
IJMSO.2019.099833. URL https://doi.org/10.1504/
IJMSO.2019.099833

62. Kolitsas, N., Ganea, O., Hofmann, T.: End-to-end neural entity
linking. In: A. Korhonen, I. Titov (eds.) Proceedings of the
22nd Conference on Computational Natural Language Learn-
ing, CoNLL 2018, Brussels, Belgium, October 31 - Novem-
ber 1, 2018, pp. 519–529. Association for Computational Lin-
guistics (2018). DOI 10.18653/v1/k18-1050. URL https:
//doi.org/10.18653/v1/k18-1050

63. Kringelum, J., Kjærulff, S.K., Brunak, S., Lund, O., Oprea, T.I.,
Taboureau, O.: Chemprot-3.0: a global chemical biology diseases
mapping. Database J. Biol. Databases Curation 2016 (2016).
DOI 10.1093/database/bav123. URL https://doi.org/
10.1093/database/bav123

64. Lange, C.: Ontologies and languages for representing mathe-
matical knowledge on the semantic web. Semantic Web 4(2),
119–158 (2013). DOI 10.3233/SW-2012-0059. URL https:
//doi.org/10.3233/SW-2012-0059

65. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D.,
Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer,
S., Bizer, C.: Dbpedia - A large-scale, multilingual knowledge
base extracted from wikipedia. Semantic Web 6(2), 167–195
(2015). DOI 10.3233/SW-140134. URL https://doi.org/
10.3233/SW-140134

66. Li, J., Sun, Y., Johnson, R.J., Sciaky, D., Wei, C., Leaman, R.,
Davis, A.P., Mattingly, C.J., Wiegers, T.C., Lu, Z.: Biocreative
V CDR task corpus: a resource for chemical disease relation
extraction. Database J. Biol. Databases Curation 2016 (2016).
DOI 10.1093/database/baw068. URL https://doi.org/
10.1093/database/baw068

67. Liakata, M., Saha, S., Dobnik, S., Batchelor, C.R., Rebholz-
Schuhmann, D.: Automatic recognition of conceptualization
zones in scientific articles and two life science applica-
tions. Bioinform. 28(7), 991–1000 (2012). DOI 10.
1093/bioinformatics/bts071. URL https://doi.org/10.
1093/bioinformatics/bts071

68. Liakata, M., Teufel, S., Siddharthan, A., Batchelor, C.R.:
Corpora for the conceptualisation and zoning of scien-
tific papers. In: N. Calzolari, K. Choukri, B. Maegaard,
J. Mariani, J. Odijk, S. Piperidis, M. Rosner, D. Tapias
(eds.) Proceedings of the International Conference on Lan-
guage Resources and Evaluation, LREC 2010, 17-23 May
2010, Valletta, Malta. European Language Resources Asso-
ciation (2010). URL http://www.lrec-conf.org/
proceedings/lrec2010/summaries/644.html

69. Lo, K., Wang, L.L., Neumann, M., Kinney, R., Weld, D.S.:
S2ORC: the semantic scholar open research corpus. In: D. Ju-
rafsky, J. Chai, N. Schluter, J.R. Tetreault (eds.) Proceedings
of the 58th Annual Meeting of the Association for Computa-
tional Linguistics, ACL 2020, Online, July 5-10, 2020, pp. 4969–
4983. Association for Computational Linguistics (2020). DOI
10.18653/v1/2020.acl-main.447. URL https://doi.org/
10.18653/v1/2020.acl-main.447

70. Luan, Y., He, L., Ostendorf, M., Hajishirzi, H.: Multi-task iden-
tification of entities, relations, and coreference for scientific
knowledge graph construction. In: E. Riloff, D. Chiang, J. Hock-
enmaier, J. Tsujii (eds.) Proceedings of the 2018 Conference on
Empirical Methods in Natural Language Processing, Brussels,
Belgium, October 31 - November 4, 2018, pp. 3219–3232. Asso-
ciation for Computational Linguistics (2018). DOI 10.18653/
v1/d18-1360. URL https://doi.org/10.18653/v1/
d18-1360

71. Lubani, M., Noah, S.A.M., Mahmud, R.: Ontology population:
Approaches and design aspects. J. Inf. Sci. 45(4) (2019). DOI
10.1177/0165551518801819. URL https://doi.org/10.
1177/0165551518801819

72. Manghi, P., Bardi, A., Atzori, C., Baglioni, M., Manola, N.,
Schirrwagen, J., Principe, P.: The openaire research graph data
model (2019). DOI 10.5281/zenodo.2643199. URL https:
//doi.org/10.5281/zenodo.2643199

73. Mesbah, S., Fragkeskos, K., Lofi, C., Bozzon, A., Houben,
G.: Semantic annotation of data processing pipelines in scien-
tific publications. In: E. Blomqvist, D. Maynard, A. Gangemi,
R. Hoekstra, P. Hitzler, O. Hartig (eds.) The Semantic Web -
14th International Conference, ESWC 2017, Portorož, Slove-
nia, May 28 - June 1, 2017, Proceedings, Part I, Lecture Notes
in Computer Science, vol. 10249, pp. 321–336 (2017). DOI
10.1007/978-3-319-58068-5_20. URL https://doi.org/
10.1007/978-3-319-58068-5_20

74. Nasar, Z., Jaffry, S.W., Malik, M.K.: Information extraction from
scientific articles: a survey. Scientometrics 117(3), 1931–1990
(2018). DOI 10.1007/s11192-018-2921-5. URL https://
doi.org/10.1007/s11192-018-2921-5

75. Nguyen, V.B., Svátek, V., Rabby, G., Corcho, Ó.: Ontolo-
gies supporting research-related information foraging using
knowledge graphs: Literature survey and holistic model map-
ping. In: C.M. Keet, M. Dumontier (eds.) Knowledge
Engineering and Knowledge Management - 22nd Interna-
tional Conference, EKAW 2020, Bolzano, Italy, September
16-20, 2020, Proceedings, Lecture Notes in Computer Sci-
ence, vol. 12387, pp. 88–103. Springer (2020). DOI 10.
1007/978-3-030-61244-3_6. URL https://doi.org/
10.1007/978-3-030-61244-3_6

76. Nickel, M., Murphy, K., Tresp, V., Gabrilovich, E.: A review
of relational machine learning for knowledge graphs.
Proc. IEEE 104(1), 11–33 (2016). DOI 10.1109/JPROC.2015.
2483592. URL https://doi.org/10.1109/JPROC.
2015.2483592

77. Oelen, A., Jaradeh, M.Y., Stocker, M., Auer, S.: Generate
FAIR literature surveys with scholarly knowledge graphs. In:
R. Huang, D. Wu, G. Marchionini, D. He, S.J. Cunningham,
P. Hansen (eds.) JCDL ’20: Proceedings of the ACM/IEEE
Joint Conference on Digital Libraries in 2020, Virtual Event,
China, August 1-5, 2020, pp. 97–106. ACM (2020). DOI
10.1145/3383583.3398520. URL https://doi.org/10.
1145/3383583.3398520

78. Okoli, C.: A guide to conducting a standalone systematic liter-
ature review. Commun. Assoc. Inf. Syst. 37, 43 (2015). URL
http://aisel.aisnet.org/cais/vol37/iss1/43

79. Papers with code. https://paperswithcode.com/. Ac-
cessed: 2019-09-12

80. Park, S., Caragea, C.: Scientific keyphrase identification and
classification by pre-trained language models intermediate task
transfer learning. In: D. Scott, N. Bel, C. Zong (eds.) Pro-
ceedings of the 28th International Conference on Computa-
tional Linguistics, COLING 2020, Barcelona, Spain (Online),
December 8-13, 2020, pp. 5409–5419. International Committee
on Computational Linguistics (2020). DOI 10.18653/v1/2020.
coling-main.472. URL https://doi.org/10.18653/
v1/2020.coling-main.472

81. Peng, Y., Yan, S., Lu, Z.: Transfer learning in biomedical nat-
ural language processing: An evaluation of BERT and ELMo
on ten benchmarking datasets. In: D. Demner-Fushman, K.B.
Cohen, S. Ananiadou, J. Tsujii (eds.) Proceedings of the 18th
BioNLP Workshop and Shared Task, BioNLP@ACL 2019, Flo-
rence, Italy, August 1, 2019, pp. 58–65. Association for Compu-
tational Linguistics (2019). DOI 10.18653/v1/w19-5006. URL
https://doi.org/10.18653/v1/w19-5006

82. Peroni, S., Shotton, D.M.: Fabio and cito: Ontologies for describ-
ing bibliographic resources and citations. J. Web Semant. 17, 33–
43 (2012). DOI 10.1016/j.websem.2012.08.001. URL https:
//doi.org/10.1016/j.websem.2012.08.001

83. Pertsas, V., Constantopoulos, P.: Scholarly ontology: mod-
elling scholarly practices. Int. J. Digit. Libr. 18(3), 173–190
(2017). DOI 10.1007/s00799-016-0169-3. URL https://
doi.org/10.1007/s00799-016-0169-3

84. Petasis, G., Karkaletsis, V., Paliouras, G., Krithara, A., Zavit-
sanos, E.: Ontology population and enrichment: State of the
art. In: G. Paliouras, C.D. Spyropoulos, G. Tsatsaronis (eds.)
Knowledge-Driven Multimedia Information Extraction and On-
tology Evolution - Bridging the Semantic Gap, Lecture Notes
in Computer Science, vol. 6050, pp. 134–166. Springer (2011).
DOI 10.1007/978-3-642-20795-2_6. URL https://doi.
org/10.1007/978-3-642-20795-2_6

85. Pineau, J., Vincent-Lamarre, P., Sinha, K., Larivière, V., Beygelz-
imer, A., d’Alché-Buc, F., Fox, E.B., Larochelle, H.: Improving
reproducibility in machine learning research (A report from the
neurips 2019 reproducibility program). CoRR abs/2003.12206
(2020). URL https://arxiv.org/abs/2003.12206

86. Pipino, L.L., Lee, Y.W., Wang, R.Y.: Data quality assessment.
Commun. ACM 45(4), 211–218 (2002). DOI 10.1145/505248.
506010. URL https://doi.org/10.1145/505248.
506010

87. Pujara, J., Singh, S.: Mining knowledge graphs from text. In:
Y. Chang, C. Zhai, Y. Liu, Y. Maarek (eds.) Proceedings of
the Eleventh ACM International Conference on Web Search and
Data Mining, WSDM 2018, Marina Del Rey, CA, USA, February
5-9, 2018, pp. 789–790. ACM (2018). DOI 10.1145/3159652.
3162011. URL https://doi.org/10.1145/3159652.
3162011

88. Q. Zadeh, B., Handschuh, S.: The ACL RD-TEC: A dataset
for benchmarking terminology extraction and classification in
computational linguistics. In: Proceedings of the 4th Inter-
national Workshop on Computational Terminology (Comput-
erm), pp. 52–63. Association for Computational Linguistics
and Dublin City University, Dublin, Ireland (2014). DOI
10.3115/v1/W14-4807. URL https://www.aclweb.org/
anthology/W14-4807

89. QasemiZadeh, B., Schumann, A.: The ACL RD-TEC 2.0:
A language resource for evaluating term extraction and en-
tity recognition methods. In: N. Calzolari, K. Choukri,
T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mar-
iani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis (eds.) Pro-
ceedings of the Tenth International Conference on Language
Resources and Evaluation LREC 2016, Portorož, Slovenia,
May 23-28, 2016. European Language Resources Association
(ELRA) (2016). URL http://www.lrec-conf.org/
proceedings/lrec2016/summaries/681.html

90. Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: Squad: 100,000+
questions for machine comprehension of text. In: J. Su,
X. Carreras, K. Duh (eds.) Proceedings of the 2016 Conference
on Empirical Methods in Natural Language Processing, EMNLP
2016, Austin, Texas, USA, November 1-4, 2016, pp. 2383–
2392. The Association for Computational Linguistics (2016).
DOI 10.18653/v1/d16-1264. URL https://doi.org/10.
18653/v1/d16-1264

91. Richardson, S., Wilson, M., Nishikawa, J., Hayward, R.: The
well-built clinical question: a key to evidence-based decisions.
ACP journal club 123(3), A12–13 (1995)

92. Ruiz-Iniesta, A., Corcho, Ó.: A review of ontologies for de-
scribing scholarly and scientific documents. In: A.G. Cas-
tro, C. Lange, P.W. Lord, R. Stevens (eds.) Proceedings of the
4th Workshop on Semantic Publishing co-located with the 11th
Extended Semantic Web Conference (ESWC 2014), Anissaras,
Greece, May 25th, 2014, CEUR Workshop Proceedings, vol.
1155. CEUR-WS.org (2014). URL http://ceur-ws.org/
Vol-1155/paper-07.pdf

93. Safder, I., Hassan, S., Visvizi, A., Noraset, T., Nawaz, R., Tu-
arob, S.: Deep learning-based extraction of algorithmic meta-
data in full-text scholarly documents. Inf. Process. Manag.
57(6), 102269 (2020). DOI 10.1016/j.ipm.2020.102269. URL
https://doi.org/10.1016/j.ipm.2020.102269

94. Salatino, A.A., Thanapalasingam, T., Mannocci, A., Birukou, A.,
Osborne, F., Motta, E.: The computer science ontology: A com-
prehensive automatically-generated taxonomy of research areas.
Data Intell. 2(3), 379–416 (2020). DOI 10.1162/dint_a_00055.
URL https://doi.org/10.1162/dint_a_00055

95. Say, A., Fathalla, S., Vahdati, S., Lehmann, J., Auer, S.: Se-
mantic representation of physics research data. In: D. Aveiro,
J.L.G. Dietz, J. Filipe (eds.) Proceedings of the 12th Interna-
tional Joint Conference on Knowledge Discovery, Knowledge
Engineering and Knowledge Management, IC3K 2020, Volume
2: KEOD, Budapest, Hungary, November 2-4, 2020, pp. 64–75.
SCITEPRESS (2020). DOI 10.5220/0010111000640075. URL
https://doi.org/10.5220/0010111000640075

96. Singh, M., Barua, B., Palod, P., Garg, M., Satapathy, S., Bushi,
S., Ayush, K., Rohith, K.S., Gamidi, T., Goyal, P., Mukherjee,
A.: OCR++: A robust framework for information extraction from
scholarly articles. In: N. Calzolari, Y. Matsumoto, R. Prasad
(eds.) COLING 2016, 26th International Conference on Com-
putational Linguistics, Proceedings of the Conference: Techni-
cal Papers, December 11-16, 2016, Osaka, Japan, pp. 3390–
3400. ACL (2016). URL https://www.aclweb.org/
anthology/C16-1320/

97. Soldatova, L.N., King, R.D.: An ontology of scientific ex-
periments. Journal of The Royal Society Interface 3(11),
795–803 (2006). DOI 10.1098/rsif.2006.0134. URL
https://royalsocietypublishing.org/doi/
abs/10.1098/rsif.2006.0134

98. Stead, C., Smith, S., Busch, P.A., Vatanasakdakul, S.: Emerald
110k: A multidisciplinary dataset for abstract sentence classification.
In: M. Mistica, M. Piccardi, A. MacKinlay (eds.) Proceedings
of the 17th Annual Workshop of the Australasian Language
Technology Association, ALTA 2019, Sydney, Australia,
December 4-6, 2019, pp. 120–125. Australasian Language Tech-
nology Association (2019). URL https://aclweb.org/
anthology/papers/U/U19/U19-1016/

99. Stocker, M., Prinz, M., Rostami, F., Kempf, T.: Towards research
infrastructures that curate scientific information: A use case in
life sciences. In: S. Auer, M. Vidal (eds.) Data Integration in
the Life Sciences - 13th International Conference, DILS 2018,
Hannover, Germany, November 20-21, 2018, Proceedings, Lec-
ture Notes in Computer Science, vol. 11371, pp. 61–74. Springer
(2018). DOI 10.1007/978-3-030-06016-9_6. URL https:
//doi.org/10.1007/978-3-030-06016-9_6

100. Suchanek, F.M., Gross-Amblard, D., Abiteboul, S.: Water-
marking for ontologies. In: L. Aroyo, C. Welty, H. Alani,
J. Taylor, A. Bernstein, L. Kagal, N.F. Noy, E. Blomqvist
(eds.) The Semantic Web - ISWC 2011 - 10th International
Semantic Web Conference, Bonn, Germany, October 23-27,
2011, Proceedings, Part I, Lecture Notes in Computer Sci-
ence, vol. 7031, pp. 697–713. Springer (2011). DOI 10.
1007/978-3-642-25073-6_44. URL https://doi.org/
10.1007/978-3-642-25073-6_44

101. Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: a core of se-
mantic knowledge. In: C.L. Williamson, M.E. Zurko, P.F. Patel-
Schneider, P.J. Shenoy (eds.) Proceedings of the 16th Interna-
tional Conference on World Wide Web, WWW 2007, Banff, Al-
berta, Canada, May 8-12, 2007, pp. 697–706. ACM (2007). DOI
10.1145/1242572.1242667. URL https://doi.org/10.
1145/1242572.1242667

102. Talburt, J.R.: 2 - principles of information quality. In:
J.R. Talburt (ed.) Entity Resolution and Information Qual-
ity, pp. 39 – 62. Morgan Kaufmann, Boston (2011).
DOI 10.1016/B978-0-12-381972-7.00002-6.
URL http://www.sciencedirect.com/science/
article/pii/B9780123819727000026

103. Teufel, S., Siddharthan, A., Batchelor, C.R.: Towards domain-
independent argumentative zoning: Evidence from chemistry and
computational linguistics. In: Proceedings of the 2009 Con-
ference on Empirical Methods in Natural Language Processing,
EMNLP 2009, 6-7 August 2009, Singapore, A meeting of SIG-
DAT, a Special Interest Group of the ACL, pp. 1493–1502. ACL
(2009). URL https://www.aclweb.org/anthology/
D09-1155/

104. Vahdati, S., Fathalla, S., Auer, S., Lange, C., Vidal, M.: Se-
mantic representation of scientific publications. In: A. Doucet,
A. Isaac, K. Golub, T. Aalberg, A. Jatowt (eds.) Digital Libraries
for Open Knowledge - 23rd International Conference on The-
ory and Practice of Digital Libraries, TPDL 2019, Oslo, Nor-
way, September 9-12, 2019, Proceedings, Lecture Notes in Com-
puter Science, vol. 11799, pp. 375–379. Springer (2019). DOI
10.1007/978-3-030-30760-8_37. URL https://doi.org/
10.1007/978-3-030-30760-8_37

105. Vrandecic, D., Krötzsch, M.: Wikidata: a free collaborative
knowledgebase. Commun. ACM 57(10), 78–85 (2014). DOI
10.1145/2629489. URL https://doi.org/10.1145/
2629489

106. de Waard, A., Tel, G.: The ABCDE format enabling seman-
tic conference proceedings. In: M. Völkel, S. Schaffert (eds.)
SemWiki2006, First Workshop on Semantic Wikis - From Wiki
to Semantics, Proceedings, co-located with the ESWC2006,
Budva, Montenegro, June 12, 2006, CEUR Workshop Pro-
ceedings, vol. 206. CEUR-WS.org (2006). URL http://
ceur-ws.org/Vol-206/paper8.pdf

107. Wang, R.Y., Strong, D.M.: Beyond accuracy: What data qual-
ity means to data consumers. J. Manag. Inf. Syst. 12(4), 5–33
(1996). URL http://www.jmis-web.org/articles/
1002

108. Weikum, G., Dong, L., Razniewski, S., Suchanek, F.M.: Machine
knowledge: Creation and curation of comprehensive knowledge
bases. CoRR abs/2009.11564 (2020). URL https://arxiv.
org/abs/2009.11564

109. Xiong, C., Power, R., Callan, J.: Explicit semantic ranking for
academic search via knowledge graph embedding. In: R. Barrett,
R. Cummings, E. Agichtein, E. Gabrilovich (eds.) Proceedings of
the 26th International Conference on World Wide Web, WWW
2017, Perth, Australia, April 3-7, 2017, pp. 1271–1279. ACM
(2017). DOI 10.1145/3038912.3052558. URL https://doi.
org/10.1145/3038912.3052558

110. Yaman, B., Pasin, M., Freudenberg, M.: Interlinking scigraph
and dbpedia datasets using link discovery and named entity
recognition techniques. In: M. Eskevich, G. de Melo, C. Fäth,
J.P. McCrae, P. Buitelaar, C. Chiarcos, B. Klimek, M. Dojchi-
novski (eds.) 2nd Conference on Language, Data and Knowl-
edge, LDK 2019, May 20-23, 2019, Leipzig, Germany, OASICS,
vol. 70, pp. 15:1–15:8. Schloss Dagstuhl - Leibniz-Zentrum für
Informatik (2019). DOI 10.4230/OASIcs.LDK.2019.15. URL
https://doi.org/10.4230/OASIcs.LDK.2019.15

111. Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J.,
Auer, S.: Quality assessment for linked data: A survey. Seman-
tic Web 7(1), 63–93 (2016). DOI 10.3233/SW-150175. URL
https://doi.org/10.3233/SW-150175

112. Zhang, Y., Wang, M., Saberi, M., Chang, E.: From big schol-
arly data to solution-oriented knowledge repository. Frontiers
Big Data 2, 38 (2019). DOI 10.3389/fdata.2019.00038. URL
https://doi.org/10.3389/fdata.2019.00038

1 Introduction
2 Related work
3 Requirements analysis
4 ORKG construction strategies
5 Conclusions
A Comparative Overviews for Information Extraction Datasets from Scientific Text