The database snapshot <http://unpaywall.org/products/snapshot>, Simple
Query Tool <http://unpaywall.org/products/simple-query-tool>, REST API
<http://unpaywall.org/products/api>, and Data Feed
<http://unpaywall.org/products/data-feed> products all return
JSON-formatted data. For simplicity, that data is organized under the
same schema in all cases; that schema is informally described on this
page.
Regardless of the source, each record returned consists of one DOI
Object <http://unpaywall.org/data-format#doi-object> containing
resource metadata. Each DOI Object in turn contains a list of zero or
more OA Location Objects.
The DOI object is more or less a row in our main database...it's
everything we know about a given DOI-assigned resource, including
metadata about the resource itself, and information about its OA status.
It includes a list of zero or more OA Location Objects
<http://unpaywall.org/data-format#oa-location-object>, as well as a
best_oa_location property that's probably the OA Location you'll want to
use.
The best OA Location Object
<http://unpaywall.org/data-format#oa-location-object> we could find for
this DOI.
The "best" location is determined using an algorithm that prioritizes
publisher-hosted content first (eg Hybrid or Gold), then prioritizes
versions closer to the version of record (PublishedVersion over
AcceptedVersion), then more authoritative repositories (PubMed Central
Returns null if we couldn't find any OA Locations.
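The prioritization described above can be approximated with a sort key. This is only a rough sketch under stated assumptions, not the actual selection code: the host_type and version values are taken from the OA Location object described later on this page, the repository-authority tiebreak is left out entirely, and pick_best is a hypothetical helper name.

```python
from typing import List, Optional

# Lower rank sorts first. The values follow the host_type and version
# descriptions on this page; anything unrecognized sorts last.
HOST_RANK = {"publisher": 0, "repository": 1}
VERSION_RANK = {"publishedVersion": 0, "acceptedVersion": 1, "submittedVersion": 2}


def pick_best(locations: List[dict]) -> Optional[dict]:
    """Hypothetical helper: return the preferred location, or None if there are none."""
    if not locations:
        return None
    return min(
        locations,
        key=lambda loc: (
            # Publisher-hosted copies come first...
            HOST_RANK.get(loc.get("host_type"), len(HOST_RANK)),
            # ...then versions closer to the version of record.
            VERSION_RANK.get(loc.get("version"), len(VERSION_RANK)),
        ),
    )
```

Returning None for an empty list mirrors the null behaviour noted above.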
Indicates the data collection approaches used for this resource.
1 First-generation hybrid detection. Uses only data from the Crossref
API to determine hybrid status. Does a good job for Elsevier articles
and a few other publishers, but most publishers are not checked for
hybrid.
2 Second-generation hybrid detection. Uses additional sources, checks
all publishers for hybrid. Gets about 10x as much hybrid.
data_standard==2 is the version used in the paper we wrote about the
The DOI of this resource.
This is always lowercase.
The DOI in hyperlink form.
This field simply contains "https://doi.org/" prepended to the doi
field. It expresses the DOI in its correct format according to the
Crossref DOI display guidelines.
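If you only have the bare DOI, the hyperlink form can be rebuilt in one line. A minimal sketch; doi_to_url is a hypothetical helper name and the DOI in the usage note is a made-up placeholder.

```python
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Hypothetical helper: prepend the resolver prefix to a bare DOI."""
    return DOI_RESOLVER + doi
```

For example, doi_to_url("10.1234/example") gives "https://doi.org/10.1234/example".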
The type of resource.
Currently the genre is identical to the Crossref-reported type
<https://api.crossref.org/types> of a given resource. The
"journal-article" type is most common, but there are many others.
Is the item an ancillary part of a journal, like a table of contents?
See here for more information on how we determine whether an article is
Is there an OA copy of this resource?
Convenience attribute; returns true when best_oa_location is not null.
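The same check is easy to reproduce on a parsed record. A small sketch, assuming the record is a plain dict decoded from the JSON; has_oa_copy is a hypothetical helper name.

```python
def has_oa_copy(record: dict) -> bool:
    """Hypothetical helper: true when the record carries a best_oa_location."""
    # A missing key and an explicit JSON null are both treated as "no OA copy".
    return record.get("best_oa_location") is not None
```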
Is this resource published in a DOAJ-indexed <https://doaj.org/>
journal?
Useful for defining whether a resource is Gold OA (depending on your
definition, see also journal_is_oa).
Is this resource published in a completely OA journal?
Useful for defining whether a resource is Gold OA. Includes any fully-OA
journal, regardless of inclusion in DOAJ. This includes journals by
all-OA publishers and journals that would otherwise be all Hybrid or
Bronze OA. See here for more information on OA journals.
Any ISSNs assigned to the journal publishing this resource.
Separate ISSNs are sometimes assigned to print and electronic versions
of the same journal. If there are multiple ISSNs, they are separated by
commas. Example: 1232-1203,1532-6203
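Since the ISSNs arrive as one comma-separated string, you'll usually want to split them before use. A minimal sketch; parse_issns is a hypothetical helper name, and treating a missing value as an empty list is an assumption made here for convenience.

```python
from typing import List, Optional


def parse_issns(journal_issns: Optional[str]) -> List[str]:
    """Hypothetical helper: split a comma-separated ISSN string into a list."""
    if not journal_issns:
        return []
    # Strip stray whitespace and skip empty fragments.
    return [issn.strip() for issn in journal_issns.split(",") if issn.strip()]
```

With the example above, parse_issns("1232-1203,1532-6203") returns ["1232-1203", "1532-6203"].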
A single ISSN for the journal publishing this resource.
An ISSN-L can be used as a primary key for a journal when more than one
ISSN is assigned to it. Resources' journal_issns are mapped to ISSN-Ls
using the issn.org table
with some manual corrections.
The name of the journal publishing this resource.
The same journal may have multiple name strings (eg, "J. Foo", "Journal
of Foo", "JOURNAL OF FOO", etc). These have not been fully normalized
within our database, so use with care.
List of all the OA Location
<http://unpaywall.org/data-format#oa-location-object> objects associated
with this resource.
This list is unnecessary for the vast majority of use-cases, since you
probably just want the best_oa_location. It's included primarily for
The OA status, or color, of this resource.
Classifies OA resources by location and license terms as one of: gold,
hybrid, bronze, green or closed. See here for more information on how we
assign an oa_status.
The date this resource was published.
As reported by the publishers, who unfortunately have inconsistent
definitions of what counts as officially "published." Returned as an
ISO8601-formatted <https://xkcd.com/1179/> timestamp, generally with
The name of this resource's publisher.
Keep in mind that publisher name strings change over time, particularly
as publishers are acquired or split up.
The title of this resource.
It's the title. Pretty straightforward.
Time when the data for this resource was last updated.
Returned as an ISO8601-formatted <https://xkcd.com/1179/> timestamp.
The year this resource was published.
Just the year part of the published_date.
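If you need to derive the year yourself, parsing the date portion of the timestamp is enough. A sketch, assuming the value starts with a YYYY-MM-DD date; published_year is a hypothetical helper name and the date in the usage note is a made-up placeholder.

```python
from datetime import date
from typing import Optional


def published_year(published_date: Optional[str]) -> Optional[int]:
    """Hypothetical helper: extract the year from an ISO 8601 date string."""
    if not published_date:
        return None
    # Only the leading YYYY-MM-DD part is parsed; any time component is ignored.
    return date.fromisoformat(published_date[:10]).year
```

For example, published_year("2017-08-02") returns 2017.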
List of Crossref Contributor objects
The authors of this resource.
These are formatted as a list of Crossref Contributor objects, which are
described in the Crossref API docs here.
OA Location object
The OA Location object describes a particular place where we found a given
OA article. The same article is often available from multiple locations,
and there may be differences in format, version, and license depending
on the location; the OA Location object describes these key attributes.
An OA Location Object is always a child of a DOI Object.
How we found this OA location.
Used for debugging. Don’t depend on the exact contents of this for
anything, because values are subject to change without warning. Example
values:
oa journal (via journal title in doaj): We found the name of the journal
that publishes this article in the DOAJ database.
oa repository (via pmcid lookup): We found this article in an index of
PubMed Central articles.
The type of host that serves this OA location.
There are two possible values:
publisher means this location is served by the article’s publisher (in
practice, this usually means it is hosted on the same domain the DOI
resolves to).
repository means this location is served by an Open Access repository.
Preprint servers are considered repositories even if the DOI resolves
there.
Is this location the best_oa_location for its resource?
See the DOI object's best_oa_location description for more on how we
select which location is "best."
The license under which this copy is published.
We return several types of licenses:
Creative Commons licenses are uniformly abbreviated and lowercased.
Publisher-specific licenses are normalized using this format:
acs-specific: authorchoice/editors choice usage agreement
When we have evidence that an OA license of some kind was used, but it’s
not reported directly on the webpage at this location, this field
OAI-PMH endpoint where we found this location.
This is primarily for internal debugging. It's null for locations that
weren't found using OAI-PMH.
Time when the data for this location was last updated.
Returned as an ISO8601-formatted <https://xkcd.com/1179/> timestamp.
The url_for_pdf if there is one; otherwise landing page URL.
When we can't find a url_for_pdf (or there isn't one), this field uses
the url_for_landing_page, which is a useful fallback for some use cases.
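The fallback amounts to "PDF URL if present, otherwise landing page". A minimal sketch, assuming the location is a plain dict decoded from the JSON; best_url is a hypothetical helper name.

```python
from typing import Optional


def best_url(location: dict) -> Optional[str]:
    """Hypothetical helper: prefer url_for_pdf, fall back to url_for_landing_page."""
    # `or` also covers an explicit null or an empty string for url_for_pdf.
    return location.get("url_for_pdf") or location.get("url_for_landing_page")
```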
The URL for a landing page describing this OA copy.
When the host_type is "publisher" the landing page usually includes HTML
The URL with a PDF version of this OA copy.
Pretty much what it says.
The content version accessible at this location.
We use the DRIVER Guidelines v2.0 VERSION standard
to define versions of a given article; see those docs for complete
definitions of terms. Here's the basic idea, though, for the three
version types we support:
submittedVersion is not yet peer-reviewed.
acceptedVersion is peer-reviewed, but lacks publisher-specific
formatting.
publishedVersion is the version of record.