Note

Created: 2020-07-23 10:28:20 Platform: Email Email
Title Unpaywall Schema
Note: Data Format

Overview

The database snapshot <http://unpaywall.org/products/snapshot>, Simple Query Tool <http://unpaywall.org/products/simple-query-tool>, REST API <http://unpaywall.org/products/api>, and Data Feed <http://unpaywall.org/products/data-feed> products all return JSON-formatted data. For simplicity, that data is organized under the same schema in all cases; that schema is informally described on this page.

Regardless of the source, each record returned consists of one DOI Object <http://unpaywall.org/data-format#doi-object> containing resource metadata. Each DOI Object in turn contains a list of zero or more OA Location Objects <http://unpaywall.org/data-format#oa-location-object>.

DOI object

The DOI object is more or less a row in our main database: it's everything we know about a given DOI-assigned resource, including metadata about the resource itself and information about its OA status. It includes a list of zero or more OA Location Objects <http://unpaywall.org/data-format#oa-location-object>, as well as a best_oa_location property that's probably the OA Location you'll want to use.

best_oa_location (Object|null)
The best OA Location Object <http://unpaywall.org/data-format#oa-location-object> we could find for this DOI. The "best" location is determined using an algorithm that prioritizes publisher-hosted content first (e.g. Hybrid or Gold), then versions closer to the version of record (PublishedVersion over AcceptedVersion), then more authoritative repositories (PubMed Central over CiteSeerX). Returns null if we couldn't find any OA Locations.

data_standard (Integer)
Indicates the data collection approaches used for this resource. Possible values:
  1: First-generation hybrid detection. Uses only data from the Crossref API to determine hybrid status. Does a good job for Elsevier articles and a few other publishers, but most publishers are not checked for hybrid.
  2: Second-generation hybrid detection. Uses additional sources and checks all publishers for hybrid. Finds about 10x as much hybrid content. data_standard==2 is the version used in the paper we wrote about the dataset.

doi (String)
The DOI of this resource. This is always lowercase.

doi_url (String)
The DOI in hyperlink form. This field simply contains "https://doi.org/" prepended to the doi field. It expresses the DOI in its correct format according to the Crossref DOI display guidelines <https://www.crossref.org/display-guidelines/>.

genre (String)
The type of resource. Currently the genre is identical to the Crossref-reported type <https://api.crossref.org/types> of a given resource. The "journal-article" type is most common, but there are many others.

is_paratext (Boolean)
Is the item an ancillary part of a journal, like a table of contents? See here for more information on how we determine whether an article is paratext: <https://support.unpaywall.org/support/solutions/articles/44001894783>

is_oa (Boolean)
Is there an OA copy of this resource? Convenience attribute; returns true when best_oa_location is not null.

journal_is_in_doaj (Boolean)
Is this resource published in a DOAJ-indexed <https://doaj.org/> journal? Useful for defining whether a resource is Gold OA (depending on your definition; see also journal_is_oa).

journal_is_oa (Boolean)
Is this resource published in a completely OA journal? Useful for defining whether a resource is Gold OA. Includes any fully-OA journal, regardless of inclusion in DOAJ. This includes journals by all-OA publishers and journals that would otherwise be all Hybrid or Bronze OA. See here for more information on OA journals: <https://support.unpaywall.org/a/solutions/articles/44001792752-how-do-we-decide-if-a-given-journal-is-fully-oa->

journal_issns (String)
Any ISSNs assigned to the journal publishing this resource. Separate ISSNs are sometimes assigned to print and electronic versions of the same journal. If there are multiple ISSNs, they are separated by commas.
Example: 1232-1203,1532-6203

journal_issn_l (String)
A single ISSN for the journal publishing this resource. An ISSN-L can be used as a primary key for a journal when more than one ISSN is assigned to it. Resources' journal_issns are mapped to ISSN-Ls using the issn.org table <https://www.issn.org/understanding-the-issn/assignment-rules/the-issn-l-for-publications-on-multiple-media/>, with some manual corrections.

journal_name (String)
The name of the journal publishing this resource. The same journal may have multiple name strings (e.g. "J. Foo", "Journal of Foo", "JOURNAL OF FOO", etc.). These have not been fully normalized within our database, so use with care.

oa_locations (List)
List of all the OA Location <http://unpaywall.org/data-format#oa-location-object> objects associated with this resource. This list is unnecessary for the vast majority of use cases, since you probably just want the best_oa_location. It's included primarily for research purposes.

oa_status (String)
The OA status, or color, of this resource. Classifies OA resources by location and license terms as one of: gold, hybrid, bronze, green or closed. See here for more information on how we assign an oa_status: <https://support.unpaywall.org/support/solutions/articles/44001777288-what-do-the-types-of-oa-status-green-gold-hybrid-and-bronze-mean->

published_date (String|Null)
The date this resource was published, as reported by the publishers, who unfortunately have inconsistent definitions of what counts as officially "published." Returned as an ISO8601-formatted <https://xkcd.com/1179/> timestamp, generally with only year-month-day.

publisher (String)
The name of this resource's publisher. Keep in mind that publisher name strings change over time, particularly as publishers are acquired or split up.

title (String)
The title of this resource. It's the title. Pretty straightforward.

updated (String)
Time when the data for this resource was last updated. Returned as an ISO8601-formatted <https://xkcd.com/1179/> timestamp.
Example: 2017-08-17T23:43:27.753663

year (Integer|Null)
The year this resource was published. Just the year part of the published_date.

z_authors (List of Crossref Contributor objects)
The authors of this resource. These are formatted as a list of Crossref Contributor objects, which are described in the Crossref API docs here: <https://github.com/CrossRef/rest-api-doc/blob/master/api_format.md#contributor>

OA Location object

The OA Location object describes a particular place where we found a given OA article. The same article is often available from multiple locations, and there may be differences in format, version, and license depending on the location; the OA Location object describes these key attributes. An OA Location Object is always a child of a DOI Object <http://unpaywall.org/data-format#doi-object>.

evidence (String)
How we found this OA location. Used for debugging. Don’t depend on the exact contents of this for anything, because values are subject to change without warning. Example values:
  oa journal (via journal title in doaj): We found the name of the journal that publishes this article in the DOAJ database.
  oa repository (via pmcid lookup): We found this article in an index of PubMed Central articles.

host_type (String)
The type of host that serves this OA location. There are two possible values:
  publisher: this location is served by the article’s publisher (in practice, this usually means it is hosted on the same domain the DOI resolves to).
  repository: this location is served by an Open Access repository. Preprint servers are considered repositories even if the DOI resolves there.

is_best (Boolean)
Is this location the best_oa_location for its resource? See the DOI object's best_oa_location description for more on how we select which location is "best."

license (String|Null)
The license under which this copy is published. We return several types of licenses:
  Creative Commons licenses are uniformly abbreviated and lowercased. Example: cc-by-nc
  Publisher-specific licenses are normalized using this format: acs-specific: authorchoice/editors choice usage agreement
  When we have evidence that an OA license of some kind was used, but it’s not reported directly on the webpage at this location, this field returns implied-oa

pmh_id (String|Null)
OAI-PMH endpoint where we found this location. This is primarily for internal debugging. It's Null for locations that weren't found using OAI-PMH.

updated (String)
Time when the data for this location was last updated. Returned as an ISO8601-formatted <https://xkcd.com/1179/> timestamp.
Example: 2017-08-17T23:43:27.753663

url (String)
The url_for_pdf if there is one; otherwise the landing page URL. When we can't find a url_for_pdf (or there isn't one), this field uses the url_for_landing_page, which is a useful fallback for some use cases.

url_for_landing_page (String)
The URL for a landing page describing this OA copy. When the host_type is "publisher" the landing page usually includes HTML fulltext.

url_for_pdf (String|Null)
The URL with a PDF version of this OA copy. Pretty much what it says.

version (String)
The content version accessible at this location. We use the DRIVER Guidelines v2.0 VERSION standard <https://wiki.surfnet.nl/display/DRIVERguidelines/DRIVER-VERSION+Mappings> to define versions of a given article; see those docs for complete definitions of terms. Here's the basic idea, though, for the three version types we support:
  submittedVersion is not yet peer-reviewed.
  acceptedVersion is peer-reviewed, but lacks publisher-specific formatting.
  publishedVersion is the version of record.
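
To show how the DOI object and its OA Location fields fit together, here is a minimal Python sketch. It does not call any Unpaywall product; it works on a hand-written sample record that only mirrors the field descriptions above. The DOI, the URLs, and the helper names (best_url, issn_list, last_updated) are assumptions made up for illustration and are not part of the schema.

```python
from datetime import datetime

# Hypothetical record, hand-written to match the field descriptions above.
# It is NOT real Unpaywall output; the DOI and URLs are placeholders.
SAMPLE_RECORD = {
    "doi": "10.1234/example.doi",
    "doi_url": "https://doi.org/10.1234/example.doi",
    "is_oa": True,
    "journal_issns": "1232-1203,1532-6203",
    "updated": "2017-08-17T23:43:27.753663",
    "best_oa_location": {
        "host_type": "repository",
        "is_best": True,
        "license": "cc-by-nc",
        "url": "https://repository.example.org/item/1.pdf",
        "url_for_landing_page": "https://repository.example.org/item/1",
        "url_for_pdf": "https://repository.example.org/item/1.pdf",
        "version": "acceptedVersion",
    },
}
SAMPLE_RECORD["oa_locations"] = [SAMPLE_RECORD["best_oa_location"]]


def best_url(record):
    """Return the URL of the best OA copy, or None when there is no OA location."""
    location = record.get("best_oa_location")
    if location is None:
        return None
    # Per the schema, `url` is the PDF URL if there is one,
    # otherwise the landing page URL, so no extra fallback is needed here.
    return location["url"]


def issn_list(record):
    """Split the comma-separated journal_issns string into a list of ISSNs."""
    issns = record.get("journal_issns")
    return issns.split(",") if issns else []


def last_updated(record):
    """Parse the ISO8601 `updated` timestamp (needs Python 3.7+)."""
    return datetime.fromisoformat(record["updated"])


if __name__ == "__main__":
    print(best_url(SAMPLE_RECORD))
    print(issn_list(SAMPLE_RECORD))
    print(last_updated(SAMPLE_RECORD).date())
```

Checking best_oa_location for null before reading url matches the description of is_oa above: a record with no OA copy has a null best_oa_location, so best_url returns None rather than raising.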
Tags:
DOIs in Note:
Attachments