API

The Pub2Tools API is consumed by sending a JSON request with HTTP POST. The main endpoint is /api, which on the public instance translates to https://elixir.ut.ee/pub2tools/api.

JSON numbers and booleans are converted to strings internally. JSON objects are ignored (except under bio.tools input), meaning there is no hierarchy in the request JSON structure.
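As a rough illustration of the flattening described above, the following Python sketch mimics it on the client side. The function names and the exact string forms (lowercase "true"/"false", plain decimal rendering of numbers) are assumptions made for illustration; they are not taken from the server implementation.

```python
import json

# Keys under which, according to this documentation, JSON objects are not ignored.
OBJECT_KEYS = {"tool", "publicationIds"}


def scalar_to_string(value):
    """Render a JSON number or boolean as a string (assumed forms)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return json.dumps(value)
    return value


def flatten_request(request):
    """Hypothetical mimic of how a request is read: scalars become strings,
    nested objects are dropped unless they sit under a key in OBJECT_KEYS."""
    flat = {}
    for key, value in request.items():
        if key in OBJECT_KEYS:
            flat[key] = value  # objects are honoured here
        elif isinstance(value, dict):
            continue  # objects elsewhere are ignored
        elif isinstance(value, list):
            flat[key] = [scalar_to_string(v) for v in value if not isinstance(v, dict)]
        else:
            flat[key] = scalar_to_string(value)
    return flat
```

Under these assumptions, a request such as {"matches": 6, "extra": {"a": 1}} would be read as if it were {"matches": "6"}.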

/api

The main endpoint is used for constructing one bio.tools entry candidate based on the given input. It can execute two steps, withoutmap and map, either separately or both in a row (all). The key-value pairs in the request JSON fall under two categories: query data and parameters.

Steps

withoutmap

Executes Pub2Tools and outputs tool, which (or parts of which) can be used as input into bio.tools.

The one mandatory input is publicationIds, which contains one or more articles about the tool/database for which a bio.tools entry is to be created. As an optional parameter, the name can also be specified. It is usually not needed, but it helps in the rare case where the tool/database name is not in the abstract (and thus Pub2Tools is unable to find it), and also when Pub2Tools selects the wrong name or the wrong form of the name (capitalization, etc). The other optional parameter is webpageUrls. It helps when the wanted links are not present in the publication (abstract or full text), and also when Pub2Tools fails to associate an extracted link with the tool name (and thus does not select the link).

This step is executed by setting step to "withoutmap".
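A request body for this step could be assembled as in the following Python sketch. The helper name `withoutmap_request` is hypothetical; the keys it emits (step, publicationIds, name, webpageUrls) are the ones described above.

```python
import json


def withoutmap_request(publication_ids, name=None, webpage_urls=None):
    """Build the JSON body for the withoutmap step (hypothetical helper)."""
    # Accept a single ID given as a bare string as well as a list of IDs.
    if isinstance(publication_ids, str):
        publication_ids = [publication_ids]
    if not publication_ids:
        raise ValueError("publicationIds is mandatory")
    body = {"step": "withoutmap", "publicationIds": list(publication_ids)}
    # name and webpageUrls are optional and only sent when supplied.
    if name is not None:
        body["name"] = name
    if webpage_urls:
        body["webpageUrls"] = list(webpage_urls)
    return json.dumps(body)


print(withoutmap_request(["\t\t10.1093/nar/gkad347"], name="g:Profiler"))
```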

map

Executes EDAMmap on the "tool" output by the previous withoutmap step.

Input to this step can only be specified as bio.tools input under "tool".

This step is executed by setting step to "map".

all

Executes both withoutmap and map in a row.

This step is executed by setting step to "all". This is the default.

Query data

The query data to be mapped can be supplied in two different ways: as strings or arrays of strings under the field names name, webpageUrls and publicationIds, or as a bio.tools input JSON object (like a bio.tools entry in JSON format). If data is specified in both ways, only the data under the bio.tools input is used.

Pub2Tools input

The following data can be given, with only the "publicationIds" being mandatory.

Key	Type	Description
name	string	Name of tool or service
webpageUrls	array of strings	URLs of homepage, etc
publicationIds	array of strings/objects	PMID/PMCID/DOI of journal article

Note: an article ID can be specified as a string "<PMID>\t<PMCID>\t<DOI>" or as an object with the keys "pmid", "pmcid" and "doi" (the only place besides bio.tools input where a JSON object is not ignored).
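The two forms of an article ID can be produced as in the sketch below. Both helper names are hypothetical, and leaving unknown IDs out of the object form is an assumption rather than a documented requirement.

```python
def article_id_string(pmid="", pmcid="", doi=""):
    """Tab-separated form "<PMID>\\t<PMCID>\\t<DOI>"; unknown parts stay empty."""
    return "\t".join([pmid, pmcid, doi])


def article_id_object(pmid=None, pmcid=None, doi=None):
    """Object form using the keys "pmid", "pmcid" and "doi".
    Unknown IDs are simply left out (an assumption of this sketch)."""
    fields = {"pmid": pmid, "pmcid": pmcid, "doi": doi}
    return {key: value for key, value in fields.items() if value}


# A DOI-only ID keeps both leading tabs in the string form.
print(repr(article_id_string(doi="10.1093/nar/gkad347")))
print(article_id_object(pmid="37144459", doi="10.1093/nar/gkad347"))
```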

bio.tools input

Under the field name "tool", a JSON object adhering to biotoolsSchema can be specified. All values possible in bio.tools can be specified, but only values relevant to Pub2Tools (and EDAMmap in case of the map step) will be used. A few attributes are mandatory: name (tool), description and homepage. For the withoutmap and all steps, the publication group must also be non-empty. In case of the map step, the "tool" input will be mirrored under tool in the response, but with found EDAM terms added to it. In case of withoutmap and all, "tool" will be overwritten by the output of Pub2Tools (plus EDAMmap in case of all) as tool in the response.
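Before sending a "tool" object, the mandatory attributes listed above can be checked on the client side. The following is a minimal sketch; the function name `check_tool_input` is hypothetical and the check covers only the rules stated in this section, not full biotoolsSchema validation.

```python
def check_tool_input(tool, step="all"):
    """Return a list of problems found in a "tool" object; empty if none."""
    problems = []
    # name, description and homepage are mandatory for every step.
    for attribute in ("name", "description", "homepage"):
        if not tool.get(attribute):
            problems.append(f"missing mandatory attribute: {attribute}")
    # The publication group must be non-empty unless only EDAMmap is run.
    if step in ("withoutmap", "all") and not tool.get("publication"):
        problems.append(f"publication group must be non-empty for step {step}")
    return problems
```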

Parameters

Main

Parameter	Default	Description
version	`"1"`	API version. Currently, only one possible value: `"1"`.
type	`"core"`	Detail level of the response. Possible values: `"core"`, `"full"`. Currently only detail level of EDAMmap output (step `"map"`) is influenced.
step	`"all"`	The step to execute. Possible values: `"withoutmap"`, `"map"`, `"all"`.

Preprocessing

See EDAMmap API preprocessing parameters. Influences -pass1, -pass2, -map.

Fetching

The fetching parameters are implemented in PubFetcher and thus are described in its documentation: Fetching parameters. Influences -fetch-pub, -fetch-web, -pass2, -map.

The defaults of the following fetching parameters have been changed in Pub2Tools API: retryLimit from 3 to 0, timeout from 15000 to 7500 and quick from false to true.
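A client that prefers the original PubFetcher behaviour could send those values explicitly. The sketch below assumes that fetching parameters are accepted as ordinary top-level keys of the request JSON, like the other parameters; the helper name is hypothetical.

```python
import json

# The original PubFetcher defaults, as given in the paragraph above.
PUBFETCHER_DEFAULTS = {"retryLimit": 3, "timeout": 15000, "quick": False}


def with_pubfetcher_defaults(request):
    """Return a copy of the request with the original PubFetcher defaults
    filled in; values already present in the request take precedence."""
    merged = dict(PUBFETCHER_DEFAULTS)
    merged.update(request)
    return merged


body = with_pubfetcher_defaults({"publicationIds": "\t\t10.1093/nar/gkad347"})
print(json.dumps(body, indent=2))
```

Slower, more thorough fetching makes each request take longer, which is presumably why the API ships with the lighter defaults.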

Mapping

See EDAMmap API mapping parameters. Influences -map.

Response

The response output can contain more or less information, depending on the specified type and step. The section of most interest is probably tool in core.

core

success

true (if false, then the JSON output of Error handling applies instead of the one below)

version

"1"

type

"core"

api

URL of endpoint where request was sent

json

Location of JSON results file

generator

See generator in EDAMmap API

time

See time in EDAMmap API

query

id

Unique ID assigned to the query (and by extension, to this response)

name

Name of tool or service (as specified in query data, null if not specified)

webpageUrls

Array of strings representing URLs of homepage, etc (as specified in query data, null if not specified)

publicationIds

Array of objects representing IDs of journal articles (as specified in query data, mandatory)

pmid
PMID of article

pmcid
PMCID of article

doi
DOI of article

mapping

See mapping in EDAMmap API

Only present when step is "map" or "all"

args

The Parameters

mainArgs

Main parameters

edam: Filename of the used EDAM ontology OWL file
biotools: Filename of the JSON file containing existing bio.tools entries

processorArgs

See processorArgs in EDAMmap API

preProcessorArgs

Preprocessing parameters

fetcherArgs

Fetching parameters (implemented in PubFetcher)

mapperArgs

Mapping parameters

Only present when step is "map" or "all"

tool

The bio.tools entry candidate of the tool. In case of the map step, this will have the same content as in the "tool" given as input (with null and empty values removed), but with found EDAM terms added to it. In case of withoutmap and all, this will have the result of Pub2Tools as content (plus EDAM terms in case of all).

Concerning EDAM terms, EDAMmap results from the “topic” branch are added to the topic attribute and results from the “operation” branch are added under a new function group object. Results from the “data” and “format” branches should be added under the "input" and "output" attributes of a function group, however EDAMmap can’t differentiate between inputs and outputs. Thus, new terms from the “data” and “format” branches will be added as strings (in the form "EDAM URI (label)", separated by " | ") to the note of the last function group object.
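The "data" and "format" terms stored in a note can be recovered with a small parser such as the sketch below. It assumes that each term has exactly the form "EDAM URI (label)" and that the URI contains no spaces; anything in the note that does not match is skipped. The function name is hypothetical.

```python
import re

# One term: a URI without spaces, a space, then the label in parentheses.
TERM_PATTERN = re.compile(r"^(?P<uri>\S+) \((?P<label>.*)\)$")


def parse_edam_note(note):
    """Split a function group note on " | " and return (uri, label) pairs."""
    terms = []
    for part in note.split(" | "):
        match = TERM_PATTERN.match(part.strip())
        if match:
            terms.append((match.group("uri"), match.group("label")))
    return terms
```

A curator could then decide, term by term, whether it belongs under "input" or "output" of the function group.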

status

Potentially useful metadata about the result of Pub2Tools, only present when step is "withoutmap" or "all"

score: score
score2: score2
score2Parts: score2_parts
include: include
existing: existing
publicationAndNameExisting: publication_and_name_existing
nameExistingSomePublicationDifferent: name_existing_some_publication_different
somePublicationExistingNameDifferent: some_publication_existing_name_different
nameExistingPublicationDifferent: name_existing_publication_different
nameMatch: name_match
linkMatch: link_match
nameWordMatch: name_word_match
homepageBroken: homepage_broken
homepageMissing: homepage_missing
otherNames: other_suggestions
toolsExtra: If Pub2Tools has found that the given publication(s) are about more than one tool, then the names of these extra tools (besides the primary chosen tool) are output here (along with their homepages in parentheses, if existing)
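A client will typically pick only a few of these fields out of the response. The sketch below assumes that success, tool and status are top-level keys of the response JSON and passes the status values through unchanged, since their exact types are not described here; the function name is hypothetical.

```python
def summarize_response(response):
    """Pick the tool name and a few status fields out of a parsed response."""
    if not response.get("success"):
        # On failure, the JSON described under Error handling applies instead.
        raise RuntimeError("request was not successful")
    tool = response.get("tool") or {}
    # status is only present when step is "withoutmap" or "all".
    status = response.get("status") or {}
    return {
        "name": tool.get("name"),
        "homepageBroken": status.get("homepageBroken"),
        "homepageMissing": status.get("homepageMissing"),
        "toolsExtra": status.get("toolsExtra"),
    }
```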

full

See full in EDAMmap API.

This extra mapping information is only present when step is "map" or "all" and type is set to "full".

Examples

One way to test the API is to send JSON data using curl. For example, for sending the input:

{
  "publicationIds": "\t\t10.1093/nar/gkad347"
}

issue the command:

$ curl -H "Content-Type: application/json" -X POST -d '{"publicationIds":"\t\t10.1093/nar/gkad347"}' https://elixir.ut.ee/pub2tools/api

In the output, results can be seen under "tool":

"tool" : {
  "name" : "g:Profiler",
  ...
}
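The curl command above can also be reproduced from a script. The following Python sketch uses only the standard library; the function names `build_request` and `call_api` are hypothetical, and the response is assumed to be UTF-8 encoded JSON.

```python
import json
import urllib.request

API_URL = "https://elixir.ut.ee/pub2tools/api"


def build_request(payload, url=API_URL):
    """Prepare an HTTP POST request carrying the payload as JSON."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(payload, url=API_URL):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(payload, url)) as response:
        return json.load(response)
```

Calling call_api({"publicationIds": "\t\t10.1093/nar/gkad347"}) sends the same request as the curl command; the tool name can then be read from the "tool" object of the returned dictionary.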

A slightly longer input, which also supplies a documentation URL that Pub2Tools doesn’t find and asks for a few more EDAM terms from all branches:

{
  "publicationIds": "\t\t10.1093/nar/gkad347",
  "webpageUrls": "https://biit.cs.ut.ee/gprofiler/page/docs",
  "branches": [ "topic", "operation", "data", "format" ],
  "matches": 6
}

For testing, this input could be saved in a file, e.g. input.json, and then the following command run:

$ curl -H "Content-Type: application/json" -X POST -d '@/path/to/input.json' https://elixir.ut.ee/pub2tools/api

The same input can be broken into two steps: first the Pub2Tools algorithm is run with "withoutmap" and then the EDAMmap algorithm is run with "map". Splitting the work this way is useful because both steps take time, so feedback is available as soon as the first step has concluded, and because the Pub2Tools result can be edited manually before it is fed into EDAMmap. The first step, "withoutmap", is then:

{
  "step": "withoutmap",
  "publicationIds": "\t\t10.1093/nar/gkad347",
  "webpageUrls": "https://biit.cs.ut.ee/gprofiler/page/docs"
}

And then the output "tool" from the first step (after "description" is manually edited) can be copied into the second step of "map" as:

{
  "step": "map",
  "tool" : {
    "name" : "g:Profiler",
    "description" : "a web server for functional enrichment analysis and conversions of gene lists",
    "homepage" : "https://biit.cs.ut.ee/gprofiler",
    "documentation" : [ {
      "url" : "https://biit.cs.ut.ee/gprofiler/page/docs",
      "type" : [ "User manual" ]
    } ],
    "publication" : [ {
      "doi" : "10.1093/NAR/GKAD347",
      "pmid" : "37144459",
      "pmcid" : "PMC10320099"
    } ],
    "credit" : [ {
      "name" : "Hedi Peterson",
      "email" : "hedi.peterson@ut.ee",
      "orcidid" : "https://orcid.org/0000-0001-9951-5116",
      "typeEntity" : "Person"
    } ],
    "confidence_flag" : "high"
  },
  "branches": [ "topic", "operation", "data", "format" ],
  "matches": 6
}
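Building the second request from the first response can be scripted. In the sketch below, the function name `map_request_from` is hypothetical, and the first response is assumed to be already parsed into a dictionary with the result under the top-level key "tool".

```python
def map_request_from(first_response, edits=None, **parameters):
    """Build a "map" step request from the response of a "withoutmap" step.

    edits holds manually changed attributes (e.g. a rewritten description);
    parameters are passed through as request parameters (e.g. matches=6).
    """
    tool = dict(first_response["tool"])  # shallow copy, the response stays untouched
    tool.update(edits or {})
    request = {"step": "map", "tool": tool}
    request.update(parameters)
    return request
```

For example, map_request_from(response, edits={"description": "..."}, branches=["topic", "operation", "data", "format"], matches=6) would produce a request of the same shape as the one shown above.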

Prefetching

See prefetching in EDAMmap API (and replace the edammap API endpoints with pub2tools ones).

Error handling

See error handling in EDAMmap API (and replace the edammap API endpoints with pub2tools ones).

Pub2Tools-Server

The Pub2Tools-Server application will run both the Pub2Tools API and a web application that functions as a frontend for the API.

All command-line arguments suppliable to a Pub2Tools server can be seen with:

$ java -jar pub2tools-server-<version>.jar -h

In addition to Processing and Fetching private parameters, Pub2Tools-Server accepts arguments described in the following table (entries marked with * are mandatory).

Parameter	Parameter args	Default	Description
`--biotools` *	<file path>		Path of the bio.tools existing content file in JSON format; will be automatically fetched and periodically updated
`--edam` or `-e` *	<file path>		Path of the EDAM ontology file
`--baseUri` or `-b`	<string>	`http://localhost:8080`	URI where the server will be deployed (as schema://host:port)
`--path` or `-p`	<string>	`/pub2tools`	Path where the server will be deployed (only one single path segment supported, prepend with ‘/’)
`--httpsProxy`			Use if we are behind a HTTPS proxy
`--files` or `-f` *	<directory path>		A directory where the results will be output. It must also contain required CSS, JavaScript and font resources. Will be created, if missing.
`--fetchingThreads`	<positive integer>	`8`	How many threads to create (maximum) for fetching individual database entries of one query

The results directory with required CSS, JavaScript and font resources will be automatically created, if a nonexistent directory path is supplied. Likewise, if --db is used to specify a nonexistent file, an initial empty database for storing fetched webpages, docs and publications is automatically created. And if a nonexistent file is specified using --biotools, the file is created and the entire content of bio.tools is downloaded to it. In any case, the file specified by --biotools is replaced with the up-to-date entire content of bio.tools every 23 hours.

Pub2Tools-Server can now be run with:

$ java -jar pub2tools-server-<version>.jar -b http://127.0.0.1:8080 -p /pub2tools -e EDAM_1.25.owl -f files --fetching true --db server.db --idf biotools.idf --idfStemmed biotools.stemmed.idf --biotools biotools.json --log serverlogs

The web application can now be accessed locally at http://127.0.0.1:8080/pub2tools and the API is at http://127.0.0.1:8080/pub2tools/api. How to obtain the IDF files biotools.idf and biotools.stemmed.idf is described in the setup section of EDAMmap. In contrast to the command line usage of Pub2Tools, the server does not log to a single log file. Instead, -l or --log specifies a directory in which daily rotated log files are stored. The log directory will also contain daily rotated access logs compatible with Apache’s combined format.

A public instance of Pub2Tools-Server is accessible at https://elixir.ut.ee/pub2tools, with the API at https://elixir.ut.ee/pub2tools/api.