API

The Pub2Tools API is consumed by sending a JSON request with HTTP POST. The main endpoint is /api, which on the public instance translates to https://elixir.ut.ee/pub2tools/api.

JSON numbers and booleans are converted to strings internally. JSON objects are ignored (except under bio.tools input and for article IDs in publicationIds), meaning there is no hierarchy in the request JSON structure.
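As an illustration of these rules, the following Python sketch approximates how a request is effectively read. `flatten_request` is a hypothetical client-side helper, not part of Pub2Tools, and the string forms "true"/"false" for booleans are an assumption; the server's own conversion may differ in such details.

```python
import json
from typing import Any

# Keys whose object values are kept: "tool" holds bio.tools input and
# publicationIds entries may be objects with "pmid", "pmcid", "doi".
STRUCTURED_KEYS = {"tool", "publicationIds"}


def _as_string(value: Any) -> Any:
    """Convert JSON numbers and booleans to strings; leave other values as-is."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"  # assumed string form
    if isinstance(value, (int, float)):
        return json.dumps(value)  # e.g. 6 -> "6"
    return value


def flatten_request(request: dict[str, Any]) -> dict[str, Any]:
    """Approximate the server's reading of a request (illustration only)."""
    result: dict[str, Any] = {}
    for key, value in request.items():
        if key in STRUCTURED_KEYS:
            result[key] = value  # objects are allowed here
        elif isinstance(value, dict):
            continue  # any other object is ignored: no hierarchy
        elif isinstance(value, list):
            result[key] = [_as_string(v) for v in value if not isinstance(v, dict)]
        else:
            result[key] = _as_string(value)
    return result
```

With this reading, {"matches": 6} and {"matches": "6"} end up identical, and a nested object under any other key simply disappears.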

/api

The main endpoint is used for constructing one bio.tools entry candidate based on the given input. It can execute two steps, either separately or both in a row. The key-value pairs in the request JSON fall into two categories: query data and parameters.

Steps

withoutmap

Executes Pub2Tools and outputs tool, a bio.tools entry candidate which, in whole or in part, can be used as input to bio.tools.

The one mandatory input is publicationIds, which contains one or more articles about the tool or database for which a bio.tools entry is to be created. The name can optionally be specified as well. It is usually not needed, but it helps in the rare case where the tool or database name is not in the abstract (so Pub2Tools is unable to find it), and also when Pub2Tools selects the wrong name or the wrong form of the name (capitalization, etc). The other optional input is webpageUrls. It helps when the wanted links are not present in the publication (abstract or full text), and also when Pub2Tools fails to associate an extracted link with the tool name (and thus does not select the link).

This step is executed by setting step to "withoutmap".

map

Executes EDAMmap on the "tool" output by the previous withoutmap step.

Input to this step can only be specified as bio.tools input under "tool".

This step is executed by setting step to "map".

all

Executes both withoutmap and map in a row.

This step is executed by setting step to "all". This is the default.

Query data

The query data can be supplied in two different ways: as strings or arrays of strings under the field names name, webpageUrls and publicationIds, or as a bio.tools input JSON object under "tool" (like a bio.tools entry in JSON format). If data is specified in both ways, only the data under the bio.tools input is used.

Pub2Tools input

The following data can be given, with only the "publicationIds" being mandatory.

Key             Type                       Description
name            string                     Name of tool or service
webpageUrls     array of strings           URLs of homepage, etc
publicationIds  array of strings/objects   PMID/PMCID/DOI of journal article

Note: an article ID can be specified as a string "<PMID>\t<PMCID>\t<DOI>" (with unknown parts left empty, as in "\t\t10.1093/nar/gkad347") or as an object (the only place besides bio.tools input where a JSON object is not ignored), in which the keys "pmid", "pmcid" and "doi" can be used.
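The two article ID forms can be produced with small helpers such as the following hypothetical Python functions (they are not part of any Pub2Tools client). Unknown parts are left empty in the string form and omitted from the object form; omitting keys in the object form is an assumption based on the wording "can be used".

```python
from typing import Optional


def publication_id_string(pmid: Optional[str] = None,
                          pmcid: Optional[str] = None,
                          doi: Optional[str] = None) -> str:
    """Build the tab-separated PMID, PMCID, DOI form; unknown parts stay empty."""
    if not (pmid or pmcid or doi):
        raise ValueError("at least one of pmid, pmcid, doi is required")
    return "\t".join(part or "" for part in (pmid, pmcid, doi))


def publication_id_object(pmid: Optional[str] = None,
                          pmcid: Optional[str] = None,
                          doi: Optional[str] = None) -> dict[str, str]:
    """Build the object form, keeping only the keys that have a value."""
    obj = {k: v for k, v in (("pmid", pmid), ("pmcid", pmcid), ("doi", doi)) if v}
    if not obj:
        raise ValueError("at least one of pmid, pmcid, doi is required")
    return obj
```

For example, publication_id_string(doi="10.1093/nar/gkad347") yields the same value as the "\t\t10.1093/nar/gkad347" used in the Examples.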

bio.tools input

Under the field name "tool", a JSON object adhering to biotoolsSchema can be specified. All values possible in bio.tools can be specified, but only values relevant to Pub2Tools (and to EDAMmap in the case of the map step) will be used. A few attributes are mandatory: name (of the tool), description and homepage. For the withoutmap and all steps, the publication group must also be non-empty. For the map step, the "tool" input is mirrored under tool in the response, with found EDAM terms added to it. For withoutmap and all, the "tool" input is overwritten by the output of Pub2Tools (plus EDAMmap in the case of all) as tool in the response.
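Before sending bio.tools input, a client could check the mandatory attributes listed above. The sketch below uses a hypothetical function, `check_tool_input`, that covers only those rules; it is not a biotoolsSchema validator.

```python
from typing import Any

# Attributes that must be present and non-empty in the "tool" input.
MANDATORY_TOOL_ATTRIBUTES = ("name", "description", "homepage")


def check_tool_input(tool: dict[str, Any], step: str = "all") -> list[str]:
    """Return a list of problems with a bio.tools input object for a step.

    Only the mandatory attributes described above are checked.
    """
    problems = [f'missing "{attr}"' for attr in MANDATORY_TOOL_ATTRIBUTES
                if not tool.get(attr)]
    # withoutmap and all additionally need at least one publication.
    if step in ("withoutmap", "all") and not tool.get("publication"):
        problems.append(f'"publication" must be non-empty for step "{step}"')
    return problems
```

An empty list means the object passes these checks; the server may still reject it for other reasons.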

Parameters

Main

Parameter  Default   Description
version    "1"       API version. Currently, there is only one possible value: "1".
type       "core"    Detail level of the response. Possible values: "core", "full". Currently, only the detail level of the EDAMmap output (step "map") is affected.
step       "all"     The step to execute. Possible values: "withoutmap", "map", "all".
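Since the main parameters have defaults, sending them is optional. A client may still want to fill them in and catch typos such as "maps" before a request is sent; `with_main_defaults` below is a hypothetical helper doing that with the values from the table. Values are compared as strings, matching the conversion of JSON numbers to strings.

```python
from typing import Any

# Defaults and allowed values of the main parameters, from the table above.
MAIN_DEFAULTS = {"version": "1", "type": "core", "step": "all"}
ALLOWED = {
    "version": {"1"},
    "type": {"core", "full"},
    "step": {"withoutmap", "map", "all"},
}


def with_main_defaults(request: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of request with main parameters filled in and checked."""
    merged = {**MAIN_DEFAULTS, **request}
    for key, allowed in ALLOWED.items():
        if merged[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}, got {merged[key]!r}")
    return merged
```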

Preprocessing

See EDAMmap API preprocessing parameters. Influences -pass1, -pass2, -map.

Fetching

The fetching parameters are implemented in PubFetcher and thus are described in its documentation: Fetching parameters. Influences -fetch-pub, -fetch-web, -pass2, -map.

The defaults of the following fetching parameters have been changed in Pub2Tools API: retryLimit from 3 to 0, timeout from 15000 to 7500 and quick from false to true.
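To get back the original PubFetcher behaviour for these three parameters, a request can set them explicitly. The sketch below assumes that fetching parameters are passed as top-level request keys named like the parameters themselves, in the same way "branches" and "matches" are passed in the Examples; `restore_pubfetcher_defaults` is a hypothetical helper.

```python
from typing import Any

# Fetching defaults changed by the Pub2Tools API, as stated above:
# name -> (PubFetcher default, Pub2Tools API default).
CHANGED_FETCHING_DEFAULTS = {
    "retryLimit": (3, 0),
    "timeout": (15000, 7500),
    "quick": (False, True),
}


def restore_pubfetcher_defaults(request: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of request that sets the PubFetcher defaults for the
    changed parameters, unless the request already sets them."""
    result = dict(request)
    for name, (pubfetcher_default, _api_default) in CHANGED_FETCHING_DEFAULTS.items():
        result.setdefault(name, pubfetcher_default)
    return result
```

Values already present in the request take precedence. More retries, a longer timeout and quick disabled will likely make responses slower.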

Mapping

See EDAMmap API mapping parameters. Influences -map.

Response

The response output can contain more or less information, depending on the specified type and step. The section of most interest is probably tool in core.
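A client will typically check success before reading tool. `extract_tool` below is a hypothetical Python helper for an already decoded core response; it treats anything other than a literal true under success as failure.

```python
from typing import Any


def extract_tool(response: dict[str, Any]) -> dict[str, Any]:
    """Return the bio.tools entry candidate under "tool" in a core response.

    Raises RuntimeError if success is not true; the shape of such error
    responses is described under Error handling.
    """
    if response.get("success") is not True:
        raise RuntimeError(f"Pub2Tools API request failed: {str(response)[:500]}")
    tool = response.get("tool")
    if not isinstance(tool, dict):
        raise KeyError('no "tool" object in response')
    return tool
```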

core

success

true (if false, then the JSON output of Error handling applies instead of the one below)

version

"1"

type

"core"

api

URL of endpoint where request was sent

json

Location of JSON results file

generator

See generator in EDAMmap API

time

See time in EDAMmap API

query
  id              Unique ID assigned to the query (and by extension, to this response)
  name            Name of tool or service (as specified in query data, null if not specified)
  webpageUrls     Array of strings representing URLs of homepage, etc (as specified in query data, null if not specified)
  publicationIds  Array of objects representing IDs of journal articles (as specified in query data, mandatory)
    pmid          PMID of article
    pmcid         PMCID of article
    doi           DOI of article

mapping

See mapping in EDAMmap API

Only present when step is "map" or "all"

args

The Parameters

mainArgs

Main parameters

edam

Filename of the used EDAM ontology OWL file

biotools

Filename of the JSON file containing existing bio.tools entries

processorArgs

See processorArgs in EDAMmap API

preProcessorArgs

Preprocessing parameters

fetcherArgs

Fetching parameters (implemented in PubFetcher)

mapperArgs

Mapping parameters

Only present when step is "map" or "all"

tool

The bio.tools entry candidate of the tool. In case of the map step, this will have the same content as in the "tool" given as input (with null and empty values removed), but with found EDAM terms added to it. In case of withoutmap and all, this will have the result of Pub2Tools as content (plus EDAM terms in case of all).

Concerning EDAM terms, EDAMmap results from the "topic" branch are added to the topic attribute and results from the "operation" branch are added under a new function group object. Results from the "data" and "format" branches should be added under the "input" and "output" attributes of a function group; however, EDAMmap cannot differentiate between inputs and outputs. Thus, new terms from the "data" and "format" branches are added as strings (in the form "EDAM URI (label)", separated by " | ") to the note of the last function group object.

status

Potentially useful metadata about the result of Pub2Tools, only present when step is "withoutmap" or "all"

score

score

score2

score2

score2Parts

score2_parts

include

include

existing

existing

publicationAndNameExisting

publication_and_name_existing

nameExistingSomePublicationDifferent

name_existing_some_publication_different

somePublicationExistingNameDifferent

some_publication_existing_name_different

nameExistingPublicationDifferent

name_existing_publication_different

nameMatch

name_match

linkMatch

link_match

nameWordMatch

name_word_match

homepageBroken

homepage_broken

homepageMissing

homepage_missing

otherNames

other_suggestions

toolsExtra

If Pub2Tools has found that the given publication(s) are about more than one tool, then the names of these extra tools (besides the primary chosen tool) are output here, along with their homepages in parentheses, if existing.

full

See full in EDAMmap API.

This extra mapping information is only present when step is "map" or "all" and type is set to "full".
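When processing the function note described under tool, the "EDAM URI (label)" items can be recovered by splitting on " | ". The hypothetical parser below assumes that EDAM URIs start with http://edamontology.org/ and silently skips any other text in the note.

```python
import re

# One "EDAM URI (label)" item; the greedy label group lets a label that
# itself contains parentheses still match up to the final ")".
ITEM = re.compile(r"^(\S+) \((.*)\)$")
EDAM_PREFIX = "http://edamontology.org/"  # assumed URI prefix


def parse_edam_note(note: str) -> list[tuple[str, str]]:
    """Split a function note into (EDAM URI, label) pairs."""
    pairs = []
    for part in note.split(" | "):
        match = ITEM.match(part.strip())
        if match and match.group(1).startswith(EDAM_PREFIX):
            pairs.append((match.group(1), match.group(2)))
    return pairs
```

The resulting pairs can then be sorted by hand into the "input" and "output" attributes, which EDAMmap cannot do itself.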

Examples

One way to test the API is to send JSON data using curl. For example, for sending the input:

{
  "publicationIds": "\t\t10.1093/nar/gkad347"
}

issue the command:

$ curl -H "Content-Type: application/json" -X POST -d '{"publicationIds":"\t\t10.1093/nar/gkad347"}' https://elixir.ut.ee/pub2tools/api

In the output, results can be seen under "tool":

"tool" : {
  "name" : "g:Profiler",
  ...
}
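The same request can be sent from Python with only the standard library. This is a minimal sketch: `build_request` and `call_api` are hypothetical names, and the 600-second timeout is an arbitrary, generous assumption since a query can take a while. If the server answers with a non-2xx status code, urlopen raises HTTPError instead of returning the body.

```python
import json
import urllib.request
from typing import Any

API_URL = "https://elixir.ut.ee/pub2tools/api"  # public instance


def build_request(payload: dict[str, Any], url: str = API_URL) -> urllib.request.Request:
    """Build the same JSON POST that the curl command above sends."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(payload: dict[str, Any], url: str = API_URL,
             timeout: float = 600.0) -> dict[str, Any]:
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(payload, url), timeout=timeout) as resp:
        return json.load(resp)
```

Calling call_api({"publicationIds": "\t\t10.1093/nar/gkad347"}) returns the decoded response, with the entry candidate under "tool" as above.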

A slightly longer input, which also supplies a documentation URL that Pub2Tools does not find on its own and asks for more EDAM terms from all branches:

{
  "publicationIds": "\t\t10.1093/nar/gkad347",
  "webpageUrls": "https://biit.cs.ut.ee/gprofiler/page/docs",
  "branches": [ "topic", "operation", "data", "format" ],
  "matches": 6
}

For testing, this input could be saved in a file, e.g. input.json, and then the following command run:

$ curl -H "Content-Type: application/json" -X POST -d '@/path/to/input.json' https://elixir.ut.ee/pub2tools/api

The same input can be split into two steps: first, the Pub2Tools algorithm is run with "withoutmap", and then the EDAMmap algorithm is run with "map". Splitting is useful because both steps take time, so feedback is available as soon as the first step has finished. It also allows the Pub2Tools result to be edited manually before it is fed into EDAMmap. The first, "withoutmap" step is then:

{
  "step": "withoutmap",
  "publicationIds": "\t\t10.1093/nar/gkad347",
  "webpageUrls": "https://biit.cs.ut.ee/gprofiler/page/docs"
}

The "tool" output of the first step (here with "description" manually edited) can then be copied into the second, "map" step:

{
  "step": "map",
  "tool" : {
    "name" : "g:Profiler",
    "description" : "a web server for functional enrichment analysis and conversions of gene lists",
    "homepage" : "https://biit.cs.ut.ee/gprofiler",
    "documentation" : [ {
      "url" : "https://biit.cs.ut.ee/gprofiler/page/docs",
      "type" : [ "User manual" ]
    } ],
    "publication" : [ {
      "doi" : "10.1093/NAR/GKAD347",
      "pmid" : "37144459",
      "pmcid" : "PMC10320099"
    } ],
    "credit" : [ {
      "name" : "Hedi Peterson",
      "email" : "hedi.peterson@ut.ee",
      "orcidid" : "https://orcid.org/0000-0001-9951-5116",
      "typeEntity" : "Person"
    } ],
    "confidence_flag" : "high"
  },
  "branches": [ "topic", "operation", "data", "format" ],
  "matches": 6
}
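The two-step workflow above can also be scripted. The sketch below is hypothetical: `run_in_two_steps` takes any `post` function that sends one JSON body to the API and returns the decoded response (for example one built on urllib.request), which also makes it testable without network access. As in the example, mapping parameters are only sent with the map step.

```python
from typing import Any, Callable, Optional

Json = dict[str, Any]


def run_in_two_steps(post: Callable[[Json], Json],
                     query: Json,
                     map_params: Optional[Json] = None,
                     edit: Optional[Callable[[Json], Json]] = None) -> Json:
    """Run "withoutmap", optionally edit the resulting tool, then run "map"."""
    first = post({**query, "step": "withoutmap"})
    if first.get("success") is not True:
        raise RuntimeError("withoutmap step failed")
    tool = first["tool"]
    if edit is not None:
        tool = edit(tool)  # e.g. manually improve the description
    # "step" and "tool" are set last so map_params cannot override them.
    second = post({**(map_params or {}), "tool": tool, "step": "map"})
    if second.get("success") is not True:
        raise RuntimeError("map step failed")
    return second["tool"]
```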

Prefetching

See prefetching in EDAMmap API (and replace the edammap API endpoints with pub2tools ones).

Error handling

See error handling in EDAMmap API (and replace the edammap API endpoints with pub2tools ones).

Pub2Tools-Server

The Pub2Tools-Server application will run both the Pub2Tools API and a web application that functions as a frontend for the API.

All command-line arguments accepted by Pub2Tools-Server can be listed with:

$ java -jar pub2tools-server-<version>.jar -h

In addition to Processing and Fetching private parameters, Pub2Tools-Server accepts arguments described in the following table (entries marked with * are mandatory).

Parameter           Parameter args      Default                 Description
--biotools *        <file path>                                 Path of the bio.tools existing content file in JSON format; automatically fetched and periodically updated
--edam or -e *      <file path>                                 Path of the EDAM ontology file
--baseUri or -b     <string>            http://localhost:8080   URI where the server will be deployed (as scheme://host:port)
--path or -p        <string>            /pub2tools              Path where the server will be deployed (only a single path segment is supported; prepend with '/')
--httpsProxy                                                    Use if the server is behind an HTTPS proxy
--files or -f *     <directory path>                            Directory where the results will be output. It must also contain the required CSS, JavaScript and font resources. Created if missing.
--fetchingThreads   <positive integer>  8                       Maximum number of threads to create for fetching individual database entries of one query

The results directory, with the required CSS, JavaScript and font resources, is created automatically if a nonexistent directory path is supplied. Likewise, if --db specifies a nonexistent file, an initial empty database for storing fetched webpages, docs and publications is created automatically. If a nonexistent file is specified with --biotools, the file is created and the entire content of bio.tools is downloaded to it. In any case, the file specified by --biotools is replaced with the up-to-date entire content of bio.tools every 23 hours.

Pub2Tools-Server can now be run with:

$ java -jar pub2tools-server-<version>.jar -b http://127.0.0.1:8080 -p /pub2tools -e EDAM_1.25.owl -f files --fetching true --db server.db --idf biotools.idf --idfStemmed biotools.stemmed.idf --biotools biotools.json --log serverlogs

The web application can now be accessed locally at http://127.0.0.1:8080/pub2tools, and the API at http://127.0.0.1:8080/pub2tools/api. How to obtain the IDF files biotools.idf and biotools.stemmed.idf is described in the setup section of EDAMmap. Unlike command-line Pub2Tools, the server does not log to a single log file; instead, -l or --log sets a directory where log files, rotated daily, are stored. The log directory also contains daily rotated access logs compatible with Apache's combined log format.

A public instance of Pub2Tools-Server is accessible at https://elixir.ut.ee/pub2tools, with the API at https://elixir.ut.ee/pub2tools/api.