API
The Pub2Tools API is consumed by sending a JSON request with HTTP POST. The main endpoint is /api, which on the public instance translates to https://elixir.ut.ee/pub2tools/api.
JSON numbers and booleans are converted to strings internally. JSON objects are ignored (except under bio.tools input), meaning there is no hierarchy in the request JSON structure.
/api
The main endpoint is used for constructing one bio.tools entry candidate based on the given input. It can execute two steps. The key-value pairs in the request JSON fall under two categories: query data and parameters.
Steps
withoutmap
Executes Pub2Tools and outputs tool, which (or parts of which) can be used as input into bio.tools.
The one mandatory input is publicationIds, which contains one or more articles about the tool/database for which a bio.tools entry is to be created. As an optional parameter, the name can also be specified. It is usually not needed, but it helps in the rare case where the tool/database name is not in the abstract (and thus Pub2Tools is unable to find it), and also when Pub2Tools selects the wrong name or the wrong form of the name (capitalization, etc). The other optional parameter is webpageUrls. This can help when the wanted links are not present in the publication (abstract or full text), and also when Pub2Tools fails to associate the extracted link with the tool name (and thus does not select the link).
This step is executed by setting step to "withoutmap".
map
Executes EDAMmap on the "tool" output by the previous withoutmap step.
Input to this step can only be specified as bio.tools input under "tool".
This step is executed by setting step to "map".
all
Executes both withoutmap and map in a row.
This step is executed by setting step to "all". This is the default.
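As a minimal sketch of how a client might pick a step, the following Python helper assembles a request body and rejects combinations that the step descriptions above rule out. The helper name `build_request_body` is hypothetical and not part of Pub2Tools; only the keys `step`, `publicationIds` and `tool` and the three step values come from this documentation.

```python
# Step values listed in this section; "all" is the documented default.
VALID_STEPS = ("withoutmap", "map", "all")


def build_request_body(step="all", publication_ids=None, tool=None):
    """Return a dict suitable for JSON encoding (hypothetical helper)."""
    if step not in VALID_STEPS:
        raise ValueError(f"unknown step: {step!r}")
    # The map step accepts its input only as bio.tools input under "tool".
    if step == "map" and tool is None:
        raise ValueError('the "map" step needs a "tool" object')
    # withoutmap and all need publications, given directly or inside "tool".
    if step != "map" and not publication_ids and tool is None:
        raise ValueError("publicationIds is mandatory for this step")
    body = {"step": step}
    if tool is not None:
        body["tool"] = tool
    else:
        body["publicationIds"] = publication_ids
    return body
```

Setting `step` explicitly even for "all" is a choice made here for readability; leaving it out should have the same effect, since "all" is the default.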
Query data
The query data to be mapped can be supplied in two different ways: as strings or arrays of strings under the field names name, webpageUrls and publicationIds, or as a bio.tools input JSON object (like a bio.tools entry in JSON format). If data is specified in both ways, only the data under the bio.tools input is used.
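The precedence rule above can be illustrated with a small client-side sketch. The function `effective_query_data` is a hypothetical name used only for this illustration; it mirrors the documented rule and makes no claim about how the server implements it.

```python
# Field names of the plain Pub2Tools input, as listed in this section.
PUB2TOOLS_KEYS = ("name", "webpageUrls", "publicationIds")


def effective_query_data(request):
    """Return the part of the request that would be used as query data."""
    # If a bio.tools input object is present, only that object is used.
    if isinstance(request.get("tool"), dict):
        return {"tool": request["tool"]}
    # Otherwise the plain Pub2Tools fields are used.
    return {key: request[key] for key in PUB2TOOLS_KEYS if key in request}
```

For a request that carries both a top-level `name` and a `tool` object, the sketch returns only the `tool` object, which is the behaviour described above.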
Pub2Tools input
The following data can be given, with only the "publicationIds" being mandatory.
| Key | Type | Description |
|---|---|---|
| name | string | Name of tool or service |
| webpageUrls | array of strings | URLs of homepage, etc |
| publicationIds | array of strings/objects | PMID/PMCID/DOI of journal article. Note: an article ID can be specified as a string |
bio.tools input
Under the field name "tool", a JSON object adhering to biotoolsSchema can be specified. All values possible in bio.tools can be specified, but only values relevant to Pub2Tools (and to EDAMmap in case of the map step) will be used. A few attributes are mandatory: name (tool), description and homepage. For the withoutmap and all steps, the publication group must also be non-empty. In case of the map step, the "tool" input will be mirrored under tool in the response, but with found EDAM terms added to it. In case of withoutmap and all, "tool" will be overwritten by the output of Pub2Tools (plus EDAMmap in case of all) as tool in the response.
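A client could check the mandatory attributes before sending a request. The following is a sketch under the assumption that the publication group is stored under the key `publication`, as in the example request later in this section; `missing_tool_attributes` is a hypothetical helper name.

```python
def missing_tool_attributes(tool, step="all"):
    """List mandatory bio.tools input attributes that are absent or empty."""
    missing = [key for key in ("name", "description", "homepage") if not tool.get(key)]
    # The publication group is only required for the withoutmap and all steps.
    if step in ("withoutmap", "all") and not tool.get("publication"):
        missing.append("publication")
    return missing
```

An empty list means the "tool" object satisfies the requirements stated above for the chosen step.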
Parameters
Main
| Parameter | Default | Description |
|---|---|---|
| version | | API version. Currently, only one possible value: |
| type | | Detail level of the response. Possible values: |
| step | | The step to execute. Possible values: “withoutmap”, “map”, “all”. |
Preprocessing
See EDAMmap API preprocessing parameters. Influences -pass1, -pass2, -map.
Fetching
The fetching parameters are implemented in PubFetcher and thus are described in its documentation: Fetching parameters. Influences -fetch-pub, -fetch-web, -pass2, -map.
The defaults of the following fetching parameters have been changed in Pub2Tools API: retryLimit from 3 to 0, timeout from 15000 to 7500 and quick from false to true.
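The changed defaults can be summarised in code. The sketch below also shows how a client might restore the original PubFetcher values for one request. It assumes that fetching parameters are passed as top-level keys of the request JSON, like the other parameters in the examples in this section; `with_pubfetcher_defaults` is a hypothetical helper.

```python
# Defaults as stated above: PubFetcher's own values and the Pub2Tools API overrides.
PUBFETCHER_DEFAULTS = {"retryLimit": 3, "timeout": 15000, "quick": False}
PUB2TOOLS_API_DEFAULTS = {"retryLimit": 0, "timeout": 7500, "quick": True}


def with_pubfetcher_defaults(body):
    """Return a copy of the request body using PubFetcher's original defaults.

    Values already present in the body take precedence.
    """
    merged = dict(PUBFETCHER_DEFAULTS)
    merged.update(body)
    return merged
```

Restoring the PubFetcher values trades response time for more thorough fetching, which is presumably why the API lowers them in the first place.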
Mapping
See EDAMmap API mapping parameters. Influences -map.
Response
The response output can contain more or less information, depending on the specified type and step. The section of most interest is probably tool in core.
core
- success
true (if false, then the JSON output of Error handling applies instead of the one below)
- version
"1"
- type
"core"
- api
URL of endpoint where request was sent
- json
Location of JSON results file
- generator
- time
- query
- id
Unique ID assigned to the query (and by extension, to this response)
- name
Name of tool or service (as specified in query data, null if not specified)
- webpageUrls
Array of strings representing URLs of homepage, etc (as specified in query data, null if not specified)
- publicationIds
Array of objects representing IDs of journal articles (as specified in query data, mandatory)
- pmid
PMID of article
- pmcid
PMCID of article
- doi
DOI of article
- mapping
Only present when step is "map" or "all"
- args
The Parameters
- mainArgs
Main parameters
- edam
Filename of the used EDAM ontology OWL file
- biotools
Filename of the JSON file containing existing bio.tools entries
- processorArgs
- preProcessorArgs
Preprocessing parameters
- fetcherArgs
Fetching parameters (implemented in PubFetcher)
- mapperArgs
Mapping parameters
Only present when step is "map" or "all"
- tool
The bio.tools entry candidate of the tool. In case of the map step, this will have the same content as in the "tool" given as input (with null and empty values removed), but with found EDAM terms added to it. In case of withoutmap and all, this will have the result of Pub2Tools as content (plus EDAM terms in case of all).
Concerning EDAM terms, EDAMmap results from the “topic” branch are added to the topic attribute and results from the “operation” branch are added under a new function group object. Results from the “data” and “format” branches should be added under the "input" and "output" attributes of a function group, however EDAMmap can’t differentiate between inputs and outputs. Thus, new terms from the “data” and “format” branches will be added as strings (in the form "EDAM URI (label)", separated by " | ") to the note of the last function group object.
- status
Potentially useful metadata about the result of Pub2Tools, only present when step is "withoutmap" or "all"
- score
- score2
- score2Parts
- include
- existing
- publicationAndNameExisting
- nameExistingSomePublicationDifferent
- somePublicationExistingNameDifferent
- nameExistingPublicationDifferent
- nameMatch
- linkMatch
- nameWordMatch
- homepageBroken
- homepageMissing
- otherNames
- toolsExtra
If Pub2Tools has found that the given publication(s) are about more than one tool, then the names of these extra tools (besides the primary chosen tool) are output here (along with their homepages in parentheses, if existing)
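Because “data” and “format” terms end up as text in the note of the last function group, a client that wants them back as structured values has to parse that string. The sketch below assumes the note contains only terms in the documented form "EDAM URI (label)" separated by " | ". Any other note text is skipped. `parse_note_terms` is a hypothetical helper, and the sample values in the usage note are for illustration only.

```python
import re

# One term: a URI without spaces, one space, then the label in parentheses.
TERM_PATTERN = re.compile(r"^(?P<uri>\S+) \((?P<label>.*)\)$")


def parse_note_terms(note):
    """Split a function note into (uri, label) pairs; skip non-matching parts."""
    terms = []
    for part in note.split(" | "):
        match = TERM_PATTERN.match(part.strip())
        if match:
            terms.append((match.group("uri"), match.group("label")))
    return terms
```

For example, `parse_note_terms("http://edamontology.org/data_2048 (Report) | http://edamontology.org/format_2330 (Textual format)")` yields two pairs, which could then be sorted manually into the "input" and "output" attributes.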
full
See full in EDAMmap API.
This extra mapping information is only present when step is "map" or "all" and type is set to "full".
Examples
One way to test the API is to send JSON data using curl. For example, for sending the input:
{
"publicationIds": "\t\t10.1093/nar/gkad347"
}
issue the command:
$ curl -H "Content-Type: application/json" -X POST -d '{"publicationIds":"\t\t10.1093/nar/gkad347"}' https://elixir.ut.ee/pub2tools/api
In the output, results can be seen under "tool":
"tool" : {
"name" : "g:Profiler",
...
}
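The same request can also be sent without curl. The following Python sketch uses only the standard library. The function names and the client-side timeout of 300 seconds are choices made for this illustration, not values prescribed by Pub2Tools. The two leading tab characters in the publication ID are copied from the example above, where they appear to leave the PMID and PMCID positions empty so that only a DOI is given.

```python
import json
import urllib.request

API_URL = "https://elixir.ut.ee/pub2tools/api"  # public instance named above


def make_request(payload, url=API_URL):
    """Build (but do not send) a JSON POST request for the API."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post(payload, url=API_URL, timeout=300):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(make_request(payload, url), timeout=timeout) as response:
        return json.load(response)


# Usage (performs a network call, so it is left commented out):
# result = post({"publicationIds": "\t\t10.1093/nar/gkad347"})
# print(result["tool"]["name"])
```

Separating `make_request` from `post` keeps the request construction testable without network access.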
A slightly longer input, which also supplies a documentation URL that Pub2Tools doesn’t find and asks for a few more EDAM terms from all branches:
{
"publicationIds": "\t\t10.1093/nar/gkad347",
"webpageUrls": "https://biit.cs.ut.ee/gprofiler/page/docs",
"branches": [ "topic", "operation", "data", "format" ],
"matches": 6
}
For testing, this input could be saved in a file, e.g. input.json, and then the following command run:
$ curl -H "Content-Type: application/json" -X POST -d '@/path/to/input.json' https://elixir.ut.ee/pub2tools/api
The same input can be broken into two steps: first the Pub2Tools algorithm is run with "withoutmap" and then the EDAMmap algorithm is run with "map". Splitting the run in two can be useful because both steps take time, and feedback is then available as soon as the first step has concluded. It also allows manual editing of the Pub2Tools result before it is fed into EDAMmap. The first step of "withoutmap" is then:
{
"step": "withoutmap",
"publicationIds": "\t\t10.1093/nar/gkad347",
"webpageUrls": "https://biit.cs.ut.ee/gprofiler/page/docs"
}
And then the output "tool" from the first step (after "description" is manually edited) can be copied into the second step of "map" as:
{
"step": "map",
"tool" : {
"name" : "g:Profiler",
"description" : "a web server for functional enrichment analysis and conversions of gene lists",
"homepage" : "https://biit.cs.ut.ee/gprofiler",
"documentation" : [ {
"url" : "https://biit.cs.ut.ee/gprofiler/page/docs",
"type" : [ "User manual" ]
} ],
"publication" : [ {
"doi" : "10.1093/NAR/GKAD347",
"pmid" : "37144459",
"pmcid" : "PMC10320099"
} ],
"credit" : [ {
"name" : "Hedi Peterson",
"email" : "hedi.peterson@ut.ee",
"orcidid" : "https://orcid.org/0000-0001-9951-5116",
"typeEntity" : "Person"
} ],
"confidence_flag" : "high"
},
"branches": [ "topic", "operation", "data", "format" ],
"matches": 6
}
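The hand-over between the two steps can be sketched in Python as a pure function that turns a withoutmap response into a map request. It assumes that "tool" and "success" are top-level keys of the response, as the Response section suggests. `build_map_request` is a hypothetical helper name.

```python
import copy


def build_map_request(withoutmap_response, description=None, **parameters):
    """Create the request for the map step from a withoutmap response."""
    if not withoutmap_response.get("success"):
        raise ValueError("the withoutmap step did not succeed")
    # Work on a copy so the original response stays untouched.
    tool = copy.deepcopy(withoutmap_response["tool"])
    # Optional manual edit of the description before mapping.
    if description is not None:
        tool["description"] = description
    request = {"step": "map", "tool": tool}
    # Remaining keyword arguments become parameters, e.g. branches or matches.
    request.update(parameters)
    return request
```

Called with `matches=6` and `branches=["topic", "operation", "data", "format"]`, the function produces a request of the same shape as the "map" example above.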
Prefetching
See prefetching in EDAMmap API (and replace the edammap API endpoints with pub2tools ones).
Error handling
See error handling in EDAMmap API (and replace the edammap API endpoints with pub2tools ones).
Pub2Tools-Server
The Pub2Tools-Server application will run both the Pub2Tools API and a web application that functions as a frontend for the API.
All command-line arguments suppliable to a Pub2Tools server can be seen with:
$ java -jar pub2tools-server-<version>.jar -h
In addition to Processing and Fetching private parameters, Pub2Tools-Server accepts arguments described in the following table (entries marked with * are mandatory).
| Parameter | Parameter args | Default | Description |
|---|---|---|---|
| | <file path> | | Path of the bio.tools existing content file in JSON format; will be automatically fetched and periodically updated |
| | <file path> | | Path of the EDAM ontology file |
| | <string> | | URI where the server will be deployed (as schema://host:port) |
| | <string> | | Path where the server will be deployed (only one single path segment supported, prepend with ‘/’) |
| | | | Use if we are behind a HTTPS proxy |
| | <directory path> | | A directory where the results will be output. It must also contain required CSS, JavaScript and font resources. Will be created, if missing. |
| | <positive integer> | | How many threads to create (maximum) for fetching individual database entries of one query |
The results directory with required CSS, JavaScript and font resources will be automatically created, if a nonexistent directory path is supplied. Likewise, if --db is used to specify a nonexistent file, an initial empty database for storing fetched webpages, docs and publications is automatically created. And if a nonexistent file is specified using --biotools, the file is created and the entire content of bio.tools is downloaded to it. In any case, the file specified by --biotools is replaced with the up-to-date entire content of bio.tools every 23 hours.
Pub2Tools-Server can now be run with:
$ java -jar pub2tools-server-<version>.jar -b http://127.0.0.1:8080 -p /pub2tools -e EDAM_1.25.owl -f files --fetching true --db server.db --idf biotools.idf --idfStemmed biotools.stemmed.idf --biotools biotools.json --log serverlogs
The web application can now be accessed locally at http://127.0.0.1:8080/pub2tools and the API is at http://127.0.0.1:8080/pub2tools/api. How to obtain the IDF files biotools.idf and biotools.stemmed.idf is described in the setup section of EDAMmap. In contrast to the command-line usage of Pub2Tools, the server does not log to a single log file. Instead, -l or --log defines a directory where daily rotated log files will be stored. The log directory will also contain daily rotated access logs compatible with Apache’s combined format.
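How the base URI and the path combine into the two addresses can be shown with a short sketch. The function `server_urls` is hypothetical and only reproduces the pattern visible in the example command, where the API is served under /api below the deployment path.

```python
def server_urls(base_uri, path):
    """Derive the web application and API addresses from the -b and -p values."""
    # Normalise slashes so "/pub2tools" and "pub2tools" give the same result.
    root = base_uri.rstrip("/") + "/" + path.strip("/")
    return {"web": root, "api": root + "/api"}
```

With the values from the command above, `server_urls("http://127.0.0.1:8080", "/pub2tools")` gives the two local addresses quoted in the text.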
A public instance of Pub2Tools-Server is accessible at https://elixir.ut.ee/pub2tools, with the API at https://elixir.ut.ee/pub2tools/api.