The Web Service API
Pathway Commons integrates a number of pathway and molecular interaction databases supporting BioPAX and PSI-MI formats into one large BioPAX model, which can be queried using our web API (documented below). This API can be used by computational biologists to download custom subsets of Pathway Commons for analysis, or to incorporate powerful biological pathway and network information retrieval and query functionality into websites, scripts and software. For computational biologists looking for comprehensive biological pathway data for analysis, we also provide batch downloads of the data in several formats. Feel free to send feedback to pc-info *a-t* pathwaycommons.org and tell us more about yourself, how you use the web service, and any feature requests or bug reports. We ask that you limit access to at most five concurrent connections per IP address and a few requests per second, to avoid slowing down the service for other users. Repeatedly overloading the server could lead to IP address blocking. If this is an issue, let us know, as we will work to add capacity based on demand. For more Pathway Commons features, including links to data archives and user-friendly tools for biologists, please visit our homepage at pathwaycommons.org.
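To stay within the requested limits, a client can space out its own requests. The sketch below is a hypothetical helper (the `Throttle` class and its 0.5-second default are our assumptions, not part of the service) that enforces a minimum interval between calls; the clock and sleep functions are injectable so the logic can be tested without waiting. Capping concurrency at five connections could additionally use the standard `threading.BoundedSemaphore(5)`.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Keeps successive requests at least `min_interval` seconds apart.

    Hypothetical client-side helper; 0.5 s means at most ~2 requests/second.
    """

    def __init__(self, min_interval: float = 0.5,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # start time of the previous request

    def wait(self) -> float:
        """Sleep if needed before the next request; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay  # the request starts after the delay
        return delay
```

Calling `throttle.wait()` immediately before each HTTP request is enough; the first call never sleeps.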
Metadata, views, etc.
There are a number of "undocumented" URLs (subject to change without notice) providing metadata, files, scripts and images for creating and maintaining this website. Nevertheless, advanced users may find the following examples useful:
- /idmapping - can map selected biological identifiers to UniProt or ChEBI primary IDs (one way);
- /help/ - returns a tree of Help objects describing the main commands, parameters, BioPAX types, and properties, e.g., /help/schema, /help/commands, /help/types;
- /log/ - service access summary, e.g., /log/TOTAL/geography/world, /log/timeline;
- /[rdf:ID] - every BioPAX object's URI in this resource is a resolvable URL, because the current XML base, http://purl.org/pc2/6/, redirects to the web service base URL, http://www.pathwaycommons.org/pc2/. For example, the URL http://purl.org/pc2/6/pid is by design equivalent to the query http://www.pathwaycommons.org/pc2/get?uri=http://purl.org/pc2/6/pid (which gets the BioPAX RDF/XML representation of the Provenance object).
For more information, please contact us.
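As an illustration of the /[rdf:ID] rule above, the following sketch rewrites a Pathway Commons PURL into the explicit `get` query it is equivalent to. The constants assume the URI namespace and the service base URL quoted above; both may change between releases, and `purl_to_get_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed values, taken from the text above; they may change between releases.
XML_BASE = "http://purl.org/pc2/6/"
SERVICE_BASE = "http://www.pathwaycommons.org/pc2/"


def purl_to_get_url(uri: str) -> str:
    """Return the '/get' query URL equivalent to a Pathway Commons PURL."""
    if not uri.startswith(XML_BASE):
        raise ValueError(f"not a Pathway Commons URI: {uri}")
    # urlencode percent-encodes ':' and '/' inside the parameter value.
    return SERVICE_BASE + "get?" + urlencode({"uri": uri})
```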
Parameters 'source', 'uri', and 'target' require URIs of existing BioPAX elements. These are either standard Identifiers.org URLs (for most canonical biological entities and controlled vocabularies) or Pathway Commons-generated http://purl.org/pc2/6/<localID> URLs (for most BioPAX Entities and Xrefs). BioPAX object URIs used by this service are not easy to guess, so they should be discovered using web service commands such as search, top_pathways, and other queries (i.e., get some objects of interest first). For example, even though the current URI namespace is http://purl.org/pc2/6/ and the service is located at http://www.pathwaycommons.org/pc2/, one should not normally request http://www.pathwaycommons.org/pc2/foo, http://purl.org/pc2/6/foo, or http://www.pathwaycommons.org/pc2/get?uri=http://purl.org/pc2/6/foo unless the corresponding BioPAX individual exists.
However, HUGO gene symbols, SwissProt, RefSeq, Ensembl, and NCBI Gene (positive integer) IDs, as well as ChEBI, ChEMBL, KEGG Compound, DrugBank, PharmGKB Drug, and PubChem Compound or Substance IDs (which must be prefixed with 'CID:' or 'SID:' to distinguish them from each other and from NCBI Gene IDs), are also accepted in place of full URIs in get and graph queries. As a rule of thumb, full URIs make a precise query, whereas identifiers make a more exploratory one (the service maps the identifiers to UniProt and then searches for the corresponding Xref URIs).
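A minimal sketch of preparing values for the 'uri', 'source', and 'target' parameters according to the rules above: full URIs pass through unchanged (precise query), PubChem numbers receive the mandatory 'CID:' or 'SID:' prefix, and other supported IDs pass as-is (exploratory query). The namespace labels 'pubchem-compound' and 'pubchem-substance' and both function names are illustrative assumptions, not service vocabulary.

```python
# Hypothetical namespace labels mapped to the prefixes the service requires.
PUBCHEM_PREFIX = {"pubchem-compound": "CID:", "pubchem-substance": "SID:"}


def query_id(value: str, namespace: str = "") -> str:
    """Prefix bare PubChem numbers; return every other value unchanged."""
    prefix = PUBCHEM_PREFIX.get(namespace.lower(), "")
    if prefix and not value.startswith(prefix):
        return prefix + value
    return value


def is_precise(value: str) -> bool:
    """A full http(s) URI makes a precise query; an ID makes an exploratory one."""
    return value.startswith(("http://", "https://"))
```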
Normally, instead of submitting a typically complex URL query via the browser address bar, one should find or develop a convenient bioinformatics application (such as Cytoscape, PCViz, or ChIBE) or a script that uses the web API through a standard client-side library. Nevertheless, this page includes web links one can simply click to submit an example query and view the results. This works because the examples are simple queries whose parameters, such as a long URI or Lucene query string, were properly (manually) URL-encoded. We also recommend using the HTTP POST method instead of GET, to avoid errors at the browser or web server layers (e.g., caching, encoding, or overly long URLs). Finally, URIs are case-sensitive and contain no spaces.
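Following the recommendation to prefer POST, here is a standard-library sketch that builds (but does not send) a form-encoded POST request; list values become repeated parameters such as uri=...&uri=.... The base URL and the helper name `post_request` are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request

SERVICE_BASE = "http://www.pathwaycommons.org/pc2/"  # assumed service base URL


def post_request(command: str, params: dict) -> Request:
    """Build a form-encoded POST request for a command such as 'get' or 'graph'."""
    # doseq=True expands list values into repeated key=value pairs.
    body = urlencode(params, doseq=True).encode("ascii")
    return Request(SERVICE_BASE + command, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})
```

Passing the returned object to `urllib.request.urlopen` would actually submit it.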
Search (/search)
A full-text search of this BioPAX database using the Lucene query syntax. The index fields (case-sensitive) comment, ecnumber, keyword, name, pathway, term, xrefdb, xrefid, dataSource, and organism can optionally be used in a query string (some of these are BioPAX properties, while others are composite relationships). For example, the pathway index field helps find pathway participants by keywords that match their parent pathway names or identifiers; xrefid finds objects by matching their direct Xrefs or Xrefs attached to a child element; keyword, the default search field, is a large aggregate that includes all BioPAX properties of an element and of its nested elements (e.g., a Complex can be found by the name or EC number of one of its members). Search results can be filtered by data provider (datasource parameter), organism, and instantiable BioPAX class (type). Search can be used to select starting points for graph traversal queries (with the '/graph', '/traverse', and '/get' commands). Search strings are case-insensitive unless put inside quotes.
Output: the specified (or first) page of the ordered list of BioPAX individuals that match the search criteria (the page size is configured on the server and returned with every result as an attribute). The results (hits) are returned as a Search Response XML Schema instance (XML document). JSON format can be requested by ending the command with '.json' (e.g., '/search.json') or by setting the HTTP request header 'Accept: application/json' (how to set it depends on one's client-side library).
- q= [Required] a keyword, name, external identifier, or a Lucene query string.
- page=N [Optional] (N>=0, default is 0). Search results are paginated to avoid overloading the search response; this sets the page number to return.
- datasource= [Optional] filter by data source (use names or URIs of pathway data sources or of any existing Provenance object). If multiple data source values are specified, a union of hits from specified sources is returned. For example, datasource=reactome&datasource=pid returns hits associated with Reactome or PID.
- organism= [Optional] organism filter. The organism can be specified either by official name, e.g. "homo sapiens", or by NCBI taxonomy identifier, e.g. "9606". As with data sources, if multiple organisms are declared, a union of all hits from the specified organisms is returned. For example, 'organism=9606&organism=10090' returns results for both human and mouse. Note the officially supported species.
- type= [Optional] BioPAX class filter (values)
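The search parameters above can be assembled as in this sketch, which builds a GET URL with repeated 'datasource' and 'organism' filters (each returning a union, as described) and optionally targets '/search.json'. The base URL, the `search_url` name, and the example values are assumptions for illustration; the encoding step matters because Lucene strings with spaces, colons, or quotes must be URL-encoded.

```python
from urllib.parse import urlencode

SERVICE_BASE = "http://www.pathwaycommons.org/pc2/"  # assumed service base URL


def search_url(q, page=0, datasource=(), organism=(), type_=None, as_json=False):
    """Build a '/search' (or '/search.json') GET URL from the documented parameters."""
    params = [("q", q), ("page", page)]
    params += [("datasource", d) for d in datasource]  # union of sources
    params += [("organism", o) for o in organism]      # union of organisms
    if type_:
        params.append(("type", type_))
    command = "search.json" if as_json else "search"
    return SERVICE_BASE + command + "?" + urlencode(params)
```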
- A basic text search. This query returns, in XML, all entities that contain the keyword "Q06609"
- Same query returned in JSON format
- This query returns entities that match "Q06609" only in the 'xrefid' index field, in XML
- Search for Pathways containing "Q06609" (search all fields), return JSON
- Search for ProteinReference entries that contain the "brca2" keyword in any indexed field; return only human proteins from the NCI Pathway Interaction Database
- Similar to the search above, but searches specifically in the "name" field
- This query uses wildcard notation to match any Control interactions that have a word starting with "brc" in any of their indexed fields. The results are restricted to human interactions from the Reactome database.
- An example of pagination. This query returns the fourth page (page=3) of all elements that have an indexed word starting with "a"
- This query finds Control interactions that contain the word "binding" but not "transcription" in their indexed fields, explicitly requesting the first page.
- This query finds all interactions that directly or indirectly participate in a pathway that has a keyword match for "immune".
- This query returns all Reactome pathways
- This query lists all organisms, including secondary organisms such as pathogens or model organisms listed in the evidence or interaction objects
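Because results are paginated, retrieving every hit means requesting successive pages. The sketch below walks all pages, given any function that fetches and parses one page of JSON. It assumes the JSON response exposes the total hit count as 'numHits', the server-side page size as 'maxHitsPerPage', and the hits as 'searchHit'; these field names are our assumption and should be checked against the Search Response schema.

```python
import math
from typing import Callable, Iterator


def iter_hits(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield hits from every page of a search result.

    `fetch_page(n)` must return the parsed JSON of page n. The field names
    'numHits', 'maxHitsPerPage' and 'searchHit' are assumptions.
    """
    first = fetch_page(0)
    yield from first.get("searchHit", [])
    per_page = first.get("maxHitsPerPage") or 0
    total = first.get("numHits") or 0
    if per_page <= 0:
        return
    # Pages are numbered from 0; page 0 has already been consumed.
    for n in range(1, math.ceil(total / per_page)):
        yield from fetch_page(n).get("searchHit", [])
```

Combining this with a request throttle keeps a full crawl within the service's usage limits.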
Get (/get)
Retrieves an object model for one or several BioPAX elements, such as a pathway, interaction, or physical entity, given their URIs. The get command retrieves only the specified elements and all their child BioPAX elements (one can use the traverse query to obtain parent elements).
- uri= [Required] a valid, existing BioPAX element's URI (RDF ID); for utility classes that were "normalized", such as entity references and controlled vocabularies, it is usually an Identifiers.org URL. Multiple identifiers are allowed per query, for example, 'uri=http://identifiers.org/uniprot/Q06609&uri=http://identifiers.org/uniprot/Q549Z0'. See also the note about Identifiers.org.
- format= [Optional] output format (values)
Output: the BioPAX (default) representation of the record(s) pointed to by the given URI(s). Other output formats are produced on demand by converting from BioPAX and can be specified using the optional format parameter. Please be advised that some output formats may produce a "no result found" error if the conversion is not applicable to the particular BioPAX result. For example, BINARY_SIF output is only possible if the retrieved set contains some interactions, complexes, or pathways.
- This command returns the BioPAX representation of http://identifiers.org/uniprot/Q06609 (a ProteinReference object)
- Get by HUGO gene symbol COL5A1: returns the xrefs in BioPAX format. Note: unlike the first example, this is in fact a two-step query, which internally performs id-mapping and then gets the COL5A1 and P20908 xrefs by their resolved absolute URIs. A query like this can be a quick test before submitting an ID to a much slower '/graph' query: if '/get' returns no result, then the ID won't contribute to any graph query result either (even though there may be BioPAX entities with the ID in their names or comments, which can be found with the '/search' command).
- Get the Signaling by BMP Pathway (REACT_12034.2, format: BioPAX, source: Reactome)
Graph (/graph)
Graph searches are useful for finding connections and neighborhoods of elements, such as the shortest path between two proteins or the neighborhood of a particular protein state or of all its states. Graph searches consider detailed BioPAX semantics, such as generics or nested complexes, and traverse the graph accordingly. The starting points can be physical entities, entity references, or xrefs. In the latter two cases, the graph search starts from ALL the physical entities that belong to that particular canonical reference, i.e., from all the molecular states. Note that we integrate BioPAX data from multiple databases based on our protein and small molecule data warehouse, and we consistently normalize UnificationXref, EntityReference, Provenance, BioSource, and ControlledVocabulary objects when we are absolutely sure that two objects of the same type are equivalent. We do not, however, merge physical entities and reactions from different sources, as accurately matching and aligning pathways at that level is still an open research problem. As a result, graph searches can return several similar but disconnected sub-networks that correspond to the pathway data from different providers (though some physical entities often refer to the same small molecule or protein reference or controlled vocabulary).
- kind= [Required] graph query (values)
- source= [Required] source object's URI/ID. Multiple source URIs/IDs are allowed per query, for example 'source=http://identifiers.org/uniprot/Q06609&source=http://identifiers.org/uniprot/Q549Z0'. See note about URIs.
- target= [Required for PATHSFROMTO graph query] target URI/ID. Multiple target URIs are allowed per query; for example 'target=http://identifiers.org/uniprot/Q06609&target=http://identifiers.org/uniprot/Q549Z0'. See note about URIs.
- direction= [Optional, for NEIGHBORHOOD and COMMONSTREAM algorithms] - graph search direction (values).
- limit= [Optional] graph query search distance limit (default = 1).
- format= [Optional] output format (values)
- datasource= [Optional] datasource filter (same as for 'search').
- organism= [Optional] organism filter (same as for 'search').
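A sketch of assembling a '/graph' query from the parameters above, including two checks the text implies: PATHSFROMTO needs at least one target, and 'direction' applies only to NEIGHBORHOOD and COMMONSTREAM. The base URL and the `graph_url` helper are assumptions; allowed 'kind', 'direction', and 'format' values should be taken from the value lists on this page.

```python
from urllib.parse import urlencode

SERVICE_BASE = "http://www.pathwaycommons.org/pc2/"  # assumed service base URL
DIRECTED_KINDS = {"NEIGHBORHOOD", "COMMONSTREAM"}    # kinds that accept 'direction'


def graph_url(kind, sources, targets=(), direction=None, limit=1, fmt=None):
    """Build a '/graph' GET URL; sources/targets may be URIs or supported IDs."""
    kind = kind.upper()
    if kind == "PATHSFROMTO" and not targets:
        raise ValueError("PATHSFROMTO requires at least one target")
    if direction and kind not in DIRECTED_KINDS:
        raise ValueError("direction applies only to NEIGHBORHOOD and COMMONSTREAM")
    params = [("kind", kind)]
    params += [("source", s) for s in sources]
    params += [("target", t) for t in targets]
    if direction:
        params.append(("direction", direction))
    params.append(("limit", limit))  # search distance limit, default 1
    if fmt:
        params.append(("format", fmt))
    return SERVICE_BASE + "graph?" + urlencode(params)
```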
Output: By default, graph queries return a complete BioPAX representation of the subnetwork matched by the algorithm. Other output formats are available as specified by the optional format parameter. Please be advised that some output format choices might cause a "no result found" error if the conversion is not applicable to the BioPAX result (e.g., BINARY_SIF output fails if there are no interactions, complexes, or pathways in the retrieved set).
Examples: Neighborhood of COL5A1 (P20908, CO5A1_HUMAN):
- This query finds the BioPAX nearest neighborhood of the protein reference http://identifiers.org/uniprot/P20908, i.e., all reactions where the corresponding protein forms participate; returned in the Simple Interaction Format (SIF)
- This query finds the distance-1 neighborhood of P20908: starting from the corresponding Xref, it finds all reactions that its owners (e.g., a protein reference) and their states (protein forms) participate in, and returns the BioPAX model.
- A similar query using the gene symbol COL5A1 instead of a URI or UniProt ID (this also implies internal id-mapping to primary UniProt IDs). Compared with the above examples, particularly the first one, a query like this can return a larger subnetwork, as it may start its graph traversal from several unification and relationship Xrefs rather than from the single specified ProteinReference (http://identifiers.org/uniprot/P20908). One can mix identifier types: submit a URI along with, e.g., UniProt, RefSeq, NCBI Gene, and Ensembl IDs in a single /graph or /get query; other identifiers may also work by chance (if present in the database), but are not generally supported. See the notes about URIs and id-mapping.
Traverse (/traverse)
Provides XPath-like access to this BioPAX database. With '/traverse', users can explicitly state the paths they would like to access. The format of the path parameter value is [Initial Class]/[property1]:[classRestriction(optional)]/[property2]... A "*" sign after a property instructs the path accessor to traverse that property transitively. For example, the following path accessor traverses all physical entity components within a complex: Complex/component*/entityReference/xref:UnificationXref. The following lists the display names of all participants of interactions that are pathway components of a pathway: Pathway/pathwayComponent:Interaction/participant*/displayName. The optional classRestriction limits the returned property values to a certain subclass of the property's range; in the first example above, it is used to get only the unification xrefs. Path accessors can use all the official BioPAX properties as well as additional derived classes and properties, such as inverse properties and interfaces that represent anonymous union classes in OWL (see the Paxtools documentation for more details).
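To make the path syntax concrete, the following sketch splits a path into its initial class and property steps, recording each step's optional class restriction and the transitive "*" marker. It only parses the string on the client side; it is not Paxtools' PathAccessor and evaluates nothing. `Step` and `parse_path` are hypothetical names.

```python
from typing import List, NamedTuple, Optional, Tuple


class Step(NamedTuple):
    prop: str                   # BioPAX property name, e.g. 'component'
    restriction: Optional[str]  # class after ':' limiting returned values
    transitive: bool            # True when '*' follows the property name


def parse_path(path: str) -> Tuple[str, List[Step]]:
    """Split '[InitialClass]/prop1[:Class]/prop2*...' into its parts."""
    initial, *rest = path.split("/")
    steps = []
    for token in rest:
        name, _, restriction = token.partition(":")
        transitive = name.endswith("*")
        steps.append(Step(name.rstrip("*"), restriction or None, transitive))
    return initial, steps
```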
- uri= [Required] a BioPAX element URI, specified as for the '/get' command above. Multiple IDs are allowed (uri=...&uri=...&uri=...).
- path= [Required] a BioPAX property path in the form of property1[:type1]/property2[:type2]; see properties, inverse properties, Paxtools, org.biopax.paxtools.controller.PathAccessor.
Output: XML result according to the Search Response XML Schema (TraverseResponse type; pagination is disabled to return all values at once)
- This query returns the display name of the organism of the ProteinReference specified by the URI.
- This query returns the URI of the organism for each of the Protein References
- This query returns the names of all states of RAD51 protein (by its ProteinReference URI, using property path="ProteinReference/entityReferenceOf:Protein/name")
- This query returns the URIs of states of BRCA1_HUMAN
- This query returns the names of several different objects (using abstract type 'Named' from Paxtools API)
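A sketch of building a '/traverse' GET URL for the RAD51 example above, using the UniProt ProteinReference URI for Q06609 (RAD51_HUMAN). Note how the path's "/" and ":" characters are percent-encoded. The base URL and the `traverse_url` name are assumptions.

```python
from urllib.parse import urlencode

SERVICE_BASE = "http://www.pathwaycommons.org/pc2/"  # assumed service base URL


def traverse_url(uris, path):
    """Build a '/traverse' GET URL; each URI becomes a repeated 'uri' parameter."""
    params = [("uri", u) for u in uris]
    params.append(("path", path))
    return SERVICE_BASE + "traverse?" + urlencode(params)
```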
Top pathways (/top_pathways)
Returns all "top" pathways - pathways that are neither 'controlled' nor a 'pathwayComponent' of another biological process.
- datasource= [Optional] filter by data source (same as for 'search').
- organism= [Optional] organism filter (same as for 'search').
Output: XML document described by the Search Response XML Schema (SearchResponse type; pagination is disabled to return all top pathways at once)
ID mapping (/idmapping)
Maps bioentity identifiers to the corresponding primary UniProt or ChEBI accessions. Currently supported are HUGO gene symbols, UniProt (SwissProt AC and ID), RefSeq, Ensembl, and NCBI Gene identifiers; and ChEBI, ChEMBL, KEGG Compound, DrugBank, PharmGKB Drug, and PubChem IDs (a PubChem ID must be prefixed with either 'CID:' or 'SID:' to distinguish compounds from substances and from NCBI Gene IDs). You can mix different standard ID types in one query. This is NOT an all-purpose id-mapping system: it maps only to the canonical reference proteins and small molecules that may exist in this database. It was originally designed to improve BioPAX data integration and to allow graph queries to accept not only URIs but also selected IDs. The mapping table is derived from Swiss-Prot (DR fields) and ChEBI (OBO) data, and from custom mapping files (e.g., based on UniChem).
Output: Simple JSON format.
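A sketch of calling '/idmapping' and checking which inputs were not mapped. Two things here are assumptions not stated on this page: that the IDs are passed as repeated 'id' parameters, and that the simple JSON output is an object keyed by input ID with the primary accession as value. The helper names are hypothetical; the COL5A1 to P20908 pair comes from the examples above.

```python
import json
from urllib.parse import urlencode

SERVICE_BASE = "http://www.pathwaycommons.org/pc2/"  # assumed service base URL


def idmapping_url(ids):
    """Build an '/idmapping' URL; the 'id' parameter name is an assumption."""
    return SERVICE_BASE + "idmapping?" + urlencode([("id", i) for i in ids])


def unmapped(ids, response_text):
    """Return input IDs absent from a JSON object of input -> accession
    (assumed response shape)."""
    mapping = json.loads(response_text)
    return [i for i in ids if i not in mapping]
```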
Officially supported organisms
We intend to integrate pathway data only for the following species:
Additional organisms may be pulled in due to interactions with entities from any of the above organisms, but are not otherwise supported. This means that we don’t comprehensively collect information for unsupported organisms and we have not cleaned or converted such data due to the high risk of introducing errors and artifacts. All BioSource objects can be found by using this search query.
Output Format ('format'):
For detailed descriptions of these formats, see output format description.
Graph Type ('kind'):
Graph Directions ('direction'):
BioPAX Properties and Restrictions:
Listed below is a summary of BioPAX properties as defined in the Paxtools model: domain, property name, range, and restrictions (if any). For example, the entry XReferrable xref Xref D:ControlledVocabulary=UnificationXref D:Provenance=UnificationXref,PublicationXref means that values of ControlledVocabulary.xref can only be of the UnificationXref type (and values of Provenance.xref only of the UnificationXref or PublicationXref types).
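The property summary format can be parsed mechanically. This sketch reads one entry, such as the xref example above, into its domain, property, range, and a mapping from each restricted domain subclass to its allowed value types. The format is inferred from that single example, and `parse_property_line` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def parse_property_line(line: str) -> Tuple[str, str, str, Dict[str, List[str]]]:
    """Parse 'Domain property Range [D:Subclass=Allowed1,Allowed2 ...]'."""
    domain, prop, range_, *restrictions = line.split()
    allowed: Dict[str, List[str]] = {}
    for item in restrictions:
        if not item.startswith("D:"):
            raise ValueError(f"unexpected token: {item}")
        # 'D:Provenance=UnificationXref,PublicationXref' -> subclass, types
        subclass, _, values = item[2:].partition("=")
        allowed[subclass] = values.split(",")
    return domain, prop, range_, allowed
```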
Inverse BioPAX Object Properties (a feature of the Paxtools library):
Some BioPAX object properties can be traversed in the inverse direction, e.g., 'xref' and 'xrefOf'. Unlike for the standard xref property, the restriction XReferrable xref Xref D:ControlledVocabulary=UnificationXref D:Provenance=UnificationXref,PublicationXref below must be read right to left, as it is actually about Xref.xrefOf: RelationshipXref.xrefOf can contain neither ControlledVocabulary (any subclass) nor Provenance objects (in other words, vocabularies and provenance objects may not have relationship xrefs).
Error responses
If an error occurs, or no results are found, while processing a user's request, the client receives a standard HTTP error response with the corresponding status code (not 200 OK) and a message (browsers usually display an error page; web clients should check the status before processing the results). In addition to the general use of standard HTTP errors, the service returns the following four important error responses by design:
- 452 - Bad Request (illegal or no arguments).
- 460 - No Results (e.g., when a search or graph query found no data).
- 500 - Internal Server Error (usually a java exception).
- 503 - Server is temporarily unavailable due to regular maintenance.
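A client should check the status code before using a response. This standard-library sketch treats 460 (No Results) as an empty answer rather than a failure and turns other errors into readable messages; that policy, the 60-second timeout, and the helper names are our assumptions. `urllib` raises `HTTPError` for any non-2xx status that is not a handled redirect, including the non-standard 452 and 460 codes.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

# Status codes documented above.
PC_ERRORS = {
    452: "Bad Request (illegal or no arguments)",
    460: "No Results",
    500: "Internal Server Error",
    503: "Temporarily unavailable (maintenance)",
}


def describe_status(code: int) -> str:
    """Human-readable text for a status code returned by the service."""
    return PC_ERRORS.get(code, f"HTTP {code}")


def fetch(url: str, timeout: float = 60):
    """Return the response body, or None when the service reports 460 No Results."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except HTTPError as err:
        if err.code == 460:
            return None
        raise RuntimeError(describe_status(err.code)) from err
```

A 503 response indicates scheduled maintenance, so retrying later is reasonable.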