Pathway Commons integrates a number of pathway and molecular interaction databases supporting BioPAX and PSI-MI formats into one large BioPAX model, which can be queried using our web API (documented below). Computational biologists can use this API to download custom subsets of pathway data for analysis, and developers can use it to add biological pathway and network retrieval and query functionality to websites and software. For computational biologists looking for comprehensive biological pathway data, we also provide data archives in several formats. To avoid being banned, please do not exceed ten concurrent connections or several requests per second from a single IP address. We can add capacity based on demand. For more information and help, please visit our homepage at beta.pathwaycommons.org. Feel free to tell us more about yourself and your project.
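A client can respect the connection guidance above on its own side. The following is a minimal Python sketch, not part of any Pathway Commons library: `Throttle` is a hypothetical helper, and its defaults (10 concurrent calls, 0.5 s between call starts, i.e. about two requests per second) are our conservative reading of the limits stated above, not values published by the service.

```python
import threading
import time


class Throttle:
    """Hypothetical client-side limiter.

    Allows at most `max_concurrent` calls in flight and spaces the start of
    consecutive calls by at least `min_interval` seconds.
    """

    def __init__(self, max_concurrent=10, min_interval=0.5):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._min_interval = min_interval
        self._last_start = float("-inf")  # no call has started yet

    def call(self, func, *args, **kwargs):
        with self._slots:  # blocks while max_concurrent calls are running
            with self._lock:  # one thread at a time checks and updates spacing
                wait = self._last_start + self._min_interval - time.monotonic()
                if wait > 0:
                    time.sleep(wait)
                self._last_start = time.monotonic()
            return func(*args, **kwargs)
```

For example, `Throttle().call(urllib.request.urlopen, url)` wraps each request; sharing one `Throttle` instance across worker threads keeps the whole client within the limits.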
The Web API
There are a number of "undocumented" URLs (subject to change without notice) providing metadata, files, scripts and images for creating and maintaining this website. Nevertheless, advanced users may find the following examples useful:
- /help/ - returns a tree of Help objects describing the main commands, parameters, BioPAX types, and properties, e.g., /help/schema, /help/commands, /help/types;
- /log/ - service access summary, e.g., /log/totals, /log/TOTAL/geography/world, /log/timeline;
- /[rdf:ID] - every BioPAX object's URI here is a resolvable URL, because it is either a standard URI based on Identifiers.org or starts with the XML base http://pathwaycommons.org/pc2/, which redirects to a description page (still a work in progress), e.g., http://pathwaycommons.org/pc2/pid.
For more information, please contact us.
Parameters: 'source', 'uri', and 'target' require URIs of existing BioPAX elements, which are either standard Identifiers.org URLs (for most canonical biological entities and controlled vocabularies) or Pathway Commons-generated http://pathwaycommons.org/pc2/<localID> URLs (for most BioPAX Entities and Xrefs). BioPAX object URIs used by this service are not easy to guess, so they should be discovered using web service commands, such as search or top_pathways, or from our archive files. For example, even if one knows the current URI namespace http://pathwaycommons.org/pc2/ and the service location, one should not guess URIs such as /foo, http://pathwaycommons.org/pc2/foo, or get?uri=http://pathwaycommons.org/pc2/foo unless that BioPAX individual actually exists (find the URIs of objects of interest first). However, the following identifiers are also acceptable in place of full URIs in get and graph queries: HUGO gene symbols; SwissProt, RefSeq, Ensembl, and NCBI Gene (positive integer) IDs; and ChEBI, ChEMBL, KEGG Compound, DrugBank, PharmGKB Drug, and PubChem Compound or Substance IDs (PubChem IDs must be prefixed with 'CID:' or 'SID:' to distinguish them from each other and from NCBI Gene IDs). As a rule of thumb, full URIs make a precise query, whereas identifiers make a more exploratory one, which depends on full-text search (index) and ID mapping.
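The distinction between full URIs and plain identifiers can be handled on the client before a query is sent. Below is a small Python sketch; `to_query_id` is a hypothetical helper, and recognizing a full URI by its `http://` or `https://` scheme is our simplifying assumption.

```python
def to_query_id(value, pubchem=None):
    """Prepare a value for the 'uri', 'source', or 'target' parameter.

    Hypothetical helper. Full URIs are returned unchanged (precise query);
    other identifiers are passed through for server-side search/ID mapping.
    Set pubchem="compound" or "substance" for bare PubChem IDs so they get
    the required 'CID:' or 'SID:' prefix.
    """
    value = value.strip()  # URIs and IDs contain no spaces
    if value.startswith(("http://", "https://")):
        return value
    if pubchem is not None and not value.upper().startswith(("CID:", "SID:")):
        prefix = {"compound": "CID:", "substance": "SID:"}[pubchem]
        return prefix + value
    return value
```

Gene symbols and UniProt, ChEBI, or NCBI Gene IDs pass through as-is; only PubChem IDs need the extra prefix.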
Normally, instead of submitting a typically complex URL query via the browser address bar, one should find or develop a convenient bioinformatics application (such as Cytoscape, PCViz, or ChIBE) or a script that uses the web API through a standard client-side library. Nevertheless, this page includes links one can simply click to submit an example query and view the results. This works because the examples are simple queries whose parameters, such as a long URI or Lucene query string, were properly (manually) URL-encoded. We also recommend using the HTTP POST method instead of GET to avoid browser- or server-side problems with, e.g., caching, encoding, or overly long URLs. Finally, URIs are case-sensitive and contain no spaces.
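As a sketch of the recommended POST usage with only the Python standard library: `build_post` is a hypothetical helper, and `BASE_URL` is an assumed service root taken from the URI namespace mentioned above (replace it with the actual service location). Sending the request form-encoded is our assumption about what the server accepts.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://pathwaycommons.org/pc2/"  # assumed service root; adjust as needed


def build_post(command, params):
    """Prepare (but do not send) a form-encoded POST request.

    `params` maps names to a value or a list of values; lists become
    repeated parameters, e.g. uri=...&uri=...
    """
    body = urlencode(params, doseq=True).encode("utf-8")  # encodes ':' and '/' too
    return Request(
        BASE_URL + command,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(req)` sends it; URL-encoding is handled by `urlencode`, so long URIs need no manual escaping.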
A full-text search in this BioPAX database using the Lucene query syntax. The following index fields (case-sensitive) can optionally be used in a query string: comment, ecnumber, keyword, name, pathway, term, xrefdb, xrefid, dataSource, organism (some of these are BioPAX properties, while others are composite relationships). For example, the pathway field helps find pathway participants by keywords that match their parent pathway names or identifiers; xrefid finds objects by matching their own Xrefs or those attached to their child elements; keyword, the default search field, is a large aggregate that includes all BioPAX properties of an element and of its nested elements (e.g., a Complex can be found by one of its members' names or EC numbers). Search results can be filtered by data provider (datasource parameter), organism, and instantiable BioPAX class (type). Search can be used to select starting points for graph traversal queries (with the '/graph', '/traverse', and '/get' commands). Search strings are case-insensitive unless put inside quotes.
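Field-restricted Lucene clauses can be assembled programmatically. This Python sketch uses a hypothetical `field_query` helper; the field list is copied from the paragraph above, and escaping embedded quotes with a backslash follows general Lucene syntax rather than anything specific to this service.

```python
FIELDS = {"comment", "ecnumber", "keyword", "name", "pathway", "term",
          "xrefdb", "xrefid", "dataSource", "organism"}


def field_query(field, text, exact=False):
    """Build a Lucene clause restricted to one index field.

    With exact=True the text is wrapped in double quotes (a phrase query);
    otherwise it is used as-is, so wildcards such as 'brc*' still work.
    """
    if field not in FIELDS:  # index field names are case-sensitive
        raise ValueError(f"unknown index field: {field}")
    if exact:
        text = '"' + text.replace('"', '\\"') + '"'
    return f"{field}:{text}"
```

Clauses can then be combined with Lucene operators, e.g. `field_query("name", "brca2") + " AND NOT transcription"`.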
The specified (or first) page of the ordered list of BioPAX individuals that match the search criteria; the results page size is configured on the server and returned with every result as an attribute. The results (hits) are returned as a Search Response XML Schema instance (XML document). JSON format can be requested by ending the command with '.json' (e.g., '/search.json') or by setting the HTTP request header 'Accept: application/json' (how to do this depends on one's client-side library).
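Because results are paged and the page size is reported with each response, a client can walk through all pages. In this Python sketch, `iterate_hits` is hypothetical, and `fetch_page` is a caller-supplied function whose job is to send one search request and extract the hits, the total hit count, and the page size from the XML or JSON response (the response field names are not given here, so that parsing is left to the caller).

```python
import math


def iterate_hits(fetch_page):
    """Yield all hits across pages.

    `fetch_page(page)` performs one search request for the given zero-based
    page and returns (hits, total_hits, page_size).
    """
    hits, total, page_size = fetch_page(0)
    yield from hits
    last_page = math.ceil(total / page_size) - 1 if page_size else 0
    for page in range(1, last_page + 1):
        hits, _, _ = fetch_page(page)
        if not hits:  # stop early if the server returns an empty page
            break
        yield from hits
```

Fetching pages lazily through a generator keeps memory use low and makes it easy to stop after the first few pages.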
- q= [Required] a keyword, name, external identifier, or a Lucene query string.
- page=N [Optional] (N>=0, default is 0) search result page number. Results are paginated to keep responses from becoming too large.
- datasource= [Optional] filter by data source (use names or URIs of pathway data sources or of any existing Provenance object). If multiple data source values are specified, a union of hits from specified sources is returned. For example, datasource=reactome&datasource=pid returns hits associated with Reactome or PID.
- organism= [Optional] organism filter. The organism can be specified either by official name, e.g. "homo sapiens", or by NCBI taxonomy identifier, e.g. "9606". Similar to data sources, if multiple organisms are declared, a union of all hits from the specified organisms is returned. For example, 'organism=9606&organism=10090' returns results for both human and mouse. Note the officially supported organisms.
- type= [Optional] BioPAX class filter (values)
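The search parameters above map directly onto a query string. This Python sketch builds a GET URL; `search_url` is hypothetical, `BASE_URL` is an assumed service root (replace it with the actual location), and `type_` is spelled with a trailing underscore only to avoid shadowing Python's built-in `type`.

```python
from urllib.parse import urlencode

BASE_URL = "http://pathwaycommons.org/pc2/"  # assumed service root


def search_url(q, page=0, datasource=(), organism=(), type_=None, as_json=False):
    params = [("q", q)]
    if page:  # page 0 is the default, so it is omitted
        params.append(("page", page))
    params += [("datasource", d) for d in datasource]  # union of sources
    params += [("organism", o) for o in organism]      # union of organisms
    if type_:
        params.append(("type", type_))
    command = "search.json" if as_json else "search"
    return BASE_URL + command + "?" + urlencode(params)
```

Repeated `datasource` or `organism` entries become repeated parameters, which the service treats as a union.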
- A basic text search. This query returns all entities that contain the keyword "Q06609" (XML output)
- Same query returned in JSON format
- This query searches for "Q06609" only in the 'xrefid' index field (XML output)
- Search for Pathways containing "Q06609" (search all fields), return JSON
- Search for ProteinReference entries that contain "brca2" keyword in any indexed field, return only human proteins from NCI Pathway Interaction Database
- Similar to the search above, but searches specifically in the "name" field
- This query uses wildcard notation to match any Control interactions that have a word starting with "brc" in any of their indexed fields. The results are restricted to human interactions from the Reactome database.
- An example of pagination. This query returns the fourth page (page=3) of all elements that have an indexed word starting with "a"
- This query finds Control interactions that contain the word "binding" but not "transcription" in their indexed fields, explicitly requesting the first page.
- This query finds all interactions that directly or indirectly participate in a pathway that has a keyword match for "immune".
- This query returns all Reactome pathways
- This query lists all organisms, including secondary organisms such as pathogens or model organisms listed in the evidence or interaction objects
Retrieves an object model for one or several BioPAX elements, such as a pathway, interaction, or physical entity, given their URIs. The get command retrieves only the specified elements and all their child BioPAX elements (use the traverse query to obtain parent elements).
- uri= [Required] a valid/existing BioPAX element's URI (RDF ID; for utility classes that were "normalized", such as entity references and controlled vocabularies, it is usually an Identifiers.org URL). Multiple identifiers are allowed per query, for example, 'uri=http://identifiers.org/uniprot/Q06609&uri=http://identifiers.org/uniprot/Q549Z0'. See also the note about URIs and IDs.
- format= [Optional] output format (values)
Output: BioPAX (default) representation for the record(s) pointed to by the given URI(s). Other output formats are produced on demand by converting from BioPAX and can be specified using the optional format parameter. Please be advised that with some output formats the service might return a "no result found" error if the conversion is not applicable to the particular BioPAX result. For example, BINARY_SIF output is only possible if there are some interactions, complexes, or pathways in the retrieved set.
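Since a converted format can fail with a "no result" error even when the BioPAX data exist, a client may want to fall back to the default output. This Python sketch assumes that the "no result" case is reported with status 460 (as listed in the error section); `get_with_fallback` is hypothetical, and `fetch` is a caller-supplied function that sends a 'get' request with the given parameter pairs and returns `(status_code, body)`.

```python
def get_with_fallback(fetch, uris, fmt):
    """Try a converted format first; on 460 (No Results), retry as BioPAX.

    `fetch(params)` sends one 'get' request built from a list of
    (name, value) pairs and returns (status_code, body).
    """
    params = [("uri", u) for u in uris]
    status, body = fetch(params + [("format", fmt)])
    if status == 460:  # e.g. BINARY_SIF for a set without interactions
        status, body = fetch(params)  # no 'format' means default BioPAX
    return status, body
```

Keeping the transport in `fetch` lets the same logic work with GET or POST and makes it easy to test without a network.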
- This command returns the BioPAX representation of Q06609 (a ProteinReference's sub-model).
- Gets the JSON-LD representation of the Q06609 ProteinReference.
- Find/get by HUGO gene symbol COL5A1 - returns BioPAX entities. Note: unlike the first example, it first performs a full-text search for physical entities and genes by using 'xrefid:COL5A1' query, and then gets the COL5A1 (P20908) related BioPAX entities.
- Get the Signaling by BMP Pathway (R-HSA-201451, format: BioPAX, source: Reactome).
Graph searches are useful for finding connections and neighborhoods of elements, such as the shortest path between two proteins or the neighborhood of a particular protein state or of all its states. Graph searches consider detailed BioPAX semantics, such as generics or nested complexes, and traverse the graph accordingly. The starting points can be physical entities, entity references, or xrefs. In the latter two cases, the graph search starts from ALL the physical entities that belong to that particular canonical reference, i.e., from all its molecular states. Note that we integrate BioPAX data from multiple databases based on our protein and small molecule data warehouse, and we consistently normalize UnificationXref, EntityReference, Provenance, BioSource, and ControlledVocabulary objects when we are absolutely sure that two objects of the same type are equivalent. However, we do not merge physical entities and reactions from different sources, as accurately matching and aligning pathways at that level is still an open research problem. As a result, graph searches can return several similar but disconnected sub-networks that correspond to the pathway data from different providers (though some of their physical entities often refer to the same small molecule or protein reference or controlled vocabulary).
- kind= [Required] graph query (values)
- source= [Required] source object's URI/ID. Multiple source URIs/IDs are allowed per query, for example 'source=http://identifiers.org/uniprot/Q06609&source=http://identifiers.org/uniprot/Q549Z0'. See note about URIs and IDs.
- target= [Required for PATHSFROMTO graph query] target URI/ID. Multiple target URIs are allowed per query; for example 'target=http://identifiers.org/uniprot/Q06609&target=http://identifiers.org/uniprot/Q549Z0'. See note about URIs and IDs.
- direction= [Optional, for NEIGHBORHOOD and COMMONSTREAM algorithms] - graph search direction (values).
- limit= [Optional] graph query search distance limit (default = 1).
- format= [Optional] output format (values)
- datasource= [Optional] datasource filter (same as for 'search').
- organism= [Optional] organism filter (same as for 'search').
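The parameter rules above (a target is required for PATHSFROMTO; direction applies to NEIGHBORHOOD and COMMONSTREAM) can be checked before a request is sent. This Python sketch is hypothetical: `graph_url` and `BASE_URL` (an assumed service root) are ours, and rejecting a direction for other kinds, rather than silently dropping it, is our design choice.

```python
from urllib.parse import urlencode

BASE_URL = "http://pathwaycommons.org/pc2/"  # assumed service root

DIRECTED_KINDS = {"NEIGHBORHOOD", "COMMONSTREAM"}


def graph_url(kind, sources, targets=(), direction=None, limit=1,
              datasource=(), organism=(), fmt=None):
    kind = kind.upper()
    if kind == "PATHSFROMTO" and not targets:
        raise ValueError("PATHSFROMTO requires at least one target")
    if direction and kind not in DIRECTED_KINDS:
        raise ValueError("direction applies only to NEIGHBORHOOD and COMMONSTREAM")
    params = [("kind", kind)]
    params += [("source", s) for s in sources]
    params += [("target", t) for t in targets]
    if direction:
        params.append(("direction", direction))
    params.append(("limit", limit))  # search distance limit, default 1
    params += [("datasource", d) for d in datasource]
    params += [("organism", o) for o in organism]
    if fmt:  # omit for the default BioPAX output
        params.append(("format", fmt))
    return BASE_URL + "graph?" + urlencode(params)
```

Sources and targets may be full URIs or the identifiers listed in the note about URIs and IDs; `urlencode` escapes them.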
Output: By default, graph queries return a complete BioPAX representation of the subnetwork matched by the algorithm. Other output formats are available via the optional format parameter. Please be advised that some output format choices might cause a "no result found" error if the conversion is not applicable to the BioPAX result (e.g., BINARY_SIF output fails if there are no interactions, complexes, or pathways in the retrieved set).
Examples: Neighborhood of COL5A1 (P20908, CO5A1_HUMAN):
- This query finds the BioPAX nearest neighborhood of the protein reference http://identifiers.org/uniprot/P20908, i.e., all reactions where the corresponding protein forms participate; returned in the Simple Interaction Format (SIF)
- This query finds the distance-1 neighborhood of P20908: starting from the corresponding Xref, it finds all reactions that its owners (e.g., a protein reference) and their states (protein forms) participate in, and returns the BioPAX model.
- A similar query using the gene symbol COL5A1 instead of a URI or UniProt ID (this performs an internal full-text search and ID mapping). Compared with the examples above, particularly the first one, a query like this can return a larger sub-network, as it may start graph traversal from multiple matching entities (seeds) rather than from a single ProteinReference (http://identifiers.org/uniprot/P20908). One can mix identifier types, e.g., submit URIs along with UniProt, NCBI Gene, and ChEBI IDs in a single /graph or /get query; other identifier types may also work. See the note about URIs and IDs.
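To illustrate why more seeds can yield a larger and possibly disconnected result, here is a toy distance-limited neighborhood (breadth-first search) over a plain adjacency dictionary. This is not the Pathway Commons algorithm, which follows BioPAX semantics such as generics and nested complexes; the node names are made up and mimic one protein represented separately by two providers.

```python
from collections import deque


def neighborhood(graph, seeds, limit=1):
    """Nodes within `limit` hops of any seed in an undirected adjacency dict."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, dist = frontier.popleft()
        if dist == limit:  # do not expand beyond the distance limit
            continue
        for nb in graph.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return seen


# Two providers' copies of the same protein, never merged into one node.
graph = {
    "P20908_reactome": ["rxn1"], "rxn1": ["P20908_reactome", "X"], "X": ["rxn1"],
    "P20908_pid": ["rxn2"], "rxn2": ["P20908_pid", "Y"], "Y": ["rxn2"],
}
```

Starting from `"P20908_reactome"` alone returns one small sub-network; seeding with both copies (as an ID-mapped gene symbol might) returns the union of two disconnected pieces.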
Provides XPath-like access to this BioPAX database. With '/traverse', users can explicitly state the paths they would like to access. The format of the path parameter value is: [Initial Class]/[property1]:[classRestriction(optional)]/[property2]... A "*" sign after a property instructs the path accessor to traverse that property transitively. For example, the following path accessor traverses all physical entity components within a complex: Complex/component*/entityReference/xref:UnificationXref. The following lists the display names of all participants of interactions that are components of a pathway: Pathway/pathwayComponent:Interaction/participant*/displayName. The optional classRestriction limits the returned property values to a certain subclass of the property's range; in the first example above, it is used to get only the unification xrefs. Path accessors can use all the official BioPAX properties as well as additional derived classes and properties, such as inverse properties and interfaces that represent anonymous union classes in OWL (see the Paxtools documentation for more details).
- uri= [Required] a BioPAX element URI (specified similarly to the 'get' command above). Multiple IDs are allowed (uri=...&uri=...&uri=...).
- path= [Required] a BioPAX property path in the form of property1[:type1]/property2[:type2]; see properties, inverse properties, Paxtools, org.biopax.paxtools.controller.PathAccessor.
Output: XML result according to the Search Response XML Schema (TraverseResponse type; pagination is disabled to return all values at once).
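A path string can be split into its steps on the client, e.g. to check it before submitting. The Python sketch below is a hypothetical parser (`parse_path` and `Step` are our names), not Paxtools' `PathAccessor`; it assumes the "*" directly follows the property name, as in the examples above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Step:
    prop: str
    restriction: Optional[str] = None  # subclass filter, e.g. "UnificationXref"
    transitive: bool = False           # "*" after the property name


def parse_path(path: str) -> Tuple[str, List[Step]]:
    """Split 'Class/prop[*][:Restriction]/...' into (initial class, steps)."""
    head, *rest = path.split("/")
    steps = []
    for part in rest:
        prop, _, restriction = part.partition(":")
        transitive = prop.endswith("*")
        steps.append(Step(prop.rstrip("*"), restriction or None, transitive))
    return head, steps
```

For example, `parse_path("Complex/component*/entityReference/xref:UnificationXref")` yields the initial class `Complex` and three steps, the first transitive and the last restricted to `UnificationXref`.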
- This query returns the display name of the organism of the ProteinReference specified by the URI.
- This query returns the URI of the organism for each of the Protein References
- This query returns the names of all states of RAD51 protein (by its ProteinReference URI, using property path="ProteinReference/entityReferenceOf:Protein/name")
- This query returns the URIs of states of BRCA1_HUMAN
- This query returns the names of several different objects (using abstract type 'Named' from Paxtools API)
Returns all "top" pathways: pathways that are neither 'controlled' nor a 'pathwayComponent' of another biological process, excluding "pathways" that have fewer than three components, none of which is a non-empty sub-pathway.
- q= [Optional] a keyword, name, external identifier, or a Lucene query string, like in 'search', but the default is '*' (match all).
- datasource= [Optional] filter by data source (same as for 'search').
- organism= [Optional] organism filter (same as for 'search').
Output: XML document described by the Search Response XML Schema (SearchResponse type; pagination is disabled to return all top pathways at once).
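The "top pathway" rule can be illustrated on a toy model. This Python sketch is our reading of the rule stated above, not the server's implementation: `top_pathways` is hypothetical, pathways are plain IDs mapped to lists of component IDs, and only pathway components (not other process types) are considered when deciding whether a pathway is nested.

```python
def top_pathways(pathways, controlled):
    """Toy reading of the 'top pathway' rule.

    `pathways` maps a pathway ID to the list of its component IDs;
    `controlled` is the set of IDs that some Control refers to.
    """
    nested = {c for comps in pathways.values() for c in comps}
    result = []
    for pid, comps in pathways.items():
        if pid in nested or pid in controlled:
            continue  # a component of another pathway, or controlled
        has_nonempty_sub = any(pathways.get(c) for c in comps)
        if len(comps) < 3 and not has_nonempty_sub:
            continue  # too small and no non-empty sub-pathway
        result.append(pid)
    return result
```

In this reading, a pathway with only two components still counts as "top" if one of them is a non-empty sub-pathway.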
Officially supported organisms
We intend to integrate pathway data only for the following species:
Homo sapiens (9606)
Additional organisms may be pulled in due to interactions with entities from any of the above organisms, but are not otherwise supported. This means that we don’t comprehensively collect information for unsupported organisms and we have not cleaned or converted such data due to the high risk of introducing errors and artifacts. All BioSource objects can be found by using this search query.
Output Format ('format'):
For detailed descriptions of these formats, see output format description.
Graph Type ('kind'):
Graph Directions ('direction'):
BioPAX Properties and Restrictions:
Listed below is a summary of BioPAX properties as defined in the Paxtools model: domain, property name, range, and restrictions (if any). For example, XReferrable xref Xref D:ControlledVocabulary=UnificationXref D:Provenance=UnificationXref,PublicationXref means that values of ControlledVocabulary.xref can only be of the UnificationXref type, and values of Provenance.xref can only be UnificationXref or PublicationXref.
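The summary lines have a simple structure that can be parsed mechanically. This Python sketch is a hypothetical parser (`parse_property_line` is our name); it assumes whitespace-separated fields, with each restriction written as `D:<DomainSubclass>=<Allowed>[,<Allowed>...]` as in the example above.

```python
def parse_property_line(line):
    """Split a summary line into (domain, property, range, restrictions)."""
    domain, prop, rng, *tokens = line.split()
    restrictions = {}
    for token in tokens:
        if not token.startswith("D:"):
            raise ValueError(f"unexpected token: {token}")
        subclass, _, allowed = token[2:].partition("=")
        restrictions[subclass] = allowed.split(",")  # allowed range subclasses
    return domain, prop, rng, restrictions
```

Applied to the example line, the restrictions map `ControlledVocabulary` to `["UnificationXref"]` and `Provenance` to `["UnificationXref", "PublicationXref"]`.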
Inverse BioPAX Object Properties (a feature of the Paxtools library):
Some BioPAX object properties can also be traversed in the inverse direction, e.g., 'xref' and 'xrefOf'. Restrictions listed for inverse properties must be read right-to-left: for example, the restriction XReferrable xref Xref D:ControlledVocabulary=UnificationXref D:Provenance=UnificationXref,PublicationXref below is actually about Xref.xrefOf, meaning that RelationshipXref.xrefOf can contain neither ControlledVocabulary (any subclass) nor Provenance objects (in other words, vocabularies and provenance may not have relationship xrefs).
If an error occurs or no results are found while processing a user's request, the client receives a standard HTTP error response with a corresponding status code (not 200 OK) and message (browsers usually display an error page; web clients should normally check the status before processing the results). In addition to the general use of standard HTTP errors, the service returns, by design, the following four important error responses:
- 452 - Bad Request (illegal or no arguments).
- 460 - No Results (e.g., when a search or graph query found no data).
- 500 - Internal Server Error (usually a java exception).
- 503 - Server is temporarily unavailable due to regular maintenance.
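Clients should check the status before parsing a response. Below is a minimal Python sketch of such a check; `check_status` and `PathwayCommonsError` are hypothetical names, and treating 460 as an empty but valid answer rather than a failure is our design choice.

```python
PC_ERRORS = {
    452: "Bad Request (illegal or no arguments)",
    460: "No Results",
    500: "Internal Server Error",
    503: "Temporarily unavailable (maintenance)",
}


class PathwayCommonsError(Exception):
    def __init__(self, status):
        self.status = status
        super().__init__(f"{status}: {PC_ERRORS.get(status, 'HTTP error')}")


def check_status(status):
    """Return True for 200 OK, False for 460 (no data found);
    raise PathwayCommonsError for any other status."""
    if status == 200:
        return True
    if status == 460:
        return False
    raise PathwayCommonsError(status)
```

With `urllib.request`, non-2xx responses raise `urllib.error.HTTPError`, whose `code` attribute can be passed to `check_status`; a 503 is a reasonable candidate for a delayed retry.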