About Pathway Commons
Pathway Commons is a convenient point of access to biological pathway information collected from several public pathway databases supporting BioPAX and PSI-MI formats.
Our data can be accessed through several methods:
- We provide batch downloads of the data in several different formats. You might want to use this option if you are a computational biologist and you need comprehensive biological pathway data for analysis.
- There are several tools that can search, access, and display the data in an intuitive form. You might want to consider this option if you are a biologist who wants to explore the pathways.
- We provide powerful search facilities and multiple export options through our Web API. You might want to use this option if you are developing software that needs to search and retrieve pathway information.
Send us your feedback to pc-info *a-t* pathwaycommons.org (subject: "cpath2 feedback").
Feel free to tell us about yourself and how you use this web service. This not only helps us improve the service (and report to funding organizations) but also lets us contact you should your application one day send too many or failing queries in a short time. Currently, users are expected to keep one to five active connections from the same IP address at a time and to send only a few requests per second. If you exceed these limits, your IP address may be blacklisted automatically, and you will then have to give us reasons to unblock it.
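To stay within these limits, a client can throttle itself. The following is a minimal Python sketch, not an official client: the class name `Throttle` and the default rate of 2 requests per second are assumptions chosen to match "a few requests per second", and the clock and sleep functions are injectable so the behaviour can be tested without waiting.

```python
import threading
import time


class Throttle:
    """Client-side limiter allowing at most `rate` requests per second.

    Hypothetical helper; rate=2.0 is an assumed, conservative default,
    not a limit published by the service.
    """

    def __init__(self, rate=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self._next_allowed = float("-inf")  # first request may go immediately
        self._lock = threading.Lock()

    def wait(self):
        """Block until the next request may be sent; return the delay applied."""
        with self._lock:  # one shared schedule for all threads of this client
            now = self.clock()
            delay = max(0.0, self._next_allowed - now)
            if delay:
                self.sleep(delay)
            # Reserve this request's slot and schedule the earliest next one.
            self._next_allowed = now + delay + self.min_interval
            return delay
```

Call `wait()` on one shared instance before every request. A `threading.BoundedSemaphore(5)` could similarly cap concurrent connections at the upper end of the range stated above.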
Data is freely available, under the license terms of each contributing database.
cPath2 Web Service Description
To query the integrated biological pathway data on this server, use the Web application programming interface (API) described below. This page also provides some examples to help you get started.
For most users
Researchers and application developers can use the following web services (the stable API):
For advanced users
Pathway data integrators and developers who build a service or website on top of their own cPath2 instance may also send requests to the following URL paths, which return HTML, plain text, or JSON objects. All these features are supplementary and may change, so use them with care and refer to the developer documentation (e.g., the web controllers' source code):
- /idmapping (can map some identifiers to primary UniProt IDs)
- /robots.txt and /favicon.ico
- /resources/* (CSS and scripts)
- /admin and /admin/** (a work-in-progress Web console...)
- /metadata/* (e.g., /metadata/datasources, /metadata/validations)
- /help (a REST web service that returns an XML/JSON Help object: nested information pages about the main web service commands, parameters, BioPAX types, and properties; e.g., /help/schema, /help/commands, /help/types, /help.json for everything, etc.)
- /log and /log/* (summary and statistics of server access logs)
Everything else appended to the base web service URL is treated as a BioPAX element's local ID and translated into the corresponding get query. For example, http://www.pathwaycommons.org/pc2/pid (as well as http://purl.org/pc2/4/pid) is currently forwarded to http://www.pathwaycommons.org/pc2/get?uri=http://purl.org/pc2/4/pid and returns the Provenance object (in BioPAX format). Together with a partial redirect for http://purl.org/pc2/4/, this makes most of the URIs in the database resolvable (Linked Data friendly). Client application developers, however, should normally use "get?uri=...&uri=..." directly, preferably as HTTP POST queries, rather than sending HTTP GET requests to http://purl.org/pc2/4/* or http://www.pathwaycommons.org/pc2/* URLs. Those are not guaranteed to always work or return BioPAX (future versions may return HTML instead).
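The forwarding rule can be mirrored on the client, so that code always sends the explicit get query instead of relying on the redirect. This is a minimal Python sketch assuming the service root http://www.pathwaycommons.org/pc2/ and the xml:base http://purl.org/pc2/4/; both are release-specific, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed, release-specific values; adjust them for your server.
BASE_URL = "http://www.pathwaycommons.org/pc2/"
XML_BASE = "http://purl.org/pc2/4/"


def full_uri(local_id_or_uri):
    """Expand a local ID such as 'pid' into a full URI under xml:base.

    Values that already look like absolute http(s) URIs are returned unchanged.
    """
    if local_id_or_uri.startswith(("http://", "https://")):
        return local_id_or_uri
    return XML_BASE + local_id_or_uri


def get_url(*uris):
    """Build the explicit '<base>get?uri=...&uri=...' URL (values percent-encoded)."""
    return BASE_URL + "get?" + urlencode([("uri", u) for u in uris])
```

For example, `get_url(full_uri("pid"))` yields the percent-encoded form of the get query shown above. Because `full_uri` treats every non-URI value as a local ID, gene symbols and other identifiers accepted by get should be passed to `get_url` unchanged.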
Some of the commands require URIs of existing BioPAX elements (parameters 'source', 'uri', and 'target'). Such URIs are either standard Identifiers.org URLs (of canonical entity references, controlled vocabularies, etc., used by participants of interactions and pathways) or URLs that start with the current xml:base, http://purl.org/pc2/4/ (e.g., the URIs of most Entities and Xrefs). BioPAX element URIs are not something to guess or hit by chance. For example, even though the current URI namespace (http://purl.org/pc2/4/) and the actual service location (http://www.pathwaycommons.org/pc2/) are known, one should not request http://www.pathwaycommons.org/pc2/foo, http://purl.org/pc2/4/foo, or http://www.pathwaycommons.org/pc2/get?uri=http://purl.org/pc2/4/foo unless the corresponding BioPAX individual actually exists. Use search, top_pathways, and other query results to find objects of interest and extract valid URIs. Alternatively, official gene symbols and SwissProt, RefSeq, Ensembl, or NCBI Gene/Protein identifiers may work in place of full URIs in get and graph queries. As a rule, full URIs make a query precise, whereas identifiers make it more exploratory (internally, the server performs a simple id-mapping to UniProt and a full-text search to discover the URIs for the query).
Summary: The search command provides a text search using the Lucene query syntax. The indexed fields were selected based on the most common searches. Some of these fields are direct BioPAX properties; others are composite relationships. All index fields are (case-sensitive): comment, ecnumber, keyword, name, pathway, term, xrefdb, xrefid, dataSource, and organism. The pathway field maps to all participants of pathways that contain the keyword(s) in any of their text fields. This field is transitive in the sense that participants of all sub-pathways are also returned. Finally, keyword is a transitive aggregate field that includes all searchable keywords of an element and its child elements; e.g., a complex is returned by a keyword search if one of its members matches. Keyword is the default field. All searches can also be filtered by data source and organism, and the domain class can be restricted with the 'type' parameter. This query can be used standalone or to retrieve starting points for graph searches. Search strings are case-insensitive unless put inside quotes.
Output: A set of BioPAX individuals that match the search criteria. By default, the results are returned as an XML document that follows the Search Response XML Schema. The results can also be obtained in JSON by appending '.json' to the query URL. The results are paginated; the page size is configured on the server side, and its current value is returned with every result as an attribute.
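Because results are paginated, a client that needs every hit has to request page=0, 1, 2, ... in turn. The sketch below assumes the JSON variant lives at `<base>search.json` and that each page lists its hits under a `searchHit` key; both the path and the key name are assumptions to check against a real response. Fetching is injected, so the paging logic can be exercised offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://www.pathwaycommons.org/pc2/"  # assumed service root


def fetch_search_page(q, page):
    """Fetch one page of JSON search results (network access required)."""
    url = BASE_URL + "search.json?" + urlencode({"q": q, "page": page})  # assumed path
    with urlopen(url, timeout=60) as resp:
        return json.load(resp)


def iter_hits(fetch_page, hits_key="searchHit", max_pages=100):
    """Yield hits from page 0 onwards until a page comes back empty.

    `hits_key` is an assumption about the JSON layout; max_pages is a
    safety cap so a misbehaving server cannot cause an endless loop.
    """
    for page in range(max_pages):
        hits = fetch_page(page).get(hits_key) or []
        if not hits:
            return
        yield from hits
```

Usage: `hits = list(iter_hits(lambda p: fetch_search_page("Q06609", p)))`.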
- q= [Required] a keyword, name, external identifier, or a Lucene query string.
- page=N [Optional] (N>=0, default is 0), search result page number.
- datasource= [Optional] filter by data source (use names or URIs of pathway data sources or of any existing Provenance object). If multiple data source values are specified, a union of hits from specified sources is returned. For example, datasource=reactome&datasource=pid returns hits associated with Reactome or PID.
- organism= [Optional] organism filter. The organism can be specified either by official name, e.g. "homo sapiens", or by NCBI taxonomy ID, e.g. "9606". As with data sources, if multiple organisms are declared, a union of all hits from the specified organisms is returned. For example, 'organism=9606&organism=10090' returns results for both human and mouse. Note the officially supported species.
- type= [Optional] BioPAX class filter (values)
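Repeated datasource and organism parameters are how a union filter is expressed, so a client should encode lists as repeated keys rather than comma-joined values. A minimal Python sketch using `urlencode` on a list of key/value pairs; the service root, the function name, and placing the '.json' suffix on the command name are assumptions.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.pathwaycommons.org/pc2/"  # assumed service root


def search_url(q, page=0, datasources=(), organisms=(), biopax_type=None, as_json=False):
    """Build a 'search' URL; repeated datasource/organism keys request a union."""
    params = [("q", q), ("page", str(page))]
    params += [("datasource", d) for d in datasources]
    params += [("organism", o) for o in organisms]
    if biopax_type:
        params.append(("type", biopax_type))
    command = "search.json" if as_json else "search"  # assumed placement of '.json'
    return BASE_URL + command + "?" + urlencode(params)
```

Passing a list of pairs (instead of a dict) keeps the order stable and lets the same key appear several times.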
- A basic text search. This query returns all entities that contain the "Q06609" keyword, in XML.
- Same query returned in JSON format
- This query returns entities that contain "Q06609" only in the 'xrefid' index field, in XML.
- Search for Pathways containing "Q06609" (search all fields), return JSON
- Search for ProteinReference entries that contain the "brca2" keyword in any indexed field, return only human proteins from the NCI Pathway Interaction Database
- Similar to the search above, but searches specifically in the "name" field
- This query uses wildcard notation to match any Control interactions that have a word starting with brca in any of their indexed fields. The results are restricted to human interactions from the Reactome database.
- An example use of pagination: this query returns the fourth page (page=3) of all elements that have an indexed word starting with "a".
- This query finds Control interactions that contain the word "binding" but not "transcription" in their indexed fields, explicitly requesting the first page.
- This query finds all interactions that directly or indirectly participate in a pathway that has a keyword match for "immune".
- This query will return all Reactome pathways
- This query will list all organisms, including secondary organisms such as pathogens or model organisms listed in the evidence or interaction objects
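Several of these examples rely on Lucene syntax: field prefixes such as xrefid: or name:, wildcards such as brca*, and boolean operators. When a literal value itself contains Lucene special characters (for example, the colon in a ChEBI ID such as CHEBI:15377), it has to be escaped, or the parser will read it as a field prefix. The sketch below escapes the same character set as Lucene's classic `QueryParser.escape`; the helper names are hypothetical.

```python
# Characters escaped by Lucene's classic QueryParser.escape.
_LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene(term):
    """Backslash-escape every Lucene special character in a literal term."""
    return "".join("\\" + c if c in _LUCENE_SPECIAL else c for c in term)


def field_query(field, value):
    """Build 'field:value' with the value escaped, e.g. for xrefid or name."""
    return f"{field}:{escape_lucene(value)}"
```

Whitespace is left as-is, as in Lucene itself, so a multi-word value still needs quoting as a phrase. Wildcard characters such as * are escaped too, so wildcard queries like brca* should be written without this helper.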
Summary: The get command retrieves full pathway information for a set of elements, such as pathways, interactions, or physical entities, given their RDF IDs (URIs). It retrieves only the BioPAX elements that are directly mapped to the IDs; use the traverse query to traverse the BioPAX graph and obtain child/owner elements.
- uri= [Required] a valid, existing BioPAX element's URI (RDF ID); for utility classes that were "normalized", such as entity references and controlled vocabularies, it is usually an Identifiers.org URL. Multiple IDs are allowed per query, for example, 'uri=http://identifiers.org/uniprot/Q06609&uri=http://identifiers.org/uniprot/Q549Z0'. See also the note about MIRIAM and Identifiers.org.
- format= [Optional] output format (values)
Output: By default, a complete BioPAX representation of the record pointed to by the given URI is returned. Other output formats are produced by converting the BioPAX record on demand and can be specified with the optional format parameter. Please be advised that some output formats may return a "no result found" error if the conversion is not applicable to the BioPAX result. For example, BINARY_SIF output usually works only if the retrieved set contains some interactions, complexes, or pathways, not only physical entities.
- This command returns the BioPAX representation of http://identifiers.org/uniprot/Q06609 (ProteinReference)
- This command returns Xref(s) in BioPAX format found by the gene symbol COL5A1. Note: UniProt, RefSeq, NCBI Gene, and Ensembl identifiers usually work here too if they, or their corresponding primary UniProt accessions, match at least one Xref.id BioPAX property value.
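Since POST is the recommended way to send get queries, especially with many uri values, the request can be prepared with a form-encoded body. This is a Python sketch using `urllib.request.Request`; the service root is assumed, BINARY_SIF is one of the format values mentioned above, and the function name is hypothetical. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://www.pathwaycommons.org/pc2/"  # assumed service root


def get_request(ids, fmt=None):
    """Prepare a POST 'get' request for URIs and/or identifiers such as gene symbols."""
    params = [("uri", i) for i in ids]
    if fmt:
        params.append(("format", fmt))
    body = urlencode(params).encode("ascii")
    # A Request that carries a body is sent as POST by urllib.request.urlopen.
    return Request(BASE_URL + "get", data=body,
                   headers={"Content-Type": "application/x-www-form-urlencoded"})
```

Usage: `with urllib.request.urlopen(get_request(["COL5A1"]), timeout=60) as resp: biopax = resp.read()`.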
Summary: Graph searches are useful for finding connections and neighborhoods of elements, such as the shortest path between two proteins or the neighborhood of a particular protein state or of all its states. Graph searches take detailed BioPAX semantics, such as generics or nested complexes, into account and traverse the graph accordingly. The starting points can be either physical entities or entity references. In the latter case, the graph search starts from ALL the physical entities that belong to that entity reference, i.e., all of its states. Note that we integrate BioPAX data from multiple databases based on our protein and small-molecule data warehouse, and we consistently normalize UnificationXref, EntityReference, Provenance, BioSource, and ControlledVocabulary objects when we are absolutely sure that two objects of the same type are equivalent. We do not, however, merge physical entities and reactions from different sources, as matching and aligning pathways at that level is still an open research problem. As a result, graph searches can return several similar but disconnected sub-networks that correspond to pathway data from different providers (though some of their physical entities often refer to the same small molecule or protein reference or controlled vocabulary).
- kind= [Required] graph query (values)
- source= [Required] source object's URI/ID. Multiple source URIs/IDs are allowed per query, for example 'source=http://identifiers.org/uniprot/Q06609&source=http://identifiers.org/uniprot/Q549Z0'. See a note about MIRIAM and Identifiers.org.
- target= [Required for PATHSFROMTO graph query] target URI/ID. Multiple target URIs are allowed per query; for example 'target=http://identifiers.org/uniprot/Q06609&target=http://identifiers.org/uniprot/Q549Z0'. See a note about MIRIAM and Identifiers.org.
- direction= [Optional, for NEIGHBORHOOD and COMMONSTREAM algorithms] - graph search direction (values).
- limit= [Optional] graph query search distance limit (default = 1).
- format= [Optional] output format (values)
- datasource= [Optional] datasource filter (same as for 'search').
- organism= [Optional] organism filter (same as for 'search').
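The parameter rules above (a target is required for PATHSFROMTO, direction applies to NEIGHBORHOOD and COMMONSTREAM, limit defaults to 1) can be checked before a request is sent. A minimal Python sketch; it enforces only the rules stated in this section, does not validate kind or direction values against the server's list, and the service root and function names are assumptions.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.pathwaycommons.org/pc2/"  # assumed service root

# Kinds for which the 'direction' parameter is meaningful, per the list above.
DIRECTIONAL_KINDS = {"NEIGHBORHOOD", "COMMONSTREAM"}


def graph_params(kind, sources, targets=(), direction=None, limit=1,
                 datasources=(), organisms=(), fmt=None):
    """Validate and encode 'graph' query parameters."""
    kind = kind.upper()
    if not sources:
        raise ValueError("at least one source URI/ID is required")
    if kind == "PATHSFROMTO" and not targets:
        raise ValueError("PATHSFROMTO requires at least one target")
    if direction and kind not in DIRECTIONAL_KINDS:
        raise ValueError(f"direction applies only to {sorted(DIRECTIONAL_KINDS)}")
    params = [("kind", kind)]
    params += [("source", s) for s in sources]
    params += [("target", t) for t in targets]
    if direction:
        params.append(("direction", direction))
    params.append(("limit", str(limit)))
    params += [("datasource", d) for d in datasources]
    params += [("organism", o) for o in organisms]
    if fmt:
        params.append(("format", fmt))
    return urlencode(params)


def graph_url(kind, sources, **options):
    """GET URL variant; the same encoded string can also serve as a POST body."""
    return BASE_URL + "graph?" + graph_params(kind, sources, **options)
```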
Output: By default, graph queries return a complete BioPAX representation of the subnetwork matched by the algorithm. Other output formats are available via the optional format parameter. Please be advised that some output formats may cause a "no result found" error if the conversion is not applicable to the BioPAX result (e.g., BINARY_SIF output fails if the retrieved set contains no interactions, complexes, or pathways).
Query Examples: Neighborhood of COL5A1 (P20908, CO5A1_HUMAN):
- This query finds the BioPAX nearest neighborhood of the protein reference http://identifiers.org/uniprot/P20908, i.e., all reactions in which the corresponding protein forms participate; returned in the Simple Interaction Format (SIF).
- This query finds the distance-1 neighborhood of P20908: starting from the corresponding Xref, it finds all reactions in which its owners (e.g., a protein reference) and their states (protein forms) participate, and returns the BioPAX model.
- A similar query uses the gene symbol COL5A1 instead of a URI or UniProt ID (this also implies internal id-mapping to primary UniProt IDs). Compared with the above examples, particularly the first one, such a query can return a larger subnetwork, because it may start the graph traversal from several unification and relationship Xrefs rather than from the ProteinReference (http://identifiers.org/uniprot/P20908). You can mix identifier types: submit URIs along with UniProt accessions, RefSeq IDs, NCBI Gene IDs, and Ensembl IDs in a single /graph or /get query; other identifiers might also work by chance (if present in the database).
Summary: The traverse command provides XPath-like access to Pathway Commons (PC). With traverse, users can explicitly state the paths they would like to access. A path query has the form [Initial Class]/[property1]:[classRestriction(optional)]/[property2]... A "*" sign after a property instructs the path accessor to traverse that property transitively. For example, the following path accessor will traverse through all physical entity components within a complex:
The following lists the display names of all participants of interactions that are components (pathwayComponent) of a pathway (note: the pathwayOrder property, through which the same or other interactions can be reached, is not considered here):
The optional classRestriction lets you restrict (filter) the returned property values to a certain subclass of the property's range. In the first example above, this is used to get only the Unification Xrefs. Path accessors can use all the official BioPAX properties as well as additional derived classes and properties in Paxtools, such as inverse properties and interfaces that represent anonymous union classes in OWL (see the Paxtools documentation for more details).
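The path syntax can be checked on the client before a traverse query is sent. The Python sketch below splits a path into its initial class and a list of steps, each with an optional class restriction and a transitive flag. It follows the form described above and assumes the "*" is written directly after the property name; the paths in the test are illustrative, and Paxtools' own PathAccessor remains the authoritative parser.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    prop: str                    # BioPAX property (or Paxtools inverse property)
    restriction: Optional[str]   # optional class restriction after ':'
    transitive: bool             # True when the property is followed by '*'


def parse_path(path: str) -> Tuple[str, List[Step]]:
    """Split 'InitialClass/prop1[*][:Class]/prop2...' into its components."""
    initial, *parts = path.split("/")
    if not initial or not parts:
        raise ValueError(f"expected 'Class/property...', got {path!r}")
    steps = []
    for part in parts:
        prop, _, restriction = part.partition(":")
        transitive = prop.endswith("*")
        name = prop.rstrip("*")
        if not name:
            raise ValueError(f"empty property in {path!r}")
        steps.append(Step(name, restriction or None, transitive))
    return initial, steps
```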
- uri= [Required] a BioPAX element URI (specified as for the 'get' command above). Multiple IDs are allowed (uri=...&uri=...&uri=...).
- path= [Required] a BioPAX property path in the form property1[:type1]/property2[:type2]; see properties, inverse properties, Paxtools, org.biopax.paxtools.controller.PathAccessor.
Output: XML result that follows the Search Response XML Schema (TraverseResponse type; pagination is disabled: returns all values at once)
- This query returns the display name of the organism of the ProteinReference specified by the URI.
- This query returns the URI of the organism for each of the Protein References
- This query returns the names of all states of RAD51 protein (by its ProteinReference URI, using property path="ProteinReference/entityReferenceOf:Protein/name")
- This query returns the URIs of states of BRCA1_HUMAN
- This query returns the names of several different objects (using abstract type 'Named' from Paxtools API)
Summary: The top_pathways command returns all "top" pathways, i.e., pathways that are neither 'controlled' by nor a 'pathwayComponent' of another process.
- datasource= [Optional] filter by data source (same as for 'search').
- organism= [Optional] organism filter (same as for 'search').
Output: XML result that follows the Search Response XML Schema (SearchResponse type; pagination is disabled: returns all pathways at once)
Summary: The idmapping command unambiguously maps, e.g., HGNC gene symbols and NCBI Gene, RefSeq, ENS*, and secondary UniProt identifiers to primary UniProt accessions, and ChEBI and PubChem IDs to primary ChEBI IDs. You can mix different standard ID types in one query. NOTE: This is a specific (not general-purpose) id-mapping for reference proteins and small molecules. It was originally designed for internal use, e.g., to improve BioPAX data integration and to let graph queries accept standard IDs as well as URIs. The mapping tables were derived exclusively from Swiss-Prot (DR fields) and ChEBI data (manually created tables and other mapping types and sources may be added in future versions if necessary).
Output: JSON (serialized Map)
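The serialized map can be decoded with a standard JSON parser. The Python sketch below assumes the map's keys are the submitted identifiers and its values the primary UniProt or ChEBI IDs they were mapped to (check a real response to confirm); it groups inputs by target and lists inputs that were not mapped. The function names are hypothetical, and the sample IDs in the test are for illustration only.

```python
import json
from collections import defaultdict


def group_by_primary(json_text):
    """Group submitted IDs by the primary accession they were mapped to."""
    mapping = json.loads(json_text)
    grouped = defaultdict(list)
    for query_id, primary_id in mapping.items():
        grouped[primary_id].append(query_id)
    return {primary: sorted(ids) for primary, ids in grouped.items()}


def unmapped(submitted, json_text):
    """Return submitted IDs that do not appear as keys in the response map."""
    mapping = json.loads(json_text)
    return [i for i in submitted if i not in mapping]
```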
Officially supported organisms
Given the above data sources, we chose to integrate pathway data files only for the following species:
Other organisms are still associated with some BioPAX elements, because the original pathway data may contain disease pathways or other lab experiment details, use generics (i.e., wildcard proteins), etc. We did not specifically clean or convert such data. You can find all organisms by searching for all BioSource objects.
Query Parameter Values
Output Formats ('format'):
See also output formats.
Built-in graph queries ('kind'):
Graph traversal directions ('direction'):
BioPAX classes ('type'):
Official BioPAX Properties and Domain/Range Restrictions:
Note: "XReferrable xref Xref D:ControlledVocabulary=UnificationXref D:Provenance=UnificationXref,PublicationXref" means the property XReferrable.xref (with range Xref), and that, for ControlledVocabulary.xref, the value can only be of the UnificationXref type, etc.
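Lines in this notation can be parsed mechanically, for instance to check a traverse path against the allowed ranges. This Python sketch infers the layout (domain, property, range, then zero or more D:Subclass=Range1,Range2 tokens) from the single example in the note; other lines in the list may differ, so treat the format as an assumption.

```python
from typing import Dict, List, NamedTuple


class PropertyRestriction(NamedTuple):
    domain: str
    prop: str
    value_range: str
    by_domain: Dict[str, List[str]]  # domain subclass -> allowed range classes


def parse_restriction(line):
    """Parse e.g. 'XReferrable xref Xref D:ControlledVocabulary=UnificationXref'."""
    domain, prop, value_range, *extras = line.split()
    by_domain = {}
    for extra in extras:
        if not extra.startswith("D:"):
            raise ValueError(f"unexpected token: {extra!r}")
        subclass, _, allowed = extra[2:].partition("=")
        by_domain[subclass] = allowed.split(",")
    return PropertyRestriction(domain, prop, value_range, by_domain)
```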
Inverse BioPAX Object Properties and Domain/Range Restrictions (useful feature of Paxtools API):
Note: Some object-range BioPAX properties can be traversed in the inverse direction, e.g., 'xref' and 'xrefOf'. These are listed below. Unlike for the normal xref property, however, the same restriction ("XReferrable xref Xref D:ControlledVocabulary=UnificationXref D:Provenance=UnificationXref,PublicationXref") must be read differently: it now means Xref.xrefOf, and that RelationshipXref.xrefOf cannot contain ControlledVocabulary (or subclass) values, etc.
If an error occurs, or no results are found, while processing a user's request, the client receives an HTTP response with an error status code and message. Browsers usually display the error page sent by the server; client programs should check the status before processing any results.
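In Python's standard library, `urlopen` raises `HTTPError` for error status codes, so a client that wants to check the status explicitly can catch it and return the code instead. A minimal sketch; the function names are hypothetical, and the opener is injectable so the logic can be tested without network access.

```python
import urllib.error
import urllib.request


def fetch(url_or_request, opener=urllib.request.urlopen, timeout=60):
    """Return (status, body); HTTP error responses are returned, not raised.

    For errors the body is the reason phrase; the server's full error
    message could instead be read from the HTTPError object. Network
    failures (URLError without a status) are not caught here.
    """
    try:
        with opener(url_or_request, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, str(err.reason).encode("utf-8")


def ok(status):
    """Only 2xx responses carry results that are safe to parse."""
    return 200 <= status < 300
```

Usage: `status, body = fetch(url)`, then parse `body` only if `ok(status)`.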