Upcoming Events

July 19-23, 2013: 21st Annual International Conference on Intelligent Systems for Molecular Biology (ISMB).

April 7-10, 2013: 17th Annual International Conference on Research in Computational Molecular Biology (RECOMB).

April 15-19, 2013: IEEE Symposium on Computational Intelligence in Bioinformatics & Computational Biology.

March 4-6, 2013: 5th international conference on Bioinformatics & Computational Biology (BICoB).

More events....



News     

Oct 18, 2012: A new Postdoctoral Fellow, Dr. Sitanshu Sekhar Sahu has joined our lab.

Aug 13-17, 2012: A comprehensive 1-week Bioinformatics Workshop was organized on campus; co-organized by OSU's iCREST center. Visit facebook page for details.

Apr 23, 2012: Co-hosted Dr. James Tiedje (Director, NSF Center for Microbial Ecology, Michigan State University) as an invited iCREST speaker; see flyer for details.

Apr 13, 2012: World renowned Computational Biologist, Dr. Eugene Koonin (NCBI) visited our lab, and delivered an invited lecture on campus as part of iCREST speaker series; see flyer for details. Video on YouTube.

Mar 16, 2012: We welcome Dr. Chris Town (Group leader, Plant Genomics, JCVI) as an invited iCREST speaker; see flyer for details.

Feb 14, 2012: KBL receives new grant from OCAST to develop bioinformatics systems for plant-microbe interaction networks; immediate Postdoc opening available.

Oct 21, 2011: We welcome Dr. Patrick X. Zhao (Head, Bioinformatics Lab, Noble Foundation) as an invited iCREST speaker; see flyer for details.

Sep 17, 2011: Tyler Weirick joins our lab (under iCREST) as a Graduate Research Assistant.

Aug 17, 2011: Robyn Kelley, a new master's student joins our lab as a Graduate Research Assistant.

July 21, 2011: KBL receives OSU funding to establish an iCREST center for Bioinformatics and Computational Biology.

June 08, 2011: KBL welcomes its first student, Kalpana Varala to work as a summer scholar in lab.



 




AP-iNET
Home Submit Help Datasets Team

How To Use AP-iNet

The web tool was developed to provide interaction predictions between host and pathogen (Arabidopsis-Pseudomonas ) proteins in two separate formats: the prediction module and the search modules.

Prediction Modules

These modules are for predicting interactions in submitted sequences. There are two types of prediction models available for each submission type: Support vector machine (SVM) and homology based.

Support Vector Machine (SVM) Prediction

Support vector machine is a type of supervised learning algorithm and has been widely used in bioinformatics for a variety of applications. AP-iNET used the SVMlight implementation through the python wrapper pysvmlight. In AP-iNET, it takes the input sequences provided by the user and derives the feature vectors. Then, the feature vectors are used to predict the interaction by the SVM models developed from the known interaction datasets (See datasets for more details).The various features used in the SVM model are described below:
  1. Amino Acid Compsition (AAC): Each protein is defined by a 20-dimensional feature vector in Euclidean space. The protein corresponds to a point whose co-ordinates are given by the occurrence frequencies of the 20 constituent amino acids. Each pair host-pathogen Protein-protein interaction is represented by a 40 length feature vector by combing their individual amino acid composition.
  2. Dipeptide Composition (DIPEP): This feature counts the occurrence frequencies of each dipeptide (pair of amino acids, Ex:“AA”, “AC”, “AD”...etc).This produces a fixed pattern of length 400 (20x20) for each protein and each pair of host-pathogen is represented by a 800 length feature vector.
  3. Conjoint triad (CT): In this method, first the twenty native amino acids are grouped into seven classes based on their electrostatic and hydrophobic properties such as dipoles and volumes of the side chains [Shen et al.,2007]. The method considers one amino acid and its vicinal amino acids and regards any three contiguous amino acids as a unit called triad. For each protein, the frequency of each triad in the sequence is determined and projected in a vector space by normalizing between 0 and 1. Thus, the protein is represented by a 343-dimension (7x7x7) feature vector. The CT descriptors of the host and pathogen proteins are concatenated and a total 686-dimensional vector constructed to represent each PPI.
  4. Composition-Transition-Distribution (CTD): In this method, three local descriptors, Composition (C), Transition (T) and Distribution (D) are used in combination to construct the feature vector. These descriptors are based on the variation of occurrence of functional groups of amino acids within the primary sequence of protein. Thus, before computing this feature the twenty amino acids are clustered into seven functional groups based on the dipoles and volumes of the side chains [Shen et al.,2007]. The composition descriptor computes the occurrence of each amino acid group along the sequence. Transition represents the percentage frequency with which amino acid in one group is followed by amino acid in another group. The distribution feature reflects the dispersion pattern along the entire sequence by measuring the location of the first, 25%, 50%, 75% and 100% of residues of a given group. Hence, a total of 63 features (7 composition, 21 transition and 35 distribution) are constructed to represent a protein. Finally, the CTD descriptors of the host and pathogen are concatenated to form a 126 feature vector for each protein pair.

Homology Prediction

Homology based models rely on protein sequence similarity to conduct the protein-protein interaction prediction. A protein pair is predicted to interact if an experimentally evidenced interaction exists between their respective homologous proteins in another organism. The homologous interacting pairs are searched against a well curated experimental evidenced interaction database [HPIDB+APINET database]. The HPIDB database (http://www.agbase.msstate.edu/hpi/main.html) is a database of experimental determined interactions between 62 hosts and 529 pathogens with 23735 unique PPIs. The APINET database is a collection of 166 experimentally proved PPIs between Arabidopsis-Pseudomonas from Mukhtar's expeiment (Mukhtar et al. 2011) and several other public databses such as BIND, APID, iRefIndex, IntAct and Uniprot.

Each protein is BLASTed against the set of known PPIs in the HPIDB+APINET database with a user defined E-value cutoff and bitscore value. For each protein pair, the interaction is predicted if their corresponding homologs in the curated database have atleast one interaction.

Host-Pathogen Prediction

The aim is, for a set of host proteins as well as pathogen proteins, to predict the interaction between them. Note: Host and pathogen sequences should be submitted in their designated area as separate files or input in separate text areas. There are two types of prediction avalible: a support vector machine based model and a homology model. The model used for prediciton can be decided by setting the radio button under "Prediction Method" to the desired method.

When the submission finishes a results page will be shown. The results page contains a table in which each row shows the score returned by the SVM prediction for a host-pathogen protein pair and the descision based on the cutoff value supplied. Also, at the top provides links to visualize the interaction network through cytoscape and to download the output as comma separated values (CSV), tab separated values (TSV) files.

Clicking on the cytoscape link will open a cytoscape viewer. In the cytoscape viewer the interactions are represented as lines between colored nodes which represent proteins. Clicking on a node will display additional information about the node in the box below the graph.

Host Interaction Predction

For a set of host proteins, predict their interactions with specific strains of Pseudomonas syringae. Note: Host sequences should be input in their designated text area or file upload.

Pathogen Prediction

For a set of patghogen proteins, predict their interaction with Arabidopsis. Note: Pathogen sequences should be input in their designated text area or file upload.

Keyword Search

This provides the facility that user can provide any specific function (GO term, domain name, etc.) or an accession number of host or pathogen as a query to see its interacting partners with their functional descriptions. The keyword search is a tool for searching the AP-iNET database consiting of all experimentally validated interactions used in AP-iNET as well as the predicitons made by AP-iNET between the strains of Psudomonas and Arabidopsis.

Search queries will return sequence matches by the given query and sequences that interact with the matched sequences.

Blast Search

The "Blast" tab allows users to search the AP-iNET database for interactions between sequences homologous to the query.

The results show the e-value, bitscore between the query and the hit.

The search functions identically to searching an accession number in "Keyword Search" tab.

Inputting Sequences

Please enter the sequnces into the designated areas for host or pathogen! This can directly affect the accuracy of the prediction! You can either copy/type sequences in the box or upload a file. Maximum input is noted in upload area. Please input FASTA formatted sequence only.

FASTA format example:

< Seq1
MASTPGHTIIYEAVCLHNDRTTIPMANASGFFTHPSIPNLRSRIHVPVRVSGSG
>Seq2 optional comment
MASQKRPSQRHGSKYLATASTMDHARHGFLPRHRDTGILDSIGRFFGGDRGAPK
NMYKDSHHPARTAHYGSLPQKSHGRTQDENPVVHFFKNIVTPRTPPPSQGKGRV
KSAHKGFKGVDAQGTLSKIFKLGGRDSRSGSPMARRELVISLIVES

 

Enter your email (optional)

Results may take a few minutes depending on the number of sequences uploaded. A link to a generated page for your results will be sent to your supplied email address. This link will be seen for 7 days only. This is optional. A link to results will also be supplied on a redirected page after sumission.

 

Submit or Clear

When ready hit submit to start the prediction.

 

Interpretion of SVM results

The prediction results show sequence IDs with the SVM score. The score greater than 0.00 are considered positive for being interacting in a host-pathogen network and scores below 0.00 are considered non-interacting.

 

NIMFFAB LOGO