

Datasets used for model development for the Arabidopsis thaliana and Pseudomonas syringae system

The prediction capability of an SVM model depends on the quality of the positive and negative control datasets used in the study. For training and testing, the following datasets were used:

Positive Set:

We collected well-curated, experimentally supported interactions to serve as the positive control for the support vector machine models. The largest Arabidopsis-Pseudomonas interaction dataset available to date comes from the experiment by Mukhtar et al., 2011, which contains 153 PPIs. We collected a further 13 PPIs from the HPIDB database and another 21 from various databases such as BIND, DIP, MINT, and iRefIndex. In total, 187 experimental PPIs were collected; after removing duplicate pairs, 166 unique pairs were used as the positive dataset. To reduce redundancy among the pairs, we ran CD-HIT at a 40% cutoff on the dataset, and the resulting 34 PPIs were kept as the training dataset. The remaining 132 PPIs were used as an independent test set.
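The merge-and-deduplicate step and the counts quoted above can be sketched as follows. This is only an illustration under stated assumptions: the protein identifiers are invented toy names, the function `merge_unique_pairs` is hypothetical, and duplicates are assumed to be exact (host, pathogen) matches after trimming whitespace and ignoring case. The CD-HIT redundancy step is not reproduced here.

```python
def merge_unique_pairs(*sources):
    """Merge several PPI lists, keeping the first copy of each (host, pathogen) pair.

    Assumption: two pairs are duplicates when both identifiers match after
    stripping whitespace and ignoring case.
    """
    seen = set()
    unique = []
    for source in sources:
        for host, pathogen in source:
            key = (host.strip().lower(), pathogen.strip().lower())
            if key not in seen:
                seen.add(key)
                unique.append((host.strip(), pathogen.strip()))
    return unique


# Toy data with invented identifiers (not taken from the real datasets).
main_screen = [("hostA", "effX"), ("hostB", "effY")]
hpidb_like = [("HOSTA", "EFFX"), ("hostC", "effZ")]
other_dbs = [("hostC", "effZ"), ("hostD", "effX")]

pairs = merge_unique_pairs(main_screen, hpidb_like, other_dbs)

# Bookkeeping with the counts reported in the text.
collected = 153 + 13 + 21          # PPIs gathered from all sources
unique_pairs = 166                 # after removing duplicate pairs
training, independent_test = 34, 132
duplicates_removed = collected - unique_pairs
```

On the toy input, `pairs` holds four unique interactions. With the reported figures, the three sources sum to 187, the training and test sets sum to 166, and the difference implies 21 duplicate pairs.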

Negative Set:

We collected a set of keywords related to intraspecies and interspecies interaction by carefully searching the literature (leucine, coiled, resistance, kinase, binding, disease, defense, defensin, interaction, receptor, etc.). We searched for both the intraspecies and interspecies keywords in the sequence annotations of the whole Arabidopsis proteome (35386 proteins) collected from TAIR, and 10048 proteins were selected as potential candidates for interaction. We then searched the Swiss-Prot knowledgebase for Arabidopsis with the interspecies keywords, and 13832 proteins were collected as positive hits. We also added the Arabidopsis proteins from the positive dataset described above as candidates for interaction with Pseudomonas. After these processing steps, 21458 unique Arabidopsis proteins had been collected as potential positive candidates for interaction. To prepare a better negative control dataset, we then identified the homologues of these positive candidates among the remaining 13928 proteins using BLAST with an E-value of 10^-4. After removing these positive-like candidates, the remaining 5955 proteins were used as the negative control dataset.
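A minimal sketch of the filtering logic described above, on toy data. Several things here are assumptions rather than the authors' actual pipeline: the keyword list contains only the ten examples named in the text (the original list ends with "etc."), matching is a simple case-insensitive substring test, the protein IDs and annotations are invented, and the BLAST results are assumed to be in the standard 12-column tabular format (`-outfmt 6`), where the subject ID is column 2 and the E-value is column 11.

```python
# Only the keywords quoted in the text; the full list is not given there.
KEYWORDS = ["leucine", "coiled", "resistance", "kinase", "binding",
            "disease", "defense", "defensin", "interaction", "receptor"]


def matches_keyword(annotation, keywords=KEYWORDS):
    """True if any keyword occurs in the annotation (case-insensitive substring)."""
    text = annotation.lower()
    return any(keyword in text for keyword in keywords)


def homologs_from_blast(lines, max_evalue=1e-4):
    """Collect subject IDs that have at least one hit with E-value <= max_evalue.

    Assumes tab-separated BLAST output with the default 12 columns of -outfmt 6.
    """
    hits = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[1])
    return hits


# Toy proteome with invented IDs and annotations.
annotations = {
    "P1": "Leucine-rich repeat receptor kinase",
    "P2": "ribosomal protein L4",
    "P3": "unknown protein",
    "P4": "photosystem II subunit",
}
candidates = {pid for pid, text in annotations.items() if matches_keyword(text)}
remaining = set(annotations) - candidates

# Toy BLAST hits: candidate P1 (query) against the remaining proteins (subject).
blast_lines = [
    "P1\tP3\t45.0\t200\t100\t2\t1\t200\t1\t200\t1e-20\t150",
    "P1\tP4\t25.0\t50\t30\t1\t1\t50\t1\t50\t0.5\t20",
]
negatives = remaining - homologs_from_blast(blast_lines)
```

Here P3 is dropped as a homologue of a positive candidate, leaving P2 and P4 as negatives. The same set arithmetic matches the reported figures: 35386 - 21458 = 13928 remaining proteins, of which 5955 survive the homologue filter.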

Since proteins localized in the bacterial cytoplasm may not be involved in interaction, all the proteins of Pseudomonas (all three pathovars: tomato DC3000, phaseolicola, and syringae) were processed with PSORTb 3.0, a widely used tool for protein localization in bacteria (www.psort.org/psortb). Those predicted as cytoplasmic or cytoplasmic membrane were considered negative candidates, and the other proteins were considered positive candidates. We also searched the whole proteome of all three Pseudomonas pathovars against the effector database (http://www.effectors.org/), an integrated database of secreted proteins in bacteria. Those identified as secreted were considered positive candidates for interaction. Combining these two steps, a positive dataset was constructed and the remaining proteins were considered negatives. The positive dataset was then searched with BLAST against the negative candidates to remove homologous proteins from the negative dataset. Next, we searched the negative candidate proteins of Pseudomonas with the interaction-related keywords listed above, and the hits were removed from the negative dataset. The proteins remaining after these steps were considered the negative control. In total, the final negative dataset contained 3383 Pseudomonas proteins.
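The Pseudomonas selection steps above amount to a few set operations, sketched below on toy data. The function names, the protein IDs, and the localization label strings are illustrative assumptions, as is the choice to treat every prediction other than the two cytoplasmic classes (including an unknown localization) as a positive candidate, following the sentence "the other proteins were considered positive candidates".

```python
# Localization classes treated as negative candidates in the text.
NEGATIVE_SITES = {"Cytoplasmic", "CytoplasmicMembrane"}


def split_candidates(localization, secreted_ids):
    """Split proteins into positive and negative candidates.

    Positives: any protein not predicted in NEGATIVE_SITES, plus any protein
    flagged as secreted. Negatives: everything else.
    """
    positives = {pid for pid, site in localization.items()
                 if site not in NEGATIVE_SITES}
    positives |= set(secreted_ids) & set(localization)
    negatives = set(localization) - positives
    return positives, negatives


def refine_negatives(negatives, homolog_ids, keyword_hit_ids):
    """Drop negatives that are homologous to positives or match a keyword."""
    return set(negatives) - set(homolog_ids) - set(keyword_hit_ids)


# Toy predictions with invented protein IDs.
localization = {
    "psA": "Cytoplasmic",
    "psB": "Extracellular",
    "psC": "CytoplasmicMembrane",
    "psD": "Cytoplasmic",
    "psE": "Cytoplasmic",
    "psF": "Unknown",
}
secreted = {"psD"}          # flagged by the effector database search
positives, negatives = split_candidates(localization, secreted)

# Hypothetical results of the BLAST and keyword searches on the negatives.
final_negatives = refine_negatives(negatives,
                                   homolog_ids={"psC"},
                                   keyword_hit_ids={"psE"})
```

In this toy run psD moves to the positive set because it is flagged as secreted despite its cytoplasmic prediction, and only psA survives as a negative after the homologue and keyword filters.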

Note: The positive and negative datasets are available from the authors upon request.