Development of protein subcellular-location prediction pipeline LocateP v.2.0


Description

Supervisors: Sacha van Hijum (svhijum@cmbi.ru.nl), Miaomiao Zhou (m.zhou@cmbi.ru.nl)

Background: Knowledge of the precise subcellular location (SCL) of proteins is essential for judging the biological nature and role of their activity. Molecular biology researchers therefore need accurate and detailed SCL predictions on a genome scale. However, almost all available SCL predictors are imprecise and prone to erroneous predictions. To address this, we have constructed a new SCL prediction pipeline, LocateP, which combines a number of bioinformatics tools by mimicking the cellular processes of protein synthesis and protein secretion. Validation and comparison with other tools showed that LocateP v.1.0 is the most accurate and detailed SCL classifier currently available for Gram-positive bacteria. An extended version, LocateP v.2.0, which predicts the SCL of proteins from Gram-negative bacteria, has also been developed. The SCLs of the proteins of all completely sequenced bacterial genomes have been pre-calculated with LocateP, and the results are stored in the corresponding database, LocateP-DB (www.cmbi.ru.nl/locatep-db).

FG-web is a framework that facilitates converting a bioinformatics workflow (computer program) into a fully functional web tool. It provides genomics database storage and retrieval, user-based access, parameter checking, process queuing, etc. FG-web was developed at the University of Groningen (http://bioinformatics.biol.rug.nl/websoftware) and will be used in this project to convert LocateP v.2.0 (for Gram-negative bacteria) into a web tool. Several published bioinformatics tools, such as Projector 2 and MINOMICS, are already implemented in FG-web.

Objectives: We aim to convert the current version of the LocateP pipeline into a web-based tool that performs real-time predictions using user-uploaded sequences as input.

Project:
- Understand the protein secretion process in bacteria and become familiar with the LocateP pipeline (the student will learn relevant microbiology and gain an overview of currently available SCL predictors).
- Integrate the LocateP workflow into FG-web and select appropriate parameters (the student will learn/practice computer programming, apply machine learning techniques, and construct databases).
- Test and debug the web tool (the student will learn/practice computer programming).

References:

Zhou M, Boekhorst J, Francke C, Siezen RJ: LocateP: genome-scale subcellular-location predictor for bacterial proteins. BMC Bioinformatics 2008, 9:173.

Brouwer RW, van Hijum SA, Kuipers OP: MINOMICS: visualizing prokaryote transcriptomics and proteomics data in a genomic context. Bioinformatics 2009, 25:139-140.

van Hijum SA, Zomer AL, Kuipers OP, Kok J: Projector 2: contig mapping for efficient gap-closure of prokaryotic genome sequence assemblies. Nucleic Acids Res 2005, 33:W560-W566.