Ongoing Projects at IIL:

Project Name: LifeDB

Description: Querying hidden Web sources on the fly for life sciences investigations poses several challenges, such as automatic information extraction, schema matching, and semantic reconciliation among sources. Although many automatic technologies have emerged for these problems, one interesting question remains unexplored: how can those existing technologies be integrated automatically to build a full-fledged autonomous integration system? In LifeDB, we exploit the flexibility and strengths of a declarative language to automate the integration of existing technologies. As part of the LifeDB system, we have semantically extended relational algebra into an integration algebra called Integra, which readily yields a declarative language known as BioFlow. The choice of Integra and BioFlow strengthens LifeDB in several ways. First, it hides implementation details from the user and thus brings integration to the language level. Second, if an intelligent system can semantically describe the necessary resources and integration steps (also known as a pipeline) in BioFlow, then fully automatic integration is possible. Third, in the absence of such an intelligent system, we are building a visual interface for defining the pipeline so that the rest of the integration is automatic.

Members: Shazzad Hosain, Shahriyar Hossain, Mohammad Shafkat Amin, Aminul Islam, Anupam Bhattacharjee

Click here to visit the LifeDB website.

Project Name: BioFlow

Description: Scientific workflows in the life sciences are usually complex. To respond to investigative queries, they coherently combine many online databases, analysis tools, publication repositories, and customized, computation-intensive desktop software. These investigative queries are generally ad hoc and ill-formed, and are often used only once to test a single hypothesis. In such cases, developing a customized workflow becomes a major undertaking that is expensive and resource intensive, often prohibitively so. Such high development costs deter many interesting queries and delay promising scientific discoveries. In this paper, we introduce BioFlow, a new query language that incorporates workflow features for scientific applications and exploits many recent developments in internet communication, databases, wrapper and mediator technologies, ontology, and data integration. BioFlow is a declarative language that abstracts these features to hide most procedural aspects of mediation, data integration, communication protocols, data extraction, and workflow details. We demonstrate that fairly complex workflows can be expressed declaratively in BioFlow, in an ad hoc fashion and at minimal cost. We also report a prototype implementation of BioFlow in Windows VB .NET that includes most of its powerful and representative features as proof of the feasibility of our proposal.

Members: Shazzad Hosain, Shahriyar Hossain, Mohammad Shafkat Amin, Aminul Islam, Anupam Bhattacharjee, Sharrukh Zaman, Emdad Ahmed

Project Name: PhyQL

Description: Popular phylogenetic databases such as TreeBASE, PhyloFinder, and TreeFAM offer complex text-based web forms for structure queries. There is still a great need for intelligent visual query formulation, based on a phylogenetic query language, for content exploration. PhyQL offers a visual query design interface in which the user can build queries, from simple to complex, out of visual query operators. Each query is translated into a list of Datalog queries, which are then executed in XSB, an extension of Prolog. Separating the application layer from the data layer with a logic layer reduces the development time of query tools. Moreover, PhyQL offers interactive tree visualization, which is convenient for viewing very large trees.

Members: Munirul Islam, Shahriyar Hossain

Click here to visit PhyQL website.

Project Name: Computational Approaches to Identify Human Disease Genes 

Description: The overall goal of this project is to develop and use computational approaches to identify and prioritize genes involved in human diseases by defining their complex interaction networks and characterizing how those networks evolve. To this end, we are assembling information to create a human interactome database that will use novel data representation methods capable of handling uncertainty. We are also developing new algorithms to compute the "significance" of specific patterns in the interaction network that biologists consider interesting. A Cytoscape plug-in has recently been developed to support such significance assignment for protein-protein interaction data in fruit fly. In addition, we are developing a new database computational method to find various types of network topologies related to specific functions, such as regulatory or signaling cascades.

Expectation: We expect to identify novel disease genes from the integrated human interactome database, which combines gene and protein interactions from available experimental data sets with interactions predicted using computational approaches.

Members: Anupam Bhattacharjee

Project Name: Study of disulfide bonds

Supervisors: Alan Dombkowski

Members: Kazi Zakia Sultana

Click here to visit the SSD interface.

Project Name: MiR-AT: A Microarray Analysis Tool

Supervisors: Alan Dombkowski

Members: Kazi Zakia Sultana

Click here to visit the MiR-AT interface.

Project Name: Computational approaches to identify human disease genes

Description: To identify putative disease genes, we are analyzing salient features of the existing protein-protein interaction (PPI) network in order to predict a relationship between function and network structure. The reliability of the PPI network is enhanced by associating it with a confidence score. To corroborate the results, several statistical and machine learning approaches are also being employed.

Our goal is to extract from a large list of candidate genes a smaller list that is enriched with disease genes. Each locus may contain from one to hundreds of genes. To identify the disease genes among the candidates, we search an integrated interactome network for a subset of the candidate genes that are significantly connected in the network. Genes that are significantly connected in the network are expected to function together in a common pathway. We assume no prior knowledge of any known gene for any particular disease, nor do we assume prior knowledge of the possible molecular mechanism of a disease.
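
One simple way to make "significantly connected" concrete is a permutation test: count the interactions among the candidate genes and compare that count with counts for random gene sets of the same size. The sketch below is only an illustration under that assumption; the project description does not state which significance measure is used, and the function names and the edge-list representation are hypothetical.

```python
import random

def count_internal_edges(edges, genes):
    """Count network edges whose two endpoints are both in `genes`."""
    genes = set(genes)
    return sum(1 for a, b in edges if a in genes and b in genes)

def connectivity_p_value(edges, all_genes, candidates, n_permutations=1000, seed=0):
    """Empirical p-value for the connectivity of `candidates`.

    Asks how often a random gene set of the same size has at least as many
    internal edges as the candidate set. This is an assumed significance
    measure for illustration, not the project's documented method.
    """
    rng = random.Random(seed)
    observed = count_internal_edges(edges, candidates)
    pool = sorted(all_genes)
    size = len(set(candidates))
    hits = 0
    for _ in range(n_permutations):
        sample = rng.sample(pool, size)
        if count_internal_edges(edges, sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)
```

A small p-value suggests that the candidate set is more densely connected than expected by chance. A fuller treatment would also control for node degree, since highly connected genes inflate the edge count of any set that contains them.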

Expected outcome: We expect to construct an integrated human interactome database and to develop a computational approach that identifies disease genes using interactome networks. We also aim to correlate the evolutionary rates of individual genes and interacting pairs with disease gene candidates.

Members: Mohammad Shafkat Amin

Project Name: Evolution of Labor and Birth

Description: The research I'm involved in studies genes related to mammalian reproduction, especially the processes of placentation and parturition. The current focus is on studies that could eventually reveal whether the genes expressed in placental tissue are related to preeclampsia. I am trying to find genes and pseudogenes that are similar to previously discovered human genes that play a role in nutrition of the fetus (and are highly expressed in the placenta), and to study the evolution of human genes compared with those of other placental mammals, which could provide a clue to a plausible cause of this disease.

Members: Saied Haidarian

Project Name: Network modeling engine

PI: Alan Dombkowski, Ph.D.

Description: This project aims to identify proteins that are significantly involved in cancer pathways. A clustering analysis groups genes that are similarly regulated by different inhibitors. The network modeling engine is responsible for discovering the relationships between the clustered genes with the help of interactome databases.

Expected outcome: The engine should be capable of identifying the networks of a given gene from different points of view.

Members: Sharrukh Zaman

Project Name: Identifying transcription factors

PI: Marcus Friedrich, Ph.D.

Description: This project aims to find similar binding sites in sequences of interest based on known criteria. One such criterion is a position weight matrix taken from a published study, which is used to find similar sites in the target sequences.
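
As an illustration of searching with a position weight matrix, the sketch below converts per-position base counts into log-odds scores and then scores every window of a target sequence. The function names, the uniform background frequency, the pseudocount, and the threshold are assumptions made for this example; they are not taken from the project.

```python
import math

def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Turn per-position base counts into log2-odds scores.

    `counts` is a list of dicts, one per motif position, mapping A/C/G/T to
    observed counts. A uniform background is assumed for simplicity.
    """
    matrix = []
    for column in counts:
        total = sum(column.get(base, 0) for base in "ACGT") + 4 * pseudocount
        matrix.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return (start, window, score) for each window scoring at or above threshold."""
    width = len(matrix)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits
```

For example, building a matrix from ten aligned copies of `TATA` and scanning `GGTATAGG` with a threshold of 5.0 reports the single window starting at position 2. A real search would normally scan the reverse strand as well and choose the threshold from a score distribution.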

Expected outcome: A method for incorporating a user-supplied position weight matrix into the search.

Members: Sharrukh Zaman

Project Name: Microarray study of PMD genotypes

PI: Alexander Gow, Ph.D. 

Description: We have performed a microarray experiment on two genotypic classes of mice that serve as models of the leukodystrophy Pelizaeus-Merzbacher Disease (PMD): the myelin synthesis deficient (msd) and Rumpshaker (rsh) mutants. In both mutants, expression of mutant Plp1 causes metabolic stress in the endoplasmic reticulum as a result of Plp1 misfolding. This stress induces an unfolded protein response (UPR). In msd mice the UPR leads to high levels of apoptosis, whereas apoptosis in rsh mice is much lower and these mice exhibit mild symptoms. One of the key questions we are asking is why the rate of apoptosis is so much higher in msd than in rsh.

Outcome: In the process of answering this question, we will also be able to identify the genes that switch the UPR into apoptosis. These genes can be used as pharmacological targets to ameliorate the severity of other UPR-related diseases, such as Alzheimer's, Parkinson's, and Huntington's diseases.

Members: Shahriyar Hossain

Project Name: Study of transcriptional regulation of CcO4-2

PI: Lawrence Grossman, Ph.D.

Description: We have identified a novel 24-bp promoter region in human CcO4-2 that is stimulated by hypoxia. This region is highly conserved in cow, rat, mouse, and human. We have also identified 295 mouse genes that contain this region with up to 2 mismatches in either direction. We are now analyzing the expression patterns of these genes under different levels of oxygen. We performed microarray experiments with an Illumina human genome-wide chip under 0.5%, 4%, and 20% oxygen. We are applying the Gene Set Enrichment Analysis (GSEA) tool (http://www.broad.mit.edu/gsea/) to determine whether the set of 295 genes shows statistically significant expression differences between the conditions.
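
A search for a short region "with up to 2 mismatches in either direction" can be sketched as a Hamming-distance scan over both strands. The code below assumes that "either direction" means the forward or the reverse-complement orientation, which the description does not spell out; the function names are hypothetical and the motif used in the usage note is a placeholder, not the actual 24-bp region.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence; non-ACGT characters are left as-is."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def find_motif(sequence, motif, max_mismatches=2):
    """Return (start, strand, mismatches) for every window that is within
    `max_mismatches` of the motif on the forward (+) or reverse (-) strand."""
    sequence, motif = sequence.upper(), motif.upper()
    targets = {"+": motif, "-": reverse_complement(motif)}
    width = len(motif)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        for strand, target in targets.items():
            distance = hamming(window, target)
            if distance <= max_mismatches:
                hits.append((start, strand, distance))
    return hits
```

With the placeholder motif `AAACCC`, calling `find_motif("TTAAACCCTTGGGTTTAA", "AAACCC", 0)` reports one forward hit at position 2 and one reverse-strand hit at position 10. This scan allows substitutions only; insertions and deletions would need an alignment-based search.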

Members: Shahriyar Hossain

Project Name: OCPAT Enhancement by Automatic Data Integration Methods

Supervisors: Hasan Jamil, Ph.D., Derek E Wildman, Ph.D.

Description: One of the major challenges in bioinformatics research is automatically integrating life science data gathered from different resources. Our aim in this project was to demonstrate the value of automatic link and combine operations, a principal part of automatic data integration, on a real-life example from the life sciences. For this purpose, we chose OCPAT, a popular alignment tool devised by the Wildman lab at CMMG, and linked it with another bioinformatics resource, ENSEMBL. The goal was to use the ENSEMBL gene tree viewer to judge the quality of the alignments found by OCPAT. We also developed a web interface to generalize the tool.

Members: Anupam Bhattacharjee, Aminul Islam