Research
The Dahlquist Lab pursues four distinct but related research projects. The common thread among these projects is that they are interdisciplinary, employing the techniques of bioinformatics, biomathematics, and genomics and the perspective of systems biology. All involve teams of undergraduate students who iteratively feed data, analyses, and code to each other, as well as interdisciplinary collaborations with other faculty at LMU: Dr. John David N. Dionisio (Department of Electrical Engineering and Computer Science) and Dr. Ben G. Fitzpatrick (Department of Mathematics). These research projects have also been brought into the classroom in such courses as Biology/Computer Science 367: Biological Databases, Biology 368: Bioinformatics Laboratory, Biology/Mathematics 388: Biomathematical Modeling, and Biology 478: Molecular Biology of the Genome. For lab protocols, see the Dahlquist Lab wiki at OpenWetWare.org.
Determining the gene regulatory network controlling the global transcriptional response of budding yeast, Saccharomyces cerevisiae, to cold shock and recovery using DNA microarrays
The complete sequencing of the human genome and those of other major model organisms, along with the invention of high-throughput methods to measure gene expression, has propelled biology into the genomics era. Where once biologists could only study genes one at a time, DNA microarrays now routinely generate data for thousands of genes in a single experiment. My research harnesses the power of DNA microarray and other types of genomic data to elucidate the systems-level properties of gene regulatory networks in the budding yeast, Saccharomyces cerevisiae. Yeast responds to environmental stresses through characteristic programs of gene expression. The transcriptional response to heat shock and a variety of other stressors, such as changes in nutrient availability, osmolarity, and oxidative stress, has been extensively characterized. Yeast responds to heat shock through the induction of heat shock proteins. In addition, a set of genes that are induced or repressed by many different stressors has been termed the environmental stress response (ESR) genes (Gasch et al. 2000 Mol Biol Cell 11:4241; Causton et al. 2001 Mol Biol Cell 12:323). The general environmental stress response also involves the production of heat shock proteins that are universally conserved across all organisms and that have been very well characterized. However, the response to cold shock has been less well studied and has not revealed a similar set of universally conserved cold shock proteins (Thieringer et al. 1998 BioEssays 20:49; Al-Fageeh & Smales 2006 Biochemical J 397:247; Aguilera et al. 2007 FEMS Microbiol Rev 31:327).
Previous studies on the transcriptional response of budding yeast to cold shock have revealed that the response can be divided into a set of early response genes (after 15 minutes to 2 hours of cold temperatures) and late response genes (after 12 to 60 hours of cold temperatures) (Sahara et al. 2002 J Biol Chem 277:50015; Schade et al. 2004 Mol Biol Cell 15:5492). The late response genes include the ESR genes induced by many environmental stresses and are regulated by the Msn2/Msn4 transcription factors (Schade et al. 2004). However, the transcription factors responsible for the induction of the early response genes and the overall regulatory mechanism governing this early response remain largely unknown. Furthermore, there is ample evidence to suggest that environmental stress response pathways overlap, as is seen by the induction of the same set of ESR genes under multiple stress conditions (Gasch et al. 2000; Causton et al. 2001). Finally, DNA microarray experiments comparing gene expression changes when the Leu3 transcription factor was deleted or overexpressed have revealed that many genes that are not direct targets of that factor were nevertheless affected through indirect effects (Tang et al. 2006 BMC Genomics 7:1). These indirect effects are most likely due to regulatory relationships between transcription factors. Thus, my lab is pursuing three main questions:
- Which transcription factors control the early response to cold shock in S. cerevisiae?
- Which part of the early transcriptional response to cold shock is due to indirect effects of other transcription factors in the gene regulatory network?
- What are the dynamics of the gene regulatory network controlling this response?
To approach these questions, we need to complement high-throughput genomic data with the tools of mathematical biology and the perspective of systems biology.
The Dahlquist Lab has had a longstanding project to measure the global transcriptional response to the environmental stress of cold shock and subsequent recovery in S. cerevisiae using DNA microarrays. DNA microarrays measure the mRNA levels of all of the roughly 6000 genes in yeast simultaneously, giving a snapshot of all transcriptional activity in the cells at one time. Wild type yeast cells (BY4741) were grown to early log phase at 30°C, then shifted to 13°C for 60 minutes, and then shifted back to 30°C for another 60 minutes. Samples were collected before cold shock (t0), after 15, 30, and 60 minutes of cold shock (t15, t30, and t60), and after 30 and 60 minutes of recovery at 30°C (t90 and t120). Yeast samples were collected for four independent replicates of the time course experiment. Total RNA was purified from each of the samples. The mRNA was amplified, labeled with Cy3 and Cy5 dyes by the indirect method, and hybridized to DNA microarrays according to the manufacturers’ protocols provided with the reagents. DNA microarrays were initially obtained through the Genome Consortium for Active Teaching. (See the Dahlquist Lab wiki at OpenWetWare.org for protocols.) Over 1300 genes (22% of the yeast genome) show significant differential expression at one or more timepoints, with a Benjamini and Hochberg corrected ANOVA p value < 0.05. The two dominant clusters show increased expression during cold shock followed by decreased expression at recovery, or vice versa. Having characterized the wild type response, we then sought to identify the transcription factors that regulate that response, both bioinformatically through analysis of the data and experimentally through a targeted screen of transcription factor deletion strains (obtained from the systematic yeast deletion project) for cold-sensitive phenotypes. We have obtained full microarray datasets for the wild type strain and six strains individually deleted for the Cin5, Gln3, Hap4, Hmo1, Swi4, and Zap1 transcription factors.
Students in my BIOL 478: Molecular Biology of the Genome course contributed to the collection of these data. The data generated in the wet lab are then fed into the mathematical modeling project described below to determine the dynamics of the gene regulatory network.
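The Benjamini and Hochberg correction applied to the ANOVA p values can be illustrated with a minimal sketch. The function name and the five p values are hypothetical and chosen only for illustration; they are not data from the lab, and the lab's actual analysis pipeline may differ.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw ANOVA p values for five genes (not real data).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
# A gene is called significant when its adjusted p value is below 0.05.
significant = [p < 0.05 for p in adj]
```

Note that the two genes with raw p values of 0.039 and 0.041 would pass an uncorrected 0.05 cutoff but not the corrected one, which is the point of controlling the false discovery rate across thousands of genes.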
GRNmap: Gene Regulatory Network Modeling and Parameter Estimation (GRNmap Website)
Gene expression is a complex biological process in which cells first transcribe their genes encoded in the DNA into mRNA, which is then translated into proteins. Transcription factors are regulatory proteins that increase or decrease the rate at which a cell transcribes a gene. A gene regulatory network (GRN) consists of a set of transcription factors that regulate the level of expression of genes encoding other transcription factors. The dynamics of a GRN show how gene expression in the network changes over time. In collaboration with Dr. Ben G. Fitzpatrick (LMU Department of Mathematics), we report in Dahlquist et al. (2015) the results of our investigation into the dynamics of a gene regulatory network controlling the cold shock response in budding yeast, Saccharomyces cerevisiae. The medium-scale network, derived from published genome-wide location data (Lee et al. 2002 Science 298:799; Harbison et al. 2004 Nature 431:99), consists of 21 transcription factors that regulate one another through 31 directed edges. The expression levels of the individual transcription factors were modeled using mass balance ordinary differential equations with a sigmoidal production function. Each equation includes a production rate, a degradation rate, weights that denote the magnitude and type of influence of the connected transcription factors (activation or repression), and a threshold of expression. The inverse problem of determining model parameters from observed data is our primary interest. We fit the differential equation model to published microarray data (Schade et al. 2004) using a penalized nonlinear least squares approach. Model predictions fit the experimental data well, within the 95% confidence interval. Tests of the model using randomized initial guesses and model-generated data also lend confidence to the fit. The results have revealed activation and repression relationships between the transcription factors.
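A model of this kind can be written in the following general form. This is a sketch consistent with the verbal description above, not a transcription of the published equations: the symbols x_i (expression of transcription factor i), P_i (production rate), d_i (degradation rate), w_ij (weight of regulator j on target i), b_i (threshold), and the penalty coefficient alpha are notation assumed here, and the exact form of the penalty term is likewise an assumption.

```latex
% Mass balance ODE: sigmoidal production minus first-order degradation
\frac{dx_i}{dt} = \frac{P_i}{1 + \exp\left(-\left(\sum_{j} w_{ij}\, x_j - b_i\right)\right)} - d_i\, x_i

% Penalized least squares criterion over Q observations z_q and parameters \theta
E(\theta) = \frac{1}{Q} \sum_{q=1}^{Q} \left(z_q - \hat{z}_q(\theta)\right)^2 + \alpha \sum_{k} \theta_k^2
```

In this notation a positive w_ij corresponds to activation and a negative w_ij to repression, and the penalty term discourages large parameter values when the data are sparse.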
Sensitivity analysis indicates that the model is most sensitive to changes in the production rate parameters, weights, and thresholds of Yap1, Rox1, and Yap6, which form a densely connected core in the network. The modeling results newly suggest that Rap1, Fhl1, Msn4, Rph1, and Hsf1 play an important role in regulating the early response to cold shock in yeast. Our results demonstrate that estimation for a large number of parameters can be successfully performed for nonlinear dynamic gene regulatory networks using sparse, noisy microarray data.
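The structure of such a model, and of the cost that a penalized least squares fit would minimize, can be sketched in a few lines. Everything in this sketch is an assumption made for illustration: the `Network` container, the fixed-step forward Euler integrator, the L2 form of the penalty, and the two-gene network and its observations, none of which come from GRNmap or from the published data.

```python
import math
from dataclasses import dataclass


@dataclass
class Network:
    """Parameters of a toy gene regulatory network (hypothetical container)."""
    production: list   # production rate P_i for each gene
    degradation: list  # degradation rate d_i for each gene
    thresholds: list   # expression threshold b_i for each gene
    weights: list      # weights[i][j]: influence of regulator j on target i


def derivatives(net, x):
    """Sigmoidal production minus first-order degradation for every gene."""
    rates = []
    for i, x_i in enumerate(x):
        drive = sum(w * x_j for w, x_j in zip(net.weights[i], x)) - net.thresholds[i]
        production = net.production[i] / (1.0 + math.exp(-drive))
        rates.append(production - net.degradation[i] * x_i)
    return rates


def simulate(net, x0, t_end, dt=0.01):
    """Integrate from time 0 to t_end with fixed-step forward Euler."""
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        x = [x_i + dt * dx_i for x_i, dx_i in zip(x, derivatives(net, x))]
    return x


def penalized_cost(net, x0, observations, alpha=0.01):
    """Mean squared error against observations plus an L2 penalty.

    observations is a list of (time, [expression per gene]) pairs.
    """
    squared_errors = []
    for t, observed in observations:
        predicted = simulate(net, x0, t)
        squared_errors.extend((p - o) ** 2 for p, o in zip(predicted, observed))
    # Penalize the parameters treated as estimated in this sketch.
    estimated = list(net.production) + list(net.thresholds)
    for row in net.weights:
        estimated.extend(row)
    penalty = alpha * sum(value ** 2 for value in estimated)
    return sum(squared_errors) / len(squared_errors) + penalty


# Hypothetical two-gene network: gene 0 activates gene 1, gene 1 represses gene 0.
toy = Network(
    production=[1.0, 1.0],
    degradation=[0.5, 0.5],
    thresholds=[0.0, 0.0],
    weights=[[0.0, -1.0], [1.0, 0.0]],
)
# Made-up observations (not real microarray data).
toy_observations = [(1.0, [1.2, 1.1]), (2.0, [1.1, 1.3])]
cost = penalized_cost(toy, [1.0, 1.0], toy_observations)
```

A parameter estimation routine would repeatedly adjust the production rates, weights, and thresholds to reduce `cost`; a sensitivity analysis would perturb one parameter at a time and record how much the simulated expression changes.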
We have continued to develop the modeling software, called GRNmap, which is written in MATLAB. We have added several new features to the model. There is now an option to use a Michaelis-Menten production function (as well as the sigmoidal production function originally described in Dahlquist et al. (2015)), the ability to input replicate expression data instead of the means for each timepoint, and the option to include data for experiments in which a transcription factor was deleted from the network. While the model results described in Dahlquist et al. (2015) were based on published data from a different lab, all current analyses are based on data generated by students in the Dahlquist Lab. However, the large number of developers and the long time span of development led to a code base that was difficult to revise and adjust. To address this, in Fall 2014, we shifted to an open development model, depositing the code in a GitHub repository under the open source BSD license and bringing it into alignment with best practices for software engineering and scientific computing. We refactored the script-based software, which relied on global variables, into a function-based package that uses an object to carry relevant information from function to function. This modular approach allows for cleaner, less ambiguous code and increased maintainability. We have also implemented a unit-testing framework to ensure the program works as expected. Finally, after the code was refactored and tested, we used the MATLAB compiler to create an executable file that can be run on any Windows machine without a MATLAB license, increasing the accessibility of our program. The code and executable are available under the open source BSD license from the GRNmap website. GRNmap is used in the course BIOL 388: Biomathematical Modeling/MATH 388: Survey of Biomathematics team-taught by Drs. Dahlquist and Fitzpatrick (e.g., see the Spring 2015 course wiki).
GRNsight: a Web Application and Service for Visualizing Models of Small- to Medium-scale Gene Regulatory Networks (GRNsight Website)
In collaboration with Dr. John David N. Dionisio (LMU Department of Electrical Engineering and Computer Science) and Dr. Ben G. Fitzpatrick (LMU Department of Mathematics), we describe GRNsight, an open source web application and service for visualizing small- to medium-scale gene regulatory networks (GRNs), in the Dahlquist et al. (2016) PeerJ preprint. We wanted a quick and easy way to visualize the weight parameters from the GRNmap model, which represent the direction and magnitude of the influence of a transcription factor on its target gene. GRNsight automatically lays out either an unweighted or weighted network graph based on an Excel spreadsheet containing an adjacency matrix where regulators are named in the columns and target genes in the rows, a Simple Interaction Format (SIF) text file, or a GraphML XML file. When a user uploads an input file specifying an unweighted network, GRNsight automatically lays out the graph using black lines and pointed arrowheads. For a weighted network, GRNsight uses pointed and blunt arrowheads, and colors the edges and adjusts their thicknesses based on the sign (positive for activation or negative for repression) and magnitude of the weight parameter. GRNsight is written in JavaScript, with diagrams facilitated by D3.js, a data visualization library. Node.js and the Express framework handle server-side functions. GRNsight’s diagrams are based on D3.js’s force graph layout algorithm, which was then extensively customized to support the specific needs of GRN visualization. Nodes are rectangular and support gene labels of up to 12 characters. The edges are arcs, which become straight lines when the nodes are close together. Self-regulatory edges are indicated by a loop on the lower-right side of a node. When a user mouses over an edge, the numerical value of the weight parameter is displayed. Visualizations can be modified by sliders that adjust D3.js’s force graph layout parameters and through manual node dragging.
GRNsight is best-suited for visualizing networks of fewer than 35 nodes and 70 edges, although it accepts networks of up to 75 nodes or 150 edges. Although originally designed for GRNs, GRNsight has general applicability for displaying any small, unweighted or weighted network with directed edges for systems biology or other application domains.
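The adjacency matrix convention described above (regulators in the columns, target genes in the rows, sign of the weight indicating activation or repression) can be made concrete with a short sketch that turns such a matrix into a list of directed edges. This is not GRNsight's code, which is written in JavaScript; the function names, the weights, and the `pd` interaction label used for the SIF-style output are assumptions made for illustration.

```python
def matrix_to_edges(regulators, targets, matrix):
    """Convert an adjacency matrix into (regulator, target, weight, kind) edges.

    matrix[row][col] is the weight of regulators[col] acting on targets[row].
    """
    edges = []
    for row, target in enumerate(targets):
        for col, regulator in enumerate(regulators):
            weight = matrix[row][col]
            if weight == 0:
                continue  # a zero entry means no regulatory edge
            kind = "activation" if weight > 0 else "repression"
            edges.append((regulator, target, weight, kind))
    return edges


def edges_to_sif(edges, interaction="pd"):
    """Render edges as tab-separated 'source interaction target' lines.

    The default interaction label is a placeholder chosen for this sketch.
    """
    return [f"{regulator}\t{interaction}\t{target}" for regulator, target, _, _ in edges]


# Hypothetical weights for three transcription factors (not fitted values).
genes = ["CIN5", "GLN3", "HAP4"]
weights = [
    [0.0, 0.5, 0.0],   # row CIN5: activated by GLN3
    [0.0, 0.0, 0.0],   # row GLN3: no regulators in this toy network
    [-1.2, 0.0, 0.3],  # row HAP4: repressed by CIN5, activates itself
]
edges = matrix_to_edges(genes, genes, weights)
sif_lines = edges_to_sif(edges)
```

In a weighted display, the `kind` field would decide between a pointed and a blunt arrowhead, and the absolute value of the weight would set the edge thickness. The self-regulatory entry for HAP4 is the case drawn as a loop.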
GRNsight serves as an example of following and teaching best practices for scientific computing and complying with FAIR Principles, using an open and test-driven development model with rigorous documentation of requirements and issues on GitHub. An exhaustive unit testing framework using Mocha and the Chai assertion library consists of around 160 automated unit tests that examine nearly 530 test files to ensure that the program is running as expected. The GRNsight application and code are available under the open source BSD license. GRNsight is used in the course BIOL 388: Biomathematical Modeling/MATH 388: Survey of Biomathematics team-taught by Drs. Dahlquist and Fitzpatrick (e.g., see the Spring 2015 course wiki).
XMLPipeDB: A Reusable, Open Source Tool Chain for Building Relational Databases from XML Sources (XMLPipeDB Website)
GenMAPP (Gene Map Annotator and Pathway Profiler) is a tool for viewing and analyzing DNA microarray and other types of high-throughput data on MAPPs representing biological pathways or other functional groupings of genes. GenMAPP has graphics tools for drawing MAPPs, but also has an underlying Gene Database which allows users to give the genes placed on the MAPP an identifier from a public database. This feature makes it possible for users to import gene expression data into GenMAPP and color-code the genes according to the data. MAPPFinder works with GenMAPP to determine which Gene Ontology terms are over-represented among genes changed in the imported expression data. GenMAPP is now considered “legacy” software and is no longer supported, although it is still available on GitHub. The LMU Bioinformatics Group, headed by Dr. Dahlquist and Dr. John David N. Dionisio of the Department of Electrical Engineering and Computer Science, has extended its life by providing new and updated Gene Databases for use with GenMAPP using the open source XMLPipeDB software suite.
XMLPipeDB is a reusable, open source tool chain for automatically building relational databases from an XML schema (XSD). XML stands for eXtensible Markup Language. Bioinformatics data is typically provided in XML format because an accompanying XSD or DTD document gives a complete description of the format of the XML data, which makes it less idiosyncratic than other formats and easy to read programmatically. While XMLPipeDB is a general-purpose tool that can be used for any type of XML data, we have used it to create and update Gene Databases for GenMAPP. The software uses UniProt as the main data source, is robust to changes in source file formats, uses XML sources wherever possible, takes advantage of existing open source tools, and limits the manual manipulation of the data. The XMLPipeDB software suite includes three individual programs: XSD-to-DB, XMLPipeDB Utilities, and GenMAPP Builder. XSD-to-DB reads an XSD (XML schema) and automatically generates the SQL schema file, Java classes, and Hibernate mappings needed to create a relational database. XMLPipeDB Utilities is a general purpose library for performing simple database functions such as importing XML data into the relational database and running simple queries. GenMAPP Builder is a downstream application that exports the data as a GenMAPP-formatted Gene Database. GenMAPP Builder has been used to create Gene Databases for several species including Arabidopsis thaliana, Bordetella pertussis, Burkholderia cenocepacia, Chlamydia trachomatis, Escherichia coli, Helicobacter pylori, Leishmania infantum, Leishmania major, Mycobacterium smegmatis, Mycobacterium tuberculosis, Plasmodium falciparum, Pseudomonas aeruginosa, Salmonella typhimurium, Shewanella oneidensis, Shigella flexneri, Sinorhizobium meliloti, Staphylococcus aureus, Streptococcus pneumoniae, and Vibrio cholerae. XMLPipeDB source code and Gene Databases are available for download from our GitHub site.
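The idea of deriving a relational schema from an XSD can be illustrated with a deliberately tiny sketch. It is not how XSD-to-DB works internally (that tool generates Java classes and Hibernate mappings as well as SQL); the sample schema, the type mapping, the added `id` column, and the function name are all assumptions made for illustration, and only flat, top-level elements are handled.

```python
import xml.etree.ElementTree as ET

# Namespace prefix in ElementTree's {uri}tag notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Assumed mapping from a few XSD types to SQL column types.
# The keys match the literal "xs:" prefix used in the sample schema below.
SQL_TYPES = {"xs:string": "TEXT", "xs:int": "INTEGER", "xs:decimal": "NUMERIC"}

# Hypothetical schema describing a single flat record type.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="gene">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="symbol" type="xs:string"/>
        <xs:element name="length" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def xsd_to_sql(xsd_text):
    """Emit one CREATE TABLE statement per top-level element in the schema."""
    root = ET.fromstring(xsd_text)
    statements = []
    for element in root.findall(f"{XS}element"):
        # Every table gets a surrogate key (a choice made for this sketch).
        columns = ["id INTEGER PRIMARY KEY"]
        path = f"{XS}complexType/{XS}sequence/{XS}element"
        for child in element.findall(path):
            # Unknown or missing types fall back to TEXT.
            sql_type = SQL_TYPES.get(child.get("type"), "TEXT")
            columns.append(f"{child.get('name')} {sql_type}")
        statements.append(
            f"CREATE TABLE {element.get('name')} ({', '.join(columns)});"
        )
    return statements


schema_sql = xsd_to_sql(SAMPLE_XSD)
```

Because the table definitions are generated from the schema rather than written by hand, a change in the source format requires regenerating the schema instead of editing database code, which is the sense in which the tool chain is robust to changes in source file formats.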
XMLPipeDB development began in Spring 2006 as a group project in a special studies course in Bioinformatics (CMSI 698/BIOL 498) team-taught by Drs. Dahlquist and Dionisio. The project was described in a publication in the ACM SIGCSE journal and is featured in our cross-listed and team-taught course, BIOL/CMSI 367: Biological Databases (e.g., see the 2015 course wiki).
Last modified: 8/19/16