“Google Summer of Code (GSoC) is a global program that offers student developers stipends to write code for various open source projects. Google will be working with several open source, free software, and technology-related groups to identify and fund several projects over a three month period. Since its inception in 2005, the program has brought together nearly 4,500 students and over 3000 mentors from over 100 countries worldwide, all for the love of code.” GSoC has several goals:
We are once again pooling the efforts of our colleagues and collaborators for this year’s Google Summer of Code. The National Resource for Network Biology (NRNB) is organizing the joint efforts of GenMAPP, Cytoscape, and WikiPathways (see below). This is a great opportunity to work at the intersection of biology and computing.
GenMAPP is a pathway visualization and analysis tool for biological data. GenMAPP illustrates the relationships between various genes and proteins to help researchers understand their data in terms of connected, biological pathways. Tens of thousands of people from a hundred countries have registered to download the GenMAPP program. The GenMAPP group is coordinated by the Conklin Lab at the Gladstone Institutes (UCSF). Our development team is composed of biologists and programmers, providing a unique perspective on building and using open source tools. Links: Website (old), Wiki
Cytoscape is a general network visualization tool that integrates network topology with data about the network into the visualization. Cytoscape is rapidly becoming a systems biology standard. Cytoscape consists of a plugin framework which extends functionality in new ways. Our team consists of programmers and biologists from both academia and industry including: UC San Diego, UC San Francisco, U of Toronto, Agilent, Institute for Systems Biology, Sloan-Kettering, Institut Pasteur and others. Links: Website, Wiki, 2.8.0 Javadocs
WikiPathways is a wiki for biological pathways. The wiki approach allows biologists and domain experts to easily create and update pathways. Pathways can be directly modified from a web browser using an embedded applet where you can draw genes, proteins and their interactions like in any popular drawing tool. The pathways can be used as images for publication and in data analysis tools such as GenMAPP, PathVisio and Cytoscape. WikiPathways itself is completely open source and is built on top of MediaWiki, using PathVisio as the pathway editor and BridgeDb as the backend database. WikiPathways is developed and maintained by BiGCaT Bioinformatics (University of Maastricht) and the Conklin Lab at the Gladstone Institutes (UCSF). Links: Website, Source code
We would like to know who you are and how you think. Incorporate the following into your application:
As we are prototyping new features and functions for GenMAPP, Cytoscape, and WikiPathways we are exploring a number of areas ideal for Google Summer of Code students. These projects include a broad set of skills, technologies and domains, such as Java GUIs, database integration, algorithms and wikis. Of course, you are also encouraged to propose your own ideas related to our projects. If you have solid CS skills and have an interest in the biological domain (do you think genes are cool?), then you should apply!
Feel free to propose your own idea. As long as it relates to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open source programmers, but make sure your proposal is also relevant.
Researchers commonly want to share their results on the web in the form of pathways colored with data. This export feature would include a browsable index page with a user-defined list of pathways/networks, all related pathway pages with coloring representing user-defined coloring criteria. A drop-down menu would control which coloring criteria is depicted on the pathway. In addition, a set of backpages per gene would be exported as well, listing all database references relevant for the particular gene. More about idea 2...
The batch html export is inspired by the MAPP Set export in GenMAPP 2.0. An example of a batch html export can be found here.
Language and Skills: Java
Idea by: Scooter
Potential Mentors: Kristina, Alex, Scooter
We’ve already made a first pass at developing the GOLayout plugin for Cytoscape. Its basic function is to partition a large network hairball into several small subnetworks, each containing genes/proteins associated with a particular biological process. Within each subnetwork, genes/proteins are laid out by cellular compartment and color-coded by molecular function. The final layout includes graphical annotations for cellular compartments defined by a template file (GPML file). There is still a lot of work to be done to make this useful and really cool. Some ideas include:
Language and Skills: Java
Idea by: Alex, Allan, Scooter, Kristina
Potential Mentors: Alexander, Allan, Kristina, Scooter
Cytoscape 3 can have multiple implementations for data models (i.e., network data). This means that, in addition to the current in-memory graph implementation, we can use graph database systems as its backend. Neo4j is one of the most popular implementations of graph databases written in Java. If Cytoscape can use Neo4j as its network/attribute database, the graph size Cytoscape can handle will be significantly larger than with the current in-memory model. The goal of this project is to use Neo4j as an implementation of Cytoscape 3’s graph model and make it usable from TinkerPop tools. As a possible alternative to Neo4j, I have heard good things about 4store (Alex).
Language and Skills: Java, Maven, basics of OSGi
Idea by: Kei
Potential Mentors: Kei
When you click on a node you might want to show the flow of reaction information downstream from the click based on directional arrows and arrow types (activation/inhibition). You might also want to highlight all the “transcription factors” downstream of a clicked node. Ultimately, we want to support various heuristics for defining a “shortest” or “most interesting” path. The semantics of the arrows and the functional annotation of the nodes can come from a number of sources that we would want to standardize in this project. There is a line here between exploratory analysis and hardcore simulation. We will want to stay on the soft side of this line. This work can make use of existing algorithms for shortest-path, hubs, attribute-weighted paths, etc.
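As a rough illustration of the downstream-flow idea, the sketch below walks a directed graph breadth-first from a clicked node and tracks the net effect of activation and inhibition arrows along the first path found. The class, the `Edge` record, and the adjacency-map representation are all hypothetical and are not part of the Cytoscape API; a real plugin would read arrows and annotations from the network model.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: follow directed, signed edges downstream of a clicked node. */
final class DownstreamFlow {

    /** The two arrow types named in the text; the enum itself is illustrative. */
    enum ArrowType { ACTIVATION, INHIBITION }

    /** A directed edge pointing at a target node. */
    record Edge(String target, ArrowType type) {}

    /**
     * Breadth-first walk from the clicked node. Each reachable node is mapped to the
     * net sign (+1 activating, -1 inhibiting) along the first path that reaches it.
     */
    static Map<String, Integer> downstream(Map<String, List<Edge>> graph, String clicked) {
        Map<String, Integer> sign = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        sign.put(clicked, 1);
        queue.add(clicked);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            for (Edge e : graph.getOrDefault(node, List.of())) {
                if (sign.containsKey(e.target())) {
                    continue; // already reached by an earlier (shorter or equal) path
                }
                int flip = e.type() == ArrowType.INHIBITION ? -1 : 1;
                sign.put(e.target(), sign.get(node) * flip);
                queue.add(e.target());
            }
        }
        sign.remove(clicked); // report downstream nodes only
        return sign;
    }
}
```

Breadth-first order means the reported sign belongs to a shortest path in number of edges; the “most interesting” path heuristics mentioned above would replace that choice with attribute-weighted scoring.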
Language and Skills: Java
Idea by: Alex, Allan
Potential Mentors: Alex, Allan
Early architectural discussions on 3.0 included the concept of a Decorated Network. This would be a subclass of CyNetwork that includes graphical annotation. For example, the traditional pathway diagram cannot be easily represented in Cytoscape due to lack of grouping symbols, lines, arrows, etc. In this project we would like a student to implement the user interface and graphical model to display these annotations within Cytoscape 3.0.
Language and Skills: Java
Idea by: Scooter
Potential Mentors: Scooter
See prior work on clusterMaker
See HOPACH publication.
Aim 1 involves migrating the R program HOPACH to Java and running this software from an associated Cytoscape plugin with a nice graphical interface for user options and file selection.
Aim 2 (may be a second summer code project) involves further porting the Java code to C for distribution with other bioinformatics software.
Language and Skills: Java and R (experience with C for Aim 2)
Idea by: Scooter
Potential Mentors: Alisha, Katie, Scooter
Analogy: intersecting tags can be used to narrow down a search in Delicious. This could be implemented in Cytoscape using parameters from CyDataTables to intelligently create the equivalent of tags. “Intelligent” refers to meaningful, recognizable attributes like GO terms, etc.
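A minimal sketch of the tag-intersection idea, assuming each node already has a set of tags derived from its attributes (for example GO terms). The `TagFilter` class and the map-of-sets input are hypothetical stand-ins, not the CyDataTable API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: narrow a node list by requiring every selected tag. */
final class TagFilter {

    /** Returns the nodes that carry all selected tags, in the map's iteration order. */
    static Set<String> nodesWithAllTags(Map<String, Set<String>> tagsByNode,
                                        List<String> selectedTags) {
        Set<String> hits = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> entry : tagsByNode.entrySet()) {
            if (entry.getValue().containsAll(selectedTags)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}
```

Each additional selected tag can only shrink the result, which is what makes the Delicious-style drill-down feel predictable to the user.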
Language and Skills: Java
Idea by: Scooter, Alex, Allan
Potential Mentors: Scooter, Alex, Allan
Similar in idea to what is provided by VistaClara with some of the advanced features in clusterMaker, e.g., dendrograms, zooming, etc.
Language and Skills: Java
Idea by: Scooter
Potential Mentors: Scooter
Molecular interaction networks are complex and difficult for a biologist to comprehend. The goal of this project is to develop layout strategies that draw on biological knowledge, including subcellular location, sequence similarity and functional annotations, to simplify visualizations while maximizing the biological information conveyed by the display.
Language and Skills: Java
Idea by: David States
Potential Mentors: Co-mentoring is encouraged
Goal: Create a real-time validation framework/panel for PathVisio for use with MIM and SBGN
Real-time syntax validation helps programmers prevent careless mistakes, such as missing semi-colons and using undefined symbols, by highlighting these errors as they happen. We want to extend this idea in PathVisio for biological pathway diagrams. There is growing interest in standardized notations for biology (e.g. MIM and SBGN), which are graphical notations that have well-defined rules for syntax meant to promote the creation of unambiguous biological pathway diagrams. One limitation in getting users to draw diagrams according to the rules of these notations is that they involve specifications of 50+ pages, which can be barriers to getting researchers to use the notations.
In order to help users draw diagrams properly, we propose the development of a validation framework and panel in PathVisio similar to what developers are accustomed to in IDEs, such as the “Problems” view in Eclipse. Rule sets would be encoded in Schematron, a validation language that uses XSL Transformations (XSLT) to validate XML datasets. The result of validation is a simple XML-formatted report using the Schematron Validation Report Language (SVRL). These validation reports would be parsed, and error and/or warning messages for the current diagram would be displayed in a special side-panel in PathVisio as a background process; additionally, elements with errors and/or warnings could be highlighted. Such a framework would be generic, allowing other developers to create their own rule sets.
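The report-parsing step could look roughly like the sketch below, which reads an SVRL document with the JDK's DOM parser and collects the location and message of each `failed-assert` element. It assumes the XSLT step has already produced the report as a string; the `SvrlReportReader` class and `Problem` record are hypothetical names, and `successful-report` elements (often used for warnings) are left out for brevity.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: turn an SVRL report into a list of problems for a side panel. */
final class SvrlReportReader {

    /** Namespace used by SVRL elements. */
    static final String SVRL_NS = "http://purl.oclc.org/dsdl/svrl";

    /** One failed assertion: where it occurred (an XPath) and the rule's message. */
    record Problem(String location, String message) {}

    static List<Problem> failedAsserts(String svrlXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-based lookups
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(svrlXml.getBytes(StandardCharsets.UTF_8)));
            NodeList failed = doc.getElementsByTagNameNS(SVRL_NS, "failed-assert");
            List<Problem> problems = new ArrayList<>();
            for (int i = 0; i < failed.getLength(); i++) {
                Element assertion = (Element) failed.item(i);
                NodeList texts = assertion.getElementsByTagNameNS(SVRL_NS, "text");
                String message = texts.getLength() > 0
                        ? texts.item(0).getTextContent().trim() : "";
                problems.add(new Problem(assertion.getAttribute("location"), message));
            }
            return problems;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable SVRL report", e);
        }
    }
}
```

The `location` attribute is an XPath into the validated document, so a panel could use it to find and highlight the offending diagram element.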
Those interested in this project would not be responsible for developing the rule sets. One Schematron rule set has been developed for MIM by researchers at the National Cancer Institute in the US, and a preliminary rule set for SBGN is expected to be completed before the start of GSoC.
Resources
Molecular Interaction Map (MIM) Notation
Systems Biology Graphical Notation (SBGN)
Schematron
Schematron/SVRL Specification
Saxon XSLT processor
Xalan XSLT processor
Mailing list thread with additional links and comments
Language and Skills: Java and XML
Idea by: Augustin
Potential Mentors: Augustin and Martijn
BridgeDb is a framework for identifier search, translation and annotation. The system integrates identifier mapping tables in multiple formats from multiple sources and provides an API for attaching new sources, specifying mapping parameters and querying. BridgeDb is currently integrated in a number of applications, including PathVisio and Cytoscape. This project would involve designing and implementing an interactive web front-end for BridgeDb. Features would include basic and advanced support for common identifier mapping problems.
Language and Skills: PHP and/or Javascript, Web design, Database systems
Possible Mentors: Jahn, Martijn, Alex
BridgeDb currently supports a handful of external resources, but there are more available that could be covered. On this page (http://bridgedb.org/wiki/ComparisonMatrix) we maintain a list of available mapping sources, showing which ones are covered and which ones still have to be implemented. The CyThesaurus plugin for Cytoscape has to be updated to allow the new mapping sources to be configured in a user-friendly way. Note: This project could be combined with Idea #14 below.
Language and Skills: Java, Web services
Possible Mentors: Jahn and Martijn
Currently it’s possible to export a pathway as a simple list of identifiers. Some people have asked for a table (instead of a list) with not only the basic identifiers, but as much extra annotation as possible, such as gene name, description, and identifiers from all large databases such as Ensembl, Entrez, Unigene and Uniprot. Using BridgeDb and the PathVisio plugin framework, it should be possible to create a plugin to export pathways as an annotated table. Note: This project could be combined with Idea #13 above.
Possible Mentors: Jahn and Martijn
PathVisio can visualize different types of high-throughput data. Thus far it has been tested with microarray, proteomics and metabolomics data. It would be interesting to test the application of pathway analysis to SNP data. An example of such a study is: Gang Peng 2010. Possibly a specialized plugin could be developed to enhance visualization of SNP data.
Language and Skills: Java
Possible mentors: Jahn and Martijn
Goal: Implement a feature in PathVisio that allows users to specify cellular location of a pathway entity.
In the current WikiPathways pathways, cellular locations are usually illustrated as a rectangle or ellipse that defines the boundaries of the location, in combination with a label that gives the name of the location (see ‘mitochondrion’ in Apoptosis Pathway for an example). Visually, it is perfectly clear that the genes within that boundary are located in the corresponding cellular location. Computationally, however, this is hard to derive unless the location is stored for each of the genes within the boundaries. This information can be stored in GPML, but there is no user interface to do that. A user interface would allow users to set the cellular location for each pathway object. It would also be cool to have some kind of cellular location drawing tool. One way this could work for the user: you draw a rectangle by dragging your mouse, all genes within that rectangle are highlighted, you release the mouse button, and a dialog pops up where you choose the cellular location. The end result would look the same as the current shape/label approach, but now the cellular location is automatically stored as a GPML attribute for all included genes. An extra could be choosing the cellular locations from an existing ontology, such as Gene Ontology, and being able to easily change the location’s boundaries to include or exclude genes.
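The core of the drawing-tool interaction is a hit test followed by an attribute update. The sketch below assumes each gene is represented by its centre point; the `Gene` class and `cellularLocation` field are hypothetical placeholders for PathVisio's pathway elements and the GPML attribute, whose actual names are not given here.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: assign a cellular location to every gene inside a dragged rectangle. */
final class LocationSelection {

    /** Stand-in for a pathway gene: a name, a centre point, and an optional location. */
    static final class Gene {
        final String name;
        final double x;
        final double y;
        String cellularLocation; // null until the user assigns one

        Gene(String name, double x, double y) {
            this.name = name;
            this.x = x;
            this.y = y;
        }
    }

    /** Sets the location on each gene whose centre lies in the area; returns those genes. */
    static List<Gene> assignLocation(List<Gene> genes, Rectangle2D area, String location) {
        List<Gene> updated = new ArrayList<>();
        for (Gene gene : genes) {
            if (area.contains(gene.x, gene.y)) {
                gene.cellularLocation = location;
                updated.add(gene);
            }
        }
        return updated;
    }
}
```

Returning the affected genes lets the caller highlight them while the mouse is still down and only commit the location once the dialog is confirmed.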
Language and Skills: Java, web services
Possible mentors: Martijn
Goal: Develop a simplified version of Cytoscape Web using only JavaScript and HTML5 resources (e.g. SVG, Canvas).
Although Cytoscape Web is exposed to developers as a JavaScript API, its core is implemented in Adobe Flash. With the increasing adoption of operating systems that do not support Flash at all (basically iOS), and the constant HTML5 improvements, it might be the right time to start testing the feasibility of a Cytoscape Web version that relies only on web standards, such as JavaScript, HTML, SVG and CSS.
The minimum requirements of the HTML5 prototype are:
Other features are desirable, but not necessary for this project: zooming and panning the whole graph; drag-selection; visual mappers; filtering nodes/edges.
The resulting code does not need to be released with Cytoscape Web, but if the outcome of this project is a simple, but usable and reliable library, it can be released as an extension to the current Flash based implementation, such as a lighter version for users/browsers that do not support Flash. We will still require the project to have professional quality, but since we cannot guarantee that the number of implemented features will be enough for a public Cytoscape Web release, the final goal is to generate a very good HTML5 prototype that successfully evaluates HTML5 technology and can be easily extended in the future.
The student will have to search for open source graph libraries implemented in JavaScript/HTML5, and probably choose one of them as the rendering engine. Possible candidates: JavaScript InfoVis Toolkit, Protovis, Raphaël, arbor.js
Language and Skills: JavaScript, AJAX, HTML, CSS, web application development, one object-oriented language (e.g. Java, C#, C++, Python).
Note: The student must have very good experience with JavaScript and AJAX, and must have used a JavaScript framework such as jQuery before. The mentor will guide the student through the analysis and design phases, and will also help with the implementation, but the student should be experienced enough to understand and handle JavaScript-based libraries and write high-quality code, not just basic scripting for HTML form validation, for instance.
Possible mentors: Christian
Goal: Implement a Cytoscape 2.8 plugin that exports a user session as a Cytoscape Web (CW) application. The generated website would display the same networks, keeping their topology and supported visual styles (colors, node sizes, labels, etc).
The plugin should:
The generated website would not be a copy of Cytoscape, but a much simpler network visualization Web application, specially conceived for users that want to quickly save relatively small networks to HTML for web display. For example, it does not need to provide filters or a visual styles editor, although it would be nice to have an attribute browser.
The student could reuse part of the code from the CW demo application, but it is probably better to start with fresh and simpler code.
Language and Skills: Java/Swing, HTML, CSS, JavaScript, jQuery.
Possible mentors: Christian
Goal: Develop a plugin for VANTED for the import and export of GPML files
The VANTED (http://vanted.ipk-gatersleben.de) system (Visualization and Analysis of Networks containing Experimental Data) is open source software for loading and editing graphs, which may represent biological pathways or functional hierarchies. It allows users to integrate different *omics data into the functional context and provides a variety of functions for data mapping and processing, statistical analysis, and visualization. The development of a VANTED plugin for the import and export of GPML files would provide new resources for WikiPathways by exporting networks from VANTED, enable VANTED users to access pathway data from WikiPathways, and establish file exchange capabilities between VANTED and PathVisio.
Language and Skills: Java
Possible mentors: Martijn, Hendrik, Tobias, Falk
Apply your web development skills to improve the WikiPathways website, especially the pathway page. Here are several ideas of features you could work on:
Language and Skills: JavaScript, jQuery, Web development
Possible mentors: Thomas, Alex
igraph is a very versatile and efficient software package for graph analysis. Although most of the functions in igraph are written in C, it is very straightforward to connect Cytoscape with igraph via the Java Native Access (JNA) package. A Cytoscape plugin, GLay, has been developed to port some of the community analysis and high-performance layout algorithms to Cytoscape and illustrate this proof of concept. There are many other analytical functions in igraph that would improve the versatility of Cytoscape if added. Documentation for igraph can be found here.
With igraph, the user can easily generate various random graphs, identify cliques, find communities, search multiple shortest paths, etc. Results from the generic graph analysis functions can then be integrated with other biological annotation functions in Cytoscape.
Language and Skills: Java, C
Idea by: Gang
Potential Mentors: Gang
PathVisio is currently in the process of integrating the established modularity framework OSGi. One of the goals is to simplify the search for and installation of plug-ins for PathVisio. Therefore we want to implement a plug-in manager that allows the user to easily find all available plug-ins and install them. The plug-ins should be available from one or more plug-in sites (similar to the plug-in manager of Eclipse).
Language and Skills: Java, basics of OSGi, Web development
Potential Mentors: Martijn
SubgeneViewer is a prototype plugin being developed for Cytoscape which aims to provide innovative visualization for whole transcriptome experiment data produced from RNA sequencing experiments or from splicing-sensitive microarrays. The goal of this plugin is to provide a visually informative exon and junction model for visualizing the expression of exons and junctions in a single unified view informed by all known gene transcripts. The gene model is provided by the software AltAnalyze. Below are several aims. AIMS 1 AND 2 ARE SEPARATE STUDENT PROJECTS (too much for one summer together), but Aims 3 and 4 could be a part of either of the summer projects in addition to Aim 1 or 2.
Language and Skills: Java and sufficient understanding of related biology concepts (transcripts, protein translation, splicing, exons and junctions). Experience with Cytoscape is a plus (new or just learned).
Idea by: Nathan
Potential Mentors: Nathan, Doro
Massive amounts of data are being produced from next generation RNA sequencing experiments that provide insights into which RNAs in the cell are expressed and which proteins these RNAs produce. Alternative splicing is a process by which exons and junctions are alternatively expressed in a cell, leading to the expression of different RNAs for a single gene. With alternative splicing, RNAs can be produced for a single gene that differ in whether they are made into protein or not and in the composition of functional domains that dictate the structure of the protein and which interactions in the cell are possible. The software AltAnalyze allows users to take their RNA sequencing data and determine the possible effect of expressed junctions on which proteins are expressed and what effect alternative splicing has on the composition of these proteins (e.g., are they truncated, are they expressed, and do they support specific interactions in the cell).
The AltInteract plugin will be a new Cytoscape tool designed to visualize the predictions from AltAnalyze along pathways imported into Cytoscape from the Network manager or along pathways created de novo from input user gene lists. This plugin will (A) import or create pathways, (B) import results from AltAnalyze, (C) deconstruct the pathway network graph to identify domain-domain interactions disrupted by AltAnalyze predictions and visualize gene-level disruptions also based on these predictions, and (D) interface this plugin with the plugin SubgeneViewer to visualize gene data at the transcript, exon and junction level (methods to be supported by SubgeneViewer). To accomplish these goals, one or more of the following aims must be met:
Language and Skills: Java and sufficient understanding of related biology concepts (transcripts, protein translation, splicing, exons and junctions). Experience with Cytoscape is a plus (new or just learned).
Idea by: Nathan
Potential Mentors: Nathan, Doro
Related to idea 24, but implemented in the Python program GO-Elite with no associated visualization component. GO-Elite is analysis software that takes integrated pathways (e.g., WikiPathways, Gene Ontology terms) to identify enriched pathways from amongst input gene lists using known and novel methods. The latest version of this software provides basic methods for the import of pathway XML data. These methods need to be expanded to include interactions from pathways.
For the proposed project, GO-Elite will import results from the program AltAnalyze to determine which pathways have significant disruption of gene expression and domain interactions resulting from alternative splicing. Basically, for each provided gene, precomputed associations from AltAnalyze will be imported indicating potential “disruptions” that occur due to alternative exon inclusion. Domain interactions from existing flat files will be integrated into the analysis to determine where potential domain interactions are disrupted as a result of AltAnalyze predictions. A tabular report of the number and type of disruptions for all pathways will be exported along with a second pathway containing the details of each disruption for all pathways. A GO-Elite enrichment analysis will be conducted on these initial results to identify pathways with over-represented disrupted interactions.
Language and Skills: Python and reasonable biology understanding
Idea by: Nathan
Potential Mentors: Nathan
This project involves migrating methods being developed in Perl and R to create a stand-alone Python program performing base-level and transcript-level quality control (QC) measures on RNA sequencing data from various sequencing platforms (e.g., Illumina paired-end, ABI Solid).
These methods should include analyses of base composition, error rates (e.g., quality per base position over read length), alignment statistics (mapped, unmapped, non-unique mappings), transcript read density variation (5’ vs. 3’, exon vs. junction, exon vs. intron, normalization correction bias), replicate comparison (quantile-quantile aligned read count plots), known versus novel exon/junctions and expression of a panel of known housekeeping genes. These methods will go far beyond existing QC programs, such as FastQC, by providing analyses on known transcripts, exons and junctions. Output will include tables and graphical plots (PMW). Implementation without calls to external Python libraries (e.g., numpy) is preferred but not required. Due to the nature of this proposal, it may inherently be ranked lower than other ideas, as it does not involve a network biology problem in its suggested implementation. Please feel free to modify it if you see unexplored areas of innovation in this regard.
Language and Skills: Python, Perl, R and bioinformatics background preferred
Idea by: Nathan
Potential Mentors: Alisha, Nathan
AltAnalyze is a program designed to identify alternative exons and junctions from RNA-seq and microarray experiments. After identifying regulated exons, AltAnalyze determines the putative effect of alternative exon inclusion on associated proteins, domains and binding sites (protein and RNA). We wish to couple these analyses to allelic variation data that is collected in parallel with RNA-seq and microarray datasets. This project involves importing pre-processed allelic variation data (DNA-seq, SNP arrays, copy number variation data), where genotypes can be assigned per genomic sequence and gene. Based on this allelic data, genotypes (e.g., AA, GG, AG) will be used to group samples prior to analysis, as opposed to the user designating which samples belong to each group. Hence, transcriptome data will be re-grouped for each genotype examined and analyzed to identify splicing events that segregate with the genotype for that gene and quantitative trait loci (differential gene expression) along with existing AltAnalyze predictions.
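The regrouping step can be pictured as a simple group-by on genotype. The sketch below is illustrative only: the project itself calls for Python inside AltAnalyze, and the `Sample` record and `GenotypeGrouping` class are hypothetical names rather than existing AltAnalyze structures.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical sketch: assign samples to analysis groups by genotype at one locus. */
final class GenotypeGrouping {

    /** A sample identifier paired with its genotype call (e.g. "AA", "AG", "GG"). */
    record Sample(String id, String genotype) {}

    /** Maps each genotype to the ids of the samples carrying it, sorted by genotype. */
    static Map<String, List<String>> groupByGenotype(List<Sample> samples) {
        return samples.stream().collect(Collectors.groupingBy(
                Sample::genotype,
                TreeMap::new,
                Collectors.mapping(Sample::id, Collectors.toList())));
    }
}
```

Running this once per gene or locus yields the per-genotype sample groups that would replace the user-designated groups in the downstream comparison.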
Implementation will require new methods built in Python and modification of existing AltAnalyze methods to support sample-to-group re-assignment on a per-genotype basis. Due to the nature of this proposal, it may inherently be ranked lower than other ideas, as it does not involve a network biology problem in its suggested implementation. Please feel free to modify it if you see unexplored areas of innovation in this regard.
Language and Skills: Python and bioinformatics background preferred
Idea by: Nathan
Potential Mentors: Nathan
AltAnalyze is a program designed to identify alternative exons and junctions from RNA-seq and microarray experiments. After identifying regulated exons, AltAnalyze determines the putative effect of alternative exon inclusion on associated proteins, domains and binding sites (protein and RNA). For RNA sequencing studies (RNA-seq), AltAnalyze currently requires the import of aligned junction data. The goal of this project is to:
Aim 1:
Aim 2:
Calling these apps will have to be implemented in a manner that allows for customized user options (additional flags passed to the command-line applications). Due to the nature of this proposal, it may inherently be ranked lower than other ideas, as it does not involve a network biology problem in its suggested implementation. Please feel free to modify it if you see unexplored areas of innovation in this regard.
Language and Skills: Python required, C++ preferred, bioinformatics background preferred, RNA-seq knowledge preferred
Idea by: Nathan
Potential Mentors: Nathan, Alisha
The Savant Genome Browser is a desktop visualization tool for genomic data. It was primarily developed for visualizing high throughput (aka next generation) sequencing data, although it can be used to visualize virtually any genome-based sequence, point, interval, or continuous dataset. Savant features a rich plug-in framework, allowing for the integration of diverse methods and datasets, including variant detection, medical annotation, and RNA-sequencing. The goal of this project is to develop a Savant plugin to visualize pathway data from WikiPathways and related resources within Savant. The plugin should
Implementation will require developing a new plugin for the Savant Browser, and working with the Savant team to extend the plug-in API if current functionality proves insufficient.
Language and Skills: Java, Swing, familiarity with genomics data
Idea by: Mike Brudno
Potential Mentors: Mike Brudno, Marc Fiume
OpenTutorials is a free, online tutorial resource for open source and NRNB-supported tools. It is built on top of MediaWiki and a handful of extensions. Its basic function is to provide a wiki-based collection of tutorials that can be viewed as a web page, slide show or printed handout. The wiki approach makes it easy to build and maintain content. The slide show and handout features make it useful for presenters.
There are a number of features that need to be implemented to improve the site and really make it effective to a larger community. These would each be implemented as one or more PHP extensions:
Language and Skills: PHP, MediaWiki
Idea by: Alex
Potential Mentors: Alex, Kristina