
NRNB GSoC

“Google Summer of Code (GSoC) is a global program that offers student developers stipends to write code for various open source projects. Google will be working with several open source, free software, and technology-related groups to identify and fund several projects over a three month period. Since its inception in 2005, the program has brought together nearly 4,500 students and over 3000 mentors from over 100 countries worldwide, all for the love of code.” GSoC has several goals:


Contents

  1. NRNB GSoC
    1. How to apply
      1. Guidelines and Advice
      2. If you are selected
    2. Resources
      1. Communication
      2. For Students
      3. For Mentors
      4. Testimonials
    3. Project Ideas
      1. IDEA 1: Original Idea
      2. IDEA 2: Browsable batch html export from Cytoscape
      3. IDEA 3: GOLayout: Network partitioning and layout driven by GO ontology
      4. IDEA 4: Neo4j graph implementation for Cytoscape 3 and Integration of TinkerPop Tools
      5. IDEA 5: Interactive path explorer through networks and biological pathways for Cytoscape 2.x
      6. IDEA 6: Decorated Networks in Cytoscape 3.0
      7. IDEA 7: Port HOPACH cluster algorithm into clusterMaker
      8. IDEA 8: Facetted browsing using CyDataTables in 3.0
      9. IDEA 9: Implement heatmaps in CyDataTable browser in 3.0
      10. IDEA 10: Biological directed layout
      11. IDEA 11: Real-time Validation Framework for PathVisio
      12. IDEA 12: Design and implement web interface for BridgeDb
      13. IDEA 13: Expand coverage of Identifier Mapping resources in BridgeDb
      14. IDEA 14: Export pathway as a fully annotated table from PathVisio
      15. IDEA 15: Visualize SNP data on pathways in PathVisio
      16. IDEA 16: Defining cellular location in a pathway in PathVisio
      17. IDEA 17: Cytoscape Web - HTML5 Prototype
      18. IDEA 18: Cytoscape plugin for exporting networks as a Cytoscape Web application
      19. IDEA 19: GPML import and export for VANTED for interoperability with WikiPathways and PathVisio
      20. IDEA 20: WikiPathways pathway page enhancements
      21. IDEA 21: Connecting Cytoscape with igraph
      22. IDEA 22: PathVisio plug-in manager and plug-in site development
      23. IDEA 23: SubgeneViewer plugin for transcriptome analyses
      24. IDEA 24: AltInteract plugin for visualizing pathway level domain disruption
      25. IDEA 25: Global analysis of domain interaction disruption along pathways in GO-Elite
      26. IDEA 26: Implementation of a new RNA-seq quality control program
      27. IDEA 27: Allelic variation analysis in AltAnalyze
      28. IDEA 28: Integrate Existing RNA-seq alignment software with AltAnalyze
      29. IDEA 29: Integrate Pathway Visualization and Analysis into Savant
      30. IDEA 30: New Features for Online Tutorial Support System

NRNB GSoC

We are once again pooling the efforts of our colleagues and collaborators for this year’s Google Summer of Code. The National Resource for Network Biology (NRNB) is organizing the joint efforts of GenMAPP, Cytoscape, and WikiPathways (see below). This is a great opportunity to work at the intersection of biology and computing.

GenMAPP is a pathway visualization and analysis tool for biological data. GenMAPP illustrates the relationships between various genes and proteins to help researchers understand their data in terms of connected, biological pathways. Tens of thousands of people from a hundred countries have registered to download the GenMAPP program. The GenMAPP group is coordinated by the Conklin Lab at the Gladstone Institutes (UCSF). Our development team is composed of biologists and programmers, providing a unique perspective on building and using open source tools. Links: Website (old), Wiki

Cytoscape is a general network visualization tool that integrates network topology with data about the network into the visualization. Cytoscape is rapidly becoming a systems biology standard. Cytoscape consists of a plugin framework which extends functionality in new ways. Our team consists of programmers and biologists from both academia and industry including: UC San Diego, UC San Francisco, U of Toronto, Agilent, Institute for Systems Biology, Sloan-Kettering, Institut Pasteur and others. Links: Website, Wiki, 2.8.0 Javadocs

WikiPathways is a wiki for biological pathways. The wiki approach allows biologists and domain experts to easily create and update pathways. Pathways can be directly modified from a web browser using an embedded applet where you can draw genes, proteins and their interactions like in any popular drawing tool. The pathways can be used as images for publication and in data analysis tools such as GenMAPP, PathVisio and Cytoscape. WikiPathways itself is completely open source and is built on top of MediaWiki, using PathVisio as the pathway editor and BridgeDb as the backend database. WikiPathways is developed and maintained by BiGCaT Bioinformatics (University of Maastricht) and the Conklin Lab at the Gladstone Institutes (UCSF). Links: Website, Source code


How to apply

We would like to know who you are and how you think. Incorporate the following into your application:

Guidelines and Advice

If you are selected


Resources

Communication

For Students

For Mentors

Testimonials


Project Ideas

As we are prototyping new features and functions for GenMAPP, Cytoscape, and WikiPathways we are exploring a number of areas ideal for Google Summer of Code students. These projects include a broad set of skills, technologies and domains, such as Java GUIs, database integration, algorithms and wikis. Of course, you are also encouraged to propose your own ideas related to our projects. If you have solid CS skills and have an interest in the biological domain (do you think genes are cool?), then you should apply!

IDEA 1: Original Idea

Feel free to propose your own idea. As long as it relates to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open source programmers, but make sure your proposal is also relevant.

IDEA 2: Browsable batch html export from Cytoscape

Researchers commonly want to share their results on the web in the form of pathways colored with data. This export feature would include a browsable index page with a user-defined list of pathways/networks, plus all related pathway pages colored according to user-defined criteria. A drop-down menu would control which coloring criterion is depicted on the pathway. In addition, a set of backpages per gene would be exported, listing all database references relevant to that gene.

The batch html export is inspired by the MAPP Set export in GenMAPP 2.0. An example of a batch html export can be found here.

Language and Skills: Java
Idea by: Scooter
Potential Mentors: Kristina, Alex, Scooter

IDEA 3: GOLayout: Network partitioning and layout driven by GO ontology

We’ve already made a first pass at developing the GOLayout plugin for Cytoscape. Its basic function is to partition a large network hairball into several small subnetworks, each containing genes/proteins associated with a particular biological process. Within each subnetwork, genes/proteins are laid out by cellular compartment and color-coded by molecular function. The final layout includes graphical annotations for cellular compartments defined by a template file (GPML file). There is still a lot of work to be done to make this useful and really cool. Some ideas include:

Language and Skills: Java
Idea by: Alex, Allan, Scooter, Kristina
Potential Mentors: Alexander, Allan, Kristina, Scooter

IDEA 4: Neo4j graph implementation for Cytoscape 3 and Integration of TinkerPop Tools

Cytoscape 3 can have multiple implementations of its data models (i.e., network data). This means that, in addition to the current in-memory graph implementation, we can use graph database systems as its backend. Neo4j is one of the most popular implementations of a graph database written in Java. If Cytoscape can use Neo4j as its network/attribute database, the graph size Cytoscape can handle becomes significantly larger than with the current in-memory model. The goal of this project is to use Neo4j as an implementation of Cytoscape 3’s graph model and make it usable from TinkerPop tools. As a possible alternative to Neo4j, I have heard good things about 4store (Alex).

Language and Skills: Java, Maven, basics of OSGi
Idea by: Kei
Potential Mentors: Kei

IDEA 5: Interactive path explorer through networks and biological pathways for Cytoscape 2.x

When you click on a node you might want to show the flow of reaction information downstream from the click based on directional arrows and arrow types (activation/inhibition). You might also want to highlight all the “transcription factors” downstream of a clicked node. Ultimately, we want to support various heuristics for defining a “shortest” or “most interesting” path. The semantics of the arrows and the functional annotation of the nodes can come from a number of sources that we would want to standardize in this project. There is a line here between exploratory analysis and hardcore simulation. We will want to stay on the soft side of this line. This work can make use of existing algorithms for shortest-path, hubs, attribute-weighted paths, etc.
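The downstream highlighting described above can be sketched as a breadth-first walk that follows edge direction and filters by arrow type. This is only a minimal illustration under stated assumptions: `DownstreamExplorer`, `ArrowType` and the node names are hypothetical and are not part of the Cytoscape 2.x API. A plugin would read nodes, edges and arrow semantics from the network and its attributes instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone model; none of these types come from the Cytoscape API.
class DownstreamExplorer {
    enum ArrowType { ACTIVATION, INHIBITION }

    static final class Edge {
        final String target;
        final ArrowType type;
        Edge(String target, ArrowType type) {
            this.target = target;
            this.type = type;
        }
    }

    // Directed adjacency list: source node -> outgoing edges.
    private final Map<String, List<Edge>> outgoing = new LinkedHashMap<>();

    void addEdge(String source, String target, ArrowType type) {
        outgoing.computeIfAbsent(source, k -> new ArrayList<>()).add(new Edge(target, type));
    }

    // Breadth-first walk along edge direction, following only the allowed arrow types.
    Set<String> downstream(String start, Set<ArrowType> allowed) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            for (Edge e : outgoing.getOrDefault(node, Collections.emptyList())) {
                if (allowed.contains(e.type) && visited.add(e.target)) {
                    queue.add(e.target);
                }
            }
        }
        visited.remove(start); // report only the nodes downstream of the clicked node
        return visited;
    }

    public static void main(String[] args) {
        DownstreamExplorer graph = new DownstreamExplorer();
        graph.addEdge("A", "B", ArrowType.ACTIVATION);
        graph.addEdge("B", "C", ArrowType.INHIBITION);
        graph.addEdge("B", "D", ArrowType.ACTIVATION);
        // Nodes reachable from the clicked node "A" through activation arrows only.
        System.out.println(graph.downstream("A", EnumSet.of(ArrowType.ACTIVATION)));
    }
}
```

Filtering the returned set by a node annotation (for example, a "transcription factor" flag) would give the second kind of highlighting mentioned above. Weighted or "most interesting" paths would need a different algorithm than this plain traversal.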

Language and Skills: Java
Idea by: Alex, Allan
Potential Mentors: Alex, Allan

IDEA 6: Decorated Networks in Cytoscape 3.0

Early architectural discussions on 3.0 included the concept of a Decorated Network. This would be a subclass of CyNetwork that includes graphical annotation. For example, the traditional pathway diagram cannot be easily represented in Cytoscape due to lack of grouping symbols, lines, arrows, etc. In this project we would like a student to implement the user interface and graphical model to display these annotations within Cytoscape 3.0.

Language and Skills: Java
Idea by: Scooter
Potential Mentors: Scooter

IDEA 7: Port HOPACH cluster algorithm into clusterMaker

See prior work on clusterMaker

See HOPACH publication.

Aim 1 involves migrating the R program HOPACH to Java and running this software from an associated Cytoscape plugin with a nice graphical interface for user options and file selection.
Aim 2 (may be a second summer code project) involves further porting the Java code to C for distribution with other bioinformatics software.

Language and Skills: Java and R (experience with C for Aim 2)
Idea by: Scooter
Potential Mentors: Alisha, Katie, Scooter

IDEA 8: Facetted browsing using CyDataTables in 3.0

Analogy: intersecting tags can be used to narrow down a search in Delicious. This could be implemented in Cytoscape using parameters from CyDataTables to intelligently create the equivalent of tags. “Intelligent” refers to meaningful, recognizable attributes like GO terms, etc.
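The tag-intersection analogy can be sketched as follows: each selected facet narrows the set of matching nodes, and value counts over the remaining nodes show which facets can still be applied. This is a minimal sketch that uses plain maps in place of CyDataTables. The class `FacetFilter`, the attribute names "GO term" and "Compartment", and the example values are hypothetical and chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for table rows: node name -> (attribute name -> value).
class FacetFilter {

    // Keep only the nodes whose attributes match every selected facet.
    static List<String> select(Map<String, Map<String, String>> rows, Map<String, String> facets) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            boolean matchesAll = true;
            for (Map.Entry<String, String> facet : facets.entrySet()) {
                if (!facet.getValue().equals(row.getValue().get(facet.getKey()))) {
                    matchesAll = false;
                    break;
                }
            }
            if (matchesAll) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    // Count the values of one attribute among the remaining nodes,
    // i.e. the choices a user could pick next to narrow the selection further.
    static Map<String, Integer> valueCounts(Map<String, Map<String, String>> rows,
                                            List<String> nodes, String attribute) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String node : nodes) {
            String value = rows.get(node).get(attribute);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Helper that builds one illustrative row.
    static Map<String, String> row(String goTerm, String compartment) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("GO term", goTerm);
        attributes.put("Compartment", compartment);
        return attributes;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("TP53", row("apoptosis", "nucleus"));
        rows.put("BAX", row("apoptosis", "mitochondrion"));
        rows.put("MYC", row("cell cycle", "nucleus"));

        Map<String, String> facets = new LinkedHashMap<>();
        facets.put("GO term", "apoptosis");
        List<String> hits = select(rows, facets);
        System.out.println(hits);
        System.out.println(valueCounts(rows, hits, "Compartment"));
    }
}
```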

Language and Skills: Java
Idea by: Scooter, Alex, Allan
Potential Mentors: Scooter, Alex, Allan

IDEA 9: Implement heatmaps in CyDataTable browser in 3.0

Similar in idea to what is provided by VistaClara with some of the advanced features in clusterMaker, e.g., dendrograms, zooming, etc.

Language and Skills: Java
Idea by: Scooter
Potential Mentors: Scooter

IDEA 10: Biological directed layout

Molecular interaction networks are complex and difficult for a biologist to comprehend. The goal of this project is to develop layout strategies that draw on biological knowledge, including subcellular location, sequence similarity and functional annotations, to simplify visualizations while maximizing the biological information conveyed by the display.

Language and Skills: Java
Idea by: David States
Potential Mentors: Co-mentoring is encouraged

IDEA 11: Real-time Validation Framework for PathVisio

Goal: Create a real-time validation framework/panel for PathVisio for use with MIM and SBGN

Real-time syntax validation helps programmers prevent careless mistakes, such as missing semicolons and undefined symbols, by highlighting these errors as they happen. We want to extend this idea in PathVisio to biological pathway diagrams. There is growing interest in standardized notations for biology (e.g. MIM and SBGN), which are graphical notations with well-defined syntax rules meant to promote the creation of unambiguous biological pathway diagrams. One limitation in getting users to draw diagrams according to the rules of these notations is that the specifications run to 50+ pages, which can be a barrier to adoption by researchers.

In order to help users draw diagrams properly, we propose the development of a validation framework and panel in PathVisio similar to what developers are accustomed to in IDEs, such as the “Problems” view in Eclipse. Rule sets would be encoded in Schematron, a validation language that uses XSL Transformations (XSLT) to validate XML datasets. The result of validation is a simple XML-formatted report using the Schematron Validation Report Language (SVRL). Validation would run as a background process: the reports would be parsed, and error and/or warning messages for the current diagram would be displayed in a special side-panel in PathVisio; additionally, elements with errors and/or warnings could be highlighted. Such a framework would be generic, allowing other developers to create their own rule sets.

Those interested in this project would not be responsible for developing the rule sets. One Schematron rule set has been developed for MIM by researchers at the National Cancer Institute in the US, and a preliminary rule set for SBGN is expected to be completed before the start of GSoC.
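The report-parsing step of the proposed framework can be sketched with the JDK's built-in XML parser. The sketch below reads `failed-assert` elements from an SVRL report and collects their `location`, `role` and message text, which is the information a problems panel would need. The class `SvrlReportReader`, the sample report and the choice to treat a missing `role` as "error" are assumptions made for illustration. Running the Schematron rules through an XSLT processor to produce the report is not shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; not part of PathVisio.
class SvrlReportReader {
    static final String SVRL_NS = "http://purl.oclc.org/dsdl/svrl";

    static final class Problem {
        final String location; // XPath of the offending element in the validated document
        final String role;     // e.g. "error" or "warning"
        final String message;
        Problem(String location, String role, String message) {
            this.location = location;
            this.role = role;
            this.message = message;
        }
    }

    static List<Problem> read(String svrlXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // SVRL elements live in their own namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(svrlXml.getBytes(StandardCharsets.UTF_8)));
            List<Problem> problems = new ArrayList<>();
            NodeList failed = doc.getElementsByTagNameNS(SVRL_NS, "failed-assert");
            for (int i = 0; i < failed.getLength(); i++) {
                Element assertion = (Element) failed.item(i);
                NodeList texts = assertion.getElementsByTagNameNS(SVRL_NS, "text");
                String message = texts.getLength() > 0 ? texts.item(0).getTextContent().trim() : "";
                // Assumption: an assertion without a role attribute is reported as an error.
                String role = assertion.hasAttribute("role") ? assertion.getAttribute("role") : "error";
                problems.add(new Problem(assertion.getAttribute("location"), role, message));
            }
            return problems;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse SVRL report", e);
        }
    }

    public static void main(String[] args) {
        // Made-up report for illustration; real reports come from the XSLT step.
        String report =
            "<svrl:schematron-output xmlns:svrl=\"http://purl.oclc.org/dsdl/svrl\">"
          + "<svrl:failed-assert test=\"@TextLabel\" location=\"/Pathway/DataNode[2]\" role=\"warning\">"
          + "<svrl:text>Node has no label</svrl:text>"
          + "</svrl:failed-assert>"
          + "</svrl:schematron-output>";
        for (Problem p : read(report)) {
            System.out.println(p.role + " at " + p.location + ": " + p.message);
        }
    }
}
```

The `location` value is an XPath into the validated document, so a panel could use it to find and highlight the corresponding diagram element.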

Resources
Molecular Interaction Map (MIM) Notation
Systems Biology Graphical Notation (SBGN)
Schematron
Schematron/SVRL Specification
Saxon XSLT processor
Xalan XSLT processor
Mailing list thread with additional links and comments

Language and Skills: Java and XML
Idea by: Augustin
Potential Mentors: Augustin and Martijn

IDEA 12: Design and implement web interface for BridgeDb

BridgeDb is a framework for identifier search, translation and annotation. The system integrates identifier mapping tables in multiple formats from multiple sources and provides an API for attaching new sources, specifying mapping parameters and querying. BridgeDb is currently integrated in a number of applications, including PathVisio and Cytoscape. This project would involve designing and implementing an interactive web front-end for BridgeDb. Features would include basic and advanced support for common identifier mapping problems.

Language and Skills: PHP and/or Javascript, Web design, Database systems
Possible Mentors: Jahn, Martijn, Alex

IDEA 13: Expand coverage of Identifier Mapping resources in BridgeDb

BridgeDb currently supports a handful of external resources, but there are more available that could be covered. On this page (http://bridgedb.org/wiki/ComparisonMatrix) we maintain a list of available mapping sources, showing which ones are covered and which ones still have to be implemented. The CyThesaurus plugin for Cytoscape has to be updated to allow the new mapping sources to be configured in a user-friendly way. Note: This project could be combined with Idea #14 below.

Language and Skills: Java, Web services
Possible Mentors: Jahn and Martijn

IDEA 14: Export pathway as a fully annotated table from PathVisio

Currently it’s possible to export a pathway as a simple list of identifiers. Some users have requested a table (instead of a list) with not only the basic identifiers, but as much extra annotation as possible, such as gene name, description, and identifiers from all large databases such as Ensembl, Entrez, Unigene and Uniprot. Using BridgeDb and the PathVisio plugin framework, it should be possible to create a plugin to export pathways as an annotated table. Note: This project could be combined with Idea #13 above.

Possible Mentors: Jahn and Martijn

IDEA 15: Visualize SNP data on pathways in PathVisio

PathVisio can visualize different types of high-throughput data. Thus far it has been tested with microarray, proteomics and metabolomics data. It would be interesting to test the application of pathway analysis to SNP data. An example of such a study is: Gang Peng 2010. Possibly a specialized plugin could be developed to enhance visualization of SNP data.

Language and Skills: Java
Possible mentors: Jahn and Martijn

IDEA 16: Defining cellular location in a pathway in PathVisio

Goal: Implement a feature in PathVisio that allows users to specify cellular location of a pathway entity.

In the current WikiPathways pathways, cellular locations are usually illustrated as a rectangle or ellipse that defines the boundaries of the location, in combination with a label that gives the name of the location (see ‘mitochondrion’ in Apoptosis Pathway for an example). Visually, it is perfectly clear that the genes within that boundary are located in the corresponding cellular location. Computationally, however, this is hard to derive unless the location is stored for each of the genes within the boundaries. This information can be stored in GPML, but there is no user interface to do that. A user interface would allow users to set the cellular location for each pathway object. It would also be cool to have some kind of cellular location drawing tool. A way this could work for the user: you draw a rectangle by dragging your mouse, all genes within that rectangle are highlighted, you release the mouse button and a dialog pops up where you choose the cellular location. The end result would look the same as the current shape/label approach, but now the cellular location is automatically stored as a GPML attribute for all included genes. An extra could be that you can choose the cellular locations from an existing ontology, such as Gene Ontology, and that you could easily change the location’s boundaries to include or exclude genes.
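The drag-a-rectangle interaction described above boils down to a containment test between the drawn area and each gene's position. Here is a minimal sketch using the JDK's `java.awt.geom` classes. The class `CellularLocationTagger`, the gene names and the coordinates are hypothetical, and the sketch assumes each gene is represented by its centre point. How the resulting assignments are written into GPML is deliberately left out.

```java
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of PathVisio.
class CellularLocationTagger {

    // Assign the chosen location to every gene whose centre lies inside the drawn rectangle.
    static Map<String, String> tag(Map<String, Point2D> geneCentres,
                                   Rectangle2D drawnArea, String location) {
        Map<String, String> assigned = new LinkedHashMap<>();
        for (Map.Entry<String, Point2D> gene : geneCentres.entrySet()) {
            if (drawnArea.contains(gene.getValue())) {
                assigned.put(gene.getKey(), location);
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        // Illustrative coordinates only.
        Map<String, Point2D> genes = new LinkedHashMap<>();
        genes.put("BAX", new Point2D.Double(120, 80));
        genes.put("CYCS", new Point2D.Double(150, 110));
        genes.put("TP53", new Point2D.Double(400, 300));

        // Rectangle the user dragged out: x, y, width, height.
        Rectangle2D area = new Rectangle2D.Double(100, 50, 100, 100);
        System.out.println(tag(genes, area, "mitochondrion"));
    }
}
```

Re-running the same test after the user resizes the rectangle would cover the include/exclude case. An ellipse boundary could be handled the same way with `Ellipse2D`, which offers the same `contains` method.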

Language and Skills: Java, web services
Possible mentors: Martijn

IDEA 17: Cytoscape Web - HTML5 Prototype

Goal: Develop a simplified version of Cytoscape Web using only JavaScript and HTML5 resources (e.g. SVG, Canvas).

Although Cytoscape Web is exposed to developers as a JavaScript API, its core is implemented in Adobe Flash. With the increasing adoption of operating systems that do not support Flash at all (basically iOS), and the constant HTML5 improvements, it might be the right time to start testing the feasibility of a Cytoscape Web version that relies only on web standards, such as JavaScript, HTML, SVG and CSS.

The minimum requirements of the HTML5 prototype are:

Other features are desirable, but not necessary for this project: zooming and panning the whole graph; drag-selection; visual mappers; filtering nodes/edges.

The resulting code does not need to be released with Cytoscape Web, but if the outcome of this project is a simple, but usable and reliable library, it can be released as an extension to the current Flash based implementation, such as a lighter version for users/browsers that do not support Flash. We will still require the project to have professional quality, but since we cannot guarantee that the number of implemented features will be enough for a public Cytoscape Web release, the final goal is to generate a very good HTML5 prototype that successfully evaluates HTML5 technology and can be easily extended in the future.

The student will have to search for open source graph libraries implemented in JavaScript/HTML5, and probably choose one of them as the rendering engine--possible candidates: JavaScript InfoVis Toolkit, Protovis, Raphaël, arbor.js

Language and Skills: JavaScript, AJAX, HTML, CSS, web application development, one object-oriented language (e.g. Java, C#, C++, Python).

Note: The student must have very good experience with JavaScript and AJAX, and must have used a JavaScript framework such as jQuery before. The mentor will guide the student through the analysis and design phases, and will also help with the implementation, but the student should be experienced enough to understand and handle JavaScript-based libraries and write high-quality code, not just basic scripting for HTML form validation, for instance.

Possible mentors: Christian

IDEA 18: Cytoscape plugin for exporting networks as a Cytoscape Web application

Goal: Implement a Cytoscape 2.8 plugin that exports a user session as a Cytoscape Web (CW) application. The generated website would display the same networks, keeping their topology and supported visual styles (colors, node sizes, labels, etc).

The plugin should:

The generated website would not be a copy of Cytoscape, but a much simpler network visualization Web application, specially conceived for users that want to quickly save relatively small networks to HTML for web display. For example, it does not need to provide filters or a visual styles editor, although it would be nice to have an attribute browser.
The student could reuse part of the code from the CW demo application, but it is probably better to start with fresh and simpler code.

Language and Skills: Java/Swing, HTML, CSS, JavaScript, jQuery.
Possible mentors: Christian

IDEA 19: GPML import and export for VANTED for interoperability with WikiPathways and PathVisio

Goal: Develop a plugin for VANTED for the import and export of GPML files

The VANTED (http://vanted.ipk-gatersleben.de) system (Visualization and Analysis of Networks containing Experimental Data) is open source software for loading and editing graphs, which may represent biological pathways or functional hierarchies. It allows users to integrate different *omics data into the functional context and provides a variety of functions for data mapping and processing, statistical analysis, and visualization. The development of a VANTED plugin for the import and export of GPML files would provide new resources for WikiPathways by exporting networks from VANTED, enable VANTED users to access pathway data from WikiPathways, and establish file exchange capabilities between VANTED and PathVisio.

Language and Skills: Java
Possible mentors: Martijn, Hendrik, Tobias, Falk

IDEA 20: WikiPathways pathway page enhancements

Apply your web development skills to improve the WikiPathways website, especially the pathway page. Here are several ideas of features you could work on:

Language and Skills: JavaScript, jQuery, Web development
Possible mentors: Thomas, Alex

IDEA 21: Connecting Cytoscape with igraph

igraph is a very versatile and efficient software package for graph analysis. Although most of the functions in igraph are written in C, it is very straightforward to connect Cytoscape with igraph via the Java Native Access (JNA) package. A Cytoscape plugin, GLay, has been developed to port some of the community analysis and high performance layout algorithms to Cytoscape and serves as a proof of concept. There are many other analytical functions in igraph that would improve the versatility of Cytoscape if added. Documentation for igraph can be found here.

With igraph, the user can easily generate various random graphs, identify cliques, find communities, search multiple shortest paths, etc. Results from the generic graph analysis functions can then be integrated with other Biological annotation functions in Cytoscape.

Language and Skills: Java, C
Idea by: Gang
Potential Mentors: Gang

IDEA 22: PathVisio plug-in manager and plug-in site development

PathVisio is currently in the process of integrating the established modularity framework OSGi. One of the goals is to simplify finding and installing plug-ins for PathVisio. Therefore we want to implement a plug-in manager that allows the user to easily find all available plug-ins and install them. The plug-ins should be available from one or more plug-in sites (similar to the plug-in manager of Eclipse).

Language and Skills: Java, basics of OSGi, Web development
Potential Mentors: Martijn

IDEA 23: SubgeneViewer plugin for transcriptome analyses

SubgeneViewer is a prototype plugin being developed for Cytoscape which aims to provide innovative visualization for whole transcriptome experiment data produced from RNA sequencing experiments or from splicing sensitive microarrays. The goal of this plugin is to provide a visually informative exon and junction model for visualizing the expression of exons and junctions in a single unified view informed by all known gene transcripts. The gene model is provided by the software AltAnalyze. Below are several aims. AIMS 1 AND 2 ARE SEPARATE STUDENT PROJECTS (too much for one summer together), but Aims 3 and 4 could be a part of either of the summer projects in addition to Aim 1 or 2.

Language and Skills: Java and sufficient understanding of related biology concepts (transcripts, protein translation, splicing, exons and junctions). Experience with Cytoscape is a plus (new or just learned).
Idea by: Nathan
Potential Mentors: Nathan, Doro

IDEA 24: AltInteract plugin for visualizing pathway level domain disruption

Massive amounts of data are being produced from next generation RNA sequencing experiments that provide insights into which RNAs in the cell are expressed and which proteins these RNAs produce. Alternative splicing is a process by which exons and junctions are alternatively expressed in a cell, leading to the expression of different RNAs for a single gene. With alternative splicing, RNAs can be produced for a single gene that differ in whether they are made into protein or not and in the composition of functional domains that dictate the structure of the protein and which interactions in the cell are possible. The software AltAnalyze allows users to take their RNA sequencing data and determine the possible effect of expressed junctions on which proteins are expressed and what effect alternative splicing has on the composition of these proteins (e.g., are they truncated, are they expressed, and do they support specific interactions in the cell).

The AltInteract plugin will be a new Cytoscape tool designed to visualize the predictions from AltAnalyze along pathways imported into Cytoscape from the Network manager or for de novo created pathways from input user gene lists. This plugin will (A) import or create pathways, (B) import results from AltAnalyze (C) deconstruct the pathway network graph to identify domain-domain interactions disrupted by AltAnalyze predictions and visualize gene level disruptions also based on these predictions and (D) interface this plugin with the plugin SubgeneViewer to visualize gene data at the transcript, exon and junction level (methods to be supported by SubgeneViewer). To accomplish these goals, one or more of the following aims must be met:

Language and Skills: Java and sufficient understanding of related biology concepts (transcripts, protein translation, splicing, exons and junctions). Experience with Cytoscape is a plus (new or just learned).
Idea by: Nathan
Potential Mentors: Nathan, Doro

IDEA 25: Global analysis of domain interaction disruption along pathways in GO-Elite

Related to idea 24, however, implemented in the Python program GO-Elite with no associated visualization component. GO-Elite is analysis software that takes integrated pathways (e.g., WikiPathways, Gene Ontology terms) to identify enriched pathways from amongst input gene lists using known and novel methods. The latest version of this software provides basic methods for the import of pathway XML data. These methods need to be expanded to include interactions from pathways.

For the proposed project, GO-Elite will import results from the program AltAnalyze to determine which pathways have significant disruption of gene expression and domain interactions resulting from alternative splicing. Basically, for each provided gene, precomputed associations from AltAnalyze will be imported indicating potential “disruptions” that occur due to alternative exon inclusion. Domain interactions from existing flat files will be integrated into the analysis to determine where potential domain interactions are disrupted as a result of AltAnalyze predictions. A tabular report of the number and type of disruptions for all pathways will be exported along with a second pathway containing the details of each disruption for all pathways. A GO-Elite enrichment analysis will be conducted on these initial results to identify pathways with over-represented disrupted interactions.

Language and Skills: Python and reasonable biology understanding
Idea by: Nathan
Potential Mentors: Nathan

IDEA 26: Implementation of a new RNA-seq quality control program

This project involves migrating methods being developed in Perl and R to create a stand-alone Python program performing base-level and transcript-level quality control (QC) measures on RNA sequencing data from various sequencing platforms (e.g., Illumina paired-end, ABI Solid).

These methods should include analyses of base composition, error rates (e.g., quality per base position over read length), alignment statistics (mapped, unmapped, non-unique mappings), transcript read density variation (5’ vs. 3’, exon vs. junction, exon vs. intron, normalization correction bias), replicate comparison (quantile-quantile aligned read count plots), known versus novel exon/junctions and expression of a panel of known housekeeping genes. These methods will go far beyond existing QC programs, such as FastQC, by providing analyses on known transcripts, exons and junctions. Output will include tables and graphical plots (PMW). Implementation without calls to external Python libraries (e.g., numpy) is preferred but not required. Due to the nature of this proposal, it may inherently be ranked lower than other ideas, as it does not involve a network biology problem in its suggested implementation. Please feel free to modify it if you see unexplored areas of innovation in this regard.

Language and Skills: Python, Perl, R and bioinformatics background preferred
Idea by: Nathan
Potential Mentors: Alisha, Nathan

IDEA 27: Allelic variation analysis in AltAnalyze

AltAnalyze is a program designed to identify alternative exons and junctions from RNA-seq and microarray experiments. After identifying regulated exons, AltAnalyze determines the putative effect of alternative exon inclusion on associated proteins, domains and binding sites (protein and RNA). We wish to couple these analyses to allelic variation data that is collected in parallel with RNA-seq and microarray datasets. This project involves importing pre-processed allelic variation data (DNA-seq, SNP arrays, copy number variation data), where genotypes can be assigned per genomic sequence and gene. Based on this allelic data, genotypes (e.g., AA, GG, AG) will be used to group samples prior to analysis, as opposed to the user designating which samples belong to each group. Hence, transcriptome data will be re-grouped for each genotype examined and analyzed to identify splicing events that segregate with the genotype for that gene and quantitative trait loci (differential gene expression) along with existing AltAnalyze predictions.

Implementation will require new methods built in Python and modification of existing AltAnalyze methods to support sample-to-group re-assignment on a per-genotype basis. Due to the nature of this proposal, it may inherently be ranked lower than other ideas, as it does not involve a network biology problem in its suggested implementation. Please feel free to modify it if you see unexplored areas of innovation in this regard.

Language and Skills: Python and bioinformatics background preferred
Idea by: Nathan
Potential Mentors: Nathan

IDEA 28: Integrate Existing RNA-seq alignment software with AltAnalyze

AltAnalyze is a program designed to identify alternative exons and junctions from RNA-seq and microarray experiments. After identifying regulated exons, AltAnalyze determines the putative effect of alternative exon inclusion on associated proteins, domains and binding sites (protein and RNA). For RNA sequencing studies (RNA-seq), AltAnalyze currently requires the import of aligned junction data. The goal of this project is to:

Aim 1:

Aim 2:

Calling these apps will have to be implemented in a manner that allows for customized user options (additional flags passed to the command-line applications). Due to the nature of this proposal, it may inherently be ranked lower than other ideas, as it does not involve a network biology problem in its suggested implementation. Please feel free to modify it if you see unexplored areas of innovation in this regard.

Language and Skills: Python required, C++ preferred, bioinformatics background preferred, RNA-seq knowledge preferred
Idea by: Nathan
Potential Mentors: Nathan, Alisha

IDEA 29: Integrate Pathway Visualization and Analysis into Savant

The Savant Genome Browser is a desktop visualization tool for genomic data. It was primarily developed for visualizing high throughput (aka next generation) sequencing data, although it can be used to visualize virtually any genome-based sequence, point, interval, or continuous dataset. Savant features a rich plug-in framework, allowing for the integration of diverse methods and datasets, including variant detection, medical annotation, and RNA-sequencing. The goal of this project is to develop a Savant plugin to visualize pathway data from WikiPathways and related resources within Savant. The plugin should

Implementation will require developing a new plugin for the Savant Browser, and working with the Savant team to extend the plug-in API if current functionality proves insufficient.

Language and Skills: Java, Swing, familiarity with genomics data
Idea by: Mike Brudno
Potential Mentors: Mike Brudno, Marc Fiume

IDEA 30: New Features for Online Tutorial Support System

OpenTutorials is a free, online tutorial resource for open source and NRNB-supported tools. It is built on top of MediaWiki and a handful of extensions. Its basic function is to provide a wiki-based collection of tutorials that can be viewed as a web page, slide show or printed handout. The wiki approach makes it easy to build and maintain content. The slide show and handout features make it useful for presenters.

There are a number of features that need to be implemented to improve the site and really make it effective to a larger community. These would each be implemented as one or more PHP extensions:

Language and Skills: PHP, MediaWiki
Idea by: Alex
Potential Mentors: Alex, Kristina