Short bio: My name is Amari Lewis, and I am a senior Computer Science major with a concentration in High Performance Computing and a French minor at Winston-Salem State University. I am very appreciative of the opportunity to participate in this year's visualization REU at Clemson University. This summer I intend to experience cutting-edge research and to be exposed to a host of new information and computational skills that will truly benefit my future in research.
The research project that I am working on with Dr. Feltus and Dr. Smith is entitled "Visualization and interaction of multiple layers of high dimensional biological data." Throughout this REU experience, my ultimate goal is to improve G3NA, existing software created by my mentors along with PhD student Karan Sapra. G3NA is a GPU-enabled gene network aligner. The user inputs the specific genes of interest, and the software visualizes them through a force-directed graph layout. My contribution to this ongoing project is to help make the software more interactive by allowing the user to click on a specific node (gene) to access important Gene Ontology (GO) information about it. This GO information may include the display ID, the description, and the information type. I will achieve this by accessing an external database server that holds the biological data and by writing Python code that parses the response so that only the desired information is displayed.
During the first week of the REU, I met my research mentors, Dr. Feltus and Dr. Smith, as well as one of her PhD candidates, with whom I will be working throughout the summer. I was given the task of writing a C program that implements a force-directed layout. This task served as an introduction to some of the goals of the project, since force-directed graphs are how the biological data is represented in this project. Additionally, during this week we attended group sessions. These sessions helped the cohort adjust to the campus and learn about Palmetto and how to use it through the Linux operating system.
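For readers unfamiliar with the technique, the idea behind a force-directed layout can be sketched in a few lines. The version below is written in Python rather than the C used for the assignment, and it is only an illustrative sketch in the style of the Fruchterman-Reingold algorithm, not the program written for the project. The function name, the parameters, and the cooling schedule are all assumptions made for this example.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, width=1.0, seed=0):
    """Return {node: (x, y)} for a list of nodes and a list of (a, b) edges.

    Illustrative sketch: every pair of nodes repels, every edge attracts,
    and a decaying "temperature" caps how far a node may move per step.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, width)] for n in nodes}
    k = width / math.sqrt(len(nodes))  # preferred distance between neighbors
    temperature = width / 10.0         # maximum step length, shrinks over time

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsive force between every pair of nodes: k^2 / distance.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                fx, fy = dx / dist * force, dy / dist * force
                disp[a][0] += fx
                disp[a][1] += fy
                disp[b][0] -= fx
                disp[b][1] -= fy

        # Attractive force along every edge: distance^2 / k.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            fx, fy = dx / dist * force, dy / dist * force
            disp[a][0] -= fx
            disp[a][1] -= fy
            disp[b][0] += fx
            disp[b][1] += fy

        # Move each node along its net force, limited by the temperature.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temperature)
            pos[n][0] += dx / length * step
            pos[n][1] += dy / length * step

        temperature *= 0.95  # cool down so the layout settles

    return {n: (p[0], p[1]) for n, p in pos.items()}


# Hypothetical four-gene network, used only to exercise the function.
layout = force_directed_layout(["g1", "g2", "g3", "g4"],
                               [("g1", "g2"), ("g2", "g3"), ("g3", "g4")])
for gene, (x, y) in layout.items():
    print(gene, round(x, 3), round(y, 3))
```

The pairwise repulsion loop is quadratic in the number of nodes, which is one reason large gene networks benefit from GPU acceleration.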
This week has been very interesting. I have been doing a lot of research on the project and on how to complete my task. My portion of the project has been defined: I will be using the Ensembl API to access the web database that holds the information about the genes. Ultimately, the goal is for the visualization tool to display annotations such as the gene ID when the user selects a node. This will be achieved by writing a Python program that retrieves the JSON file and parses it so that only the gene ID, the display ID, and the info_type are displayed. Additionally, as a cohort we have been attending wonderful sessions in which we are exposed to the different ways that visualization can be used. This week the keynote speakers included Dr. White and Dr. Feltus, one of my mentors. Both presentations were interesting and showed how important visualization is to their respective areas of study. Lastly, we ended the week by visiting Clemson University’s Department of production arts studio. The studio was amazing. We were able to see the visualizations produced by their students.
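The annotation lookup described at the start of this entry could look roughly like the Python sketch below. It is not the project's actual code. It assumes the public Ensembl REST server at rest.ensembl.org and its xrefs/id endpoint, and it assumes each returned record carries the keys dbname, primary_id, display_id, description, and info_type; these should be checked against the current Ensembl REST documentation. The function names and the sample record are made up for illustration.

```python
import json
import urllib.request

# Assumption: public Ensembl REST server; the project may have used another host.
ENSEMBL_REST = "https://rest.ensembl.org"

# Assumed key names for the fields the tool should display.
WANTED_FIELDS = ("primary_id", "display_id", "description", "info_type")


def fetch_xrefs(gene_id, server=ENSEMBL_REST):
    """Download the cross-reference records for one gene as a list of dicts."""
    url = f"{server}/xrefs/id/{gene_id}?content-type=application/json"
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def select_go_fields(xrefs, dbname="GO"):
    """Keep only Gene Ontology records, reduced to the fields of interest."""
    return [
        {field: record.get(field) for field in WANTED_FIELDS}
        for record in xrefs
        if record.get("dbname") == dbname
    ]


# Made-up sample response, so the parsing step can be tried without a network call.
sample_json = """[
  {"dbname": "GO", "primary_id": "GO:0015979", "display_id": "GO:0015979",
   "description": "photosynthesis", "info_type": "DEPENDENT"},
  {"dbname": "OtherDB", "primary_id": "X1", "display_id": "X1",
   "description": null, "info_type": "DIRECT"}
]"""

for entry in select_go_fields(json.loads(sample_json)):
    print(entry)
```

In real use, the list passed to select_go_fields would come from fetch_xrefs with the gene ID of the node the user clicked.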
Throughout this week I continued working on the Python program assigned to me the previous week. Much of my work therefore consisted of background research on coding in Python in order to achieve a favorable outcome for the entire project. Also, as a cohort we began preparing our abstracts, and I prepared a rough draft for my mentors to review. In a session entitled “Introduction to Visualization” we were introduced to a new visualization tool, ParaView, and went through tutorials that used some of the software’s major functions. Additionally, I met with my mentors at Moe Joe’s coffee shop to discuss our progress on the project and possible conferences for us to consider attending. On Friday we had the opportunity to visit the Clemson data center, where we were able to see the Palmetto cluster in person and learn about the structure of the entire data center.
This week involved preparing for the midterm presentations. I was able to finalize my abstract with the help of Dr. Tanner and my mentor, Dr. Feltus. Kathleen Kyle and I are both working on this project, and this week we collaborated on a concise structure for the midterm presentation, in which we would transition from one speaker to the next. Additionally, my code now works: using the Arabidopsis thaliana genes, I was able to implement the Python code that accesses the GO information. We have decided to use Qt for the next steps of the project. Midterm Presentation
Throughout this week I have been working on improvements to my Python code. The code works with the Ensembl-REST API to access the external server and connect with the Gene Ontology database, which holds the annotation information for the sequenced genomes. The two crops that we are working with (maize and rice) did not return any ontology information, which caused an error in my code. The error was actually in the database, and my mentor contacted Ensembl-REST and informed them of it. Moving forward, for the remainder of the project I used the Arabidopsis thaliana genes. I also began working with the Qt programming software in order to build the structure for the gPICTviz tool we are helping to create. In preparation for the midterm presentation, I prepared an abstract with the assistance of my mentors and Dr. Tanner.
Abstract: This project explores the interaction between rice and maize genes through gene interaction graphs. The ultimate goal of the research was to study the gene relationships between plants such as rice and maize to identify the correct in-species gene mixture for crop improvement via natural breeding as an alternative to genetically modified organisms. A visualization tool is necessary to view the connections between multiple genes. Many of the available visualization tools for depicting biological information are not able to scale sufficiently to big data or metadata. The challenge in the research was to create a useful visualization tool that incorporated high performance computation to explore the relationships between genes across multiple species. This was achieved through network alignment. The GPU-enabled G3NA tool produces optimal, fast alignment of gene interaction graphs.
The next stage of research is to improve the G3NA tool by connecting to an external server through the Ensembl-REST Application Programming Interface and parsing the gene ontology information with the Python programming language, so that the biological information can be sorted in gPICTviz, a real-time interactive visualization tool. This promotes user interactivity and enables a researcher to visualize and interact with multiple networks.
This week consisted of producing a model of what we hope to achieve with the gPICTviz tool we are creating. I built a successful demo by programming in C++ with the Qt Creator software. Essentially, the user will have the visualization window displayed on the right (which is Kathleen's visualization of the rice and maize gene network clusters), and the user can zoom in and out of the visualization to view a cluster or a specific node. In the demo an arbitrary output is displayed; however, the ultimate goal is for the user to be able to view a drop-down list of the Gene Ontology ID and the definition of the node. During our weekly meetings with Dr. Feltus, Karan, and Kathleen, we decided to integrate my project with Kathleen's project. We hope to achieve the click functionality so that users can easily access the Gene Ontology information and definitions for the network.
Moreover, the cohort participated in our first graduate school preparatory course. The course reviewed four scholarly articles. Everyone was placed into a group to review a specific paper, but we were all expected to read all four papers. During this graduate-level course we all participated in a discussion analyzing all of the articles. Each group had a designated presenter, the person appointed to lead the discussion of the group's assigned article. The discussion went well, and we all learned from each other as well as from reading the articles. This course was very beneficial and will better prepare us for reading and analyzing graduate-level scholarly articles. Below are images that depict the display window that the gPICTviz tool will display.
This week is the last week before our final presentations. Throughout this week I have been working on integrating Kathleen’s project with my portion of the project. Kathleen’s work consists of abstracting a cluster view of the two plant gene networks, which is done using C/C++ and OpenGL. We came across many obstacles while trying to integrate our projects into one. These obstacles included scaling the program to function with Qt, as well as supporting moving nodes and zooming in and out. We began with a test visualization that draws a triangle using OpenGL. This is shown below:
Additionally, I have incorporated a large text file that contains all of the gene ontology information retrieved from the external server. In this visualization, the user is able to parse and search through the text file to find genes that are associated with terms such as “photosynthesis.” Essentially, the user or biologist is interested in the commonalities between the gene networks of various species. The user can also write into the tool; this functionality sends what the user writes to the visualization so that it can display or highlight the nodes of interest. Below is a depiction of this:
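Separately from the screenshot, the term search described above can be illustrated with a short Python sketch. This is not the tool's actual implementation, which was built with Qt. The layout of the text file is an assumption: one annotation per line, with tab-separated gene ID, GO ID, and description. The function name, gene IDs, and sample lines are hypothetical.

```python
def search_annotations(lines, term):
    """Return the gene IDs whose GO description mentions the search term.

    Assumed line format (tab-separated): gene ID, GO ID, description.
    Matching is case-insensitive, and each gene is reported once,
    in the order it first appears.
    """
    term = term.lower()
    matches = []
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        gene_id, _go_id, description = fields[0], fields[1], fields[2]
        if term in description.lower() and gene_id not in seen:
            seen.add(gene_id)
            matches.append(gene_id)
    return matches


# Hypothetical annotation lines standing in for the large text file.
sample_lines = [
    "GENE_A\tGO:0015979\tphotosynthesis\n",
    "GENE_B\tGO:0006412\ttranslation\n",
]

# The matching gene IDs are what would be sent to the visualization
# so that the corresponding nodes can be highlighted.
print(search_annotations(sample_lines, "photosynthesis"))
```

With a real file, the same function can be passed an open file object instead of a list, since it only iterates over lines.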
The next steps included incorporating the ability to supply file paths to the tool in order to create the visualization. The force-directed graph is built by uploading the files associated with each gene network along with the alignment file. We are still working on this feature; however, below is a depiction of it.
This week is the final week. Most of my time was dedicated to putting the finishing touches on our project in order to have the best possible results to present. Below is a link to view my poster presentation for the 2015 XSEDE conference.
click here
This REU site experience has been truly motivating. I had the opportunity to participate in cutting-edge scientific visualization work. I appreciate all of my mentors for supporting me throughout this entire program, and I thank the National Science Foundation for making this opportunity available.
Last updated: 7/23/2015