Katie Kyle


Clemson Research Experiences for Undergraduates

Collaborative Data Visualization Applications Summer 2015

Home Institution

Florida State University
Tallahassee, FL
kek12e@my.fsu.edu

Clemson Research Mentors

Dr. Alex Feltus | Dr. Melissa Smith and Karan Sapra
Genetics and Biochemistry | Electrical and Computer Engineering

Clemson Visualization Mentor

Dr. Jill Gemmill
Electrical and Computer Engineering

About Me

  • My interests: computational biology, molecular evolution, genetics and genomics
  • My hobbies: powerlifting, playing soccer, reading, drawing, and eating good food
  • Something interesting about myself: I love nature and seeing new things, so hiking, rafting, climbing, and any type of exploring are all activities I'm always up for!
  • Classes I've taken; some background about myself: I've taken an assortment of biology and programming classes including Molecular Biology, Genetics, Cell Structure and Function, Organic Chemistry, Object Oriented Programming, Data Structures and Algorithms, Databases, and many more. My favorite of these were Cell Structure and Function, and Object Oriented Programming. I love research and teaching. I've been doing research in an evolutionary biology lab at FSU for the past two years and was a Teaching Assistant for the general biology classes this past year. Next year I will be a Lab Teaching Assistant for the non-majors Biology classes. In the future I hope to go on to graduate school and become a research professor at a large university.
  • Awards/Recognitions: Florida Academic Scholar, FSU Freshman Academic Scholar, MFST MOAA Scholar, FSU Honors Program, FSU WIMSE Research Symposium Presenter and Speaker, Golden Key, Dean's List, ACC Scholar

Project Description

My project is related to the one my fellow cohort member Amari Lewis is working on, entitled "Visualization and interaction of multiple layers of high dimensional biological data". The goal of this project is to improve upon the existing code developed for the GPU-enabled Global Gene Network Aligner (G3NA), improving both the user interface and the program's functionality. Right now my goal is to simplify the initial visual output of the program so as not to overwhelm the user. I plan to create a more abstracted view of the gene networks in which clusters of genes are represented as ovals or circles rather than as hundreds of individual nodes, one per gene. The user can then click on these abstracted clusters to explore the more intricate, expanded view. Simplifying the initial visualization should also save compute time, since the graphical components of the program are very compute-heavy. By expanding only the gene clusters of interest, less strain will be put on the program, and hopefully a faster runtime will follow. Here are some images of the current visualizations produced by G3NA:

[Images: Conserved Alignment of Sub-graph; Maize and Rice Alignment (A and C are Maize and Rice, while B and D are the respective conserved sub-graphs); Maize Network]
(*Images from http://network.genome.clemson.edu/about.php ).

Week by Week

Week 1

This week I spent most of my time learning HTML and CSS and getting my webpage up and running because I didn't know my project yet. Later in the week, once we were introduced to Palmetto, Clemson's supercomputer, I started familiarizing myself with the Palmetto environment and looking through the online user's guide. I was able to figure out how to get the MATLAB GUI to pop up through the Palmetto server. I am very excited to utilize the power of Palmetto in the future. On Friday afternoon I was finally able to speak to Dr. Feltus and get an idea of what I should start working on to prepare for my project. I have started looking into CUDA C programming, a programming model that utilizes both the CPU and the GPU. The GPU is used to run many operations in parallel and is very fast hardware not only for graphics but also for general-purpose computation. I look forward to meeting with Dr. Feltus early in Week 2 to talk about what my project will entail.

Week 2

On Monday I met with my mentors for the first time; they told me what my project would be and gave me some tasks to accomplish for the week. I was tasked with installing the MCL-edge software on my Palmetto account and running the clustering program (MCL) on a couple of data files. This program uses the Markov Cluster Algorithm to determine which nodes in the data files belong to clusters. After I ran MCL on the files, I had to write a short Python program to alter the output files. MCL produced files that had one cluster per line of the text file, with the nodes of the cluster separated by tabs. I needed to change this so that each node is listed on its own line alongside a cluster ID. I had never used Python before this, so I spent some of my time this week getting more familiar with Python and finally created a script that generates a unique cluster ID for each cluster and then separates the nodes of the clusters onto individual lines paired with their cluster ID.
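The conversion described above can be sketched in a few lines of Python. This is only an illustration and not the script actually written for the project: the function names, the choice of integer IDs starting at 0, the "cluster ID, tab, node" column order, and the gene names in the usage note are all assumptions.

```python
def flatten_mcl_clusters(lines):
    """Turn MCL-style cluster lines into (cluster_id, node) pairs.

    Each input line is assumed to hold one cluster, with its node
    names separated by tabs. Cluster IDs are assumed to be integers
    assigned in file order, starting at 0.
    """
    pairs = []
    cluster_id = 0
    for line in lines:
        nodes = [name for name in line.strip().split("\t") if name]
        if not nodes:
            continue  # skip blank lines without using up an ID
        for node in nodes:
            pairs.append((cluster_id, node))
        cluster_id += 1
    return pairs


def format_pairs(pairs):
    """Render the pairs one per line as 'cluster_id<TAB>node' (assumed order)."""
    return "\n".join(f"{cluster_id}\t{node}" for cluster_id, node in pairs)
```

For example, `format_pairs(flatten_mcl_clusters(["geneA\tgeneB\n", "geneC\n"]))` places the made-up names geneA and geneB in cluster 0 and geneC in cluster 1, one node per line.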

Once I had accomplished this, I started looking into OpenGL programming. OpenGL is a graphics API, usable from C code, for creating 2D and 3D graphics. This is what we will be using for the visualization component of this project.

On Friday I met with my mentors again to discuss my progress so far and to talk about the next step. I am now working on writing C code that will read the output file I created using Python and store each node in an array of structs that will be used later.

Week 3

This was a busy week! Early in the week I finished the C code to read each cluster node into an array of structs and met with Karan on Wednesday to get information about my next task. He told me to start working on adding an adjacency matrix to the C code. This matrix stores the information about edges between the nodes by using the indexes of the nodes from the cluster array to mark which nodes are connected. Conceptually, this is a 2D array whose number of rows and number of columns both equal the length of the cluster array. If two nodes are connected by an edge, a number other than 0 is inserted into the 2D array (matrix) at the position where the corresponding row and column for the two nodes meet. In implementation, it is a 1D array in which the proper index is calculated using the simple formula: 1D_idx = (xidx * num_rows + yidx). Now that this is done, I have C code with a cluster array to store all the nodes that are in clusters and an edge matrix that stores all the edges between the clustered nodes. This completes the data preprocessing stage of my project, and hopefully next week I can start working on visualization using OpenGL!
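The flattened matrix idea can be illustrated with a small sketch. The project code is in C; Python is used here only for brevity, and the class and method names are hypothetical. Storing each edge in both directions is also an assumption (it treats the network as undirected).

```python
class EdgeMatrix:
    """Square adjacency matrix stored as a flat 1D list."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # One cell per (row, col) pair; 0 means "no edge".
        self.cells = [0] * (num_nodes * num_nodes)

    def index(self, row, col):
        # Same formula as in the text; the matrix is square, so the
        # number of rows equals the number of columns.
        return row * self.num_nodes + col

    def add_edge(self, a, b, weight=1):
        # Assumption: edges are undirected, so both cells are set.
        self.cells[self.index(a, b)] = weight
        self.cells[self.index(b, a)] = weight

    def connected(self, a, b):
        return self.cells[self.index(a, b)] != 0
```

With three nodes the flat list has nine cells, and the pair (row 1, column 2) lands at index 1 * 3 + 2 = 5.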

We also started discussing abstracts and our midterm presentations this week. I started working on the rough draft of my abstract and started putting together a PowerPoint for my midterm presentation. Because Amari and I are working on two parts of the same project, we will have to make sure our PowerPoints relate to one another. I will be presenting directly before Amari, so I need to think about a good transition into her presentation. We will also probably have very similar abstracts. This close connection to another member of the cohort adds an extra level of collaboration to this experience, which I think is really neat.

Week 4

I spent a lot of time this week working on my abstract, elevator pitch, and midterm presentation on top of getting started with OpenGL programming. Our midterm presentations are next Wednesday, so this week was pretty hectic. There seemed to be a focus on public speaking and presenting, starting with our lecture on elevator speeches on Monday. This was a particularly helpful lecture; Dr. Pindar was phenomenal and had a lot of great ideas and feedback for our talks! Tuesday and Wednesday I focused on my abstract and was glad to get it finished and approved by Dr. Feltus within a few days. I also started working on my midterm presentation, since Amari and I needed to work out the details of how our presentations would complement each other. Amari and I went to the DRL on Friday morning to test our presentations and images in the room we will be presenting in, and we discussed further how I would transition to her presentation and how we would tie our presentations together at the beginning and end. I also worked on my elevator pitch and left a recording on Dr. Byrd's voicemail on Friday evening. This stressed me out more than I thought it would, and I now know I'm going to have to work hard on how I will deliver my presentation on Wednesday.

While doing all of this supplemental work, I was also busy trying to figure out OpenGL. Monday and Tuesday I struggled to learn how to compile with the OpenGL libraries, and eventually I had to meet with Karan on Wednesday because I could not figure it out. Thankfully, he was able to get it compiling and found a good, simple tutorial for me to start out with. Once I figured out some of the functions I could use, I was able to display the cluster nodes stored in my C code in a window. Right now I just have them as points at random locations, but this is the first visual I have produced. The image is shown below, with the maize nodes in red and the rice nodes in blue.

On Friday, I met with Dr. Feltus and Karan, and Karan helped me come up with a plan to have a rudimentary cluster view ready for my midterm presentation (hopefully). My task now is to create a new cluster array that keeps information about the clusters, rather than the nodes, and then create a cluster adjacency matrix to keep information about edges between clusters, rather than just between nodes. I can then attempt to draw the clusters as points, with the point size set relative to the size of the cluster, and the edges between clusters as lines, with the line width set relative to the number of connections between the two clusters.
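A rough sketch of this aggregation step is given below, again in Python for brevity rather than the project's C. The function name, the input layout (a list mapping each node index to its cluster ID, plus a list of node-index pairs for edges), and the use of dictionaries instead of a matrix are all assumptions made for the illustration.

```python
from collections import Counter


def summarize_clusters(node_cluster, edges):
    """Collapse a node-level graph into cluster-level sizes and links.

    node_cluster[i] is the cluster ID of node i (assumed layout).
    edges is a list of (node_a, node_b) index pairs.
    Returns (sizes, links): sizes maps cluster ID -> node count, and
    links maps an ordered cluster pair -> number of node-level edges
    crossing between those two clusters.
    """
    sizes = Counter(node_cluster)
    links = Counter()
    for a, b in edges:
        ca, cb = node_cluster[a], node_cluster[b]
        if ca != cb:  # edges inside a single cluster are not counted
            links[(min(ca, cb), max(ca, cb))] += 1
    return sizes, links
```

The two results map directly onto the drawing plan: each entry in `sizes` could drive a point size, and each entry in `links` could drive a line width.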

Week 5

Starting with elevator pitches and ending with BBQ at The Smoking Pig, Week 5 was an interesting week. On Monday we performed our elevator pitches in front of the cohort and a video camera, which provided valuable feedback on how we present in front of crowds. This was useful since our midterm presentations were on Wednesday. Tuesday I spent a lot of time producing a rudimentary cluster view of the maize-rice networks to show in my midterm presentation, adding the finishing touches to my PowerPoint, and practicing my presentation. You can see the visuals I have produced so far below.



I was able to produce a rudimentary cluster view by setting the point size equal to the cluster size in my OpenGL code, so a cluster containing more genes is represented as a larger point than a cluster with fewer genes. I was then able to use the cluster edge matrix I created within my code to draw the edges between the clusters. Similarly, the edge width is relative to the number of connections between two clusters: more connections means a wider line between the clusters. Currently the points are displayed in a random layout, so my next step will be creating a force-directed layout to display the clusters in. I also developed a timeline for the rest of my weeks here:

Week 6

We started off this week with part 1 of 3 graduate student seminars with Dr. Gemmill. Each week we will be reading four papers related to visualization and presenting and discussing them with the rest of the cohort. Reading scientific papers teaches you a lot and is an extremely useful skill. On Wednesday we had the privilege of talking to Mr. Jim Bottum, the CIO at Clemson. His advice and stories were full of wisdom, and his care for students was obvious and inspiring. Also on Wednesday I began working on implementing a force-directed layout for my cluster visualization. I continued working on this into Thursday, and eventually asked for help from Dr. Gemmill and Karan at our meeting that afternoon. A few bits of OpenGL code missing from my program were preventing the force-directed layout from working, and Karan decided to allow me to use the algorithm he developed for the layout so that I could start focusing on integrating my project with Amari's. Here is my most recent visualization after these code updates:


Also this week I started preparing for the REU mini conference at the College of Charleston. I had to prepare two slides to present my research and a two-minute "lightning talk". It is getting much easier to present my research with each presentation. Other REU groups from NC State, Auburn, and UNC-Charlotte were also at this conference presenting their own research. It was interesting to see what students at other universities are working on, and it was also a great place to get ideas for my own project. At the end of the conference we were asked to reflect and come up with two things that we learned and either found interesting or could apply to our own projects. These are two interesting things I learned:

  1. Basics about machine learning and how it can be used in a wide variety of computational algorithms
  2. Many other projects involved the visualization of clusters like my own. I found the existence of this commonality, despite the very different data sets, very interesting. I was also able to see some different ways to visually represent clusters.

Overall the conference was a good experience and a nice place to get more familiar with giving talks about your research. It was a great atmosphere and Charleston is a very charming city.

Week 7

This was by far the busiest week yet. With final presentations just a week away, I spent a lot of time trying to improve my visualizations and get my code integrated with Amari's. This week we were also honored with talks from two very influential people: Dr. Vernon Burton and Dr. Bernice Rogowitz. Dr. Burton was so interesting to talk to and Dr. Rogowitz gave all of us advice on how to better use color in our visualizations. Dr. Rogowitz's talk on color perception was eye-opening. I had no idea how important the color schemes of a visualization were.

I spent a large portion of my time this week with Dr. Gemmill improving my visualization. She helped me interpret the force-directed layout pseudocode that Karan gave me last week, and helped me adjust it to work with my code. Unfortunately it is still not working properly, so I will have to continue to research force-directed layouts. Dr. Gemmill also helped me add some more advanced OpenGL features to my code. I was able to change the clusters from points into spheres, add lighting, and add special key functions so that I can now zoom in and out and rotate my visualization. Dr. Gemmill's 3D graphics experience was very helpful in understanding the OpenGL library, which is very difficult for a novice to grasp. I was also able to add different colors to the edges of the graph to distinguish between inter- and intraspecies connections. You can see my improved visualizations below.

Week 8

My final week here at the VisREU felt like 8 weeks all on its own. We were busy working on final posters for XSEDE, preparing final presentations for Friday, and starting to think about the final papers we need to write, all while planning our trip to the XSEDE conference, which means plane tickets and baggage fees and business cards. It was crazy but it was rewarding. I accomplished a lot this week, and working on my final poster and presentation made me see how much I accomplished overall here at Clemson, which is a really good feeling. We also had the opportunity to give a lightning talk to incoming freshmen who are interested in research. It was really great to see such young people already looking into the possibilities of research, and they asked some really great questions. It was also neat to see how much the rest of the cohort and I have improved in our presentation abilities. By this point in the program we have given so many presentations about our research that I'm pretty sure I've actually done it in my sleep, and again it is a great feeling to be confident about your ability to present in front of crowds. Also this week, I was able to implement a search function in my program that searches the graph for a gene name or gene ontology term and then highlights the cluster where it is found. Searching for a gene ontology term still causes some problems, so that will need to be worked on in the future. I think the issue lies somewhere in the parsing of the gene ontology information from the data file. But searching for a gene name works fine. The image below shows where the maize gene "GRMZM2G161306" is in the maize-rice cluster graph (highlighted in red).

Final presentations were at the end of this week, and they seemed much more relaxed than the midterm presentations were. Perhaps it is because we are all much more used to presenting at this point, or because there was less riding on this presentation since we were all going to XSEDE. Everyone's presentation was great, and I think mine went pretty well. It was a lot of information packed into ten minutes, but it seemed to go smoothly. I was able to demo my final visualization product, and the cohort said I did a really good job showing my progress throughout the summer, so I think that is how I will finish this webpage. Below you will find a progression of the visuals I produced during my time here at Clemson. First, however, I would like to say thank you to everyone who contributed to my fantastic experience here at Clemson and the VisREU, especially the ACC and Jim Bottum for giving me the opportunity to take part in this program. A special thank you to Dr. Feltus and Karan, who spent so much time with me working on my project and giving words of wisdom; to Dr. Gemmill, who spent hours on end working with me to understand the OpenGL library and improve my visual; to Dr. Tanner, who was a warm and friendly presence throughout the program and who helped me so much with my scientific writing skills; and to Allison Peasley, who was our friend and confidant from start to finish in this program. Finally, a very special thank you to Dr. Byrd, who worked tirelessly to give the cohort the best that Clemson has to offer and was a truly great and inspiring mentor. This summer has definitely been one to remember.



In My Own Words

My VisREU/ACC Scholar Experience

My XSEDE15 Experience

Presentations

Midterm Presentation

Lightning Talk

Final Presentation

XSEDE15 Poster

Last updated: 08/07/2015