Shayne O'Brien

Clemson Research Experiences for Undergraduates

Collaborative Data Visualization Applications Summer 2015

Home Institution

SUNY Geneseo
Geneseo, NY

Clemson Research Mentor

Dr. Bo Song
Forestry and Natural Resources

Clemson Visualization Mentor

Dr. Vetria Byrd

About Me

  • Interests: Mathematics, Statistical Modeling, Computer Programming.
  • Hobbies: Backpacking, philosophy, tennis, running.
  • Fun Fact: I have done research investigating illegal African immigration into Spain!
  • Classes: Probability, Regression, Time Series, Linear Algebra, Linguistics, Matlab, R
  • Awards/Honors: Edgar Fellow, Phi Eta Sigma, Dean's List, JRH International Memorial Scholarship, Positively Geneseo

Hello! My name is Shayne O'Brien and I am a Mathematics and Spanish double major in the Edgar Fellows honors program at SUNY Geneseo. Ever since I can remember, I have been fascinated with the outdoors; I am the President of the Geneseo Outing Club and over the past two years I helped spearhead the organization of a co-op between our club and the U.S. Army Corps of Engineers. In addition to Outing Club, I am also an active member of the Latino Student Association at my school, Treasurer of the Class of 2017, and an advocate for social justice. For the past five years I have been a volunteer track coach with my town's local youth organization, and I look forward to returning to this position once again after the conclusion of this REU!

Project Description

This summer I will be working with Dr. Bo Song and her colleague Brian Williams. In my project, I will be visualizing and analyzing XYZ and RGB light detection and ranging (LIDAR) data collected from an American Sycamore (Platanus occidentalis) over a span of eight weeks as the tree transitions from fall into winter. To accomplish this, I will be using Matlab and R to develop algorithmic methodology to appropriately handle the LIDAR data. The objectives of this study are to accomplish the following:

1) generate a high quality 3D virtual representation of a scanned tree and reduce noise due to environmental variability;
2) differentiate between various components (crown, trunk and branches) of a given tree based on RGB values and calculate the respective volumes of these sections;
3) project leaf color change dynamics over time.

This study is being completed in an effort to determine if the FARO Focus3D X 130 laser scanner can be used for research in comparing the effects of different destruction management strategies within forest structures, and to better understand how trees change seasonally and respond to disturbances such as fires and hurricanes.

Week 1

Since this was our first week here at Clemson, there was a lot of administrative work to do to get everything set up for the rest of the summer. After an orientation session on Monday with the rest of my cohort, I got my first chance to meet Dr. Bo Song and her colleague Brian Williams via a Skype conference; they will both be working remotely from Georgetown, South Carolina. We discussed the project that I will be working on and went over logistics, expectations, and objectives. The rest of the week was filled with a few lectures on Linux, Palmetto, and HTML, learning ethically appropriate research methods, and reading plenty of academic journals and papers related to my project. I think I am beginning to settle into this new environment.

So far I have managed to successfully upload all of the data into my Palmetto storage hard drive through an FTP client (Cyberduck) as well as configure a Matlab GUI that I can access with Terminal on my computer. While the GUI works and I can load all of the data into it, the processing of client-side input is very slow. Additionally, I still need to reorganize the directory within my Palmetto account so as to save scripts/progress. It seems that only Palmetto commands are able to produce visualizations of the various stages of the Sycamore; the GUI software cannot handle the raw amount of data all by itself. Since this is the case, the reorganization/use of Palmetto seems to be necessary to continue onwards. I plan to teach myself this weekend when we have some additional time to work on our projects.

Since I am still learning the ropes of Palmetto, I took the data, copied it, and then reformatted it so that it can be read and imported by R for visualization. By doing this, I was able to generate and look at 3D visualization plots of the Sycamore over time. From this preliminary step, it seems that some noise reduction techniques will be able to be employed for better calculations. During the weekend I will be trying out a script that may have the potential to calculate the volume of the tree as a whole.

Week 2

After generating visuals of the bare tree and the full tree in Matlab, I discovered that the processing time for interacting with the visualization is substantial since plots cannot be directly opened through Palmetto. To address this issue, I generated visuals of all weeks in R due to the interactive nature of 3D plots in R and the capacity of the program to handle larger data sets relatively easily. Through these visualizations, I discovered that the collected data for the week 1 visual had not been fully cleaned, as there was a significant overlay between the Sycamore tree being considered and another, different tree that is nearby. This overlay caused a misrepresentation of the tree by making it appear as if it had additional shoot-offs of branches away from the trunk, when in reality these branches do not exist. Cross comparisons between 3D environment generations made by FARO software, photos taken of the physical tree, and the plots generated by R and Matlab were done to confirm this finding.

While this was the only cleaning error found for the structure of the tree, we confirmed our suspicion that noise due to the sensitivity of the laser scanner used is a significant factor in the quality of the data. Noise in the case of this study is the incorrect assignment of both XYZ and RGB values due to the environmental factors affecting the scanner. Since the equipment that was utilized to obtain the data is not perfect, weather is highly variable, and individual tree structures are complex in terms of topology, we did expect to find some margin of error in the data at the outset of this study. The data for weeks one, seven and eight exhibit the most noise. I will be heavily researching noise reduction techniques over the course of the next week.

In addition to noise reduction and working on writing programs for volume calculation and color analysis, I am still trying to get a better feel for the quality of the data collection using FARO. To do this, I will be attempting to convert my current data formatting from .xyz into .fbx so that the visualizations may also be produced in Unity3D. So far this has proved difficult to accomplish since Unity3D was not created to be compatible with point cloud data. I met with a faculty member of Clemson's Digital Production Arts department for a little over an hour and a half attempting to do a conversion of the data format, but unfortunately we were unsuccessful in our attempt. I plan to continue to look for methodology to complete this conversion. By putting the data into Unity3D, we may be able to achieve a visualization of the data that reveals something that plotting in programs such as R and Matlab does not.

Week 3

To start off, this week has been filled with great strides in terms of visualization in Matlab. Over the weekend, Brian re-edited the point cloud data for week 1 to remove the overlaying branch from a nearby tree that did not belong in the data set. After generating a new image with the improved data, the tree appears to be fully corrected; the visual accurately represents the physical structure of the tree when compared to pictures from that data collecting week. In regards to Unity3D, the reformatting of the data has proved to be much more difficult than initially expected. Other avenues are currently being sought out to further visualize the data for greater insight, namely ParaView and TIFFS. Both of these programs specialize in handling LIDAR data.

After a conference with my mentor and two of her colleagues to discuss several different programming topics, we decided that it would be beneficial for me to look into the Image Processing Toolbox application that is built into later versions of Matlab. After loading this app up, I discovered that I could import still pictures of my present 3D visualizations of the Sycamore and denoise the colors by minimally thresholding the hue, saturation, and value of each of the individual images.

Noise reduction techniques and application; image on far right is a visual of Week 0 data after HSV denoising.
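To make the HSV thresholding idea concrete, here is a minimal Python sketch. It is not the Matlab app used in this project, just an illustration of the same principle: convert each RGB color to hue, saturation, and value, and keep only the colors that fall inside chosen ranges. The function name and the threshold values are hypothetical and were picked only for the example.

```python
import colorsys

def hsv_keep_mask(colors_rgb, h_range=(0.0, 1.0), s_range=(0.0, 1.0), v_range=(0.0, 1.0)):
    """Return True for each 8-bit RGB triple whose H, S and V all fall inside the given ranges."""
    mask = []
    for r, g, b in colors_rgb:
        # colorsys works on floats in [0, 1] and returns h, s, v in [0, 1]
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        mask.append(
            h_range[0] <= h <= h_range[1]
            and s_range[0] <= s <= s_range[1]
            and v_range[0] <= v <= v_range[1]
        )
    return mask

# Hypothetical sample: leaf green, near-white glare, near-black shadow, bark brown
colors = [(34, 139, 34), (250, 250, 250), (5, 5, 5), (139, 90, 43)]

# Hypothetical thresholds: drop washed-out (low saturation) and very dark colors
mask = hsv_keep_mask(colors, s_range=(0.2, 1.0), v_range=(0.15, 1.0))
kept = [c for c, keep in zip(colors, mask) if keep]
```

With these assumed thresholds the glare and shadow colors are rejected while the green and brown survive. In practice the ranges would have to be tuned by eye, which is what the interactive app makes convenient.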

While removing a large portion of the noise from the image is a good start, I also thought it would be useful to write and edit scripts to obtain 3D RGB histograms to visualize the color distribution of each image. After successfully running the script for the first time, I found that there was a significant skew in the distribution towards black and white. Due to this skew, the rest of the color distribution was not accurately represented. To fix this, an additional script was required to remove the values corresponding to black and white from the multidimensional arrays. RGB values corresponding to black proved to be more difficult to remove than white, primarily because colors that are removed during the denoising process are assigned to an invisible black, so to speak. What this means is that the removed colors are set to black values in the data, but do not appear when the image is being visualized. I discovered this after noticing that the data matrix sizes were not changing after being denoised, despite the images looking significantly different.

This progress in noise reduction is substantial because it paves the way to move forward with volume calculations, the differentiation of tree components, and the analysis of leaf color change dynamics over the eight week period. The reason screenshots were used initially -- results shown in the first and second images from the left -- was for debugging purposes. Actual point cloud data is considerably more computationally heavy, and as such all tests will need to be rerun using the full data set. The results are presented below:

From left to right: a screenshot still of the virtually generated Sycamore with black and white, with black and white removed, and finally the real color distribution of the Sycamore point cloud data before denoising.

The script works by assigning each pixel to one of six histogram cells, computing the frequency of each RGB value, reshaping these frequencies into a 3D array for the histogram, compensating for any possible non-uniform cell lengths, emphasizing smaller frequency colors so they are not overlooked, and ultimately drawing the spheres into the histogram based on RGB frequency and color. An investigation into color thresholding may be the next step. This would involve converting the point cloud data to the uint8 data type and then using the rgb2ind function within Matlab. This function relies on indexing, minimum variance quantization and dithering. What this means is that the function calculates which n specified colors best represent the RGB data, and then manipulates the data based on variance analysis to contain just these color values.
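The binning step described above can be sketched in a few lines of Python. This is not the Matlab script itself. It assumes that "six histogram cells" means six equal-width cells per color channel (a 6x6x6 grid), and it skips pure black and pure white so that they do not dominate the counts. The function name and sample pixels are made up for illustration.

```python
from collections import Counter

def rgb_histogram(pixels, cells=6, drop_black_white=True):
    """Count 8-bit RGB pixels per (r_cell, g_cell, b_cell) histogram cell."""
    width = 256 / cells  # equal-width cells along each channel (an assumption)
    counts = Counter()
    for r, g, b in pixels:
        if drop_black_white and (r, g, b) in ((0, 0, 0), (255, 255, 255)):
            continue  # pure black/white would otherwise swamp the distribution
        cell = tuple(min(int(c // width), cells - 1) for c in (r, g, b))
        counts[cell] += 1
    return counts

# Hypothetical screenshot: mostly background, with a few leaf and bark pixels
pixels = (
    [(0, 0, 0)] * 50
    + [(255, 255, 255)] * 30
    + [(34, 139, 34)] * 5
    + [(139, 90, 43)] * 2
)

hist = rgb_histogram(pixels)
hist_with_background = rgb_histogram(pixels, drop_black_white=False)
```

Comparing `hist` with `hist_with_background` shows the skew described earlier: with the background included, 80 of the 87 pixels land in just two corner cells. Drawing the spheres would then be a matter of scaling each cell's marker by its count.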

On top of everything else, I have also finished configuring and customizing my Palmetto user account in Terminal! With this I have written multiple scripts for volume calculation that are compatible with this supercomputer via parallel programming and have run a few trials. Current results for the volume of the bare and full Sycamore are consistent with previous estimations for volume based on preliminary tests which were done using FARO software, namely a 17.39% drop in the volume of the tree from the first week to the last. This overall trend for volume calculations starting from week 0 and ending at week 8 is reasonable. These tests were done using a point resolution of 0.01. This means that the meshgrid being generated by our volume computation script was constructed by looking at the X, Y, and Z absolute extrema and dividing the empty grid between them using the specified resolution, or spacing. For example, if the minimum and maximum values for X were 0 and 3, respectively, and the resolution was set at 0.10, there would be 30 segments along the line. Decreasing the resolution value (that is, using a finer spacing) yields more precise volume results.
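The arithmetic in the example above is easy to check with a short Python sketch. The two helper functions are hypothetical names, and the volumes passed to `percent_change` are invented numbers chosen only to reproduce a 17.39% drop; they are not the measured volumes of the tree.

```python
def segment_count(lo, hi, spacing):
    """Number of grid segments between two extrema at the given spacing."""
    # round() guards against floating-point results like 29.999999999999996
    return round((hi - lo) / spacing)

def percent_change(first, last):
    """Signed percentage change from the first value to the last."""
    return (last - first) / first * 100.0

# The example from the text: X runs from 0 to 3 with a spacing of 0.10
segments = segment_count(0.0, 3.0, 0.10)

# The same axis at the finer spacings discussed in this post
segments_fine = segment_count(0.0, 3.0, 0.01)
segments_finest = segment_count(0.0, 3.0, 0.001)

# Hypothetical volumes, chosen only to illustrate a 17.39% drop
change = percent_change(100.0, 82.61)
```

Each tenfold reduction in spacing multiplies the number of segments along every axis by ten, which is why finer resolutions get expensive so quickly.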

The next step is to run these tests at a point resolution of 0.001, thereby scaling our volume calculation down to millimeters^3 to achieve a greater degree of precision. Decreasing the point resolution significantly increases the time it takes for the script to run, so in the future I will be using batch jobs as opposed to interactive sessions. Batch jobs allow a cluster user to submit jobs to be executed without client-side presence. It is important to consider that these trial runs are being completed using the original, raw data. Once I finish fully developing my noise reduction methodology, I will proceed with using the cleaned, denoised data.

Week 4

If this week can be characterized by anything it would have to be trial and error, particularly in trying out new programs to accomplish our objectives. At the same time, a good deal of progress has been made using Matlab, R, ParaView, and Unity3D. Our VisREU cohort was also given a couple of supplemental assignments such as an elevator pitch. The abstract for this project has been written, finalized, and then approved by Dr. Song. Lastly, our midterm presentations (open to the campus) will be held on Wednesday, July 1st in the Digital Resources Laboratory. For halfway through the program, I am very happy with where I am at so far! Let's get right into this.

In terms of R, we decided that it would be best to graph the plots without any axes or labels since they are mostly irrelevant. The only exception to this is that they were helpful in figuring out that removing the branches and tree trunk would not be a simple task; the z axis on every week is entirely different. My first shot at removing everything but the leaves of the tree -- in order to analyze leaf color change dynamics over time -- was to check for identical x, y, z, and RGB values between week data sets and then remove them from the later of the two weeks being considered. While the interval of z values for one week may be [0, 25], it might be [-75, -50] for the next. With this being the case, the first script was unsuccessful. I then tried checking a couple of different variations for removal criteria. Essentially the end result was a very large reduction in data set size while maintaining an almost perfect mirror 3D plot between the reduced data set and the original data set. In one case, the data was reduced by 56.19%. The result for that test is reproduced below:

Pictured is week 0. Without a careful eye, it is difficult to tell the difference despite a more than fifty percent reduction in the number of data points.
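The first removal attempt described above amounts to a set difference between two weeks of data. The Python sketch below is an illustration rather than the R script that was actually used; the function names and the tiny sample rows are hypothetical. It also shows why the attempt fails when the z axis is offset between weeks: exact matching finds nothing in common.

```python
def remove_shared_points(earlier, later):
    """Drop rows from `later` that also appear, value for value, in `earlier`.

    Each row is a tuple (x, y, z, r, g, b).
    """
    seen = set(earlier)
    return [row for row in later if row not in seen]

def reduction_percent(before, after):
    """Percentage of rows removed."""
    return (before - after) / before * 100.0

# Hypothetical rows for two consecutive weeks
week_a = [(1.0, 2.0, 3.0, 10, 20, 30), (4.0, 5.0, 6.0, 40, 50, 60)]
week_b = [(1.0, 2.0, 3.0, 10, 20, 30), (7.0, 8.0, 9.0, 70, 80, 90)]

remaining = remove_shared_points(week_a, week_b)
reduction = reduction_percent(len(week_b), len(remaining))

# Same data, but with the later week's z values offset by 50 units
shifted = [(x, y, z - 50.0, r, g, b) for x, y, z, r, g, b in week_b]
remaining_shifted = remove_shared_points(week_a, shifted)
```

In the shifted case no rows are removed at all, so any matching approach would first need the weeks to be aligned to a common coordinate frame.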

While this is interesting, it is not what we were aiming to do. From these trials it became apparent that I would need to figure out a way to categorize the RGB values of the data, find those that correspond to the trunk and branches, and delete them. Luckily, as was mentioned last week, Matlab has the built-in function rgb2ind. I am currently working on a script that can make the point cloud data compatible with this function. So far, the results look very promising for simple screenshots of the Sycamore. It is important to remember that our primary objective in removing the trunk and branches is to analyze the color change of the leaves over time. Although reducing the color set significantly seems counterproductive, it is always important to figure out how to do something and then refine the methodology afterwards. That being said, once the branch and trunk RGB ranges have been identified, ideally the original color distribution can be restored. Additionally, currently only about 65 of the thousands of colors present in the data are large enough to be discernible by the 3D color frequency histogram script, with about half of these being medium to large sized; reducing the number of major colors from 30 to 20-25 does not significantly reduce the quality of the data and doing this helps identify the RGB values we need to move forward in the project. The rgb2ind function reassigns the color frequencies to n bins based on minimum variance quantization.
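To give a feel for what variance-based color quantization does, here is a simplified Python sketch. It is not Matlab's rgb2ind implementation, whose internals are not reproduced here. It is a stand-in that repeatedly splits the group of colors with the largest single-channel variance at that channel's mean, until n groups exist, and then represents each group by its average color. All names and sample colors are hypothetical.

```python
from statistics import fmean, pvariance

def quantize(colors, n):
    """Split colors into at most n groups; return (palette, groups)."""
    groups = [list(colors)]
    while len(groups) < n:
        best = None  # (variance, group index, channel) of the best split
        for i, group in enumerate(groups):
            if len(group) < 2:
                continue
            variances = [pvariance([c[ch] for c in group]) for ch in range(3)]
            ch = max(range(3), key=lambda k: variances[k])
            if variances[ch] > 0 and (best is None or variances[ch] > best[0]):
                best = (variances[ch], i, ch)
        if best is None:
            break  # every remaining group holds a single distinct color
        _, i, ch = best
        group = groups.pop(i)
        cut = fmean([c[ch] for c in group])
        groups.append([c for c in group if c[ch] <= cut])
        groups.append([c for c in group if c[ch] > cut])
    palette = [
        tuple(round(fmean([c[ch] for c in group])) for ch in range(3))
        for group in groups
    ]
    return palette, groups

# Hypothetical colors: two leaf greens and two bark browns
colors = [(30, 140, 30), (40, 150, 40), (140, 90, 40), (150, 100, 50)]
palette, groups = quantize(colors, 2)
```

With n = 2 the greens and browns end up in separate groups, which is the behavior that makes this kind of quantization attractive for isolating trunk and branch colors. Deleting a whole group then removes every point assigned to it.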

Actually knowing what is happening when denoising the point cloud data has proved to be a bit of a problem because desirable parts of the data were accidentally being removed/assigned to the invisible black. To address this, I decided it would be very useful to reformat the RGB values of the point cloud data into a color map. I developed two different ways to do this based on perceptual preference, although the data is still being changed the same way in both color map scripts. The only problem with this was that it reorganized the data completely in order of increasing RGB value, which made it necessary to write a script for changing the denoised point cloud data back into a 3D virtual representation of the tree; this has now been achieved. The script involves changing the format of the denoised data set completely, duplicating the original data set, and setting removal criteria for the duplicate data set based on commonalities between specific parts of the original data and the denoised data.

Before and after denoising. Both color maps display the same data, but the layout is perceptually different. Using both displays is helpful to avoid removal of valuable data that does not need to be denoised.

As far as the volume calculations go, it seems that the jump from 0.01 point resolution to 0.001 requires vastly greater amounts of memory and computing ability even with Palmetto as a resource. Runs with memory allocations of 120GB, 505GB, and 1.9TB have all failed. After talking with my mentor about this, we decided it may be best to try it at 0.005 resolution instead and, if this does not work after a few tries, just keep the resolution at 0.01. Once we get this as well as non-leaf tree section removal figured out, we can compare volume changes over time for both the full tree and just the parts. As far as everything else goes, I have done quite a bit of optimization for all of my current scripts so that some of them will even run within just a few minutes without Palmetto.
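A rough back-of-the-envelope estimate helps explain why the finer resolution is so demanding. The Python sketch below assumes that a dense three-dimensional grid of 8-byte values is held in memory, and it uses a made-up bounding box of 10 x 10 x 25 units. Neither assumption comes from the actual scripts or data, so the absolute numbers are illustrative only. The scaling is the point.

```python
def grid_memory_gb(extent, spacing, bytes_per_value=8):
    """Approximate memory, in GB, of a dense grid covering `extent`.

    extent is a list of (min, max) pairs, one per axis.
    """
    nodes = 1
    for lo, hi in extent:
        nodes *= round((hi - lo) / spacing) + 1  # grid nodes along this axis
    return nodes * bytes_per_value / 1e9

# Hypothetical bounding box for the tree (not the measured extents)
extent = [(0.0, 10.0), (0.0, 10.0), (0.0, 25.0)]

coarse = grid_memory_gb(extent, 0.01)
middle = grid_memory_gb(extent, 0.005)
fine = grid_memory_gb(extent, 0.001)
ratio = fine / coarse
```

Under these assumptions a tenfold finer spacing needs roughly a thousand times more memory, because the node count grows along all three axes at once. The intermediate 0.005 spacing costs only about eight times as much as 0.01.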

I have played around with ParaView, MeshLab, and Unity3D and have gotten some leads on further visualizations of the point cloud. In ParaView, we were able to convert the table into points on a 3D plane, but importing the color has proved difficult because doing so requires a script and I am not very familiar with the programming languages ParaView accepts. That being said, we do have a preliminary script currently and I will try to edit it for compatibility. MeshLab also requires a script to colorize, but even so we got some interesting results, which are displayed below. This was done via Poisson-disk sampling, base mesh subsampling, computing the normals of the point set, applying these results to the cloud, and then reconstructing the surface. Further methodology to achieve a better visualization will be looked into. The results were exportable into .obj, which is accepted by Unity3D, although some of the quality was unfortunately lost during the transfer. Still, it is nice to finally have data readable by this visualization tool! Other than that, right now I have a pretty clear idea of where to go and what I need to do to finish this project. I am very excited about what can be accomplished moving into the second half of the REU.

Week 5

It's wild to believe that there are just a few weeks left of the REU -- the summer is flying by. With this being said, this week we had our midterm evaluations! I am extremely delighted to say that I have been selected by the judges at our presentations on Tuesday to receive funding to attend the XSEDE conference in St. Louis, Missouri from July 26th through July 30th. It is an amazing opportunity and I am looking forward to being able to present my research there. Thank you to everyone who came! I would like to extend congratulations to the entire cohort on how far everyone has gotten in their research in such a short period of time. We will all be going to an REU Mini-conference at the College of Charleston next weekend, where we will each present our work in a "lightning talk."

Since our presentations were this week, Dr. Bo Song and her colleague Brian Williams decided to come up and show support by driving over five hours to be at my presentation. With them they brought a sizeable amount of data, namely the raw point cloud scans of the environment around the three trees. Additionally, they were able to download the Scene editing software onto my assigned Clemson University desktop computer. Besides this, their presence on campus provided a chance for me to give them both a comprehensive overview, in person, of everything that I have done so far in the program. While we have been meeting regularly over the course of the REU via Skype, being face-to-face definitely had its advantages. Brian, Bo and I also got to sit down and go through all of the data and do cross comparisons between the point cloud visualizations in Scene versus what I have been able to do in R, Matlab, ParaView, and Unity3D. Finally, objectives were established with respect to time left in the program.

Although this week was jam-packed between doing elevator pitches, meeting with my mentor and her colleague in person, and prepping for my presentation, I did get a chance to work on my research a bit. What I was able to accomplish was taking the denoised point cloud data and making it run faster from 12 hours down to about 5 minutes without Palmetto. I also was able to do some test runs using rgb2ind. Due to the time constraint I had that was associated with midterms, I have not yet attempted to remove any of the color bins in order to differentiate between the various components of the tree. This will be worked on at the start of week 6 and hopefully I will have this objective completed by this upcoming Wednesday. Results for this are displayed below:

The number of colors that the data has been reduced to in the above .gif is specified by n. This number is verified by converting the RGB data to grayscale and removing the hue and saturation while preserving the value (brightness). The resultant chart on the right shows the color bins, each of which spans a single 1x3 RGB correspondent bin. Beyond this, I further optimized all other scripts and also wrote a good deal of comments into them so that this methodology can be used and understood by my mentor after the conclusion of the program. I am very excited moving forward in this program and am planning to submit a paper of my work to the REUNS 2015 conference, which will be held in Dallas, Texas in October. Although the time left for this REU is short, I think that I will be able to accomplish all of my mentor's outset objectives. I am very content with how much work I have done for this REU so far.

Before and after denoising of week 0 point cloud data.

Week 6

Thanks to our CIO Mr. Jim Bottum, everyone will be attending XSEDE at the end of this month! We had the opportunity to meet with him during a round table-style meeting on Tuesday and he graciously offered funding for all those who did not currently have any way to attend the conference. Mr. Bottum is a very interesting man and it was nice to hear some of his insights about how his field has progressed since he first entered it. All in all, it was a great start to the sixth week of this program. It feels like the summer is flying by.

As far as progress in my research goes, I have primarily been working on the second objective of this project, which is to differentiate the components of the tree and calculate their respective volumes. So far, early tests using the rgb2ind function on screenshots of the point cloud visualizations have proved to be somewhat successful. I am using screenshots for initial tests because they are easier and quicker to work with than the full point cloud data and we are still in the testing stage for this objective.

Early testing of the rgb2ind function. The black between the trees is due to the fact we are working with a screenshot rather than the point cloud data.

The issues that are currently being encountered are as follows: removal of too much data, and failure to remove the fringe areas of the branches and trunk. Since the rgb2ind function works by reducing the number of colors by rebinning those that are close to each other in regards to RGB value, it will assign some of the noisy trunk data to some variation of green. This problem occurs even at n≥500, where n is the number of colors the data is being restricted to. So far, I have found that n≈60 is sufficient for removing the majority of non-leaf tree components. In regards to volume, it has been discovered through visualizing the Matlab computation results that denoising the data before doing any calculations may be necessary.

The volume of the tree should be decreasing continuously over time since the tree is only shedding leaves, and thereby only decreasing in volume, during the transitionary period from fall into winter.

The next step in my research is to begin working with the denoised point cloud data because the denoised data contains a more refined color set than the original, noisy data. The results from this should be interesting and push us closer to accomplishing our objective. Being that there are only two weeks left of this REU, I will be focusing on optimizing the RGB removal techniques and methodology and working on my paper submission to the REUNS 2015 conference. I am also working with a little bit of statistics in order to elucidate how effective the methodology that I have been establishing actually is.

In other news, the cohort took a road trip to the beautiful city of Charleston this weekend. We were hosted by the College of Charleston and were invited to give two-minute lightning talks to other REU students from around the nation. Although it was very hard to condense all of the material that has been covered in this blog into such a short time frame, part of the program included breakout sessions where audience members got the chance to spend a few minutes discussing the presenter's work in a one-on-one setting. It was nice being exposed to so many very different research projects. Other than more fully realizing that computer science can be applied to just about any discipline or idea, I also got to learn a little bit about the concept of "deep learning." Deep learning uses a set of algorithms in an effort to model complex abstractions in data sets by using models based on multiple non-linear transformations.

Week 7

This penultimate week may very well have been the most stressful and work-packed for me so far. For most of it, I was working on my REUNS paper submission. I submitted it on Friday, July 17 -- our fingers are crossed for having it accepted for publication! On the research side of things, I have been working heavily on optimizing the methodology laid out to accomplish our objectives. Additionally, I had the wonderful chance to meet with Columbia University's Dr. Bernice Rogowitz, who gave me some really solid insights in regards to ground truthing my findings and visual considerations to have moving forward into the home stretch of this REU.

First and foremost, the L*a*b* (Lab) color space has been investigated for denoising via Matlab's Color Thresholder application as well as component differentiation. Lab thresholding offers a three-dimensional color space in which perceived color differences correspond to colorimetrically measured distances. L* represents the lightness of the image whereas a* and b* represent opposite color dimensions; a* measures from green (-a) to red (+a) and b* measures from blue (-b) to yellow (+b).

A conceptual visualization of the L*a*b* color space and how a typical denoising screen using this space in the Color Thresholder looks to the user.
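For anyone curious how an RGB color ends up in L*a*b*, here is a minimal Python sketch of the standard conversion from 8-bit sRGB through CIE XYZ to CIELAB. It assumes a D65 white point and is not the code behind Matlab's Color Thresholder; the function name is hypothetical.

```python
def srgb_to_lab(r, g, b):
    """Convert an 8-bit sRGB color to CIELAB (L*, a*, b*), assuming a D65 white point."""

    def linear(c):
        # undo the sRGB gamma curve
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linear(r), linear(g), linear(b)

    # linear sRGB to CIE XYZ
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t):
        d = 6 / 29
        return t ** (1 / 3) if t > d ** 3 else t / (3 * d * d) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

white = srgb_to_lab(255, 255, 255)
green = srgb_to_lab(0, 255, 0)
red = srgb_to_lab(255, 0, 0)
blue = srgb_to_lab(0, 0, 255)
yellow = srgb_to_lab(255, 255, 0)
```

The signs behave as described above: green has a negative a*, red a positive a*, blue a negative b*, and yellow a positive b*. A threshold such as "keep points with a* below some cutoff" is one hypothetical way to favor leaf-like colors, though the cutoff would need to be chosen from the data.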

The Lab color space has the advantage over rgb2ind of maintaining the full color distribution, and although the HSV color space might feel more familiar, studies have shown that the Lab color space is more effective than HSV in removing target color ranges. This is also discussed a bit in the 1969 publication A grammar of color. At the same time, Dr. Rogowitz brought up that perhaps the most important thing to do is to make sure that the denoising process is as objective as possible. While this is partly achievable by exporting the thresholds specified by the user into a function to be applied to the point cloud data and more precisely editing the values, it is also necessary to look at the 3D RGB histogram generations of the color distribution both before and after denoising. Doing this cross comparison led to the discovery that the data appeared to be non-normal, despite its very large size. This suspicion was confirmed by converting the image into grayscale and analyzing the histogram of the conversion.

3D RGB histograms show the color distribution to be non-normal. Grayscale conversion analysis confirms this suspicion.
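One simple numerical companion to eyeballing a grayscale histogram is its skewness, which is zero for a perfectly symmetric distribution such as the normal. The Python sketch below is an illustration and not the analysis that was run for this project. It uses the common luminance weights 0.2989, 0.5870 and 0.1140 for the grayscale conversion, and the sample pixels are invented.

```python
from statistics import fmean

def to_gray(rgb):
    """Weighted luminance of an 8-bit RGB triple."""
    r, g, b = rgb
    return 0.2989 * r + 0.5870 * g + 0.1140 * b

def skewness(values):
    """Sample skewness m3 / m2**1.5; requires at least two distinct values."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m3 = fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Hypothetical image: mostly dark foliage with a few glare pixels
pixels = [(20, 40, 20)] * 8 + [(240, 240, 240)] * 2
gray = [to_gray(p) for p in pixels]

skew_image = skewness(gray)
skew_symmetric = skewness([1.0, 2.0, 3.0])
```

A strongly positive value like the one produced by this made-up sample indicates a long bright tail, which is the kind of asymmetry a normal distribution would not show. A histogram plot remains the more informative check, since skewness alone cannot detect every departure from normality.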

While this lends support to the notion that the point cloud does need to be denoised to account for the environmental variability when collecting the data, it does not necessarily prove that we are removing the correct color ranges. It seems obvious to remove colors like white and red from a tree with green leaves and a brown trunk, but in research it is important to ground truth anything that is included as a result. Dr. Rogowitz suggested looking at one of the photographs taken by the scanner and observing what the color distribution of the sun rays is. Luckily, one of the photographs that was taken featured veil illumination, in which half of the image was hit by a sun glare. The results from this analysis support our color removal methodology; the majority of the distribution fell into the regions we have been removing up until this point.

Right: image used displaying veil illumination. The black box contains the image on the left, which was the segment we cropped down to in order to ground truth our denoising process.

I had the chance to run a few more tests in Matlab regarding the volumetric estimations of each of our weeks of data. The griddata function that is the basis of our volume script uses the results from meshgrid generation to interpolate the volume using Delaunay triangulation. Basically, Delaunay triangulation works by connecting the points of the cloud into triangles, chosen so that no point lies inside the circumcircle of any triangle. This choice keeps the smallest interior angles as large as possible, but it also means that the mesh can span large holes wherever the data is sparse. The triangulation is not always unique, for example when four points lie on a common circle. What does this mean for us? Well, when calculating the volume using the griddata function, we are relying on having fullness of the data, i.e. very good data for both the crown and the trunk. To put that reliance into context, the crown of our tree's data is very spotty, which was why the mesh application in MeshLab and Unity3D failed to render like we would have wanted it to back in the Week 4 update.
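The defining property of a Delaunay triangulation is that no data point falls strictly inside the circumcircle of any triangle. The Python sketch below checks that property for a given set of 2D triangles using the standard determinant test. It does not build a triangulation and is not what griddata does internally; the function names and the four sample points are hypothetical.

```python
def orientation(a, b, c):
    """Positive if a, b, c are counterclockwise, negative if clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circle through a, b and c."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (
        a1 * (b2 * c3 - b3 * c2)
        - a2 * (b1 * c3 - b3 * c1)
        + a3 * (b1 * c2 - b2 * c1)
    )
    # the sign of det depends on the triangle's winding, so normalize it
    return det * orientation(a, b, c) > 0

def is_delaunay(points, triangles):
    """Check the empty-circumcircle property for triangles given as index triples."""
    for i, j, k in triangles:
        for m, p in enumerate(points):
            if m in (i, j, k):
                continue
            if in_circumcircle(points[i], points[j], points[k], p):
                return False
    return True

# Four hypothetical points forming a flat kite
points = [(0.0, 0.0), (4.0, 0.0), (2.0, 1.0), (2.0, -1.0)]

long_diagonal = [(0, 1, 2), (0, 1, 3)]   # two thin triangles sharing the long edge
short_diagonal = [(0, 3, 2), (3, 1, 2)]  # two fatter triangles sharing the short edge

long_ok = is_delaunay(points, long_diagonal)
short_ok = is_delaunay(points, short_diagonal)
```

Only the second triangulation passes, which illustrates how the criterion favors well-shaped triangles over thin slivers. It says nothing about whether the resulting surface is a faithful model when points are missing, which is the concern with the sparse crown data.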

Since our crown data is spotty, the volume can be said to be "in the ballpark" so to speak, but not necessarily exact. The triangulation method becomes more and more problematic the lower the quality of the data, such as during weeks 7 and 8, when the overcast sky combined with the peeled bark significantly impaired the scanner's ability to recognize the points. This variability in data quality causes volume calculations to not show the trends that we would expect as the tree loses its leaves, namely steady and continual decreases in volume from week 0 to week 8. To better understand this, visualizations were created within Matlab. In the making of these visualizations, we actually discovered an outlier in the week 8 data! After removing it, the volume decreased significantly to a more reasonable number. Still, it is being overestimated.

Left: 3D visualization of Delaunay triangulation and outlier detection.
Middle: 2D visualization of Delaunay triangulation coordinate location.
Right: Delaunay triangulation visualization for week 8 data after outlier removal. Overestimation of volume can be seen, since the tree is bare by this week.
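We found the week 8 outlier by eye in the visualizations above. As a hedged illustration of how a similar check could be automated, the Python sketch below flags points whose Z value is far from the median, using the median absolute deviation (MAD). This is an assumed alternative, not the procedure we actually used, and the function name `split_outliers`, the threshold of 5 and the toy points are all made up.

```python
from statistics import median


def split_outliers(points, axis=2, threshold=5.0):
    """Split points into (kept, removed) using a robust z-score on one axis."""
    values = [p[axis] for p in points]
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread to measure against, so nothing is flagged.
        return list(points), []
    kept, removed = [], []
    for p in points:
        # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
        score = 0.6745 * abs(p[axis] - med) / mad
        (removed if score > threshold else kept).append(p)
    return kept, removed


# Toy data: five points near Z = 1 and one stray point at Z = 50.
toy = [(0, 0, 1.0), (1, 0, 1.2), (0, 1, 0.9), (1, 1, 1.1), (2, 1, 1.0), (2, 2, 50.0)]
kept, removed = split_outliers(toy)
print(removed)  # [(2, 2, 50.0)]
```

Since a single stray point can stretch the triangulated surface a long way, removing it before the volume calculation can change the result a lot, which matches what we saw for week 8.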

For the last week of this program I will be working on my XSEDE poster, preparing for my final presentation, figuring out how to accurately calculate volume via nearest neighbor Delaunay triangulation, and enjoying my remaining time in Clemson. It's crazy to think that in a week we will all be in St. Louis and in two weeks I will be back in New York.

Week 8

Wow, where did the summer go? It is hard to believe that the cohort arrived at Clemson over eight weeks ago and we are already heading out to St. Louis to present our respective projects at the XSEDE 2015 supercomputing conference. While this week I primarily worked on getting my poster finalized for printing and putting together my final presentation, I did also manage to tie up some of the loose ends on my project. These include finally figuring out the volumetric computation algorithm, running final tests on Palmetto for volume projections, and perhaps getting rid of the tree trunk once and for all. Additionally, I am very happy to announce that both my paper submission and that of my fellow cohort member Claudia Salazhar were accepted for publication by the REUNS 2015 conference! We are both very excited for this opportunity.

The method that we have decided on for the Delaunay triangulation algorithm is natural neighbor. In the previous trial runs of the data, we had been using nearest neighbor. The nearest neighbor method assumes that the data has relatively uniform spacing between points, and it does not require large amounts of computing power at lower resolutions. For example, when the resolution is set at 0.10, it can run on my personal computer without Palmetto supercomputing resources in about 80 seconds for the largest data set. When using the nearest neighbor method, the Z value of each grid node is simply the Z value of the original data point nearest to that grid node. The nearest neighbor to a grid node is found using a simple separation distance without taking anisotropy into account. In other words, this method is isotropic, and it often leads to overestimations of volume. If two or more points tie as the nearest neighbor to a grid node, the tied data points are sorted on X, then Y, and then Z values. The smallest of the tied data points in that order is selected by the algorithm as the nearest neighbor.
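The nearest neighbor rule and its tie-break are simple enough to sketch in a few lines. The Python below is an illustration under assumptions, not our Matlab volume script: it takes each grid node's Z from the closest data point in the XY plane, breaks ties by sorting on X, then Y, then Z, and estimates volume as the sum of node heights times the cell area. The names `nearest_neighbor_grid` and `grid_volume` are made up.

```python
def nearest_neighbor_grid(points, x_nodes, y_nodes):
    """Return rows of Z values, one per grid node, taken from the nearest point."""
    # Sorting (x, y, z) tuples orders them by X, then Y, then Z. Because min()
    # returns the first minimum it meets, ties go to the smallest sorted point.
    ordered = sorted(points)
    grid = []
    for gy in y_nodes:
        row = []
        for gx in x_nodes:
            nearest = min(ordered, key=lambda p: (p[0] - gx) ** 2 + (p[1] - gy) ** 2)
            row.append(nearest[2])
        grid.append(row)
    return grid


def grid_volume(grid, resolution):
    """Treat every node as a square column of side `resolution` and height Z."""
    return sum(z for row in grid for z in row) * resolution ** 2


# Toy example: the middle node is equally close to both points, so the
# tie-break picks (0, 0, 1.0), the smaller of the two in sorted order.
toy_points = [(2, 0, 3.0), (0, 0, 1.0)]
toy_grid = nearest_neighbor_grid(toy_points, x_nodes=[0, 1, 2], y_nodes=[0])
print(toy_grid)  # [[1.0, 1.0, 3.0]]
print(grid_volume(toy_grid, resolution=1))  # 5.0
```

Every node gets a value no matter how far away the nearest point is, which is one way to see why this method can overestimate volume where the data is sparse.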

On the other hand, the natural neighbor gridding method is commonly applied to data sets that have dense data in some areas but sparse data in others. The natural neighbor algorithm estimates the grid node value by finding the closest subset of input data points to a grid node and then applying a weight to each based on proportionate area, that is, how much of that point's surrounding region the grid node would take over. Because the weights adapt to how the data is distributed around the node, it serves as an anisotropic interpolation method for volumetric computation. The downside is that it is very CPU intensive, requiring Palmetto resources at 0.10 resolution and still taking 30+ minutes for a single run. The natural neighbor method does not extrapolate the Z grid values beyond the range of data and it does not generate nodes in areas without data, which is why this method is popular for data with some sparsity. Below is a comparison of the two methods. Natural neighbor triangulation offers more flexibility for the weeks in which the tree is bare (weeks 7 and 8), since the data is sparser in these weeks than in the others. Since the leaves had shed by this point in the data collection period, the volume should have decreased rather than increased, as was the case when using the nearest neighbor method.
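For the curious, here is a rough Python sketch of how natural neighbor weights can be approximated on a raster. It is a discrete stand-in for the usual (Sibson) formulation, in which each neighbor's weight is the share of area the grid node would take from that neighbor's Voronoi cell, and it is not how Matlab's griddata computes them. The function name `discrete_natural_neighbor` and the integer raster coordinates are assumptions made to keep the example short.

```python
def discrete_natural_neighbor(points, query, width, height):
    """Estimate Z at `query` from (col, row, z) points on a width x height raster."""
    qx, qy = query
    if not (0 <= qx < width and 0 <= qy < height):
        raise ValueError("query must lie inside the raster")
    for x, y, z in points:
        if (x, y) == (qx, qy):
            return z  # the query sits exactly on a data point
    stolen = [0] * len(points)
    for row in range(height):
        for col in range(width):
            # Squared distance from this raster cell to every data point.
            dists = [(x - col) ** 2 + (y - row) ** 2 for x, y, _ in points]
            owner = dists.index(min(dists))  # discrete Voronoi owner of the cell
            # The cell is "stolen" if it is closer to the query than to its owner.
            if (qx - col) ** 2 + (qy - row) ** 2 < dists[owner]:
                stolen[owner] += 1
    total = sum(stolen)
    # Weighted average, with weights proportional to the number of stolen cells.
    return sum(n * p[2] for n, p in zip(stolen, points)) / total


corners = [(0, 0, 0.0), (10, 0, 10.0), (0, 10, 20.0), (10, 10, 30.0)]
estimate = discrete_natural_neighbor(corners, query=(3, 7), width=11, height=11)
print(0.0 <= estimate <= 30.0)  # True
```

Because the result is a weighted average of nearby data values, it always stays within their range, unlike an extrapolating method. Visiting every raster cell for every grid node also hints at why this approach costs so much more CPU time than nearest neighbor.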

Comparison between nearest neighbor and natural neighbor Delaunay triangulation.

The trend is still not exactly as it should be, and this is due to human error while editing the original point cloud taken by the laser scanner down to just the Sycamore in question. In the data for weeks three through six, visualizations showed that an extraneous branch is still present. Removing these branches will give more accurate volume projections for the weeks in question by reducing overestimation, and may also correct the trend pictured above. For objectivity, the same crop should be saved within SCENE 3D and applied to each week of data.

Pictured are two examples of why the volume trend is not monotonic. Left: Week 3 extraneous branch. Right: Week 4 extraneous branch. Weeks 5 and 6 feature similar errors caused by the complex editing process of the original point cloud.
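The two ideas here, applying one fixed crop to every week and checking whether the weekly volumes actually decrease, can be sketched in Python as follows. This is only an illustration: in practice the crop would be saved in SCENE 3D, the function names `crop_to_box` and `weeks_that_increase` are made up, and the numbers below are toy values, not our measurements.

```python
def crop_to_box(points, box):
    """Keep only (x, y, z) points inside one fixed axis-aligned box."""
    (xmin, xmax), (ymin, ymax), (zmin, zmax) = box
    return [
        p for p in points
        if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax and zmin <= p[2] <= zmax
    ]


def weeks_that_increase(volumes):
    """Return the indices of weeks whose volume is larger than the week before."""
    return [i + 1 for i in range(len(volumes) - 1) if volumes[i + 1] > volumes[i]]


# Hypothetical crop box and point cloud: the last point plays the stray branch.
box = ((-5, 5), (-5, 5), (0, 20))
toy_cloud = [(0, 0, 3), (1, -2, 10), (4, 4, 19), (9, 0, 6)]
print(crop_to_box(toy_cloud, box))  # [(0, 0, 3), (1, -2, 10), (4, 4, 19)]

# Hypothetical weekly volumes: index 2 breaks the expected downward trend.
toy_volumes = [10.0, 9.0, 9.5, 8.0]
print(weeks_that_increase(toy_volumes))  # [2]
```

Using the same box for every week takes the hand-editing step out of the comparison, so any week that still shows up in the second check points at the data rather than at the crop.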

Although time constraints did not allow for its implementation, I was playing around with another method to differentiate the components of the tree. It focuses on generating a 3D RGB histogram and removing colors based on the plotted spheres: the histogram is rotated so that just the Cartesian plane for red and green values is visible, and entire columns of colors are then removed. While it may not be as exact as the L*a*b* color space thresholding method, we will not know unless we try. Additionally, it may work better as a supplement to the Color Thresholder application than as a primary method of differentiation. Still, this is left for future work.

Experimental approach to component differentiation.
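Since this idea was never implemented, the Python below is only a guess at what it might look like. It bins each point's color on the red-green plane (ignoring blue, as when the 3D histogram is rotated to look straight down the blue axis) and drops every point that falls in a chosen column. The names `rg_histogram` and `remove_rg_columns`, the bin size of 32 and the example colors are all assumptions.

```python
from collections import Counter


def rg_histogram(points, bin_size=32):
    """Count (x, y, z, r, g, b) points per (red bin, green bin), ignoring blue."""
    return Counter((p[3] // bin_size, p[4] // bin_size) for p in points)


def remove_rg_columns(points, columns, bin_size=32):
    """Drop every point whose (red bin, green bin) is in `columns`."""
    return [
        p for p in points
        if (p[3] // bin_size, p[4] // bin_size) not in columns
    ]


# Toy points: two greenish "leaf" colors that differ only in blue, one brown.
toy_points = [
    (0.0, 0.0, 5.0, 40, 200, 30),
    (0.1, 0.0, 5.1, 50, 210, 200),
    (0.0, 0.0, 1.0, 120, 80, 40),
]
print(rg_histogram(toy_points))  # Counter({(1, 6): 2, (3, 2): 1})
print(remove_rg_columns(toy_points, columns={(1, 6)}))  # [(0.0, 0.0, 1.0, 120, 80, 40)]
```

Removing a whole column throws away every shade of blue that shares the same red and green bin, which is why it is likely to be coarser than thresholding in L*a*b* space.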

Other than all of this, our last week has mostly been filled with putting together our final presentations, putting the finishing touches on our posters, and preparing for the XSEDE 2015 conference in St. Louis. We also got to speak to high school students about our research, which was a great opportunity to refine content down to a more widely understandable level.

Overall, this REU has been an incredible experience. I have learned so much while I have been here and met some truly great people. I wish that the summer were not over quite yet, but all good things eventually end. I would like to thank my mentor Dr. Bo Song, my VizMentor Dr. Vetria Byrd, and all of the faculty over at the CCIT department for believing in both me and my cohort throughout this summer. Everyone has been unbelievably supportive and helped make Clemson really feel like a home away from home.

In My Own Words

My VisREU Experience

My XSEDE 2015 Experience

Project Summary

Final Presentation

XSEDE Poster

Last updated: 08/02/2015