High-Performance Computing Video Erosion/Dilation and Movement Detection with C++
Carter L. Woetzel
High Performance Computing Final Project
Project 9 of High-Performance Computing was centered around parallelizing code that performs a series of actions on a set of 800 images that make up a short video. Parallelization involved using Bethel University's cluster of nodes dedicated to computation. To accomplish this, MPI (Message Passing Interface) was used to communicate arrays, variables, matrices, and preprocessing directives to each of the nodes. The Bethel cluster is a MIMD (Multiple Instruction, Multiple Data) network cluster, which allows true parallelization of complex computations. The program creates one-dimensional arrays to hold the pixel-value difference between a foreground image and a background image at each pixel. After this, a threshold check is performed, marking pixels that fall above or below the threshold. Next, erosion is applied to remove insignificant white pixels from the threshold array, an insignificant pixel being any 255 (white) pixel that touches a black pixel. Finally, dilation is applied after erosion has reduced the image noise. The resulting dilated image is a binary image, containing only pixels that are either 0 or 255.
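The erosion and dilation passes described above can be sketched as follows. This is a minimal single-node illustration, not the project's actual code; it assumes images are stored as flat rows*cols arrays of 0/255 values with 8-connected neighborhoods, and the function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Illustrative sketch (not the project's code): images are flat
// rows*cols arrays of 0 (black) and 255 (white), 8-connected.

// Erosion: any white pixel touching a black pixel is set to black.
std::vector<int> erode(const std::vector<int>& img, int rows, int cols) {
    std::vector<int> out(img);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            if (img[r * cols + c] != 255) continue;
            for (int dr = -1; dr <= 1; ++dr)
                for (int dc = -1; dc <= 1; ++dc) {
                    int nr = r + dr, nc = c + dc;
                    if (nr >= 0 && nr < rows && nc >= 0 && nc < cols &&
                        img[nr * cols + nc] == 0)
                        out[r * cols + c] = 0;  // insignificant white pixel
                }
        }
    return out;
}

// Dilation: every neighbor of a surviving white pixel is set to white.
std::vector<int> dilate(const std::vector<int>& img, int rows, int cols) {
    std::vector<int> out(img);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            if (img[r * cols + c] != 255) continue;
            for (int dr = -1; dr <= 1; ++dr)
                for (int dc = -1; dc <= 1; ++dc) {
                    int nr = r + dr, nc = c + dc;
                    if (nr >= 0 && nr < rows && nc >= 0 && nc < cols)
                        out[nr * cols + nc] = 255;
                }
        }
    return out;
}
```

Note that a lone white pixel is destroyed by erosion (it always touches black), which is exactly why this stage removes noise before dilation grows the remaining regions back out.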
After two binary images are created, they are compared for any differences between frames. Any difference between frames implies movement. If there is movement, the center of the moving mass is calculated. This is done by averaging the row and column indices of white pixels that are in the first image but not in the second, and vice versa. These x and y coordinates are stored in a 2D array whose results are eventually written into an image that starts out containing only black pixels. The pixels where the center of mass was detected are turned white, giving an image that visually tracks the movement of a mass through the entirety of the 800 images. This entire process is split up between nodes: each node covers a certain range of indices and passes off its results to the masternode, which then applies the respective changes to the output image and text file.
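The center-of-mass step can be sketched like this. The helper name and return convention are illustrative assumptions, not the project's actual signature; it averages the row/column indices of white pixels that appear in exactly one of the two binary frames.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Illustrative sketch: average the row/column indices of pixels that
// are white in one binary frame but not the other. Returns {-1, -1}
// when the frames are identical (no movement). Names are hypothetical.
std::pair<int, int> centerOfMovement(const std::vector<int>& frame1,
                                     const std::vector<int>& frame2,
                                     int rows, int cols) {
    long rowSum = 0, colSum = 0, count = 0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            int i = r * cols + c;
            // White in exactly one of the two frames => movement pixel.
            if (frame1[i] != frame2[i]) {
                rowSum += r;
                colSum += c;
                ++count;
            }
        }
    if (count == 0) return {-1, -1};  // no difference between frames
    return {static_cast<int>(rowSum / count),
            static_cast<int>(colSum / count)};
}
```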
The program.cpp starts by establishing all necessary 1D and 2D arrays, 14 in total. In addition, variables are set based on command-line arguments. Six of these establish the number of nodes, the prefix word for iterating through files, the starting index, the ending index, the output file name, and the output image name, along with the critical threshold value, which is the central user control for how the images are filtered.
After these important variables are established, the program enters the for-loop where the bulk of the computational resources are spent. The for-loop runs based on each node's rank and the range of images it covers. Eight methods (including sub-methods) are used inside this for-loop. The first method, createDifferenceArray(), creates an array of pixel values by taking the absolute value of the background image subtracted from the foreground image. The second method, createThresholdAppliedArray(), takes the pixel values from differenceArray and checks them against the threshold value the user entered, creating a new array of 0 (black) and 255 (white) pixels based on whether each value falls above or below the threshold. The third method, createErodedArray(), takes the threshold array and looks for white pixels that touch any black pixel. If they do, the corresponding element in the new array is set from white to black because the pixel is considered unimportant. This reduces noise in the image. Next, the fourth method, createDilatedArray(), is applied using the eroded array. The dilation method looks for any significant white pixels in the eroded array and sets all surrounding pixels to white in the new dilated array. The final product of this series of filters is a binary image. This process is repeated for image 1 and image 2. The two binary images are then compared using xyCoordOfChange(). If any changes are detected between image 1 and image 2, the center of the mass that moved is calculated and written out to xy2DCoordinates, an array that stores the x and y values calculated in each node's unique range. This center-of-mass location is found by averaging the significant white pixels that are in binary image 1 but not in binary image 2, and vice versa.
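The first two filter stages might look roughly like this. The method names come from the description above, but the parameter lists are assumptions; the real program.cpp likely uses raw 1D arrays where std::vector stands in here.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Sketch of the first two filter stages. Signatures are assumed, not
// taken from program.cpp; std::vector stands in for its raw 1D arrays.

// Stage 1: absolute per-pixel difference between foreground and
// background.
std::vector<int> createDifferenceArray(const std::vector<int>& foreground,
                                       const std::vector<int>& background) {
    std::vector<int> diff(foreground.size());
    for (std::size_t i = 0; i < foreground.size(); ++i)
        diff[i] = std::abs(foreground[i] - background[i]);
    return diff;
}

// Stage 2: map each difference to 255 (above threshold) or 0 (at or
// below threshold), producing a binary image.
std::vector<int> createThresholdAppliedArray(const std::vector<int>& diff,
                                             int threshold) {
    std::vector<int> binary(diff.size());
    for (std::size_t i = 0; i < diff.size(); ++i)
        binary[i] = (diff[i] > threshold) ? 255 : 0;
    return binary;
}
```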
This was somewhat unorthodox compared to how other students in the class chose to compute their centers of mass, but in my opinion it resulted in a better representation of the pathing. A key feature in the for-loop that saved a significant amount of time: after the initial pair of images is processed, only 1 of the 2 binary images is deleted from memory. This works because the next pair of images shares 1 image with the previous pair. Instead of recreating the same redundant binary image, the pointer for image 1 of the new pair is simply pointed at image 2 of the previous pair, saving a significant amount of time.
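That pointer-reuse trick can be sketched as follows. This is a hypothetical skeleton where a counter stands in for the full filter pipeline; it shows that each frame is filtered exactly once even though consecutive pairs overlap.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of the reuse optimization: after comparing the
// pair (binary1, binary2), binary2 becomes binary1 of the next pair,
// so each frame passes through the filter pipeline only once.
int filterCallCount = 0;  // counts how many frames get filtered

std::vector<int>* makeBinaryImage(int frameIndex) {
    ++filterCallCount;  // stands in for difference/threshold/erode/dilate
    return new std::vector<int>(4, frameIndex);  // dummy binary image
}

void processRange(int lowerIndex, int upperIndex) {
    std::vector<int>* binary1 = makeBinaryImage(lowerIndex);
    for (int i = lowerIndex + 1; i <= upperIndex; ++i) {
        std::vector<int>* binary2 = makeBinaryImage(i);
        // ... compare binary1 and binary2, record center of mass ...
        delete binary1;     // only the older image is freed
        binary1 = binary2;  // reuse instead of recomputing frame i
    }
    delete binary1;
}
```

For 10 frames this performs 9 comparisons but only 10 filter passes; recomputing both images of every pair would take 18.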
After exiting the for-loop, a series of sends and receives using MPI::COMM_WORLD transfers the array of x and y values from each of the nodes to the masternode. Once this messaging is complete, the masternode uses its now-filled 2D array to write out a .txt file containing all the coordinates. In addition, an image consisting of all black pixels has the pixels at the x and y coordinates in the array (which were recorded as the center of a mass during a movement between frames) turned white, giving a visual indicator that tracks the movement through the frame. Finally, the total time to run the entire computation is printed out using MPI::Wtime().
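The masternode's output step might look roughly like this. The function name and coordinate representation are illustrative assumptions; the sketch writes each coordinate pair as a text line and paints the corresponding pixel white in an all-black tracking image.

```cpp
#include <cassert>
#include <sstream>
#include <utility>
#include <vector>

// Sketch of the masternode output step (names are hypothetical): the
// filled coordinate list is written out as text, and each recorded
// center of mass is painted white into an all-black tracking image.
// A {-1, -1} entry is assumed to mean "no movement for that pair".
void writeResults(const std::vector<std::pair<int, int> >& xyCoords,
                  int rows, int cols,
                  std::ostream& txtOut, std::vector<int>& trackImage) {
    trackImage.assign(rows * cols, 0);  // start from an all-black image
    for (std::size_t i = 0; i < xyCoords.size(); ++i) {
        txtOut << xyCoords[i].first << " " << xyCoords[i].second << "\n";
        if (xyCoords[i].first >= 0 && xyCoords[i].second >= 0)
            trackImage[xyCoords[i].first * cols + xyCoords[i].second] = 255;
    }
}
```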
Computation Sharing Method
Splitting up work between nodes involved creating unique upper and lower bounds based on each node's unique myRank. The lower index was set equal to startingIndex + (myRank * slicedRange), where slicedRange is the number of images divided by the number of nodes. The upper index was set equal to startingIndex + (myRank * slicedRange) + slicedRange - 1. Next, a unique 2D array was created for each node depending on whether the node was the masternode or not. The size of the memory allocation was equal to slicedRange + 1 unless the processing ID of the node was 0, in which case the memory allocation was equal to the entire range (endingIndex - startingIndex + 1). The startingIndex and endingIndex are entered by the user at the command line. Once the unique bounds were established, each node processed its own unique set of images, stored the x and y results in an array, and sent the results row by row back to the masternode.
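The bound arithmetic above can be written out directly. This sketch assumes the image count divides evenly by the number of nodes (as with 800 images on 1, 2, or 40 nodes).

```cpp
#include <cassert>

// Per-node range from the sharing scheme above, assuming the image
// count divides evenly by the number of nodes.
int lowerBound(int startingIndex, int myRank, int slicedRange) {
    return startingIndex + (myRank * slicedRange);
}

int upperBound(int startingIndex, int myRank, int slicedRange) {
    return startingIndex + (myRank * slicedRange) + slicedRange - 1;
}
```

With 800 images on 40 nodes, slicedRange is 20: rank 0 covers images 1 through 20 and rank 39 covers 781 through 800, so the ranges tile the full set with no overlap.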
Conclusion
Performance scaling is close to logarithmic: runtime drops quickly with each doubling of the total number of nodes, eventually leveling off because of communication overhead as well as resource-allocation limitations. The implication and takeaway is that some problems cannot be solved simply by adding more resources, whether because of a bottleneck or because of the sheer cost inefficiency of such a solution. In addition, the timing chart does show that parallelization is extremely powerful: running the computation with 40 nodes is nearly 18x quicker than running it with only 1 node!
The image results are harder to interpret. This may be a result of my unique approach to detecting mass changes compared to my fellow students. The one noticeable difference is that certain clumps of center-of-mass pixels are less represented in the 40-node computation's .pgm than in the single- and double-node computations. This smooths out the image and makes it seem tighter, but in some ways having fewer nodes seems to give a more accurate representation of the movement changes.