User Interface for Biometric Data Analysis

Participants: 

Suhas Kumar
Arsalan Rahman
Jaskirat Khangoora

Advisor: 
Prof. Dario Pompili



What is big data?

Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The three main properties of Big Data, known as the three V's, are Volume, Variety, and Velocity: volume refers to the amount of data, variety refers to the types of data, and velocity refers to the speed at which data is generated and processed.

Goals
Develop software that provides a user-friendly interface and statistical tools for the analysis of big data.


Contributions
• Provide ways to handle high-velocity, large-volume data collection
• Allow an easy way to import datasets
• Provide several analysis tools, such as regression, clustering, and causality analysis (a usage sketch follows this list)
• Provide visualization for the analysis tools listed above
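The following is a minimal sketch of the kind of workflow these contributions describe: import a dataset, fit a regression, cluster the samples, and visualize the result. It assumes a Python stack with pandas, scikit-learn, and matplotlib, which are illustrative choices rather than the project's confirmed implementation; the file name and column names are hypothetical.

# Illustrative sketch only: pandas/scikit-learn/matplotlib are assumed,
# and "biometric_data.csv" with its columns is a hypothetical dataset.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Import a dataset (hypothetical file with activity and heart-rate columns).
df = pd.read_csv("biometric_data.csv")

# Regression: predict one biometric signal from another.
X = df[["activity_level"]].values
y = df["heart_rate"].values
reg = LinearRegression().fit(X, y)
print("slope:", reg.coef_[0], "intercept:", reg.intercept_)

# Clustering: group readings into k clusters of similar samples.
kmeans = KMeans(n_clusters=3, n_init=10).fit(df[["activity_level", "heart_rate"]])

# Visualization: scatter plot colored by cluster, with the fitted line.
plt.scatter(df["activity_level"], df["heart_rate"], c=kmeans.labels_)
order = X[:, 0].argsort()
plt.plot(X[order, 0], reg.predict(X)[order], color="red")
plt.xlabel("activity_level")
plt.ylabel("heart_rate")
plt.show()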



Motivations
Due to technological improvements over the past decade, there has been a data explosion. According to IBM, about 2.5 quintillion bytes of data are created every day, and 90% of the data in existence today was created in the past two years alone. Almost every electronic device generates some kind of digital data. However, most of this data goes unused because of challenges in storage, analysis, and visualization: such large data sets are difficult to process and analyze with traditional data processing applications and on-hand database management tools.

Conclusion

Using the user interface we developed, we ran analyses on sample biometric data. Two methods of Big Data analysis were carried out, both built on the same simple process: draw a sample from the Big Data and run the appropriate analysis on that sample. The results of the analyses on the samples were fairly consistent with the analyses run on the full dataset, although several abnormalities appeared in the sampled results that did not appear in the full-data results. This shows that real-time analysis is feasible and is a reasonably reliable way of analyzing large volumes and high velocities of data. However, because of the abnormalities mentioned above, a human component is still needed to back up the program, especially in applications where lives are at stake.
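As a rough illustration of the sampling idea described above, the sketch below runs the same analysis on a random sample and on the full dataset and compares the two estimates. It uses synthetic data and a scikit-learn regression purely as an example; the actual project analyzed biometric data, and its exact analysis routines are not shown here.

# Sketch only: synthetic data and scikit-learn stand in for the real
# biometric dataset and the project's own analysis tools.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic "full" dataset: one predictor, one noisy response.
n_full = 1_000_000
x_full = rng.normal(size=(n_full, 1))
y_full = 2.0 * x_full[:, 0] + rng.normal(scale=0.5, size=n_full)

# Analysis on the full data (expensive at scale).
full_fit = LinearRegression().fit(x_full, y_full)

# Analysis on a small random sample (cheap enough for near-real-time use).
idx = rng.choice(n_full, size=5_000, replace=False)
sample_fit = LinearRegression().fit(x_full[idx], y_full[idx])

# The sample estimate should be close to the full-data estimate, but small
# discrepancies (the "abnormalities" noted above) are expected.
print("full slope:  ", full_fit.coef_[0])
print("sample slope:", sample_fit.coef_[0])

The design trade-off is the one stated in the conclusion: sampling gives up some accuracy in exchange for speed, which is why a human check on the results remains important in safety-critical settings.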