After Emina Soljanin earned a master’s degree from the University of Sarajevo in the 1980s, she held a position helping to optimize electric power production and distribution for utility grids. Those skills continue to serve her well as she advances ways to efficiently store and retrieve vast amounts of data. Soljanin joined the Department of Electrical and Computer Engineering at the School of Engineering in 2016 after a 21-year communications research career at Bell Laboratories.
Q: How is your work important in the era of big data?
A: If you have big data, someone has to store it. Someone has to make it accessible and also secure. That's where I come in. Being in storage is not a bad idea! During the California gold rush, for example, the people who got rich weren't always the miners, who were essentially speculating. Providing infrastructure for miners was the surer path to wealth: Levi Strauss got rich by making the work pants that miners wore.
Q: How is your background related to information storage?
A: My technical areas are coding and information theory. Coding theory underlies both error correction and data compression. Error correction is necessary in virtually every system: you have it in your smartphone and on your computer's hard and flash drives. Applying error correction to the signal coming off a magnetic disk enabled disks to grow in capacity and shrink in size. Today we use it to protect data stored in the cloud, and because good codes need less redundant storage than keeping full extra copies, we hope to use them to reduce the size of data centers.
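To make the cloud-storage point concrete, here is a minimal sketch of the simplest erasure code, a single XOR parity block of the kind used in RAID-5. It is an illustration of the general idea only, not the specific codes discussed in the interview; the function names `xor_blocks`, `encode` and `recover` are hypothetical, and the sketch assumes the data has already been split into equal-length blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Append one parity block: the XOR of all data blocks.

    Assumption: every block has the same length.
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored, lost_index):
    """Rebuild the block at lost_index by XOR-ing all surviving blocks."""
    survivors = [b for i, b in enumerate(stored) if i != lost_index]
    return xor_blocks(survivors)


stored = encode([b"abcd", b"efgh", b"ijkl"])
print(recover(stored, 1))  # the lost middle block is rebuilt from the other three
```

With three data blocks, this stores four blocks in total (a 4/3 overhead) and survives the loss of any one of them. Keeping a second full copy of everything would also survive one loss, but at twice the storage.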
Then there are data compression and de-duplication: lossless reduction of acquired and generated data so that it takes up less storage space and can be distributed securely. Coding theory is also used to make data incomprehensible to adversaries.
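De-duplication can be illustrated with a minimal sketch: split the data into fixed-size chunks, identify each chunk by its SHA-256 digest, and store each distinct chunk only once, together with a "recipe" listing the digests in order. The function names and the tiny 4-byte chunk size are hypothetical choices for illustration; real systems use much larger and often variable-size chunks.

```python
import hashlib


def deduplicate(data, chunk_size=4):
    """Split data into fixed-size chunks and keep each distinct chunk once.

    Returns (store, recipe): store maps a SHA-256 hex digest to chunk bytes,
    and recipe is the ordered list of digests needed to rebuild the data.
    The 4-byte default chunk size is only for illustration.
    """
    store = {}
    recipe = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a repeated chunk is not stored again
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe):
    """Reassemble the original bytes exactly (the reduction is lossless)."""
    return b"".join(store[d] for d in recipe)


data = b"ABCDABCDABCDWXYZ"
store, recipe = deduplicate(data)
print(len(store), len(recipe))  # distinct chunks stored vs. chunks referenced
```

Here the 16 bytes of input contain only two distinct 4-byte chunks, so only those two are kept, and `rebuild` returns the original bytes unchanged.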
Q: What kind of challenges are involved in retrieving data?
A: We have to retrieve data quickly. If there is a delay of more than a tenth of a second, people aren't going to wait. They're just going to give up, and a delay of that size can cost about one percent of sales. For Google, a one-second delay means that page views drop 11 percent, which translates directly into lost advertising revenue.
Data stored in the cloud may reside in several physical locations. I compare it to shopping at a grocery store. One person in a family can get all of the groceries, but that takes a long time. Or you can split up the shopping list and send every member of the family into the store. Then you look at the cashier lines and try to choose the shortest ones. These are all mathematical queuing problems: how to allocate data so delay is minimized.
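The grocery-store analogy can be turned into a small queuing experiment. The Monte Carlo sketch below is a simplified model built on stated assumptions, not the analysis from the interview. It assumes a file consists of k pieces and that each server takes an independent, exponentially distributed time (mean 1) to return one piece. It compares three strategies: one server returns every piece (one shopper), k servers each return one piece (splitting the list), and n servers each hold a coded piece of which any k suffice (picking the shortest lines). The "any k of n" property assumes a hypothetical MDS-style code; the function name `simulate` and its parameters are illustrative.

```python
import random


def simulate(k=4, n=6, trials=20_000, seed=1):
    """Estimate the mean retrieval time of a k-piece file under three strategies.

    Assumption: each server takes an independent Exp(1) time per piece.
    """
    rng = random.Random(seed)
    one_server = split = coded = 0.0
    for _ in range(trials):
        # One shopper: a single server returns all k pieces one after another.
        one_server += sum(rng.expovariate(1.0) for _ in range(k))
        # Split the list: k servers each return one piece; wait for the slowest.
        split += max(rng.expovariate(1.0) for _ in range(k))
        # Coded redundancy (hypothetical code where any k of n pieces suffice):
        # ask all n servers and stop once the k fastest have answered.
        times = sorted(rng.expovariate(1.0) for _ in range(n))
        coded += times[k - 1]
    return one_server / trials, split / trials, coded / trials


one_server, split, coded = simulate()
print(f"one server: {one_server:.2f}  split: {split:.2f}  coded: {coded:.2f}")
```

Under these assumptions the three averages should land close to their theoretical values: 4, about 2.08 (1 + 1/2 + 1/3 + 1/4), and 0.95 (1/6 + 1/5 + 1/4 + 1/3). Splitting helps, and adding a little coded redundancy so that you never wait for the slowest servers helps further.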
Q: How do you tie your earlier work in electrical utilities to your research in storage?
A: Data centers are energy hogs. They consume two to three percent of the power we generate, and that consumption is growing by 20 percent per year. If I can minimize storage, I can cut energy consumption. In a way, distributing data is similar to electricity generation, where we had to constantly balance our sources of power and try to maximize our use of the cheapest source.
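To put the 20 percent figure in perspective: if that annual growth rate held constant (an illustrative assumption, not a projection made in the interview), data-center consumption would double in a little under four years, because the doubling time T satisfies 1.20^T = 2:

```latex
1.20^{T} = 2
\quad\Longrightarrow\quad
T = \frac{\ln 2}{\ln 1.20} \approx \frac{0.693}{0.182} \approx 3.8 \text{ years}
```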