How to archive data so you can access it
By Claes Nygren
Historical data helps you analyze trends, see what worked in the past, make predictions about the future, and gain valuable insights about your customers.
Why should retailers archive historical data?
In 2020, an estimated two and a half quintillion bytes of data were created every day. A quintillion is a 1 followed by 18 zeros, just in case you were wondering. With so much data, it can be hard to know what to keep, but when it comes to your retail business, historical data is the key to future success.
Historical data can help you analyze trends, see what worked in the past, make predictions about the future, and gain valuable insights about your customers. Data can also help you build AI models, which need as much data as you can get! But all that data has to go somewhere. How do you archive data so you can find it when you need it? And can you do it without breaking the bank?
Is data hard to archive?
When you're running day-to-day operations, it's very common to collect and save data. But it's easy to forget why you're saving it and let it just sit there.
If you take the time to archive the right data correctly, you can use it to decide what to do next. Historical data tells you where you came from and helps you see where to go.
Why don't people keep historical data?
People throw away historical data because it costs money to store, and data storage always comes with costs. It's important to have a plan for storing data in a way that keeps it accessible, so you can easily get to it again, but that plan needs a storage budget.
What happens without historical data?
The biggest risk of not having historical data is an information gap that can't be filled later. That gap forces you to rely on assumptions you could otherwise have verified. Without data, the quality of your interpretations drops; with it, you can check your assumptions against what actually happened.
What tools can I use to help store data?
There are several options for storing data. Keep in mind that the types of data storage you need depend on your business model. In this article, we're focusing on ways to archive data so you can easily access it in the future. Here are a few options:
Amazon S3, Glacier, and Parquet files
One of the easiest ways to store data is to put it in Amazon S3 and move it into one of the S3 Glacier storage classes. This is a low-cost way to store data, and the process for restoring it is well documented, so you know how to bring the data back to a readable state.
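To make the S3 and Glacier option a little more concrete, here is a minimal Python sketch, assuming you use the AWS SDK for Python (boto3) and have AWS credentials configured. It builds a lifecycle rule that moves objects under a prefix to the Glacier Flexible Retrieval storage class (`GLACIER`) after a set number of days, and shows how to request a temporary restore of an archived object. The bucket name, the `sales/` prefix, the 90-day threshold, and the helper function names are hypothetical choices for illustration.

```python
# Sketch only: bucket, prefix, day counts, and helper names are hypothetical.


def glacier_lifecycle_rule(prefix: str, days_until_glacier: int = 90) -> dict:
    """Build a lifecycle configuration that moves objects under `prefix`
    to the S3 Glacier Flexible Retrieval storage class ("GLACIER")."""
    if days_until_glacier < 0:
        raise ValueError("days_until_glacier must be non-negative")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_glacier, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


def apply_lifecycle(bucket: str, config: dict) -> None:
    """Attach the lifecycle configuration to a bucket (needs AWS credentials)."""
    import boto3  # imported here so the rule builder works without boto3 installed

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=config)


def request_restore(bucket: str, key: str, days: int = 7) -> None:
    """Ask S3 to make a temporary, readable copy of an archived object."""
    import boto3

    s3 = boto3.client("s3")
    s3.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={"Days": days, "GlacierJobParameters": {"Tier": "Standard"}},
    )


if __name__ == "__main__":
    config = glacier_lifecycle_rule("sales/", days_until_glacier=90)
    print(config["Rules"][0]["Transitions"])
    # apply_lifecycle("example-archive-bucket", config)              # hypothetical bucket
    # request_restore("example-archive-bucket", "sales/2019.parquet")  # hypothetical key
```

Objects in Glacier Flexible Retrieval can't be read directly; a restore request makes a temporary copy available for the number of days you ask for, and retrieval can take hours depending on the tier. Setting the rule once lets S3 move data automatically instead of you archiving files by hand.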
Parquet is a file format that is especially good for archiving data. It is a columnar format built for compression, and there are tools that can read Parquet files and run queries on them right away, so you can easily find archived data. You don't have to import anything first; you can query the fields as they are. That's the technical side of storing historical data, and it matters. Parquet files can be very large, but you can still query them where they sit in storage. Amazon Athena has built-in support for Parquet, so by storing your Parquet files on Amazon S3 you can immediately search across many of them at once.
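As a hedged illustration of querying Parquet files in place, the Python sketch below builds the kind of `CREATE EXTERNAL TABLE ... STORED AS PARQUET` statement Athena uses to map files under an S3 prefix to a table, and submits SQL through boto3's Athena client. The database, table, column names, and S3 locations are all hypothetical, and the helper functions are illustrative rather than part of any AWS API.

```python
# Sketch only: database, table, columns, and S3 locations are hypothetical.
import re

_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")  # conservative check for table/column names


def parquet_table_ddl(table: str, columns: dict[str, str], location: str) -> str:
    """Return a CREATE EXTERNAL TABLE statement over Parquet files at `location`."""
    for name in [table, *columns]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not location.startswith("s3://"):
        raise ValueError("location must be an s3:// URI")
    cols = ",\n".join(f"  {name} {ctype}" for name, ctype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n{cols}\n)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


def run_athena_query(sql: str, database: str, output_location: str) -> str:
    """Submit SQL to Athena and return the query execution ID (needs AWS credentials)."""
    import boto3  # imported lazily so the DDL builder works without boto3 installed

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    ddl = parquet_table_ddl(
        "sales_archive",
        {
            "order_id": "string",
            "store_id": "string",
            "amount": "double",
            "sold_at": "timestamp",
        },
        "s3://example-archive-bucket/sales/",
    )
    query = "SELECT store_id, SUM(amount) AS revenue FROM sales_archive GROUP BY store_id"
    print(ddl)
    print(query)
    # run_athena_query(ddl, "retail_archive", "s3://example-athena-results/")
    # run_athena_query(query, "retail_archive", "s3://example-athena-results/")
```

Because Parquet is columnar, a query like the `SELECT` above only needs to read the `store_id` and `amount` columns. One caveat worth confirming in the AWS documentation: Athena does not read objects that are still in the Glacier Flexible Retrieval or Deep Archive storage classes, so files you expect to query often are better kept in a storage class Athena can read directly, or restored first.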
The hard drive solution
Another option is to download your data and store it on a hard drive. This option carries a lot more risk. First, you may physically lose or damage the hard drive. Second, documenting what you stored, so you can find what you're looking for later, takes a lot more work.
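One way to reduce the documentation burden of the hard drive option is to keep a manifest on the drive itself. The Python sketch below, using only the standard library, records each file's relative path, size, and SHA-256 checksum in a JSON file, then checks the drive against it later so you can spot missing or damaged files. The `MANIFEST.json` name and the function names are hypothetical.

```python
# Sketch only: the manifest name and layout are hypothetical choices.
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archive files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> list[dict]:
    """List every file under `root` with its relative path, size, and checksum."""
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entries.append(
            {
                "path": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            }
        )
    return entries


def write_manifest(root: Path, name: str = "MANIFEST.json") -> Path:
    """Save the manifest at the top of the drive so the archive documents itself."""
    manifest_path = root / name
    entries = [e for e in build_manifest(root) if e["path"] != name]
    manifest_path.write_text(json.dumps(entries, indent=2))
    return manifest_path


def verify_manifest(root: Path, name: str = "MANIFEST.json") -> list[str]:
    """Return paths that are missing or whose contents no longer match."""
    problems = []
    for entry in json.loads((root / name).read_text()):
        path = root / entry["path"]
        if not path.is_file() or sha256_of(path) != entry["sha256"]:
            problems.append(entry["path"])
    return problems
```

A manifest helps you find and verify files on the drive, but it can't protect against losing the drive itself.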
The database backup solution
Another option is to rely on database backups. These are great for disaster recovery: if you take regular backups and everything fails, you can restore to where you were. But database backups aren't ideal as a long-term archive, because database software keeps evolving and the format of each backup ages with every release. A backup may look like a good solution at the time, but to read it later you may need the same (or a closely compatible) database version that created it, and you can burn a lot of time when it comes to restoring the data.
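If you do rely on database backups, one hedge against version lock-in is to also export your data to plain, open formats that many tools can read. The Python sketch below uses SQLite (Python's built-in `sqlite3` module) as a stand-in for your production database: it writes each table to a CSV file and saves the `CREATE TABLE` statements to `schema.sql`. The file names and function name are hypothetical; for other databases, their own export tools (for example PostgreSQL's `COPY` command) would play the same role.

```python
# Sketch only: SQLite stands in for a production database; file names are hypothetical.
import csv
import sqlite3
from pathlib import Path


def export_database(conn: sqlite3.Connection, out_dir: Path) -> list[Path]:
    """Write schema.sql plus one <table>.csv per user table; return the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()

    # Keep the table definitions so column names and types survive with the data.
    schema_path = out_dir / "schema.sql"
    schema_path.write_text("".join(f"{sql};\n" for _, sql in tables))
    written = [schema_path]

    for table, _ in tables:
        quoted = table.replace('"', '""')  # escape the identifier for the SELECT
        cursor = conn.execute(f'SELECT * FROM "{quoted}"')
        header = [col[0] for col in cursor.description]
        csv_path = out_dir / f"{table}.csv"
        with csv_path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(cursor)
        written.append(csv_path)
    return written
```

The same idea works with Parquet instead of CSV if you want the compression and query benefits of that format.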
How does historical data help retailers?
Historical data gives you peace of mind when making decisions about the future, as you have a way of "checking" your predictions.
As retailers have shifted from brick-and-mortar to e-commerce, it can be easy to think that the "old" data from physical stores isn't useful. In fact, it's very important, because the pattern of sales is likely to stay similar; it will just come through a different channel.
Essentially, you don't want to waste any opportunity to learn and improve your business, and when you use data to do that, you can both make predictions and check them.
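To show what "checking" a prediction against historical data can look like, here is a small Python sketch of a backtest. It forecasts each period as the value from the same period one season earlier (a seasonal naive forecast) and scores that forecast against what actually happened using mean absolute percentage error. Both the forecasting rule and the error metric are illustrative choices, not a recommended method, and the numbers in the usage example are made up.

```python
# Sketch only: the forecasting rule, error metric, and sample numbers are illustrative.


def seasonal_naive_forecast(history: list[float], season: int = 12) -> list[float]:
    """Forecast each period after the first season as the value one season earlier."""
    if season < 1 or len(history) <= season:
        raise ValueError("need a positive season and more than one full season of history")
    # history[i - season] is the forecast for history[i], for every i >= season.
    return history[:-season]


def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def backtest(history: list[float], season: int = 12) -> float:
    """Score the seasonal naive forecast on every period after the first season."""
    return mape(history[season:], seasonal_naive_forecast(history, season))


if __name__ == "__main__":
    # Made-up quarterly sales for two years (season = 4 quarters).
    quarterly_sales = [100, 120, 90, 150, 110, 126, 99, 150]
    print(f"MAPE: {backtest(quarterly_sales, season=4):.1f}%")
```

The point isn't the particular model: without archived history, there is nothing to score a forecast against.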
Need help with your data?
At Assemble, we specialize in helping retailers find the right technology to meet their goals. If you're thinking through how to collect, analyze, and archive your data, we'd love to talk.