How is Data Stored?

While data is acknowledged as the basis for business strategy and competitive advantage, the storage of that same data is often overlooked, or treated as a commodity. It’s time to change that — because how data is stored has a profound impact on your business, affecting data security, productivity, and costs. 

Why the right data storage strategy is important: 

So let’s start by asking: why is it so important to make the right data storage decisions? 

  • Security: The most important role of storage is to keep your data secure. Data is at risk from malware and hacking attacks and from accidental deletion, and losing it is usually a serious blow to the business. Storage needs to be designed so that data is kept secure and backed up.

  • Compliance: A number of regulations specify how an enterprise should store data, and for how long, so storage systems and protocols must be designed for compliance.

  • Productivity: Do stakeholders have access to the data that they need, when they need it? Convenient access to data is a key driver of productivity throughout the organization.

Ultimately, when it comes to data storage, there’s a lot at stake, so it’s important to get it right by developing a storage strategy and architecture that map to your specific business needs.

What is the right data storage architecture for you?

How do you design your storage architecture and what are the best practices in storage? Start by examining the following questions to arrive at the answers:

  • What is the application, and how does storage need to support computing?
    Systems with heavy loads and high speed requirements need storage that can keep up with them. Cloud storage is often chosen because it offers elasticity and scalability, making additional capacity available when required.

  • What is the value of this data to the business?
    Here, two metrics, Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs), are often used to arrive at a data protection plan. An RPO defines the maximum acceptable amount of data loss, measured in time; in other words, the maximum age of the most recent backup. An RTO defines how quickly you need to resume operation if the application goes down. If RPOs and RTOs are very short, then you may need to move away from scheduled backups and towards continuous replication of data. You may also need to store your data in multiple data centers in different geographical locations.

    While planning backups, recovery or replication, you need to prevent redundant data from accumulating. You will need a process to identify redundant data and delete it without impacting operations.

  • Is the data structured, unstructured or semi-structured?
    Structured data is data organized into a fixed format. Data from enterprise systems, such as an ERP, HR Information System, or CRM, is structured data. Structured data fits well into a block storage system, in which files are split into evenly sized blocks of data, each with its own address but with no additional information (metadata) to provide more context for what that block of data is.

    For structured data analytics, a data warehouse is created. This data warehouse has a predefined structure and schema in order to optimize for fast SQL queries.

    On the other hand, unstructured data does not map well to block storage and is more suited to being stored in an object storage system. In object storage, data is not split into blocks. Instead, it is stored in an object that contains the data, metadata and a unique identifier. Object storage works well for data that keeps growing, as the architecture can be scaled out easily by adding nodes.

    Instead of a data warehouse, a data lake can be created to manage structured, unstructured and semi-structured data together. There is no predefined schema, and data is stored in its native format without limitations on size. Data in the lake can be analyzed using SQL queries, big data analytics and machine learning.

  • Which parts of the data need to be accessed and how fast?
    Today a number of organizations are adopting tiered storage solutions where different categories of data are assigned to different storage media based on the nature of data and access required. Tiered storage enables data to be moved between the cloud and other storage such as storage area network (SAN) or direct-attached storage (DAS) systems to optimize performance and investment.

    Data could be moved from fast arrays used for critical applications to slower disk storage systems and, eventually, to tape as time passes and the data is no longer critical.

    You need to consider the different stakeholders who require access to data and manage permissions, as well as ensure availability of data when it is required.

  • What are the regulations that govern data storage? For how long does the data need to be retained?
    Governments and regulatory agencies stipulate how data collected by businesses should be stored, managed and protected, and for how long it needs to be retained. For example, the Sarbanes-Oxley Act requires public accounting firms to retain audit-related documentation for seven years. Europe’s General Data Protection Regulation (GDPR) requires that the personal data of individuals in the EU be properly classified, protected and tracked. The Health Insurance Portability and Accountability Act (HIPAA) governs how protected health information (PHI) is managed and shared. The Payment Card Industry Data Security Standard (PCI DSS) stipulates how businesses should protect and manage credit card data.

    It’s important to know the regulations that apply to your business. You need to classify data to be able to manage how regulated data is being stored throughout its lifecycle. As many regulations require long-term data retention, you need to plan for infrastructure that will support this. You may also need to check the processes followed by your cloud storage service provider for compliance with regulations. 
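
The RPO and RTO question above can be turned into a quick sanity check. The following is a minimal sketch in Python, not a complete data protection tool: the `ProtectionPlan` class, its field names and the example figures are illustrative assumptions, and it assumes that a scheduled backup can lose at most one backup interval of data.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ProtectionPlan:
    """Hypothetical description of how one application's data is protected."""
    backup_interval: timedelta    # time between scheduled backups
    estimated_restore: timedelta  # estimated time to restore and resume service


def worst_case_data_loss(plan):
    # Assumption: a failure just before the next scheduled backup loses
    # everything written since the previous one, i.e. up to one interval.
    return plan.backup_interval


def check_plan(plan, rpo, rto):
    """Return a list of findings; an empty list means both objectives are met."""
    findings = []
    if worst_case_data_loss(plan) > rpo:
        findings.append("RPO missed: back up more often or replicate continuously")
    if plan.estimated_restore > rto:
        findings.append("RTO missed: restore takes longer than the allowed downtime")
    return findings


# Example figures are made up for illustration.
nightly = ProtectionPlan(backup_interval=timedelta(hours=24),
                         estimated_restore=timedelta(hours=2))
print(check_plan(nightly, rpo=timedelta(hours=1), rto=timedelta(hours=4)))
```

With a nightly backup and a one-hour RPO, the check reports a missed RPO. That is the situation in which moving from scheduled backups towards continuous replication becomes worth considering.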

Once you have considered all the above factors, you can create your data architecture. It’s now time to consider decisions regarding data storage infrastructure. Most businesses use a combination of on-premises and cloud storage facilities. What are your options and which of them will help you to ensure high performance and availability?

Data storage infrastructure:

Your data storage is likely to be a hybrid of on-premises and cloud storage. For your on-premises needs, options have evolved over the years.

  • An all-flash array (AFA), or solid-state array (SSA), is a type of storage infrastructure that consists of only flash memory drives instead of spinning-disk drives. A flash array can transfer data to and from solid-state drives (SSDs) much faster than electromechanical disk drives can, so you get the advantage of speed and lower latency. Another advantage is that AFAs typically include built-in software services for data management and data protection.

  • Hard disk and tape systems are still used extensively in data centers. Although slower than solid-state drives, they are still fast enough for a number of applications.

When it comes to cloud storage, you have an option between using a public cloud service or creating a private cloud for storage.

  • Public cloud: A third-party service provider hosts this storage platform, and you use it over the internet. This model practically does away with capital investment in storage infrastructure and enables you to ramp storage resources up or down based on business needs.

  • Private cloud: This is cloud storage on infrastructure that is owned by your organization, and is not a shared resource. The responsibility of setting up, maintaining and protecting the infrastructure lies within your own organization.

As we’ve seen, there are a number of factors that need to be considered when making storage-related decisions, as well as a wide variety of infrastructure options. Storage needs change dynamically, making monitoring and managing storage a challenging proposition. Future developments in AI and IoT technologies will lead to massive amounts of data being generated, requiring storage to reach new levels of performance, availability, and reliability.

Given these challenges, we are sure to witness more automated solutions for storage management, such as flagging inactive or less frequently used data for moving to lower storage tiers, distributing loads better across infrastructure, identifying non-compliance risks, or managing access rights. Data storage will continue to be a critically important aspect of business, and better hardware and software solutions will undoubtedly keep evolving.
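
As a rough illustration of the first of these ideas, flagging inactive data for a lower storage tier, here is a minimal Python sketch. The tier names, the age thresholds and the `DataSet` record are assumptions made for the example; a real policy would depend on the workload and the storage products in use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tiers, ordered from fastest to slowest.
TIER_ORDER = ["flash", "disk", "tape"]
# Hypothetical rule: data may stay on a tier while it has been accessed
# within this time. The last tier (tape) has no limit.
MAX_AGE = {"flash": timedelta(days=30), "disk": timedelta(days=365)}


@dataclass
class DataSet:
    name: str
    current_tier: str
    last_accessed: datetime


def target_tier(last_accessed, now):
    """Pick the fastest tier whose age limit the data still satisfies."""
    age = now - last_accessed
    for tier in TIER_ORDER:
        limit = MAX_AGE.get(tier)
        if limit is None or age <= limit:
            return tier


def flag_for_demotion(items, now):
    """Return (name, from_tier, to_tier) for data that belongs on a slower tier."""
    flagged = []
    for item in items:
        target = target_tier(item.last_accessed, now)
        if TIER_ORDER.index(target) > TIER_ORDER.index(item.current_tier):
            flagged.append((item.name, item.current_tier, target))
    return flagged


# Made-up inventory for illustration.
now = datetime(2024, 1, 1)
inventory = [
    DataSet("orders-2023-q4", "flash", datetime(2023, 12, 20)),
    DataSet("orders-2023-q1", "flash", datetime(2023, 4, 1)),
    DataSet("orders-2019", "disk", datetime(2020, 1, 1)),
]
for name, source, target in flag_for_demotion(inventory, now):
    print(f"{name}: move from {source} to {target}")
```

The sketch only flags candidates and leaves the actual move to a separate step, so that a person or another process can review the list before any data is relocated.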
