
Catalogic Software Blog

Join the conversation with Catalogic Software. Our industry expert bloggers bring you the latest insights and most relevant topics in data protection today.

New at Catalogic Software

Syncsort Data Protection is Now Catalogic Software

Looking for old blog topics?
Please visit the Syncsort Blog.

Is your enterprise data broken down into different categories? Like most companies, the bulk of your data is visible to employees internally, while a smaller portion is shaped into the view the company presents externally. That external corporate data includes financials, marketing, supply chain EDI, website information and more. This highly visible data is analyzed and reported on regularly, but your enterprise data is like an iceberg: the corporate data is visible above the surface, while more than 80% of your enterprise data lies beneath it.

So what is the Enterprise data that is below the surface in your organization?

  • Corporate data is visible to the board, the executive management team and external audiences.
  • Business unit data is mostly below the surface and generally not exposed to external communities. It helps each business unit plan its investments, manage its key performance indicators and track its financials.
  • Principal data such as supply chain pricing, business partner relationships, wages, legal agreements, IP and internal research data, business metrics, real estate and so on is below the surface.
  • External connected data from suppliers and research is below the surface.
  • Metadata about the data is below the surface.
  • Copy, backup and archive data, which can account for a considerable amount (up to 60%) of total data capacity, is below the surface.
Because corporate data is so visible, a large portion of the IT budget is allocated to serving it up quickly and ensuring it is protected. This is entirely understandable: no one wants to impede decision making or the external presentation of data that affects company performance and stock price.

But what about the rest of your enterprise data? How important is it? How should it be protected, and how quickly does it need to be recovered? One way companies try to manage the bulk of their data is to reduce their data sets using data services such as compression and deduplication. These technologies can certainly reduce the data footprint, but they have no knowledge of the content or the relative importance of the data. This is why it is increasingly important for companies to understand their data taxonomy, establish data management policies and build an enterprise data catalog.

This can be a difficult task. Establishing a corporate data governance plan takes time and senior management commitment: the executive management team must understand the importance of all data to the business and the significance of having it well managed.

To facilitate this process, IT departments should start analyzing and cataloging data within departments so they can make better decisions about how to store, classify and protect it. Implementing catalog management software enhances your ability to manage data efficiently and improves the flow of information within the company, a step toward corporate data governance.

Imagine you had a catalog of every file and snapshot that let you pinpoint areas of storage inefficiency. You could easily identify unwanted objects such as files that have not been accessed for a specified period, find extremely large files, file types that may violate storage policies (video, music), unused virtual machines and so on. You could begin to manage inefficient copy data that stretches across many departments. With this information, you can effectively manage utilization and improve compliance and DR policies across your company.
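As an illustration, a file catalog of this kind boils down to a metadata index that supports exactly these queries. The sketch below is a deliberately simplified, hypothetical stand-in for what enterprise catalog software does at scale: it walks a directory tree, records size, last-access time and file type, then flags stale files, the largest files, and media types barred by policy. The thresholds and the banned-extension list are assumptions for the example, not product defaults.

```python
import time
from pathlib import Path

STALE_DAYS = 365                               # assumed policy: untouched for a year
LARGE_TOP_N = 100                              # report the top 100 files by size
POLICY_VIOLATIONS = {".mp3", ".mp4", ".avi"}   # assumed policy: no media files

def build_catalog(root):
    """Walk a directory tree and record basic metadata for every file."""
    catalog = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            st = path.stat()
            catalog.append({
                "path": str(path),
                "size": st.st_size,
                "atime": st.st_atime,          # last access time (seconds)
                "ext": path.suffix.lower(),
            })
    return catalog

def query(catalog):
    """Run the kinds of queries described above against the catalog."""
    now = time.time()
    stale = [f for f in catalog if now - f["atime"] > STALE_DAYS * 86400]
    large = sorted(catalog, key=lambda f: f["size"], reverse=True)[:LARGE_TOP_N]
    barred = [f for f in catalog if f["ext"] in POLICY_VIOLATIONS]
    return stale, large, barred

if __name__ == "__main__":
    catalog = build_catalog(".")
    stale, large, barred = query(catalog)
    print(f"{len(stale)} stale files, {len(barred)} policy violations")
```

A real catalog adds the pieces this sketch omits: indexing across heterogeneous filers and hypervisors, snapshot awareness, and incremental updates rather than full re-scans.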

Once deployed at a departmental level, an effective data catalog can be expanded across the enterprise. If your company data resembles an iceberg, then it may be time for you to deploy an enterprise data catalog.
Posted: 5/19/2014 9:34:54 AM by Kristin Baskett | with 0 comments

I recently consulted with a customer running eight NetApp FAS 2000/3000 storage systems with more than 500,000 files spread across four locations. They wanted to improve storage efficiency and data protection operations across the company. With data growing at 40-50% per year, it was becoming too difficult to determine which data was current and how many copies were kept across the company. Their NetApp administrators used the NetApp SnapMirror and SnapVault tools to automate data replication and backup, but meeting backup SLA times was becoming a significant challenge. In addition, they needed to better understand their storage consumption.

We recommended that they deploy Catalogic ECX enterprise catalog software. It searched across their NetApp filers and cataloged all of their files and VM objects. Several departments across the company had copies of the data, some with more than 20 automatically scheduled snapshot copies. ECX identified them quickly, showing the file types, when the files were copied, how large they were and how old the copies were.

A summary report allowed the administrators to simplify their backup, archive and DR policies to save time and meet required SLAs. They found that 68% of their files were important and needed to meet the backup SLA, while 32% were deemed "old" and no longer required snapshot copies; those files were archived. This customer is well on their way to improving storage efficiency by 30-40%. More and more customers should deploy ECX to improve their data protection management.
Posted: 4/10/2014 4:18:34 PM by Kristin Baskett | with 4 comments

A data catalog is an invaluable asset to any organization. Let’s take a look at what an effective data catalog achieves before identifying the top 5 reasons to implement a data catalog.
Like any organization, you have growing amounts of data that are constantly changing. New data is generated every day and is spread across multiple heterogeneous servers, virtual machines and storage systems. Depending on the size and reach of your organization, this data may be housed across multiple physical locations. How much do you know about the nature of your data?
How many copies do you have? What are the top 100 files in terms of size? How long has it been since these files were accessed? Are critical files and applications protected? What files should be archived? Answering these and many more questions is imperative for implementing effective data management and controlling storage costs.
“A catalog should provide you with an accurate and comprehensive view of your data across the organization. It should provide you with a searchable repository of metadata for improved data management which leads to increased business efficiency.”
Here are the top five reasons you should be deploying a data catalog:
  1. Visibility – You cannot manage what you cannot measure! A global catalog of data enables a better understanding of how data flows within your business, which facilitates better decision making. You should be able to BROWSE, SEARCH and QUERY data across your entire organization, much as you would a database. Achieving visibility across your data lets you build a dashboard of elements to monitor and improve data management. There are many reasons to query your catalog; a few examples:

Do I have data that is not being backed up?
Do I have data I am managing that has not been accessed for years and is not part of my data retention policy?
How many VMs do I have across my IT environment, and how fast are they expanding?
How many copies of a particular file do I have and where are they located?
Who has access to my secure files and is there any data leakage across the company?

  2. Utilization – How efficiently is your data being stored? It is quite possible that the largest drain on your storage capacity is an excess of copy data. Many primary storage and backup products provide deduplication functionality, but they tend to de-dupe and store data within silos. You may have de-duplicated data on one storage system but unwanted duplicates residing on another. Do you have data on primary storage that should be in an offline archive? Failing to migrate data through its lifecycle is another way to waste expensive primary storage. Creating an enterprise data catalog enables you to query for duplicate files and locate them across your organization, eliminate unnecessary files, free up capacity and reduce storage costs. You can also query files by age and last access time to determine which data should be protected and archived.
  3. Security – Is your data properly protected? Is it housed securely in the right location? Do the right people have access to it? These are common concerns for IT departments. What if you could receive a simple report showing all of your unprotected files? What if you could easily determine whether any copy data has leaked onto unauthorized network segments? With increasing security concerns, a data catalog is a necessary addition to your security monitoring solutions. If the NSA can have documents stolen by a contractor, it can happen to anyone. Eliminating data leakage improves data security.
  4. Compliance – Each industry has its own compliance needs and regulations, which affect many aspects of your data. Most industries face strict government regulations, such as HIPAA in healthcare, FDIC and SIPC rules in banking and finance, and Sarbanes-Oxley (SOX) for public corporations. How long should you keep various types of data? Who should have access to different classes of data? How well protected is your data? How quickly can you retrieve your data for eDiscovery? Which data should be destroyed, and when? What are your data privacy requirements? A data catalog becomes a requirement for ensuring compliance and governance.
  5. Efficiency – You may already have processes in place to handle security and compliance, but how much easier would those processes be if you had a global data catalog? The more efficiently you can profile, collate, manage, browse, search, query and report on your data, the more time you can spend on other projects that drive your business forward and improve productivity. There are real cost savings in implementing an enterprise cataloging solution as part of your IT data protection and data management strategy.
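The utilization point above, locating duplicate copies that per-silo deduplication misses, can be sketched with a simple content-hash index. This is an illustrative stand-in for a catalog query, not how any particular product implements it; the roots passed in represent separate storage "silos".

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def file_digest(path, chunk=1 << 20):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def find_duplicates(roots):
    """Index files from several silos by content hash; return groups of copies."""
    by_hash = defaultdict(list)
    for root in roots:
        for p in Path(root).rglob("*"):
            if p.is_file():
                by_hash[file_digest(p)].append(str(p))
    # Any hash with more than one path is a set of duplicate copies,
    # possibly spanning storage systems that de-dupe independently.
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

From the duplicate groups, reclaimable capacity is simply the file size times (copies - 1), summed over groups; that figure is what justifies consolidating copy data rather than buying more disk.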
Is it time for you to get your data under control? Creating an enterprise catalog of your primary and copy data is a great place to start!
Posted: 4/1/2014 8:59:56 AM by Kristin Baskett | with 285 comments

A foodservice industry company was facing the challenge, just as many growing companies do, of having to continually expand its IT infrastructure to meet data growth requirements. Starting with 100 TB of primary capacity supporting critical applications such as SQL databases, email, financials, and payroll applications, the company recently added 120 TB of new primary storage and 220 TB specifically for data protection. The company had already made a significant commitment to NetApp for its storage needs and most recently purchased a NetApp Cluster Data ONTAP (cDOT) primary storage system and a secondary ONTAP system for data protection.

The foodservice company’s previous data protection solution was local, tape-based over a 1Gb network, and involved a lot of trial, error, and time to create complete accurate backups. Some restores took days, depending on the amount of data and type of applications.

Catalogic DPX™ enabled better data protection and sped the migration to NetApp Clustered Data ONTAP. With Catalogic DPX, they now have the ability to centrally manage their data protection process and perform restores in less than an hour.

See how this company succeeded with NetApp’s latest storage architecture, cDOT, and Catalogic DPX, a high-efficiency data protection solution.

Posted: 3/19/2014 11:36:25 AM by Kristin Baskett | with 13 comments

Sometimes nothing makes you appreciate a problem like a real-world experience. This week I had an incident with the backup of my home data. I have nothing too sophisticated at home: three PCs might be online at any one time, but that is really it. Still, they contain data that is far too valuable to lose: work documents, 20 years of financial records, photographs, emails and so on. Over the years I have used various methods to keep everything protected and have not lost any data in 20 years. This week, though, a potential disaster struck: my backup server failed and is beyond repair. To clarify, I keep no data in the cloud; my wife is not comfortable with it, and that is the end of that. I will not get into the details of what happened, but the server is non-recoverable and its data is lost. I am, of course, lucky enough to have my primary data still intact and my local backups for each PC still in place.
I now had to decide how to replace my backup system. Looking at the infrastructure, I realized I had 14 TB of backup disk for 3.5 TB of primary disk in my PCs, and of that 3.5 TB, only about 1.5 TB was unique data. That is nearly a 10:1 ratio of backup capacity to actual unique data. The problem, of course, was that I had multiple copies of everything. Too much copy data!
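The arithmetic above generalizes to a simple copy-data ratio: total protection capacity divided by the unique data actually worth protecting. A quick sketch, using the home-system numbers from this post:

```python
def copy_ratio(backup_tb, unique_tb):
    """Ratio of backup/copy capacity to unique data worth protecting."""
    return backup_tb / unique_tb

# The home system described above: 14 TB of backup disk, ~1.5 TB unique.
assert round(copy_ratio(14.0, 1.5), 1) == 9.3   # "nearly 10:1"
```

Tracking this ratio over time is a cheap way to notice when copy data, rather than real data growth, is driving infrastructure spend.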
I have a small home system, but in business, copy data can be a killer when it comes to backup and recovery times and the cost of the total infrastructure. In IT organizations, data copies are made constantly by procedures such as snapshots, backups, replication, DR processes, and test and development. This is made even more problematic by the silos of infrastructure involved in protecting and managing the data, each of which creates its own copies. Indeed, IDC has stated that some organizations keep over 100 copies of some forms of data.

There are, of course, technologies attempting to address the issue. Storage vendors may offer zero-footprint snapshots and clones: read/write virtual volumes that can be instantly mounted to bring data back online. Storage and backup vendors are delivering deduplication technologies to consolidate data at the time of storage or backup. The problem is that each of these is also a silo and may not take into account data in direct-attached storage.
One way you could effectively manage your data is to centralize your snapshot infrastructure onto a common target with the ability to quickly mount a snapshot clone from the centralized data. Even better is if this infrastructure has the same support processes, methodologies and toolsets that you currently use for the management of your primary data. Using this methodology, you can get near continuous data protection with extremely fast recovery times, improving both your RPO (Recovery Point Objective) and RTO (Recovery Time Objective).
Recently, NetApp introduced clustered Data ONTAP (cDOT). With the scalability it offers, you can consolidate much more of your primary and backup storage infrastructure into a single system. cDOT enables consolidation of tens of petabytes of data and thousands of volumes onto a multi-protocol clustered system that provides non-disruptive operations. With the recently released NetApp cDOT 8.2 you can support up to 8 clustered nodes in a SAN environment and up to 24 in a NAS deployment, delivering up to 69 PB of storage within a single manageable clustered system. Of course, you can also build smaller clustered systems.
Catalogic DPX delivers rapid backup and recovery with data reduction, helping solve your copy data problems with near-continuous data protection. DPX also facilitates migration of both virtual and physical infrastructures to new cDOT-based systems by snapshotting and backing up your servers to an existing NetApp system and then recovering them to a cDOT system. This offers considerable time savings over traditional migration methods. DPX takes care of disk alignment and can also migrate iSCSI LUNs to VMDK files on the new clustered storage.
With careful planning and the right infrastructure you can consolidate storage management and get copy data under control. If you don’t, you risk spending more on infrastructure, and wasting valuable time managing duplicate data.

See how one Catalogic customer migrated to a Clustered Data ONTAP infrastructure using DPX.

Take a look at the Catalogic Solution Paper, "Solving the Copy Data Management Dilemma," for more information.

Posted: 3/11/2014 2:50:57 PM by Kristin Baskett | with 3 comments