Archiving101.com: in-depth, no-nonsense information about archiving and related technologies.
27th August 2007

While protecting yourself, you can reap business benefits

Computerworld posted an interesting article on email archiving last week, with a nice quote from me in it as well.  The article can be found here:

http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9032621

posted in Uncategorized | 0 Comments

25th August 2007

Why are PST files bad and how can a bad consolidation give you headaches?

The PST migration utilities that archiving vendors pitch are another one of those slick, highly marketed features that try to make you believe rescue is near.  For those who haven’t realized it yet… PST migrations, even with the most advanced tools, will take a long time.  Any vendor that claims its PST migration tool is better than sliced bread should be red-flagged.

Here is the scoop … it is true that these utilities will save you an enormous amount of time compared to the other option you have, which is searching the network and manually importing each PST back into the Exchange mailbox.  Many of these PST file ingestion tools are highly advanced and allow for some serious tailoring to your environment… others… well… they are not much more than a login script.

Some of these tools give you two options: bring the PST files into the Exchange mailbox and let the stubbing take care of the rest, or bring the data directly into the archive and create stubs in the mailbox.  As I said in one of my previous posts … I’m convinced that the legacy way of stubbing is a bad idea, and with the table below I’d like to point out a few areas of concern.

I’m assuming an average stub size of about 4KB, which is middle of the road in my experience… some vendors’ stubs are slightly larger, some slightly smaller.

PST data (GB)   PST data (KB)   Avg message size (KB)   Stubs created   Total stub size
1               1,000,000       10                      100,000         400MB
1               1,000,000       25                       40,000         160MB
1               1,000,000       50                       20,000          80MB
So in the worst-case scenario you brought 1GB of PST data into the back end, but still added about 400MB of stubs to the mailbox. With a mailbox server that hosts 1,000 mailboxes, you just increased your EDB by 400GB .. so much for ‘storage savings’.   In my opinion the best way to ingest PSTs is not to create stubs in the mailbox at all, considering what the alternatives are.  The archive provides a far superior way to search the content.
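The arithmetic behind the table can be sketched in a few lines of Python. This is only an illustration of the numbers above: the function name `stub_overhead` and its parameters are made up for this example, and it assumes the same 4KB average stub, decimal units (1GB = 1,000,000KB) and one stub per ingested message.

```python
def stub_overhead(pst_gb, avg_message_kb, stub_kb=4, mailboxes=1):
    """Estimate how much stub data a PST ingestion leaves in Exchange.

    Assumptions (illustrative, not vendor figures): every ingested
    message is replaced by one stub of stub_kb kilobytes, and
    1GB = 1,000,000KB as in the table.
    """
    total_kb = pst_gb * 1_000_000             # PST data expressed in KB
    stubs = total_kb // avg_message_kb        # one stub per message
    per_mailbox_mb = stubs * stub_kb / 1000   # stub data left in one mailbox, in MB
    server_gb = per_mailbox_mb * mailboxes / 1000  # growth across all mailboxes, in GB
    return stubs, per_mailbox_mb, server_gb


# Reproduce the three rows of the table for a server with 1,000 mailboxes.
for avg_kb in (10, 25, 50):
    stubs, mb, gb = stub_overhead(1, avg_kb, mailboxes=1000)
    print(f"{avg_kb}KB average: {stubs:,} stubs, {mb:.0f}MB per mailbox, {gb:.0f}GB on the server")
```

Plugging in your own average message size and your vendor’s real stub size will tell you quickly whether stubbing during ingestion saves anything in your environment.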

For a proper PST migration I would suggest the following:

  1. First scope out the size of the problem.  Most likely you will have far more data out there than you expected.
  2. After scoping out the problem, ensure that you have allocated enough storage on the archiving system to ingest the data.  Most likely you will see some serious storage savings from single instance storage (SIS) when bringing the data back in.
  3. Set up a plan for which data you want to capture first; most likely this is the data that is stored on the file servers.
  4. Ensure that the PST files are going to the right account; after all, it would be a bad idea if the CEO’s data ended up in someone else’s archive.
  5. Have a plan ready for what to do with orphaned data.  You’ll need to talk to your HR and legal teams about this.  One suggestion is to create a bulk historical archive account where you store this data for reference.
  6. Notify users who have password-protected PSTs to either provide their password or remove it.
  7. Plan for people who work remotely. Maybe set up a meeting when they get to an office; after all, no one would be happy to see a 2GB PST uploaded over their DSL line.
  8. Block PST file creation when you are done importing them.

There will be points that I’ve missed here, and if so .. let me know.  The rule of thumb is that you can ingest at most about 2GB per hour into an archiving system. Some can do more when you run multiple threads.  You do the math on how long that ingestion will take .. don’t be surprised if this takes months in a reasonably complex environment.

Now why are PSTs bad? (From the Exchange FAQ at http://www.swinc.com/resource/exch_faq_appxf.htm)

Reprinted by permission of the author (Ed Crowley). Items 11, 12 and 13 courtesy of Stephen Gutknecht. Based on input from the many PST=BAD proselytizers in the Exchange Discussion List and personal experience. 

  1. They’re fragile, especially as they get big. They get corrupted too easily. Users aren’t the best at ensuring that their systems are properly shut down. 
  2. You have to run the Inbox Repair Tool on them way too often. 
  3. Your users don’t back them up. Presumably you do back up the server. 
  4. Your users don’t compact them. They just get bigger and bigger. 
  5. Your users forget their PST passwords. Even though there are unsupported tools to crack them, it can take a significant amount of time to do so. 
  6. You lose single instance store (SIS).
  7. Messages take up more space in a PST than in an Exchange store.
  8. It’s simply nuts to store PSTs on a network drive. They just end up taking up more space. Is disk space on your file server cheaper than disk space on your Exchange server?
  9. One might think that it will be easier to restore a single mailbox by using server-based PSTs. However, with proper implementation of the Ed Crowley Never Lose a Mailbox Procedure, it should never ever be necessary to restore a mailbox.
  10. For road warriors, OSTs are a much superior storage technique, especially with the improvements made with earlier versions of Outlook. They allow untethered computing at a higher level than with PSTs, plus with the added security of a backed-up information store on the server.
  11. A PST can be opened by only one machine at a time. This precludes a manager and assistant from working from the same PST simultaneously, and precludes team access.
  12. You cannot use Outlook Web Access to read your downloaded messages.
  13. Future applications, such as unified messaging, will be poorly implemented when using PSTs. Groupware applications that work with the mailbox probably won’t work at all. 
  14.  PST files are not secure. Anyone with access to the PST file can open it using the right tools.
  15. You cannot clean up PST files after virus infestations.

Why PSTs are good

  1. They’re just about all you have when using a POP3 mail source. (We maintain that use of POP3 in an enterprise, unless that’s the only client available, is a reflection of administrative sloth.)
  2. They’re useful as an archive for those who simply can’t ever delete a message, as long as the user understands that they could lose all their data, and as long as they keep it on their local hard drive.

posted in compliance, competition | 1 Comment

22nd August 2007

The value of email

As you probably know, I follow the data storage, archiving and eDiscovery industry pretty closely, and I’d like to expand a little on what I posted earlier this week about the future of archiving systems.   Almost all archiving vendors like to put ‘fear’ in their selling tactics.  “If you don’t store your email you go to jail” or “Be aware of any liability in your data” and more of those things.  They make it sound like it is bad to keep information, and that keeping any sort of information is going to cost organizations money or make people go to jail.

First of all … it is my opinion that ethical companies have nothing to worry about.  Second, archiving systems have now slowly reached stage 4 of their lifecycle, or maturity.  Organizations are starting to look at and use classification and monitoring systems to ensure that only proper content is kept.  Monitoring systems (from companies like Proofpoint and Vontu) enable organizations to take proactive measures within their business processes, whereas eDiscovery is post mortem.

Stage 5 of the evolution means that organizations should leverage the data they have stored to streamline their business processes.  There shouldn’t be fear in keeping information.

posted in compliance, eDiscovery | 1 Comment

20th August 2007

The evolution of Archiving systems

I thought this was worth sharing.  The following table shows the likely 5 stages in evolution that archiving systems will go through.  Stage 0 was when the first systems came to market for storage management.

 

Stages

Business Drivers Organizational Response IT/Storage Implications
Stage 1 Plaintiff attorneys look for the smoking gun in an organization’s unstructured data and realize email is particularly vulnerable. Response to litigation is generally reactive and ad-hoc. Limited systematic approach for records retention, e-discovery and litigation holds. Technology strategy driven by business needs rather than risk. Limited interaction between legal, records and IT. Storage administration in “react mode” to respond to email discovery requests. Archived records exist in hardcopy, microfiche, user hard drives, .pst files and tape.
Stage 2 Businesses realize email is largest risk and initiate email archive strategies.   Focus on email and IM archiving and management. Begin to establish records management policy and procedures and structured/repeatable approaches to e-discovery and litigation holds.   Legal, records and IT begin working together to influence organizational technology strategies. Storage administration has implemented email archive systems and begun to eliminate .pst creep and consolidate archived email data. Information risk management begins to negatively impact information value by imposing constraints on flexibility, reuse and distribution of information.
Stage 3 Businesses realize information risk must be viewed holistically across all unstructured information to address inefficiencies and weak links in capabilities. Focus on unstructured data broadly and begin building approaches to manage unstructured data in email, IM’s, IP, voice mail, file shares, document mgt systems and collaboration tools. Established records management policies and procedures, early automation of records management for unstructured data. Structured, repeatable and measured processes for e-discovery and litigation holds. Full legal, records and IT partnership and early enterprise governance. Information risk continues to negatively impact value of information by imposing constraints on flexibility, reuse and distribution. Best practices from email archiving being applied to unstructured data more broadly but challenges remain including automated data classification and segmentation.
Stage 4 Businesses recognize constraints and limits on information value imposed by lack of classification and segmentation.   Initiate implementation of auto-classification of unstructured information which leads to improved automation of records management. IT leverages classification technologies to begin proactive monitoring to prevent the creation or distribution of documents that violate company policies and procedures. Full governance around information risk in the context of enterprise risk. Begin leverage of auto-classification to automate and reduce constraints on flexibility, reuse and distribution.
Stage 5 Businesses demand true information lifecycle management (ILM) capabilities where information risk and value are intrinsic to all applications and data. Organizations have fully implemented and integrated auto-classification into their technology stacks to drive automated records retention, e-discovery, legal holds, knowledge mining, sharing and reuse.   Information risk is managed intrinsically and fundamental to legal, records and IT management. Risk is no longer a constraint on flexibility, reuse and distribution. Information value is fully unlocked without increasing risk or costs (nirvana).

Likely Development of Archiving & Retention Systems within Organizations (Source: Wikibon.org)

posted in Uncategorized | 0 Comments

15th August 2007

The Death of Store Management?

Store Management, Stubbing, Archiving, Mailbox Extension … this feature has many names and each archiving vendor has its own claim to fame on it.   The principle is easy… you take a message that is in an email database… and replace it with a shortcut to the archived item.  The idea is that the end user can still open the older, larger messages in his mailbox while the mail administrator gets the most bang for the buck on his store and reduces backup/recovery time.  Sounds good, right?   This is the principle that archiving systems are built on … the one principle that made these products rise to great heights and created the marketing buzz.

Well… in my opinion this whole feature set has reached its maximum life expectancy and it’s time to kill it. To prove my point I’d like to go back in history.  In the late 90s people were using Exchange Server 5.5 (a good mail server for its time), 9GB hard disks were awesome, mail stores maxed out at 16GB, backups went to tape and mailboxes were limited to 5-10MB.  With those limitations it is perfectly understandable that organizations wanted to squeeze the last byte out of their storage capacity.  In 2007 I can go to CompUSA and buy a 1.5TB NAS drive for about 400 US dollars, Exchange 2007 is available in 64-bit, backups are done to disk and people have mailboxes that are over 100MB.  I’ve seen archiving vendors’ products stub 2KB messages with a 4-6KB shortcut, thereby actually increasing storage needs.  Stubbing also takes a hit on your I/O and CPU, plus it contributes to database fragmentation.   I understand that vendors might cling to stubbing out of sentimental value, but it truly doesn’t offer the benefits anymore that existed in the early days.

Here is my take on this: organizations could greatly benefit from much simpler policies:

  • Take a copy of the messages that are in the database and that are sent and received… store them in an archiving server with appropriate policies

  • Leave only 6 months of messages in the Exchange Stores

  • If you really want to gain storage… stub only the attachments… leave message bodies alone

There you go… no need to clean up stubs, no I/O hit .. it’s easy for end users to understand where their data is, and there are fewer helpdesk calls to find data that could be in either location.

posted in competition, history | 3 Comments

14th August 2007

Wikibon

I recently decided to join the Wikibon  project. Wikibon is a worldwide community of practitioners, consultants and researchers dedicated to improving the adoption of technology and business systems.  I’m particularly interested in the Archiving piece of wikibon and am extremely impressed with the contributors to this project.

My goal is to contribute regularly to the project, as it would help create some sort of guidance for the industry, in a similar way to how “The Sedona Conference” created its guidelines for eDiscovery.

posted in Uncategorized | 0 Comments

9th August 2007

Vendor Selection Part 2 - Financial Stability

This is part two of the vendor selection series.  After data ownership, which I discussed earlier, the financial stability of the vendor you might choose, or at least put on your short list, is also important.  The reason is that archiving and compliance products will become a vital part of your organization, and you would preferably plan to keep that system around for quite some time.  Moving from one vendor to another, as I briefly discussed in the earlier post, is a lengthy and costly exercise.

Financial stability is key .. you want a vendor that will be around for some time .. that way you won’t end up with a system that holds all your organization’s IP and compliance information without any support.  An example of an archiving vendor whose future I think is really grim, and one that is ready for the archiving graveyard, is AXS-One.  AXS-One was placed in the Visionary quadrant by Gartner earlier this year; however, their financial record has been extremely rocky.  Last week they published their financial numbers and they were pretty shocking.  In short, the company has only 2 quarters’ worth of cash left, had only 500k worth of new revenue and has, from what I’ve seen, a pretty grim record on profits.  If I were putting together a short list of vendors for my archiving or compliance project .. placing my bet on a company like this would be extremely risky.

So … one of the questions you should ask your vendor :  “Are you going to be around for a little while?”

posted in vendor selection, financial | 1 Comment

8th August 2007

Arkansas Court: Content Analysis should determine if emails are public records

Last week Arkansas’s high court ruled that a “neutral court” should use content analysis to determine whether emails are public records or not.  This specific and, in my opinion, controversial ruling came in a Freedom of Information Act case reported earlier by the Death by Email blog.  The ruling is connected to the case of former Pulaski County Comptroller Ron Quillin, who was accused of embezzling $42,000 while in office.  During the investigation of Ron Quillin, emails were found during the eDiscovery process that were reportedly “very personal and graphical”.

The defense claimed that the emails were personal and should be considered private.  The prosecutor, however, claimed that the emails were on a government computer system and, therefore, would be public records.  As expected, an Arkansas judge ruled that the act of sending an email to a government email address means that there is “no expectation of privacy.”

However, last week, the state high court decided to have its say on the matter.  In what is, in my opinion, a shocking ruling, it held that personal email messages stored on state-owned computers should be reviewed first by a “neutral court” to determine whether they qualify as public records and are subject to the state Freedom of Information Act.  The full ruling can be read here.

The ruling is based on the court’s interpretation of Arkansas’s public record law, which states that a public record is one “that constitutes a record of the performance or lack of performance of official functions that are or should be carried out by a public official or employee.”  Arkansas’s high court also noted that, with the large number of employees using computers at work for personal email, such correspondence on public computers does not automatically count as a record of the “lack of performance of official functions.”

The court then went on to adopt a ‘content-driven analysis’ to determine whether email messages on public or government computers should count as public records. The case was then sent back to the lower court to review the emails of Ron Quillin in question and determine whether there was a relation between the emails and the official’s activities.  More can be read here.

posted in compliance, eDiscovery | 0 Comments

7th August 2007

Vendor Selection - Data Ownership

This is the first part of a series that I will post over the next week or so on the critical items companies should look at when creating their shortlist of vendors.

The first one is ‘data ownership’ … I’ve tried to find a better word for it, but this seems to cover the bases.  Generally, when a customer selects an archiving or compliance product, he chooses one for a long period of time.  It is extremely complicated to move from one vendor to another, not because of technology, but because of the volume of data.  However, it is my opinion that the data stored in an archiving system belongs to the customer and not to the archiving vendor, and the customer should ALWAYS be able not only to access and dispose of the data but also to bring it back to the native format it originated in.

And here is the catch.  Not all vendors allow customers to export information back out of the system in a relatively easy way, and in fact EMC’s EmailXtender is the most notorious of them all.  Over the years I’ve had customers almost desperately plead for help getting their data out of EmailXtender so that they could move to a new solution.

The most commonly available format for exporting information is PST.   This seems to have become the email archiving industry’s XML format .. it is compatible with almost all vendors as far as ingestion and export go.   Don’t underestimate the time it takes to bring the data to your new system though.  The general rule of thumb is that export runs at about 1-2GB/h.
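To see what a 1-2GB/h rule of thumb means in practice, here is a rough back-of-the-envelope sketch in Python. The function name `transfer_days`, the 2,000GB archive size and the assumption that the job runs around the clock are all illustrative; real throughput depends on the product, the hardware and how many threads you run.

```python
def transfer_days(data_gb, gb_per_hour, hours_per_day=24.0):
    """Days needed to move data_gb at a sustained rate of gb_per_hour.

    hours_per_day models how long the job is allowed to run each day
    (24 = around the clock, 8 = an overnight window only).
    """
    if gb_per_hour <= 0 or hours_per_day <= 0:
        raise ValueError("rate and daily window must be positive")
    return data_gb / gb_per_hour / hours_per_day


archive_gb = 2000  # hypothetical archive size, not a figure from any vendor
for rate in (1, 2):
    print(f"{rate}GB/h, around the clock: {transfer_days(archive_gb, rate):.0f} days")
    print(f"{rate}GB/h, 8h nightly window: {transfer_days(archive_gb, rate, 8):.0f} days")
```

Even with the job running non-stop, a couple of terabytes takes weeks to months at these rates, which is why the migration timeline belongs in the vendor discussion from day one.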

posted in vendor selection | 0 Comments

1st August 2007

New storage rules may complicate records management

This article was posted last week on Computer Weekly.  Two new bills have been registered in the US House of Representatives and are waiting for debate. These bills are currently known as H.R. 4127 and H.R. 3997.   Both bills were introduced to federalize data breach laws that are already in effect in some US states (the best known is California’s SB1386).

Tagging along with these laws are some provisions for individual data privacy that I would welcome for the archiving and compliance industry. The closest US equivalent to the EU standard for data archiving is HIPAA.

What is most interesting about the European laws is that they require end users to “opt in” to email archiving and let them demand that certain items be deleted from company archives.  This is why the archiving industry is complicated. For instance .. would this opt-in apply only to data that is stored on EU soil? What if the company uses a centralized storage repository in North America?  Administrators will have to keep up to date with the laws and regulations that apply to their employer’s industry, but isn’t that too much to ask? Do we want lawyer/administrator hybrids?

posted in compliance, eDiscovery | 1 Comment