Friday, November 16, 2007

How to Handle a Data Loss Situation with Servers/RAID Systems

A data loss situation is usually characterized by the sudden inability to access data on a previously functioning computer system or backup, by the accidental erasure of data, or by the overwriting of data control structures. With this blog I will try to shed light on data loss within servers and RAID environments and give recommendations on what to do.


Server/RAID Data Loss Situations

The symptoms related to physical or logical issues can be many. You can have a sudden server crash during operation or power up. It is also possible that one or two hard drives go offline without any advance warning. In another situation, a RAID controller failure can make the hard drive(s) inaccessible. Depending on the kind of (RAID) server that you are running, finding the right solution can definitely become more complex.

From a simple-looking Windows 2003 server (with one hard drive) to complete server racks (RAID 1, 5, 1+0), the kind of system you are running determines how serious your data loss can be. Every server system comes with its own diagnostic programs, and there are several possible ways of communicating with the hard drives. There are also various settings that can be used to analyze the hard drives and RAID controllers. Some diagnostic programs can both read from and write to the hard drives; others have options to only read from them.

Find out together with the provider of the server/RAID what you can and cannot do when analyzing your RAID system. The last thing to take into consideration is the logical structure of your data. A UNIX system manages data differently than a Windows system. Especially when you are reconstructing or rebuilding a RAID/server, this can make the difference between data that can be recovered by a data recovery company and data that cannot.

Recommendations: what you can do!

Always remain calm and see if you have a contingency plan in place. If not, follow a step-by-step procedure in which you do not alter anything on the server without first having talked to a specialized engineer.
  • Try to access any log files that exist on the RAID server
  • Try to determine the scope of the damage
  • Try not to restart the RAID server too many times
  • Try not to restore any backup
  • Try to get a 360-degree view of your possibilities (solutions overview)
  • Try not to go for a quick fix because of time pressure
  • Try not to directly implement the recommendations/solutions you are given (and never on the damaged live system)
  • Try not to rebuild the RAID configuration if there are any doubts (this must be a last resort)
Before implementing a solution

Think about whether the solution you are about to implement can be undone afterwards. Once a solution has been implemented, you do not often get a second chance to try something else. Let experts guide you through the whole process and you can bring down the risk of never being able to access your data again. In a worst-case scenario, start making images of all the other drives that you think still work and try any solution on those copies. That way you are not working with a live system and you can always return to the original situation.
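To make the idea of "making an image" a bit more concrete, here is a minimal sketch in Python of what such a copy amounts to: the source is opened read-only, copied byte for byte into an image file, and a checksum is recorded so you can later confirm that the copy has not changed. The function names, the chunk size and the example paths are my own assumptions for illustration. It also assumes the drive can still be read without errors; a drive with read errors will simply abort this sketch, and a dedicated imaging tool such as GNU ddrescue, or a data recovery provider, is the safer route in that case.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time (arbitrary choice)


def image_drive(source: str, image_path: str) -> str:
    """Copy `source` byte for byte into `image_path`.

    Returns the SHA-256 hex digest of the data that was read.
    """
    digest = hashlib.sha256()
    # "rb" opens the source for reading only, so this script never writes to it.
    # "xb" refuses to overwrite an image file that already exists.
    with open(source, "rb") as src, open(image_path, "xb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage (paths are examples only, run with sufficient privileges):
#   checksum = image_drive("/dev/sdb", "/mnt/recovery/sdb.img")
#   assert sha256_of("/mnt/recovery/sdb.img") == checksum
```

Keep the image on a separate disk, never on the damaged system, and run any experiment against a copy of the image, so the checksum of the first image always tells you whether you still have the original state.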
Recommended people you should have around you in a time of crisis:
  1. Local provider of the server (technical engineer)
  2. Provider of the RAID controller
  3. A data recovery provider
With this blog I hope you can reduce the risk of a data loss within your server environment. Be calm and try to be as analytical as you can. Take calculated risks and be careful, because sometimes the tiniest thing can be overlooked and have a huge impact on the whole data recovery process.

2 comments:

Unknown said...

For complex data loss situations in RAID setups, it is better to leave it as it is and contact some data recovery experts. RAID Recovery is the most complex and complicated form of recoveries; I think only recovery experts can handle them better. The precautionary measures after a data loss scenario provided in the article are very helpful. However, the actions taken after data loss also determine the chances of data recovery.

Jacob Ekker said...

Nelson,

I think you are right, but in a situation where panic, emotions and feelings sometimes take over, it's better that the person responsible for the servers and RAID systems realizes what he is doing.

Growing awareness of what you can do as an IT manager was the motivation for this story.

Of course you should contact a data recovery expert immediately, but unfortunately this often happens too late. This article is also meant to create awareness of data recovery, so that experts are called in at the first stage of such a data loss scenario.

If you have any more comments, please make them so we can discuss this further.

Kind regards,
Jacob Ekker