Tuesday, December 21, 2010

【Weak current College】 How can classification accuracy be improved in tiered data storage? --- Powered by 【China power house network】



First, use offline storage to extend the life of the device.

Data stored on a tape device is generally used only rarely. For example, a business may use backup software to copy data from its database applications to a tape device. In practice, out of the 24 hours in a day, such a tape device might be needed for only half an hour. If the tape device nevertheless runs 24 hours a day, most of that run time is wasted. This wastes power and also shortens the service life of the tape device.

Using offline storage within a tiered storage system solves this problem. With offline storage, in short, the tape device normally stays in a sleep state. When it is needed, the tape device is woken up, and data is then read from it or written to it. Once the operation is complete, the tape system automatically returns to the idle state. With a good design, the run time of the tape device can be kept as short as possible, such as an hour. This extends the life of the device.
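The wake, operate, and return-to-idle cycle described above can be sketched as a small Python context manager. This is only an illustration: `TapeDevice`, `session`, and `write` are hypothetical names, and a real tape library would be driven through its vendor's own interface.

```python
import time
from contextlib import contextmanager


class TapeDevice:
    """Hypothetical stand-in for a tape drive that sleeps when unused."""

    def __init__(self):
        self.state = "sleeping"
        self.active_seconds = 0.0  # total time spent awake
        self.blocks = []           # data written so far

    @contextmanager
    def session(self):
        """Wake the device, hand it out for I/O, then put it back to sleep."""
        self.state = "awake"
        started = time.monotonic()
        try:
            yield self
        finally:
            # Runs even if the backup job fails, so the drive never stays awake.
            self.active_seconds += time.monotonic() - started
            self.state = "sleeping"

    def write(self, block):
        if self.state != "awake":
            raise RuntimeError("tape device is asleep; open a session first")
        self.blocks.append(block)


tape = TapeDevice()
with tape.session() as dev:
    dev.write(b"nightly database backup")
```

Because the sleep step sits in a `finally` clause, the device's awake time is bounded by the length of the backup job itself, which is the point of the design.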

Typically, offline storage is used mainly to back up data held on online storage, to guard against possible data disasters. For this reason, in tiered storage, offline storage is also called backup-level storage. It is important to note that data on offline storage media is read and written sequentially. To read data, the tape must first be wound back to the start of the volume and then positioned. To modify data that has already been written, all of the data must be rewritten. As a result, the efficiency of offline mass storage is relatively low. Its biggest advantage is that it provides mass storage at a lower cost.
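To make the cost of sequential access concrete, here is a toy Python model of a tape volume. It is a sketch under simplifying assumptions: `SequentialTape` and the `records_passed` counter are invented for illustration, and real drives have more nuanced positioning behavior.

```python
class SequentialTape:
    """Toy model of a sequential medium: no random access, no in-place edits."""

    def __init__(self, records=None):
        self.records = list(records or [])
        self.position = 0        # index of the record under the head
        self.records_passed = 0  # how many records the head has moved over

    def rewind(self):
        """Wind back to the start of the volume."""
        self.records_passed += self.position
        self.position = 0

    def read(self, index):
        """Rewind to the volume start, then wind forward to the record."""
        self.rewind()
        self.records_passed += index
        self.position = index
        return self.records[index]

    def modify(self, index, value):
        """Changing one record means writing the whole volume again."""
        updated = list(self.records)
        updated[index] = value
        self.rewind()
        self.records = updated
        self.position = len(updated)
        self.records_passed += len(updated)
        return len(updated)      # number of records rewritten


tape = SequentialTape(["a", "b", "c", "d"])
value = tape.read(2)             # must pass over records 0 and 1 first
rewritten = tape.modify(1, "B")  # rewrites all four records
```

Reading one record costs a pass over everything before it, and changing one record costs a rewrite of the whole volume, which is why this tier suits backups rather than frequently updated data.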

For this reason, offline storage does not fit all situations. Typically, offline storage is used to store infrequently used data, such as data backups. In general, tiered storage is combined with offline storage to improve the efficiency of the devices.

Second, how to classify data effectively?

Why does a tiered storage strategy sometimes work well and sometimes show no obvious effect? The author believes that one very important reason is that the indicators used by the data classification policy are not appropriate. In other words, the system decides which fish are stocked in which pond according to certain rules. If the rule is set incorrectly, for example by sorting only on the size of the fish, then the same pond may end up holding different kinds of fish. This makes the data hard to query and manage. Clearly, the classification rules are the key.

When choosing data classification storage products, it is best to select those whose data classification policy is based on many parameters.

In other words, the value of the data should be rated on more than one parameter, such as the data lifecycle, last access time, size, frequency of access, and so on. Specifically, rating the data needs to take the following into account.
First, it is best to determine the level of the data when the data is created. If the access properties of the data can be predicted at creation time and the data classified accordingly, unnecessary data migration can be avoided. This means that in daily management we can classify some data manually, without waiting for the system to rate it. For example, based on previous experience, you can predict that users need frequent access to email data from roughly the last month. You can then assign the level for the last month of data by hand, and let the system classify email data from other periods automatically according to certain rules. Combining manual assignment with automatic determination often works well.

Second, classifying data according to its static and dynamic characteristics can also work well. For example, you can determine whether a file is static or dynamic. The first basis is the static characteristics of the file system (such as the file size distribution); the second is the macro-level access patterns of the file system (such as size and the distribution of file access times); the third is the correlation between accesses (for example, when a job accesses one file, other files may be accessed as well); the fourth is the file access mode (such as whether access limitations exist). In practice, files can be sorted manually according to these characteristics. At purchase time, you can check whether the system has a corresponding classification policy.
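The two ideas above, manual rules applied at creation time plus automatic rating on several parameters, can be combined roughly as follows. This Python sketch is illustrative only: the field names, thresholds, and three-level scale (1 is the fastest tier, 3 is offline) are assumptions, not values taken from any product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    kind: str                # e.g. "email", "backup", "report"
    age_days: int            # days since creation
    days_since_access: int   # days since last access
    size_mb: float
    accesses_per_month: int


def manual_level(f: FileInfo) -> Optional[int]:
    """Administrator rule fixed in advance; None means no rule applies."""
    if f.kind == "email" and f.age_days <= 30:
        return 1             # recent email is known to be accessed often
    return None


def automatic_level(f: FileInfo) -> int:
    """Rate several parameters together instead of relying on one of them."""
    score = 0
    if f.days_since_access <= 7:
        score += 2
    elif f.days_since_access <= 90:
        score += 1
    if f.accesses_per_month >= 20:
        score += 2
    elif f.accesses_per_month >= 2:
        score += 1
    if f.size_mb <= 100:     # small files are cheap to keep on fast storage
        score += 1
    if score >= 4:
        return 1
    if score >= 2:
        return 2
    return 3


def classify(f: FileInfo) -> int:
    """Manual rules win; everything else falls through to automatic rating."""
    level = manual_level(f)
    return level if level is not None else automatic_level(f)


recent_mail = classify(FileInfo("email", 10, 200, 500.0, 0))
old_mail = classify(FileInfo("email", 400, 200, 500.0, 0))
hot_report = classify(FileInfo("report", 50, 3, 10.0, 25))
```

Note that `recent_mail` lands on level 1 even though its access statistics look cold. That is the case where a manual rule saves a later migration.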

Third, how to effectively reduce conflicts in the migration process?

Another feature of tiered storage is that data is migrated between different devices based on its access level. For example, some data is initially stored on a hard disk or disk array. Later, the tiered storage system finds that the data has not been accessed for a long period, and migrates it from the hard disk or disk array to tape. This type of migration is called a downgrade migration (moving data from high-performance devices to low-performance devices). Conversely, when a user at some point frequently accesses data stored on tape, the tiered storage system migrates the data from tape to the hard disk or disk array. This is called an upgrade migration (moving data from slow, low-level storage devices to fast, high-level storage devices).
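The migration decision described above can be written as a small rule. This is a minimal Python sketch: the function name and both thresholds (`idle_limit_days`, `hot_access_count`) are arbitrary assumptions that would need tuning for a real workload.

```python
def plan_migration(current_tier, days_idle, recent_accesses,
                   idle_limit_days=90, hot_access_count=10):
    """Return "downgrade", "upgrade", or "stay" for one piece of data.

    current_tier is "disk" (fast) or "tape" (slow). Both thresholds are
    illustrative defaults, not recommendations.
    """
    if current_tier == "disk" and days_idle >= idle_limit_days:
        return "downgrade"   # fast device to slow device
    if current_tier == "tape" and recent_accesses >= hot_access_count:
        return "upgrade"     # slow device to fast device
    return "stay"


cold_on_disk = plan_migration("disk", days_idle=200, recent_accesses=0)
hot_on_tape = plan_migration("tape", days_idle=5, recent_accesses=15)
```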

Note that conflicts may occur during migration. In the author's experience, device I/O conflicts during downgrade migration are generally not serious. For upgrade migration, however, the I/O conflicts caused by data migration must be considered, because testing found that such migrations tend to happen at exactly the times when device I/O is most intensive. The tiering design must therefore consider how to minimize I/O conflicts during migration and reduce the impact on access by other users. A common approach today is to perform upgrade migration only between tape and disk arrays. Increasing the number of hard disks raises I/O throughput and so reduces I/O conflicts, and it also reduces data transfers between tape and any single hard disk. The author's tests found that even the highest-performance hard disk inevitably encounters I/O conflicts during an upgrade migration.

During data migration, besides conflicts, you also need to pay attention to data consistency. With current technology, the most common approach is to use read-write locks to ensure data consistency. The migration process requests read-write locks on the block it is currently migrating, which keeps the data consistent between the migration process and processes that write data. This is also a parameter to take into account when choosing tiered storage products.
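As a rough illustration of block-level locking during migration, the Python sketch below implements a minimal read-write lock and uses it to serialize a migration against an application write. Everything here (`ReadWriteLock`, `migrate_block`, the dictionaries standing in for tape and disk) is hypothetical, and the lock is deliberately simple: it does not protect writers from starvation.

```python
import threading
from contextlib import contextmanager


class ReadWriteLock:
    """Minimal read-write lock: many readers or one writer at a time."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                self._cond.notify_all()

    @contextmanager
    def write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


disk = {}                        # stand-in for the fast tier
tape = {7: b"old"}               # stand-in for the slow tier
locks = {7: ReadWriteLock()}     # one lock per block


def migrate_block(block_id):
    """Upgrade migration: move the block under its write lock."""
    with locks[block_id].write():
        disk[block_id] = tape.pop(block_id)


def write_block(block_id, data):
    """Application write: same lock, so it cannot interleave with the move."""
    with locks[block_id].write():
        target = disk if block_id in disk else tape
        target[block_id] = data


def read_block(block_id):
    """Readers share the lock with each other but wait for writers."""
    with locks[block_id].read():
        return disk.get(block_id, tape.get(block_id))


threads = [threading.Thread(target=migrate_block, args=(7,)),
           threading.Thread(target=write_block, args=(7, b"new"))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread wins the lock first, block 7 ends up on disk holding the new data, so the update is never lost to a half-finished copy.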

The three points above are not only deployment concerns. When buying a product, you can also evaluate it against these criteria.
