The market and demand for commercial web archiving are growing at a phenomenal pace. Organizations ranging from large conglomerates to small research and think-tank firms are relying more and more on web archiving strategies to bolster their businesses.
Web archiving is a mechanism for collecting and preserving relevant, context-sensitive information for future research and data analysis. Web archivists generally employ web crawlers to automate the capture of massive amounts of information for the relevant field of study. One of the largest web archiving organizations is the Internet Archive, which uses bulk crawling to archive the World Wide Web at large.
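To make the idea of a crawler concrete, here is a minimal sketch of a breadth-first crawl in Python. It is only an illustration under stated assumptions, not how any particular archive works: the names `crawl`, `LinkExtractor` and `fetch` are hypothetical, and the page-fetching step is passed in as a function so that the example can run against a small in-memory site instead of the live web.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl from seed_url; returns {url: html} for captured pages.

    fetch is a caller-supplied (hypothetical) function that takes a URL and
    returns the page's HTML as a string, or None if the page is unavailable.
    """
    archive = {}
    queue = deque([seed_url])
    seen = {seed_url}
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        archive[url] = html  # "capture" the page
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before queueing.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return archive


# A made-up three-page site standing in for the web.
pages = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "http://example.test/a": '<a href="/">home</a>',
    "http://example.test/b": "no links here",
}
archive = crawl("http://example.test/", pages.get)
print(sorted(archive))
```

A real crawler would also need to respect robots.txt, rate-limit its requests and store captures in a durable format; all of that is left out here.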
Many organizations employ web archiving within a specific field of study to build a large knowledge repository of the advances in that field published across the internet. On the research side, national libraries, historians, scientists and researchers are the key consumers of archived, culturally important web content. Commercial web archiving strategies, on the other hand, are employed by large organizations that need to archive content to meet legal and regulatory requirements, to support research and development, and to maintain their corporate heritage.
However, such a massive amount of information is only as good as its potential for reuse. Hence, effective web archiving should also address two key aspects: the readability and usability of the accumulated information, and the speed at which the data is made available. Let me elaborate on these two aspects of web archiving.
Web archiving is memory hungry! Also, companies would not be keen to invest millions of dollars in collecting information unless it is readily reportable. Therefore, modern state-of-the-art commercial web archiving solutions need servers that have not only hundreds of terabytes of disk space but also significant RAM capacity, so that data can be served at a rapid pace and usually on demand.
It's worth noting that such servers need not only enough memory to hold the web data but also plenty of memory bandwidth, so that the stored information can be indexed and data queries answered. Indexing consumes a lot of memory resources because it happens in near real time.
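As an illustration of why indexing is memory intensive, here is a minimal sketch of an inverted index, one common way to make stored text searchable. I am assuming this technique purely for illustration; the text above does not say which indexing method commercial products use, and the names `InvertedIndex` and `tokenize` are hypothetical. Every distinct term keeps its own set of document identifiers in memory, which is where the memory appetite comes from.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the set of document ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index one archived document as soon as it is captured."""
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return the ids of documents containing every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add("page-1", "Quarterly report on web archiving costs")
index.add("page-2", "Legal requirements for archiving corporate records")
index.add("page-3", "Quarterly legal review")

print(index.search("archiving"))
print(index.search("quarterly legal"))
```

Because `add` can be called as each page arrives, the index stays current in near real time, at the price of holding the postings in memory.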
As for the availability of the data for reporting purposes, organizations investing in commercial web archiving usually expect the data to be available anytime and anywhere, with reliable speed and accuracy. Simple as this request may seem, it comes at a cost, and often a significant one. Many modern commercial web archiving providers offer cloud-based solutions with multiple servers spread across the globe, at locations strategic to the sponsoring organization. This keeps the delay in fetching information to a minimum, because the database requests initiated by the end user are redirected to the nearest available location.
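The routing idea can be sketched as follows. This is a simplified illustration under my own assumptions: it picks the nearest available location by great-circle distance (the haversine formula), whereas production systems usually rely on DNS or network-level routing and measured latency. The server list, the approximate city coordinates and the names `distance_km` and `nearest_available` are all hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def distance_km(a, b):
    """Great-circle distance between two (latitude, longitude) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearest_available(user_location, servers):
    """Return the closest server that is currently marked as available."""
    candidates = [s for s in servers if s["available"]]
    if not candidates:
        raise LookupError("no archive location is currently available")
    return min(candidates,
               key=lambda s: distance_km(user_location, s["location"]))


# Hypothetical archive locations with approximate city coordinates.
servers = [
    {"name": "london", "location": (51.5074, -0.1278), "available": True},
    {"name": "new-york", "location": (40.7128, -74.0060), "available": True},
    {"name": "singapore", "location": (1.3521, 103.8198), "available": True},
]

paris = (48.8566, 2.3522)
print(nearest_available(paris, servers)["name"])
```

If the closest location is marked unavailable, the same call simply falls through to the next closest one.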
Small and medium-sized businesses often find such costs and the associated overheads unaffordable. This is why several commercial web archiving providers have opened up a SaaS (Software as a Service) channel, in which all the capital outlay (servers, operation, maintenance and so on) is borne by the provider. The customer, commonly known as the access seeker or the subscriber, pays a monthly fee for access to the provider's infrastructure.
The pricing for such a SaaS model varies widely, depending on the customer's requirements and the field in which the customer is seeking information. The more niche the area of specialized information, the more expensive the service usually is. Typically, subscribers pay a flat base rate regardless of usage, plus a charge based on bandwidth consumption. Such structures are agreed per licence, where one licence can be used by only one particular user on the customer's side. Some providers may also offer a simplified flat fee or a pay-as-you-go subscription.
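The typical structure described above amounts to simple arithmetic, sketched below. The function name `monthly_fee`, its parameters and the rates used are made up for illustration, and I am assuming that the base rate is charged per licence while bandwidth is charged once per account; real contracts may differ.

```python
def monthly_fee(licences, base_rate, bandwidth_gb, per_gb_rate):
    """Monthly charge: a flat base rate per licence plus a bandwidth charge.

    All figures are illustrative; base_rate is per licence per month and
    per_gb_rate is the charge for each gigabyte transferred in the month.
    """
    if licences < 0 or bandwidth_gb < 0:
        raise ValueError("licences and bandwidth_gb must not be negative")
    return licences * base_rate + bandwidth_gb * per_gb_rate


# Base rate plus bandwidth charge: 5 licences at 200 each, 300 GB at 0.50.
print(monthly_fee(licences=5, base_rate=200.0, bandwidth_gb=300, per_gb_rate=0.5))

# A simplified flat fee is the same formula with no bandwidth charge.
print(monthly_fee(licences=5, base_rate=250.0, bandwidth_gb=300, per_gb_rate=0.0))

# Pay-as-you-go is the same formula with no base rate.
print(monthly_fee(licences=5, base_rate=0.0, bandwidth_gb=300, per_gb_rate=1.5))
```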
Even though such commercial web archiving models may be complex to work out, they are a very popular option among companies that are reluctant to invest a fortune in start-up capital. For large enterprises, providers usually offer a discounted rate per licence, on the assumption that the account will have many users.