Over the last few years, data mining has emerged as an important field within information science. Exploring information has always been a challenging problem for researchers. Institutions have now shifted their focus from collecting and storing information toward exploring and exploiting it, and technology helps them access, analyze, summarize, and interpret that information intelligently and automatically. Data mining is defined as “the process of exploration and analysis of previously unknown, valid and actionable information by automatic and semiautomatic means from large quantities of data in order to discover meaningful patterns and rules”.
Need for Data Mining:
Data mining represents the most effective way to close the knowledge gap between the large volumes of data an organization holds and the knowledge it actually draws from them. It helps decision-makers extract intelligence from huge amounts of data, improving their understanding and making better use of the data warehouse. It also provides a strategic knowledge advantage in a fast-paced, changing information marketplace.
Steps of Data Mining:
The steps involved in extracting hidden knowledge from a data warehouse, an information file, or any other database are listed below:
- Identify the Objective: Be clear about what the analysis is meant to accomplish, and know the goal of the data mining in advance. Establish whether or not the goal is measurable.
- Select the Data: The next step is to select the data that will meet this goal. This may be a subset of the data warehouse or a data mart that contains the specific information needed.
- Prepare the Data: Once the data has been assembled, decide which attributes need to be converted into usable formats.
- Audit the Data: Evaluate the structure of the data in order to determine the appropriate tools. Balance an objective assessment of the data's structure against users’ need to understand the findings.
- Select the Tools: Two concerns drive the choice of an appropriate data mining tool: the organization's objectives and the structure of the data. Both should point to the same tool.
- Format the Solution: Together with the data audit, the business objectives and the chosen tool determine the format of the solution.
- Construct the Model: The core data mining work begins at this point. A common practice is to use a random number seed to split the data into a training set and a test set, build the model on the training set, and then evaluate it on the test set.
- Validate the Findings: Share and discuss the results of the analysis with the organization or a domain expert to ensure that the findings are correct and appropriate to the organization's goal.
- Deliver the Findings: Provide a final report to the organization or client. The report should document the entire data mining process, including data preparation, tools used, test results, source code, and rules.
- Integrate the Solution: Share the findings with all interested end-users in the appropriate organizational units. The results of the analysis may need to be incorporated into the company's business procedures.
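The "Construct the Model" step (seeded split, then build and evaluate) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions: the records are hypothetical (features, label) pairs, a simple nearest-centroid classifier stands in for whatever model a real data mining tool would build, and all function names are illustrative only.

```python
import random

# Illustrative sketch: the data and all function names below are hypothetical.

def train_test_split(records, test_fraction=0.25, seed=42):
    """Shuffle a copy of the records with a fixed seed, then split them."""
    rng = random.Random(seed)              # a fixed seed makes the split reproducible
    shuffled = list(records)               # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (training set, test set)

def fit_centroids(train):
    """Nearest-centroid model: the mean feature vector of each class."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, features):
    """Return the class whose centroid is nearest (squared Euclidean distance)."""
    return min(model, key=lambda label: sum((c - x) ** 2 for c, x in zip(model[label], features)))

def accuracy(model, test):
    """Share of test records whose predicted label matches the true label."""
    if not test:
        return 0.0
    return sum(predict(model, f) == label for f, label in test) / len(test)

# Hypothetical toy data: (features, label) pairs in two well-separated groups.
records = [((float(i), float(i)), "low") for i in range(10)] + \
          [((100.0 + i, 100.0 + i), "high") for i in range(10)]

train, test = train_test_split(records, test_fraction=0.25, seed=7)
model = fit_centroids(train)                           # build on the training set only
print(len(train), len(test), accuracy(model, test))    # evaluate on the held-out test set
```

Because the seed is fixed, rerunning the script reproduces the same split, so results can be compared across model revisions; trying a few different seeds is a quick check that the evaluation does not hinge on one lucky split.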
Data Mining in Libraries:
According to the fifth law of library science, “The library is a growing organism,” and the volume of library data is accordingly growing at an enormous rate. Administering the library efficiently and effectively, and extending its services, calls for library automation and e-libraries. But automating the library or building an e-library is not a complete solution unless we are also able to uncover the hidden information in the large databases they produce. Data mining can be applied to library data to do this in the following ways:
- Classification - Data mining can be used to develop programs that replace manual classification of library contents with automatic classification. Classification mimics library cataloging procedures by grouping structured and unstructured data according to criteria such as source (e.g., government bodies), document type (e.g., maps), language, subject, or any of a number of other criteria.
- Link analysis - Just as similar printed documents tend to cite similar references, and citation frequency is often taken to reflect a document's quality or importance, link analysis assumes that higher-quality or otherwise more desirable documents are generally linked to more often than other documents, and that the links in a document reveal something about its content. Link analysis can place frequently linked-to documents at the top of a list or identify documents that are associated with each other.
- Sequence analysis - Sequence analysis applies statistical analysis to the paths users follow when searching for information, in order to identify documents that are not linked to each other but that users are likely to want to read together.
- Summarization - Although machine-generated abstracts are inferior to human-written ones in readability and content, they can still help users decide which items they need. Abstract-generating software typically works by identifying significant words or phrases based on their position within the document or their association with critical phrases.
- Clustering - Clustering is similar to classification, except that the classes are not predetermined; they are found as natural groupings in the data, based on similarity or probabilistic analysis. Clustering and classification are often used as a starting point for exploring further relationships in data. For example, search engines such as Northern Light have grouped sites by location, subject, or language before sub-arranging the results.
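The Classification item describes replacing manual classification with automatic classification. One common way to assign subjects automatically is a naive Bayes text classifier trained on titles that catalogers have already classified. The sketch below is illustrative only: naive Bayes is an assumed choice of technique (the text does not name one), and the titles, subjects, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Illustrative sketch: function names and training titles are hypothetical.

def tokenize(text):
    """Lower-case a title and split it into words (deliberately simplistic)."""
    return text.lower().split()

def train_naive_bayes(examples):
    """Count words per subject from (title, subject) pairs classified by hand."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for title, subject in examples:
        class_counts[subject] += 1
        for word in tokenize(title):
            word_counts[subject][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary

def classify(model, title):
    """Return the subject with the highest log-probability for a new title."""
    class_counts, word_counts, vocabulary = model
    total_examples = sum(class_counts.values())
    best_subject, best_score = None, -math.inf
    for subject, n in class_counts.items():
        score = math.log(n / total_examples)          # prior: share of titles in this subject
        total_words = sum(word_counts[subject].values())
        for word in tokenize(title):
            # Add-one (Laplace) smoothing so an unseen word does not rule a subject out.
            score += math.log((word_counts[subject][word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_subject, best_score = subject, score
    return best_subject

# Hypothetical training set: titles already classified by a cataloger.
training = [
    ("introduction to organic chemistry", "Chemistry"),
    ("chemical reactions and bonding", "Chemistry"),
    ("physical chemistry of solutions", "Chemistry"),
    ("history of the roman empire", "History"),
    ("medieval european history", "History"),
    ("the history of ancient greece", "History"),
]
model = train_naive_bayes(training)
print(classify(model, "organic chemistry laboratory manual"))
print(classify(model, "a history of rome"))
```

A real system would train on far more records and richer fields (subject headings, abstracts), but the structure stays the same: learn from already-classified items, then assign new items to the most probable class.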
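The Link analysis item rests on two ideas: documents that are linked to more often rank higher, and documents linked to by the same sources are associated with each other. A minimal sketch of both, assuming a small hypothetical link graph, ranks documents by raw in-link count and counts co-citations. Weighted iterative schemes such as PageRank are a more elaborate alternative not shown here; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Illustrative sketch: the graph and function names are hypothetical.
# Each document maps to the documents it links to (or cites).
links = {
    "A": ["C", "D"],
    "B": ["C", "D", "E"],
    "C": ["D"],
    "E": ["C"],
}

def rank_by_inlinks(graph):
    """Rank documents by how many distinct documents link to them, most-linked first."""
    inlinks = Counter(target for targets in graph.values() for target in set(targets))
    return sorted(inlinks.items(), key=lambda item: (-item[1], item[0]))

def co_citation(graph):
    """Count, for each pair of documents, how many sources link to both of them."""
    pairs = Counter()
    for targets in graph.values():
        for pair in combinations(sorted(set(targets)), 2):
            pairs[pair] += 1
    return pairs

print(rank_by_inlinks(links))              # C and D are linked to most often
print(co_citation(links).most_common(1))   # the most strongly associated pair
```

The ranking corresponds to placing frequently linked-to documents at the top of a list; the co-citation counts correspond to identifying documents that are associated with each other.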
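The Sequence analysis item can be illustrated with a simple frequent-pair count over user paths. This sketch assumes hypothetical session logs in which each session is the ordered list of documents one user viewed; it reports pairs (a, b) where b was viewed after a in at least `min_support` sessions and the two documents are not already linked. The function and parameter names are illustrative, and fuller sequential-pattern algorithms also handle sequences longer than two.

```python
from collections import Counter

# Illustrative sketch: function names and session data are hypothetical.

def followed_by(sessions, linked_pairs, min_support=2):
    """Find ordered pairs (a, b) where users who viewed a later viewed b.

    sessions: list of paths, each the ordered list of documents one user viewed.
    linked_pairs: set of frozensets naming documents that are already linked.
    min_support: minimum number of sessions in which the pair must occur.
    """
    counts = Counter()
    for path in sessions:
        seen = set()                      # count each ordered pair once per session
        for i, first in enumerate(path):
            for later in path[i + 1:]:
                if first != later and (first, later) not in seen:
                    seen.add((first, later))
                    counts[(first, later)] += 1
    # Keep frequent pairs that are not already connected by a link.
    return {
        pair: n for pair, n in counts.items()
        if n >= min_support and frozenset(pair) not in linked_pairs
    }

# Hypothetical search sessions and existing links.
sessions = [
    ["D1", "D2", "D3"],
    ["D1", "D3"],
    ["D1", "D2"],
    ["D2", "D3"],
    ["D1", "D4", "D3"],
]
linked = {frozenset({"D1", "D2"})}
print(followed_by(sessions, linked))
```

Raising `min_support` trades recall for confidence: fewer suggestions, each backed by more sessions.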
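The Summarization item describes abstract generators that pick out significant words based on position and on association with critical phrases. Below is a minimal extractive sketch in that spirit, assuming that significance means the frequency of non-stopword words, that the first sentence gets a fixed position bonus, and that a hypothetical list of cue phrases marks critical sentences. The weights (1.5 and 2.0), the stopword list, and the cue list are arbitrary illustrative choices.

```python
import re
from collections import Counter

# Illustrative sketch: word lists, weights, and function names are hypothetical.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for",
             "on", "it", "that", "this", "with", "as", "by", "be"}
CUE_PHRASES = ("in conclusion", "we propose", "this paper")   # hypothetical cue list

def content_words(text):
    """Lower-cased alphabetic words, minus common stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def summarize(text, n_sentences=2):
    """Extract the n highest-scoring sentences, kept in their original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(content_words(text))        # significant words = frequent content words

    def score(index, sentence):
        tokens = content_words(sentence)
        s = sum(freq[w] for w in tokens) / (len(tokens) or 1)
        if index == 0:
            s *= 1.5       # position: the opening sentence often states the topic
        if any(cue in sentence.lower() for cue in CUE_PHRASES):
            s *= 2.0       # association with a critical (cue) phrase
        return s

    ranked = sorted(range(len(sentences)), key=lambda i: score(i, sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))

text = ("Libraries collect circulation data every day. "
        "The weather was pleasant on Tuesday. "
        "In conclusion, circulation data helps libraries plan collections.")
print(summarize(text))
```

Because it only copies existing sentences, this kind of summary can read less smoothly than a human abstract, which is the readability trade-off the item mentions.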
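The Clustering item describes finding natural groupings rather than assigning items to predetermined classes. The sketch below uses k-means, a common distance-based clustering algorithm, with deterministic farthest-first seeding; this is a swapped-in technique, and probabilistic alternatives such as Gaussian mixture models also exist. The points are hypothetical items described by two numeric features, and the function names are illustrative.

```python
# Illustrative sketch: data and function names are hypothetical.

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iterations=20):
    """Group points into k clusters with Lloyd's k-means algorithm."""
    centroids = farthest_first(points, k)
    for _ in range(iterations):
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:       # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical items described by two numeric features, forming two natural groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

Nothing tells the algorithm which points belong together; the two groups emerge from the data, which is the difference from classification that the item describes.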
Future of Data Mining in Library Work:
In the future, data mining can provide a new road map for the next generation of libraries through its application to the following library activities:
- Reference Service - Library data is growing continuously at an exponential rate, and the main problem is how to locate the required information within a large amount of redundant information. Data mining techniques make this possible, which is why data mining can be seen as the future of reference service.
- Classification - Computer-assisted classification will replace manual classification of library content, so that less skilled staff can carry out the task quickly and efficiently. This will simplify the library's classification work.
- Acquisition - The third law of library science states, “Every book its reader.” By applying data mining to library data, it becomes easy to find out which content needs to be acquired next. This will reduce the acquisition workload of library staff and make more efficient use of the library's budget.
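The Acquisition item suggests that library data can reveal what to acquire next. One simple way to act on that, offered as a hypothetical heuristic rather than a standard method, is to rank subjects by demand (loans plus unmet requests) per copy already owned. The data, the meaning of each input, the scoring rule, and the function name are all assumptions for illustration.

```python
from collections import Counter

# Illustrative sketch: the scoring rule, data, and function name are hypothetical.

def acquisition_priorities(loans, holdings, unmet_requests=()):
    """Rank subjects by demand per copy held.

    loans: one subject label per loan transaction.
    holdings: dict mapping subject -> number of copies the library owns.
    unmet_requests: one subject label per request the library could not satisfy.
    """
    demand = Counter(loans)
    demand.update(unmet_requests)                    # unmet requests also signal demand
    scores = {
        subject: count / max(holdings.get(subject, 0), 1)   # avoid dividing by zero copies
        for subject, count in demand.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical circulation extract.
loans = ["Data Science"] * 12 + ["History"] * 10 + ["Poetry"] * 2
holdings = {"Data Science": 3, "History": 10, "Poetry": 4}
unmet = ["Data Science"] * 3 + ["Robotics"] * 2
for subject, score in acquisition_priorities(loans, holdings, unmet):
    print(f"{subject}: {score:.2f} uses per copy")
```

Subjects with heavy demand on few copies, or demand with no copies at all, rise to the top of the list, which is where a limited budget would be spent first under this assumed rule.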
Data mining is widely used in applications, especially in electronic commerce, personalized environments, and search engines. In recent years it has also been applied to other domains, such as bioinformatics, digital libraries, and web-based learning.