Abstract: This paper provides an overview of data warehousing, data mining, OLAP, and OLTP technologies, exploring their features, new applications, and architecture. The fifth section opens a window to the future of data warehousing. The sixth section deals with On-Line Analytical Processing (OLAP).
On-Line Analytical Processing (OLAP) is a category of software technology used to perform complex analysis of the data in a data warehouse.
Keywords: Data Warehousing, OLAP, OLTP, Data Mining, Decision Making.
It is used for building, maintaining, managing, and using the data warehouse. Technical metadata contains information about warehouse data for use by warehouse designers and administrators when carrying out warehouse development and management tasks.
Technical metadata documents include information about data sources and transformation descriptions. Business metadata contains information that gives users an easy-to-understand perspective of the information stored in the data warehouse.
Business metadata also covers Internet home pages; other information to support all data warehousing components (for example, information related to the information delivery system); and data warehouse operational information. Metadata repository management software can be used to map the source data to the target database, generate code for data transformations, integrate and transform the data, and control moving data to the warehouse.
One of the important functional components of the metadata repository is the information directory. From a technical requirements point of view, the information directory and the entire metadata repository should be a gateway to the data warehouse environment, and thus should be accessible from any platform via transparent and seamless connections; should support easy distribution and replication of its content for high performance and availability; should be searchable by business-oriented keywords; and should act as a launch platform for end-user data access and analysis tools.
(Kapil Tomar, IT Deptt., AKGEC)
The directory should also support the sharing of information objects such as queries, reports, data collections, and subscriptions between users; a variety of scheduling options for requests against the data warehouse, including on-demand, one-time, repetitive, event-driven, and conditional delivery (in conjunction with the information delivery system); and the distribution of query results to one or more destinations in any of the user-specified formats (in conjunction with the information delivery system).
It should also support and provide interfaces to other applications such as e-mail, spreadsheets, and schedulers.
Access Tools
The principal purpose of data warehousing is to provide information to business users for strategic decision making. These users interact with the data warehouse using front-end tools.
For the purpose of this discussion, let's divide these tools into five main groups: data query and reporting tools, application development tools, executive information system (EIS) tools, on-line analytical processing tools, and data mining tools. Reporting tools can be divided into production reporting tools and report writers.
Production reporting tools let companies generate regular operational reports or support high-volume batch jobs, such as calculating and printing paychecks. Report writers, on the other hand, are inexpensive desktop tools designed for end users. Managed query tools shield end users from the complexities of SQL and database structures by inserting a metalayer between users and the database. The metalayer is the software that provides subject-oriented views of a database and supports point-and-click creation of SQL.
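As a rough illustration of the metalayer idea, the sketch below maps business-friendly field names to physical columns and turns a user's selections into SQL. The table names, column names, and the `build_query` helper are all hypothetical; real managed query tools are considerably more elaborate.

```python
# Hypothetical metalayer: business-friendly names mapped to physical columns.
# None of these table or column names come from the source text.
METALAYER = {
    "Customer Name": "cust.cust_nm",
    "Region": "cust.rgn_cd",
    "Sales Amount": "ord.sale_amt",
}
FROM_CLAUSE = "customers AS cust JOIN orders AS ord ON ord.cust_id = cust.cust_id"


def build_query(selected_fields, filters=None):
    """Translate point-and-click selections into a parameterized SQL string."""
    columns = [f'{METALAYER[name]} AS "{name}"' for name in selected_fields]
    sql = f"SELECT {', '.join(columns)} FROM {FROM_CLAUSE}"
    params = []
    if filters:
        conditions = []
        for name, value in filters.items():
            # Values are passed as parameters rather than spliced into the SQL.
            conditions.append(f"{METALAYER[name]} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


sql, params = build_query(["Customer Name", "Sales Amount"], {"Region": "SOUTH"})
```

The end user only ever sees the keys of `METALAYER`; the physical schema can change without changing what the user selects.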
OLAP tools are based on the concepts of multidimensional databases and allow a sophisticated user to analyze the data using elaborate, multidimensional views. Typical business applications for these tools include product performance and profitability, effectiveness of a sales program or a marketing campaign, sales forecasting, and capacity planning.
Knowing this information, an organization can formulate effective business, marketing, and sales strategies; precisely target promotional activity; discover and penetrate new markets; and successfully compete in the marketplace from a position of informed strength.
A relatively new and promising technology aimed at achieving this strategic advantage is known as data mining. The goal of knowledge discovery is to determine explicit hidden relationships, patterns, or correlations from data stored in an enterprise's database.
Specifically, data mining can be used to perform tasks such as segmentation.
A rigorous definition of the term data mart is a data store that is subsidiary to a data warehouse of integrated data. The data mart is directed at a partition of data (often called a subject area) that is created for the use of a dedicated group of users. A data mart might, in fact, be a set of denormalized, summarized, or aggregated data.
Sometimes, such a set could be placed on the data warehouse database rather than a physically separate store of data. In most instances, however, the data mart is a physically separate store of data and is normally resident on a separate database server, often on the local area network serving a dedicated user group.
The data mart is often a necessary and valid solution to a pressing business problem, thus achieving the goal of rapid delivery of enhanced decision support functionality to end users. The business drivers underlying such developments include extremely urgent user requirements and the absence of a budget for a full data warehouse strategy. In summary, data marts present two problems: (1) scalability in situations where an initial small data mart grows quickly in multiple dimensions and (2) data integration.
Therefore, when designing data marts, the organizations should pay close attention to system scalability, data consistency, and manageability issues. The key to a successful data mart strategy is the development of an overall scalable data warehouse architecture; and the key step in that architecture is identifying and implementing the common dimensions.
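The role of common dimensions can be sketched with two small data marts that share one product dimension. This is a minimal illustration using Python's built-in `sqlite3` module; the table names, columns, and sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One shared (common) dimension, used by both data marts.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
-- Hypothetical fact tables for a sales mart and an inventory mart.
CREATE TABLE sales_mart_fact (
    product_key INTEGER REFERENCES dim_product(product_key), units_sold INTEGER);
CREATE TABLE inventory_mart_fact (
    product_key INTEGER REFERENCES dim_product(product_key), units_on_hand INTEGER);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO sales_mart_fact VALUES (1, 30), (1, 20), (2, 5);
INSERT INTO inventory_mart_fact VALUES (1, 100), (2, 40);
""")


def sold_vs_on_hand(connection):
    """Combine both marts through the shared product dimension."""
    query = """
    SELECT p.product_name, s.total_sold, i.total_on_hand
    FROM dim_product AS p
    JOIN (SELECT product_key, SUM(units_sold) AS total_sold
          FROM sales_mart_fact GROUP BY product_key) AS s
      ON s.product_key = p.product_key
    JOIN (SELECT product_key, SUM(units_on_hand) AS total_on_hand
          FROM inventory_mart_fact GROUP BY product_key) AS i
      ON i.product_key = p.product_key
    ORDER BY p.product_name
    """
    return connection.execute(query).fetchall()


rows = sold_vs_on_hand(conn)
```

Because both marts use the same `product_key`, their figures can be compared directly; had each mart defined its own product codes, this query would first need a reconciliation step, which is the data integration problem noted above.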
Data Warehouse Administration and Management
Managing a data warehouse includes security and priority management; monitoring updates from multiple sources; data quality checks; managing and updating metadata; auditing and reporting data warehouse usage and status (for managing the response time and resource utilization, and providing chargeback information); purging data; replicating, subsetting, and distributing data; backup and recovery; and data warehouse storage management.
Data warehousing technologies belong to just one of the many components in IT architecture. This chapter aims to define how data warehousing fits within the overall IT architecture, in the hope that IT professionals will be better positioned to use and integrate data warehousing technologies with the other IT components used by the enterprise.
The business requirements of an enterprise are constantly changing, and the changes are coming at an exponential rate. Business requirements have, over the years, evolved from the day-to-day clerical recording of transactions to the automation of business processes.
Exception reporting has shifted from tracking and correcting daily transactions that have gone astray to the development of self-adjusting business processes. Technology has likewise advanced by delivering exponential increases in computing power and communications capabilities. However, for all these advances in computing hardware, a significant lag exists in the realms of software development and architecture definition. Enterprise Architectures thus far have displayed a general inability to gracefully evolve in line with business requirements, without either compromising on prior technology investments or seriously limiting their own ability to evolve further.
In hindsight, the evolution of the typical Enterprise Architecture reflects the continuous, piecemeal efforts of IT professionals to take advantage of the latest technology to improve the support of business operations. Unfortunately, this piecemeal effort has often resulted in a morass of incompatible components. The IT professional therefore faces two challenges: meeting business requirements through Information Technology and integrating new technology into the existing Enterprise Architecture.
Meet Business Requirements
The IT professional must ensure that the enterprise IT infrastructure properly supports a myriad of requirements from different business users, each of whom has different and constantly changing needs, as illustrated in Figure 1.
[Figure 1. Different Business Needs. Sample requests: "We need to get this modified order quickly to our European supplier"; "I need to find out why our sales in the South are dropping"; "Someone from XYZ, Inc."]
Take Advantage of Technology Advancements
At the same time, the IT professional must also constantly learn new buzzwords, review new methodologies, evaluate new tools, and maintain ties with technology partners. Not all the latest technologies are useful; the IT professional must first sift through the technology jigsaw puzzle (see Figure 1.).
At this point, therefore, it is prudent to step back, assess the current state of affairs, and identify the distinct but related components of modern Enterprise Architectures. The two orthogonal perspectives of business and technology are merged to form one unified framework, as shown in Figure 1.
[Figure: The InfoMotion Enterprise Architecture]
Operational Technology supports the smooth execution and continuous improvement of day-to-day operations, the identification and correction of errors through exception reporting and workflow management, and the overall monitoring of operations.
Information retrieved about the business from an operational viewpoint is used to either complete or optimize the execution of a business process. Decisional Technology supports managerial decision-making and long-term planning.
Decision-makers are provided with views of enterprise data from multiple dimensions and in varying levels of detail. Historical patterns in sales and other customer behavior are analyzed. Decisional systems also support decision-making and planning through scenario-based modeling, what-if analysis, trend analysis, and rule discovery.
Informational Technology makes current, relatively static information widely and readily available to as many people as need access to it. Examples include company policies, product and service information, organizational setup, office location, corporate forms, training materials, and company profiles.
Virtual Corporation Technology enables the creation of strategic links with key suppliers and customers to better meet customer needs.
In the past, such links were feasible only for large companies because of economy of scale. Now, the affordability of Internet technology provides any enterprise with this same capability.
Operational Needs
Legacy Systems
The term legacy system refers to any information system currently in use that was built using previous technology generations.
Most legacy systems are operational in nature, largely because the automation of transaction-oriented business processes had long been the priority of Information Technology projects. These databases were traditionally passive repositories of data manipulated by business applications. It is not unusual to find legacy systems with processing logic and business rules contained entirely in the user interface or randomly interspersed in procedural code.
IT professionals are now able to bullet-proof the application by placing processing logic in the database itself. This contrasts with the still-popular practice of replicating processing logic (sometimes in an inconsistent manner) across the different parts of a client application or across different client applications that update the same database. Through active databases, applications are more robust and conducive to evolution.
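One common way to place processing logic in the database itself is a trigger. The sketch below uses SQLite through Python's `sqlite3` module to enforce a made-up "no overdraft" rule; the `accounts` table, the rule, and the `withdraw` helper are assumptions chosen for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, balance REAL NOT NULL);

-- The business rule lives in the database, so every client application
-- that updates this table is subject to it.
CREATE TRIGGER no_overdraft
BEFORE UPDATE OF balance ON accounts
FOR EACH ROW WHEN NEW.balance < 0
BEGIN
    SELECT RAISE(ABORT, 'overdraft not allowed');
END;

INSERT INTO accounts VALUES (1, 100.0);
""")


def withdraw(connection, account_id, amount):
    """Attempt a withdrawal; return False if the database rejects it."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE account_id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.DatabaseError:
        return False


first = withdraw(conn, 1, 30.0)    # allowed by the trigger
second = withdraw(conn, 1, 500.0)  # rejected by the trigger
balance = conn.execute(
    "SELECT balance FROM accounts WHERE account_id = 1").fetchone()[0]
```

Note that `withdraw` contains no balance check of its own: a second client application written without any knowledge of the rule would be constrained in exactly the same way.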
Unlike the databases of OLTP applications (which are function-oriented), the Operational Data Store contains subject-oriented, volatile, and current enterprise-wide detailed information; it serves as a system of record that provides comprehensive views of data in operational systems. Data are transformed and integrated into a consistent, unified whole as they are obtained from legacy and other operational systems to provide business users with an integrated and current view of operations (see Figure 1.).
Data in the Operational Data Store are constantly refreshed so that the resulting image reflects the latest state of operations.
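A constantly refreshed "current image" can be sketched as an upsert that keeps only the latest state of each record. The record layout (`order_id`, `status`, `updated_at`) and the `refresh_ods` function are hypothetical, and a real Operational Data Store would also transform and integrate the incoming data.

```python
def refresh_ods(ods, source_records):
    """Apply source records to the ODS, keeping the most recent state per key.

    `ods` is a dict keyed by order_id; timestamps are ISO-8601 strings,
    which compare correctly as plain strings.
    """
    for record in source_records:
        key = record["order_id"]
        current = ods.get(key)
        if current is None or record["updated_at"] >= current["updated_at"]:
            ods[key] = record
    return ods


ods = {}
refresh_ods(ods, [
    {"order_id": 1, "status": "entered", "updated_at": "2024-01-05T09:00:00"},
    {"order_id": 2, "status": "entered", "updated_at": "2024-01-05T09:30:00"},
])
# A later refresh overwrites order 1; the stale update for order 2 is ignored.
refresh_ods(ods, [
    {"order_id": 1, "status": "shipped", "updated_at": "2024-01-06T14:00:00"},
    {"order_id": 2, "status": "draft", "updated_at": "2024-01-04T08:00:00"},
])
```

Unlike a data warehouse, nothing historical is retained here: each refresh replaces the previous state, which is what makes the store volatile.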
The business user obtains a constantly refreshed, enterprise-wide view of operations without creating unwanted interruptions or additional load on transaction processing systems.
Workflow Management and Groupware
Workflow management systems are tools that allow groups to communicate and coordinate their work. Early incarnations of this technology supported group scheduling, e-mail, online discussions, and resource sharing. More advanced implementations of this technology are integrated with OLTP applications to support the execution of business processes.
Decisional Needs
Data Warehouse
The data warehouse concept developed as IT professionals increasingly realized that the structure of data required for transaction reporting was significantly different from the structure required to analyze data.
It was designed to contain summarized, historical views of data in production systems. This collection provides business users and decision-makers with a cross functional, integrated, subject-oriented view of the enterprise.
The introduction of the Operational Data Store has now caused the data warehouse concept to evolve further.
The data warehouse now contains summarized, historical views of the data in the Operational Data Store. In doing so, the enterprise obtains the information required for long term and historical analysis, decision-making, and planning.
User-friendly formats, such as graphs and charts, are frequently employed to quickly convey meaningful data relationships. They provide users with a new ability to both explore and publish information with relative ease. Unlike other technologies, web technology makes any user an instant publisher by enabling the distribution of knowledge and expertise, with no more effort than it takes to record the information in the first place.
Maintenance and update of information is straightforward since the information is stored on the web server.
Virtual Corporation Needs
Transactional Web Services and Scripts
Several factors now make Internet technology and electronic commerce a realistic option for enterprises that wish to use the Internet for business transactions.
The increasing affordability of Internet access allows businesses to establish cost-effective and strategic links with business partners. This option was originally open only to large enterprises through expensive, dedicated wide-area networks or metropolitan area networks. Improved security and encryption for sensitive data now provide customers with the confidence to transact over the Internet.
At the same time, improvements in security provide the enterprise with the confidence to link corporate computing environments to the Internet. Improved user-friendliness and navigability from web technology make Internet technology and its use within the enterprise increasingly popular. The majority of the architectural components support the enterprise at the operational level. However, separate components are now clearly defined for decisional and information purposes, and the virtual corporation becomes possible through Internet technologies.
One example is the group of applications collectively known as office productivity tools such as Microsoft Office or Lotus SmartSuite. Components of this type can and should be used across the various layers of the Enterprise Architecture and, therefore, are not described here as a separate item.
Which are recommended for fulfilling those needs.
Legacy Integration
The Need
The integration of new and legacy systems is a constant challenge because of the architectural templates upon which legacy systems were built. Legacy systems often attempt to meet all types of information requirements through a single architectural component; consequently, these systems are brittle and resistant to evolution.
Despite attempts to replace them with new applications, many legacy systems remain in use because they continue to meet a set of business requirements. Legacy programs that produce and maintain summary information are migrated to the data warehouse.
Historical data are likewise migrated to the data warehouse. Data required for operational monitoring are moved to the Operational Data Store. Table 1. The Operational Data Store and the data warehouse present IT professionals with a natural migration path for legacy migration. By migrating legacy systems to these two components, enterprises can gain a measure of independence from legacy components that were designed with old, possibly obsolete, technology.
Legacy systems are typically structured around functional or organizational areas, in contrast to the cross-functional view required by operations monitoring. Different and potentially incompatible technology platforms may have been used for different systems.
Data may be available in legacy databases but are not extracted in the format required by business users. Or data may be available but may be too raw to be of use for operational decision-making (further summarization, calculation, or conversion is required). Lastly, several systems may contain data about the same item but may examine the data from different viewpoints or at different time frames, therefore requiring reconciliation.
[Figure: Legacy Integration: Architectural View]
The Recommended Approach
An integrated view of current, operational information is required for the successful monitoring of operations. Instead, an Operational Data Store, coupled with flash monitoring and reporting tools, as shown in Figure 1.
Like a dashboard on a car, flash monitoring and reporting tools keep business users apprised of the latest cross-functional status of operations. These tools obtain data from the Operational Data Store, which is regularly refreshed with the latest information from legacy and other operational systems. Business users are consequently able to step in and correct problems in operations while they are still small or, better, to prevent problems from occurring altogether.
Once alerted to a potential problem, the business user can manually intervene or make use of automated tools.
[Figure: Operational Monitoring]
Technology advances have made it possible to build and modify systems quickly in response to changes in business processes.
New policies, procedures and controls are supported and enforced by the systems.
In addition, workflow management systems can be used to supplement OLTP applications. A workflow management system converts business activities into a goal-directed process that flows through the enterprise in an orderly fashion (see Figure 1.). The workflow management system alerts users through the automatic generation of notification messages or reminders and routes work so that the desired business result is achieved in an expedited manner.
[Figure: Process Implementation]
Decision Support
The Need
It is not possible to anticipate the information requirements of decision-makers for the simple reason that their needs depend on the business situation that they face.
Decision-makers need to review enterprise data from different dimensions and at different levels of detail to find the source of a business problem before they can attack it. They likewise need information for detecting business opportunities to exploit.
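Reviewing data "from different dimensions and at different levels of detail" amounts to regrouping the same facts by different sets of attributes. The sketch below uses invented sales figures and a hypothetical `summarize` helper to move from a regional total down to product and quarter.

```python
from collections import defaultdict

# Invented sample data; each row is one summarized sales fact.
SALES = [
    {"region": "South", "product": "Widget", "quarter": "Q1", "amount": 120},
    {"region": "South", "product": "Widget", "quarter": "Q2", "amount": 80},
    {"region": "South", "product": "Gadget", "quarter": "Q1", "amount": 50},
    {"region": "South", "product": "Gadget", "quarter": "Q2", "amount": 55},
    {"region": "North", "product": "Widget", "quarter": "Q1", "amount": 100},
    {"region": "North", "product": "Widget", "quarter": "Q2", "amount": 110},
]


def summarize(rows, dimensions):
    """Total the amounts, grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["amount"]
    return dict(totals)


# Highest level: one figure per region.
by_region = summarize(SALES, ["region"])
# One level down: the South declines from Q1 to Q2.
by_region_quarter = summarize(SALES, ["region", "quarter"])
# Drill into the South by product to locate the source of the decline.
south = [row for row in SALES if row["region"] == "South"]
south_detail = summarize(south, ["product", "quarter"])
```

In this made-up data the drill-down shows that the decline in the South comes entirely from Widget sales, which is the kind of finding a fixed operational report would not surface.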
Decision-makers also need to analyze trends in the performance of the enterprise. Rather than waiting for problems to present themselves, decision-makers need to proactively mobilize the resources of the enterprise in anticipation of a business situation.
Alternatively, the IT professional is pressured to produce an ad hoc report from legacy systems as quickly as possible. If unlucky, the IT professional will find that the data needed for the report are scattered throughout different legacy systems. An even unluckier one may find that the processing required to produce the report will take a toll on the operations of the enterprise.
These delays are not only frustrating for both the decision-maker and the IT professional, but also dangerous for the enterprise. The information that eventually reaches the decision-maker may be inconsistent, inaccurate, or, worse, obsolete.
The Recommended Approach
Decision support applications (or OLAP) that obtain data from the data warehouse are recommended for this particular need.
The data warehouse holds transformed and integrated enterprise-wide operational data appropriate for strategic decision-making, as shown in Figure 1. The data warehouse also contains data obtained from external sources, whenever these data are relevant to decision-making.
Decision support applications analyze and make data warehouse information available in formats that are readily understandable by decision-makers.
Hyperdata Distribution
The Need
Past informational requirements were met by making data available in physical form through reports, memos, and company manuals.
This practice resulted in an overflow of documents providing much data and not enough information. Enterprises encountered problems in keeping different versions of related items synchronized. There was a constant need to update, republish, and redistribute documents.
[Figure: Decision Support: Architectural View]
In response to this problem, enterprises made data available to users over a network to eliminate the paper. It was hoped that users could selectively view the data whenever they needed it.
This approach likewise proved to be insufficient because users still had to navigate through a sea of data to locate the specific item of information that was needed.
The Recommended Approach
Users need the ability to browse through nonlinear presentations of data. Web technology is particularly suitable to this need because of its extremely flexible and highly visual method of organizing information (see Figure 1.).
Users are therefore able to locate information with relative ease.
[Figure: Hyperdata Distribution: Architectural View]
Virtual Corporation
The Need
A virtual corporation is an enterprise that has extended its business processes to encompass both its key customers and suppliers. Its business processes are newly redesigned; its product development or service delivery is accelerated to better meet customer needs and preferences; its management practices promote new alignments between management and labor, as well as new linkages among enterprise, supplier, and customer.
A new level of cooperation and openness is created and encouraged between the enterprise and its key business partners.
The Recommended Approach
Partnerships at the enterprise level translate into technological links between the enterprise and its key suppliers or customers (see Figure 1.). Information required by each party is identified, and steps are taken to ensure that this data crosses organizational boundaries properly.
Some organizations seek to establish a higher level of cooperation with their key business partners by jointly redesigning their business processes to provide greater value to the customer. Internet and web technologies are well suited to support redesigned, transactional processes, thanks to decreasing Internet costs, improved security measures, improved user-friendliness, and navigability.
[Figure: Virtual Corporation: Architectural View]
The strategies presented in the previous section enable organizations to move from their current technology architectures into the InfoMotion Enterprise Architecture.
This section describes the tasks for any migration effort.
Review the Current Enterprise Architecture
As simple as this may sound, the starting point is a review of the current Enterprise Architecture.
It is important to have an idea of what is already available before planning for further achievements. The IT department or division should have this information readily available, although it may not necessarily be expressed in terms of the architectural components identified above.
A short and simple exercise of mapping the current architecture of an enterprise to the architecture described above should quickly highlight any gaps in the current architecture. Gaps should cause concern only if the absence of an architectural component prevents the IT infrastructure from meeting present requirements or from supporting long-term strategies.
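The mapping exercise can be pictured as a simple set comparison. In the sketch below, the reference component list is an illustrative paraphrase of the components discussed in this chapter, and `find_gaps` is a hypothetical helper; neither is an official checklist.

```python
# Illustrative reference list, paraphrased from the components named above.
REFERENCE_COMPONENTS = {
    "legacy systems",
    "OLTP applications",
    "active database",
    "operational data store",
    "flash monitoring and reporting",
    "workflow management",
    "data warehouse",
    "decision support applications",
    "informational web services",
    "transactional web services",
}


def find_gaps(current_components, required_by_strategy):
    """Return (all gaps, gaps that matter for present needs or strategy)."""
    gaps = REFERENCE_COMPONENTS - set(current_components)
    concerns = gaps & set(required_by_strategy)
    return sorted(gaps), sorted(concerns)


gaps, concerns = find_gaps(
    current_components=[
        "legacy systems", "OLTP applications",
        "active database", "workflow management",
    ],
    required_by_strategy=["operational data store", "data warehouse"],
)
```

Here the enterprise is missing six components, but only two of them are cause for concern, because only those two are needed by its current requirements and strategies.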
For example, if transactional web scripts are not critical to an enterprise given its current needs and strategies, there should be no cause for concern.
Develop a Migration Plan Based on Requirements
It is not advisable for an enterprise to use this list of architectural gaps to justify a dramatic overhaul of its IT infrastructure; such an undertaking would be expensive and would cause unnecessary disruption of business operations. Instead, the enterprise would do well to develop a migration plan that consciously maps coming IT projects to the InfoMotion Enterprise Architecture.
The Natural Migration Path
While developing the migration plan, the enterprise should consider the natural migration path that the InfoMotion architecture implies, as illustrated in Figure 1. For most companies, this core layer is where the majority of technology investments have been made. It should also be the starting point of any architecture migration effort.
It also provides the succeeding technology layers with a more stable foundation for future evolution. The scenarios provide generic roadmaps that address typical architectural needs. The migration plan, however, must be customized to address the specific needs of the enterprise.
Each project defined in the plan must individually contribute to the enterprise in the short term, while laying the groundwork for achieving long-term enterprise and IT objectives.
By incrementally migrating its IT infrastructure (one component and one project at a time), the enterprise will find itself slowly but surely moving towards a modern, resilient Enterprise Architecture, with minimal and acceptable levels of disruption in operations.
Monitor and Update the Migration Plan
The migration plan must be monitored, and the progress of the different projects fed back into the planning task.
One must not lose sight of the fact that a modern Enterprise Architecture is a moving target; new technology inevitably calls for continuous evolution of the Enterprise Architecture.
In Summary
An enterprise has longevity in the business arena only when its products and services are perceived by its customers to be of value. Likewise, Information Technology has value in an enterprise only when its cost is outweighed by its ability to increase and guarantee quality, improve service, cut costs, or reduce cycle time, as depicted in Figure 1.
The Enterprise Architecture is the foundation for all Information Technology efforts. It therefore must provide the enterprise with the ability to: These requirements form the basis for the InfoMotion equation, shown in Figure 1. It also defines data warehousing concepts and cites the typical reasons for building data warehouses. The differences in operational and decisional information requirements presented new challenges that old computing practices could not meet.
Below, we elaborate on how this change in computing focus became the impetus for the development of data warehousing technologies.
[Figure: The Business Cycle]
In Chapter 1, it is noted that much of the effort and money in computing has been focused on meeting the operational business requirements of enterprises.
After all, without the OLTP applications that record thousands, even millions, of discrete transactions each day, it would not be possible for any enterprise to meet customer needs while enforcing business policies consistently.
Nor would it be possible for an enterprise to grow without significantly expanding its manpower base. With operational systems deployed and day-to-day information needs being met by the OLTP systems, the focus of computing has in recent years shifted naturally to meeting the decisional business requirements of an enterprise (see Figure 2.). Decision-makers themselves cannot be expected to know their information requirements ahead of time; they review enterprise data from different perspectives and at different levels of detail to find and address business problems as the problems arise.
Decision-makers also need to look through business data to identify opportunities that can be exploited. They examine performance trends to identify business situations that can provide competitive advantage, improve profits, or reduce costs.
They analyze market data and make the tactical as well as strategic decisions that determine the course of the enterprise.
Operational Systems Fail to Provide Decisional Information
Since these information requirements cannot be anticipated, operational systems (which correctly focus on recording and completing different types of business transactions) are unable to provide decision-makers with the information they need. As a result, business managers fall back on the time-consuming and often frustrating process of going through operational inquiries or reports already supported by operational systems in an attempt to find or derive the information they really need.
Alternatively, IT professionals are pressured to produce an ad hoc report from the operational systems as quickly as possible. It is not unusual for the IT professional to find that the data needed to produce the report are scattered throughout different operational systems and must first be carefully integrated. Worse, it is likely that the processing required to extract the data from each operational system will demand so much of the system resources that the IT professional must wait until non-operational hours before running the queries required to produce the report.
Those delays are not only time-consuming and frustrating both for the IT professionals and the decision-makers, but also dangerous for the enterprise. When the report is finally produced, the data may be inconsistent, inaccurate, or obsolete.
There is also the very real possibility that this new report will trigger the request for another ad hoc report.
Decisional Systems Have Evolved to Meet Decisional Requirements
Over the years, decisional systems have been developed and implemented in the hope of meeting these information needs. Most decisional systems, however, have failed to deliver on their promises. Each query frequently returns a large result set and involves frequent full table scans and multi-table joins.
What is a data warehouse?
William H.
Integrated
A data warehouse contains data extracted from the many operational systems of the enterprise, possibly supplemented by external data. For example, a typical banking data warehouse will require the integration of data drawn from the deposit systems, loan systems, and the general ledger.
Each of these operational systems records different types of business transactions and enforces the policies of the enterprise regarding these transactions. If each of the operational systems has been custom built or an integrated system is not implemented as a solution, then it is unlikely that these systems are integrated.
Thus, Customer A in the deposit system and Customer B in the loan system may be one and the same person, but there is no automated way for anyone in the bank to know this.
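The integration step described here can be illustrated with a minimal sketch. It assumes, purely for illustration, that both systems record a customer name and a date of birth and that these two fields are enough to recognize the same person; the record layouts, identifiers, and the `match_key` rule are hypothetical, and real integration work usually needs far more careful matching.

```python
from collections import defaultdict


def match_key(record):
    """Build a crude matching key from name and birth date (an assumed rule)."""
    # Normalize case and collapse repeated whitespace in the name.
    name = " ".join(record["name"].lower().split())
    return (name, record["birth_date"])


def integrate_customers(deposit_records, loan_records):
    """Group records from both systems that appear to describe one person."""
    merged = defaultdict(lambda: {"deposit_ids": [], "loan_ids": []})
    for rec in deposit_records:
        merged[match_key(rec)]["deposit_ids"].append(rec["id"])
    for rec in loan_records:
        merged[match_key(rec)]["loan_ids"].append(rec["id"])
    return dict(merged)


# Hypothetical extracts from the two operational systems.
deposits = [{"id": "D-001", "name": "Maria  Santos", "birth_date": "1970-05-14"}]
loans = [
    {"id": "L-977", "name": "maria santos", "birth_date": "1970-05-14"},
    {"id": "L-978", "name": "John Cruz", "birth_date": "1965-01-02"},
]

customers = integrate_customers(deposits, loans)
for key, accounts in customers.items():
    print(key, accounts)
```

In this sketch the deposit record and the first loan record end up under one integrated customer entry, which is the kind of unified view the warehouse is meant to provide.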
Customer relationships are managed informally through relationships with bank officers. A data warehouse brings together data from the various operational systems to provide an integrated view of the customer and the full scope of his or her relationship with the bank. Modern operational systems, in turn, have shifted their focus to the operational requirements of an entire business process and aim to support the execution of the business process from start to finish.
A data warehouse goes beyond traditional information views by focusing on enterprise-wide subjects such as customers, sales, and profits. These subjects span both organizational and process boundaries and require information from multiple sources to provide a complete picture.
Databases
Although the term data warehousing technologies is used to refer to the gamut of technology components that are required to plan, develop, manage, implement, and use a data warehouse, the term data warehouse itself refers to a large, read-only repository of data.
At the very heart of every data warehouse lie the large databases that store the integrated data of the enterprise, obtained from both internal and external data sources. The term internal data refers to all data that are extracted from the operational systems of the enterprise.
External data are data provided by third-party organizations, including business partners, customers, government bodies, and organizations that choose to make a profit by selling their data. Also stored in the databases are the metadata that describe the contents of the data warehouse. A more thorough discussion on metadata and their role in data warehousing is provided in Chapter 3.
Required for Decision-Making
Unlike the databases of operational systems, which are often normalized to preserve and maintain data integrity, a data warehouse is designed and structured in a denormalized manner to better support the usability of the data warehouse.
Users are better able to examine, derive, summarize, and analyze data at various levels of detail, over different periods of time, when using a denormalized data structure.
For example, while a finance manager is interested in the profitability of the various products of a company, a product manager will be more interested in the sales of the product in the various sales regions. In this manner, a decision-maker can start with a high-level view of the business, then drill down to get more detail on the areas that require his attention, or vice versa.
The time-stamping of each fact also makes it possible for decision-makers to recognize trends and patterns in customer or market behavior over time.
Data at the most detailed level, i. Aggregates (presummarized data) are stored in the warehouse to speed up responses to queries at higher levels of granularity. If the data warehouse stores data only at summarized levels, its users will not be able to drill down on data items to get more detailed information. However, the storage of very detailed data results in larger space requirements.
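The trade-off between detailed data and aggregates can be sketched in a few lines. The fact rows, column names, and helper functions below are hypothetical; the sketch only shows why an aggregate answers a summary query quickly while a drill-down still needs the detailed rows to be kept.

```python
from collections import defaultdict

# Hypothetical detailed facts: one row per individual sale.
detail_facts = [
    {"date": "2024-01-05", "region": "North", "product": "A", "amount": 120.0},
    {"date": "2024-01-05", "region": "North", "product": "B", "amount": 80.0},
    {"date": "2024-01-06", "region": "South", "product": "A", "amount": 50.0},
    {"date": "2024-01-07", "region": "North", "product": "A", "amount": 30.0},
]


def build_aggregate(facts, level):
    """Presummarize the facts at a coarser level of granularity."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[column] for column in level)
        totals[key] += row["amount"]
    return dict(totals)


def drill_down(facts, **filters):
    """Return the detailed rows behind one cell of a summary."""
    return [row for row in facts
            if all(row[col] == value for col, value in filters.items())]


# The aggregate is computed once and can serve region-level queries directly.
sales_by_region = build_aggregate(detail_facts, ["region"])
print(sales_by_region)

# Drilling down is only possible because the detailed rows were kept.
north_rows = drill_down(detail_facts, region="North")
print(len(north_rows), "detailed rows for North")
```

If only `sales_by_region` were stored, the drill-down call would have nothing to work with; storing `detail_facts` as well is what costs the extra space.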
The term dynamic report refers to a report that can be quickly modified by its user to present either greater or lesser detail, without any additional programming required.
Dynamic reports are the only kind of reports that provide true ad hoc reporting capabilities. When the summary calls attention to an area that bears closer inspection, the decision-maker should be able to point to that portion of the report, then obtain greater detail on it dynamically, on an as-needed basis, with no further programming.
To Provide Business Users with Access to Data
The data warehouse provides access to integrated enterprise data previously locked away in unfriendly, difficult-to-access environments. Business users can now establish, with minimal effort, a secured connection to the warehouse through their desktop PC. Security is enforced by the warehouse front-end application, by the server database, or by both.
Because of its integrated nature, a data warehouse spares business users from the need to learn, understand, or access operational data in their native environments and data structures.
To Provide One Version of the Truth
The data in the data warehouse are consistent and quality assured before being released to business users. Since a common source of information is now used, the data warehouse puts to rest all debates about the veracity of data used or cited in meetings.
The data warehouse becomes the common information resource for decisional purposes throughout the organization. While these differences may seem trivial at first glance, the subtle nuances that exist depending on the context may result in misleading numbers and ill-informed decisions.
The operational systems will not be able to meet this kind of information need for a good reason. A data warehouse should be used to record the past accurately, leaving the OLTP systems free to focus on recording current transactions and balances.
Instead, historical data are loaded and integrated with other data in the warehouse for quick access.
To Slice and Dice Through Data
As stated earlier in this chapter, dynamic reports allow users to view warehouse data from different angles, at different levels of detail. Business users with the means and the ability to slice and dice through warehouse data can actively meet their own information needs.
The ready availability of different data views also improves business analysis by reducing the time and effort required to collect, format, and distill information from data.
To Separate Analytical and Operational Processing
Decisional processing and operational information processing have totally divergent architectural requirements. Attempts to meet both decisional and operational information needs through the same system or through the same system architecture merely increase the brittleness of the IT architecture and will create system maintenance nightmares. Data warehousing disentangles analytical from operational processing by providing a separate system architecture for decisional implementations.
This makes the overall IT architecture of the enterprise more resilient to changing requirements.
To Support the Reengineering of Decisional Processes
At the end of each business process reengineering (BPR) initiative come the projects required to establish the technological and organizational systems to support the newly reengineered business process.
Although reengineering projects have traditionally focused on operational processes, data warehousing technologies make it possible to reengineer decisional business processes as well.
Data warehouses, with their focus on meeting decisional business requirements, are the ideal systems for supporting reengineered decisional business processes. The concept of the data mart is causing a lot of excitement and attracts much attention in the data warehouse industry.
Mostly, data marts are presented as an inexpensive alternative to a data warehouse that takes significantly less time and money to build. However, the term data mart means different things to different people. A rigorous definition of this term is a data store that is subsidiary to a data warehouse of integrated data.
The data mart is directed at a partition of data (often called a subject area) that is created for the use of a dedicated group of users. A data mart might, in fact, be a set of denormalized, summarized, or aggregated data. Sometimes, such a set could be placed on the data warehouse database rather than in a physically separate store of data. In most instances, however, the data mart is a physically separate store of data and is normally resident on a separate database server, often on the local area network. Some data marts use relational OLAP technology, which creates highly denormalized star schema relational designs or hypercubes of data for analysis by groups of users with a common interest in a limited portion of the database.
All these types of data marts, called dependent data marts because their data content is sourced from the data warehouse, have a high value because no matter how many are deployed and no matter how many different enabling technologies are used, the different users are all accessing information views derived from the same single integrated version of the data. Unfortunately, misleading statements about the simplicity and low cost of data marts sometimes result in organizations or vendors incorrectly positioning them as an alternative to the data warehouse.
This viewpoint defines independent data marts that in fact represent fragmented point solutions to a range of business problems in the enterprise. This type of implementation should rarely be deployed in the context of an overall technology or applications architecture. Indeed, it is missing the ingredient that is at the heart of the data warehousing concept: data integration. Each independent data mart makes its own assumptions about how to consolidate the data, and the data across several data marts may not be consistent.
As a result, an environment is created in which multiple operational systems feed multiple non-integrated data marts that are often overlapping in data content, job scheduling, connectivity, and management. In other words, the complex many-to-one problem of building a data warehouse from operational and external data sources is transformed into a many-to-many sourcing and management nightmare. Another consideration against independent data marts is related to the potential scalability problem: But, as usage begets usage, the initial small data mart needs to grow i.
It is clear that the point-solution-independent data mart is not necessarily a bad thing, and it is often a necessary and valid solution to a pressing business problem, thus achieving the goal of rapid delivery of enhanced decision support functionality to end users.
The business drivers underlying such developments include: To address data integration issues associated with data marts, the recommended approach proposed by Ralph Kimball is as follows. For any two data marts in an enterprise, the common dimensions must conform to the equality and roll-up rule, which states that these dimensions are either the same or that one is a strict roll-up of another.
The time dimensions from both data marts might be at the individual day level, or, conversely, one time dimension is at the day level but the other is at the week level. Because days roll up to weeks, the two time dimensions are conformed. The time dimensions would not be conformed if one time dimension were weeks and the other time dimension, a fiscal quarter.
The resulting data marts could not usefully coexist in the same application. In summary, data marts present two problems: scalability and data integration. Therefore, when designing data marts, organizations should pay close attention to system scalability, data consistency, and manageability issues. The key to a successful data mart strategy is the development of an overall scalable data warehouse architecture, and a key step in that architecture is identifying and implementing the common dimensions.
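The roll-up rule for time dimensions discussed above can be checked mechanically. The following is a minimal sketch under two assumptions that are not in the text: weeks are ISO weeks (Monday to Sunday), and the fiscal quarter coincides with the calendar quarter. The function names are illustrative only.

```python
from datetime import date, timedelta


def days_in_year(year):
    """Yield every calendar day of the given year."""
    current = date(year, 1, 1)
    while current.year == year:
        yield current
        current += timedelta(days=1)


def calendar_day(d):
    """Day-level grain: each day is its own member."""
    return d


def iso_week(d):
    """Week-level grain, assuming ISO weeks (Monday to Sunday)."""
    iso = d.isocalendar()
    return (iso[0], iso[1])


def calendar_quarter(d):
    """Quarter-level grain, assuming fiscal quarters equal calendar quarters."""
    return (d.year, (d.month - 1) // 3 + 1)


def is_strict_rollup(finer, coarser, days):
    """True if every member of the finer grain maps to exactly one coarser member."""
    parent = {}
    for d in days:
        fine_key, coarse_key = finer(d), coarser(d)
        # setdefault records the first parent seen; any later mismatch breaks the roll-up.
        if parent.setdefault(fine_key, coarse_key) != coarse_key:
            return False
    return True


days = list(days_in_year(2024))
days_to_weeks = is_strict_rollup(calendar_day, iso_week, days)
weeks_to_quarters = is_strict_rollup(iso_week, calendar_quarter, days)
print("days roll up to weeks:", days_to_weeks)
print("weeks roll up to quarters:", weeks_to_quarters)
```

Under these assumptions the day and week grains are conformed, while weeks and quarters are not, because at least one week in 2024 contains days from two different quarters.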
A number of misconceptions exist about data marts and their relationships to data warehouses. Two of those misconceptions are discussed below.
Data Marts can be Built Independently of One Another
Some enterprises find it easier to deploy multiple data marts independently of one another. At first glance, such an approach is indeed easier since there are no integration issues. Different groups of users are involved with each data mart, which implies fewer conflicts about the use of terms and about business rules.
Each data mart is free to exist within its own isolated world, and all the users are happy. Unfortunately, what enterprises fail to realize until much later is that by deploying one isolated data mart after another, the enterprise has actually created new islands of automation. While at the outset those data marts are certainly easier to develop, the task of maintaining many unrelated data marts is exceedingly complex and will create data management, synchronization, and consistency issues.
Multiple data marts are definitely appropriate within an organization, but these should be implemented only under the integrating framework of an enterprise-wide data warehouse. Each data mart is developed as an extension of the data warehouse and is fed by the data warehouse. The data warehouse enforces a consistent set of business rules and ensures the consistent use of terms and definitions.
Although both technologies support the decisional information needs of enterprise decision-makers, the two are distinctly different and are deployed to meet different types of decisional information needs. Inmon, C. Imhoff, and G. Unlike the databases of OLTP applications (which are operational or function oriented), the Operational Data Store contains subject-oriented, enterprise-wide data.
However, unlike data warehouses, the data in Operational Data Stores are volatile, current and detailed. However, some significant challenges of the ODS still remain. Table 2. The ODS provides an integrated view of the data in the operational systems. Data are transformed and integrated into a consistent, unified whole as they are obtained from legacy and other operational systems to provide business users with an integrated and current view of operations.
Flash Monitoring and Reporting Tools
As mentioned in Chapter 1, flash monitoring and reporting tools are like a dashboard that provides meaningful online information on the operational status of the enterprise.
Operational Monitoring
Relationship of Operational Data Stores to Data Warehouse
Enterprises with Operational Data Stores find themselves in the enviable position of being able to deploy data warehouses with considerable ease.
Since operational data stores are integrated, many of the issues related to extracting, transforming, and transporting data from legacy systems have been addressed by the ODS, as illustrated in Figure 2. The ODS is free to focus only on the current state of operations and is constantly updated in real time.
Although the task of calculating ROI for data warehousing initiatives is unique to each enterprise, it is possible to classify the type of benefits and costs that are associated with data warehousing. Benefits Data warehousing benefits can be expected from the following areas: The quantification of such costs in terms of staff hours and erroneous data may yield surprising results.
Benefits of this nature, however, are typically minimal, since warehouse maintenance and enhancements require staff as well. At best, staff will be redeployed to more productive tasks.
Analysts go through several steps in their day-to-day work: Unfortunately, much of the time (sometimes up to 40 percent) spent by enterprise analysts on a typical day is devoted to locating and retrieving data. The availability of integrated, readily accessible data in the data warehouse should significantly reduce the time that analysts spend on data collection tasks and increase the time available to actually analyze the data they have collected.
This leads either to shorter decision cycle times or improvements in the quality of the analysis. The most significant business improvements in warehousing result from the analysis of warehouse data, especially if the easy availability of information yields insights heretofore unknown to the enterprise.
The goal of the data warehouse is to meet decisional information needs; it follows naturally that the greatest benefits of warehousing are obtained when decisional information needs are actually met and sound business decisions are made at both the tactical and strategic levels. Understandably, such benefits are more significant and, therefore, more difficult to project and quantify.
Costs
Data warehousing costs typically fall into one of four categories.
These are:
Hardware: This item refers to the costs associated with setting up the hardware and operating environment required by the data warehouse. In many instances, this setup may require the acquisition of new equipment or the upgrade of existing equipment. Larger warehouse implementations naturally imply higher hardware costs.
Software: This item refers to the costs of purchasing the licenses to use software products that automate the extraction, cleansing, loading, retrieval, and presentation of warehouse data.
Services: This item refers to services provided by systems integrators, consultants, and trainers during the course of a data warehouse project.
Internal staff: This item refers to costs incurred by assigning internal staff to the data warehousing effort, as well as to costs associated with training internal staff on new technologies and techniques.
ROI Considerations
The costs and benefits associated with data warehousing vary significantly from one enterprise to another. The effect of data warehousing on the tactical and strategic management of an enterprise is often likened to cleaning the muddy windshield of a car.
It is difficult to quantify the value of driving a car with a cleaner windshield. Similarly, it is difficult to quantify the value of managing an organization with better information and insight. Lastly, it is important to note that data warehouse justification is often complicated by the fact that much of the benefit may take some time to realize and therefore is difficult to quantify in advance.
In Summary
Data warehousing technologies have evolved as a result of the unsatisfied decisional information needs of enterprises.
With the increased stability of operational systems, information technology professionals have increasingly turned their attention to meeting the decisional requirements of the enterprise. A data warehouse, according to Bill Inmon, is a collection of integrated, subject-oriented databases designed to supply the information required for decision-making. Each data item in the data warehouse is relevant to some moment in time.
A data mart has traditionally been defined as a subset of the enterprise-wide data warehouse. Many enterprises, upon realizing the complexity involved in deploying a data warehouse, will opt to deploy data marts instead. Although data marts are able to meet the immediate needs of a targeted group of users, the enterprise should shy away from deploying multiple, unrelated data marts.
The presence of such islands of information will only result in data management and synchronization problems. Like data warehouses, Operational Data Stores are integrated and subject-oriented. However, an ODS is always current and is constantly updated ideally in real time.
The Operational Data Store is the ideal data source for a data warehouse, since it already contains integrated operational data as of a given point in time. Although data warehouses have proven to have significant returns on investment, particularly when they are meeting a specific, targeted business need, it is extremely difficult to quantify the expected benefits of a data warehouse. The costs are easier to calculate, as these break down simply into hardware, software, services, and in-house staffing costs.
PEOPLE Although a number of people are involved in a single data warehousing project, there are three key roles that carry enormous responsibilities. Negligence in carrying out any of these three roles can easily derail a well-planned data warehousing initiative. This section of the book therefore focuses on the Project Sponsor, the Chief Information Officer, and the Project Manager and seeks to answer the questions frequently asked by individuals who have accepted the responsibilities that come with these roles.
Every data warehouse initiative has a Project Sponsor, a high-level executive who provides strategic guidance, support, and direction to the data warehousing project. The Project Sponsor ensures that project objectives are aligned with enterprise objectives, resolves organizational issues, and usually obtains funding for the project.
The CIO is responsible for the effective deployment of information technology resources and staff to meet the strategic, decisional, and operational information requirements of the enterprise. Data warehousing, with its accompanying array of new technology and its dependence on operational systems, naturally makes strong demands on the physical and human resources under the jurisdiction of the CIO, not only during design and development but also during maintenance and subsequent evolution.
The warehouse Project Manager is responsible for all technical activities related to implementing a data warehouse.
Ideally, an IT professional from the enterprise fulfills this critical role. It is not unusual, however, for this role to be outsourced for early or pilot projects, because of the newness of warehousing technologies and techniques. This chapter attempts to provide answers to questions frequently asked by Project Sponsors.
It is naive to expect an immediate change to the decision-making processes in an organization when a data warehouse first goes into production. End users will initially be occupied more with learning how to use the data warehouse than with changing the way they obtain information and make decisions.
It is also likely that the first set of predefined reports and queries supported by the data warehouse will differ little from existing reports.
Decision-makers will experience varying levels of initial difficulty with the use of the data warehouse; proper usage assumes a level of desktop computing skills, data knowledge, and business knowledge.
Desktop Computing Skills
Not all business users are familiar and comfortable with desktop computers, and it is unrealistic to expect all the business users in an organization to make direct, personal use of the front-end warehouse tools.
On the other hand, there are power users within the organization who enjoy using computers, love spreadsheets, and will quickly push the tools to the limit with their queries and reporting requirements.
Data Knowledge
It is critical that business users be familiar with the contents of the data warehouse before they make use of it.
In many cases, this requirement entails extensive communication on two levels. First, the scope of the warehouse must be clearly communicated to properly manage user expectations about the type of information they can retrieve, particularly in the earlier rollouts of the warehouse.
Second, business users who will have direct access to the data warehouse must be trained on the use of the selected front-end tools and on the meaning of the warehouse contents. The answers that the warehouse will provide are only as good as the questions that are directed to it.
As end users gain confidence both in their own skills and in the veracity of the warehouse contents, data warehouse usage and overall support of the warehousing initiative will increase. As the data scope of the warehouse increases and additional standard reports are produced from the warehouse data, decision-makers will start feeling overwhelmed by the number of standard reports that they receive.
Decision-makers either gradually want to lessen their dependence on the regular reports or want to start relying on exception reporting or highlighting, and alert systems. For example, instead of receiving sales reports per region for all regions within the company, a sales executive may prefer to receive sales reports only for areas where actual sales figures are at least 10 percent above or below the budgeted figures.
Alert Systems
Alert systems also follow the same principle, that of highlighting or bringing to the fore areas or items that require managerial attention and action. However, instead of reports, decision-makers will receive notification of exceptions through other means, for example, an e-mail message.
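Exception reporting and alerting can be sketched as a simple filter over actual and budgeted figures. The region names, figures, and function names below are hypothetical, the 10 percent threshold follows the sales example above, and budgets are assumed to be nonzero. The alert step only builds message text; how it is delivered (e-mail or otherwise) is left out.

```python
def exception_report(rows, threshold=0.10):
    """Keep only regions whose actual sales deviate from budget by the threshold or more."""
    exceptions = []
    for row in rows:
        # Relative variance against budget; assumes budget is nonzero.
        variance = (row["actual"] - row["budget"]) / row["budget"]
        if abs(variance) >= threshold:
            exceptions.append({"region": row["region"],
                               "variance": round(variance, 4)})
    return exceptions


def alert_messages(exceptions):
    """Turn each exception into a short notification text."""
    return ["Sales in {} are {:+.0%} against budget".format(e["region"], e["variance"])
            for e in exceptions]


# Hypothetical regional figures.
sales = [
    {"region": "North", "actual": 1150.0, "budget": 1000.0},
    {"region": "South", "actual": 980.0, "budget": 1000.0},
    {"region": "East", "actual": 850.0, "budget": 1000.0},
]

flagged = exception_report(sales)
for message in alert_messages(flagged):
    print(message)
```

Only North and East are reported here; South stays within the threshold and never reaches the decision-maker, which is the point of reporting by exception.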
As the warehouse gains acceptance, decision-making styles will evolve from the current practice of waiting for regular reports from IT or MIS to using the data warehouse to understand the current status of operations and, further, to using the data warehouse as the basis for strategic decision-making.
At the most sophisticated level of usage, a data warehouse will allow senior management to understand and drive the business changes needed by the enterprise.
A successful enterprise-wide data warehouse effort will improve financial, marketing and operational processes through the simple availability of integrated data views. Previously unavailable perspectives of the enterprise will increase understanding of cross-functional operations.
The integration of enterprise data results in standardized terms across organizational units e. A common set of metrics for measuring performance will emerge from the data warehousing effort. Communication among these different groups will also improve. The very process of consolidation requires the use of a common vocabulary and increased understanding of operations across different groups in the organization. While financial processes will improve because of the newly available information, it is important to note that the warehouse can provide information based only on available data.
For example, one of the most popular banking applications for data warehousing is profitability analysis.
Unfortunately, enterprises may encounter a rude shock when it becomes apparent that revenues and costs are not tracked at the same level of detail within the organization. Banks frequently track their expenses at the level of branches or organization units but wish to compute profitability on a per-customer basis. With revenue figures at the customer level and costs at the branch level, there is no direct way to compute profit.
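One way to bring costs and revenues to the same level is to allocate branch-level cost down to customers. The sketch below assumes allocation in proportion to each customer's revenue; this rule, the function name, and the figures are illustrative assumptions, not a formula given in the text, and a real bank would choose its own allocation basis.

```python
def allocate_branch_cost(branch_cost, customer_revenue):
    """Spread a branch-level cost over customers in proportion to their revenue.

    Returns a derived profit figure per customer. Proportional allocation is an
    assumed rule, and total revenue is assumed to be nonzero.
    """
    total_revenue = sum(customer_revenue.values())
    profit = {}
    for customer, revenue in customer_revenue.items():
        allocated_cost = branch_cost * revenue / total_revenue
        profit[customer] = revenue - allocated_cost
    return profit


# Hypothetical figures: revenue is known per customer, cost only per branch.
revenue = {"C1": 600.0, "C2": 300.0, "C3": 100.0}
customer_profit = allocate_branch_cost(500.0, revenue)
print(customer_profit)
```

The derived figures always sum to branch revenue minus branch cost, but how that total is split between customers depends entirely on the allocation rule chosen.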
As a result, enterprises may resort to formulas that allow them to compute or derive cost and revenue figures at the same level for comparison purposes.
Marketing
Data warehousing supports marketing organizations by providing a comprehensive view of each customer and his many relationships with the enterprise. Over the years, marketing efforts have shifted in focus. Customers are no longer viewed as individual accounts but instead are viewed as individuals with multiple accounts.
This change in perspective provides the enterprise with cross-selling opportunities. The notion of customers as individuals also makes possible the segmentation and profiling of customers to improve target-marketing efforts. The availability of historical data makes it possible to identify trends in customer behavior, hopefully with positive results in revenue.
Operations
By providing enterprise management with decisional information, data warehouses have the potential of greatly affecting the operations of an enterprise by highlighting both problems and opportunities that heretofore went undetected. Strategic or tactical decisions based on warehouse data will naturally affect the operations of the enterprise. It is in this area that the greatest return on investment and, therefore, greatest improvement can be found.
As mentioned in Chapter 2, return on investment (ROI) from data warehousing projects varies from organization to organization and is quite difficult to quantify prior to a warehousing initiative. However, a common list of problems that enterprises encounter as a result of unintegrated customer data and a lack of historical data can be identified. A properly deployed data warehouse can solve these problems, as discussed below.
Customers are annoyed by requests for the same information by different units within the same enterprise. The inconsistent use of terms results in different business rules for the same item. Instead, the enterprise would do well to develop a migration plan that consciously maps coming IT projects to the InfoMotion Enterprise Architecture. Data warehousing is a collection of decision support technologies, aimed at enabling the knowledge worker (executive, manager, analyst) to make better and faster decisions.
These vendors provide the relational database management systems that are capable of storing up to terabytes of data for warehousing purposes. These tasks are structured and repetitive, and consist of short, atomic, isolated transactions. Application systems often seem rigid and unable to adapt to evolving management information needs.