|
All section posts by date

I love Excel, I think it's pretty obvious. I'm not alone, millions and millions of people around the world use it too and, more or less, love it. The vast majority of these people use it to work on, manipulate and display data of all kind.
What's strange about Excel is that the very reasons because it is loved by people are the same reasons why it is execrated by IT professionals.
At the grassroots of MS Excel diffusion is the idea of control. The user has data on grids and worksheets; these can be manipulated down to the single cell, organized at will, formatted as they prefer. Total freedom. Maybe only a fraction of this freedom is actually used, but having it is reassuring.
The IT has mixed feelings about total control. It may be an escape hatch when there's no time or possibility to do the job but, on the contrary, who knows what users can do with all that freedom? How can be assured that those data are "right?".
The other pillar of Excel success is ownership. The user owns her data, once they are on the file. She can do everything, and she's the sole responsible for that. No one must be contacted before rounding figures or cutting corners.
As before, the IT professionals are concerned of users making "unapproved changes" with no previous notice, or creating "unofficial versions" and spreading them all over the company.
The third element is collaboration. I'm not talking of Groove or other similar tool, I'm talking the ability to pack all your work in a file and e-mail it. You can broadcast your work that easily, with a system that everybody is familiar with and, from the user perspective, is extremely manageable.
The IT, of course, has concerns about data version proliferation, security threats or even attachments size.
The fourth element is the " Swiss army knife" attitude. Excel extreme freedom sometimes (or often) gives rise to complex environments to make the job of a specific application. These may be complex and prone to break but, at last, they are tailored exactly on user requirements and are user controlled.
These creations are sometimes kept secret, because the IT would merrily use a flamethrower on them.
So, there's a clash of visions around Excel that, ultimately, have roots in different visions. For business users, the whole damn computer thing is a matter of tools to do the job and Excel, for the knowledge worker, is a powerful tool. For the IT department, everything stays inside a process, one of the many which live inside a company.
Is there an equilibrium somewhere between these two contrasting visions? Yes of course! Or, maybe, not.
Excel is here to stay; even the Gartner Group, recently, gave up to that, in a certain way. Currently software connects to Excel to export data and, sometimes, to import from it. This approach draws a borderline between the orderly and well organized world of business applications and the wild territories of personal computing. Usually the IT stays entrenched on the border and reluctantly raids the unknown territory outside.
Users happily work with their Excel files but, sometimes, feel that they need something more automated and more organized. It is on this requirement that a cautious encounter may take place. What we need is a new generation of applications which interact with Excel and tackle or ease the issues mentioned above. With Viney@rd I'm giving a contribution in this direction, there are some others who are working along the same lines. The future appears to be interesting.
What do you think about?
Photo courtesy of recycledstardust
 Whenever I use the term " Natural key" in a tweet or on Facebook, someone always DM me to remind me how natural keys are worthless. I should use the brave and muscular surrogate keys for each and every task. Maybe is a frame of mind, but whenever there are strong technical debates I feel naturally inclined to the minority.
In this case my point is that natural keys are perfectly adequate for many Business Intelligence tasks, and should be replaced by surrogate keys only when necessary.
If you are still with me, let's go.
A bar counter definition of natural key is: "the code which uniquely identifies your record". Every transactional system assigns codes to its entities, customers, salesmen, products etc. The policy used to assign this code, is up to the transactional system and I could not care less.
Given that natural keys can a) change over time b) have a business meaning that can change over time as well c) not be the ideal field format for a join, often a surrogate key is introduced. A surrogate key is a key, usually numeric, unrelated with the natural, that uniquely identifies the record. That is, it does exactly the job already done by the natural key. (if you want a less mundane description of this subject you can find a good article here).
This consideration alone should drive to the conclusion that the widespread use of surrogate keys in a datawarehouse is often overkill.
I made literally tenths of systems with no surrogate keys which worked fine for years. Often we think to DWs as multi terabytes systems which integrate data from hundredths of systems, but these are only a fraction of the total. There's also a midsized business market which needs the bulk of the abilities provided by a datawarehouse with far less data and far less systems to be integrated. In these cases, a simple approach, based on natural keys, may be perfectly suited.
Building surrogate keys, is often a rather complex task, which slows development, hinders the long and delicate subsequent trimming work and slows loadings. More subtly, they're an entirely technical subject, hard to be understood by non technical project sponsors thus hard to justify on a project schedule.
Surrogate Keys should be introduced when they are necessary and there are, indeed, cases where surrogate keys are highly recommended.
In those multi terabytes DWs, where the smallest fact table has half billion records, performance becomes such an issue that the use of small numeric keys is practically mandatory. If many systems are integrated together in the DW, chances are that codes overlap, so an unrelated key is a good solution to that. Surrogate keys also shield DWs from changes of the naturals, whose propagation in a very large DW is an awful job. Also Surrogate Keys are a good artifact to factor in slowly changing dimensions. Etc. Etc. etc.
So, the next time I tweet the term "Natural Key", please do not flood my account with "Surrogate Keys"!
It is often said and widely acknowledged that a datawarehouse for business data analysis is something for the big guys. I do not agree at all! Having little data does not mean that the environment is not complex. I had small customers which required very advanced analytics to keep up with what is going on. So, I try to explain as simply as I can
why you need a database for business data analysis (i.e. a datawarehouse) EVEN IF you are a small or medium company, have few data and your server can handle all the workload etc.
1) Your history is there even if your OLTP systems blows up. It's not a matter of backups: the vendor goes belly up and there's no further support, your business changes too much to keep using the old system, your new CEO loves a different SW etc.
2) Building a datawarehouse forces you to think to an analysis model for your business. A stable model makes comparisons with the past possible. The need for changes in a data analysis database mirrors the need to change the company strategy; changing too often means that your company does not know where she's going.
3) Building such a database forces you to think to key performance indicators for your business. There are figures for every business; "magic numbers" that give you an immediate idea about what's going on. No business, no matter how small it is, can do without some key performance indicators. Often, they're not formally defined, but they do exist in people's minds.
4) Likely you have different systems with different master data. The datawarehouse is the place where you can match all your siloed data . All those small MS Access databases, MS Excel worksheets etc. that you have around generate an inextricable mess to understand what's going on. If you build such an analysis database makes you actually tackle all the different data and reconcile them.
5) Often business people think by categories not implemented in a business application. Your datwarehouse may be the only place where some data may reside thus being applicable to your analysis model.
6) having a different DB, maybe on a different technology, make the analysts feel special compared to all those data entry people, and make it more acceptable to the upper management.
As you see, the point is hardly technical but is related to the way a business looks onto itself. A datawarehouse , to such extent, is like a mirror in front of which you business can realize that there's that little spot on the neck.
Do you have examples about datawarehousing in SMBs? Do you share my vision? If yes, can you define it better? If no, why?
Viney@rd will fit, at least by now, inside the broader "workgroup intelligence" category. This is often criticized because of the lack of focus on delivering a single version of truth. In my opinion, backed by a respectable number of disappointing projects, there's nothing like a single version of truth. In fact, enforcing it may do more evil than good.
Let's consider the most basic of all BI outcomes: sales. More precisely let's talk about invoices. Nothing appears to be easier the defining the monthly invoiced amount; just sum all the invoices. Many senior managers will think this way and will be irritated by any further question, save being irritated as well by realizing that the accounting department provides different numbers than those in their sales reports.
How are credit notes supposed to be taken into account? Should they decrease the invoiced amount? All of them? And returns? What about those occasional invoices not issued upon a sale but for other services? Are all the products that produce a revenue relevant to sale performance (ex.. paid merchandising)? How many other particular cases lie inside a complex organization?
Accounting has its own standards to comply with, and knows exactly what to do with every document, so they're out of the game.
The other offices will likely have an answer for each of the questions above (and probably will be surprised that the answer is not obvious to everyone), but they'll all have different answers.
This is not misalignment, the point is that everyone has a different goal and measure her success on different metrics. So the sales manager should count in returns but not those caused by product quality issues; marketing will need to include the merchandising sales; production will be interested more in returns than anything else and the top management, by definition, should care about everything but should also be able to clearly discriminate each component.
Everybody will have a different number, but it must be the number which actually measures her performance and must be confronted with.
Sales are only the most immediate example but the same happens in every other analysis topic. Costs, for example, are even more complex.
Our single version of truth paradigm so becomes something different. What's crucial is the ability to relate each metric to the others and decompose the various effects which account for variations. There must always be a calculus through which the sales according to group A are transformed to the sales according to group B.
I often found useful sharing this schema with all the senior managers involved in the project. This extinguishes potential arguments from the beginning and promotes alignment among all the particular goals toward a common economic result.
This usually works well till the CEO abruptly asks for the sum of all invoices…
Enjoy
I've been a BI consultant since the beginning of my career. My experience ranges from small projects with a handful of users to large corporation level systems. I've been working in luxury offices with moquette and boiserie and in country settlements with the smell of the cattle coming in through the windows.
Despite this wide range of different environments, there is one element that accomunates them all: the ansiety to demonstrate that the project delivers actual value.
Better than ansiety, I'd say hurry, when not frenzy.
BI, opposed to transactional systems, is often hard to communicate when it is first introduced in a company environment.
From a commercial point of view it is relatively easy to rely on special effects and eye candy to sell the package. Costs often require the approval from people whose job is to run the company as a whole and does not have the specific knowledge required to anticipate what a BI project is really like.
So it is almost unavoidable to create unjustified expectations. If I told you that the bulk of the job is ETL and data quality, would you have bought all the same? If you had known the effort required to define a common intelligence framework, a common language, common data rules, would you have signed the contract?
THe answer is often a clear "no". You had kept going with your half baked reporting widgets.
So, what is the perceived way out? It is to think big and start small. Just start delivering a chunk of data, just start working with a group of users and bring them live as soon as you can. Just start delivering value and people will realize that this datawarehouse thing is helpful for sure.
But.
If development has been too fast, if users have not been properly involved, you are likely to commit a suicide that will permeate the entire project, if any remains. If results are unreliable and people did not buy in the project, you are doomed.
The initial stages of a BI project are a true minefield. An erp project fails at go-live, BI fails when you sit at the table for the second time. What's good is that, once the initial stage is succesfully over, everyone can breathe freely, and the following stages are easier to go through.
Of course, we BI professional do not need this sort of reassurance:We are able to see how the system will look like, to assess advantages and the ROI. It is rather easy if you know what you are talking about. The point is that few senior level managers have the culture to understand our point. They have no doubt about needing an erp to manage daily operations, but often do not get at all why they may need this datawarehouse thing. Usually a message like "consolidation will be easyer" or "you can have daily figures on your Blackberry" are effective but they subside as they are, giugno giving at best a partial image of the project. Nothing is more embarassing then a CEO asking the CIO for how the new consolidation system is going, and the CIO answers "what system?".
As usual there is no substitute for deep domain knowledge, understanding and trust. It is no longer cool to admit of knowing nothing about technology, a thorough knowledge of IT related issues should be part of a CEO knowledge no less than financials or marketing. Till that point our lives will keep being much more complex than necessary.
|