Business intelligence: December 2013

Saturday, 28 December 2013

Data Flow Task

A data flow defines a flow of data from a source to a destination. You do not start on one data flow task and move to the next. Data flows between your selected entities (sources, transformations, destinations).

Moreover within a data flow task, you cannot perform tasks such as iteration, component execution, etc

Data flow - are for just moving data from one source to another.

Control flow - provide the logic for when data flow components are run and how they are run. Also control flows can: perform looping, calling stored procs, moving files, managing error handling, check a condition and then call different tasks (incl data flows) depending on the result), processing of SSAS cubes.

If you've moving data from one location to another and it's the same each time, not based on any other condition then you can get away with a package with just a dataflow task, but in most cases packages are more complex than that.

We use the control flow for many things. First all our data concerning the data import is stored in tables. So we run procs to start the dataflow and end it, so that our logging works correctly, we do looping through a set of files, we move files to archive locations and rename with the date and delete them from processing locations. We have a separate program that does file movement and validates the files for the correct comlumns and size. We run a proc to make sure the file has been validated before going into the dataflow. Sometimes we have a requirement to send an email when a file is processed or send a report of records which could not process. These emails are put into the control flow. Sometimes we have some clean up steps that are more easily accomplished using a stored proc and thus put the step in the control flow.

Wednesday, 25 December 2013

Summary of Dimensions and Facts :-

1.)What is Dimension Table ?

Dimensions are Tables ,Attributes are columns in Dimension Tables. Attributes Describe the characteristics of data such as product size and product color . Through attributes you can group and analyze members of dimension based on characteristics. Dimension tables are referenced by fact table using keys. When creating a dimension table in a data ware house, a system-generated key is used to uniquely identify a row in the dimension . The surrogate key is used as the primary key in the dimension table . The surrogate key is placed in the fact table and a foreign key is defined between the two tables .when the data is joined ,it does so just as any other join within the database.

2.)What is Fact Table ?

Measures are columns in Fact Tables.

Fact tables have foreign key columns referring to the Dimension Tables along with Measure Columns.Measure column values can be used to perform some sort of mathematical function.
In general dimension tables holds the description /textual information about facts stored in fact tables.

Dimension tables are referenced by fact table using keys.

Example:-
Measure is a revenue of a company. you find average revenue and overall revenue
What if one wants to know Revenue per Day or Revenue per Product or Revenue per Country?
These are Dimensions : Date,Product ,Country .Dimensions give you context for measure.
An attribute is similar to dimension and sometimes is used interchangeably. However ,one
Dimension can have multiple attributes :what day of the year is a date ?What week of the year is a date ?