Monday 30 December 2013

Measures, Attributes and Hierarchies

Measures

       A measure represents a data column which contains quantifiable data, usually numeric, that can be aggregated. A measure is generally mapped to a column in a fact table.


      In SSAS measures are grouped by underlying fast tables, which called Measure Groups. Measure Groups are used to associate dimensions with measures.


      Measure are selected while creating Cube, which can later be modified for various usage like its formatting, aggregate behaviour etc. Click here to see how to modify measure for formatting. 


Attributes 


       Attribute represent one or more column in dimension table. Attribute is used to particular expect of the dimension. For example product colour  represent colour attribute of the product dimension, which can later be used for slicing/dicing/selecting the cube values. Each dimension in cube contains one Key attribute, which is generally primary key in dimension table.


     Click here to see how to define/update attribute for dimension.


Hierarchies


       As name suggest Hierarchies is used to define hierarchy in dimension. For example Product and Product Category, National and International region etc. Hierarchy can be defined in Hierarchies table in cube design.


     Click here to see how to add/delete user-defined Hierarchy.

Monday 16 December 2013

Tool to view big data file

I have to load big data files using SSIS or SQL server Import/Export wizard. Most of the time it works fine but if it fails and I have to view the data to debug then it is simply impossible to view the data because most of the text editor cannot open file which too big (size in multiple of GBs). So  I have developed small tool which can help to read or view a big file which cannot be opened in normal text editor.  This tool can also be use full if you are developing SSIS package to load big file and want to create small sample file from big file during development.

Click here to download the tool.

Features of the tools

 - Load big file (no limit on size)
 - View data page by page. Page size is configurable.
 - Save page in different file.
 - Go to any page in file.
 - Displays line number across  pages.

Limitations.
 - It uses Temp directory to store data pages while opening file. So virtually it can occupy similar disk space as the original file.
- It can only read file line by line. So if lines are not separated by {CR}{LF} then it cannot do data paging.
- It can only work with ANSI data.


Read big data file | View big data file | Open big data file | tool to open big file | tool to read big data file | Preview big data file | Get sample data from big data file | Read specific data from big file | View specific data from big file

Dimension and Cube



Dimension
        A database dimension is a collection of attributes usually related objects, which can be used to provide elaborated information about fact data in cube(s). Typical example of attributes in a Employee dimension might be Employee name, Employee Designation, Employee Qualification, Joining Date etc. Dimensions are linked to one or more columns in one or more tables in a data source view. By default, these attributes are visible as attribute hierarchies and can be used to understand the fact data in a cube. Attributes can be organized into user-defined hierarchies that provide navigational paths to assist users when browsing the data in a cube.

Click here for step by step instruction to create dimension in using existing table in SSDT.
Click here for step by step instruction to create dimension in using Dimension wizard in SSDT.

Cube 
         A cube is a multidimensional object that contains information for analytical purposes; which contains dimensions and measures. Dimensions define the structure of the cube that is used to slice and dice over, and measures provide aggregated numerical values of interest to the end user. As a logical structure, a cube allows a client application to retrieve values, of measures, as if they were contained in cells in the cube; cells are defined for every possible summarized value. A cell, in the cube, is defined by the intersection of dimension members and contains the aggregated values of the measures at that specific intersection.

Click here for step by step instruction to create Cube in using Cube wizard in SSDT.

Friday 29 November 2013

Data Source and Data Source View

Data Source
 
       A data source in Analysis Services specifies a direct connection to an external data source. In addition to physical location, a data source object specifies the connection string, data provider, credentials, and other properties that control connection behaviour.

      Information provided by the data source object is used during the following operations:

  • Get schema information and other metadata used to generate data source views into a model.
  • Query or load data into a model during processing.
  • Run queries against multidimensional or data mining models that use ROLAP storage mode.
  • Read or write to remote partitions.
  • Connect to linked objects, as well as synchronize from target to source.
  • Perform write back operations that update fact table data stored in a relational database.
      When building a multidimensional model from the bottom up, you start by creating the data source object, and then use it to generate the next object, a data source view. A data source view is the data abstraction layer in the model. It is typically created after the data source object, using the schema of the source database as the basis. However, you can choose other ways to build a model, including starting with cubes and dimensions, and generating the schema that best supports your design.

       Regardless of how you build it, each model requires at least one data source object that specifies a connection to source data. You can create multiple data source objects in a single model to use data from different sources or vary connection properties for specific objects.

       Data source objects can be managed independently of other objects in your model. After you create a data source, you can change its properties later, and then pre-process the model to ensure the data is retrieved correctly.

Click here for step by step instruction to create data source.

Data Source View

         A data source view (DSV) is an abstraction of a relational data source that becomes the basis of the cubes and dimensions you create in a multidimensional project. The purpose of a DSV is to give you control over the data structures used in your project, and to work independently of the underlying data sources (for example, the ability to rename or concatenate columns without directly modifying the original data source).

        You can build multiple data source views in an Analysis Services project or database on one or more data sources, and construct each one to satisfy the requirements for a different solution.

Click here for step by step instruction to create data source view.

Thursday 28 November 2013

SSAS Tutorials : Multidimensional Modeling

SSAS Multidimensional Modelling Tutorials

  1. Data Source and Data Source View
  2. Dimension and Cube
  3. Measures, Attributes and Hierarchies
  4. Advanced Attribute and Dimension Properties
  5. Relationships Between Dimensions and Measure Groups
  6. Calculations
  7. Key Performance Indicators (KPIs)
  8. Actions
  9. Perspectives and Translations
  10. Administrative Roles

SQL Server Analysis Services (SSAS) business intelligence (BI) semantic modelling options

SQL Server Analysis Services (SSAS) provides three different options to create business intelligence semantic model. Namely those 3 options are tabular, multidimensional and PowerPivot.

  • Tabular model uses relational database modelling objects such as tables, views and relationships for modeling data, and the xVelocity in-memory analytics engine for storing and calculating data.
  • Multidimensional model uses OLAP modelling constructs such as cubes and dimensions and uses MOLAP, ROLAP, or HOLAP storage.
  • PowerPivot model is a self-service BI solution that lets business analysts build an analytical data model inside an Excel workbook using the PowerPivot for Excel add-in. PowerPivot also uses xVelocity in-memory analytics engine within Excel and on SharePoint.


Table below summarizes feature available for each modelling option.


 MultidimensionalTabularPowerPivot
ActionsYesNoNo
AggregationsYesNoNo
Calculated MeasuresYesYesYes
Custom AssembliesYesNoNo
Custom RollupsYesNoNo
Distinct CountYesYes (via DAX)Yes (via DAX)
DrillthroughYesYesYes (detail opens in separate worksheet)
HierarchiesYesYesYes
KPIsYesYesYes
Linked objectsYesNoYes (linked tables)
Many-to-many relationshipsYesNoNo
Parent-child HierarchiesYesYes (via DAX)Yes (via DAX)
PartitionsYesYesNo
PerspectivesYesYesYes
Semi-additive MeasuresYesYesYes
TranslationsYesNoNo
User-defined HierarchiesYesYesYes
WritebackYesNoNo
 
See below links to learn about each modelling option.


Wednesday 20 March 2013

How to setup database mirroring session in MS SQL server.

Following is step by step guide for setting up database mirroring session. There are 2 major steps involved in setting up database mirroring setup.
  1. Prepare mirror database.
  2. Configure database mirroring session.
 
Now let's see both of these steps in detail.
 
Prepare mirror database.
  1. Take full backup of the primary database. You can use existing full backup file if you have available one. Before taking full backup make sure that database recovery model is set to full. To check recovery model Right click on database - Click on property - Select Options Tab.
  2. Restore full back from Step - 1. This restore must be done using WITH NORECOVERY option. And resored database name must be same as primary database.
  3. Restore differential backup (if any) taken after full backup used to restore in step 2. Also restore log backup (if any) taken after full backup or differential backup. All these restore must use WITH NORECOVERY option.
 
Now your secodary database is ready to start. Now lets see how to configure actual mirroring session.
         
 
Configure databae mirroring session.
 
  • Open database mirroring setup wizard in SSMS. Right click on database - Select Tasks - Click on Mirror... OR Right click on databaes - Click on Property - Select Mirroring Tab
  •  
     
  • Open Databaes Mirroring Security Wizard by clicking on Configure Secutity... button.
  •  
  • Click on Next. To make things simple we will skip witness setup option. So select No on Include Witness Server page and click next. Deselect witness server instance and click next.
 
     
  • This will open Primary server Instance wizard. Enter end point name and Listener port (leave default for this example) and click on Next.
 
     
  • Click on connect to Secodary Serever. Provide authentication details and connect to secodary server. Leave other details as default and click on next.
 
  • Now your session is almost ready. Click on finish. Which popup below window. You can start mirroring immidiately or start later form main mirroing tab. See below screen.
 
 
 
You can also generate script before clicking on Ok on main tab. Scripit is use full for documentaion and future reference. So it is always advisable to generate script and save somewhere safe, probably in source control.
 

T-SQL reference for Database Mirroring

  • How to manually failover to partner server?
There are 2 ways to manually failover to partner server in SQL server database mirroring session.
First is from primary server server and second is from secondary server. See below script for partner failover.

--1. Manual Failover from Primary server
ALTER DATABASE <Database_Name> SET PARTNER FAILOVER;

--2. Manual Failover from Secondary server
ALTER DATABASE <Database_Name> SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS;

Failover from primary server does not cuase any data loss but if you do it from secondary server then you are forcing secondary server to loss data. Data loss will only happend if primary server is not accesible while failover.
  • How to remove database mirroring session?
Use below script to remove database mirroring session. This script can be used on either primary server or on secondary server.

--Remove mirroring for database
ALTER DATABASE <Database_Name> SET PARTNER OFF;

  • How to stop and start databaes mirroring session?
Use below script to pause and resume the database mirroring session. You can SUSPEND mirroring session from either primary server or from secodary server. But It can only be RESUMEd from primary server.
 
--Pause/Suspend database mirroring session
ALTER DATABASE <Database_Name> SET PARTNER SUSPEND;
--Resume database mirroring session
ALTER DATABASE <Database_Name> SET PARTNER RESUME;
  • How to add/remove witness server to/from already running database mirroring session?
Use below script to add or remove witness server to/from already running database mirroring session

--Add witness server to database mirroring session
ALTER DATABASE <Database_Name> SET WITNESS <witness_server>;

--Remove witness server from database mirroring session
ALTER DATABASE <Database_Name> SET WITNESS OFF;

SET WITNESS is only allowed on primary server.
 

Wednesday 13 March 2013

Convert rows into columns using PIVOT and columns into rows using UNPIVOT

You might have used pivot functionality in Microsoft Excel. Same you can do in MS SQL using  PIVOT and UNPIVOT statement. PIVOT is used to convert values in rows to use as column header and UNPIVOT does the reverse of PIVOT. See below example for better understanding.


--PIVOT examle

IF OBJECT_ID('tempdb..#sales') IS NOT NULL
      DROP TABLE #sales

CREATE TABLE #sales (ProductName VARCHAR(200), SaleDate DATE, SalesQty INT)
INSERT INTO #sales(ProductName, SaleDate, SalesQty)
      VALUES('Bike', '2013-01-04', 56)
      ,('Bike', '2013-02-25', 45)
      ,('Bike', '2013-01-26', 87)
      ,('Bike', '2013-03-01', 12)
      ,('Bike', '2013-05-04', 62)
      ,('Bike', '2013-05-04', 73)
      ,('Bike', '2013-04-27', 34)
      ,('Scooter', '2013-03-12', 13)
      ,('Scooter', '2013-04-05', 75)
      ,('Scooter', '2013-01-09', 24)
      ,('Scooter', '2013-01-16', 76)
      ,('Scooter', '2013-01-16', 23)
      ,('Scooter', '2013-05-14', 36)
      ,('Scooter', '2013-01-16', 68)
      ,('Scooter', '2013-02-16', 88)

SELECT ProductName, Jan, Feb, Mar, Apr, May
FROM
      (SELECT ProductName, LEFT(DATENAME (month, SaleDate), 3) [monthName], ISNULL(SalesQty, 0) AS SalesQty  FROM #sales) AS A
PIVOT
      (
      SUM(SalesQty) --Aggregate function
      FOR [monthName] IN (Jan, Feb, Mar, Apr, May) --Column list
      ) AS B

IF OBJECT_ID('tempdb..#sales') IS NOT NULL
      DROP TABLE #sales

 PIVOT has two important section first is aggregate function (see comment in above query) and second is FOR section which contains list of column names and original column which contains the column names.

See below for UNPIVOT example. I have used ouput of PIVOT query as input of UNPIVOT query.


--UNPIVOT example
IF OBJECT_ID('tempdb..#sales') IS NOT NULL
      DROP TABLE #sales 

IF OBJECT_ID('tempdb..#sales_PIVOT') IS NOT NULL
      DROP TABLE #sales_PIVOT           

CREATE TABLE #sales (ProductName VARCHAR(200), SaleDate DATE, SalesQty INT)

INSERT INTO #sales(ProductName, SaleDate, SalesQty)
      VALUES('Bike', '2013-01-04', 56)
      ,('Bike', '2013-02-25', 45)
      ,('Bike', '2013-01-26', 87)
      ,('Bike', '2013-03-01', 12)
      ,('Bike', '2013-05-04', 62)
      ,('Bike', '2013-05-04', 73)
      ,('Bike', '2013-04-27', 34)
      ,('Scooter', '2013-03-12', 13)
      ,('Scooter', '2013-04-05', 75)
      ,('Scooter', '2013-01-09', 24)
      ,('Scooter', '2013-01-16', 76)
      ,('Scooter', '2013-01-16', 23)
      ,('Scooter', '2013-05-14', 36)
      ,('Scooter', '2013-01-16', 68)
      ,('Scooter', '2013-02-16', 88)

SELECT ProductName, Jan, Feb, Mar, Apr, May
      INTO #sales_PIVOT
FROM
      (SELECT ProductName, LEFT(DATENAME (month, SaleDate), 3) [monthName], ISNULL(SalesQty, 0) AS SalesQty  FROM #sales) AS A
PIVOT
      (
      SUM(SalesQty)
      FOR [monthName] IN (Jan, Feb, Mar, Apr, May)
      ) AS B

SELECT ProductName, [MonthName], SalesQty
FROM
      (SELECT ProductName, Jan, Feb, Mar, Apr, May FROM #sales_PIVOT) AS A
UNPIVOT
      (
      SalesQty FOR [monthName] IN (Jan, Feb, Mar, Apr, May)
      ) AS B
 
IF OBJECT_ID('tempdb..#sales') IS NOT NULL
      DROP TABLE #sales

IF OBJECT_ID('tempdb..#sales_PIVOT') IS NOT NULL
      DROP TABLE #sales_PIVOT

One important point to remember here is UNPIVOT is reverse process of PIVOT, but only limited for conversion of columns to rows. It does not regenerate the original result. This is due to aggregate function used by PIVOT. In our example we have used SUM function and there is no reverse function available for SUM to generate original value. There are some senarios where we can generate the original output using UNPIVOT. Please post if you have such example.    

Friday 8 March 2013

Script to get the dababase backup history.

Often I have to get the backup history for reporting purpose. So I have writtent small script which give all details of database back history which is required for my report. I have used backupset and backupmediafamily tables to get the database backup history. Both of these tables resides in msdb database. So for some reason if you have restored msdb database. Then it will show history till the restored database time. Please see below script to get the database backup history.
 


SELECT
      A.database_name
      ,CASE A.[type] WHEN 'D' THEN 'Full' WHEN 'I' THEN 'Differential' WHEN 'L' THEN 'Transaction Log' END AS backup_type
      ,B.physical_device_name
      ,CAST(CAST(A.backup_size / (1024*1024) AS INT) AS VARCHAR) + ' ' + 'MB' AS size
      ,CAST(DATEDIFF(S, A.backup_start_date, A.backup_finish_date) AS VARCHAR) + ' ' + 'Seconds' time_taken
      ,A.backup_start_date
FROM msdb.dbo.backupset A
      INNER JOIN msdb.dbo.backupmediafamily B ON A.media_set_id = B.media_set_id
ORDER BY A.database_name
      , backup_start_date DESC
 
This script gives history of all database, if you need details for only one database then restict result by using WHERE caluse on database_name field. 
 
For more information on backupset and backupmediafamily tables, see below links.
 

How to load data from PDF file in SSIS package using script component


I have come across the case where I have to read PDF document, parse it and write content to database using SSIS. PDF document is not usual source for any data so there is no component available to read PDF file from SSIS.  I have used iTextSharp library in Script Component to read the PDF file from SSIS package.  I also found very good link where I found good example on how to use iTextsharp library.
 
Using iTextsharp library it is very easy to process PDF document. In my case it was simple case of reading PDF document page by page and parses the content using custom logic. You can do much more than just reading PDF document using this library.

To read data from PDF file and write to database, I have used Script Component as data source, which reads the PDF document and return result and output record of Script Component. The simply used OLE DB destination to write it to database. You might be aware that any third party dlls can be used in SSIS Script component by referencing it. But to add it as reference and use it in script component you have to register dll in GAC. It can be referenced without adding it but dll might not be available on the production environment at the same location. So better add them to GAC. In my case I was not allowed to add dll to production server directly, so I used .Net reflection to load the dll dynamically and added dll location as configuration in SSIS package.

Below is my code


//Below two variable are used to pass ITextSharp dll location and PDF file path.
//Which ultimately supplied from dts config file.
String configITextPath = Variables.itextdllfilepath;
String configSourceDataFile = Variables.filepath;

//Load ITextSharp library           
Assembly assembly = Assembly.LoadFrom(configITextPath);

//Get PdfReader object and create its instance
Type readerType = assembly.GetType("iTextSharp.text.pdf.PdfReader");
object reader = Activator.CreateInstance(readerType, new object[] { configSourceDataFile });

Type extractorType = assembly.GetType("iTextSharp.text.pdf.parser.PdfTextExtractor");

Type extractionStrategyType = assembly.GetType("iTextSharp.text.pdf.parser.LocationTextExtractionStrategy");

object extractionStrategy = Activator.CreateInstance(extractionStrategyType);

//Get number of pages in PDF document
Int32 pages = (Int32)reader.GetType().GetProperty("NumberOfPages").GetValue(reader, null);

//Read PDF document page by page and process it and generate Output rows for script component.
String pageText;
List<PepRecord> pepRecords = new List<PepRecord>();
for (int page = configStartPageNumber; page <= pages; page++)
{
    reader = Activator.CreateInstance(readerType, new object[] { configSourceDataFile });
    extractionStrategy = Activator.CreateInstance(extractionStrategyType);
    pageText = (String)extractorType.GetMethods()[0].Invoke(reader, new Object[] { reader, page, extractionStrategy });

   //code to process PDF file content and create Output rows for script component   
}
 
I know better way to do this is to add dll in GAC and refer from there. But in my case I have to quickly found the solution to process few PDF document suing SSIS package, so I have used reflection to load it.