Tuesday, March 19, 2013

KEY Lookup & RID Lookup

In sql server there are lot of terms which confuse the developer many times about the functionality but while understanding indexing it is very important to understand these as by properly using it the query performance could be increased.

Here i will quickly discuss about two terms i.e. KEY LOOKUP(Previously known as bookmark lookup) and RID Lookup.


KEY LOOKUP:

If a table has a clustered index on it is then it is called as KEY LOOKUP. In sql server data is stored is stored in Binary Tree format & by following this way data is stored. So here the table having a clustered index have a key point to data information.


RID LOOKUP (ROW IDENTIFIER LOOKUP):

In a case where a table does not have clustered index, but the non-clustered index is created on it then it is called RID lookup.

This operation is very expensive as syetem needs to scan through the B-Tree heap with the row Identifier.
 
The best suggested for overcoming these scenarios is to have a clustered index on a table.

Tuesday, February 19, 2013

Joins in SQL SERVER

Sql Server is all about data & the data may be stored on the multiple places(tables) which could be queried to trace information at any point of time. Whenever thinking about retrieving the concerned information from all the tables in the form of report the joins come in picture. Joins are an very important part of database & without using them it is really impossible to even think about database.

These are the most used & main type of joins in SqlServer,
  1. Inner Join
  2. Outer Join
  3. Cross Join

Use below mentioned script on any of your test database for understanding the various types with example.

CREATE TABLE STUDENTS(STUDENTID INT IDENTITY(1,1), NAME VARCHAR(100), MOBILENO BIGINT)
CREATE TABLE SPECIALCOURSE(ID INT IDENTITY(1,1), STUDENTID INT, STATUS BIT)

INSERT INTO STUDENTS(NAME, MOBILENO)
SELECT 'AJAY',9797979797
UNION ALL
SELECT 'SANJAY',1234567890
UNION ALL
SELECT 'SAMMY',9874563210
UNION ALL
SELECT 'SAMUAL',8789456512
UNION ALL
SELECT 'GANESH',7412589630
UNION ALL
SELECT 'MAHESH', 9632587400

INSERT INTO SPECIALCOURSE(STUDENTID, STATUS)
SELECT 1,0
UNION ALL
SELECT 3,1
    Inner Join

1.1) Inner Join:
Inner join is the most used join type in SqlServer. Whenever trying to check the concerned data only from the system the inner join is used. Inner join is very useful whenever user needs to get the only matching data from the concerned tables. To use this join type the matching columns from all the required tables needs to be mention so while executing query SqlServer engine consider the specified column for returning the proper information.

Ex. Code:
SELECT A.STUDENTID, A.NAME, A.MOBILENO, B.STATUS
FROM STUDENTS A INNER JOIN SPECIALCOURSE B ON A.STUDENTID = B.STUDENTID

1.2)Self Join:
Another type of inner join is Self join which could be defined as join within same table for extraction of related data within same table. The best example for this will be Manager/Employee relationship between same table. Unfortunately I have not designed my table which has this relationship otherwise I could have showed the example query.

Outer Join
2.1) Left Outer Join:
      Whenever we wish to get all the data from the left table we need to use Left Outer Join.. Left join will return all the matching rows from the left table & in case of no match it will return NULL.
      It is really useful when we want to find all the information from one side table with the relevant information from second table.

Here the left table is STUDENTS & in second table i.e. SPECIALCOURSE there is some information about few students who have joined special course.
Ex. Code:
SELECT A.STUDENTID, A.NAME, A.MOBILENO, B.STATUS
FROM STUDENTS A LEFT JOIN SPECIALCOURSE B ON A.STUDENTID = B.STUDENTID

2.2) Right Outer Join:
There may be a situation during the query when user needs information from second second & the matching or non-matching records from first table. In these scenarios developer will not swap the table from left to right as there is already a provision called Right Outer Join where the purpose can be achieved by using Right Outer Join.
In this scenario user will receive all the the information from second(Right) table & the matching information from first table.

Ex. Code:
SELECT A.STUDENTID, A.NAME, A.MOBILENO, B.STATUS
FROM STUDENTS A RIGHT JOIN SPECIALCOURSE B ON A.STUDENTID = B.STUDENTID


2.3) FULL Outer Join:
FULL outer join do not look for the match between the two tables & returns all the information from both the tables. This kind of results may be required when there is a need of all the data from system which both the tables.

Ex. Code:
SELECT A.STUDENTID, A.NAME, A.MOBILENO, B.STATUS
FROM STUDENTS A FULL OUTER JOIN SPECIALCOURSE B ON A.STUDENTID = B.STUDENTID


CROSS JOIN

Working with Cross join like working with multiplication between tables. Whenever there is some kind of situation when a user needs to map all the records from left table with all records of left table then in this situation the Cross join is brought into picture.

Here in this below query Students table 6 records 6 records & SPECIALCOURSE table has 2 records. As mentioned above it will return the multiplication the output will be 6 * 2 = 12 records & in output all the students will be mapped to the all SPECIALCOURSE records.

Ex. Code:
SELECT A.STUDENTID, A.NAME, A.MOBILENO, B.STATUS
FROM STUDENTS A CROSS JOIN SPECIALCOURSE B

Note: If the any of the two tables do not have record then there will be no output from this join.
Like other join type this join type needs not to have a ON condition.

Friday, February 8, 2013

Normalization VS De-Normalization

Normalization & De-Normalization

Normalization can be described as a breaking up of the information into multiple parts or logical units in the form of tables. The concept of normalization is used for organizing the data in database in a managed manner where the redundancy of data does not happen & data remains accurate.
Normalization is basically a process of reducing redundancy i.e. Data repetition in database for similar type of information where-in De-normalizations only limited number of tables are created. There are lot of many more merits & demerits of Normalization & De-normalization so let’s look them.

 Normalization:
As mentioned above Normalization is a process of reducing redundancy i.e. Data repetition in database for similar type of information so While applying normalization on database the data is spilt into multiple sets of table like, master tables are created for each category of information so that the same string do not gets repeated into system. Normalization is a process of set guidelines which needs to be followed for implementing normalization.

A non-normalized database may contain some tables which may contain the most of data into a few tables. There are some points which indicate the general problems of non-normalized database.
1.       Data is stored in few tables only instead of multiple tables but this is not good as data is easily identical which is bad practice from the security perspective

2.        Query performance will be affected as the table will grow drastically

3.       Activities like Update will be affected causing the poor efficiency of database and Data integrity.

In a process of applying normalization on database few points needs to be keep in mind among which some are as follow,

·         Data should not be redundant (Duplicate) or very minimum for some cases for ex. In a candidate management system where candidates information is stored. Here in this case the address & other optional information is not required to stored in a same table as it is not always required while searching basic information about candidate.
In the process of normalization naming convention is always an important point because in vice-versa to de-normalization the data is split into multiple tables & during the needs the tables needs to be joined with one another & in this case the identical & proper naming convention will play a key role.

Normalization benefits the database in many ways some of them are as follow,

1.       Proper database organization

2.       Avoiding redundant data

3.       Database consistency and integrity in a which assures the accurate & co-related data within same database.

4.       Proper & Secure database system as an unknown & unfamiliar user cannot find & understand data easily.

5.       Referential integrity means the values of one table is dependent upon the one or more than one number of columns

6.       Having a parent child relations between tables so that any referential data is not lost

As like the other thing normalization also has some drawbacks which is that due to heavy normalization during the processing of data during the insert, update or select all the relational objects are required to be bring in use which cause the high CPU use & IO operation increase so although the huge benefits the normalization should be carried out wisely.

De-Normalization:

Almost described most of the differences between normalization & de-normalization & the various pros & cons of both concepts

Description of de-normalization in simple terms “Instead of storing data into multiple space a few number of tables where most of the data can be stored for stress free environment without caring about too much of data hiding methodology”

Wednesday, January 16, 2013

Temp Tables Vs Table Variables

It’s always a tricky decision while thinking about using Temp Table of Table Variable. Let’s look into this article to check the differences between these two.

#Temp tables are the regular SQL tables that are defined and stored in TempDB database of Sql server. It is more or less equal to permanent tables but the only difference with #table is that could not have foreign key over it.

While going through the rest of the article we will look into the various important aspects of both the objects i.e. (#table & @table variable).


How to create / declare policy:
#Table: Temp tables could be created by writing table creation script or by writing the insert into script. Both the ways Process ID specific #table get created into TempDB database.

@Table Variable: Table variables can only be created by specifying Declare @name as Table syntax. Creating a table variable is little different from Create table syntax because we have to declare table variable like the way we declare local variables.

Note: Table variables cannot be created with the Select into syntax.

Lifespan of #table & @table variable:
Temporary table are of two type namely #table (Local temporary table) & ##table (Global temporary table). The lifespan of local temporary table is limited till the specific user session. Explaining in brief whenever any Process ID creates a temporary table then that #table gets created into TempDB database with that Process ID Specific access rights. If some other process ID also creates the #table with the same name then also without creating any mismatch between the names Sql engines creates the temp table for another user & keeps them unique for each process use.

Lifespan of #table is limited till the user does not forcefully drop the table or close the connection. Once the connection is closed or #table is dropped the #table gets flushed from the TempDB database.

@Table Variable: The lifespan of table variables are limited till the execution of process or execution of queries. Once the execution of queries ends the data & table architecture gets flushed from the Sql database memory.

Performance Parameter:

While comparing on the performance platform we need to discuss various parameters.

#Table: #tables are physically stored on system drive due to which it does not use the system buffer the way table variable uses. Index creation is permitted on #table which enables the user to fetch the data with the use of Index in faster manner. #table can be used to store huge data without harming the system performance. The only burden which #table put’s on system is its Read & Write operation.
#table participate in transactions so whenever the transaction related request occurs #tables behave accurately & revert back to the original positions.

@Table Variable: A myth which is associated with @table variable is that it should be used for the smaller data size operation because it uses the Sql buffer for storing & managing data which may cause the performance issues while working with huge data size.
@table variable do not participate in transactions & do not behave as per the transaction rules.

Note: #Tables causes a recompilation due to which a same execution plan could not be used. The cost of Read / Write operation of table variables are less as compared to #tables due to which it shows good performance while we need to write & then read data from processed table.

Thursday, January 10, 2013

Merits of Using Stored Procedure over Ad-Hoc queries

Stored procedure is a set of Structured Query Language statements with a user defined name. Stored procedures are physically stored within the database in precompiled form for ready to use purpose allowing applications to use it by any number of times & by any number of programs.

Ad hoc queries are the open queries which are written at application side or in SSSMS generally for requirement specific purpose.

Problems of Ad-hoc Queries:
As mentioned in description ad-hoc queries are open queries which are generally used for some specific requirement. The major demerit of using ad-hoc queries are,

Ad-hoc queries are not stored in Sql server causing no pre-stored query plan due to which whenever the ad-hoc is run the first thing Sql engine does is that it generates the executions plan every time where in case of Stored procedures a precompiled plan is ready used. Each time plan generation activity causes the execution duration & delay in result presentation.

Ad hoc queries can kill performance and some time it is hard to control complex logics through these ad hoc queries. Store procedures are the best choice to accomplish these data processes. Let’s see some important information about the same,

·         Reduce Network Traffic: Excessive network traffic is a big performance killer in production databases. In a case where users are frequently pushing request to database server through some client application (because of ad hoc queries) this may cause a excessive network traffic because the complete ad-hoc request will hit the server via network & depending on the size of query & data size the network will be used in same proportion. In this situation Store Procedures will help in reducing such network traffic by holding group of statements and returning required result with a single call.
Stored procedures are stored in database system & can be bring into use via calling by name & appropriate procedures.

Note:  Lengthy transactions in store procedures should be avoided to prevent lock problems. In real world this have been observed that while using transactions into procedures most of the developers put the complete script into transactions (some selects from huge tables etc.) which causes the other objects to be busy during the complete execution of stored procedure.

·         Privilege based execution
Stored procedures can be grant to be executed as per the user privileges. Users can be restricted from having access to read/write to tables directly in database by using store procedures. Only developer of store procedure require specific privileges while creating a store procedure but to execute these store procedures client of application only need execute privileges.

·         Re-use of Execution Plan
As mentioned in fist point Store procedures are compiled once and resultant execution plan are utilized for future executions. This results in tremendous performance boosts when store procedures are called repeatedly.

·         Efficient Re-use of Code
Commonly used store procedures can be effectively used for different projects.
For example, In case of financial application the logic for printing the amount on various documents can be used on various instances just by calling the concerned stored procedure by providing the proper parameter.

·         Ease of Maintenance
     In real world change in business rules defined for a project, over a time is normal. If such business rules are controlled with in store procedures rather than ad-hoc then it is very for the maintenance. In this case developer need to make changes in every piece of code & recompile.


Wednesday, January 9, 2013

How to find the last database backup restore information

Some Times we need to find the date/Time information when the database backup was restored on some database. In this situation below mentioned query will help a lot in finding this information including the database name, Restore date etc.


SELECT
rs.destination_database_name, rs.restore_date, bs.backup_start_date, bs.backup_finish_date,
bs.database_name as source_database_name, bmf.physical_device_name as backup_file_used_for_restore FROM msdb..restorehistory rs INNER JOIN msdb..backupset bs
ON rs.backup_set_id = bs.backup_set_id
INNER JOIN msdb..backupmediafamily bmf ON bs.media_set_id = bmf.media_set_id ORDER BY rs.restore_date DESC;

Properties of Relational Tables in Sql Server


Sql Server database is a complete RDBMS system as it follows all 12 rules of CODD which defines a database system as RDBMS. For making a RDBMS or DMBS system proper as database design for table should also be in a proper defined manner for the ease of access.


There are 6 properties of Relational Tables in sql server and they are as follow,


1) Values Are Atomic
This property implies that columns in a relational table are not repeating group. The key benefit of the one value property is that it simplifies data manipulation logic. This property of tables are even referred to be in the “first normal form” (1NF) of normalisation.


2) Referencing Column Values Are of the Same Kind
In relational terms this means that all values in a column belong to same type. This property simplifies data access because developers and users can be certain of the type of data contained in a given column. It also simplifies data validation. Because all values are from the same domain, the domain can be defined and enforced with the Data Definition Language (DDL) of the database software.


3) Each Row is Unique
This property ensures that no two rows in a relational table are identical. there is at least one column, or set of columns, the values of which uniquely identify each row in the table. Such columns are called primary keys. This property guarantees that every row in a relational table is meaningful and that a specific row can be identified by specifying the primary key value.


4) The Sequence of Columns is Insignificant
This property states that the ordering of the columns in the relational table has no meaning. Columns can be retrieved in any order and in various sequences. The benefit of this property is that it enables many users to share the same table without concern of how the table is organized. It also permits the physical structure of the database to change without affecting the relational tables.


5) The Sequence of Rows is Insignificant
This property is analogous the one above but applies to rows instead of columns. The main benefit is that the rows of a relational table can be retrieved in different order and sequences. Adding information to a relational table is simplified and does not affect existing queries.


6) Each Column Has a Unique Name
Any database system does not allow duplicate column names in database table however this is defined as a property that column name need to be unique within the table.