200+ SQL Server Interview Questions by Shivprasad Koirala
Ask for interview question books by Shivprasad Koirala only from BPB Publications.
www.questpond.com
Titles written by Shivprasad Koirala:
- .NET Interview Questions
- SQL Server Interview Questions
- Java Interview Questions
- C# and ASP.NET Projects
- How to Prepare Software Quotations
- Excel for Office People
- Software Testing Interview Questions
- Hacking for Beginners
Mail bpb@bol.net.in for any of the titles above.
by Shivprasad Koirala
Including SQLCLR, XML Integration, Database Optimization, Data Warehousing, Data Mining and Reporting Services
The table of contents is different from what is available in traditional books. So rather than reading through the whole book, just look at the questions you feel uncomfortable with and revise those.
Contents

Introduction
Dedication
About the author
Introduction
How to read this book
Software Company hierarchy
Resume Preparation Guidelines
Salary Negotiation
Points to remember

1. Database Concepts
What is a database or database management system (DBMS)?
What's the difference between DBMS and RDBMS?
(DB) What are CODD rules?
Is Access database an RDBMS?
What's the main difference between ACCESS and SQL SERVER?
What's the difference between MSDE and SQL SERVER 2000?
What is SQL SERVER Express 2005 Edition?
(DB) What is SQL Server 2000 Workload Governor?
What's the difference between SQL SERVER 2000 and 2005?
What are E-R diagrams?
How many types of relationships exist in database designing?
What is normalization? What are the different types of normalization?
What is denormalization?
(DB) Can you explain Fourth Normal Form?
(DB) Can you explain Fifth Normal Form?
(DB) What's the difference between Fourth and Fifth Normal Form?
(DB) Have you heard about Sixth Normal Form?
What are Extents and Pages?
(DB) What are the different sections in a Page?
What are page splits?
In which files does SQL Server actually store data?
What is Collation in SQL Server?
(DB) Can we have a different collation for database and table?

2. SQL
Revisiting basic syntax of SQL
What are GRANT and REVOKE statements?
What is Cascade and Restrict in DROP table SQL?
How to import a table using the INSERT statement?
What are the DDL, DML and DCL concepts in the RDBMS world?
What are the different types of joins in SQL?
What is CROSS JOIN?
You want to select the first record in a given set of rows?
How do you sort in SQL?
How do you select unique rows using SQL?
Can you name some aggregate functions in SQL Server?
What is the default SORT order for a SQL?
What is a self-join?
What's the difference between DELETE and TRUNCATE?
Select addresses which are between 1/1/2004 and 1/4/2004?
What are Wildcard operators in SQL Server?
What's the difference between UNION and UNION ALL?
What are cursors and in what situations will you use them?
What are the steps to create a cursor?
What are the different Cursor Types?
What are Global and Local cursors?
What is the GROUP BY clause?
What is ROLLUP?
What is CUBE?
What is the difference between the HAVING and WHERE clauses?
What is the COMPUTE clause in SQL?
What is the WITH TIES clause in SQL?
What does the SET ROWCOUNT syntax achieve?
What is a Sub-Query?
What are Correlated Subqueries?
What are the ALL and ANY operators?
What is a CASE statement in SQL?
What does the COLLATE keyword in SQL signify?
What is a CTE (Common Table Expression)?
Why should you use a CTE rather than a simple view?
What is the TRY/CATCH block in T-SQL?
What is the PIVOT feature in SQL Server?
What is UNPIVOT?
What are RANKING functions?
What is ROW_NUMBER()?
What is RANK()?
What is DENSE_RANK()?
What is NTILE()?
(DB) What is SQL injection?

3. .NET Integration
What are the steps to load .NET code in SQL SERVER 2005?
How can we drop an assembly from SQL SERVER?
Are changes made to an assembly updated automatically in the database?
Why do we need to drop an assembly to update changes?
How to see assemblies loaded in SQL Server?
I want to see which files are linked with which assemblies?
Do the .NET CLR and SQL SERVER run in different processes?
Does .NET control SQL SERVER or is it vice versa?
Is SQLCLR configured by default?
How to configure CLR for SQL SERVER?
Is the .NET feature loaded by default in SQL Server?
How does SQL Server control the .NET run-time?
What's a SANDBOX in SQL Server 2005?
What is an application domain?
How are .NET AppDomains allocated in SQL SERVER 2005?
What is the syntax for creating a new assembly in SQL Server 2005?
Do assemblies loaded in the database need the actual .NET DLL?
You have an assembly which is dependent on other assemblies; will SQL Server load the dependent assemblies?
Does SQL Server handle unmanaged resources?
What is Multi-tasking?
What is Multi-threading?
What is a Thread?
Can we have multiple threads in one AppDomain?
What is Non-preemptive threading?
What is Pre-emptive threading?
Can you explain the threading model in SQL Server?
How do .NET and SQL Server threads work?
How are exceptions in SQLCLR code handled?
Are all .NET libraries allowed in SQL Server?
What is HostProtectionAttribute in SQL Server 2005?
How many types of permission levels are there for an assembly?
In order that an assembly gets loaded in SQL Server, what type of checks are done?
Can you name the system tables for .NET assemblies?
Are two versions of the same assembly allowed in SQL Server?
How are changes made in an assembly replicated?
Is it a good practice to drop an assembly for changes?
In one of the projects the following steps were done; will it work?
What does ALTER ASSEMBLY WITH UNCHECKED DATA signify?
How do I drop an assembly?
Can we create SQLCLR using .NET Framework 1.0?
While creating a .NET UDF, what checks should be done?
How do you define a function from the .NET assembly?
Can you compare T-SQL and SQLCLR?
With respect to .NET, is SQL SERVER case sensitive?
Does the case-sensitivity rule apply for VB.NET?
Can nested classes be accessed in T-SQL?
Can we have SQLCLR procedure input as an array?
Can the object datatype be used in SQLCLR?
How is precision handled for decimal datatypes in .NET?
How do we define INPUT and OUTPUT parameters in SQLCLR?
Is it good to use .NET datatypes in SQLCLR?
How to move values from SQL to .NET datatypes?
What is System.Data.SqlServer?
What is SQLContext?
Can you explain the essential steps to deploy SQLCLR?
How do we create a function in SQL Server using .NET?
How do we create a trigger using .NET?
How to create User Defined Functions using .NET?
How to create aggregates using .NET?
What is Asynchronous support in ADO.NET?
What is MARS support in ADO.NET?
What is the SqlBulkCopy object in ADO.NET?
How to select a range of rows using ADO.NET?
What are the different types of triggers in SQL SERVER 2000?
If we have multiple AFTER triggers on a table, how can we define the sequence of the triggers?
How can you raise custom errors from a stored procedure?

4. ADO.NET
Which are the namespaces for ADO.NET?
Can you give an overview of the ADO.NET architecture?
What are the two fundamental objects in ADO.NET?
What is the difference between a dataset and a datareader?
What are the major differences between classic ADO and ADO.NET?
What is the use of the connection object?
What are the methods provided by the command object?
What is the use of the DataAdapter?
What are the basic methods of the DataAdapter?
What is the Dataset object?
What are the various objects in a Dataset?
How can we connect to Microsoft Access, FoxPro, Oracle etc.?
What's the namespace to connect to SQL Server?
How do we use stored procedures in ADO.NET?
How can we force the connection object to close?
I want to force the datareader to return only schema?
Can we optimize the command object when there is only one row?
Which is the best place to store the connection string?
What are the steps involved to fill a dataset?
What are the methods provided by the dataset for XML?
How can we save all data from a dataset?
How can we check for changes made to a dataset?
How can we add/remove rows in the DataTable object of a DataSet?
What's the basic use of a DataView?
What's the difference between a DataSet and a DataReader?
How can we load multiple tables in a DataSet?
How can we add relations between tables in a DataSet?
What's the use of the CommandBuilder?
What's the difference between Optimistic and Pessimistic locking?
How many ways are there to implement locking in ADO.NET?
How can we perform transactions in .NET?
What's the difference between DataSet.Clone and DataSet.Copy?
What's the difference between a Dataset and an ADO Recordset?

5. Notification Services
What are Notification Services?
(DB) What are the basic components of Notification Services?
(DB) Can you explain the architecture of Notification Services?
(DB) Which are the two XML files needed for Notification Services?
(DB) What is the Nscontrol command?
What are the situations in which you will use Notification Services?

6. Service Broker
Why do we need Queues?
What is Asynchronous communication?
What is SQL Server Service Broker?
What are the essential components of SQL Server Service Broker?
What is the main purpose of having a Conversation Group?
How to implement Service Broker?
How do we encrypt data between Dialogs?

7. XML Integration
What is XML?
What is the version information in XML?
What is the ROOT element in XML?
If XML does not have a closing tag, will it work?
Is XML case sensitive?
What's the difference between XML and HTML?
Is XML meant to replace HTML?
Can you explain why your project needed XML?
What is DTD (Document Type Definition)?
What is well-formed XML?
What is valid XML?
What is a CDATA section in XML?
What is CSS?
What is XSL?
What are Elements and Attributes in XML?
Can we define a column as XML?
How do we specify the XML data type as typed or untyped?
How can we create the XSD schema?
How do I insert into a table which has an XSD schema attached to it?
What is the maximum size for the XML datatype?
What is XQuery?
What are XML indexes?
What are secondary XML indexes?
What is FOR XML in SQL Server?
Can I use FOR XML to generate the SCHEMA of a table, and how?
What is the OPENXML statement in SQL Server?
I have a huge XML file which we want to load in the database?
How to call a stored procedure using HTTP SOAP?
What is XMLA?

8. Data Warehousing/Data Mining
What is Data Warehousing?
What are Data Marts?
What are Fact tables and Dimension tables?
(DB) What is the Snowflake Schema design in a database?
(DB) What is the ETL process in Data Warehousing?
(DB) How can we do the ETL process in SQL Server?
What is Data Mining?
Compare Data Mining and Data Warehousing?
What is BCP?
How can we import and export using the BCP utility?
During BCP we need to change the field positions or eliminate some fields; how can we achieve this?
What is Bulk Insert?
What is DTS?
(DB) Can you brief us about the Data Warehouse project you worked on?
What is an OLTP (Online Transaction Processing) system?
What is an OLAP (On-line Analytical Processing) system?
What are the Conceptual, Logical and Physical models?
(DB) What is Data purging?
What is Analysis Services?
(DB) What are CUBES?
(DB) What are the primary ways to store data in OLAP?
(DB) What is METADATA information in Data Warehousing projects?
(DB) What is multi-dimensional analysis?
(DB) What is MDX?
(DB) How did you plan your Data Warehouse project?
What are the different deliverables according to phases?
(DB) Can you explain how Analysis Services works?
What are the different problems that Data Mining can solve?
What are the different stages of Data Mining?
(DB) What are Discrete and Continuous data in the Data Mining world?
(DB) What is a MODEL in the Data Mining world?
(DB) How are models actually derived?
(DB) What is a Decision Tree Algorithm?
(DB) Can a decision tree be implemented using SQL?
(DB) What is the Naïve Bayes Algorithm?
(DB) Explain the clustering algorithm?
(DB) Explain in detail Neural Networks?
(DB) What is Back Propagation in Neural Networks?
(DB) What is the Time Series algorithm in Data Mining?
(DB) Explain the Association algorithm in Data Mining?
(DB) What is the Sequence Clustering algorithm?
(DB) What are the algorithms provided by Microsoft in SQL Server?
(DB) How do Data Mining and Data Warehousing work together?
What is XMLA?
What are Discover and Execute in XMLA?

9. Integration Services/DTS
What is the Integration Services import/export wizard?
What are the prime components in Integration Services?
How can we develop a DTS project in Integration Services?

10. Replication
What's the best way to update data between SQL Servers?
What are the scenarios in which you will need multiple databases with the same schema?
(DB) How will you plan your replication?
What are publisher, distributor and subscriber in Replication?
What are Push and Pull subscriptions?
(DB) Can a publication support push and pull at the same time?
What are the different models/types of replication?
What is Snapshot replication?
What are the advantages and disadvantages of using Snapshot replication?
What type of data will qualify for Snapshot replication?
What's the actual location where the distributor runs?
Can you explain in detail how exactly Snapshot replication works?
What is Merge replication?
How does Merge replication work?
What are the advantages and disadvantages of Merge replication?
What is conflict resolution in Merge replication?
What is Transactional replication?
Can you explain in detail how Transactional replication works?
What are the data type concerns during replication?

11. Reporting Services
Can you explain how we can make a simple report in Reporting Services?
How do I specify stored procedures in Reporting Services?
What is the architecture for Reporting Services?

12. Database Optimization
What are indexes?
What are B-Trees?
I have a table which has a lot of inserts; is it a good database design to create indexes on that table?
What are Table Scans and Index Scans?
What are the two types of indexes? Explain them in detail.
(DB) What is the FillFactor concept in indexes?
(DB) What is the best value for FillFactor?
What are Index statistics?
(DB) How can we see the statistics of an index?
(DB) How do you reorganize your index once you find the problem?
What is Fragmentation?
(DB) How can we measure Fragmentation?
(DB)How can we remove the Fragmented spaces? ................................................... 296 What are the criteria you will look in to while selecting an index? .......................... 297 (DB)What is Index Tuning Wizard? ...................................................................... 298 (DB)What is an Execution plan? ............................................................................... 305 How do you see the SQL plan in textual format? ...................................................... 308 (DB)What is nested join, hash join and merge join in SQL Query plan?.................. 308 What joins are good in what situations? .................................................................... 310 (DB)What is RAID and how does it work ? .............................................................. 311 13. Transaction and Locks ......................................................................................... 313 What is a Database Transactions ? ......................................................................... 313 What is ACID?........................................................................................................... 313 What is Begin Trans, Commit Tran, Rollback Tran and Save Tran? .......... 314 (DB)What are Checkpoints in SQL Server? ......................................................... 315 (DB)What are Implicit Transactions? .................................................................... 315 (DB)Is it good to use Implicit Transactions? ......................................................... 315 What is Concurrency? ............................................................................................... 316 How can we solve concurrency problems? ............................................................... 316 What kind of problems occurs if we do not implement proper locking strategy?..... 317 What are Dirty reads? 
............................................................................................ 317 What are Unrepeatable reads?................................................................................ 319 What are Phantom rows? ....................................................................................... 320 What are Lost Updates? ......................................................................................... 321 What are different levels of granularity of locking resources? ................................. 322 What are different types of Locks in SQL Server? .................................................... 322 What are different Isolation levels in SQL Server? ................................................... 325 What are different types of Isolation levels in SQL Server? ..................................... 325 If you are using COM+ what Isolation level is set by default?.............................. 326 What are Lock hints? ............................................................................................. 327 What is a Deadlock ? ............................................................................................. 327 What are the steps you can take to avoid Deadlocks ? .......................................... 327 (DB)How can I know what locks are running on which resource? ........................... 328
Introduction
Dedication

This book is dedicated to my kid Sanjana, whose dad's playtime has been stolen and given to this book. I am thankful to my wife for constantly encouraging me, and also to BPB Publication for giving a newcomer a platform to perform. Finally, on top of it all, thanks to two old eyes, my mom and dad, for always blessing me. I am blessed to have Raju as my brother, who always keeps my momentum moving on. I am grateful to Bhavnesh Asar, who initially conceptualized the idea; I believe concept thinking is more important than execution. Tons of thanks to my reviewers, whose feedback provided an essential tool to improve my writing capabilities. A special mention to Miss Kadambari S. Kadam, who took all the pain to review for the left-outs, without which this book would never have seen the quality light of day.

About the author

The author works in a big multinational company and has over 8 years of experience in the software industry. He presently works as a project lead and in the past has led projects in the banking, travel and financial sectors. But on top of it all, I am a simple developer like all you guys out there, doing an 8-hour job. Writing is something I do extra, and I love doing it. No one is perfect, and the same holds true for me. So if there is anything you want to comment on or suggest, or if you want to point out typo / grammar mistakes or technical mistakes in the book, you can mail me at shiv_koirala@yahoo.com. Believe me guys, your harsh words will be received with love and treated with the topmost priority. Without all you guys I am not an author. Writing an interview question book is really a great deal of responsibility. I have tried to cover the maximum questions for each topic, because I always think that leaving out one silly question could cost someone a job. But the huge natural variation in interviews is difficult to cover fully in this small book.
So if you have come across a question during an interview which is not addressed in this book, do mail it to shiv_koirala@yahoo.com. Who knows, that question could save some other guy's job.

Features of the book
This book goes best in combination with my previous book, .NET Interview Questions. One takes care of your front-end aspect and this one the back end, which will make you really stand out during .NET interviews.
- Around 400-plus SQL Server interview questions sampled from real SQL Server interviews conducted across IT companies.
- Other than core-level interview questions, DBA topics like database optimization and locking are also addressed.
- Replication is where most developers stumble, so a full chapter is dedicated to it, so that during the interview you really look like a champ.
- SQLCLR, that is .NET integration, which is one of the favorites of every interviewer, is addressed with great care. This makes the developer more comfortable during the interview.
- XML is one of the must-answer topics during interviews. All new XML features are covered with great elegance.
- Areas like data warehousing and data mining are handled in complete depth.
- Reporting and Analysis Services, which can really surprise developers during interviews, are also dealt with great care.
- A complete chapter on ADO.NET makes the book stronger from a programmer's aspect. In addition, new ADO.NET features are also highlighted, which can be pain points among the new features released with SQL Server.
- A must for developers who are looking to crack a SQL Server interview for a DBA or programmer position.
- A must for freshers who want to avoid some unnecessary pitfalls during the interview.
- Every answer is precise and to the point rather than beating around the bush. Some questions are answered in greater detail with practical implementation in mind.
- Every question is classified as DB or NON-DB level. DB-level questions are mostly for guys who are looking for high-profile DBA-level jobs. All questions other than DB level are NON-DB level, which every programmer must know.
- Tips and tricks for the interview, resume making and the salary negotiation section take this book to a greater height.
Introduction

When my previous book ".NET Interview Questions" reached the readers, the only voice heard was "more SQL Server". Ok guys, we have heard it loud and clear, so here's my complete book on SQL Server: SQL Server Interview Questions. But there's a second, stronger reason for writing this book, which stands taller than the readers' demand, and that is SQL Server itself. Almost 90% of projects in the software industry need databases, or persistent data in some form or other. When it comes to persisting data in .NET, SQL Server is the most preferred database for the job. There are projects which use ORACLE, DB2 and other database products, but SQL Server still has the major market chunk when the language is .NET and especially when the operating system is Windows. I treat this great relationship between .NET, SQL Server and the Windows OS as a family relationship. In my previous book we had only one chapter dedicated to SQL Server, which is a complete injustice to this beautiful product.

So why an interview question book on SQL Server? If you look at any .NET interview conducted on your premises, both parties (employer and candidate) pay no attention to SQL Server, even though it is such an important part of a development project. They will go on talking about the stars (OOP, AOP, design patterns, MVC patterns, Microsoft Application Blocks, project management etc.) but on the database side there will be rare questions. I am not saying these things are not important, but if you look at development or maintenance, the majority of the time you will be either in your IDE or in SQL Server. Secondly, many candidates come across as real heroes when answering questions on OOP, AOP, design patterns, architecture, remoting and so on, but when it comes to simple basic questions on SQL Server like SQL or indexes (forget DBA-level questions) they are completely off track. Third, and very important, IT is changing: people expect more out of less. That means they expect a programmer to be an architect, coder, tester and, yes, a DBA also. For mission-critical data there will always be a separate position for a DBA. But now many interviewers expect programmers to also do the job of a DBA, data warehousing etc. This is the major place where developers fall short when facing these kinds of interviews. So this book will walk you through those surprising questions which can spring up from the SQL Server side. I have tried not to go too deep, as that would defeat the whole purpose of an interview question book. I think an interview question book should make you
run through those surprising questions and prepare in a short duration (probably a night or so). I hope this book really points out those pitfalls which can come up during SQL Server interviews. I hope it takes you to a better height and gives you an extra confidence boost during interviews. Best of luck and happy job-hunting.

How to read this book

If you can read English, you can read this book... kidding. In this book there are some legends which will make your reading more effective. Every question has simple tags which mark the rating of the question. These ratings are given by the author and can vary according to companies and individuals. Compared to my previous book .NET Interview Questions, which had three levels (Basic, Intermediate and Advanced), this book has only two levels (DBA and NON-DBA) because of the subject. While reading you will come across sections marked Note, which highlight special points of that section. You will also come across tags like TWIST, which is nothing but another way of asking the same question; for instance, "What is replication?" and "How do I move data between two SQL Server databases?" point to the same answer. All questions at DBA level are marked with a (DB) tag. Questions which do not have tags are at NON-DBA level. Every developer should have a know-how of all NON-DBA-level questions. But for DBA guys every question is important. For instance, if you are going for a developer position and you flunk a simple ADO.NET question, you know the result. Vice versa, if you are going for a DBA position and you cannot answer basic query optimization questions, you will probably never reach the HR round. So the best way to read this book is to read a question and judge for yourself: do you think you will be asked this type of question? For instance, many times you know you will only be asked about data warehousing, and rather than beating around the bush you would like to target that section more.
And many times you know your weakest area and would only like to brush up those sections. You could say this is not a book which has to be read from start to end; you can start from any chapter or question, and when you think you are ok, close it.

Software Company hierarchy
It's very important during an interview to be clear about what position you are targeting. Depending on the position you are targeting, the interviewer shoots questions at you. For example, if you are looking for a DBA position you will be asked around 20% ADO.NET questions and 80% questions on query optimization, the profiler, replication, data warehousing, data mining and other such topics. Note: - In small-scale software houses and mid-scale software companies there are chances that they expect a developer to do the job of programming, the DBA job, data mining and everything else. But in big companies you can easily see the difference: DBA jobs are specifically done by SQL Server specialists rather than developers. That said, nowadays some big companies believe in a developer doing multitask jobs to remove dependencies on a single resource.
Above is a figure of the general hierarchy across most IT companies (well, not always, but I hope most of the time). Because of inconsistent HR ways of working, you will see differences between companies.
Note: - There are many small and medium software companies which do not follow this hierarchy; they have their own ad hoc way of defining positions in the company. So why is there a need for hierarchy in an interview? An interview is a contract between the employer and candidate to achieve specific goals. The employer is looking for a suitable candidate and the candidate for a better career. Normally in interviews the employer is very clear about what type of candidate he is looking for. But 90% of the time the candidate is not clear about the position he is looking for. How many times has it happened to you that you have given a whole interview, and when you mentioned the position you were looking for... pat comes the answer: we do not have any requirements for this position. So be clear about the position right when you start the interview. The following are the numbers of years of experience according to position:
- Junior engineers are essentially freshers and work under software engineers.
- Software engineers have around 1 to 2 years of experience. The interviewer expects software engineers to know how to code ADO.NET with SQL Server.
- Senior software engineers have around 2 to 4 years of experience. The interviewer expects them to be technically very strong.
- Project leads should handle the majority of the technical aspects of a project and should have around 4 to 8 years of experience. They are also actively involved in defining the architecture of the project. The interviewer expects them to be technically strong and to have managerial skills.
- Project managers are expected to be around 40% technically strong and should have over 10 years of experience. But they are interviewed more from the aspects of project management, client interaction, people management, proposal preparation etc.
- Pure DBAs do not come into the hierarchy as such in pure development projects. They do report to the project managers or project leads, but they mainly work across the hierarchy, helping everyone in a project.
In small companies, software developers can also act as DBAs, depending on company policy. Pure DBAs normally have around 6 or more years of experience in that particular database product.
When it comes to maintenance projects where there are special DBA positions, a lot of things are ad hoc. That means one or two guys work on fulfilling maintenance tickets.
So now judge where you stand and where you want to go...

Resume Preparation Guidelines

First impression is the last impression. Before the interviewer even meets you, he will first meet your resume. An interviewer looking at your resume is almost 20% of the interview happening without you knowing it. I was always a bad guy when it came to resume preparation. But when I looked at my friends' resumes, they were gorgeous. Now that I am writing a series of books on interviews, I thought this would be a good point to put in. You can happily skip it if you are confident about your resume. There is no hard and fast rule that you have to follow the same pattern, but just see that all the items on this checklist are attended to:
- Use plain text when you are sending resumes through email. For instance, if you send your resume as a Microsoft Word document and the interviewer is using Linux, he will never be able to read your resume. You cannot be sure either way: you send your resume in Word 2000 and the guy has Word 97... ouch.
- Attach a covering letter; it really impresses and makes you look traditionally formal. Yes, even if you are sending your CV through email, send a covering letter.
- Start with an objective or summary, for instance: Working as a senior database administrator for more than 4 years. Implemented quality web-based applications. Followed the industry's best practices and adhered to and implemented processes which enhanced the quality of technical delivery. Pledged to deliver the best technical solutions to the industry.
- Specify your core strengths at the start of the resume, so that the interviewer can quickly decide whether you are eligible for the position. For example: Looked after the data mining and data warehousing department independently. Played a major role in query optimization. Worked extensively on database design and ER diagram implementation. Well versed with the CMMI process and followed it extensively in projects.
- Looking forward to working in a project manager or senior manager position. This is also a good place to specify your objective or the position you want, which makes it clear to the interviewer whether he should call you for an interview. For instance, if you are looking for a senior position, state explicitly that you are looking for that job profile. Any kind of certification, like MCP or MCSD, can also be made visible in this section.
- Once you have briefly specified your goals and what you have done, it's time to specify what type of technology you have worked with: for instance RDBMS, tools, languages, web servers, process (Six Sigma, CMMI).
- After that you can give a company-wise run-through of your experience: which companies you have worked with, and the year / month you joined and left each. This will give the interviewer an overview of what type of companies you have associated yourself with.
Now it's time to mention all the projects you have worked on till now. It is best to start in descending order, that is, from your current project and go backwards. For every project try to put in these things:
- Project name / client name (it is sometimes unethical to mention the client's name; I leave that to the readers).
- Number of team members.
- Time span of the project.
- Tools, languages, RDBMS and technology used to complete the project.
- A brief summary of the project.
Senior people with huge experience tend to inflate their CVs by putting in a summary for every project. It is best for them to put descriptions of only the first three projects, in descending order, and cover the rest verbally during the interview. I have seen CVs above 15 pages; I doubt anyone can read them. Finally come your education and personal details. If you are trying for onsite positions, do not forget to mention your passport number. Some guys tend to make their CVs large and huge. I think the optimal size should be not more than 4 to 5 pages.
- Do not mention your salary in the CV. You can talk about it during the interview with HR or the interviewer.
- When you are writing the summary for a project, make it effective by using verbs like "managed a team of 5 members", "architected the project from start to finish" etc. It carries huge weight.
- This is essential, very essential: take 4 to 5 photocopies of your resume; you will need them now and then.
- Just in case, take at least 2 passport photos with you. You can skip this, but many times you will need them.
- Carry all your current office documents, especially your salary slips and joining letter.
Salary Negotiation

Ok, that's what we all do it for: money. Not everyone, right? This is probably the weakest area for techno-savvy guys. They are not good negotiators. I have seen so many guys who, at the first instance, smile and say "NEGOTIABLE SIR". So here are some points:
- Do a study of the salary trend. For instance, have some kind of baseline: what's the salary trend by number of years of experience? Discuss this with your friends.
- Do not mention your expected salary on the resume. Let the employer make the salary offer first. Try to delay the salary discussion till the end.
- If they ask what you expect, come up with a figure a little on the higher end and say negotiable. Remember, never say negotiable on the figure you are actually aiming for; HR guys will always bring it down. So negotiate on AIMED SALARY + something extra.
- The normal trend is that they look at your current salary and add a little to it so that they can pull you in. Do your homework: my salary is this much and I expect this much, so whatever happens I will not go below this.
- Do not be harsh during salary negotiations.
- It's good to aim high. For instance, I want 1 billion dollars a month. But at the same time be realistic.
- Some companies have hidden costs attached to the salary; clarify those rather than being surprised at the first salary package. Many companies add extra performance compensation to your basic pay, which can be surprising at times. So get a detailed breakdown. It is best to discuss the in-hand salary figure rather than the gross package.
- Talk with the employer about the frequency at which hikes happen.
- Take everything in writing, go back to your house and have a look with a cool head: is the offer worth more than what your current employer is giving?
- Do not forget that once you have a job offer in hand you can come back to your current employer for negotiation, so keep that in mind.
- Remember, the worst part is cribbing after joining the company that your colleague is getting this much. So be careful during interview negotiations, or be sportive and be a good negotiator in the next interview.
- One very important thing: the best negotiation ground is not the new company where you are going, but the old company which you are leaving. So once you have an offer in hand, get back to your old employer, show them the offer and then make your next move. It is my experience that negotiating with the old employer is easier than with the new one. Frankly, if approached properly, rarely will anyone say no. Just do not be aggressive or egoistic about having an offer in hand.
Above all, sometimes some things are worth more than money: JOB SATISFACTION. So whatever you negotiate, if you think you can get the JOB SATISFACTION aspect on higher grounds, go for it. I think it's worth more than money.

Points to remember

- One of the first questions asked during an interview is "Can you say something about yourself?"
- Can you describe yourself and what you have achieved till now?
- Why do you want to leave the current company?
- Where do you see yourself after three years?
- What are your positive and negative points?
- How much do you rate yourself in .NET and SQL Server, out of ten?
- Are you looking for onsite opportunities? (Be careful: do not show your desperation for going abroad.)
- Why have you changed so many jobs? (Prepare a decent answer; do not blame companies and individuals for your frequent changes.)
- Never talk for more than 1 minute straight during the interview.
- Have you worked with previous versions of SQL Server?
- Would you be interested in a full-time database administrator job?
- Do not mention client names in the resume. If asked, say that it is confidential, which brings out qualities like honesty.
- When you make your resume, keep your recent projects at the top.
- Find out what the employer is looking for by asking him questions at the start of the interview, or better, before going to the interview. For example, if a company has projects on server products, the employer will be looking for BizTalk, CS and CMS experts.
- Can you give a brief about your family background?
- As you are a fresher, do you think you can really do this job?
- Have you heard about our company? Say five points about our company. (At least read once about the company you are going to.)
- Can you describe the best project you have worked on?
- Do you work on Saturdays and Sundays?
- Which is the biggest team size you have worked with?
- Can you describe the current project you have worked on?
- How much time will you need to join our organization? What's the notice period at your current company?
- What certifications have you cleared?
- Do you have passport-size photos, last year's mark sheet, previous companies' employment letters, last month's salary slip, passport and other necessary documents?
- What's the most important thing that motivates you?
- Why do you want to leave the previous organization?
- Which type of job gives you the greatest satisfaction?
- What type of environment are you looking for?
- Do you have experience in project management?
- Do you like to work in a team or as an individual?
- Describe the best project manager you have worked with.
- Why should I hire you?
- Have you ever been fired or forced to resign?
- Can you explain some important points that you have learnt from your past project experiences?
- Have you gone through any unsuccessful projects? If yes, can you explain why the project failed?
- Will you be comfortable with a location shift? If you have personal problems, say no right at the first stage... or else within two months you will have to read my book again.
- Do you work during late nights? The best answer: if there is a project deadline, yes. Do not show that it is your culture to work nights.
- Any special achievements in your life till now? Tell them about the project you have done best in your career.
- Any plans of opening your own software company? Beware, do not start pouring out your Bill Gates dream... it can create a wrong impression.
1. Database Concepts
What is database or database management systems (DBMS)?
Twist: - What's the difference between a file and a database? Can files qualify as a database?
Note: - Probably these questions are too basic for experienced SQL SERVER guys. But from a fresher's point of view they can be the difference between getting a job and being jobless.
A database provides a systematic and organized way of storing, managing and retrieving a collection of logically related information. Secondly, the information has to be persistent; that means even after the application is closed, the information should persist. Finally, it should provide an independent way of accessing data and should not be dependent on the application to access the information. Ok, let me spend a few more sentences explaining the third aspect. Below is a simple figure of a text file which holds personal detail information. The first column of the information is Name, the second Address and finally the Phone number. This is a simple text file which was designed by a programmer for a specific application.
It works fine within the boundary of the application. Now, some years down the line, a third-party application has to be integrated with this file; in order for the third-party application to integrate properly it has the following options:
- Use the interface of the original application.
- Understand the complete details of how the text file is organized (for example, the first column is Name, then Address and finally Phone number) and, after analyzing it, write code which can read the file, parse it etc. Hmm, a lot of work, right?
That's the main difference between a simple file and a database: a database has an independent way (SQL) of accessing information, while simple files do not (that answers my twisted question defined above). A file meets the storing, managing and retrieving parts of a database, but not the independent way of accessing data. Note: - Many experienced programmers think that the main difference is that a file cannot provide the multi-user capabilities which a DBMS provides. But if you look at some old COBOL and C programs where files were the only means of storing data, you can see functionalities like locking and multi-user access provided very efficiently. So it's a matter of debate; if some interviewers think this is the main difference between files and databases, accept it, because going into a debate will probably lose you the job. (Just a note for freshers: multi-user capability means that at one moment in time more than one user should be able to add, update, view and delete data. All DBMSs provide this as built-in functionality, but if you are storing information in files it's up to the application to write the logic to achieve it.)
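To make the contrast concrete, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in database engine (the book's subject is SQL Server, but the idea is identical); the table name, column names and sample rows are invented for the illustration. The database route describes the layout once and then answers any SQL query; the flat-file route forces every consumer to hard-code and parse the layout itself.

```python
import sqlite3

# --- Database route: the layout is described to the engine once,
# and any application can then query it with standard SQL. ---
con = sqlite3.connect(":memory:")  # in-memory database for the example
con.execute("CREATE TABLE contacts (name TEXT, address TEXT, phone TEXT)")
con.execute("INSERT INTO contacts VALUES ('Shiv', 'Mumbai', '98200')")
con.execute("INSERT INTO contacts VALUES ('Raju', 'Delhi', '98211')")

# A third-party application does not need to know how the rows are
# physically stored; it only needs SQL.
rows = con.execute(
    "SELECT name, phone FROM contacts WHERE address = 'Mumbai'"
).fetchall()
print(rows)  # [('Shiv', '98200')]

# --- Flat-file route: every consumer must hard-code the layout
# (first column Name, then Address, then Phone) and parse it itself. ---
lines = ["Shiv,Mumbai,98200", "Raju,Delhi,98211"]
parsed = [line.split(",") for line in lines]
mumbai = [(name, phone) for name, address, phone in parsed
          if address == "Mumbai"]
print(mumbai)  # [('Shiv', '98200')]
```

Both routes return the same answer here, but only the database route survives a change of consumer: a new application needs nothing beyond the table name and SQL.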
Many DBMS companies claimed their DBMS products were RDBMS-compliant, but according to industry rules and regulations, only if the DBMS fulfills the twelve CODD rules is it truly an RDBMS. Almost all DBMSs (SQL SERVER, ORACLE etc.) fulfill all twelve CODD rules and are considered truly RDBMSs. Note: - One of the biggest debates: is Microsoft Access an RDBMS? We will be answering this question in a later section.
Rule 3: Systematic treatment of null values. "Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type." In SQL SERVER, if no data exists, NULL values are assigned. Note that NULL values in SQL SERVER do not represent spaces, blanks or a zero value; they are a distinct representation of missing information, thus satisfying rule 3 of CODD.
Rule 4: Dynamic on-line catalog based on the relational model. "The data base description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data." The data dictionary is held within the RDBMS, so there is no need for off-line volumes to tell you the structure of the database.
Rule 5: Comprehensive data sub-language rule. "A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all of the following items:
- Data definition
- View definition
- Data manipulation (interactive and by program)
- Integrity constraints
- Authorization
- Transaction boundaries (begin, commit and rollback)"
SQL SERVER uses SQL to query and manipulate data, which has a well-defined syntax and is accepted as an international standard for RDBMS. Note: - According to this rule, CODD only mentioned that some language should be present to support it, not that it must be SQL. Before the 80s, different database vendors were providing their own flavors of syntax, until in the 80s ANSI SQL came in to standardize this variation between vendors. As ANSI SQL is quite limited, every vendor, including Microsoft, introduced additional SQL syntax on top of the support for ANSI SQL. That is why you can see SQL syntax varying from vendor to vendor.
Rule 6: View updating rule.
"All views that are theoretically updatable are also updatable by the system."
In SQL SERVER, views can be updated not only by the user, but also by SQL SERVER itself.
Rule 7: High-level insert, update and delete.
"The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update and deletion of data."
SQL SERVER allows you to update views, which in turn affects the base tables.
Rule 8: Physical data independence.
"Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods."
An application program (C#, VB.NET, VB6, VC++ etc.) does not need to be aware of where the SQL SERVER data is physically stored or what type of protocol it is using; the database connection string encapsulates everything.
Rule 9: Logical data independence.
"Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit un-impairment are made to the base tables."
Application programs written in C# or VB.NET do not need to know about structural changes in the SQL SERVER database, for example the addition of a new field.
Rule 10: Integrity independence.
"Integrity constraints specific to a particular relational data base must be definable in the relational data sub-language and storable in the catalog, not in the application programs."
In SQL SERVER you can specify data types (integer, nvarchar, boolean etc.), which puts data type checks in SQL SERVER rather than in application programs.
Rule 11: Distribution independence.
"A relational DBMS has distribution independence."
SQL SERVER can spread across more than one physical computer and across several networks, but to application programs it makes no big difference beyond specifying the SQL SERVER name and the computer on which it is located.
Rule 12: Non-subversion rule.
"If a relational system has a low-level (single-record-at-a-time) language, that low level cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher level relational language (multiple-records-at-a-time)."
In SQL SERVER, whatever integrity rules are applied on every record are also applicable when you process a group of records using an application program in any other language (for example C#, VB.NET, J# etc.).
Readers can see from the above explanation that SQL SERVER satisfies all the CODD rules; some database gurus consider SQL SERVER not to be a true RDBMS, but that is a matter of debate.
Access uses a file server design while SQL SERVER uses the client/server model. This forms the major difference between SQL SERVER and ACCESS. Note: - Just to clarify what client server and file server mean, I will give a quick description of the widely accepted architectures. There are three types of architecture:
Main frame architecture (this is not related to the above explanation, but is mentioned as it can be useful during interviews and for comparing with the other architectures).
File sharing architecture (followed by ACCESS).
Client server architecture (followed by SQL SERVER).
In main frame architecture all the processing happens on a central host server. Users interact through dumb terminals which only send keystrokes and information to the host. All the main processing happens on the central host server. The advantage of this type of architecture is that the clients need only minimal configuration. The disadvantage is that you need a robust central host server like a main frame. In file sharing architecture, which is followed by the Access database, all the data is sent to the client terminal and then processed. For instance, if you want to see the customers who stay in INDIA, in file sharing architecture all customer records will be sent to the client PC regardless of whether the customer belongs to INDIA or not. On the client PC the customer records from India are sorted/filtered out and displayed; in short, all processing logic happens on the client PC. So in this architecture the client PC needs a heavy configuration, and network traffic increases as a lot of data is sent to the client PC. The advantage of this architecture is that your server can have a low configuration.
In client server architecture the above limitation of the file server architecture is removed. In client server architecture you have two entities: the client and the database server. The file server is now replaced by a database server. The database server takes up the load of processing any database-related activity, and the client handles the validation aspects. As the work is distributed between the entities, scalability and reliability increase. Network traffic also comes down as compared to the file server. For example, if you are requesting customers from INDIA, the database server will sort/filter and send only the Indian customer details to the client, thus bringing down the network traffic tremendously. SQL SERVER follows the client-server architecture.
The second issue comes in terms of reliability. In Access the client interacts directly with the Access file; if there is a problem in the middle of a transaction, there are chances that the Access file can get corrupted. But in SQL SERVER the engine sits between the client and the database, so in case of any problem in the middle of a transaction it can roll back to the original state. Note: - SQL SERVER maintains a transaction log by which you can revert to the original state in case of any crash.
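The rollback behavior described in the note can be sketched with an explicit transaction (the Accounts table and its columns are hypothetical names used only for illustration):

```sql
-- All-or-nothing transfer: either both updates persist, or neither does.
BEGIN TRANSACTION
    UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1
    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2
IF @@ERROR <> 0
    ROLLBACK TRANSACTION   -- the engine uses the transaction log to revert both updates
ELSE
    COMMIT TRANSACTION     -- the changes are made permanent
```

If the server crashes mid-transaction, SQL SERVER replays the transaction log at restart and rolls back any uncommitted work automatically.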
When your application has to cater to a huge load demand, a highly transactional environment and high concurrency, then it is better to go for SQL SERVER or MSDE. But when it comes to cost and support, Access stands better than SQL SERVER. In the case of SQL SERVER you have to pay a per-client license, while the Access runtime is free. Summarizing: - SQL SERVER gains points in terms of network traffic, reliability and scalability; vice versa, Access gains points in terms of the cost factor.
There are lots of third-party tools which provide administrative GUI capabilities, but they are out of the scope of this book as it is only meant for interview questions.
MSDE does not support full-text search. Summarizing: - There are two major differences: first is the size limitation (2 GB) of the database, and second is the number of concurrent connections (eight), which is limited by the workload governor. During an interview this answer will suffice if the interviewer is really testing your knowledge.
You can provide a SCHEMA for SQL SERVER fields with the XML data type. You can also use new XML manipulation techniques like XQUERY, also called XML QUERY. Summarizing: - The major differences are database size (2 GB vs 4 GB), support of .NET in stored procedures, and native support for XML. This much can convince the interviewer that you are clear about the differences.
There is a complete chapter on SQL SERVER XML support, so until then this will suffice.
defined types can now be written using your own favorite .NET language (VB.NET, C#, J# etc.). This support was not there in SQL SERVER 2000, where the only language was T-SQL. In SQL SERVER 2005 you have support for two languages: T-SQL and .NET.
(PG) SQL SERVER 2005 has reporting services for reports, which is a newly added feature; for SQL SERVER 2000 it was a separate installation.
(PG) SQL SERVER 2005 has introduced two new data types: varbinary(max) and XML. If you remember, in SQL SERVER 2000 we had the image and text data types. The problem with the image and text data types is that they assign the same amount of storage irrespective of the actual data size. This problem is solved by varbinary(max), which allocates storage depending on the amount of data. The other new data type, XML, enables you to store XML documents and also performs schema verification. In SQL SERVER 2000 developers used the varchar or text data type, and all validation had to be done programmatically.
(PG) SQL SERVER 2005 can now process incoming HTTP requests directly, without the IIS web server. Stored procedure invocation is also enabled using the SOAP protocol.
(PG) An asynchronous mechanism is introduced using server events. In the server event model the server posts an event to the SQL Broker service; later the client can come and retrieve the status by querying the broker.
For huge databases SQL SERVER 2005 has provided a cool feature called data partitioning. In data partitioning you break a single database object, such as a table or an index, into multiple pieces. But for the client application accessing the single database object, the partitioning is transparent.
In SQL SERVER 2000, if you rebuilt clustered indexes, even the non-clustered indexes were rebuilt. But in SQL SERVER 2005, rebuilding the clustered indexes does not rebuild the non-clustered indexes.
Bulk data uploading in SQL SERVER 2000 was done using BCP (bulk copy program) format files. In SQL SERVER 2005 bulk data uploading can also use an XML file format.
In SQL SERVER 2000 there was a maximum of 16 instances, but in 2005 you can have up to 50 instances.
SQL SERVER 2005 has support for Multiple Active Result Sets, also called MARS. In previous versions of SQL SERVER you could only have one result set per connection. Now, on one SQL connection, you can run queries and have multiple result sets.
In previous versions of SQL SERVER, the system catalog was stored in the master database. In SQL SERVER 2005 it is stored in the resource database and exposed as sys objects; you cannot access the sys objects directly the way we accessed the master database in the older versions.
One of the hardware benefits SQL SERVER 2005 has over SQL SERVER 2000 is support for hyper-threading. WINDOWS 2003 supports hyper-threading, and SQL SERVER 2005 can take advantage of the feature, unlike SQL SERVER 2000 which did not support it. Note: - Hyper-threading is a technology developed by INTEL which creates two logical processors on a single physical hardware processor.
SMO will be used for SQL Server management. AMO (Analysis Management Objects) is used to manage Analysis Services servers, data sources, cubes, dimensions, measures, and data mining models. You can map AMO in the new SQL SERVER to DSO (Decision Support Objects) in the old one. Replication is now managed by RMO (Replication Management Objects). Note: - SMO, AMO and RMO all use the .NET Framework.
SQL SERVER 2005 uses the current user execution context to check rights, rather than the ownership link chain, which was what SQL SERVER 2000 did. Note: - There is a question on this later; see the execution context questions.
In previous versions of SQL SERVER the schema and the user name were the same, but now the schema is separated from the user: the user owns the schema. Note: - There are questions on this; refer to Schema later.
Note: - Below are some GUI changes.
Query Analyzer is now replaced by the query editor.
Business Intelligence Development Studio is used to create business intelligence solutions.
The OSQL and ISQL command line utilities are replaced by the SQLCMD utility.
SQL SERVER Enterprise Manager is now replaced by SQL SERVER Management Studio.
SERVER Manager, which was running in the system tray, is now replaced by SQL Computer Manager.
Database mirroring is supported in SQL SERVER 2005; it was not present in SQL SERVER 2000.
In SQL SERVER 2005, indexes can be rebuilt online while the database is in actual production. If you look back at SQL SERVER 2000, you could not do insert, update and delete operations while building indexes.
(PG) Other than the Serializable, Repeatable Read, Read Committed and Read Uncommitted isolation levels, there is one more new isolation level: Snapshot. Note: - We will see the Snapshot isolation level in detail in coming questions.
Summarizing: - The significant differences between SQL SERVER 2000 and SQL SERVER 2005 are in terms of .NET integration, the Snapshot isolation level, native XML support, handling HTTP requests, web service support and data partitioning. You do not have to say all the above points during the interview; a sweet summary and you will rock.
One-to-many
In this relationship, one record in one table corresponds to many records in another table. Example: - every customer can have multiple sales, so there exists a one-to-many relationship between the customer and sales tables. Similarly, one Asset can have multiple Maintenance records, so the Asset entity has a one-to-many relationship with the Maintenance entity, as the ER model shows below.
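The customer/sales example above can be sketched in DDL as follows (table and column names are illustrative, not from the book's figures):

```sql
-- One customer row can be referenced by many sales rows.
CREATE TABLE Customer
(
    CustomerID   int PRIMARY KEY,
    CustomerName varchar(50)
)

CREATE TABLE Sales
(
    SalesID     int PRIMARY KEY,
    CustomerID  int FOREIGN KEY REFERENCES Customer(CustomerID), -- the "many" side
    SalesAmount money
)
```

The foreign key on the "many" side (Sales) pointing back to the primary key of the "one" side (Customer) is what implements a one-to-many relationship.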
Many-to-many
In this relationship, one record in one table corresponds to many rows in another table and vice versa. For instance, in a company one employee can have many skills, like Java, C# etc., and one skill can also belong to many employees. Given below is a sample of a many-to-many relationship: one employee can have knowledge of multiple technologies. In order to implement this we have one more table, EmployeeTechnology, which is linked to the primary keys of the Employee and Technology tables.
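The junction-table approach described above can be sketched like this (the EmployeeTechnology name follows the book's example; the columns are assumed):

```sql
CREATE TABLE Employee
(
    EmployeeID   int PRIMARY KEY,
    EmployeeName varchar(50)
)

CREATE TABLE Technology
(
    TechnologyID   int PRIMARY KEY,
    TechnologyName varchar(50)
)

-- The junction table resolves the many-to-many into two one-to-many links.
CREATE TABLE EmployeeTechnology
(
    EmployeeID   int FOREIGN KEY REFERENCES Employee(EmployeeID),
    TechnologyID int FOREIGN KEY REFERENCES Technology(TechnologyID),
    PRIMARY KEY (EmployeeID, TechnologyID)  -- each pairing recorded only once
)
```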
form. But believe me, answering three normal forms will put you in decent shape during the interview. Following are the three normal forms:
First Normal Form
For a table to be in first normal form, data must be broken up into the smallest units possible. In addition to breaking data up into the smallest meaningful values, tables in first normal form should not contain repeating groups of fields.
For instance, in the above example city1 and city2 are repeating. In order for this table to be in first normal form you have to modify the table structure as follows. Also note that the Customer Name is now broken down into first name and last name (in first normal form, data should be broken down to the smallest unit).
Second Normal Form
The second normal form states that each field in a table with a multiple-field primary key must be directly related to the entire primary key. Or, in other words, each non-key field should be a fact about all the fields in the primary key. In the above customer table, city is not linked to the whole primary key.
That takes our database to second normal form.
Third Normal Form
A non-key field should not depend on another non-key field. In the above table, the field "Total" is dependent on "Unit price" and "Qty". So the "Total" field is removed, and its value is computed as Unit price * Qty whenever it is needed.
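The third-normal-form fix above can be sketched in a query (OrderDetail and its columns are hypothetical names standing in for the book's figures):

```sql
-- "Total" is no longer stored as a column; it is derived on demand,
-- so it can never get out of sync with UnitPrice and Qty.
SELECT OrderID, UnitPrice, Qty, UnitPrice * Qty AS Total
FROM OrderDetail
```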
What is denormalization?
Denormalization is the process of putting one fact in numerous places (it is the reverse of normalization). Only one valid reason exists for denormalizing a relational design: to enhance performance. The sacrifice for that performance is increased redundancy in the database.
In the above table you can see there are two many-to-many relationships: between Supplier / Product and between Supplier / Location (in short, multi-valued facts). In order for the above example to satisfy fourth normal form, both many-to-many relationships should go into different tables.
The above table shows some sample data. If you observe closely, a single record is created from several smaller pieces of information. For instance, JM Associate can sell sweets only under the following two conditions:
JM Associate should be an authorized dealer of Cadbury.
The sweets should be manufactured by the Cadbury company.
These two smaller pieces of information form one record of the above table. So, for the above information to be in fifth normal form, the smaller pieces of information should be kept in three different places. Below is the complete fifth normal form of the database.
(DB) What's the difference between fourth and fifth normal form?
Note: - There is a huge similarity between fourth and fifth normal form, i.e. both address the problem of multi-valued facts. In fifth normal form the multi-valued facts are interlinked, while in fourth normal form the values are independent. For instance, in the above two questions, Supplier/Product and Supplier/Location are not linked, while in the fifth normal form example Dealer/Product/Company are completely linked.
Twist: - What's the relationship between an extent and a page?
An extent is the basic unit of storage which provides space for tables. Every extent has a number of data pages. As new records are inserted, new data pages are allocated. There are eight data pages in an extent, so as soon as the eight pages are consumed, a new extent with data pages is allocated. While the extent is the basic unit of storage from the database point of view, the page is the unit of allocation within the extent.
The page header has information like the timestamp, next page number, previous page number etc. The data rows are where your actual row data is stored. For every data row there is a row offset which points to that data row.
Note: - Different languages will have different sort orders.
Case sensitivity
If A and a, B and b, etc. are treated in the same way, then it is case-insensitive. A computer treats A and a differently because it uses character codes to differentiate the input. The ASCII value of A is 65, while a is 97. The ASCII value of B is 66 and b is 98.
Accent sensitivity
If a and á, o and ó are treated in the same way, then it is accent-insensitive. A computer treats a and á differently because it uses character codes to differentiate the input. The code of a is 97 while á is 225. The code of o is 111 and ó is 243.
Kana sensitivity
When the Japanese kana characters Hiragana and Katakana are treated differently, it is called kana-sensitive.
Width sensitivity
When a single-byte character (half-width) and the same character represented as a double-byte character (full-width) are treated differently, then it is width-sensitive.
2. SQL
Note: - This is one of the crazy things which I did not want to put in my book. But when I sampled some real interviews conducted across companies, I was stunned to find some interviewers judging developers on syntax. I know many people will conclude this is childish, but it is the interviewer's decision. If you think this chapter is not useful you can happily skip it, but I think freshers should not.
Note: - I will be heavily using the AdventureWorks database, which is a sample database shipped with SQL Server 2005 (in the previous version we had the famous NorthWind sample database). Below is a view expanded from SQL Server Management Studio.
INSERT INTO ColorTable (code, colorvalue) VALUES ('b1', 'Brown')
DELETE FROM ColorTable WHERE code = 'b1'
UPDATE ColorTable SET colorvalue = 'Black' WHERE code = 'b1'
DROP TABLE table-name {CASCADE|RESTRICT}
GRANT SELECT ON ColorTable TO SHIVKOIRALA WITH GRANT OPTION
REVOKE SELECT, INSERT, UPDATE (ColorCode) ON ColorTable FROM Shivkoirala
COMMIT [WORK]
ROLLBACK [WORK]
Select * from Person.Address
Select AddressLine1, City from Person.Address
Select AddressLine1, City from Person.Address where City = 'Sammamish'
LEFT OUTER JOIN
A left join will display all records in the left table of the SQL statement. In the SQL below, customers with or without orders will be displayed; order data for customers without orders appears as NULL values. For example, you want to determine the amount ordered by each customer, and you need to see who has not ordered anything as well. You can also see the LEFT OUTER JOIN as a mirror image of the RIGHT OUTER JOIN (covered in the next section) if you switch the side of each table.
SELECT Customers.*, Orders.*
FROM Customers
LEFT OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
RIGHT OUTER JOIN
A right join will display all records in the right table of the SQL statement. In the SQL below, all orders with or without matching customer records will be displayed; customer data for orders without customers appears as NULL values. For example, you want to determine if there are any orders in the data with undefined CustomerID values (say, after a conversion or something like it). You can also see the RIGHT OUTER JOIN as a mirror image of the LEFT OUTER JOIN if you switch the side of each table.
SELECT Customers.*, Orders.*
FROM Customers
RIGHT OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
Using the ORDER BY clause, you sort the data in either ascending or descending order.
select * from sales.salesperson order by salespersonid asc
select * from sales.salesperson order by salespersonid desc
What is a self-join?
If you want to join two instances of the same table you can use self-join.
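A minimal sketch (the Employee table with EmployeeID/ManagerID columns is a hypothetical example, not from AdventureWorks) that lists each employee alongside his manager:

```sql
-- Two aliases of the same Employee table joined on the manager reference
SELECT emp.EmployeeName, mgr.EmployeeName AS ManagerName
FROM Employee emp
INNER JOIN Employee mgr
    ON emp.ManagerID = mgr.EmployeeID
```

The aliases emp and mgr let SQL treat one physical table as two logical instances in the join.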
The _ operator (during the interview, spell it as the underscore operator). The _ operator matches exactly one character at that position. In the below sample I have fired the query:
Select AddressLine1 from person.address where AddressLine1 like '_h%'
So all data where the second letter is h is returned.
union all Select * from person.address
This returns 39228 rows (UNION ALL does not check for duplicates, so the duplicate records show up and the record count doubles).
Note: - The selected records should have the same data types or else the syntax will not work.
Note: - In the coming questions you will see some 5 to 6 questions on cursors. Though not a much-discussed topic, from my survey 5% of interviews have asked questions on cursors. So let's not leave any opening for the interviewer to reject us.
What are cursors and what are the situations you will use them?
SQL statements are good for set-at-a-time operations, so SQL is good at handling sets of data. But there are scenarios where you want to process rows one by one, updating each row depending on certain criteria: you loop through all rows and update the data accordingly. That's where cursors come into the picture.
This is a small sample which uses the Person.Address table. This T-SQL program will only display records which have @provinceid equal to 7.
DECLARE @provinceid int
-- Declare cursor
DECLARE provincecursor CURSOR FOR
SELECT stateprovinceid FROM Person.Address
-- Open cursor
OPEN provincecursor
-- Fetch data from cursor into variable
FETCH NEXT FROM provincecursor INTO @provinceid
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Do operation according to row value
    IF @provinceid = 7
    BEGIN
        PRINT @provinceid
    END
    -- Fetch the next row
    FETCH NEXT FROM provincecursor INTO @provinceid
END
-- Finally, do not forget to close and deallocate the cursor
CLOSE provincecursor
DEALLOCATE provincecursor
[LOCAL | GLOBAL]
[FORWARD_ONLY | SCROLL]
[STATIC | KEYSET | DYNAMIC | FAST_FORWARD]
[READ_ONLY | SCROLL_LOCKS | OPTIMISTIC]
[TYPE_WARNING]
FOR select_statement
[FOR UPDATE [OF column_list]]
STATIC
A STATIC cursor is a fixed snapshot of a set of rows. This fixed snapshot is stored in a temporary database. As the cursor uses a private snapshot, any external changes to the set of rows will not be visible in the cursor while browsing through it. You can define a static cursor using the STATIC keyword:
DECLARE cursorname CURSOR STATIC FOR SELECT * FROM tablename WHERE column1 = 2
KEYSET
In a KEYSET cursor, the key values of the rows are saved in tempdb. For instance, let's say the cursor has fetched the data below; only the supplierid will be stored in the database. Any new inserts are not reflected in the cursor, but any updates to the key-set rows are reflected. Because the cursor is identified by key values, you can also fetch a row absolutely using
FETCH ABSOLUTE 12 FROM mycursor
DYNAMIC
In a DYNAMIC cursor you can see any kind of change happening, i.e. insertion of new records, changes to existing records, and even deletes. That is why DYNAMIC cursors are slow and have the poorest performance.
FORWARD_ONLY
As the name suggests, these cursors only move forward, and each row is fetched only once. On every fetch the cursor is re-evaluated, which means any changes to the data are visible, unless you have also specified STATIC or KEYSET.
FAST_FORWARD
These cursors are forward-only and read-only, and they are not re-evaluated on every fetch. This makes them a good choice for increasing performance.
territory wise how many salespeople there are. So in the second figure I made a GROUP BY on territory id and used the COUNT aggregate function to see some meaningful data. Northwest has the highest number of sales personnel.
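The query behind the figure can be sketched as follows (assuming the AdventureWorks Sales.SalesPerson table with its TerritoryID and SalesPersonID columns):

```sql
-- One row per territory, with a count of salespeople in each
SELECT TerritoryID, COUNT(SalesPersonID) AS SalesPeople
FROM Sales.SalesPerson
GROUP BY TerritoryID
```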
What is ROLLUP?
ROLLUP enhances the total capabilities of the GROUP BY clause. Below is a GROUP BY SQL which is applied on SalesOrderDetail on Productid and Specialofferid. You can see products 707, 708, 709 etc. grouped according to Specialofferid, and the third column represents the total for each pair of Productid and Specialofferid. Now suppose you want to see sub-totals for each group of Productid and Specialofferid.
So after using ROLLUP you can see the sub-totals. The first row is the grand total (the main total), followed by sub-totals for each combination of Productid and Specialofferid. ROLLUP retrieves a result set that contains aggregates for a hierarchy of values in the selected columns.
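The ROLLUP query in the figures can be sketched along these lines (column names follow the book's example against AdventureWorks; the LineTotal aggregate is an assumption about which measure the figure sums):

```sql
-- WITH ROLLUP adds sub-total rows per ProductID plus a grand-total row
SELECT ProductID, SpecialOfferID, SUM(LineTotal) AS Total
FROM Sales.SalesOrderDetail
GROUP BY ProductID, SpecialOfferID WITH ROLLUP
```

In the result set, a NULL in a grouping column marks the sub-total or grand-total row for that level.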
What is CUBE?
CUBE retrieves a result set that contains aggregates for all combinations of values in the selected columns, while ROLLUP retrieves a result set that contains aggregates only for a hierarchy of values in the selected columns.
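Reusing the earlier ROLLUP sketch, only the grouping modifier changes (same assumed columns against AdventureWorks):

```sql
-- WITH CUBE aggregates every combination: per ProductID, per SpecialOfferID,
-- per (ProductID, SpecialOfferID) pair, and the grand total
SELECT ProductID, SpecialOfferID, SUM(LineTotal) AS Total
FROM Sales.SalesOrderDetail
GROUP BY ProductID, SpecialOfferID WITH CUBE
```

Unlike ROLLUP, CUBE also produces sub-totals for SpecialOfferID alone, not just for the ProductID hierarchy.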
inner join sales.salesterritory on sales.salesterritory.territoryid = sales.salesperson.territoryid
group by sales.salesperson.territoryid, sales.salesterritory.name
having count(sales.salesperson.territoryid) >= 2
Note: - You can see the HAVING clause applied. In this case you cannot specify the filter with a WHERE clause; it will throw an error. In short, the HAVING clause applies a filter on a group, while the WHERE clause applies a filter on individual rows.
(PERCENT) rows. So what does that sentence mean? See the below figure there are four products p1,p2,p3 and p4. UnitCost of p3 and p4 are same.
So when we do a TOP 3 on the ProductCost table we will see three rows as shown below. But p3 has the same value as p4; SQL just took one of the tied rows and cut the other off. So if you want to display tied data like this you can use WITH TIES.
You can see that after firing the SQL with WITH TIES, we are able to see all the products properly.
Note: - You should have an ORDER BY clause and the TOP keyword specified, or else WITH TIES is not of much use.
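A sketch of the query from the figures above (the ProductCost table and its columns mirror the book's sample data and are not part of AdventureWorks):

```sql
-- WITH TIES also returns any rows that tie with the last selected row on UnitCost
SELECT TOP 3 WITH TIES ProductName, UnitCost
FROM ProductCost
ORDER BY UnitCost
```

Because p3 and p4 share the same UnitCost, this query returns four rows instead of cutting the tie at three.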
What is a Sub-Query?
A query nested inside a SELECT statement is known as a subquery and is an alternative to complex join statements. A subquery combines data from multiple tables and returns results that are used in the WHERE condition of the main query. A subquery is always enclosed within parentheses and returns a column. A subquery can also be referred to as an inner query, and the main query as an outer query. A JOIN gives better performance than a subquery when you have to check for the existence of records. For example, to retrieve all EmployeeID and CustomerID records from the ORDERS table that have an EmployeeID greater than the average EmployeeID of the table, you can create a nested query, as shown:
SELECT DISTINCT EmployeeID, CustomerID FROM ORDERS WHERE EmployeeID > (SELECT AVG(EmployeeID) FROM ORDERS)
What is the ALL and ANY operator?
What is a CASE statement in SQL?
What does the COLLATE keyword in SQL signify?
What is CTE (Common Table Expression)?
A CTE is a temporary result set created from a simple SQL query; you can think of it as something like a view. Below, a simple CTE named PurchaseOrderHeaderCTE is created from PurchaseOrderHeader:
WITH PURCHASEORDERHEADERCTE(Orderdate, Status) AS
(
Select orderdate, Status from purchasing.PURCHASEORDERHEADER
)
Select * from PURCHASEORDERHEADERCTE
The WITH statement defines the CTE, and later, using the CTE name, I have displayed the CTE data.
WITH PURCHASEORDERHEADERCTE(Orderdate, Status, Subtotal) AS
(
Select year(orderdate), Status, isnull(Subtotal, 0) from purchasing.PURCHASEORDERHEADER
)
Select Status as OrderStatus, isnull([2001], 0) as 'Yr 2001', isnull([2002], 0) as 'Yr 2002'
from PURCHASEORDERHEADERCTE
pivot (sum(Subtotal) for Orderdate in ([2001],[2002])) as pivoted
You can see from the above SQL that the top WITH statement is the CTE supplied to the PIVOT. After that, PIVOT is applied on Subtotal and Orderdate. You have to specify the values on which you want to pivot (here they are 2001 and 2002). Below is the output of the CTE table.
After the PIVOT is applied you can see the rows are now grouped column wise with the subtotal assigned to each. You can summarize that PIVOT summarizes your data in cross tab format.
What is UNPIVOT?
It is exactly the reverse of PIVOT: you have pivoted data and you want to unpivot it.
What is ROW_NUMBER()?
The ROW_NUMBER() function adds a column that displays a number corresponding to the row's position in the query result. If the column that you specify in the OVER clause is not unique, it still produces an incrementing column based on that column. You can see in the figure below that I have applied the ROW_NUMBER function over column col2, and you can notice the incrementing numbers generated.
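A minimal sketch of the query behind the figure (SampleTable, col1 and col2 are placeholders for the figure's sample data):

```sql
SELECT col1, col2,
       ROW_NUMBER() OVER (ORDER BY col2) AS RowNumber  -- 1, 2, 3, ... regardless of ties
FROM SampleTable
```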
What is RANK()?
The RANK() function works much like the ROW_NUMBER() function in that it numbers records in order. When the column specified by the ORDER BY clause contains unique values, then ROW_NUMBER() and RANK() produce identical results. They differ in the
way they work when duplicate values are contained in the ORDER BY expression. ROW_NUMBER will increment the number by one on every record, regardless of duplicates. RANK() produces a single number for each distinct value in the result set; you can see that for duplicate values it does not increment the number.
What is DENSE_RANK()?
DENSE_RANK() works the same way as RANK() does, but eliminates the gaps in the numbering. When I say gaps, you can see in the previous results that RANK skipped 4 and 5 because of the duplicate values in COL2; DENSE_RANK overlooks that gap.
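The three functions can be compared side by side in one query (SampleTable and col2 again stand in for the figure's data):

```sql
SELECT col2,
       ROW_NUMBER() OVER (ORDER BY col2) AS RowNo,      -- always increments: 1,2,3,4,...
       RANK()       OVER (ORDER BY col2) AS RankNo,      -- repeats on ties, leaves gaps: 1,2,2,4,...
       DENSE_RANK() OVER (ORDER BY col2) AS DenseRankNo  -- repeats on ties, no gaps: 1,2,2,3,...
FROM SampleTable
```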
What is NTILE()?
NTILE() breaks the result set into the specified number of groups and assigns the same number to each record in a group. In short, NTILE just divides the data into groups depending on the number given. For instance, I asked NTILE for 3 groups; with 6 total rows it created three groups of 2 rows each.
What's the difference between a Stored Procedure (SP) and a User Defined Function (UDF)?
Following are some major differences between a stored procedure and user defined functions:
A UDF can be executed using the SELECT clause while an SP cannot.
A UDF cannot be used in an XML FOR clause but SPs can.
A UDF does not return output parameters while SPs can return output parameters.
If there is an error in a UDF it stops executing, but an SP just ignores the error and moves on to the next statement.
A UDF cannot make permanent changes to the server environment while an SP can change some of the server environment.
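As a sketch of the first point, a scalar UDF (all names here are hypothetical) can be called inline in a SELECT list, which a stored procedure cannot:

```sql
CREATE FUNCTION dbo.fnLineTotal (@UnitPrice money, @Qty int)
RETURNS money
AS
BEGIN
    RETURN @UnitPrice * @Qty
END
GO

-- The UDF participates directly in the SELECT list;
-- an SP would instead need EXEC and could not be composed into the query.
SELECT OrderID, dbo.fnLineTotal(UnitPrice, Qty) AS Total
FROM OrderDetail
```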
3. .NET Integration
What are the steps to load .NET code in SQL SERVER 2005?
Following are the steps to load managed code in SQL SERVER 2005:
Write the managed code and compile it to a DLL / assembly.
After the DLL is compiled, you can load the assembly into SQL SERVER using the CREATE ASSEMBLY command. Below is the command which loads mycode.dll into SQL SERVER:
CREATE ASSEMBLY mycode FROM 'c:\mycode.dll'
The sys.assembly_files system view keeps track of which files are associated with which assemblies:
SELECT * FROM sys.assembly_files
Note: - You can create SQL SERVER projects using VS 2005, which provides ready-made templates to make development life easy.
Note: - You can see that after running the SQL, the 'clr enabled' property is changed from 0 to 1, which indicates that the CLR was successfully configured for SQL SERVER.
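The configuration change referred to in the note is the standard sp_configure route (try this on a test server first):

```sql
-- Enable CLR integration (it is off by default in SQL SERVER 2005)
EXEC sp_configure 'clr enabled', 1
RECONFIGURE

-- Running sp_configure again shows the run_value changed from 0 to 1
EXEC sp_configure 'clr enabled'
```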
In previous versions of .NET, hosting was done via the COM interface ICorRuntimeHost. With that COM interface you could only do the following:
Specify whether it is the server or workstation DLL.
Specify the version of the CLR (e.g. version 1.1 or 2.0).
Specify garbage collection behavior.
Specify whether or not JIT-compiled code may be shared across AppDomains.
In .NET 2.0 its done by ICLRRuntimeHost. But in .NET 2.0 you can do much above what was provided by the previous COM interface. Exceptional conditions Code loading Class loading Security particulars Resource allocation
SQL Server uses the ICLRRuntimeHost to control .NET run-time as the flexibility provided by this interface is far beyond what is given by the previous .NET version, and thats what exactly SQL Server needs, a full control of the .NET run time.
Safe Access sandbox
This will be the favorite setting of DBAs if they are ever compelled to run CLR code - safe access. Safe means you have access only to in-proc data access functionality. So you can create stored procedures, triggers, functions, data types etc. But you cannot access memory or disk, create files etc. In short, you cannot hang SQL Server.
External access sandbox
In external access you can use some really cool features of .NET like accessing file systems outside the box, and you can leverage your own classes. But here you are not allowed to play around with threading, memory allocation etc.
Unsafe access sandbox
In unsafe access you have access to memory management, threading etc. So here developers can write unreliable and unsafe code which destabilizes SQL Server. In the first two sandbox access levels it is difficult to write unreliable and unsafe code.
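The three sandbox levels above are chosen at catalog time through the PERMISSION_SET clause of CREATE ASSEMBLY; a sketch (the path and assembly name are hypothetical):

```sql
-- SAFE is the default: in-proc data access only.
CREATE ASSEMBLY mycode
FROM 'c:\mycode.dll'
WITH PERMISSION_SET = SAFE
-- WITH PERMISSION_SET = EXTERNAL_ACCESS  -- files, network etc.
-- WITH PERMISSION_SET = UNSAFE           -- memory, threading, unmanaged code
```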
Note: - This can be pretty confusing during interviews, so just make one note: one AppDomain per owner identity per database.
You have an assembly which is dependent on other assemblies; will SQL Server load the dependent assemblies?
Ok, let me make the question clearer. If you have Assembly1.dll, which uses Assembly2.dll, and you catalog Assembly1.dll in SQL Server, will it catalog Assembly2.dll also? Yes, it will. SQL Server will look into the manifest for the dependencies associated with the DLL and load them accordingly. Note: - All dependent assemblies have to be in the same directory; do not expect SQL Server to go to some other directory or the GAC to find the dependencies.
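Once cataloged, the recorded dependencies can be inspected through the catalog views; a sketch, assuming an assembly cataloged under the hypothetical name Assembly1:

```sql
-- List the assemblies that Assembly1 references.
SELECT a.name AS assembly_name,
       r.referenced_assembly_id
FROM   sys.assemblies a
JOIN   sys.assembly_references r
       ON a.assembly_id = r.assembly_id
WHERE  a.name = 'Assembly1'
```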
What is Multi-tasking?
It's a feature of modern operating systems with which we can run multiple programs at the same time, for example Word, Excel etc.
What is Multi-threading?
Multi-threading forms a subset of multi-tasking: instead of switching between programs, this feature switches between different parts of the same program. For example, you are writing in Word and at the same time Word is doing a spell check in the background.
What is a Thread ?
A thread is the basic unit to which the operating system allocates processor time.
SQL Server controls scheduling so that .NET threads do not consume all resources and go out of control. SQL Server introduced blocking points which allow this transition to happen between SQLCLR and SQL Server threads.
Readers must be wondering why there is a two-time check. There are many things in .NET which are bound at runtime and cannot really be made out from the IL code. So SQL Server makes checks at two points - while the assembly is cataloged and while it is running - which ensures 100 % that no runaway code is going to execute. Note : - Read about HostProtectionAttribute in the next questions.
The HostProtectionAttribute is applied to many assemblies, so that SQL Server can decide whether or not to load those namespaces. For example, if you look at System.Windows you will see this attribute. During runtime SQL Server uses the reflection mechanism to check whether the assembly has valid protection or not. Note :- HostProtection is checked only when you are executing the assembly in SQL Server 2005.
In order that an assembly gets loaded in SQL Server what type of checks are done?
SQL Server uses the reflection API to determine if the assembly is safe to load in SQL Server.
Following are the checks done while the assembly is loaded in SQL Server:
It does META data and IL verification, to see that the syntax of the IL is appropriate.
If the assembly is marked as safe or external, then the following checks are done:
- Check for static variables; only read-only static variables are allowed.
- Some attributes are not allowed for SQL Server, and those attributes are also checked.
- The assembly has to be type safe, which means no unmanaged code or pointers are allowed.
- No finalizers are allowed.
Note: - SQL Server checks the assembly using the reflection API, so the code should be IL compliant. You can do this small exercise to check whether SQL Server validates your code or not. Compile the simple code below, which has a static variable defined in it. Because the static variable is not read-only, it should throw an error.
using System;
namespace StaticDll
{
    public class Class1
    {
        static int i;
    }
}
After you have compiled the DLL, use the CREATE ASSEMBLY syntax to load the DLL in SQL Server. While cataloging the DLL you will get the following error:
Msg 6211, Level 16, State 1, Line 1
CREATE ASSEMBLY failed because type 'StaticDll.Class1' in safe assembly 'StaticDll' has a static field 'i'. Attributes of static fields in safe assemblies must be marked readonly in Visual C#, ReadOnly in Visual Basic, or initonly in Visual C++ and intermediate language.
We can do a small practical hands-on to see what the assembly tables look like. Let's create a simple class Class1. The code is shown below.
using System;
using System.Collections.Generic;
using System.Text;
namespace Class1
{
    public class Class1
    {
    }
}
Then we create the assembly by the name X1 using the CREATE ASSEMBLY syntax. The query output of all three main tables comes in this sequence: sys.assemblies, sys.assembly_files and sys.assembly_references. Note :- In the second select statement there is a content field in which the actual binary data is stored. So even if we do not have the actual assembly file, SQL Server can load it from this content field.
Created the following class and a method inside it:
public class clscustomer
{
    public void add()
    {
    }
}
Using the CREATE ASSEMBLY syntax we cataloged it in SQL Server. Later we made the following change to the class - note that the add method signature is now changed:
public class clscustomer
{
    public void add(string code)
    {
    }
}
Compiled the project successfully. After that, using ALTER ASSEMBLY, we tried to implement the change. Using the ALTER syntax you cannot change public method signatures; in that case you will have to drop the assembly and re-create it again.
Note: - If the assembly is referencing any other objects like triggers, stored procedures, UDT, other assemblies then the dependents should be dropped first, or else the drop assembly will fail.
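The drop order described in the note can be sketched as follows (the procedure and assembly names are hypothetical):

```sql
-- Dependent objects must be dropped before the assembly itself.
DROP PROCEDURE SelectProductAll
DROP ASSEMBLY CustomerAssembly
-- Reversing the order fails with a dependency error.
```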
Then my stored procedure should be defined accordingly and in the same order; that is, in the stored procedure the parameters should be defined in the same order.
As you can see, the first two decisions are straightforward. But the third one is where you will have to do a code review and see what will work best. Probably also run it practically, benchmark, and see what the best choice is.
So what does that mean? Well, if you define a .NET DLL and catalog it in SQL Server, all the method and class names are case sensitive, while the assembly name is not case sensitive. For instance, I have cataloged the following DLL with the following details:
The assembly name is CustomerAssembly.
The class name in CustomerAssembly is ClsCustomer.
The function GetCustomerCount() is in class ClsCustomer.
When we catalog the above assembly in SQL Server, we cannot address ClsCustomer as CLSCUSTOMER or the function GetCustomerCount() as getcustomercount() in the SQL Server T-SQL language. But the assembly CustomerAssembly can be addressed as customerassembly or CUSTOMERASSEMBLY; in short, assembly names are not case sensitive.
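A sketch of the point above, using the names from the text (the exact EXTERNAL NAME path is an assumption, since the namespace is not given):

```sql
-- The assembly name is not case sensitive, but the class and method
-- names in EXTERNAL NAME must match the .NET code exactly.
CREATE FUNCTION dbo.GetCustomerCount() RETURNS INT
AS EXTERNAL NAME customerassembly.[ClsCustomer].GetCustomerCount
-- customerassembly works in place of CustomerAssembly, but writing
-- [clscustomer].getcustomercount would fail.
```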
In .NET we declare decimal datatypes without precision, but in SQL Server you can define the precision part also.
decimal i; --> .NET definition
decimal(9,2) --> SQL Server definition
This creates a conflict when we want a .NET function to be used in T-SQL as SQLCLR and we want the precision facility. Here's the answer: you define the precision in SQL Server when you use the CREATE syntax. So even if .NET does not support the precision facility, we can define the precision in SQL Server.
.NET definition
func1(decimal x1)
{
}
SQL Server definition
create function func1(@x1 decimal(9,2)) returns decimal
as external name CustomerAssembly.[CustomerNameSpace.ClsCustomer].func1
In the above code sample func1 is defined with a simple decimal, but later, when we create the function definition in SQL Server, we define the precision.
But for 'out' parameter types there are no mappings defined. That is logical, as 'out' parameter types do not have any equivalent in SQL Server. Note: - When we define ByRef in .NET, it means that if the variable value is changed it will be reflected outside the subroutine, so it maps to SQL Server input/output (OUTPUT) parameters.
What is System.Data.SqlServer?
When you have functions, stored procedures etc. written in .NET, you will use this provider rather than the traditional System.Data.SqlClient. If you are accessing objects created using the T-SQL language, you need a connection to reach them, because you must specify which server to connect to, the password and other credentials. But if you are accessing objects made using .NET itself, you are already residing inside SQL Server, so you do not need a connection but rather a context.
What is SQLContext?
As said previously, when we use ADO.NET to execute a stored procedure created in T-SQL, we are outside the SQL Server boundary, so we need to provide a SQLConnection object to connect to SQL Server. But when we need to execute objects which are created using a .NET language, we only need the context in which the objects are running.
So you can see in the above figure that SQLConnection is used when you are completely outside the SQL Server database, while SQLContext is used when you are inside the SQL Server database. That means a connection already exists through which you can access the SQLContext, and any extra connection created to access the SQLContext is a waste, as a connection to SQL Server is already open. All these things are handled by SQLContext.
Which are the four static methods of SQLContext?
Below are the four static methods of SQLContext:
GetConnection() :- Returns the current connection.
GetCommand() :- Gets a reference to the current batch.
GetTransaction() :- If you have used transactions, this gets the current transaction.
GetPipe() :- This helps us send results to the client. The output is in Tabular Data Stream format. Using this method you can fill a datareader or dataset, which can later be used by the client to display data.
Note: - In an earlier question I showed how we can manually register DLLs in SQL Server, but in real projects nobody would do that; rather, we would use VS.NET to accomplish the same. So we will run through a sample of how to deploy DLLs using VS.NET, and in parallel we will also run through how to use SQLContext.
Let's start. Step 1: go to Visual Studio --> New Project --> expand Visual C# (+) --> select Database, and you will see the SQL Server project. Select the SQL Server project template, give it a name, then click OK.
As these DLLs need to be deployed on the server, you will need to specify the server details also. For that you will be prompted to specify the database on which you will deploy the .NET stored procedure. Select the database and click OK. In case you do not see the database, you can click on Add Reference to add the database to the list.
Once you specify the database you are inside the Visual Studio .NET editor. At the right-hand side you can see the solution explorer with some basic files created by Visual Studio in order to deploy the DLL on SQL Server. Right-click on the SQL Server project and click Add --> the new items are displayed as shown in the figure below.
You can see in the figure below that you can create different objects using VS.NET. At this point we only need to create a stored procedure which will fetch data from Production.Product.
This section is where the real action happens. As said previously, you do not need to open a connection; you use the context instead. So below are the steps:
Get the reference of the context.
Get the command from the context.
Set the command text; at this moment we need to select everything from the Production.Product table.
Finally, get the pipe and execute the command.
After that you need to compile it to a DLL and then deploy the code in SQL Server. You can use the Build Solution menu to compile and Deploy Solution to deploy it on SQL Server.
After deploying the solution you can see the stored procedure SelectProductAll in the stored procedure section as shown below.
Just to test, I have executed the stored procedure, and everything is working fine.
In order to create the function you have to select the User Defined Function template from the Visual Studio installed templates. Below is the sample code. Then follow the same procedure again of compiling and deploying the solution.
SqlDataReader rdr = mycommand.EndExecuteReader(myResult);
Note: - Here's a small project which you can do with asynchronous processing: fire a heavy-duty SQL query and in the UI show how much time SQL Server took to execute that query.
There are two types of triggers:
INSTEAD OF triggers
INSTEAD OF triggers fire in place of the triggering action. For example, if an INSTEAD OF UPDATE trigger exists on the Sales table and an UPDATE statement is executed against the Sales table, the UPDATE statement will not change a row in the Sales table. Instead, the UPDATE statement causes the INSTEAD OF UPDATE trigger to be executed, which may or may not modify data in the Sales table.
AFTER triggers
AFTER triggers execute following the SQL action, such as an insert, update, or delete. This is the traditional trigger which existed in SQL SERVER. INSTEAD OF triggers get executed automatically before the primary key and foreign key constraints are checked, whereas traditional AFTER triggers get executed after these constraints are checked. Unlike AFTER triggers, INSTEAD OF triggers can be created on views.
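A minimal sketch of the INSTEAD OF behavior described above, assuming a Sales table exists (the trigger name is hypothetical):

```sql
-- The UPDATE statement itself never touches the table; the trigger
-- body decides what, if anything, is modified.
CREATE TRIGGER trgSalesUpdate ON Sales
INSTEAD OF UPDATE
AS
BEGIN
    PRINT 'UPDATE intercepted; no row was changed.'
END
```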
If we have multiple AFTER triggers on a table, how can we define the sequence of the triggers?
If a table has multiple AFTER triggers, then you can specify which trigger should be executed first and which trigger should be executed last using the stored procedure sp_settriggerorder. All the other triggers are in an undefined order which you cannot control.
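The ordering described above can be sketched with sp_settriggerorder (the trigger names are hypothetical):

```sql
-- Make trgAudit fire first and trgArchive fire last among the
-- AFTER UPDATE triggers on a table.
EXEC sp_settriggerorder @triggername = 'trgAudit',
                        @order = 'First',
                        @stmttype = 'UPDATE'
EXEC sp_settriggerorder @triggername = 'trgArchive',
                        @order = 'Last',
                        @stmttype = 'UPDATE'
```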
[ WITH option [ ,...n ] ]
A description of the components of the statement follows.
msg_id :- The ID for an error message, which is stored in the error column in sysmessages.
msg_str :- A custom message that is not contained in sysmessages.
severity :- The severity level associated with the error. The valid values are 0-25. Severity levels 0-18 can be used by any user, but 19-25 are only available to members of the fixed server role sysadmin. When levels 19-25 are used, the WITH LOG option is required.
state :- A value that indicates the invocation state of the error. The valid values are 0-127. This value is not used by SQL Server.
argument, . . . :- One or more variables that are used to customize the message. For example, you could pass the current process ID (@@SPID) so it could be displayed in the message.
WITH option, . . . :- The three values that can be used with this optional argument are described here.
LOG - Forces the error to be logged in the SQL Server error log and the NT application log.
NOWAIT - Sends the message immediately to the client.
SETERROR - Sets @@ERROR to the unique ID for the message or 50,000.
The number of options available for the statement makes it seem complicated, but it is actually easy to use. The following shows how to create an ad hoc message with a severity of 10 and a state of 1.
RAISERROR ('An error occurred updating the NonFatal table',10,1)
--Results--
An error occurred updating the NonFatal table
The statement does not have to be used in conjunction with any other code, but for our purposes it will be used with the error handling code presented earlier. The following alters the ps_NonFatal_INSERT procedure to use RAISERROR.
USE tempdb
go
ALTER PROCEDURE ps_NonFatal_INSERT
@Column2 int = NULL
AS
DECLARE @ErrorMsgID int
INSERT NonFatal VALUES (@Column2)
SET @ErrorMsgID = @@ERROR
IF @ErrorMsgID <> 0
BEGIN
    RAISERROR ('An error occurred updating the NonFatal table',10,1)
END
When an error-producing call is made to the procedure, the custom message is passed to the client. The following shows the output generated by Query Analyzer.
4. ADO.NET
Which are the namespaces for ADO.NET?
Following are the namespaces provided by .NET for data management:
System.Data
This contains the basic objects used for accessing and storing relational data, such as DataSet, DataTable and DataRelation. Each of these is independent of the type of data source and the way we connect to it.
System.Data.OleDb
This contains the objects that we use to connect to a data source via an OLE-DB provider, such as OleDbConnection, OleDbCommand, etc. These objects inherit from the common base classes and so have the same properties, methods, and events as the SqlClient equivalents.
System.Data.SqlClient
This contains the objects that we use to connect to a data source via the Tabular Data Stream (TDS) interface of Microsoft SQL Server (only). This can generally provide better performance as it removes some of the intermediate layers required by an OLE-DB connection.
System.Xml
This contains the basic objects required to create, read, store, write, and manipulate XML documents according to W3C recommendations.
Connection
Command object (This is the object responsible for using stored procedures.)
Data Adapter (This object acts as a bridge between the data store and the dataset.)
DataReader (This object reads data from the data store in forward-only mode.)
The DataSet object represents disconnected and cached data. If you see the diagram, it is not in direct connection with the data store (SQL SERVER, ORACLE etc.); rather it talks with the data adapter, which is responsible for filling the dataset. A dataset can have one or more DataTables and relations.
The DataView object is used to sort and filter data in a DataTable.
Note:- This is one of the favorite questions in .NET. Just paste the picture in your mind and during the interview try to refer to that image.
ExecuteNonQuery :- Executes the command defined in the CommandText property against the connection defined in the Connection property, for a query that does not return any rows (an UPDATE, DELETE or INSERT). It returns an integer indicating the number of rows affected by the query.
ExecuteReader :- Executes the command defined in the CommandText property against the connection defined in the Connection property. It returns a "reader" object that is connected to the resulting rowset within the database, allowing the rows to be retrieved.
ExecuteScalar :- Executes the command defined in the CommandText property against the connection defined in the Connection property. It returns only a single value (effectively the first column of the first row of the resulting rowset). Any other returned columns and rows are discarded. It is fast and efficient when only a "singleton" value is required.
This is similar to the UpdateBatch method provided by the ADO Recordset object, but with the DataSet it can be used to update more than one table.
Private Sub loadData()
    Dim strPath As String
    strPath = AppDomain.CurrentDomain.BaseDirectory
    Dim objOLEDBCon As New OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strPath & "Nwind.mdb")
    Dim objOLEDBCommand As OleDbCommand
    Dim objOLEDBReader As OleDbDataReader
    Try
        objOLEDBCommand = New OleDbCommand("Select FirstName from Employees")
        objOLEDBCon.Open()
        objOLEDBCommand.Connection = objOLEDBCon
        objOLEDBReader = objOLEDBCommand.ExecuteReader()
        Do While objOLEDBReader.Read()
            lstNorthwinds.Items.Add(objOLEDBReader.GetString(0))
        Loop
    Catch ex As Exception
        Throw ex
    Finally
        objOLEDBCon.Close()
    End Try
End Sub
Looping through the reader to fill the list box:
    Do While objReader.Read()
        lstData.Items.Add(objReader.Item("FirstName"))
    Loop
Catch ex As Exception
    Throw ex
Finally
    objConnection.Close()
End Try
Now, from an interview point of view, you are definitely not going to recite the whole source code given in this book. The interviewer expects only the broader answer of what steps are needed to connect to SQL SERVER. For fundamentals' sake the author has explained the whole source code. In short, you have to explain the LoadData method in a broader way. Following are the steps to connect to SQL SERVER:
First, import the namespace System.Data.SqlClient.
Create a connection object as shown in the LoadData method.
With objConnection
    .ConnectionString = strConnectionString
    .Open()
End With
Create the command object with the SQL.Also assign the created connection object to command object. and execute the reader.
objCommand = New SqlCommand("Select FirstName from Employees")
With objCommand
    .Connection = objConnection
    objReader = .ExecuteReader()
End With
Finally, loop through the reader and fill the list box. If old VB programmers are expecting the MoveNext command, it is replaced by Read(), which returns True if there is any data to be read. If .Read() returns False, that means it is the end of the datareader and there is no more data to be read.
ADO.NET provides the SqlCommand object which provides the functionality of executing stored procedures.
If txtEmployeeName.Text.Length = 0 Then
    objCommand = New SqlCommand("SelectEmployee")
Else
    objCommand = New SqlCommand("SelectByEmployee")
    objCommand.Parameters.Add("@FirstName", Data.SqlDbType.NVarChar, 200)
    objCommand.Parameters.Item("@FirstName").Value = txtEmployeeName.Text.Trim()
End If
In the above sample not a lot has changed, only that the SQL has moved to stored procedures. There are two stored procedures: SelectEmployee, which selects all the employees, and SelectByEmployee, which returns employee names starting with a specific character. As you can see, to provide parameters to the stored procedures we use the parameter object of the command object. For such a question the interviewer expects two simple answers: one, that we use the command object to execute stored procedures, and two, the parameter object to provide parameters to the stored procedure. The above sample is provided only for getting the actual feel of it. Be short, be nice and get a job.
Config files are the best place to store connection strings. If it's a web-based application, the Web.config file will be used, and if it's a windows application, the App.config file will be used.
In such types of questions the interviewer is looking from a practical angle: have you worked with datasets and dataadapters? Let me try to explain the above code first and then we move on to what steps to say during the interview.
Dim objConn As New SqlConnection(strConnectionString)
objConn.Open()
The first step is to open the connection. Again, note that the connection string is loaded from the config file.
Dim objCommand As New SqlCommand("Select FirstName from Employees")
objCommand.Connection = objConn
The second step is to create a command object with the appropriate SQL and set the connection object to this command.
Dim objDataAdapter As New SqlDataAdapter()
objDataAdapter.SelectCommand = objCommand
The third step is to create the adapter object and pass the command object to the adapter object.
objDataAdapter.Fill(objDataSet)
The fourth step is to load the dataset using the Fill method of the dataadapter.
lstData.DataSource = objDataSet.Tables(0).DefaultView
lstData.DisplayMember = "FirstName"
lstData.ValueMember = "FirstName"
The fifth step is to bind the loaded dataset to the GUI. At this moment the sample has a listbox as the UI. Binding of the UI is done by using the DefaultView of the dataset. Just to revise: every dataset has tables and every table has views. In this sample we have only loaded one table, i.e. the Employees table, so we refer to it with an index of zero. Just say all the five steps during the interview and you will see the smile on the interviewer's face..... Hmm, and an appointment letter in your hand.
Twist :- How can we cancel all changes done in a dataset? How do we get the values which have changed in a dataset?
For tracking changes the DataSet has two members which come to the rescue: GetChanges and HasChanges.
GetChanges returns a dataset containing the rows that have changed since the dataset was loaded or since AcceptChanges was executed.
HasChanges indicates whether any changes have been made since the dataset was loaded or the AcceptChanges method was executed.
If we want to revert or abandon all changes since the dataset was loaded, use RejectChanges.
Note:- One of the most misunderstood things about these members is that people think they track the changes in the actual database. That is a fundamental mistake; the changes are related only to the dataset and have nothing to do with changes happening in the actual database, as datasets are disconnected and do not know anything about the changes happening in the actual database.
RemoveAt removes a DataRow object from the DataTable at the specified index position.
A DataSet is a disconnected architecture, while a DataReader has a live connection while reading data. So if we want to cache data and pass it to a different tier, the DataSet forms the best choice, and it has decent XML support. When an application needs to access data from more than one table, the DataSet forms the best choice. If we need to move back while reading records, the DataReader does not support this functionality. But one of the biggest drawbacks of the DataSet is speed. As the DataSet carries considerable overhead because of relations, multiple tables etc., it is slower than the DataReader. Always try to use the DataReader wherever possible, as it is meant specially for speed and performance.
Relations can be added between DataTable objects using the DataRelation object. The above sample code tries to build a relationship between the Customer and Addresses DataTables using the CustomerAddresses DataRelation object.
When updating, if there is any mismatch in the timestamp it will not update the records. This is the best practice used in industry for locking.
Update table1 set field1=@test where LastTimeStamp=@CurrentTimeStamp
The other approach is to check the original values stored in SQL SERVER against the actual changed values, i.e. in the stored procedure check before updating that the old data is the same as the current. For example, in the SQL below, before updating field1 we check whether the old field1 value is the same. If not, then someone else has updated it and the necessary action has to be taken.
Update table1 set field1=@test where field1 = @oldfield1value
Locking can be handled on the ADO.NET side or on the SQL SERVER side, i.e. in stored procedures. For more details of how to implement locking in SQL SERVER, read 'What are different locks in SQL SERVER?' in the SQL SERVER chapter.
Note:- This is one of the favorite questions of interviewers, so cram it.... When I say cram it I do not mean it.... I mean understand it. This book has tried to cover ADO.NET as much as possible, but the indeterministic nature of ADO.NET interview questions makes it difficult to do full justice. But I hope the above questions will make you quite confident during interviews.
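The timestamp check described above can be sketched as follows (the column names are hypothetical, following the text's table1 example):

```sql
-- The update succeeds only if nobody changed the row since we read it.
UPDATE table1
SET    field1 = @test,
       LastTimeStamp = GETDATE()
WHERE  LastTimeStamp = @CurrentTimeStamp

IF @@ROWCOUNT = 0
    RAISERROR ('Row was modified by another user', 16, 1)
```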
Commit or roll back the transaction using the Commit or Rollback method of the transaction object. Close the database connection.
5. Notification Services
What are notification services?
Notification services help you deliver messaging applications which can deliver customized messages to a huge group of subscribers. In short, it is a software application which sits between the information and the recipient.
Subscriptions: - Subscriptions are nothing but users showing interest in certain events and registering for those events for information. For instance, a user may subscribe to a heavy rainfall event. In short, a subscription links the user and the event.
Notifications: - A notification is the actual action which takes place, that is, a message is sent to the user who has shown interest in the event. Notifications can be in various formats and sent to a variety of devices.
Notification engine: - This is the main coordinator which monitors for events; if any event occurs it matches it with the subscribers and sends the notifications.
In short, the notification engine is the central engine which manages events, subscriptions and notifications.
(DB)Which are the two XML files needed for notification services?
What are the ADF and ACF XML files for? ADF (Application Definition File) and ACF (Application Configuration File) are the core XML files which are needed to configure a Notification Services application. The ACF file defines the instance name and the application directory path of the application. This is the application which runs the notification service.
The ADF file describes the event, subscription, rules, and notification structure that will be employed by the Notification Services application. After these files have been defined, you have to load the ADF file using the command-line utility or the UI provided by SQL Server 2005. Click on the server browser as shown below and expand Notification Services, then right-click to bring up the Notification Services dialog box.
You will also have to code the logic to add a subscription; here's a small sample.
using Microsoft.SqlServer.NotificationServices;
using System.Text;
public class NSSubscriptions
{
    private string AddSubscription(string instanceName, string applicationName, string subscriptionClassName, string subscriberId)
    {
        NSInstance myNSInstance = new NSInstance(instanceName);
        NSApplication myNSApplication = new NSApplication(myNSInstance, applicationName);
        Subscription myNSSubscription = new Subscription(myNSApplication, subscriptionClassName);
        myNSSubscription.Enabled = true;
        myNSSubscription.SubscriberId = subscriberId;
        myNSSubscription["Emailid"] = "shiv_koirala@yahoo.com";
        string subscriptionId = myNSSubscription.Add();
        return subscriptionId;
    }
}
Note: - As this is an interview book, it is beyond its scope to go into detail of how to create notifications. It is better to create a small sample using MSDN and get some fundamentals clear about how Notification Services is done practically. Try to understand the full format of both XML files.
For creating notification services you can either use the notification services dialog box or use the command-line utility nscontrol. So, in short, nscontrol is a command-line tool that is used to create and administer Notification Services applications. Note: - You can refer to MSDN for nscontrol commands.
6. Service Broker
Why do we need queues?
There are instances when we expect that the other application with which we are interacting is not available. For example, when you chat on a messaging system like Yahoo, MSN, ICQ etc., you do not expect that the other user is guaranteed to be online. That is where we need queues. During chatting, if the user is not online, all the messages are sent to a queue. Later, when the user comes online, he can read all the messages from the queue.
A message is part of a conversation, and it has a unique identifier as well as a unique sequence number to enforce message ordering.
Dialog
A dialog ensures that messages are read in the same order as they were put into the queue between endpoints. In short, it ensures a properly ordered sequence of events at both ends for a message.
Conversation Group
A conversation group is a logical grouping of dialogs. To complete a task you can need one or more dialogs. For instance, an online payment gateway can have two dialogs: first the address check and second the credit card number validation; both dialogs together form your complete payment process. So you can group both dialogs in one conversation group.
Message Transport
Message transport defines how the messages will be sent across networks. Message transport is based on TCP/IP and FTP. There are two basic protocols: the Binary Adjacent Broker Protocol, which is like TCP/IP, and the Dialog Protocol, which is like FTP.
Further, you have to assign these message types to a contract; message types are grouped in contracts. A contract is an entity which describes the messages for a particular dialog, so a contract can have multiple message types. Contracts are further grouped into services. A service has all the dialogs needed to complete one process. A service can further be attached to multiple queues. The service is the basic object from the SQL Server Service Broker point of view. So when any client wants to communicate with a queue, he opens a dialog with the service.
The above figure shows how SQL Server Service Broker works. Clients who want to use the queues do not have to understand the complexity of queues. They only communicate with the logical view of the SQL Server Service Broker objects (Messages, Contracts and Services). In turn, these objects interact with the queues below and shield the client from the physical complexities of queues. Below is a simple practical implementation of how this works. Try running the statements below from T-SQL and see the output.
-- Create a message type and do not do any data type validation for it
CREATE MESSAGE TYPE MessageType VALIDATION = NONE
182
GO
-- Create a message contract defining what type of endpoint can send these
-- messages; at this moment we define the current endpoint as the initiator
CREATE CONTRACT MessageContract (MessageType SENT BY INITIATOR)
GO
-- Declare the two endpoints, that is, the sender and receiver queues
CREATE QUEUE SenderQ
CREATE QUEUE ReceiverQ
GO
-- Create services and bind them to the queues
CREATE SERVICE Sender ON QUEUE SenderQ
CREATE SERVICE Receiver ON QUEUE ReceiverQ (MessageContract)
GO
-- Send a message to the queue
DECLARE @conversationHandle UNIQUEIDENTIFIER
DECLARE @message NVARCHAR(100)
BEGIN
BEGIN TRANSACTION;
BEGIN DIALOG @conversationHandle
FROM SERVICE Sender
TO SERVICE 'Receiver'
ON CONTRACT MessageContract
183
-- Sending the message
SET @message = N'SQL Server Interview Questions by Shivprasad Koirala';
SEND ON CONVERSATION @conversationHandle
MESSAGE TYPE MessageType (@message)
COMMIT TRANSACTION
END
GO
-- Receive a message from the queue
RECEIVE CONVERT(NVARCHAR(max), message_body) AS message FROM ReceiverQ
-- Drop all the objects so that this sample can be run again
DROP SERVICE Sender
DROP SERVICE Receiver
DROP QUEUE SenderQ
DROP QUEUE ReceiverQ
DROP CONTRACT MessageContract
DROP MESSAGE TYPE MessageType
GO
After executing the above T-SQL commands you can see the output below.
184
Note: - In case your SQL Server Service Broker is not active you will get the error shown below. In order to remove that error you have to enable the Service Broker by using ALTER DATABASE [DatabaseName] SET ENABLE_BROKER. All these samples were created in the sample database AdventureWorks.
185
186
7. XML Integration
Note: - In this chapter we will first just skim through basic XML interview questions so that you do not get stuck on simple questions.
What is XML?
XML (eXtensible Markup Language) is all about describing data. Below is an XML document which describes invoice data.
<?xml version="1.0" encoding="ISO-8859-1"?>
<invoice>
<productname>Shoes</productname>
<qty>12</qty>
<totalcost>100</totalcost>
<discount>10</discount>
</invoice>
An XML tag is not something predefined; it is something you define according to your needs. For instance, in the above invoice example, all the tags are defined according to the business needs. The XML document is self-explanatory; anyone can easily understand what the data means just by looking at it.
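Because the tags describe the data, any XML-aware tool can read it back without extra metadata. As a quick sketch (in Python rather than T-SQL, purely for illustration), the invoice above can be parsed with a few lines of standard-library code:

```python
import xml.etree.ElementTree as ET

# The invoice document from the text, as bytes so the encoding
# declaration in the XML prolog is honored.
invoice_xml = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<invoice>
  <productname>Shoes</productname>
  <qty>12</qty>
  <totalcost>100</totalcost>
  <discount>10</discount>
</invoice>"""

root = ET.fromstring(invoice_xml)
product = root.findtext("productname")
qty = int(root.findtext("qty"))
print(product, qty)  # Shoes 12
```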
187
What is CSS?
With CSS (Cascading Style Sheets) you can format an XML document for display.
What is XSL?
XSL (the eXtensible Stylesheet Language) is used to transform an XML document into some other document. In other words, it is a transformation language which can convert XML into some other format. For instance, you can apply XSL to XML and convert it to an HTML document or perhaps a CSV file.
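To make the idea of an XML-to-CSV transformation concrete, here is a small sketch in Python. Note this is not real XSLT (Python's standard library has no XSLT engine); it just hand-codes the same kind of mapping an XSL stylesheet would declare:

```python
import csv
import io
import xml.etree.ElementTree as ET

# A hypothetical document with two invoices, made up for this sketch.
doc = ET.fromstring(
    "<invoices>"
    "<invoice><productname>Shoes</productname><qty>12</qty></invoice>"
    "<invoice><productname>Socks</productname><qty>3</qty></invoice>"
    "</invoices>"
)

# "Transform" each <invoice> element into one CSV row.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["productname", "qty"])
for inv in doc.findall("invoice"):
    writer.writerow([inv.findtext("productname"), inv.findtext("qty")])

csv_text = out.getvalue()
print(csv_text)
```

An XSL stylesheet would express the same element-to-row mapping declaratively, which is exactly why it is called a transformation document.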
189
</xs:complexType> </xs:element> </xs:schema>' After you have created the schema you can see the MyXSD schema in the Schema Collections folder.
Figure 7.2 : - You can view the XSD in explorer of Management Studio
When you create the XML data type you can assign the MyXsd to the column.
191
What is XQuery?
In a typical XML table, below is the type of data which is seen. Now I want to retrieve order id 4. I know many will jump up and say "use the LIKE keyword". If you say that, the interviewer will be very sure that you do not know the real power of the XML support provided by SQL Server.
Well, first of all, XQuery is not something Microsoft invented; it is a language defined by the W3C to query and manipulate data in XML. For instance, in the above scenario we can use XQuery to drill down to a specific element in the XML. So, to drill down, here is the XQuery:
192
SELECT * FROM xmltable
WHERE TestXml.exist('declare namespace xd="http://MyXSD/";
/xd:MyXSD[xd:Orderid eq "4"]') = 1
Note: - It is out of the scope of this book to discuss XQuery in detail. I hope many interviewers will not dig deep into this section. In case you have doubts, visit www.w3.org or SQL Server Books Online; they have a lot of material on this.
193
What is XMLA?
XMLA stands for XML for Analysis. Analysis Services is covered in depth in the data mining and data warehousing chapters. Using XMLA we can expose Analysis Services data to the external world as XML, so that any data source can consume it, as XML is universally understood.
195
Twist: - What is Star Schema Design? When we design a transactional database we always think in terms of normalizing the design to its least form. But when it comes to designing for a data warehouse we think more in terms of denormalizing the database. Data warehousing databases are designed using Dimensional Modeling. Dimensional Modeling uses the existing relational database structure and builds on that. There are two basic kinds of tables in dimensional modeling: Fact tables. Dimension tables.
Fact tables are the central tables in data warehousing. Fact tables hold the actual aggregate values which will be needed in a business process, while dimension tables revolve around the fact tables and describe their attributes. Let's try to understand these two conceptually.
197
In the above example we have three transactional tables: Customer: - It has the customer information details. SalesPerson: - The sales persons who actually sell products to customers. CustomerSales: - This table records which sales person sold to which customer and what the sales amount was.
Below is the expected report: Sales / Customer / Month. You may be wondering why we bother, since a simple join query across all three tables can easily produce this output. But imagine you have huge numbers of records in these three tables; that can really slow down your reporting process. So we introduce a new aggregate table, CustomerSalesByMonth, which has foreign keys to all the tables and the aggregate amount by month. So this table becomes
198
the fact table, and all the other tables become its dimension tables. All major data warehousing designs use the Fact and Dimension model.
The above design is also called a Star Schema design. Note: - For a pure data warehousing job this question is important, so try to understand why we modeled our design in this way rather than using the traditional normalization approach.
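The pre-aggregation step that makes the star schema fast can be sketched in a few lines of Python (table and column names here are hypothetical, modeled on the CustomerSalesByMonth example above):

```python
from collections import defaultdict

# Hypothetical transactional rows: (customer_id, salesperson_id, month, amount)
sales = [
    (1, 10, "2005-01", 100.0),
    (1, 10, "2005-01", 50.0),
    (2, 11, "2005-02", 200.0),
]

# Build a CustomerSalesByMonth-style aggregate keyed by
# (customer_id, salesperson_id, month). Reports then read this one
# table instead of joining and summing the raw transactions each time.
fact = defaultdict(float)
for cust, sp, month, amount in sales:
    fact[(cust, sp, month)] += amount

print(dict(fact))
```

The speed-up at report time comes from doing the expensive join-and-sum once, during loading, instead of on every query.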
199
Transformation: - This process can also be called the cleaning-up process. It is not necessarily true that the data is clean and valid after the extraction process. For instance, some of the financial figures may have NULL values, but you want them to be ZERO for better analysis. So you can have some kind of stored procedure which runs through all the extracted records and sets the value to zero. Loading: - After transformation you are ready to load the information into your final data warehouse database.
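The NULL-to-zero cleanup described above is a one-liner per record; here it is as a Python sketch (the field names are made up, and in practice this would be a stored procedure or an SSIS transform rather than Python):

```python
# Transformation step sketch: replace missing financial figures with zero
# before loading them into the warehouse.
records = [
    {"account": "A1", "amount": 120.5},
    {"account": "A2", "amount": None},   # a NULL from the source system
]

for rec in records:
    if rec["amount"] is None:
        rec["amount"] = 0.0

print(records)
```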
201
Data mining is a concept by which we can analyze the current data from different perspectives and summarize the information in a more useful manner. It is mostly used either to derive valuable information from the existing data or to predict sales and grow the customer market. There are two basic aims of data mining: Prediction: - From the given data we can forecast how the customer or market will perform. For instance, we have sales of 40000 $ per month in India; if the same product is sold at a discount, how much in sales can the company expect? Summarization: - Deriving important information to analyze the current business scenario. For example, a weekly sales report gives the top management a picture of how we are performing on a weekly basis.
202
The above figure shows how these concepts are quite different. A data warehouse collects, cleans and filters data from different sources like Excel, XML etc. Data mining, on the other hand, sits on top of the data warehouse database and generates intelligent reports, which can then either be exported to a different database or rendered using some reporting tool like Reporting Services.
What is BCP?
Note: - It is not necessary that this question will be asked for data mining. But if an interviewer wants to probe your DBA capabilities, he will love to ask it. If he has worked since the old days of SQL Server he will expect it to be answered. There are times when you want to move huge numbers of records in and out of SQL Server; that is where this old and cryptic friend comes in handy. It is a command-line utility. Below is the detailed syntax:
bcp {[[<database name>.][<owner>].]{<table name>|<view name>}|"<query>"}
{in | out | queryout | format} <data file>
[-m <maximum no. of errors>] [-f <format file>] [-e <error file>]
[-F <first row>] [-L <last row>] [-b <batch size>]
203
[-n] [-c] [-w] [-N] [-V (60 | 65 | 70)] [-6] [-q] [-C <code page>]
[-t <field term>] [-r <row term>] [-i <input file>] [-o <output file>]
[-a <packet size>] [-S <server name>[\<instance name>]]
[-U <login id>] [-P <password>] [-T] [-v] [-R] [-k] [-E]
[-h "<hint> [,...n]"]
UUUHH, a lot of switches there! But during the interview you do not have to remember them all. Just remember that BCP is a utility with which you can import and export data.
204
Figure 8.7 : - After executing BCP command prompts for some properties
During BCP we need to change the field positions or eliminate some fields. How can we achieve this?
For some reason during BCP you may want some fields to be eliminated, or you may want the positions to be in a different order. For instance, you have field1, field2 and field3, and you want field2 not to be imported during BCP, or you want the sequence changed to field2, field1 and finally field3. This is achieved by using the format file. When we ran the BCP command in the first question it generated a file with the .fmt extension. Below is the FMT file generated in the same directory from which I ran my BCP command.
205
The FMT file is basically the format file which governs how BCP maps to the tables. Let's say from our SalesPerson table we want to eliminate CommissionPct, SalesYTD and SalesLastYear. Then you have to modify the FMT file as shown below: we have set the values to zero for the fields which have to be eliminated.
If you want to change the sequence, you just have to change the original sequence numbers. For instance, we have swapped the sequence numbers 9 and 5; see the figure below.
206
Once you have changed the FMT file you can specify it in the BCP command arguments, as shown below. bcp adventureworks.sales.salesperson in c:\salesperson.txt -f c:\bcp.fmt -T Note: - we have given the .FMT file to the BCP command with the -f switch.
Below is the detailed syntax of BULK INSERT. You can run this from SQL Server Management Studio, T-SQL or ISQL.
BULK INSERT [[database_name.][owner].]{table_name | view_name}
FROM 'data_file'
[WITH (
[BATCHSIZE [ = batch_size ]]
[[,] CHECK_CONSTRAINTS ]
[[,] CODEPAGE [ = ACP | OEM | RAW | code_page ]]
[[,] DATAFILETYPE [ = {char|native| widechar|widenative }]]
[[,] FIELDTERMINATOR [ = field_terminator ]]
[[,] FIRSTROW [ = first_row ]]
[[,] FIRETRIGGERS [ = fire_triggers ]]
[[,] FORMATFILE [ = format_file_path ]]
207
[[,] KEEPIDENTITY ]
[[,] KEEPNULLS ]
[[,] KILOBYTES_PER_BATCH [ = kilobytes_per_batch ]]
[[,] LASTROW [ = last_row ]]
[[,] MAXERRORS [ = max_errors ]]
[[,] ORDER ( { column [ ASC | DESC ]}[ ,n ])]
[[,] ROWS_PER_BATCH [ = rows_per_batch ]]
[[,] ROWTERMINATOR [ = row_terminator ]]
[[,] TABLOCK ]
)]
Below is a simplified version of BULK INSERT which we have used to import a comma-separated file into SalesPersonDummy. The first row holds the column names, so we specified that importing should start from the second row. The other two attributes define how the fields and rows are separated.
BULK INSERT adventureworks.sales.salespersondummy
FROM 'c:\salesperson.txt'
WITH (
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
What is DTS?
Note: - It is now part of Integration Services in SQL Server 2005. DTS provides functionality similar to BCP and BULK INSERT, which have two major problems: BCP and BULK INSERT do not have a user-friendly user interface (though some DBAs still enjoy those DOS-prompt commands, which make them feel they are doing something worthy).
208
Using BCP and BULK INSERT we can import only from files; what if we wanted to import from another database like FoxPro, Access or Oracle? That is where DTS is the king. Another important thing that BCP and BULK INSERT miss is transformation, which is one of the important parts of the ETL process. BCP and BULK INSERT allow you to extract and load data, but do not provide any means by which you can do transformation. So, for example, if you receive sex as 1 and 2, you would like to transform this data to M and F respectively while loading it into the data warehouse. DTS also allows you to do direct programming and write scripts, giving you huge control over the loading and transformation process. It also allows a lot of parallel operations: for instance, if you want transformation to happen in parallel while you are reading data, then DTS is the right choice.
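The sex-code transformation mentioned above is a simple lookup per row. As a hedged sketch (in Python, with made-up column names; in DTS this would be a transformation step or script):

```python
# Source data arrives with sex coded as "1"/"2"; the warehouse wants "M"/"F".
SEX_MAP = {"1": "M", "2": "F"}

def transform_row(row):
    """Map the coded sex field; 'U' (unknown) is our assumption for bad codes."""
    row = dict(row)  # copy so the source row is untouched
    row["sex"] = SEX_MAP.get(row["sex"], "U")
    return row

loaded = [transform_row({"name": "Shiv", "sex": "1"}),
          transform_row({"name": "Rita", "sex": "2"})]
print(loaded)
```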
You can see DTS Import / Export wizard in the SQL Server 2005 menu.
Note: - DTS is the most used technology when you are doing data warehousing using SQL Server. In order to implement the ETL fundamentals properly, Microsoft rewrote the whole of DTS from scratch using .NET and named it Integration Services. There is a complete chapter dedicated to Integration Services, which covers DTS indirectly in huge detail. Any interviewer looking for a data warehousing professional in SQL Server 2005 will expect candidates to know DTS properly.
(DB)Can you brief about the Data warehouse project you worked on?
209
Note: - This question is the trickiest: a shot to gain insight, from which the interviewer will spawn question threads. If you have worked on a data warehouse project you can be very sure of this. If not, then you really have to prepare a project to talk about. I know it is unethical to even suggest this in a book, but I leave it to readers, as everyone would like to think up a project of his own. Just try to include the ETL process, which every interviewer expects to be followed in a data warehouse project.
210
Transactions are mainly batch transactions, so there are no huge volumes of individual transactions. There is no need for a recovery process as such, unless the project specifically requires it.
return results. As we do not go through any joins (because the data is in denormalized form), SQL queries execute faster and in a more optimized way.
212
and ROLAP. However, unlike MOLAP and ROLAP, which follow well-defined standards, HOLAP has no uniform implementation.
All metadata is stored in system tables in MSDB. The metadata can be accessed using the repository API or DSO (Decision Support Objects).
213
The above table gives a three-dimensional view; you can have more dimensions according to the depth of your analysis. For example, from the above multi-dimensional view I can see that Calcutta is the only place where shirts and caps are selling; the other metros do not show any sales for these products.
214
(DB)What is MDX?
MDX stands for Multi-Dimensional eXpressions. When it comes to viewing data across multiple dimensions, SQL lacks many capabilities; that is where MDX queries are useful. MDX queries are fired against OLAP databases. SQL is good for transactional (OLTP) databases, but when it comes to analysis queries MDX stands at the top. Note: - If you are applying for a data warehousing position using SQL Server 2005, MDX will be a favorite of the interviewers. MDX itself is such a huge and beautiful beast that we cannot cover it in this small book. I suggest you at least try to grasp some basic MDX syntax, like SELECT, before going to the interview.
215
Once you are ok with the requirements, it is time to select which tools can do a good job for you. This book only focuses on SQL Server 2005, but in reality there are many tools for data warehousing. Sometimes SQL Server 2005 will not fit your project requirements and you will want to opt for something else.
Data Modeling and Design
This is where the actual designing takes place. You do the conceptual and logical design of your database, the star schema design.
ETL Process
This forms the major part of any data warehouse project. Refer to the previous section to see what an ETL process is. ETL is the execution phase of a data warehouse project. This is where you define your mappings, create DTS packages, define workflow, write scripts, etc. The major issue in the ETL process is performance, which should be kept in mind while executing this phase.
Note: - Refer to Integration Services for how to do the ETL process using SQL Server 2005.
OLAP Cube Design
This is where you define your CUBES and DIMENSIONS on the data warehouse database which was loaded by the ETL process. CUBES and DIMENSIONS are built from the requirement specification. For example, if the customer wants a "Sales Per Month" report, you can define the CUBES and DIMENSIONS which will later be consumed by the front end for viewing by the end user.
Front End Development
Once all your CUBES and DIMENSIONS are defined, you need to present them to the user. You can build your front ends for the end user using C#, ASP.NET, VB.NET -- any language which has the ability to consume the CUBES and DIMENSIONS. The front end stands on top of the CUBES and DIMENSIONS and delivers the reports to the end users. Without a front end the data warehouse will be of no use from the user's perspective.
Performance Tuning
Many projects tend to overlook this process. But just imagine a poor user sitting and waiting ten minutes to view "Yearly Sales" -- frustrating, no?
There are three areas where you can really look at why your data warehouse is performing slowly:
While data is loading into the database -- the ETL process. This is probably the major area where you can optimize. The best approach is to look into the DTS packages and see if you can tune them for better speed.
OLAP CUBES and DIMENSIONS. CUBES and DIMENSIONS are executed against the data warehouse. You can look into the queries and see if some optimization can be done.
Front-end code. Front ends are mostly coded by programmers, and this can be a major bottleneck. So you can look for unnecessary loops, and also check how efficiently the front end interacts with the CUBES.
UAT means asking the customer, "Is this product ok with you?". It is a testing phase performed either by the customer (and it mostly is) or by your own internal testing department, to ensure that the product matches the customer requirements gathered during the requirement phase.
Rolling out to Production
Once the customer has approved your UAT, it is time to roll the data warehouse out to production so that the customer can get the benefit of it.
Production Maintenance
I know -- the most boring aspect from a programmer's point of view, but the most profitable from an IT company's point of view. In data warehousing this mainly involves doing backups, optimizing the system and removing bugs. It can also include enhancements if the customer wants them.
217
218
Requirement phase: - System requirement documents, project management plan, resource allocation plan, quality management document, test plans and the number of reports the customer is looking for. I know many people from IT will raise their eyebrows -- "hey, do not mix project management with requirement gathering" -- but that is a debatable issue; I leave it to you if you want to split it further. Tool Selection: - POC (proof of concept) documents comparing each tool against the project requirements. Note: - A POC answers the question "can we do it?". For instance, you have a requirement that 2000 users at a time should be able to use your data warehouse; you will probably write some sample code or read through documents to ensure that the tool can handle it.
Data Modeling: - Logical and physical data model diagrams. These can be ER diagrams or some other format the client understands.
ETL: - DTS packages, scripts and metadata.
OLAP Design: - Documents which show the design of the CUBES / DIMENSIONS and the OLAP cube reports.
Front-end coding: - Actual source code, source code documentation and deployment documentation.
Tuning: - A performance tuning document: what performance level we are aiming for and how we will achieve it, or what steps will be taken to do so. It can also state which areas / reports are targeted for performance improvements.
UAT: - Normally the test plan and test case document. It can be a document describing how to create the test cases and the expected results.
Production: - In this phase the entire data warehouse project is normally the deliverable. But you can also have handover documents for the project, hardware and network settings -- in short, how the environment is set up.
Maintenance: - This is an ongoing process and mainly has documents like errors fixed, issues solved, within what time the issues should be solved and within what time they were solved.
with a small project. For this complete explanation I am using Microsoft's old sample database, Northwind. First and foremost, ensure that your service is started: go to Control Panel, Services, and start the Analysis Server service.
As said before we are going to use NorthWind database for showing analysis server demo.
220
We are not going to use all the tables from Northwind; below are the only tables we will be operating on. Apart from FactTableCustomerByProduct, all the tables are self-explanatory. Ok, I know I still have not told you what we want to derive from this whole exercise. We will try to derive a report of how many products are bought by which customer and how many products are sold in which country. So I have created the fact table with three fields: CustomerId, ProductId and the TotalProducts sold. All the data in the fact table is loaded from Orders and Order Details; that is, I have taken all the customer ids and product ids with their respective totals and made entries in the fact table.
221
Ok, I have created my fact table and populated it using our ETL process. Now it is time to use this fact table to do analysis. So let's start BI Development Studio, as shown in the figure below.
222
I have named the project AnalysisProject. You can see the view of the Solution Explorer. Data Sources: - This is where we define our database and connection.
To add a new data source, right-click and select New Data Source.
223
After that, click Next and define the connection for the data source, which you can do by clicking the New button. Click Next to complete the data source process.
224
After that it is time to define the view. Data Source View: - It is an abstraction over the data source. The data source is the complete database, and it is rare that we will need the complete database at any moment in time. So in the data source view we define which tables we want to operate on. The Analysis Server never operates on the data source directly; it only speaks to the data source view.
225
So here we will select only the Customers and Products tables, plus the fact table.
226
We said previously that the fact table is the central table for the dimension tables. You can see that the Products and Customers tables form the dimension tables and the fact table is the central point. Now drag and drop from the CustomerId of the fact table to the CustomerId field of the Customers table. Repeat the same for ProductId with the Products table.
Check Auto build, as we are going to let Analysis Services decide which tables it wants to treat as fact and dimension tables.
227
After that comes the most important step: deciding which are the fact tables and which are the dimension tables. SQL Server Analysis Services decides by itself, but we will change the values as shown in the figure below.
228
229
230
231
Cube Builder: - Works with the cube measures
Dimensions: - Works with the cube dimensions
Calculations: - Works with calculations for the cube
KPIs: - Works with Key Performance Indicators for the cube
Actions: - Works with cube actions
Partitions: - Works with cube partitions
Perspectives: - Works with views of the cube
Translations: - Defines optional translations for the cube
Browser: - Enables you to browse the deployed cube
Once you are done with the complete process drag drop the fields as shown by the arrows below.
Figure 8.32: - Drag and Drop the fields over the designer
232
Once you have dragged and dropped the fields you can see the wonderful information revealed: which customer has bought how many products.
233
This is the second report, which shows how many products I have sold in each country.
234
Note: - I do not want my book to grow in pages just because of images, but sometimes the nature of the explanation demands it. Now you can summarize to the interviewer, from the above steps, how you work with Analysis Services.
What are the different problems that Data mining can solve?
There are basically four problems that data mining can solve:
Analyzing Relationships
This is also often called link analysis. For instance, one company that sold adult products did an age survey of its customers. It found all of its products
235
were bought by customers between the ages of 25 and 29. It further reasoned that most of these customers must have kids around 2 to 5 years old, as that is the typical age of marriage. It analyzed further and found that most of its customers were indeed married with kids. Now the company can also try selling kids' products to the same customers, as they will be interested in buying them, which can tremendously boost sales. Here the link analysis was done between age and kids to decide a marketing strategy.
Choosing the Right Alternatives
If a business wants to make a decision between choices, data mining can come to the rescue. For example, one company saw a major resignation wave. So HR decided to look at the employees' joining dates. They found that most of the resignations came from employees who had stayed in the company for more than 2 years, with some resignations from freshers. So HR decided to focus on motivating the freshers rather than the employees who had completed 2 years, as HR thought it is easier to motivate freshers than old employees.
Prediction
Prediction is about forecasting how the business will move ahead. For instance, the company has sold 1000 shoe product items; if the company puts a discount on the product, sales could go up to 2000.
Improving the Current Process
Past data can be analyzed to see how we can improve the business process. For instance, for the past two years the company has been distributing product X in plastic bags and product Y in paper bags. The company observed that product Y sold in the same amounts as product X but with much higher profit. Further analysis showed that the major cost of product X was due to packaging it in plastic bags. Now the company can improve the process by using paper bags, bringing down the cost and thus increasing profits.
on the number of tickets sold; but if it is a huge travel company with a lot of agents, it would want to see the number of tickets sold per agent. In a different industry altogether, like banking, they would want to see the actual amount of transactions done per day. There can be several metrics which a company wants to look into. For instance, in our previous travel company model, they would like the following metrics: Tickets sold per day. Number of tickets sold per agent. Number of tickets sold per airline. Number of refunds per month. What attributes do you want to measure and predict? What type of relationships do you want to explore? In our travel company example, you would like to explore the relationship between the number of tickets sold and the holiday patterns of a country.
Preprocessing and Transforming Data
This can also be called loading and cleaning of data: removing unnecessary information to simplify the data. For example, you will be getting data for title as "Mr.", "M.r.", "Miss", "Ms", etc. -- and it can get worse if these data are maintained in numeric format: "1", "2", "6", etc. This data needs to be cleaned for better results. You also need to consolidate data from various sources like Excel, delimited text files and other databases (Oracle etc.). Microsoft SQL Server 2005 Integration Services (SSIS) contains tools which can be used for cleaning and consolidating data from various sources. Note: - The data warehousing ETL process is a subset of this section.
Exploring Models
Exploring models means calculating the min and max values, looking into any serious deviations that are happening, and seeing how the data is distributed. Once you see the data you can judge whether it is flawed. For instance, the normal number of hours in a day is
237
24, and if you see some data with more than 24 hours, that is not logical; you can then look into correcting it. The Data Source View Designer in BI Development Studio contains tools which let you analyze data.
Building Models
Data derived from exploring models helps us define and create a mining model. A model typically contains input columns, an identifying column, and a predictable column. You can define these columns in a new model by using the Data Mining Extensions (DMX) language or the Data Mining Wizard in BI Development Studio. After you define the structure of the mining model, you process it, populating the empty structure with the patterns that describe the model. This is known as training the model. Patterns are found by passing the original data through a mathematical algorithm. SQL Server 2005 contains a different algorithm for each type of model that you can build, and you can use parameters to adjust each algorithm. A mining model is defined by a data mining structure object, a data mining model object, and a data mining algorithm.
Verifying the Models
By using the viewers in Data Mining Designer in BI Development Studio you can test / verify how well these models perform. If you find the model needs any refining, you have to iterate back to the first step.
238
239
A MODEL is about extracting and understanding different patterns from data. Once the patterns and trends of how the data behaves are known, we can derive a model from them. Once these models are decided, we can see how they can be helpful for prediction / forecasting, analyzing trends, improving the current process, etc.
240
Based on the above data we have built the following decision tree. You can see that a decision tree takes the data and then starts applying attribute comparisons at every node recursively.
241
The 18-25 age group always buys an internet connection, irrespective of income. Income drawers above 5000 always buy an internet connection, irrespective of age.
Using this data we have made predictions that if we market using the above criteria we can make more internet connection sales. So we have achieved two things from the decision tree:
Prediction
If we market to the 32-40 age group with income below 5000, we will not have decent sales. If we target customers in the 18-25 age group, we will have good sales. All income drawers above 5000 will always buy.
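The two mined rules above amount to a tiny hand-coded decision tree; here is a Python sketch of how such a tree classifies a new customer (the thresholds are the ones stated above, the function itself is illustrative):

```python
# The two rules mined above, written as a hand-coded decision tree:
#   1) age 18-25 always buys, irrespective of income
#   2) income above 5000 always buys, irrespective of age
def will_buy_internet(age, income):
    if 18 <= age <= 25:
        return True
    if income > 5000:
        return True
    return False

print(will_buy_internet(20, 3000))   # True  (rule 1)
print(will_buy_internet(60, 6000))   # True  (rule 2)
print(will_buy_internet(35, 4000))   # False (no rule matches)
```

A real mining algorithm discovers these splits from the data; once discovered, applying the tree to a new record is exactly this kind of cascade of comparisons.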
242
If you look at the sample we can say that 80% of the time, customers who buy pants also buy shirts: P (Shirt | Pants) = 0.8. Customers who buy shirts outnumber those who buy pants; say 1 of every 10 customers buys shirts and 1 of every 100 customers buys pants: P (Shirts) = 0.1 and P (Pants) = 0.01. Now suppose a customer comes in to buy a shirt -- what is the probability that he will also buy pants? According to Bayes' theorem: Probability of buying pants if he bought a shirt = P (Shirt | Pants) x P (Pants) / P (Shirts) = 0.8 x 0.01 / 0.1 = 0.08. So if a customer buys a shirt there is only an 8% chance he will also buy pants, whereas if he buys pants there is an 80% chance he will also buy a shirt. So you can see the Naive Bayes algorithm is used for making predictions from existing data.
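Bayes' theorem gives P(Pants | Shirt) = P(Shirt | Pants) x P(Pants) / P(Shirt). A quick check of the arithmetic with the figures from the example:

```python
# Bayes' theorem: P(Pants|Shirt) = P(Shirt|Pants) * P(Pants) / P(Shirt)
p_shirt_given_pants = 0.8   # 80% of pants buyers also buy a shirt
p_shirt = 0.1               # 1 in 10 customers buys a shirt
p_pants = 0.01              # 1 in 100 customers buys pants

p_pants_given_shirt = p_shirt_given_pants * p_pants / p_shirt
print(round(p_pants_given_shirt, 3))  # 0.08
```

Note that a probability can never exceed 1; any Bayes calculation that produces a value above 1 means the inputs were combined incorrectly.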
Exclusive: A member belongs to only one cluster.
Overlapping: A member can belong to more than one cluster.
Probabilistic: A member can belong to every cluster with a certain amount of probability.
Hierarchical: Members are divided into hierarchies, which are sub-divided into clusters at a lower level.
Above is the figure which shows a neuron model. We have inputs (I1, I2 ... IN) and for every input there is a weight (W1, W2 ... WN) attached to it. The ellipse is the NEURON. Weights can have negative or positive values. The activation value is the summation of all the inputs multiplied by their weights coming inside the nucleus:

Activation Value = I1*W1 + I2*W2 + I3*W3 + I4*W4 + ... + IN*WN

There is a threshold value specified in the neuron; the neuron evaluates to a Boolean or some value if the activation value exceeds the threshold value.
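A minimal sketch of the activation formula above, assuming made-up inputs, weights and a threshold (the numbers are illustrative, not from the book's figure):

```python
# Weighted-sum activation of a single neuron, as per the formula above.
inputs  = [10, 20, 30, 40]
weights = [0.5, -1.0, 2.0, 0.25]
threshold = 30

# Activation Value = I1*W1 + I2*W2 + ... + IN*WN
activation = sum(i * w for i, w in zip(inputs, weights))
print(activation)              # 55.0
print(activation > threshold)  # True -> the neuron "fires"
```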
So, by feeding in customer sales records, we can probably come out with an output saying whether the sales department is under profit or loss. For instance, take the case of the customer sales data above. Below is the neural network defined for that data.
You can see the neuron has calculated the total as 5550 and as it is greater than the threshold of 2000 we can say the company is under profit. The above example was simplified for explanation's sake. In an actual situation there can be many neurons, as shown in the figure below. It is a completely hidden layer from the data miner's perspective; he only looks at the inputs and outputs for that scenario.
Suppose the output comes out beyond the range you were expecting (say you are expecting values between 0 and 6000 maximum). You can then always go back and look at whether you have some wrong input or weights. The error is then fed back to the neural network and the weights are adjusted accordingly. This is also called training the model.
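The error-feedback idea above can be sketched as a single perceptron-style weight update. This is a hedged illustration: the data, learning rate and update rule are textbook assumptions, not what SQL Server's algorithm actually does internally:

```python
# One "training" step: compute output, measure error, adjust weights.
# All numbers here are illustrative assumptions.
inputs  = [1.0, 2.0]
weights = [0.5, 0.5]
target  = 5.0
rate    = 0.1   # learning rate

output = sum(i * w for i, w in zip(inputs, weights))  # 1.5
error  = target - output                              # 3.5
# Feed the error back: nudge each weight to reduce the error.
weights = [w + rate * error * i for w, i in zip(weights, inputs)]
print([round(w, 2) for w in weights])  # [0.85, 1.2]
```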
Microsoft Decision Trees Algorithm
Microsoft Naive Bayes Algorithm
Microsoft Clustering Algorithm
Microsoft Neural Network Algorithm
Microsoft Time Series Algorithm
Predicting a sequence, for example, to perform a click stream analysis of a company's Web site: Microsoft Sequence Clustering Algorithm.

Finding groups of common items in transactions, for example, to use market basket analysis to suggest additional products to a customer for purchase: Microsoft Association Algorithm, Microsoft Decision Trees Algorithm.

Finding groups of similar items, for example, to segment demographic data into groups to better understand the relationships between attributes: Microsoft Clustering Algorithm, Microsoft Sequence Clustering Algorithm.
The reason we went through all these concepts is that when you create a data mining model you have to specify one of these algorithms. Below is a snapshot of all the existing SQL Server algorithms.
Note: - During interviews it is mostly the theory that counts, and the way you present it. For data mining I am not showing anything practical as such; I will probably try to cover that in my second edition. But a word of advice: please do try to make a small project and see how these techniques are actually used.
Let's start from the left-most side of the image. The first section is the transactional database. This is the database in which you collect data. The next process is the ETL process. This section extracts data from the transactional database and sends it to your data warehouse, which is designed using the STAR or SNOW FLAKE model. Finally, when the data is loaded in the data warehouse, you can use SQL Server tools like OLAP, Analysis Services, BI, Crystal Reports or Reporting Services to finally deliver the data to the end user.

Note: - The interviewer will always try to goof you up by asking why we should not run OLAP, Analysis Services, BI, Crystal Reports or Reporting Services directly on the transactional data. That is because transactional databases are in completely normalized form, which can make the data mining process extremely slow. By doing data warehousing we denormalize the data, which makes the data mining process more efficient.
What is XMLA?
XML for Analysis (XMLA) is fundamentally based on web services and SOAP. Microsoft SQL Server 2005 Analysis Services uses XMLA to handle all client application communications to Analysis Services. XML for Analysis (XMLA) is a Simple Object Access Protocol (SOAP)-based XML protocol, designed specifically for universal data access to any standard multidimensional data source residing on the Web. XMLA also eliminates the need to deploy a client component that exposes Component Object Model (COM) or Microsoft .NET Framework.
9. Integration Services/DTS
Note: - We had seen some questions on DTS in the previous chapter, Data Warehousing. But in order to do complete justice to this topic I have included them under Integration Services.
The next step is to specify the source from which you want to copy data. You have to specify the data source name and server name. For understanding purposes we are going to move data within the AdventureWorks database. I have created a dummy table called SalesPersonDummy which has the same structure as the SalesPerson table. The only difference is that SalesPersonDummy does not have data.
The next step is to specify the destination to which the data will be moved. At this moment we are moving data inside AdventureWorks itself, so specify the same database as the source.
The next step is to specify from where you want to copy data. For the time being we are going to copy from a table, so select the first option.
Finally choose which object you want to map where. You can map multiple objects if you want.
When everything goes successfully you can see the below screen, which shows the series of steps DTS has gone through.
While DTP (Data Transformation Pipeline) acts as a bridge, DTR (Data Transformation Run-time) controls your integration service. It is more about how the workflow and the different components will behave during transformation. Below are the different components associated with DTR:-
Container: - A container logically groups tasks. For instance, you have a task to load a CSV file into the database. You will probably have two or three tasks:

Parse the CSV file.
Check the field data types.
Map the source fields to the destination.

So you can define all the above work as tasks and group them logically into a Container.

Package: - Packages are executed to actually do the data transfer.

Note: - I can hear the shout practical.. practical. I think I have confused you guys over there. So let's warm up on some practical DTS stuff. A 1000 words is equal to one compiled program -- Shivprasad Koirala? I really want to invent some proverbs, if you do not mind. The DTP and DTR models expose APIs which can be used from .NET languages for better control.
Give the project the name Salesperson project. Before moving ahead let me give a brief about what we are trying to do. We are going to use the Sales.SalesPerson table from the AdventureWorks database. The Sales.SalesPerson table has a field called Bonus. We have the following tasks to accomplish:

Note: - Both these tables have to be created manually by you. I suggest you use CREATE statements and just make both tables. You can see in the image below there are two tables, SalesPerson5000 and SalesPersonNot5000. Whenever the Bonus field is equal to 5000 the row should go in Sales.SalesPerson5000.
Once you have selected the Data Transformation project, you will see a designer explorer as shown below. I understand you must be saying it's cryptic... it is. But let's try to simplify it. You can see the designer pane, which has lots of objects on it, and four tabs (Control Flow, Data Flow, Event Handlers and Package Explorer).

Control flow: - It defines how the whole process will flow. For example, if you are loading a CSV file you will probably have tasks like parsing, cleaning and then loading. You can see lots of control flow items which can make your data mining task easy. But first we have to define a task in which we will define all our data flows. So you can see the curved arrow which shows what you have to drag and drop on the control flow designer, and the arrow tip which defines the output point from the task.
In this project I have defined only one task, but in a real-time project something like the below can be seen (Extraction, Transformation and Loading: - ETL). One task acts as input to the next task, and the final task inputs data into SQL Server.
Data Flow: - Data flow defines how the objects will flow inside a task. So Data flow is a subset of a task, defining the actual operations.

Event Handlers: - The best part of DTS is that we can handle events. For instance, if there is an error, what action do you want it to take? Probably log your errors to an error log table or a flat file, or be more interactive and send a mail.
Now that you have defined your task it is time to define the actual operation that will happen within the task. We have to move data from Sales.SalesPerson to Sales.SalesPerson5000 (if the Bonus field is equal to 5000) and Sales.SalesPersonNot5000 (if the Bonus field is not equal to 5000). In short, we have Sales.SalesPerson as the source and the other two tables as destinations. So click on
the Data Flow tab and drag the OLEDB Source data flow item onto the designer; we will define the source in this item. You can see that there is some error, shown by a cross on the icon. This signifies that you need to specify the source table, that is Sales.SalesPerson.
In order to specify source tables we need to specify a connection for the OLEDB source. So right click on the Connections tab below and select New OLEDB Connection. You will see a screen as shown below. Fill in all the details, specify the database as AdventureWorks and click OK.
If the connection credentials are proper you can see the connection in the Connections tab as shown in below figure.
Now that we have defined the connection we have to associate that connection with the OLE DB source. So right click and select the Edit menu.
Once you click edit you will see a dialog box as shown below. In data access mode select Table or View and select the Sales.Salesperson table. To specify the mapping click on Columns tab and then press ok.
If the credentials are OK you can see the red cross is gone and the OLE DB source is now ready to connect further. As said before, we need to move data to the appropriate tables based on the Bonus field value. So from the data flow items drag and drop the Conditional Split data flow item.
Right click on the Conditional Split data flow item so that you can specify the criteria. It also gives you a list of the table's fields which you can drag and drop. You can also drag and
drop the operators and specify the criteria. I have made two outputs from the conditional split: one which is equal to 5000 and a second not equal to 5000.
The conditional split now has two outputs: one will go into Sales.SalesPerson5000 and the other into Sales.SalesPersonNot5000. So you have to define two destinations and associate the respective tables with them. Drag two OLE DB Destination data flow items and connect them to the two outputs of the conditional split.
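The Conditional Split step above can be mimicked in a few lines of plain Python, just to show the routing logic. The row values are made up; only the table names mirror the sample:

```python
# Route each salesperson row by its Bonus value, like the
# Conditional Split item does. Data below is illustrative.
sales_people = [
    {"SalesPersonID": 1, "Bonus": 5000},
    {"SalesPersonID": 2, "Bonus": 3000},
    {"SalesPersonID": 3, "Bonus": 5000},
]

# Output 1 -> Sales.SalesPerson5000, Output 2 -> Sales.SalesPersonNot5000
sales_person_5000    = [r for r in sales_people if r["Bonus"] == 5000]
sales_person_not5000 = [r for r in sales_people if r["Bonus"] != 5000]

print(len(sales_person_5000), len(sales_person_not5000))  # 2 1
```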
When you drag from the conditional split items over the OLEDB destination items, a dialog will pop up asking which output this destination has to be connected to. Select one from the drop down and press OK. Repeat this step again for the other destination object.
It's time to build and run the solution, which you can do from the drop down. To run the DTS package press the green icon pointed at by the arrow in the below figure. After you run it, check whether both the tables have the appropriate values.
Note: - You can see various data flow items on the right hand side; it is out of scope to cover all the items. (You must be wondering how many times this author will say out of scope, but it is a fact, guys: some things you have to explore.) In this sample project we needed the conditional split so we used it. Depending on the project you will need to explore the toolbox. It is rare that any interviewer will ask about individual items; rather they ask fundamentals or a general overview of how you did DTS.
10. Replication
Whats the best way to update data between SQL Servers?
By using Replication we can solve this problem. Many developers end up saying DTS, BCP or distributed transaction management, but Replication is one of the most reliable ways to maintain consistency between databases.
What are the scenarios in which you will need multiple databases with the same schema?
Following are the situations where you can end up with a multi-database architecture:

24x7 uptime for online systems
This can be one of the major requirements for duplicating SQL Servers across a network. For instance, you have a system which is supposed to be online 24 hours. This system is hosted on a central database which is far away geographically. Since the system should be online 24 hours, to guard against any break in connectivity to the central server we host one more server inside the premises. The application detects that it cannot connect to the online server, so it connects to the premises server and continues working. Later in the evening, using replication, all the data from the local SQL Server is sent to the central server.

License problems
SQL Server per-user licensing has a financial impact. So many companies decide to use MSDE, which is free, so that they do not have to pay for the client licenses. Later, every evening or at some specific interval, all this data is uploaded to the central server using replication. Note: - MSDE supports replication.

Geographical constraints
If the central server is far away and speed is one of the deciding criteria.
Reporting server
In big multi-nationals, sub-companies are geographically far away and the management wants to host a central reporting server for sales, which they want to use for decision making and marketing strategy. So here the transactional SQL Server databases are scattered across the sub-companies, and weekly or monthly we can push all the data to the central reporting server.
You can see from the above figure how data is consolidated into a central server hosted in India using replication.
For example, management may want to know Sales by Customer. To achieve this you do not need the whole database on the reporting server; from the above, you will only need the Sales and Customer tables.

Frequency planning
As in the example above, let's say management wants Sales by Customer only weekly; then you do not need to update every day, you can plan weekly. But if the top management is looking for Sales by Customer per day, then your frequency of updates would probably be every night.

Schema should not have a volatile baseline
Note: - I like this word baseline; it really adds weight while speaking as a project manager. It is mainly used to control change management in projects. You can say a baseline is a process by which you define a logical commit to a document. For example, you are coding a project and you have planned different versions. After every version you do a baseline, create a setup and deploy to the client side. Any changes after this form a new version.

One of the primary requirements of replication is that the schemas which are replicated across should be consistent. If you keep on changing the schema of the server then replication will have huge difficulty in synchronizing. So if you are going to have huge and continuous changes in the database schema, rethink the replication option. Or else proper project management will help you solve this.
Disadvantages:
Read-only data is the best candidate for snapshot replication. Master tables like zip codes, pin codes etc. are valid data for snapshot replication.
The merge agent stands in between subscriber and publisher. Any conflicts are resolved through the merge agent, which in turn uses a conflict resolver. Depending on how you have configured the conflict resolver, the conflicts are resolved by the merge agent.
There can be practical situations where the same row is affected by one or many publishers and subscribers. During such critical times the merge agent will look at what conflict resolution is defined and make changes accordingly. SQL Server uniquely identifies each row in a published table using a globally unique identifier column. If the table already has a uniqueidentifier column, SQL Server will automatically use that column; otherwise it will add a rowguid column to the table and create an index based on that column. Triggers are created on the published tables at both the Publisher and the Subscribers. These are used to track data changes based on row or column changes.
Can you explain how we can make a simple report in Reporting Services?
We will be using the AdventureWorks database for this sample. We would like to derive a report showing how much quantity of sales was done per product. For this sample we will have to refer to three tables: SalesOrderDetail, SalesOrderHeader and Product. Below is the SQL, which also shows the relationship between those tables:

select Production.Product.Name as ProductName, count(*) as TotalSales
from Sales.SalesOrderDetail
inner join Sales.SalesOrderHeader
on Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID
inner join Production.Product
on Production.Product.ProductID = Sales.SalesOrderDetail.ProductID
group by Production.Product.Name

So we will be using the above SQL to derive the report using Reporting Services. First click on the Business Intelligence Studio menu in SQL Server 2005 and say File --> New --> Project. Select the Report Project Wizard. Let's give this project the name TotalSalesByProduct. You will see a startup wizard as shown below.
Click next and you will be prompted to input data source details like the type of server, connection string and name of the data source. If you have the connection string just paste it in the text area, or else click Edit to specify the connection string values through the GUI.
As we are going to use SQL Server for this sample specify OLEDB provider for SQL Server and click next.
After selecting the provider specify the connection details which will build your connection string. You will need to specify the following details Server Name, Database name and security details.
This is the most important step of Reporting Services: specifying the SQL. Remember the SQL we specified at the top; we are pasting the same here. If you are not sure about the query you can use the query builder to build it.
Now it's time to include the fields in the report. At this moment we have only two fields: name of product and total sales.
Finally you can preview your report. In the final section there are three tabs: data, layout and preview. In the data tab you see your SQL or the data source. In the layout tab you can design your report; most of the look-and-feel work is done in this section. Finally, below is the preview where you can see your results.
You have to also specify the command type from the data tab.
Figure 11.10 : - Specify the command type from the Data tab.
Reporting Services is not a stand-alone system but rather a group of server sub-systems which work together for the creation, management and deployment of reports across the enterprise.
Report Designer: This is an interactive GUI which will help you design and test your reports.

Reporting Services Database: After a report is designed it is stored in XML format, specifically in RDL (Report Definition Language) format. All these RDL files are stored in the Reporting Services database.

Report Server: The Report Server is nothing but an ASP.NET application running on IIS. The Report Server renders and manages these RDL files.
Report Manager: It is again an ASP.NET web-based application which can be used by administrators to control security and manage reports -- from an administrative perspective, who has the authority to create the report, run the report etc. You can also see the various formats (XML, HTML etc.) which can be generated using the report server.
Above is a sample diagram which explains how the B-Tree fundamental works. The diagram shows how an index would work for the numbers 1-50. Let's say you want to search for 39. SQL Server will first start from the root node. It sees that the number is greater than 30, so it moves to the 50 node. Further, in the non-leaf nodes it compares whether 39 is more than 40 or less than 40. As it is less than 40, it loops through the leaf nodes which belong to the 40 node.
You can see that this is all attained in only two steps... faster, aaah. That is exactly how indexes work in SQL Server.
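The two-step walk described above can be sketched as a toy two-level index. The node boundaries here are illustrative assumptions, not SQL Server's actual page layout:

```python
# Toy two-level "index" over 1..50, mirroring the walk for 39:
# pick a root branch, then a non-leaf branch, then scan one small leaf page.
root = [30, 50]                                  # root node keys
non_leaf = {30: [10, 20, 30], 50: [40, 50]}      # non-leaf node keys
leaf = {10: range(1, 11), 20: range(11, 21), 30: range(21, 31),
        40: range(31, 41), 50: range(41, 51)}    # leaf pages

def index_search(key: int) -> bool:
    branch = next(k for k in root if key <= k)             # step 1: root
    page = next(k for k in non_leaf[branch] if key <= k)   # step 2: non-leaf
    return key in leaf[page]                               # scan one page

print(index_search(39))  # True, after touching only two index nodes
```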
I have a table which has a lot of inserts; is it a good database design to create indexes on that table?
Twist: - Inserts are slower on tables which have indexes; justify it?
Twist: - Why does page splitting happen?

B-Tree stands for balanced tree. In order for the B-tree fundamental to work properly, both of the sides should be balanced. All indexing fundamentals in SQL Server use the B-tree fundamental. Whenever new data is inserted or deleted, the tree tries to become unbalanced. In order to understand the fundamental properly, let's refer to the figure below.
If you see the first-level index there are 2 and 8; now let's say we want to insert 6. In order to balance the B-TREE structure the rows will split across two pages, as shown. Even though the second page after the split has some empty area, it goes ahead because the primary thing is balancing the B-TREE for fast retrieval. Now if you watch the split, it is doing some heavy-duty work here:

Creates a new page to balance the tree.
Shuffles and moves the data to pages.
So if your table has heavy inserts, that means it is transactional, and you can visualize the amount of splits it will be doing. This will not only increase insert time but will also upset the end-user who is sitting at the screen. So when you forecast that a table will have lots of inserts, it is not a good idea to create indexes.
These are the ways by which SQL Server searches for a record in a table. In a Table Scan SQL Server loops through all the records to get to the destination. For instance, if you have 1, 2, 5, 23, 63 and 95 and you want to search for 23, it will go through 1, 2 and 5 to reach it. Worse, if it wants to search for 95 it will loop through all the records. An Index Scan, on the other hand, uses the B-TREE fundamental to get to a record (for B-TREE refer to the previous questions).

Note: - Which way to search is chosen by the SQL Server engine. For example, if it finds that the table has very few records it will go for a table scan. If it finds the table is huge it will go for an index scan.
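The contrast between the two strategies can be sketched in Python on the same sample values. Here `bisect` (binary search) stands in for the B-TREE walk -- an assumption for illustration, not SQL Server's actual implementation:

```python
import bisect

rows = [1, 2, 5, 23, 63, 95]   # the sample values from above, sorted

def table_scan(target):
    """Loop through every record until we hit the target."""
    steps = 0
    for value in rows:
        steps += 1
        if value == target:
            return steps
    return steps

def index_seek(target):
    """Binary search over the sorted values, like walking a B-TREE."""
    pos = bisect.bisect_left(rows, target)
    return pos < len(rows) and rows[pos] == target

print(table_scan(95))  # 6 -- scanned every row to find 95
print(index_seek(95))  # True, found in O(log n) comparisons
```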
What are the two types of indexes and explain them in detail?
Twist: - What's the difference between clustered and non-clustered indexes?

There are basically two types of indexes:

Clustered Indexes.
Non-Clustered Indexes.
OK, everything is the same for both the indexes, i.e. both use a B-TREE for searching data. But the main difference is the way they store physical data. If you remember the previous figure, there were leaf levels and non-leaf levels. The leaf level holds the key which is used to identify the record, and the non-leaf level points to the leaf level. In a clustered index, the leaf level actually holds the data itself.
In a Non-Clustered index, the leaf nodes contain pointers (row IDs) which then point to the actual data.
So here is the main difference between clustered and non-clustered: in a clustered index, when we reach the leaf nodes we are on the actual data; in a non-clustered index we get a pointer, which then points to the actual data. From the above fundamentals the following basic differences follow: in a clustered index the actual data has to be sorted in the same way as the clustered index, while in a non-clustered index the pointers form a logical arrangement, so we do not need this compulsion. Hence we can have only one clustered index on a table, as we can have only one physical order, while we can have more than one non-clustered index.
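The difference can be modeled in a few lines of Python. This is purely illustrative -- lists and a dict standing in for pages and row IDs:

```python
# Clustered: the sorted structure IS the data (only one per table).
clustered = [(1, "Alice"), (2, "Bob"), (5, "Carol")]   # rows in key order

# Non-clustered: a separate lookup of key -> row position in the heap.
heap = [("Carol",), ("Alice",), ("Bob",)]              # rows in arbitrary order
non_clustered = {"Alice": 1, "Bob": 2, "Carol": 0}     # name -> row id

# Reaching the leaf of the clustered index lands on the row itself...
print(clustered[2])                   # (5, 'Carol')
# ...while the non-clustered leaf yields a pointer needing one more hop.
print(heap[non_clustered["Carol"]])   # ('Carol',)
```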
If we make a non-clustered index on a table which has a clustered index, how does the architecture change? The only change is that the leaf nodes of the non-clustered index point to the clustered index key, and that clustered index key is then used to finally locate the actual data. So instead of row pointers, the leaf nodes hold clustered keys. In short, if we create a non-clustered index on a table which has a clustered index, it uses the clustered index.
Note: - Before reading this you should be clear on all the answers of the previous section, especially about extents, pages and indexes.

DECLARE @ID int, @IndexID int, @IndexName varchar(128)
-- input your table and index name
SELECT @IndexName = 'AK_Department_Name'
SET @ID = OBJECT_ID('HumanResources.Department')
SELECT @IndexID = IndID FROM sysindexes
WHERE id = @ID AND name = @IndexName
-- run the DBCC command
DBCC SHOWCONTIG (@ID, @IndexID)

Just a short note here: DBCC, i.e. the Database Consistency Checker, is used for checking the health of lots of entities in SQL Server. Here we are using it to see index health. After the command is run you will see the following output. You can also run DBCC SHOW_STATISTICS to see when the index statistics were last updated.
Pages Scanned
The number of pages in the table (for a clustered index) or index.
Extents Scanned
The number of extents in the table or index. If you remember, we said earlier that an extent contains pages. The more extents for the same number of pages, the higher the fragmentation.
Extent Switches
The number of times SQL Server moves from one extent to another. The more switches it has to make for the same number of pages, the more fragmented the index is.
(DB) How do you reorganize your index, once you find the problem?
You can reorganize your index using DBCC DBREINDEX. You can either request a particular index to be rebuilt or re-index all the indexes of the table.

This will re-index all indexes belonging to HumanResources.Department:
DBCC DBREINDEX ([HumanResources.Department])

This will re-index only AK_Department_Name:
DBCC DBREINDEX ([HumanResources.Department], [AK_Department_Name])

This will re-index with a fill factor:
DBCC DBREINDEX ([HumanResources.Department], [AK_Department_Name], 70)

You can then run DBCC SHOWCONTIG again to see the results.
What is Fragmentation?
Splits have been covered in the first questions, but one other big issue is fragmentation. When a database grows it will lead to splits, but what happens when you delete something from the database... HeHeHe, life has a lot of turns, right? OK, let's say you have two extents and each has two pages with some data. Below is a graphical representation. Well, that's not actually how things are inside, but for the sake of clarity a lot of detail has been removed.
Now over a period of time some extent and page data undergoes deletes. Here's the modified database scenario. One observation you can make is that some pages are not removed even when they do not have data. Second, if SQL Server wants to fetch all Females it has to span two extents and multiple pages within them. This is called Fragmentation, i.e. to fetch data you span lots of pages and extents. This is also termed Scattered Data.
If the fragmentation is removed, you only have to search two extents and two pages. This will definitely be faster as we are spanning fewer entities.
DBCC INDEXDEFRAG: - This is not the most effective way of removing fragmentation as it only defragments the leaf level of the index.
What are the criteria you will look in to while selecting an index?
Note: - Some answers I have got for this question: I will create indexes wherever possible; I will create a clustered index on every table.

Look at how often the field is used as a selection criterion. For example, in a Customer table you have CustomerCode and PinCode. Most of the searches are going to be performed on CustomerCode, so it is a good candidate for indexing rather than PinCode. In short, you can look at the WHERE clauses of your SQL to figure out if a column is the right choice for indexing.
If a column has a high level of unique values and is used in selection criteria, it is again a valid candidate for indexing. If a foreign key of a table is used extensively in joins (inner, outer and cross), it is again a good candidate for indexing. If you find a table to be highly transactional (huge inserts, updates and deletes), it is probably not a good entity for creating indexes on; remember the split problems with indexes. You can use the Index Tuning Wizard for index suggestions.
It will prompt you for all the trace file details, for instance the trace name and the file where it should be saved. After providing the details click on the Run button provided below. I have provided the file name of the trace file as Testing.trc.
HUH, and the action starts. You will notice that the profiler has started tracing queries which are hitting SQL Server and logging all those activities into the Testing.trc file. You also see the actual SQL and the time when the SQL was fired.
Let the trace run for some amount of time. In an actual production environment I run the trace for almost two hours at peak time to capture the actual load on the server. You can stop the trace by clicking on the red icon shown above.
You can go to the folder and see your .trc file created. If you try to open it in Notepad you will see binary data; it can only be opened using the profiler. So now that we have the load file, we just have to say to the advisor: hey advisor, here's my problem (trace file), can you suggest some good indexes to improve my database performance?
In order to open the Database Tuning Advisor you can go from Tools --> Database Tuning Advisor.
In order to supply the work load file you have to start a new session in Database tuning advisor.
After you have said New Session you have to supply all details for the session. There are two primary requirements you need to provide to the Session: Session Name
Work Load File or Table (Note you can create either a trace file or you can put it in SQL Server table while running the profiler).
I have provided my Testing.trc file which was created when I ran the SQL profiler. You can also filter for which databases you need index suggestions; at this moment I have checked all the databases. After all the details are filled in you have to click on the green icon with the arrow. You can see the tool tip Start analysis in the image below.
While analyzing the trace file it performs basic four major steps: Submits the configuration information. Consumes the Work load data (that can be in format of a file or a database table).
Performs analysis on all the SQL executed in the trace file. Generates reports based on the analysis. Finally gives the index recommendations.
You can see all the above steps have run successfully which is indicated by 0 Error and 0 Warning.
Now it's time to see what index recommendations SQL Server has provided us. Also note it has included two new tabs after the analysis was done: Recommendations and Reports. You can see that for AdventureWorks SQL Server has given me huge recommendations. For example, on HumanResources.Department it has told me to create an index on PK_Department_DepartmentId.
In case you want to see detailed reports you can click on the Reports tab; there is a wide range of reports which you can use to analyze how your database is performing on that work load file.
Note: - The whole point of putting all this in step by step was so that you have a complete understanding of how to do automatic index tuning using SQL Server. During an interview, one question that is very likely is How do you increase the performance of SQL Server?, and talking about the Index Tuning Wizard can fetch you some decent points.
Click on the ICON in SQL Server management studio as shown in figure below.
In the bottom window pane you will see the complete breakup of how your SQL query will execute. Following is the way to read it: Data flows from left to right. Any execution plan sums to a total of 100%. For instance, in the figure below it is 18 + 28 + 1 + 1 + 52, so the highest cost is taken by the index scan at 52 percent; we can probably look into that logic and optimize this query. The right-most nodes are the data retrieval nodes; I have marked the two of them with arrows. In the figure you can also see that some arrows are thick and some are thin: the thicker the arrow, the more data is transferred. There are three types of join logic: nested loops join, hash join and merge join.
If you move your mouse gently over any node in the execution strategy, you will see a detailed breakup of how that node's cost is distributed.
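If you prefer a textual plan instead of the graphical one, you can ask SQL Server to return the estimated plan as rows of text. A minimal sketch (the query itself is just an example; substitute any query of your own):

```sql
-- Return the estimated execution plan as text
-- instead of actually executing the query.
SET SHOWPLAN_ALL ON
GO

SELECT d.Name
FROM HumanResources.Department AS d
WHERE d.DepartmentID = 1
GO

-- Switch back to normal execution.
SET SHOWPLAN_ALL OFF
GO
```

Each row of the output corresponds to one node of the graphical plan, with columns such as PhysicalOp and TotalSubtreeCost, so you can study the same plan without the GUI.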
(DB)What is nested join, hash join and merge join in SQL Query plan?
A join happens whenever two inputs are compared to determine an output. There are three basic strategies for this: the nested loops join, the merge join and the hash join. When a join happens, the optimizer determines which of these three algorithms is best for the given problem; however, any of the three could be used for any join. All of the costs related to the join are analyzed and the most cost-efficient algorithm is picked. These are in-memory algorithms used by SQL Server.
Nested Loops Join: If you have less data, this is the best logic. It has two loops: an outer loop and an inner loop. For every record in the outer loop, it loops through all the records in the inner loop. You can see the two loop inputs given to the operator: the top index scan is the outer loop, and the bottom index seek is the inner loop, executed once for every outer record.
It's like executing the logic below:
For each outer record
    For each inner record
        ...
    Next
Next
So you can visualize that if there are fewer inner records, this is a good solution.
Hash Join: A hash join has two inputs, the Build input and the Probe input; whichever input is smaller becomes the Build input. First the Build input is processed and then the Probe input. SQL Server first builds a hash table from the Build input. After that it loops through the Probe input, finds the matches using the hash table created previously, does the processing and gives the output.
Merge Join: In merge joins both inputs are sorted on the merge columns, which are determined by the join predicate defined in the SQL. Since each input is sorted, the merge join reads a row from each input and compares them for equality. If they are equal, a matching row is produced. This is repeated until the end of the rows.
Nested loops joins are best suited if the outer table is small, and it is a must that the inner table has an index. Merge joins are best for large tables, and both tables participating in the join should have indexes on the join columns. Hash joins are best for a small outer table and a large inner table; it is not necessary that the tables have indexes, though it helps if the outer table does. Note: - Previously we discussed table scans and index scans; do revise that topic, as it is also important from the aspect of reading query plans.
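Normally the optimizer picks the join algorithm itself, but for experimenting you can force one with a query hint and compare the resulting plans. A minimal sketch, assuming two hypothetical tables Orders and Customers joined on CustomerID:

```sql
-- Force a specific join algorithm with the OPTION clause.
-- (Orders and Customers are hypothetical tables.)
SELECT o.OrderID, c.Name
FROM Orders AS o
INNER JOIN Customers AS c
    ON o.CustomerID = c.CustomerID
OPTION (HASH JOIN)   -- or LOOP JOIN, or MERGE JOIN
```

Forcing a join type is mainly useful for studying query plans; in production it is usually better to let the optimizer decide.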
Note: - It's difficult to cover the complete aspect of RAID in this book. It's better to take a decent SQL Server book for in-depth knowledge, but from an interview aspect you can probably escape with this answer.
What is ACID?
ACID is a set of rules which are laid down to ensure that a database transaction is reliable. A database transaction should follow the ACID rules to be safe. ACID is an acronym which stands for:
Atomicity: A transaction allows for the grouping of one or more changes to tables and rows in the database to form an atomic, or indivisible, operation. That is, either all of the changes occur or none of them do. If for any reason the transaction cannot be completed, everything the transaction changed can be restored to the state it was in prior to the start of the transaction via a rollback operation.
Consistency: Transactions always operate on a consistent view of the data and, when they end, always leave the data in a consistent state. Data may be said to be consistent as long as it conforms to a set of invariants, such as: no two rows in the customer table have the same customer id, and all orders have an associated customer row. While a transaction executes these invariants may be violated, but no other transaction will be allowed to see these inconsistencies, and all such inconsistencies will have been eliminated by the time the transaction ends.
Isolation: To a given transaction, it should appear as though it is running all by itself on the database. The effects of concurrently running transactions are invisible to this transaction, and the effects of this transaction are invisible to others until the transaction is committed.
Durability: Once a transaction is committed, its effects are guaranteed to persist even in the event of subsequent system failures. Until the transaction commits, not only are any changes made
by that transaction not durable, but they are guaranteed not to persist in the face of a system failure, as crash recovery will roll back their effects. The simplicity of ACID transactions is especially important in a distributed database environment where transactions are being made simultaneously.
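Atomicity can be seen directly in T-SQL: either all statements in the transaction take effect or none do. A minimal sketch using the TRY/CATCH syntax available from SQL Server 2005 onwards (the Accounts table and columns are hypothetical):

```sql
BEGIN TRY
    BEGIN TRAN
        -- Both changes succeed together or fail together.
        UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1
        UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2
    COMMIT TRAN
END TRY
BEGIN CATCH
    -- Any error undoes every change made since BEGIN TRAN.
    ROLLBACK TRAN
END CATCH
```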
What is Begin Trans, Commit Tran, Rollback Tran and Save Tran?
Begin Tran: - It is the point which says that from here onwards we are starting the transaction.
Commit Tran: - This is the point where we say we have completed the transaction. From this point the data is permanently saved into the database.
Rollback Tran: - This takes us back to the starting point, i.e. the Begin Tran stage.
Save Tran: - It's like a bookmark for a rollback to return to some specified state. When we say Rollback Tran we go back directly to Begin Tran, but what if we want to go back to some specific point after Begin Tran? Save Tran creates bookmarks which can be used to come back to that state rather than going directly to the start point.
There are two paths defined in the transaction: one which rolls back to the main state and another which rolls back to tran1. You can also see that tran1 and tran2 are planted in multiple places as bookmarks to roll back to that state.
Brushing up the syntax:
BEGIN TRAN Tran1        -- starts a transaction
SAVE TRAN PointOne      -- creates a bookmark
ROLLBACK TRAN PointOne  -- rolls back to the bookmark PointOne
COMMIT TRAN Tran1       -- commits all the data right from the BEGIN TRAN point
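Putting it together, a sketch with two savepoints (the Audit table and values are hypothetical) shows how a rollback can target a bookmark instead of the whole transaction:

```sql
BEGIN TRAN MainTran
    INSERT INTO Audit (Msg) VALUES ('step 1')
    SAVE TRAN tran1                      -- bookmark 1
    INSERT INTO Audit (Msg) VALUES ('step 2')
    SAVE TRAN tran2                      -- bookmark 2
    INSERT INTO Audit (Msg) VALUES ('step 3')
    ROLLBACK TRAN tran2                  -- undoes only 'step 3'
COMMIT TRAN MainTran                     -- 'step 1' and 'step 2' are saved
```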
No. If the developer forgets to issue the Commit Tran, it can leave a lot of transactions open, which can bring down SQL Server performance.
What is Concurrency?
If, in a multi-user environment, two users try to perform operations (add, modify or delete) on the same data at the same time, that is termed Concurrency. In such scenarios there can be a lot of conflicts about data consistency and about following the ACID principles.
For instance, the above figure depicts the concurrency problem. Mr. X started viewing Record1; after some time Mr. Y picks up Record1 and starts updating it. So Mr. X is viewing data which is not consistent with the actual database.
The figure above shows how locking solves the problem we saw in the first question. Mr. X retrieves Record1 and locks it. When Mr. Y comes in to update Record1, he cannot do it, as it has been locked by Mr. X. Note: - What I have shown is a small glimpse; in actual situations there are different types of locks, and we will be going through each of them in the coming questions.
A Dirty Read occurs when one transaction reads a record which is part of the half-finished work of another transaction. The figure above shows the Dirty Read problem in pictorial format. I have defined all activities in steps which show the sequence in which they happen (i.e. Step 1, Step 2 etc.).
Step1: - Mr. Y fetches Record1, which has Value=2, in order to update it.
Step2: - In the meantime Mr. X also retrieves Record1 for viewing; he also sees Value=2.
Step3: - While Mr. X is viewing the record, Mr. Y concurrently updates it to Value=5. Boom, the problem: Mr. X is still seeing Value=2, while the actual value is 5.
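The classic form of a dirty read, reading data that may later be rolled back, can be reproduced with two query windows. A sketch (Record1 is a hypothetical table), with the sequence shown as comments:

```sql
-- Session 1 (Mr. Y): update inside an open transaction, no commit yet.
BEGIN TRAN
UPDATE Record1 SET Value = 5 WHERE Id = 1

-- Session 2 (Mr. X): READ UNCOMMITTED ignores the exclusive lock
-- and sees the uncommitted Value = 5 -- a dirty read.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT Value FROM Record1 WHERE Id = 1

-- Session 1: roll back, so the value Mr. X saw never really existed.
ROLLBACK TRAN
```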
If every time you read the same data you get different values, it is an Unrepeatable Read problem. Let's iterate through the steps of the figure above:
Step1: - Mr. X gets Record1 and sees Value=2.
Step2: - Mr. Y meanwhile comes along and updates Record1 to Value=5.
Step3: - Mr. X again gets Record1 and, oh, the value has changed from 2. Confusion!
If UPDATE and DELETE SQL statements seem not to affect all the data they should, it can be a Phantom Rows problem.
Step1: - Mr. X updates all records with Value=2 to Value=5.
Step2: - In the meantime Mr. Y inserts a new record with Value=2.
Step3: - Mr. X wants to ensure that all records are updated, so he issues a SELECT command for Value=2 and surprisingly finds records which still have Value=2.
So Mr. X thinks that his UPDATE SQL commands are not working properly.
Lost Updates are scenarios where an update which was successfully written to the database is overwritten by an update from another transaction. So let's try to understand the steps of the figure above:
Step1: - Mr. X updates all records with Value=2 to Value=5.
Step2: - Mr. Y comes along at the same time and updates all records with Value=5 to Value=2.
Step3: - Finally Value=2 is saved in the database, which is inconsistent from Mr. X's point of view, as he thinks all the values are equal to 5.
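One common way to avoid lost updates is to take an update lock when you read data that you intend to modify, so no other transaction can change it in between. A minimal sketch (Record1 is a hypothetical table):

```sql
BEGIN TRAN
    -- UPDLOCK holds an update lock from the moment of the read,
    -- so another transaction cannot sneak in a conflicting write.
    SELECT Value FROM Record1 WITH (UPDLOCK) WHERE Id = 1

    UPDATE Record1 SET Value = 5 WHERE Id = 1
COMMIT TRAN
```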
the Update locks to Exclusive locks then Update locks are used. Update locks are compatible with Shared locks. OK, just to give a brief idea of how the above three locks move in an actual environment, below is a figure which shows the sequence of SQL steps executed and the locks they try to acquire.
Step1: - The first transaction issues a SELECT statement on the resource, thus acquiring a Shared lock on the data.
Step2: - A second transaction also executes a SELECT statement on the resource, which is permitted, as a Shared lock is compatible with other Shared locks.
Step3: - A third transaction tries to execute an UPDATE SQL statement. As it is an UPDATE it would like an Exclusive lock, but because we already have Shared locks on the data it acquires an Update lock instead.
Step4: - A final transaction fires a SELECT on the data and tries to acquire a Shared lock, but it cannot do so until the Update lock is released. So Step4 will not complete until Step3 has executed. When Step1 and Step2 are done, Step3 converts its lock into Exclusive mode and updates the data. Finally Step4 is completed.
Intent Locks: - When SQL Server wants to acquire a Shared lock or an Exclusive lock lower down the lock hierarchy, it uses Intent locks. For instance, if a transaction wants row-level locks under a table, it first places an intent lock on the table. Below are the different flavors of Intent locks, all with the one main intention of signalling locks at a lower level:
! Intent shared (IS)
! Intent exclusive (IX)
! Shared with intent exclusive (SIX)
! Intent update (IU)
! Update intent exclusive (UIX)
! Shared intent update (SIU)
Schema Locks: - Whenever you do any operation related to the schema, this lock is acquired. There are basically two flavors:
! Schema modification lock (Sch-M): - Any object structure change using ALTER, DROP, CREATE etc. will take this lock.
! Schema stability lock (Sch-S): - This lock prevents Sch-M locks and is used while compiling queries. It does not block any transactional locks, but while a Schema stability (Sch-S) lock is held, DDL operations cannot be performed on the table.
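On SQL Server 2005 you can watch these lock modes being granted by querying the dynamic management view sys.dm_tran_locks while a transaction is open. A minimal sketch:

```sql
-- In another window, leave a transaction open, e.g.:
--   BEGIN TRAN
--   UPDATE SomeTable SET SomeCol = 1 WHERE Id = 1   -- hypothetical table
-- Then inspect the locks currently held or requested:
SELECT resource_type,      -- OBJECT, PAGE, KEY ...
       request_mode,       -- S, U, X, IS, IX, Sch-S ...
       request_status      -- GRANT or WAIT
FROM sys.dm_tran_locks
```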
Bulk Update locks: - Bulk Update (BU) locks are used during bulk copying of data into a table, for example when we are executing a batch process over a database at midnight.
Key-Range locks: - Key-Range locks are used by SQL Server to prevent phantom insertions into, or deletions from, a set of records accessed by a transaction.
! RangeI_S
! RangeI_U
! RangeI_X
! RangeX_S
! RangeX_U
Note: - By default SQL Server uses the READ COMMITTED isolation level.
Read Committed: Any Shared lock created under Read Committed is released as soon as the SQL statement completes. So if you are executing several SELECT statements under Read Committed, the Shared locks are freed as soon as each statement has executed. But for SQL statements like UPDATE, DELETE and INSERT, the locks are held for the duration of the transaction. With Read Committed you can prevent Dirty Reads, but Unrepeatable Reads and Phantoms still occur.
Read Uncommitted: This isolation level says: do not apply any locks. This increases performance but can introduce Dirty Reads. So why does this isolation level exist? Well, sometimes when you do not want other transactions to be affected and you only need a rough report, this is a good isolation level to opt for.
Repeatable Read: This level prevents Dirty Reads and Unrepeatable Reads.
Serializable: It's the king of everything: Dirty Reads, Unrepeatable Reads, Phantoms and Lost Updates are all prevented. That means other transactions have to wait while a transaction with the Serializable isolation level holds the data.
Note: - Syntax for setting the isolation level:
SET TRANSACTION ISOLATION LEVEL <READ COMMITTED | READ UNCOMMITTED | REPEATABLE READ | SERIALIZABLE>
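Concretely, the isolation level is set per connection before the transaction begins. A minimal sketch (Record1 is a hypothetical table):

```sql
-- Make every read in this transaction repeatable:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
    SELECT Value FROM Record1 WHERE Id = 1   -- first read
    -- Other writers to this row now block until we finish,
    -- so the second read is guaranteed to match the first.
    SELECT Value FROM Record1 WHERE Id = 1   -- same value again
COMMIT TRAN
```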
What is a Deadlock ?
Deadlocking occurs when two user processes hold locks on separate objects and each process is trying to acquire a lock on the object that the other process holds. When this happens, SQL Server ends the deadlock by automatically choosing one process as the victim and aborting it, allowing the other process to continue. The aborted transaction is rolled back and an error message is sent to the user of the aborted process. Generally, the transaction that requires the least amount of overhead to roll back is the one that is aborted.
If appropriate, reduce lock escalation by using the ROWLOCK or PAGLOCK hints. Consider using the NOLOCK hint to prevent locking if the data being read is not modified often. If appropriate, use as low an isolation level as possible for the user connection running the transaction. Consider using bound connections.
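When your transaction is chosen as the deadlock victim, SQL Server raises error 1205; on SQL Server 2005 you can catch it and retry. A minimal sketch (the UPDATE is a placeholder for your real work on a hypothetical table):

```sql
BEGIN TRY
    BEGIN TRAN
        UPDATE SomeTable SET SomeCol = 1 WHERE Id = 1  -- hypothetical work
    COMMIT TRAN
END TRY
BEGIN CATCH
    ROLLBACK TRAN
    IF ERROR_NUMBER() = 1205
        PRINT 'Chosen as deadlock victim - safe to retry the transaction.'
END CATCH
```

A related knob is SET DEADLOCK_PRIORITY LOW, which volunteers the current session as the preferred victim when a deadlock occurs.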