Evaluation of Sub Query Performance in SQL Server

. The paper explores several sub query methods used in a query and their impact on the query performance. The study uses experimental approach to evaluate the performance of each sub query methods combined with indexing strategy. The sub query methods consist of in, exists, relational operator and relational operator combined with top operator. The experimental shows that using relational operator combined with indexing strategy in sub query has greater performance compared with using same method without indexing strategy and also other methods. In summary, for application that emphasized on the performance of retrieving data from database, it better to use relational operator combined with indexing strategy. This study is done on Microsoft SQL Server 2012. Keywords: Sub query, indexing, performance, SQL Server


Introduction
Business application relies on a database management system (DBMS) for retrieving and storing data, which its overall performance depend on. For this matter, it must be administered and optimized for better performance according to business application needs, as well as quick in handling all kind of performance threats. Its performance influenced by several factors i.e.: growing database size proportional with the data, increasing user base, increasing user processes, improperly and un-tuned system [1]. These factors must be maintained according to business application needs to stabilizing the overall performance of each request to DBMS.
One of the major critical issues that often happened in a company is inadequate performance of queries used for the suitable output. Many factors causing this issue, one of them is query processing problem. Since then, a significant amount of research and observation has been done to find an efficient solution for processing queries. A query may be expensive in cost of execution if it is not optimized well [2]. This will give a negative impact on the performance of business application, hence reducing business performance. The degradation of performance can be detected by performing monitoring at a timely basis on system performance parameter.
In the first generation database management systems, the low level procedural query language is embedded in a high level programming language and the programmer's should select the most appropriate execution strategy. In contrast, with declarative languages such as Structured Query Language (SQL), the user specifies what data is required rather than how it is to retrieved [3]. This pattern transforms the user responsibility from determine method to support good execution strategy. The most important objectives to be considered in order to improve the performance of DBMS are: designing an efficient data schema, optimizing indexes, analyzing execution plans, monitoring access to data, and optimizing query [4].
The object of this study is to present the comparative performance of sub query methods using Microsoft SQL server 2012 by using large data according to experimental method.

Methodologies
This study follows experimental method i.e. generate model, simulate the model, record and then analyze the record, we use this method because the performance measurements is a significant issue [5]. It is based on operational data of student attendances in a laboratory that consists of 6 columns and 500,000 records. The table consists of record id, class id, student id; attend date time and place, status, and record date that save all students attendances with attendance updates as a new record. The queries for experiment must produce the latest status of all students from the attendances table. The experiment uses several query variation by using sub query method: in, exists, relation operator and relation operator with top operator, and the table used in experiment use two indexing scenario [6]. In the first scenario, the table uses indexing in record id column, the second one uses additional indexing in class id, student id include record date, and also in class id, student id, include status and record date. For the analysis, each query and indexing scenario is executed 10 times on Microsoft SQL Server 2012 and each execution's server processing time is recorded using client statistics feature provided by SQL Server Management Studio.

Query Execution
Business application retrieves and stores data from DBMS using query. The query operations used by the application consist of select, insert, update, and delete. Each operation is run by DBMS then the result is transfer to business application. All data in DBMS are stored in file on physical device such as disk device. Assume one file consists of many records and user want to retrieve a single records based on a particularly criteria [3], the disk device is able to going directly in the middle of a file to retrieve the record. To accomplish that, the system need time to move the disk device and retrieve the requested record using many procedures compilation in DBMS. To reduce the time needed by the system, the DBMS must be tuned according to the executed query.
The process of DBMS manages query are: after receiving query from external level, the query are checked semantically and syntactically by the query parser. If there are violation of structure, user right, or procedure; an error message will be return, otherwise query will be translated into internal level in relational algebra expression. After that, query optimizer selects appropriate optimal method to implement relational algebra and generate query execution plan then executed [7]. This DBMS's process can be seen at Figure 1.
The query execution plan is a compiled code that contains the ordered steps to carry out the query [8].
Identifying an appropriate plan for execution is very important because this will determine the effectiveness of the execution. By using statistics on tables and indexes, the optimizer predicts the cost of alternative access methods to resolve a particular query.
Queries in algebra are constructed using operators. Each relational query describes a step by step procedure to compute the desired output, based on the order in which operators are applied [9]. There are many variations of the operations included in relational algebra. The fundamental operations in relational algebra are selection, projection, Cartesian product, union, set difference, join, intersection, and division operations. The selection and projection operations are unary operations, which operate only on one relation. The other operations work on pairs of relations is therefore called binary operations [3].

Figure 1
Process of Query Execution [7] As one of the binary operations, a sub query is a query that nested somewhere inside divided into clustered index and non clustered index. The clustered indexes are recommended only for tables that are frequently updated. Clustered type indexes are effective when operators like BETWEEN, >, >=, <, <=, <>, and !=. Non clustered indexes usage is recommended only for databases where updates are infrequent and gives the optimal solution for the "exact match" [10]. Many people wonders which attributes are suitable for indexing to be applied for getting better performance. Gilenson [6] states there are two sorts of possibilities: primary keys, and search attributes.
Indexes are powerful method for improving searching time, but we should keep in mind that when the record in an indexed table is modified, the system must take the time to update the table's indexes too. If user updates a lot of data, the time that it takes to execute the updates operation and update all the indexes could slow down the operations that are just trying to read the data for applications, and also degrading query response time. Hence that fact, user should beware when applied index in query. The placement of index must be precise with the necessity and procedures.
In transact SQL statements, there is usually no procedure that regulate when to apply a sub query or that does not, because it is not difference between them. User does not concern how to retrieve the data, but how many times the execution occurred. However, in some cases where the data to be returned numerous or the conditions from the query are very complicated, it caused the performance query is going to down. In case of query optimization, it is impractical to search evaluation plans exhaustively, when the optimization of query involves many relations [11].

RESULTS
The  All the query in table 2 is used to retrieve the last student attendance record that represented by inserted date of each data. The first to third query is using an aggregate function MAX to return the latest record of student attendance having max inserted date (last inserted). For the fourth query, the query change the MAX function with TOP with the same purpose as the other query. The result of the experiments can be seen in table 3 and table 4.

CONCLUSION 5
Optimization of sub query can be done by applying indexing strategy to the table according to condition used in queries. For business application that require fast processing time in retrieving data from database than processing time in storing date, it's better to applied indexing in all table used by the application, and also all the sub query better use relational operator. Conversely, for business application that required fast processing time in storing data than fast processing time in retrieving data, it's better to not applied indexing and for sub query better use IN or EXISTS, since if indexing applied, the processing time to storing data will increased in each INSERT operation which also automatically update index in the related table.