Analysis of SQL Count Queries

Anne Futrell-2001 SIST Intern

The sql analysis for the count query is similar to that of the date query, therefore you may refer to the page statement.html for more details. The count query outputs the project ids of Analysis Project 1, and Analysis Project 2, and the count of the files which share the same file_ids in the two analysis projects.The project ids are constrained between the values APlo to APmid and APlo to APhi. We limit the project ids in order to view a smaller segment of the data files rather than the whole. In addition, the data type is defined as reconstructed. We have written 2 different queries that all implement the same task described above. The format of the query, countquery, was taken from the fastest and smallest load query in the date query analysis, query3.

COUNTQUERYORIG

select AP1.project_id, AP2.project_id, count(PF1.File_id) from project_files PF1, project_files PF2, analysis_projects AP1, analysis_projects AP2, data_files DF1, data_files DF2 where (PF1.proj_snap_id = AP1.proj_snap_id AND PF2.proj_snap_id = AP2.proj_snap_id AND PF1.File_id=PF2.File_id) AND (PF1.file_id = DF1.File_id and PF2.File_id = DF2.File_id) AND (DF1.data_tier = 'reconstructed' AND DF2.data_tier = 'reconstructed') AND (AP1.project_id between 10000 and 10010 AND AP2.Project_id between 10000 and 24000 AND AP2.Project_id > AP1.project_id) group by AP1.project_id, AP2.project_id;

Countqueryorig does not use a nested SELECT statement.The query utilizes joins to connect the tables: analysis_ project, project_files, and data_files.

==========================================================================

COUNTQUERY

select AP1.project_id, AP2.project_id,count(PF1.File_id) from analysis_projects AP1, analysis_projects AP2, project_files PF1, project_files PF2 where AP1.project_id in (select AP1.project_id from project_files PF1, analysis_projects AP1, data_files DF1 where (AP1.proj_snap_id =PF1.proj_snap_id) AND (PF1.file_id = DF1.file_id) AND DF1.data_tier = 'reconstructed' AND AP1.project_id between 10000 and 10010 ) AND AP2.project_id in (select AP2.project_id from project_files PF2, analysis_projects AP1 , data_files DF2 where (AP2.proj_snap_id=PF2.proj_snap_id) AND (PF2.file_id = DF2.file_id) AND DF2.data_tier = 'reconstructed' AND AP2.project_id between 10000 and 24000 ) AND PF2.File_id = PF1.File_id AND AP1.proj_snap_id =PF1.proj_snap_id AND AP2.proj_snap_id=PF2.proj_snap_id AND AP2.project_id > AP1.project_id group by AP1.project_id,AP2.project_id;

Countquery uses two nested SELECT statements in the WHERE statement. The SELECT statements in this case are used as control variables for AP1.project_id and AP2.project_id.

==========================================================================

The following timetable gives the system time in seconds for Countqueryorig and Countquery at different APlo, APmid and APhi values. APlo to APhi gives the whole range, and APlo to APmid gives us a range in which we can take the average over the quantities we look at APlo to APhi.

Table of time
Countqueryorig Countquery
APlo-APmid-APhi CountqueryorigCountquery
10000-10010-13000 0.777s 0.884s
10000-10010-15000 0.802s 0.767s
10000-10010-17000 0.885s 0.915s
10000-10010-19000 1.285s 1.076s
10000-10010-24000 1.309s 1.277s

After analyzing the times, and performing the Oracle's explain statement for the 2 queries, we have seen little difference between the performance of the two. Since the two queries are equivalent, we will use the query in which we have worked with extensively, countqueryorig.