
%   Copyright 2016 Ahmet Arslan
%
%   Licensed under the Apache License, Version 2.0 (the "License");
%   you may not use this file except in compliance with the License.
%   You may obtain a copy of the License at
%
%       http://www.apache.org/licenses/LICENSE-2.0
%
%   Unless required by applicable law or agreed to in writing, software
%   distributed under the License is distributed on an "AS IS" BASIS,
%   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
%   See the License for the specific language governing permissions and
%   limitations under the License.

\newpage
\addcontentsline{toc}{chapter}{ABSTRACT}
\begin{spacing}{1.2}
\begin{center}
\textbf{ABSTRACT} \vspace{4mm}\\
ANALYSIS OF THE FREQUENCY DISTRIBUTIONS OF QUERY TERMS ON DOCUMENT COLLECTIONS \& PER-QUERY SELECTION OF BEST TERM-WEIGHTING MODEL\\
\vspace{4mm}
Ahmet ARSLAN
\vspace{4mm} \\
Department of Computer Engineering \\
Anadolu University, Graduate School of Science, August, 2016 \\
\vspace{4mm}
Supervisor: Assoc. Prof. Dr. Bekir Taner D\.{I}N\c{C}ER \\
\end{center}
\vspace*{-1mm}

Many term-weighting models have been proposed for information retrieval, but the effectiveness of each term-weighting model varies across queries (i.e., information needs of users).
Thus, using a single term-weighting model to process all kinds of queries may not be appropriate for fulfilling every information need of users.
Empirical evidence indicates that, instead of using a single term-weighting model, using different term-weighting models for different queries could increase retrieval effectiveness by an order of magnitude.
However, for any given query, automatically selecting the term-weighting model that would provide the highest retrieval effectiveness achievable with current state-of-the-art information retrieval technology is still an open and challenging research problem.
This problem is generally referred to as \emph{selective term weighting}, \emph{selective weighting function}, or \emph{selective retrieval model} in the field of selective information retrieval.
In this PhD dissertation, we investigate a novel statistical/probabilistic approach to the \emph{selective term weighting} problem, based on the frequency distributions of query terms on document collections.

A term-weighting model that works well for one query may not work well for another.
We cannot determine or justify in advance the best term-weighting model to use with a given query.
We know little about the characteristics of queries and document collections that affect the effectiveness of term-weighting models.
This PhD dissertation aims to shed some light on this mystery by analyzing the frequency distributions of query terms on document collections.

All the results presented in this dissertation are fully repeatable and reproducible with data and code available online.\\

\setlength\leftmargini{2.5cm}
\begin{enumerate}[label=\noindent{\textbf{Keywords:}}]
  \item Chi-Square Goodness-of-Fit Test, Index Term Weighting, Frequency Distribution, Robustness of Retrieval Effectiveness, Selective Information Retrieval.
\end{enumerate}

\end{spacing}