Entropy–Distance Approach to Evaluating Diversity and Robustness in Organizational Information Retrieval
Aubakirov S., Akhmetov I., Krassovitsky A., Gelbukh A.
2025, Instituto Politécnico Nacional
Computación y Sistemas
2025, Vol. 29, Issue 4, pp. 2449-2470
Information retrieval constitutes a critical component of organizational information management, directly affecting the efficiency, accuracy, and resilience of decision-making processes. Conventional evaluation metrics—such as precision or click-through rates—do not adequately capture the lexical and semantic diversity of retrieved content, limiting their utility in managerial contexts where both relevance and variety are essential. This study introduces a scalable, language-agnostic entropy–distance framework designed to assess the robustness of retrieval systems under controlled linguistic variation. The framework integrates Shannon entropy, to quantify lexical diversity, with semantic dispersion measures derived from SBERT embeddings, enabling joint evaluation of breadth and coherence in search outputs. Using a curated 6.6M-article Wikipedia corpus, topics were clustered, summarized, and reformulated into paraphrased queries, which were executed across Google, Bing, and DuckDuckGo. The resulting outputs reveal significant differences in diversity–coherence trade-offs between platforms, with DuckDuckGo exhibiting the highest adaptability to query variation. The proposed methodology supports information governance by providing an unsupervised, reproducible metric that enables comparative auditing of search performance in enterprise and public domains. The findings offer actionable insights for optimizing retrieval strategies, mitigating systemic bias, and enhancing the resilience of organizational search infrastructures.
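As a minimal illustration of the two quantities the abstract combines, the following Python sketch computes Shannon entropy over the tokens of a result set and a semantic dispersion score over its embeddings. It is a sketch under stated assumptions, not the authors' implementation: whitespace tokenization, the use of mean pairwise cosine distance as the dispersion measure, and the function names `shannon_entropy`, `cosine_distance`, and `semantic_dispersion` are all hypothetical choices made for illustration. The embeddings are passed in as plain numeric vectors; in the study they are derived from SBERT.

```python
import math
from collections import Counter
from itertools import combinations


def shannon_entropy(texts):
    """Shannon entropy (in bits) of the token distribution over all texts.

    Assumption: lowercased whitespace tokenization; the paper's actual
    preprocessing is not described in the abstract.
    """
    tokens = [tok for text in texts for tok in text.lower().split()]
    total = len(tokens)
    if total == 0:
        return 0.0
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector is treated as
        # maximally dissimilar (distance 1.0) to any other vector.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def semantic_dispersion(embeddings):
    """Mean pairwise cosine distance among the embeddings.

    Assumption: this is one plausible dispersion measure; the abstract
    only says the measures are derived from SBERT embeddings.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy result snippets and toy 2-D vectors standing in for SBERT output.
    snippets = ["solar power plants", "wind power turbines"]
    vectors = [[1.0, 0.0], [0.6, 0.8]]
    print("lexical entropy (bits):", shannon_entropy(snippets))
    print("semantic dispersion:", semantic_dispersion(vectors))
```

Under these assumptions, higher entropy indicates a more varied vocabulary in the retrieved snippets, and higher dispersion indicates results that are spread further apart in embedding space. Comparing both scores for the outputs of a query and of its paraphrases gives a rough picture of the diversity and coherence trade-off the abstract describes.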
Entropy, information retrieval, paraphrase robustness, query variability, search engine evaluation, semantic dispersion
Kazakh-British Technical University, Kazakhstan
Institute of Information and Computational Technologies, Kazakhstan
Instituto Politécnico Nacional, CIC, Mexico