DEVELOPMENT OF THE COMBINED METHOD OF IDENTIFICATION OF NEAR DUPLICATES IN ELECTRONIC SCIENTIFIC WORKS


Lizunov P., Biloshchytskyi A., Kuchansky A., Andrashko Y., Biloshchytska S., Serbin O.
31 August 2021, Technology Center

Eastern-European Journal of Enterprise Technologies
2021, #4, Issue 4(112), pp. 57-63

Methods are described for identifying near-duplicates in electronic scientific papers within content of a single type, for example text data, mathematical formulas, or numerical data. For text data, the method of locally sensitive hashing is formalized, in which the Hamming distance is computed between the elements of the indices of electronic scientific papers. If the Hamming distance exceeds a fixed numerical threshold, the scientific paper is considered to contain a near-duplicate. For numerical data, sub-sequences are formed for each scientific work, and the proximity between papers is determined as the Euclidean distance between the vectors composed of the numbers in these sub-sequences. To compare mathematical formulas, a method for comparing the sample of formulas is used, and the names of variables are compared. For graphic information, two approaches are distinguished: finding key points in the image and applying locally sensitive hashing to individual pixels of the image. Since scientific papers often include objects such as schemes and diagrams, their captions are examined separately using the methods for comparing text information. A combined method is proposed that brings together the methods for identifying near-duplicates in the various types of data. To implement the combined method, an information-analytical system was devised that processes scientific materials according to content type. This makes it possible to identify near-duplicates effectively and to detect as wide a range as possible of abuses and plagiarism in electronic scientific papers: scientific articles, dissertations, monographs, conference materials, etc.
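The two distance measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: SimHash over word shingles is assumed here as one common locality-sensitive hashing scheme, and the function names, the 64-bit fingerprint size, and the shingle length `k` are all hypothetical choices made for illustration. The sketch only computes distances; the threshold and the decision rule are left to the paper.

```python
import hashlib
import math


def shingles(text, k=3):
    """Split text into overlapping k-word shingles (k is an assumed parameter)."""
    words = text.lower().split()
    if len(words) < k:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def simhash(text, bits=64, k=3):
    """Compute a SimHash fingerprint (assumed LSH scheme, not taken from the paper)."""
    counts = [0] * bits
    for sh in shingles(text, k):
        digest = hashlib.md5(sh.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        # Each shingle votes +1 or -1 on every bit position of the fingerprint.
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def hamming_distance(a, b):
    """Number of bit positions in which two integer fingerprints differ."""
    return bin(a ^ b).count("1")


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Usage: compare two text fragments and two numeric sub-sequences.
original = "the proximity between papers is determined as the Euclidean distance"
variant = "the proximity between works is determined as the Euclidean distance"
print(hamming_distance(simhash(original), simhash(variant)))
print(euclidean_distance([1.0, 2.5, 3.0], [1.0, 2.0, 3.5]))
```

In this sketch, fingerprints of similar texts tend to differ in few bits, so the Hamming distance serves as a cheap proxy for textual similarity, while the Euclidean distance treats each numeric sub-sequence as a point in a vector space.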

Antiplagiarism system, Electronic scientific paper, Locally sensitive hashing, Near-duplicate


Department of Fundamentals of Informatics, Kyiv National University of Construction and Architecture, Povitroflotskyi ave., 31, Kyiv, 03037, Ukraine
Astana IT University, Mangilik Yel ave., EXPO Business Center, Block C.1, Nur-Sultan, 010000, Kazakhstan
Department of Information Systems and Technologies, Taras Shevchenko National University of Kyiv, Volodymyrska str., 60, Kyiv, 01033, Ukraine
Department of System Analysis and Optimization Theory, Uzhhorod National University, Narodna sq., 3, Uzhhorod, 88000, Ukraine
Department of Intelligent and Information Systems, Taras Shevchenko National University of Kyiv, Volodymyrska str., 60, Kyiv, 01033, Ukraine
Maksymovych Scientific Library, Taras Shevchenko National University of Kyiv, Volodymyrska str., 60, Kyiv, 01033, Ukraine

