Improved Adaptive Switching Search Algorithm for Big Data Retrieval
WU Yuchen, LIU Pingping, XU Jiangtao

(School of Computer Science and Engineering, Xi'an Technological University, Xi'an 710021, China)

Keywords: big data platform; Hadoop; search engine; adaptive switching search algorithm

DOI: 10.16185/j.jxatu.edu.cn.2019.06.011 http://xb.xatu.edu.cn

Abstract

A search engine's retrieval efficiency varies with the size of its search index. To address this problem, a distributed search engine and query subsystem were built with the MapReduce framework on the Hadoop distributed computing platform, and an improved adaptive switching search algorithm was developed that chooses its retrieval strategy according to the size of the index file. When the index file is small, it is loaded directly into memory. When the index file is large, a secondary index is built, the index list held in memory is read, and a distributed query is performed. The algorithm was tested on clusters configured with a sufficiently large number of nodes. The results show that when the index size reaches 1 000 MB, the search time falls from 16.631 s with the original search algorithm to 7.259 s (a reduction of about 56%), a significant improvement in search efficiency. The advantage of the algorithm grows as the index file becomes larger, so it can provide efficient distributed search services for web forums, websites, and other users.
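The switching rule described in the abstract (keep a small index entirely in memory, and for a large index keep only a secondary index in memory and query shards in a distributed way) can be illustrated with a minimal single-process sketch. Everything here is an assumption for illustration, not the authors' Hadoop implementation: the line-based index file format, the hypothetical `threshold_bytes` switch point, the reading of "secondary index" as a sparse block index (first term and byte offset of every `block_lines` lines), and the use of a thread pool over in-memory shards to stand in for distributed querying across cluster nodes.

```python
import bisect
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical index file format (an assumption, not the paper's):
# one line per term, sorted by term: "term\tdoc_id,doc_id,...\n"


def build_index_file(postings):
    """Serialize {term: [doc ids]} into the sorted line format above."""
    lines = [f"{t}\t{','.join(map(str, sorted(ids)))}\n"
             for t, ids in sorted(postings.items())]
    return "".join(lines).encode("utf-8")


def parse_ids(field):
    """Turn 'id,id,...' into a list of ints (empty field -> empty list)."""
    return [int(x) for x in field.split(",") if x]


class AdaptiveIndex:
    """Switches lookup strategy on index size (threshold is hypothetical)."""

    def __init__(self, data, threshold_bytes, block_lines=4):
        self.data = data
        if len(data) <= threshold_bytes:
            # Small index: parse the whole file into an in-memory dict.
            self.mode = "memory"
            self.table = {}
            for line in data.decode("utf-8").splitlines():
                term, ids = line.split("\t")
                self.table[term] = parse_ids(ids)
        else:
            # Large index: keep only a sparse secondary index in memory,
            # i.e. the first term of each block and that block's byte offset.
            self.mode = "secondary"
            self.block_lines = block_lines
            self.block_terms, self.block_offsets = [], []
            offset = 0
            for i, line in enumerate(io.BytesIO(data)):
                if i % block_lines == 0:
                    self.block_terms.append(line.split(b"\t", 1)[0].decode("utf-8"))
                    self.block_offsets.append(offset)
                offset += len(line)

    def lookup(self, term):
        if self.mode == "memory":
            return self.table.get(term, [])
        # Locate the last block whose first term is <= the query term.
        i = bisect.bisect_right(self.block_terms, term) - 1
        if i < 0:
            return []
        buf = io.BytesIO(self.data)  # stands in for a file on local disk/HDFS
        buf.seek(self.block_offsets[i])
        for _ in range(self.block_lines):
            line = buf.readline()
            if not line:
                break
            t, ids = line.decode("utf-8").rstrip("\n").split("\t")
            if t == term:
                return parse_ids(ids)
        return []


def distributed_search(shards, term):
    """Query every shard concurrently and merge the posting lists."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda s: s.lookup(term), shards))
    return sorted(set().union(*results))
```

For example, the same serialized index can be opened with a large `threshold_bytes` (memory mode) or with `threshold_bytes=0` (secondary-index mode); both should return identical posting lists, and only the lookup cost differs. In a real deployment the shards would live on separate cluster nodes rather than in one process.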