[1] WU Yuchen, LIU Pingping, XU Jiangtao. Improved Adaptive Switching Search Algorithm for Big Data Retrieval[J]. Journal of Xi'an Technological University, 2019, (06): 688-695. [doi:10.16185/j.jxatu.edu.cn.2019.06.011]

Improved Adaptive Switching Search Algorithm for Big Data Retrieval

Journal of Xi'an Technological University [ISSN: 1673-9965 / CN: 61-1458/N]

Volume:
Issue:
2019, No. 06
Pages:
688-695
Section:
Information Science and Control
Publication date:
2019-12-25

Article Info

Title:
Improved Adaptive Switching Search Algorithm for Big Data Retrieval
Article ID:
1673-9965(2019)06-0688-08
Author(s):
WU Yuchen, LIU Pingping, XU Jiangtao
(School of Computer Science and Engineering, Xi'an Technological University, Xi'an 710021, China)
Keywords:
big data platform; Hadoop; search engine; adaptive switching search algorithm
CLC number:
TP301
DOI:
10.16185/j.jxatu.edu.cn.2019.06.011
Document code:
A
Abstract:
A search engine's efficiency varies with the size of its search index. To address this inconsistency, a distributed search engine and query subsystem are built with the MapReduce framework on the Hadoop distributed computing platform, and an improved adaptive switching search algorithm is proposed that chooses its retrieval strategy according to the size of the index file. When the index file is small, it is loaded directly into memory; when it is large, a secondary index is built, the index list held in memory is read, and the query is then executed in a distributed manner. The algorithm was tested on a cluster configured with a sufficiently large number of nodes. The results show that when the index size reaches 1 000 MB, the search time drops from 16.631 s with the original search algorithm to 7.259 s, a significant improvement in search efficiency over the index files. The advantage of the algorithm becomes more pronounced as the index files grow larger, so it can provide efficient distributed search services for web forums, websites, and other users.
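The abstract describes the switching rule only at a high level. The Python sketch below illustrates one plausible single-process reading of it, under stated assumptions: the class name `AdaptiveIndex`, the 64 MiB switching threshold, the block size, and the sparse "first term of each block" layout of the secondary index are all hypothetical and are not taken from the paper, and the distributed MapReduce query stage is omitted entirely.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

# (term, posting list of document IDs); the primary index is assumed sorted by term
Entry = Tuple[str, List[int]]


class AdaptiveIndex:
    """Hypothetical sketch of size-based switching; not the authors' implementation."""

    def __init__(self, entries: Sequence[Entry], index_size_bytes: int,
                 threshold_bytes: int = 64 * 1024 * 1024,  # assumed threshold
                 block_size: int = 1024):                   # assumed entries per block
        self.mode = "memory" if index_size_bytes <= threshold_bytes else "secondary"
        if self.mode == "memory":
            # Small index: load every entry into an in-memory hash map
            self.table: Dict[str, List[int]] = dict(entries)
        else:
            # Large index: split the sorted entries into fixed-size blocks,
            # standing in for blocks that would stay on disk or in HDFS
            self.blocks = [list(entries[i:i + block_size])
                           for i in range(0, len(entries), block_size)]
            # Secondary index: only the first term of each block is kept in memory
            self.block_keys = [block[0][0] for block in self.blocks]

    def search(self, term: str) -> List[int]:
        if self.mode == "memory":
            return self.table.get(term, [])
        # Locate the last block whose first term is <= the query term
        pos = bisect_right(self.block_keys, term) - 1
        if pos < 0:
            return []
        # In a real system only this one block would be read from storage
        for key, postings in self.blocks[pos]:
            if key == term:
                return postings
        return []


entries = sorted((f"term{i:05d}", [i, i + 1]) for i in range(10_000))
small = AdaptiveIndex(entries, index_size_bytes=10 * 1024 * 1024)
large = AdaptiveIndex(entries, index_size_bytes=1000 * 1024 * 1024, block_size=256)
print(small.mode, large.mode)      # memory secondary
print(large.search("term04242"))   # [4242, 4243]
```

The design choice being illustrated is the trade-off the abstract names: a full in-memory table gives constant-time lookups but only fits small indexes, while the secondary index keeps a small sorted key list in memory and bounds each lookup to a binary search plus a scan of a single block.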

Memo:
Received: 2019-09-03
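Funding: Fund project of the National and Local Joint Engineering Laboratory for New Network and Detection Control (GSYSJ20170009). First author: WU Yuchen (1994-), female, master's student at Xi'an Technological University. Corresponding author: LIU Pingping (1971-), female, associate professor at Xi'an Technological University, whose main research interest is artificial intelligence; E-mail: 1341369601@qq.com. (Editing and proofreading: XIAO Chen)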
Last Update: 2019-12-25