Big Data Analytics Recommendation Solutions for Crop Disease using Hive and Hadoop Platform
Raghu Garg and Himanshu Aggarwal
Journal Title: Indian Journal of Science and Technology
Objective: With digital advancements in agriculture, large amounts of data are produced constantly; as a result, agricultural data has entered the world of big data. Statistical Analysis: As this development progresses, the need for parallel computing, compatible data management infrastructure, and novel analytics paradigms to extract information from huge amounts of data also increases. A single machine cannot store and analyze data at this scale, and accessing data of this size requires polynomial time. Findings: Big data analytics offers a way to store and analyze such massive amounts of data. In this paper, a big data analytics recommendation framework is developed that provides solutions for crop diseases based on historical data, using Hive and Hadoop. The data for the framework is collected from various sources such as laboratory reports, agriculture information web pages, and expert recommendations. After the raw data is collected, irrelevant or redundant data, also known as noise, is removed. Next, features are extracted from the cleaned data, and the data is normalized to remove technical variations. Once normalization is complete, the data is uploaded to HDFS and saved in a file format supported by Hive, so that the classified data resides in a specific location. HiveQL is then used to analyze the agricultural data based on its features, the outcomes are prioritized based on crop disease symptoms, and finally the highest-priority solution is recommended. Application/Improvements: The prioritized outcomes are easy for agriculture officers and researchers to understand, and they help in recommending a solution based on evidence from historical data.
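The clean, normalize, and prioritize steps described above can be illustrated with a minimal in-memory sketch. It is written in plain Python rather than HiveQL and does not reproduce the paper's framework: the record fields (crop, symptoms, solution), the sample rows in HISTORY, and the scoring rule (summed symptom overlap per solution) are all assumptions made for illustration. The grouping and ordering in prioritize stand in for what an aggregate query with grouping and sorting would do over a Hive table.

```python
from collections import Counter

# Hypothetical historical records; field names and values are illustrative only.
HISTORY = [
    {"crop": "Wheat", "symptoms": ["yellow leaves", "stunted growth"], "solution": "Solution A"},
    {"crop": "wheat ", "symptoms": ["Yellow Leaves", "stunted growth"], "solution": "solution a"},
    {"crop": "wheat", "symptoms": ["yellow leaves"], "solution": "Solution B"},
    {"crop": "wheat", "symptoms": ["leaf spots"], "solution": ""},  # incomplete record
    {"crop": "rice", "symptoms": ["yellow leaves"], "solution": "Solution C"},
]


def normalize(record):
    """Lowercase and trim text fields so equal values compare equal."""
    return {
        "crop": record["crop"].strip().lower(),
        "symptoms": frozenset(s.strip().lower() for s in record["symptoms"]),
        "solution": record["solution"].strip().lower(),
    }


def clean(records):
    """Drop incomplete records, then drop duplicates after normalization."""
    seen, cleaned = set(), []
    for rec in records:
        if not (rec.get("crop") and rec.get("symptoms") and rec.get("solution")):
            continue  # noise: a required field is missing or empty
        norm = normalize(rec)
        key = (norm["crop"], norm["symptoms"], norm["solution"])
        if key not in seen:
            seen.add(key)
            cleaned.append(norm)
    return cleaned


def prioritize(records, crop, observed_symptoms):
    """Rank solutions for a crop by summed overlap with the observed symptoms."""
    target = crop.strip().lower()
    observed = {s.strip().lower() for s in observed_symptoms}
    scores = Counter()
    for rec in records:
        if rec["crop"] != target:
            continue
        overlap = len(rec["symptoms"] & observed)
        if overlap:
            scores[rec["solution"]] += overlap
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


cleaned = clean(HISTORY)
ranked = prioritize(cleaned, "wheat", ["yellow leaves", "stunted growth"])
print(ranked[0])  # the highest-priority (solution, score) pair
```

The first entry of the ranked list corresponds to the high-priority recommendation; the remaining entries give lower-priority alternatives supported by fewer matching symptoms.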