Automatic Extraction of Epidemic-Related Sites in COVID-19 Media Reports of Webpages Based on Conditional Random Field Model
Kuiyun Huang1, Jinming Cao2, Bin Zhao1*
Journal Title: Journal of Clinical Immunology AND Microbiology
Background: Since the outbreak of COVID-19 in Wuhan, China, in early December 2019, the Chinese government has adopted a practice of public information disclosure. More than 400 cities have announced specific location information for newly confirmed cases of novel coronavirus pneumonia, including the residential areas or places of stay of those cases. We established a conditional random field model and a rule-dependent model based on the elements of Chinese geographical names. Taking Guangdong Province as an example, we performed named entity recognition and automatically extracted epidemic-related sites. This method can help trace where the epidemic is spreading, support its prevention and control, and gain more time for vaccine clinical trials.
Methods: A conditional random field model was built from the way web page text presents the habitual residence or place of stay of confirmed cases. A rule-dependent model was built from two sources: the rules by which the elements of place words combine, and a place name dictionary composed of provinces, cities and administrative districts.
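The rule-dependent model is described here only at a high level. As a rough illustration, and not the authors' implementation, the following Python sketch combines a small place name dictionary with suffix-based combination rules. The names GAZETTEER, SITE_SUFFIXES, build_pattern and extract_sites, the dictionary entries, and the 12-character limit on a site name are all assumptions made for this example.

```python
import re

# Hypothetical, tiny place name dictionary: province -> city -> districts.
# A real dictionary would cover every province, city and district.
GAZETTEER = {
    "广东省": {
        "广州市": ["黄埔区", "从化区", "天河区"],
        "深圳市": ["福田区", "南山区"],
    }
}

# Hypothetical suffix elements that typically close a residential or venue name.
SITE_SUFFIXES = ["小区", "花园", "大厦", "公寓", "酒店", "村"]


def build_pattern(gazetteer, suffixes):
    """Compile the rule: [province][city] district + site name ending in a suffix."""
    provinces = "|".join(map(re.escape, gazetteer))
    cities = "|".join(re.escape(c) for cs in gazetteer.values() for c in cs)
    districts = "|".join(
        re.escape(d) for cs in gazetteer.values() for ds in cs.values() for d in ds
    )
    # Try longer suffixes first so that multi-character suffixes are preferred.
    suffix = "|".join(map(re.escape, sorted(suffixes, key=len, reverse=True)))
    return re.compile(
        rf"(?P<province>{provinces})?(?P<city>{cities})?(?P<district>{districts})"
        # Site name: 1 to 12 CJK characters or digits, matched lazily up to a suffix.
        rf"(?P<site>[\u4e00-\u9fff0-9]{{1,12}}?(?:{suffix}))"
    )


def extract_sites(text, gazetteer=GAZETTEER, suffixes=SITE_SUFFIXES):
    """Return one record per site mention found in the text."""
    pattern = build_pattern(gazetteer, suffixes)
    # Reverse index: district -> (province, city), used to fill omitted levels.
    # Assumes district names are unique within the dictionary.
    parents = {
        d: (p, c)
        for p, cs in gazetteer.items()
        for c, ds in cs.items()
        for d in ds
    }
    results = []
    for m in pattern.finditer(text):
        province, city = parents[m["district"]]
        results.append(
            {
                "province": province,
                "city": city,
                "district": m["district"],
                "site": m["site"],
            }
        )
    return results
```

Calling extract_sites on a sentence such as "患者居住于广东省广州市黄埔区万科城市花园，曾到访深圳市福田区皇岗村。" yields one record per mention, with the province filled in from the dictionary when the text omits it. Because the site name is matched lazily, a name that contains a suffix character in the middle would be cut short, which is one reason to pair rules like these with a statistical sequence model such as a conditional random field.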
Findings: The analysis based on the conditional random field model and the rule-dependent model shows that the locations of confirmed cases of novel coronavirus pneumonia in Guangdong Province in mid-February were mainly concentrated in Guangzhou, Shenzhen, Zhuhai and Shantou. In Guangzhou, Futian District has more epidemic sites, while Huangpu and Conghua Districts have fewer. Government officials in Guangzhou should pay attention to Futian District.
Interpretation: Governments at all levels in Guangdong Province intervened through various means to control the epidemic in mid-February. Based on the results of the model analysis, we believe that administrative regions with more confirmed-case locations should be prioritized, with measures such as lockdowns and restrictions on the movement of people, in order to contain the disease within those regions and avoid affecting adjacent administrative regions.