Word-Frequency Analysis of English Papers in Python


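When reading an English paper, the most frequent words are often technical terms that you may not know. It helps to analyze the paper's word frequencies first and look up the high-frequency words before you start reading, so that the argument is easier to follow. This post shows how to use Python to output how many times each word appears in an English paper.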
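
As a minimal sketch of the idea, assuming the paper's text is already available as a Python string (the sample sentence below is made up for illustration), the standard library's `collections.Counter` can tally how often each word occurs:

```python
from collections import Counter

# Hypothetical sample text standing in for the paper's content.
sample = "the model learns the gradient and the gradient updates the model"

# Split on whitespace and count each word.
word_counts = Counter(sample.split())

# most_common(n) returns the n most frequent (word, count) pairs.
top_words = word_counts.most_common(3)
for word, freq in top_words:
    print(word, freq)
```

The script later in this post builds the same kind of table by hand with a dictionary and `sorted`.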

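First, the paper has to be converted from PDF to txt format. Converting directly usually produces garbled text, so it is better to convert the PDF to Word format first and then copy the content into a plain-text file.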
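
Even after a clean conversion, the copied text often still contains words hyphenated across line breaks and irregular spacing. The following is one possible clean-up step, not part of the original workflow; `clean_converted_text` and the sample string are hypothetical names chosen for illustration:

```python
import re

def clean_converted_text(raw):
    """Tidy text copied out of a PDF/Word conversion (illustrative helper)."""
    # Rejoin words hyphenated across a line break: "opti-\nmization" -> "optimization".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    # Collapse any remaining runs of whitespace (including newlines) into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()

raw = "Stochastic opti-\nmization is   widely\nused."
cleaned = clean_converted_text(raw)
print(cleaned)
```

Note that this also joins genuine compounds that happen to break at the hyphen (for example "well-" followed by "known" on the next line), so treat it as a rough heuristic.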

The code, with detailed comments, follows:

# Word-frequency analysis of a paper
# Convert the paper to plain-text format first.

# Read the text file and collect all of its words in a list.
def readtxt(filename):
    wordsL = []  # list that stores the words
    with open(filename, 'r') as fr:
        for line in fr:
            wordsL += line.strip().split()
    return wordsL

# Count the frequency of every word in a dictionary,
# then sort the entries by count, largest first.
def count(wordsL):
    wordsD = {}
    for x in wordsL:
        # skip the words that we don't need
        if Judge(x):
            continue
        # count: start from 0 the first time a word is seen, then add 1
        wordsD[x] = wordsD.get(x, 0) + 1
    wordsInorder = sorted(wordsD.items(), key=lambda x: x[1], reverse=True)
    return wordsInorder

# Judge whether a word should be removed, such as punctuation or a single letter.
# You can modify this function to remove more words, such as numbers.
def Judge(word):
    punctList = [' ', '\t', '\n', ',', '.', ':', '?']  # punctuation and whitespace
    letterList = ['a', 'b', 'c', 'd', 'm', 'n', 'x', 'p', 't']  # single letters
    return word in punctList or word in letterList

# Read the input file and write the result file.
# Both paths are cut off in the source; append your own file names.
filename = 'F:\\python\\'
wordsL = readtxt(filename)
words = count(wordsL)
with open('F:\\python\\Words In ', 'w') as fw:
    for item in words:
        fw.write(item[0] + ' ' + str(item[1]) + '\n')
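
One limitation of splitting only on whitespace is that "model", "model." and "Model" are counted as three different words. As a possible refinement, and not the method used in the script above, the sketch below swaps in regex tokenization, lowercasing, and a small stop-word filter. The names `STOP_WORDS` and `word_frequencies`, the `min_length` parameter, and the sample text are all assumptions made for illustration:

```python
import re
from collections import Counter

# Illustrative stop-word list; extend it to suit your papers.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "we"}

def word_frequencies(text, min_length=2):
    """Return (word, count) pairs sorted by count, largest first."""
    # Lowercase, then keep runs of letters, allowing internal hyphens or
    # apostrophes. Digits and punctuation are dropped by the pattern itself.
    tokens = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())
    # Discard very short tokens (stray single letters) and stop words.
    kept = [t for t in tokens if len(t) >= min_length and t not in STOP_WORDS]
    return Counter(kept).most_common()

sample = "The Transformer is an attention-based model. In 2017, the Transformer changed NLP."
result = word_frequencies(sample)
print(result[:3])
```

Lowercasing merges capitalized and lowercase forms, which suits vocabulary study but also folds acronyms such as "NLP" into "nlp".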

 

Published: 2024-02-04 22:03:45

Article link: https://www.4u4v.net/it/170717426360007.html
