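When reading an English paper, the high-frequency words are often technical terms that you may not know. It helps to analyze the paper's word frequencies first and learn the meanings of the most frequent words before reading, so the paper is easier to follow. This post shows how to use Python to output how often each word appears in an English paper.
First, convert the paper from PDF to txt format. Converting directly often produces garbled text, so it is better to convert to Word format first and then copy the content into a plain txt file.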
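Before counting words, it can be worth checking that the converted text is not garbled. The sketch below is one possible way to do this and is not part of the original script: the helper names `garbled_ratio` and `looks_garbled` and the 5% threshold are assumptions chosen for illustration. It reports the share of characters that are either the Unicode replacement character or non-printable control characters.

```python
def garbled_ratio(text):
    """Return the fraction of characters that look like conversion debris."""
    if not text:
        return 0.0
    bad = 0
    for ch in text:
        # U+FFFD is what decoders emit for undecodable bytes; control
        # characters other than common whitespace rarely belong in prose.
        if ch == '\ufffd' or (not ch.isprintable() and ch not in '\n\r\t'):
            bad += 1
    return bad / len(text)


def looks_garbled(text, threshold=0.05):
    """Hypothetical rule of thumb: flag text with more than 5% debris."""
    return garbled_ratio(text) > threshold
```

If the file is read with `open(path, encoding='utf-8', errors='replace')`, undecodable bytes become U+FFFD, so a high ratio suggests that the conversion or the chosen encoding needs another look.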
The code below includes detailed comments.
# Word-frequency analysis for a paper
# Convert the paper to plain text before running this script.

# Read the text and collect every word in a list
def readtxt(filename):
    wordsL = []  # list that holds the words
    with open(filename, 'r') as fr:
        for line in fr:
            # split() breaks the line on whitespace
            wordsL.extend(line.strip().split())
    return wordsL

# Count the frequency of every word in a dictionary,
# then sort the entries by count, largest first
def count(wordsL):
    wordsD = {}
    for x in wordsL:
        # skip the tokens we don't need
        if Judge(x):
            continue
        # count: start at 1 on first sight, otherwise add 1
        if x not in wordsD:
            wordsD[x] = 1
        else:
            wordsD[x] += 1
    # sort by count, largest first
    wordsInorder = sorted(wordsD.items(), key=lambda x: x[1], reverse=True)
    return wordsInorder

# Judge whether a token should be removed, such as punctuation or a single letter
# You can modify this function to remove more tokens, such as numbers
def Judge(word):
    punctList = [' ', '\t', '\n', ',', '.', ':', '?']  # whitespace and punctuation
    letterList = ['a', 'b', 'c', 'd', 'm', 'n', 'x', 'p', 't']  # single letters
    return word in punctList or word in letterList

# Read the file and write the result
# NOTE: both file names are cut off in the original post; append your own.
filename = 'F:\\python\\'
wordsL = readtxt(filename)
words = count(wordsL)
with open('F:\\python\\Words In ', 'w') as fw:
    for item in words:
        fw.write(item[0] + ' ' + str(item[1]) + '\n')
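The script above counts whitespace-separated tokens exactly as they appear, so "Model", "model" and "model," end up as three different entries. As a hedged alternative, and not the original author's method, the sketch below lowercases the text, keeps only runs of letters, and counts with `collections.Counter` from the standard library. The function name `word_frequencies`, the `STOP_WORDS` set and the `min_length` parameter are assumptions introduced for illustration.

```python
import re
from collections import Counter

# Hypothetical stop list for illustration; extend it to suit the paper.
STOP_WORDS = {'the', 'of', 'and', 'a', 'in', 'to', 'is'}


def word_frequencies(text, stop_words=STOP_WORDS, min_length=2):
    """Count lowercase alphabetic words, most frequent first."""
    # [a-z]+ keeps runs of letters only, so digits and punctuation
    # attached to a word ("model," or "(2019)") are dropped.
    tokens = re.findall(r'[a-z]+', text.lower())
    kept = [t for t in tokens if len(t) >= min_length and t not in stop_words]
    # most_common() returns (word, count) pairs sorted by count, descending.
    return Counter(kept).most_common()
```

One trade-off of this tokenizer is that hyphenated terms and contractions are split into parts ("state-of-the-art" becomes four tokens), which may or may not be what you want for technical vocabulary.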
Published: 2024-02-04 22:03:45