Word Frequency Analysis of English Papers in Python

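When reading an English paper, the most frequent words are often technical terms that you may not know. Analyzing the paper's word frequencies first lets you learn the high-frequency words before you start reading, which makes the paper easier to follow. This post shows how to use Python to output how often each word appears in an English paper.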

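First, the paper has to be converted from PDF to txt. Converting directly usually produces garbled text, so it is better to convert the PDF to Word format first and then copy the content into a plain txt file.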
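
Even after going through Word, the saved txt file may contain a byte-order mark or a few bytes that do not decode cleanly. As a minimal sketch, assuming the file was saved as UTF-8, the text can be read so that undecodable bytes are replaced instead of raising an error. The helper name `read_text` and the demo file are hypothetical and only serve as illustration.

```python
import os
import tempfile

def read_text(path, encoding="utf-8-sig"):
    """Read a text file, replacing undecodable bytes with U+FFFD.

    "utf-8-sig" is an assumed default: it decodes UTF-8 and drops a
    leading byte-order mark if one is present.
    """
    with open(path, "r", encoding=encoding, errors="replace") as f:
        return f.read()

# Demo on a throwaway file that starts with a BOM and contains one bad byte.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"\xef\xbb\xbfdeep learning \xff model\n")
    demo_path = tmp.name

demo_text = read_text(demo_path)
os.remove(demo_path)
print(demo_text.split())
```

If the file was saved in a different encoding, pass that encoding explicitly rather than relying on the assumed default.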

The code below is commented in detail:

# Word frequency analysis for a paper
# The paper must first be converted to a text file.

# Read the text and collect all the words in a list
def readtxt(filename):
    wordsL = []  # list that holds every word
    with open(filename, 'r') as fr:
        for line in fr:
            # split() with no argument also discards surrounding whitespace
            wordsL.extend(line.split())
    return wordsL

# Count the frequency of every word in a dictionary,
# then sort the entries by count from largest to smallest
def count(wordsL):
    wordsD = {}
    for x in wordsL:
        # skip the words that we don't need
        if Judge(x):
            continue
        # count: start at 1 on first sight, otherwise increment
        if x not in wordsD:
            wordsD[x] = 1
        else:
            wordsD[x] += 1
    wordsInorder = sorted(wordsD.items(), key=lambda x: x[1], reverse=True)
    return wordsInorder

# Judge whether a word should be dropped, such as a punctuation mark or a single letter
# You can modify this function to drop more words, such as numbers
def Judge(word):
    punctList = [' ', '\t', '\n', ',', '.', ':', '?']  # whitespace and punctuation
    letterList = ['a', 'b', 'c', 'd', 'm', 'n', 'x', 'p', 't']  # single letters
    return word in punctList or word in letterList

# Read the file and write the result to an output file
filename = 'F:\\python\\'  # path of the paper's txt file (file name missing here)
wordsL = readtxt(filename)
words = count(wordsL)
with open('F:\\python\\Words In ', 'w') as fw:  # output path (file name incomplete here)
    for item in words:
        fw.write(item[0] + ' ' + str(item[1]) + '\n')
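
Because the script above splits only on whitespace, a token such as `model,` keeps its attached comma and `Model` is counted separately from `model`. The following is a rough sketch of one possible variant, not the method used above: it lowercases the text, extracts words with a regular expression, and counts them with `collections.Counter`. The function name `word_frequencies`, the `STOP_WORDS` set, and the `min_length` parameter are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop list for illustration; extend it as needed.
STOP_WORDS = {"a", "an", "the", "of", "and", "in", "to", "is"}

def word_frequencies(text, stop_words=STOP_WORDS, min_length=2):
    """Return (word, count) pairs sorted by count, highest first."""
    # Lowercase so that "Model" and "model" are counted together, then
    # extract runs of letters, allowing inner hyphens and apostrophes.
    tokens = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())
    counts = Counter(
        t for t in tokens if len(t) >= min_length and t not in stop_words
    )
    return counts.most_common()

sample = "The model is simple. The Model, in short, is a state-of-the-art model."
top = word_frequencies(sample)
print(top[0])  # the most frequent remaining word and its count
```

Punctuation never reaches the counter in this variant, so a separate punctuation check is not needed. Digits are dropped as well, since the pattern matches letters only.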

Published on 2024-02-04 22:03:45.

Article link: https://www.4u4v.net/it/170717426360007.html
