[Algorithm] Word Frequency Count for Short English Passages

Contents

Problem

Sample passages

Sample output

Algorithm analysis

Source code


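Problem

    1. Three short English passages are provided. For each passage, count how many times each word occurs.

    2. Words are separated by spaces, line breaks, or punctuation marks. Case is ignored.

    3. Print the 5 most frequent words, each with its number of occurrences.

    4. Words are printed by count first (highest first); words with equal counts are printed in dictionary order.

    5. Prepositions, articles, conjunctions, adverbs, and pronouns are not counted.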

Sample passages

test1.txt

In the flood of darkness, hope is the light. It brings comfort, faith, and confidence. 
It gives us guidance when we are lost, and gives support when we are afraid.
And the moment we give up hope, we give up our lives. 
The world we live in is disintegrating into a place of malice and hatred, where we need hope and find it harder. 
In this world of fear, hope to find better, but easier said than done, the more meaningful life of faith will make life meaningful.
test2.txt

No one can help others as much as you do. 
No one can express himself like you. 
No one can express what you want to convey. 
No one can comfort others in your own way. 
No one can be as understanding as you are. 
No one can feel happy, carefree, and no one can smile as much as you do. 
In a word, no one can show your features to anyone else.
test3.txt

Keep faith and hope for the future. 
Make your most sincere dreams, and when the opportunities come, they will fight for them. 
It may take a season or more, but the ending will not change. Ambition, best, become a reality. 
An uncertain future, only one step at a time, the hope can realize the dream of the highest. 
We must treasure the dream, to protect it a season, let it in the heart quietly germinal. 
However, we have to gently protect our hearts deep expectations, slowly dream, will achieve new life.

Sample output

test1.txt: 
hope 4
faith 2
find 2
give 2
gives 2

test2.txt: 
can 8
no 8
one 8
as 6
do 2

test3.txt: 
dream 3
future 2
hope 2
protect 2
season 2

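Algorithm analysis

1. Read the file: load the file contents as a string into the variable text.

2. Split the string: break text into tokens on the delimiters " ,.\n" (space, comma, period, newline).

3. Use a map: convert each token to lowercase, skip the excluded words, and insert the rest into the map while counting occurrences (a map keeps its keys in sorted order by default).

4. Copy the map entries into a vector and sort them by frequency with stable_sort(). Because the sort is stable, words with equal counts keep the dictionary order they had in the map.

5. Print the 5 most frequent words with their counts. The vector is already in final order at this point.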

Source code

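#include <algorithm>
#include <cctype>
#include <cstring>
#include <fstream>
#include <iostream>
#include <iterator>
#include <map>
#include <string>
#include <vector>
using namespace std;

// Words to exclude: prepositions, articles, conjunctions, adverbs, pronouns
vector<string> g_delWord = {
    "to", "in", "on", "for", "of", "from", "between", "behind", "by", "about", "at", "with", "than",
    "a", "an", "the", "this", "that",
    "and", "but", "or", "so", "yet",
    "often", "very", "then", "therefore",
    "i", "you", "we", "he", "she", "my", "your", "his", "her", "our", "us", "it",
    "am", "is", "are",
    "when", "where", "who", "what",
    "will", "would"
};

// Orders entries by count, highest first
struct compare
{
    bool operator()(const pair<int, string>& l, const pair<int, string>& r) const
    {
        return l.first > r.first;
    }
};

int main()
{
    for (int i = 1; i <= 3; ++i)
    {
        // Build the file name: test1.txt, test2.txt, test3.txt
        string fileName = "test";
        fileName += static_cast<char>('0' + i);
        fileName += ".txt";

        // Read the file contents
        fstream file;
        file.open(fileName, ios::in);   // open read-only; ios::out (write), ios::app (append)
        if (!file)
        {
            cerr << "Cannot open " << fileName << endl;
            continue;
        }
        char text[4096] = {0};              // zero-filled so the buffer is always null-terminated for strtok
        file.read(text, sizeof(text) - 1);  // read at most 4095 characters
        // cout << fileName << ": " << endl;
        // cout << text << endl << endl;

        // Split the string and store the tokens in a map
        map<string, int> mWords;
        const char* s = " ,.\n";
        char* p = strtok(text, s);
        while (p)
        {
            string word = p;
            string lwrWord;
            // Convert to lowercase; the cast avoids undefined behavior for negative char values
            transform(word.begin(), word.end(), back_inserter(lwrWord),
                      [](unsigned char c) { return static_cast<char>(tolower(c)); });
            // Skip prepositions, articles, conjunctions, adverbs, pronouns
            if (find(g_delWord.begin(), g_delWord.end(), lwrWord) == g_delWord.end())
            {
                // operator[] inserts the key with value 0 if it is absent
                // and returns a reference to the mapped value
                mWords[lwrWord]++;
            }
            p = strtok(nullptr, s);
        }

        // Dump the map (debugging)
        // int cnt = 0;
        // for (const auto& e: mWords)
        // {
        //     cout << "(" << e.first << ", " << e.second << ")    ";
        //     ++cnt;
        //     if (cnt % 5 == 0)
        //     {
        //         cout << endl;
        //     }
        // }
        // cout << endl << endl;

        // Copy the map entries into a vector as (count, word) pairs
        vector< pair<int, string> > vWords;     // the space in "> >" is only needed by pre-C++11 compilers, which parse ">>" as the shift operator
        for (const auto& e: mWords)
        {
            vWords.push_back(make_pair(e.second, e.first));
        }

        // sort() is not stable. Either give sort() a comparator that also breaks ties
        // by word, or use stable_sort() so that equal counts keep the map's dictionary order.
        stable_sort(vWords.begin(), vWords.end(), compare());

        cout << fileName << ": " << endl;
        for (size_t j = 0; j < 5 && j < vWords.size(); ++j)   // a passage may contain fewer than 5 distinct words
        {
            cout << vWords[j].second << " " << vWords[j].first << endl;
        }
        cout << endl;
    }
    return 0;
}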

Published: 2024-01-29 07:18:21
