目录:
一、项目描述
二、项目环境
三、项目步骤
四、项目实现
4.1、创建一个需求文档存放需求文件,文件内包含本次项目的所有需求
4.2、新建一个配置文件config.py,用来存放配置条目,如农产品的种类、所查询的年份、进口国、出口国、以及URL的请求、需要连接的mysql数据库等
4.3、导入country.py文件,里面存放了所有的国家和国家对应的id,如需请自行复制。
4.4、创建recv_data.py文件,用来请求联合国贸易的接口,获取所设定年份的所有国家跟中国的玉米贸易数据,并将请求好的数据文件保存在本地的data目录下
4.5、打开CentOS虚拟机创建mysql库,并配置对应的环境
4.6、创建一个insert_mysql.py的文件,用于连接数据库,并将所有数据都保存到mysql数据库内、等待后续数据的处理分析。
4.7创建一个analysis.py文件,用来分析数据、提取出与中国玉米交易进出口量前三的国家,并绘制柱状图
4.8、创建一个sendmail.py的文件,用来将最后的柱状图发送给老师的邮箱
五、项目心得
一、项目描述
该项目旨在通过对联合国农业组织(FAO)提供的全球农产品数据进行分析,为农业决策制定者提供有用的信息和见解。该项目是从联合国的农业贸易数据里,爬取到各个国家农产品的交易记录,并且对这个数据进行分析,方便他人可以通过图表对我国农产品的进出口有个很清晰的了解。
二、项目环境
运用python3.10(requests、os、pandas、pymysql、matplotlib等库)、sql语言、mysql5.7.35、CentOS Linux release 7.9.2009 (Core)
三、项目步骤
1、进行需求分析,并将需求写入需求文档
#爬取联合国玉米的交易数据
#需求年份 2020 - 2022 年 这个年份最好不要写死、 以后方便更改、 最好用配置文件取管理
#保存成csv文件 放到data目录 这个data目录路径你要自己设计好
#入库到数据库
#url = "https://comtrade.un.org/api/get/plus?max=10000&type=C&freq=A&px=HS&ps=2020&r=156&p=all&rg=all&cc=1006&fmt=csv"
#下一步
#1、 使用pandas 将爬取的数据进行入库
#2、 使用matplotlib 作图 画出与中国玉米交易进出口量前三的国家 以及他们的进出口量分别是多少
# 形成柱状图
#3、将柱状图以邮件的形式发给老师
2、查看联合国的贸易网站(UN Comtrade),并查看网站接口api的文档
3、编写爬取网页数据的代码(python语言),使用requests库发起请求,并将爬取的数据通过csv文件的形式保存在本地,以便于后续数据的总结与分析。
4、将csv文件使用pandas库进行合并送入mysql库中
5、对入库的数据进行分析,选择出最近几年内该农产品贸易数量最多的数据,并绘制图表(matplotlib),然后将图表结果通过邮件发送给老师。
四、项目实现
4.1、创建一个需求文档存放需求文件,文件内包含本次项目的所有需求
4.2、新建一个配置文件config.py,用来存放配置条目,如农产品的种类、所查询的年份、进口国、出口国、以及URL的请求、需要连接的mysql数据库等
"""
@data:2023/3/8
@file:config
@author:Claylpfconfig作为我们的配置模块
可以用来配置年份、产品等信息
保存项目的配置
"""#如果需要获取别的年份、 直接修改如下的配置
Start_Year = 2020 #开始年份
End_Year = 2022 #结束年份#要获取的产品
#其他农产品的id可以去网址查询,此项目以玉米来举例说明
Products = "maize"
Products_ID = 1005#我们需求的url
URL = "https://comtrade.un.org/api/get/plus?max=10000&type=C&freq=A&px=HS&p=all&rg=all&fmt=csv"#sqlalchmey 连接数据库url#什么数据库+什么方式连接://用户名:密码@主机ip地址/连接的数据库?字符编码
DBURL = "mysql+pymysql://sc:123456@192.168.2.200/sc?charset=utf8"
解释:
URL 链接、网址、地址、统一资源定位器
协议 域名--》IP 资源
url="https://comtrade.un.org/api/get/plus
?max=10000&type=C&freq=A&px=HS&ps=2020&r=156&p=all&rg=all&cc=1006&fmt=csv"
url参数
url: 称为接口
https: https协议 超文本传输协议安全(HyperText Transfer Protocol Secure)
由于互联网的不安全性和ERP系统的高安全性相悖,所以基于B/S结构的ERP系统应该采用安全超文本传输协议(HTTPS)。实际上,HTTPS就是应用了Netscape的完全套接字层(SSL)作为HTTP应用层的子层。
所谓的输入网页、 就是客户机和服务器的一问一答 当你在浏览器输入www.baidu.com的时候 你就会通过网络连接访问百度的服务器
然后百度的服务器就会通过接口url传输一个html文件过来、 我们自己的浏览器就会运行这个html文件、 然后形成我们看到的百度网页
所以http 就叫做超文本传输协议 其实就是用来传输数据 或者 html文件的协议
comtrade.un.org:网站地址
后面都是资源
? 后面的就都是传递过去的参数
max=10000:最多一次性可以下载多少条 并记录下来
type=C等 都是接口参数
接口参数提示文档:
https://comtrade.un.org/data/doc/api/#DataAvailabilityRequests
4.3、导入country.py文件,里面存放了所有的国家和国家对应的id,如需请自行复制。
"""
@data:2023/3/8
@file:country
@author:Claylpf
"""
country = [{"id": "all","text": "All"},{"id": "4","text": "Afghanistan"},{"id": "8","text": "Albania"},{"id": "12","text": "Algeria"},{"id": "20","text": "Andorra"},{"id": "24","text": "Angola"},{"id": "660","text": "Anguilla"},{"id": "28","text": "Antigua and Barbuda"},{"id": "32","text": "Argentina"},{"id": "51","text": "Armenia"},{"id": "533","text": "Aruba"},{"id": "36","text": "Australia"},{"id": "40","text": "Austria"},{"id": "31","text": "Azerbaijan"},{"id": "44","text": "Bahamas"},{"id": "48","text": "Bahrain"},{"id": "50","text": "Bangladesh"},{"id": "52","text": "Barbados"},{"id": "112","text": "Belarus"},{"id": "56","text": "Belgium"},{"id": "58","text": "Belgium-Luxembourg"},{"id": "84","text": "Belize"},{"id": "204","text": "Benin"},{"id": "60","text": "Bermuda"},{"id": "64","text": "Bhutan"},{"id": "68","text": "Bolivia (Plurinational State of)"},{"id": "535","text": "Bonaire"},{"id": "70","text": "Bosnia Herzegovina"},{"id": "72","text": "Botswana"},{"id": "92","text": "Br. Virgin Isds"},{"id": "76","text": "Brazil"},{"id": "96","text": "Brunei Darussalam"},{"id": "100","text": "Bulgaria"},{"id": "854","text": "Burkina Faso"},{"id": "108","text": "Burundi"},{"id": "132","text": "Cabo Verde"},{"id": "116","text": "Cambodia"},{"id": "120","text": "Cameroon"},{"id": "124","text": "Canada"},{"id": "136","text": "Cayman Isds"},{"id": "140","text": "Central African Rep."},{"id": "148","text": "Chad"},{"id": "152","text": "Chile"},{"id": "156","text": "China"},{"id": "344","text": "China, Hong Kong SAR"},{"id": "446","text": "China, Macao SAR"},{"id": "170","text": "Colombia"},{"id": "174","text": "Comoros"},{"id": "178","text": "Congo"},{"id": "184","text": "Cook Isds"},{"id": "188","text": "Costa Rica"},{"id": "384","text": "Côte d'Ivoire"},{"id": "191","text": "Croatia"},{"id": "192","text": "Cuba"},{"id": "531","text": "Curaçao"},{"id": "196","text": "Cyprus"},{"id": "203","text": "Czechia"},{"id": "200","text": "Czechoslovakia"},{"id": "408","text": "Dem. People's Rep. of Korea"},{"id": "180","text": "Dem. Rep. of the Congo"},{"id": "208","text": "Denmark"},{"id": "262","text": "Djibouti"},{"id": "212","text": "Dominica"},{"id": "214","text": "Dominican Rep."},{"id": "588","text": "East and West Pakistan"},{"id": "218","text": "Ecuador"},{"id": "818","text": "Egypt"},{"id": "222","text": "El Salvador"},{"id": "226","text": "Equatorial Guinea"},{"id": "232","text": "Eritrea"},{"id": "233","text": "Estonia"},{"id": "231","text": "Ethiopia"},{"id": "97","text": "EU-28"},{"id": "234","text": "Faeroe Isds"},{"id": "238","text": "Falkland Isds (Malvinas)"},{"id": "242","text": "Fiji"},{"id": "246","text": "Finland"},{"id": "886","text": "Fmr Arab Rep. of Yemen"},{"id": "278","text": "Fmr Dem. Rep. of Germany"},{"id": "866","text": "Fmr Dem. Rep. of Vietnam"},{"id": "720","text": "Fmr Dem. Yemen"},{"id": "230","text": "Fmr Ethiopia"},{"id": "280","text": "Fmr Fed. Rep. of Germany"},{"id": "582","text": "Fmr Pacific Isds"},{"id": "590","text": "Fmr Panama, excl.Canal Zone"},{"id": "592","text": "Fmr Panama-Canal-Zone"},{"id": "868","text": "Fmr Rep. of Vietnam"},{"id": "717","text": "Fmr Rhodesia Nyas"},{"id": "736","text": "Fmr Sudan"},{"id": "835","text": "Fmr Tanganyika"},{"id": "810","text": "Fmr USSR"},{"id": "890","text": "Fmr Yugoslavia"},{"id": "836","text": "Fmr Zanzibar and Pemba Isd"},{"id": "251","text": "France"},{"id": "254","text": "French Guiana"},{"id": "258","text": "French Polynesia"},{"id": "583","text": "FS Micronesia"},{"id": "266","text": "Gabon"},{"id": "270","text": "Gambia"},{"id": "268","text": "Georgia"},{"id": "276","text": "Germany"},{"id": "288","text": "Ghana"},{"id": "292","text": "Gibraltar"},{"id": "300","text": "Greece"},{"id": "304","text": "Greenland"},{"id": "308","text": "Grenada"},{"id": "312","text": "Guadeloupe"},{"id": "320","text": "Guatemala"},{"id": "324","text": "Guinea"},{"id": "624","text": "Guinea-Bissau"},{"id": "328","text": "Guyana"},{"id": "332","text": "Haiti"},{"id": "336","text": "Holy See (Vatican City State)"},{"id": "340","text": "Honduras"},{"id": "348","text": "Hungary"},{"id": "352","text": "Iceland"},{"id": "699","text": "India"},{"id": "356","text": "India, excl. Sikkim"},{"id": "360","text": "Indonesia"},{"id": "364","text": "Iran"},{"id": "368","text": "Iraq"},{"id": "372","text": "Ireland"},{"id": "376","text": "Israel"},{"id": "381","text": "Italy"},{"id": "388","text": "Jamaica"},{"id": "392","text": "Japan"},{"id": "400","text": "Jordan"},{"id": "398","text": "Kazakhstan"},{"id": "404","text": "Kenya"},{"id": "296","text": "Kiribati"},{"id": "414","text": "Kuwait"},{"id": "417","text": "Kyrgyzstan"},{"id": "418","text": "Lao People's Dem. Rep."},{"id": "428","text": "Latvia"},{"id": "422","text": "Lebanon"},{"id": "426","text": "Lesotho"},{"id": "430","text": "Liberia"},{"id": "434","text": "Libya"},{"id": "440","text": "Lithuania"},{"id": "442","text": "Luxembourg"},{"id": "450","text": "Madagascar"},{"id": "454","text": "Malawi"},{"id": "458","text": "Malaysia"},{"id": "462","text": "Maldives"},{"id": "466","text": "Mali"},{"id": "470","text": "Malta"},{"id": "584","text": "Marshall Isds"},{"id": "474","text": "Martinique"},{"id": "478","text": "Mauritania"},{"id": "480","text": "Mauritius"},{"id": "175","text": "Mayotte"},{"id": "484","text": "Mexico"},{"id": "496","text": "Mongolia"},{"id": "499","text": "Montenegro"},{"id": "500","text": "Montserrat"},{"id": "504","text": "Morocco"},{"id": "508","text": "Mozambique"},{"id": "104","text": "Myanmar"},{"id": "580","text": "N. Mariana Isds"},{"id": "516","text": "Namibia"},{"id": "524","text": "Nepal"},{"id": "530","text": "Neth. Antilles"},{"id": "532","text": "Neth. Antilles and Aruba"},{"id": "528","text": "Netherlands"},{"id": "540","text": "New Caledonia"},{"id": "554","text": "New Zealand"},{"id": "558","text": "Nicaragua"},{"id": "562","text": "Niger"},{"id": "566","text": "Nigeria"},{"id": "579","text": "Norway"},{"id": "512","text": "Oman"},{"id": "490","text": "Other Asia, nes"},{"id": "586","text": "Pakistan"},{"id": "585","text": "Palau"},{"id": "591","text": "Panama"},{"id": "598","text": "Papua New Guinea"},{"id": "600","text": "Paraguay"},{"id": "459","text": "Peninsula Malaysia"},{"id": "604","text": "Peru"},{"id": "608","text": "Philippines"},{"id": "616","text": "Poland"},{"id": "620","text": "Portugal"},{"id": "634","text": "Qatar"},{"id": "410","text": "Rep. of Korea"},{"id": "498","text": "Rep. of Moldova"},{"id": "638","text": "Réunion"},{"id": "642","text": "Romania"},{"id": "643","text": "Russian Federation"},{"id": "646","text": "Rwanda"},{"id": "647","text": "Ryukyu Isd"},{"id": "461","text": "Sabah"},{"id": "652","text": "Saint Barthelemy"},{"id": "654","text": "Saint Helena"},{"id": "659","text": "Saint Kitts and Nevis"},{"id": "658","text": "Saint Kitts, Nevis and Anguilla"},{"id": "662","text": "Saint Lucia"},{"id": "534","text": "Saint Maarten"},{"id": "666","text": "Saint Pierre and Miquelon"},{"id": "670","text": "Saint Vincent and the Grenadines"},{"id": "882","text": "Samoa"},{"id": "674","text": "San Marino"},{"id": "678","text": "Sao Tome and Principe"},{"id": "457","text": "Sarawak"},{"id": "682","text": "Saudi Arabia"},{"id": "686","text": "Senegal"},{"id": "688","text": "Serbia"},{"id": "891","text": "Serbia and Montenegro"},{"id": "690","text": "Seychelles"},{"id": "694","text": "Sierra Leone"},{"id": "702","text": "Singapore"},{"id": "703","text": "Slovakia"},{"id": "705","text": "Slovenia"},{"id": "711","text": "So. African Customs Union"},{"id": "90","text": "Solomon Isds"},{"id": "706","text": "Somalia"},{"id": "710","text": "South Africa"},{"id": "728","text": "South Sudan"},{"id": "724","text": "Spain"},{"id": "144","text": "Sri Lanka"},{"id": "275","text": "State of Palestine"},{"id": "729","text": "Sudan"},{"id": "740","text": "Suriname"},{"id": "748","text": "Eswatini"},{"id": "752","text": "Sweden"},{"id": "757","text": "Switzerland"},{"id": "760","text": "Syria"},{"id": "762","text": "Tajikistan"},{"id": "807","text": "North Macedonia"},{"id": "764","text": "Thailand"},{"id": "626","text": "Timor-Leste"},{"id": "768","text": "Togo"},{"id": "772","text": "Tokelau"},{"id": "795","text": "Turkmenistan"},{"id": "796","text": "Turks and Caicos Isds"},{"id": "798","text": "Tuvalu"},{"id": "800","text": "Uganda"},{"id": "804","text": "Ukraine"},{"id": "858","text": "Uruguay"},{"id": "850","text": "US Virgin Isds"},{"id": "842","text": "USA"},{"id": "841","text": "USA (before 1981)"},{"id": "548","text": "Vanuatu"},{"id": "862","text": "Venezuela"},{"id": "704","text": "Viet Nam"},{"id": "975","text": "ASEAN"}]
4.4、创建recv_data.py文件,用来请求联合国贸易的接口,获取所设定年份的所有国家跟中国的玉米贸易数据,并将请求好的数据文件保存在本地的data目录下
"""
@data:2023/3/8
@file:recv_data
@author:Claylpf请求联合国贸易的接口
获取2020年 到 2022年 所有国家跟中国的玉米贸易数据
将请求好的数据文件保存在本地的 data目录下改进:
接口限制: 一个小时只有100次请求 --》 控制请求频率、 请求一下 sleep 一下
一次性得到的结果不能超过1万条:通过单个国家跟世界贸易、使数据量减少请求接口: -- 异常处理 -- 重试 2-3次 sleep个1-2s左右 再去重试当我已经请求过啦、 我们可以下次接着下面再去请求、 不需要每次都从头开始
可以通过判断我们以前是否请求成功过 就是 download函数执行成功了 就不需要再次请求了
"""
#先想好
#1、 创建配置文件
#config
#country#2、调用请求模块
import requests
#调用配置文件
import country
import config
#os 是系统模块、 可以用来判断文件是否存在
import os
#调用time库中的sleep函数
import time#检查文件夹是否存在、 不存在就创建
def dir_check(dir_path):if not os.path.exists(dir_path):os.makedirs(dir_path)#通过接口请求获取相应的信息、 让后保存在当前data目录下
def download(full_url, file_path):#进行3次重试for i in range(3):# 当出现异常 需要进行异常检测try:#进行请求result = requests.get(full_url)#判断是否请求成功 status_code = 200 表示成功if result.status_code == 200:with open(file_path, "w+", encoding="utf-8") as fp:# w+ 表示文本覆盖#拿到它返回的文本信息fp.write(result.text)breakexcept:print("请求失败,重新发起请求")#每次都睡2s 来保障准确率time.sleep(2)if __name__ == "__main__":#判断数据目录存不存在、 不存在就创建dir_path = "data/" + config.Productsdir_check(dir_path)for year in range(config.Start_Year, config.End_Year + 1):#创建年份文件夹year_path = dir_path + "/" + str(year)dir_check(year_path)for state in country.country[1:]:print(f"正在下载{year}年,{state['text']}国家的{config.Products}数据")#state 成为字典、 我们需要获取字典里的idstate_id = state["id"]#请求的完整urlfull_url = config.URL + "&ps=" + str(year) + "&r=" + state_id + "&cc=" + str(config.Products_ID)# print(full_url)#保存数据的文件路径file_path = year_path + "/" + state.get("text") + ".csv"#调用download函数# download(full_url, file_path)if os.path.exists(file_path):print("此文件已经存在、不需要重新请求")else:download(full_url, file_path)# 使它一小时以内请求在100次以内 36s 一次time.sleep(36)
当运行该文件里的代码时,就会通过请求api接口获得数据,等待一段时间后将所有数据都获取完成并保存在本地文件夹data下、如下图所示。
4.5、打开CentOS虚拟机创建mysql库,并配置对应的环境
=====
1、安装好mysql
yum install mariadb-server mariadb -y2、启动服务
systemctl start mariadb3、查看进程
ps -ef|grep mariadb(查看当前安装了哪些包)
[root@service ~]# rpm -qa|grep mariadb
mariadb-server-5.5.68-1.el7.x86_64
mariadb-libs-5.5.68-1.el7.x86_64
mariadb-5.5.68-1.el7.x86_64c/s架构: client server 客户端 服务器
b/s架构: browser server 浏览器 服务器=====
[root@service ~]# lsof -i:3306 --》 查看mysql的端口 3306
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mysqld 11008 mysql 14u IPv4 130372 0t0 TCP *:mysql (LISTEN) --》监听状态
[root@service ~]# [root@service ~]# yum provides lsof --》 查看lsof属于哪个包
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile* base: mirrors.ustc.edu.cn* epel: mirror.01link.hk* extras: mirrors.ustc.edu.cn* updates: mirrors.aliyun.com
lsof-4.87-6.el7.x86_64 : A utility which lists open files on a: Linux/UNIX system
Repo : baselsof-4.87-6.el7.x86_64 : A utility which lists open files on a: Linux/UNIX system
Repo : @anaconda[root@service ~]#
=====
4、在服务端给mysql的root用户设置密码123456
[root@service ~]# mysqladmin -u root password 123456 --》 更改mysql密码5、连接到mysql中 填写刚才设定的密码
[root@service ~]# mysql -u root -p --》 登录mysql
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 5.5.68-MariaDB MariaDB ServerCopyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.MariaDB [(none)]> MariaDB [(none)]> show databases; --》 查看当前数据库
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| test |
+--------------------+
4 rows in set (0.00 sec)=====
6、查看用户权限MariaDB [(none)]> use mysql --》 使用mysql库
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -ADatabase changed
MariaDB [mysql]> MariaDB [mysql]> show tables; --》查看数据库有哪些列表
+---------------------------+
| Tables_in_mysql |
+---------------------------+
| columns_priv |
| db |
| event |
| func |
| general_log |
| help_category |
| help_keyword |
| help_relation |
| help_topic |
| host |
| ndb_binlog_index |
| plugin |
| proc |
| procs_priv |
| proxies_priv |
| servers |
| slow_log |
| tables_priv |
| time_zone |
| time_zone_leap_second |
| time_zone_name |
| time_zone_transition |
| time_zone_transition_type |
| user |
+---------------------------+
24 rows in set (0.00 sec)MariaDB [mysql]> MariaDB [mysql]> select * from user \G; --》 查看 user表的所有内容 使用 \G 的格式查看7、用户授权: (新建一个sc用户 使他有连接外来用户的权力 并且设置它的密码为123456)
MariaDB [(none)]> grant all on *.* to sc@'%' identified by '123456';
可以允许所有用户访问mysql数据库8、创建数据库和其中的表格MariaDB [mysql]> create database sc; --》 创建数据库 sc
Query OK, 1 row affected (0.00 sec)MariaDB [mysql]> MariaDB [mysql]> show databases; --》 展示所有数据库
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sc |
| test |
+--------------------+
5 rows in set (0.01 sec)MariaDB [mysql]> MariaDB [mysql]> use sc; -->使用sc数据库
Database changed
MariaDB [sc]>
使用pymysql尝试连接mysql数据库
=====
python操控数据库1、 在pycharm里的teminal安装pymysqlMicrosoft Windows [版本 10.0.19044.2604]
(c) Microsoft Corporation。保留所有权利。(Day_01_venv) D:\衡山-开发\python\pythonProject_day01>pip install pymysqlpython去连接mysql的客户端=====
2、python连接数据库
=====
"""
@data:2023/3/4
@file:pymysql的使用.py
@author:Claylpf
"""
import pymysql#连接数据库 (默认添加3306端口)
db = pymysql.connect(host = "192.168.2.200", #填写mysql主机IPuser = "root", #用户名passwd = "123456", #密码database = "sc" #数据库
)#使用cursor()方法创建一个游标对象、 保存中间结果的
cursor = db.cursor() #使用execute执行sql语句
cursor.execute("select * from userinfo")
# data = cursor.fetchone() # 查找第一行
data = cursor.fetchall() # 查找所有
print(data)
#输出元组数据#关闭数据库
db.close()=====
4.6、创建一个insert_mysql.py的文件,用于连接数据库,并将所有数据都保存到mysql数据库内、等待后续数据的处理分析。
"""
@data:2023/3/8
@file:insert_mysql
@author:Claylpf
"""
#导入 pymysql模块
import pymysql
#导入pandas数据库叫做pd
import pandas as pd
#从sqlalchemy模块导入create_engine函数
from sqlalchemy import create_engine
#导入文件处理os模块
import os
#从config模块里导入DBURL,Start_Year, End_Year, Products
from config import DBURL, Start_Year, End_Year, Products#用pymysql模块的方法连接到数据库
conn = pymysql.connect(host='192.168.2.200',#主机的ip地址user='sc',#用户名password='123456',#用户密码database='sc',#连接的数据库的名称charset='utf8'#字符编码格式)#创建连接数据库引擎
engine = create_engine(DBURL)#按年份存入数据库
dir_path = "data"+"\\"+Products
for year in range(Start_Year,End_Year+1):#按年存入数据库#将每一年的csv都读取到一个大的dataframe里面,然后使用to_sql入库df = pd.DataFrame()#得到年份目录的路径year_path = os.path.join(dir_path,str(year))#循环得到年份目录下每一个以.csv结尾的国家的文件for filename in os.listdir(year_path):#使用os的拼接方法,得到文件的路径file_path = os.path.join(year_path,filename)#使用pandas将每个文件都读取出来tmp_df = pd.read_csv(file_path)#将每个小的dataframe合并成一个大的dataframe --》改变df = pd.concat([tmp_df],ignore_index=False)#用pandas处理表格数据,筛选出年份不为空的数据df = df[df["Period"].notnull()]#将最终筛选出来的df表格插入到数据库里df.to_sql("union_table",engine,if_exists="append",index=False)
执行该代码、查看虚拟机上的mysql中的sc库是否生成了union_table表格
MariaDB [sc]> show tables; --》查看sc库
+--------------+
| Tables_in_sc |
+--------------+
| union_table |
| userinfo |
+--------------+
2 rows in set (0.00 sec)MariaDB [sc]>
4.7创建一个analysis.py文件,用来分析数据、提取出与中国玉米交易进出口量前三的国家,并绘制柱状图
"""
@data:2023/3/8
@file:analysis
@author:Claylpf
"""
#分析2022年中国与其他国家玉米交易量前三的国家
#使用pandas matplotlib作图
import pymysql
import pandas as pd
import matplotlib.pyplot as plt#连接数据库
db = pymysql.connect(host = "192.168.2.200", #mysql主机ipuser = "sc", #用户名passwd = "123456", #密码database = "sc" #数据库
)#获取china 2022年对所有国家的交易数据
df = pd.read_sql("select * from union_table where Reporter='China' and Year='2022';", con=db)
#与中国进行玉米交易前三的国家数据 Partner得到的国家就成为这个dataframe 的index了
result = df.groupby('Partner').sum().sort_values(by="Netweight (kg)", ascending=False).iloc[1:4, :]
#sql语句 直接查询
#select Partner, sum(Qty) as nw from union_table where Reporter='China' and Year='2021' group by Partner order by nw desc limit 5;#把前三的国家 交易数据放入result_df
result_df = df[df['Partner'].isin(result.index) ]
#把Partner作为result_df索引
result_df.set_index(['Partner'], inplace=True)
#筛选交易类型和交易量
result_df = result_df[['Trade Flow','Netweight (kg)']]
#s1是所有出口国家
#s2是所有进口国家
s1 = result_df[result_df["Trade Flow"] == "X"]['Netweight (kg)']
s2 = result_df[result_df["Trade Flow"] == "M"]['Netweight (kg)']
#将s1,s2,俩个serise合并成一个dataframe,并且他们的列名称就是"export","import"
rdf = pd.concat([s1,s2], axis=1)
rdf.columns = ["export","import"]
# #画图#对于dataframe,matplotlib作图工具,会自动的将列名称作为柱状图的x轴下标名称
#会自动的将每一行的数据生成一个柱状图
rdf.plot.bar()
#让柱状图的下标可以自由的旋转,达到合适的效果
plt.xticks(rotation=360)
# fontproperties='simhei',让python画图支持中文格式
plt.title(f"2022-玉米进出口总量前三排名",fontproperties='simhei')
#将图片保存在当前目录1.png里面
plt.savefig('2022.png')plt.show()
执行该代码产生如下图片:
4.8、创建一个sendmail.py的文件,用来将最后的柱状图发送给老师的邮箱
"""
@data:2023/3/8
@file:sendmail
@author:Claylpf
"""
#邮件协议
#smtp协议 发邮件的协议 默认端口25号
#pop3 收邮件的协议 默认端口110
#imap 收邮件的协议 默认端口143#pop3和imap都是收邮件的协议,区别在于
#pop3在客户端操作不会反馈到服务器
#imap在客户端的操作会反馈到服务器#163.com邮箱的授权码
#safdsdfsad(这里保存好自己的授权码)#python里两个模块发邮件
#1、smtplib 发邮件
#2、email 构建邮件内容import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.application import MIMEApplication#设置第三方发邮件 SMTP服务
mail_host = "smtp.163.com" #设置服务器
mail_user = "123456@163.com" #用户名
mail_pass = "saasfd" #授权码,填自己正确的授权码receivers = "123456@163.com" #接收邮箱,可以自己给自己发邮件
#创建一个实例
msg = MIMEMultipart()
#设置标题的一些主题,邮件来自和邮件发送
msg['Subject'] = 'lhj'
msg['From'] = mail_user
msg['To'] = receivers#给邮件加入内容
message = MIMEText('lhj')
msg.attach(message)
#发送附件2022.png图片
message = MIMEApplication(open('2022.png','rb').read())
message.add_header('Content-Disposition','attachment',filename='1.png')
msg.attach(message)try:#创建一个实例s = smtplib.SMTP()#创建连接s.connect(mail_host)#登录,mail_user是用户名,mail_pass是授权码s.login(mail_user,mail_pass)#发送邮件,第一个参数是谁发送的,第二个参数是发送给谁的,后面的是将内容变成strings.sendmail(mail_user,receivers,msg.as_string())
except Exception as e:print(e)
s.close()
最后执行代码,将图片发送给老师邮箱
实验完毕!!!
五、项目心得
1、通过这次项目使我能更加熟练运用python的基本库了(requests、os、pandas、pymysql、matplotlib等库)
2、它加深了我对 python爬虫 和 数据处理分析 的理解
3、它还提升了我的文档的编写能力,提升了我对一些代码的规划编写能力
4、也指出了我一些粗心大意的地方,并使我能更加熟练的解决相同的问题
最后希望这篇文章对你有所帮助。🙂
https://gitee.com/Claylpf/UN-agricultural-product-data-analysis-python