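A few steps are enough to mirror a website quickly. The mirror includes static page resources such as HTML, JS, CSS and images. Dynamic pages that depend on user interaction cannot be mirrored this way.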
1. Install wget (using Ubuntu as an example)
sudo apt-get install wget
2. Download the website's resources
Take http://www.szsh-gov.com/ as an example, a site made up mostly of static pages.
Run the following command:
wget -r -p -np -k http://www.szsh-gov.com/
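The four flags are standard GNU Wget options: -r (--recursive) follows links recursively, -p (--page-requisites) also fetches the images, CSS and scripts each page needs, -np (--no-parent) never ascends above the starting directory, and -k (--convert-links) rewrites links in the saved files so they work locally. By default wget recurses at most 5 levels deep; add -l N to change that (-l inf removes the limit). The sketch below only wraps the same command with Python's subprocess module; the helper names build_wget_command and mirror_site are hypothetical.

```python
import subprocess
from typing import List


def build_wget_command(url: str) -> List[str]:
    """Build the wget invocation used in this article (hypothetical helper)."""
    return [
        "wget",
        "-r",   # --recursive: follow links and download recursively
        "-p",   # --page-requisites: also fetch images, CSS and JS that pages need
        "-np",  # --no-parent: never ascend above the starting directory
        "-k",   # --convert-links: rewrite links so the local copy works offline
        url,
    ]


def mirror_site(url: str) -> int:
    """Run wget and return its exit status.

    0 means success; wget returns 8 if any server error response
    (for example a single 404) occurred during the recursive download.
    """
    return subprocess.run(build_wget_command(url)).returncode
```

Usage: mirror_site("http://www.szsh-gov.com/") runs exactly the command shown above.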
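When the download finishes, inspect the local directory: by default wget saves the site under a folder named after the host, here www.szsh-gov.com/.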
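A quick way to print that directory structure is a short recursive walk with os.scandir. This is a minimal sketch; the function name tree_lines and the default depth limit of 2 are assumptions chosen for illustration.

```python
import os
from typing import List


def tree_lines(path: str, max_depth: int = 2, _depth: int = 0) -> List[str]:
    """Return an indented listing of path, directories first, up to max_depth levels."""
    lines: List[str] = []
    with os.scandir(path) as it:
        # Sort directories before files, then alphabetically by name
        entries = sorted(it, key=lambda e: (not e.is_dir(), e.name))
    for entry in entries:
        suffix = "/" if entry.is_dir() else ""
        lines.append("    " * _depth + entry.name + suffix)
        # Recurse only while the next level is still within max_depth
        if entry.is_dir() and _depth + 1 < max_depth:
            lines.extend(tree_lines(entry.path, max_depth, _depth + 1))
    return lines
```

Usage: print("\n".join(tree_lines("www.szsh-gov.com"))) lists the top-level folders (js, css, images, ...) and the files one level inside them.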
3. Set up a lightweight local web server
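Install Python and bottle, which serves as the lightweight web framework (again on Ubuntu):
sudo apt-get install python3 python3-pip
sudo pip3 install bottle
On Ubuntu 23.04 and later, pip refuses system-wide installs; use sudo apt-get install python3-bottle instead, or install bottle inside a virtual environment.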
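In the mirror's root directory (the one containing index.html, js/, css/ and the other downloaded folders), create a main.py file with the following content: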
# coding=utf-8
# main.py: serve the wget mirror with Bottle.
# Run it from the mirror's root directory (the one holding index.html):
# every root below is a path relative to the current working directory.
from bottle import get, run, static_file

# Static assets (styles, scripts, images, fonts); "woff2?" covers woff and woff2
ASSET_RE = r".*\.(css|jpg|png|gif|ico|svg|js|eot|otf|ttf|woff2?)"
ASSET_DIRS = ["js", "js2", "images", "images2", "css", "css2", "uploadfile"]

# Directories that hold mirrored HTML pages
PAGE_RE = r".*\.html"
PAGE_DIRS = ["2016", "2017", "2020", "2021", "active",
             "complaint", "event", "policy", "research", "service"]


def serve_from(root):
    """Return a request handler that serves files from the local directory root."""
    def handler(filepath):
        return static_file(filepath, root=root)
    return handler


for d in ASSET_DIRS:
    get("/%s/<filepath:re:%s>" % (d, ASSET_RE))(serve_from(d))

for d in PAGE_DIRS:
    get("/%s/<filepath:re:%s>" % (d, PAGE_RE))(serve_from(d))

# Pages under /html/ may also end in ".1", as in the original route
get(r"/html/<filepath:re:.*\.(html|1)>")(serve_from("html"))


@get("/")
def index():
    # Serve the homepage as a plain file. The original used template("index"),
    # which makes Bottle parse the page as a template and can fail on pages
    # containing template syntax such as "{{" or lines starting with "%".
    return static_file("index.html", root=".")


# Catch-all for any other .html file; registered last so the routes above win
get("/<filepath:re:%s>" % PAGE_RE)(serve_from("."))

if __name__ == "__main__":
    run(host="0.0.0.0", port=8888)
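Bottle's <name:re:...> wildcard takes a custom regular expression, and a route matches only when the whole request path fits its pattern; dynamic routes are tried in the order they were registered. The sketch below is a simplified model of that behavior using re.fullmatch, not Bottle's actual router: it re-creates only a few of the routes above, and the ROUTES table and match_route function are hypothetical.

```python
import re
from typing import List, Optional, Tuple

ASSET_RE = r".*\.(css|jpg|png|gif|ico|svg|js|eot|otf|ttf|woff2?)"
PAGE_RE = r".*\.html"

# (URL prefix, filename regex, local root directory), tried in order like the routes
ROUTES: List[Tuple[str, str, str]] = [
    ("/css/", ASSET_RE, "css"),
    ("/js/", ASSET_RE, "js"),
    ("/2016/", PAGE_RE, "2016"),
    ("/", PAGE_RE, "."),  # catch-all for other .html files, tried last
]


def match_route(path: str) -> Optional[Tuple[str, str]]:
    """Return (root, filepath) for the first route matching path, else None."""
    for prefix, pattern, root in ROUTES:
        if path.startswith(prefix):
            filepath = path[len(prefix):]
            # fullmatch mirrors Bottle anchoring the whole pattern
            if re.fullmatch(pattern, filepath):
                return root, filepath
    return None
```

For example, match_route("/css/main.css") gives ("css", "main.css"), while "/js/app.json" matches nothing because .json is not in the extension list. In the real server such a path gets Bottle's 404 Not Found, and a path that matches a route but whose file was never downloaded gets a 404 from static_file.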
Start the web service with the following command:
python3 main.py
The console output looks similar to this:
Bottle v0.13-dev server starting up (using WSGIRefServer())...
Listening on http://0.0.0.0:8888/
Hit Ctrl-C to quit.
Open http://localhost:8888/ in a local browser to view the mirrored site.
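To check the mirror without clicking through every page, a short script can request a few paths and report their HTTP status codes. This sketch uses only urllib from the standard library; the function name check_paths is hypothetical, and the base URL simply matches the port used in main.py.

```python
import urllib.error
import urllib.request
from typing import Dict, Iterable


def check_paths(base_url: str, paths: Iterable[str], timeout: float = 5.0) -> Dict[str, int]:
    """Return the HTTP status code for each path (0 if the request failed outright)."""
    results: Dict[str, int] = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as err:
            # Server answered with an error, e.g. 404 for a file wget did not fetch
            results[path] = err.code
        except OSError:
            # Connection refused, timeout, etc.: is main.py running?
            results[path] = 0
    return results
```

Usage: print(check_paths("http://localhost:8888", ["/", "/index.html"])) should report 200 for both while main.py is running.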
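4. Summary
The overall approach: first use wget to download all of the site's static resources to the local machine, then serve them through a lightweight web framework.
I chose Python + bottle as the framework (it is extremely simple), but feel free to use any other web framework you prefer.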
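One such alternative needs no third-party package at all: Python's standard library includes a static file server, so running python3 -m http.server 8888 --directory www.szsh-gov.com serves the mirror directly. The sketch below does the same programmatically; make_server is a hypothetical helper name. Note the design trade-off: unlike the Bottle routes, which whitelist specific folders and extensions, this serves every file under the directory, and it answers / with index.html automatically.

```python
import functools
import http.server


def make_server(directory: str, host: str = "0.0.0.0",
                port: int = 8888) -> http.server.ThreadingHTTPServer:
    """Create (but do not start) a threaded server for the files in directory."""
    # SimpleHTTPRequestHandler accepts a directory argument (Python 3.7+)
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer((host, port), handler)
```

Usage: make_server("www.szsh-gov.com").serve_forever() starts serving on port 8888 until Ctrl-C.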