A Complete Guide to XML Files

`XML`


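`XML` Concepts

Overview: XML stands for Extensible Markup Language.

Uses
- Storing data
  1. Configuration files
  2. Data exchanged over a network
Differences between XML and HTML
1. XML tags are user-defined; HTML tags are predefined.
2. XML syntax is strict; HTML syntax is lenient.
3. XML is meant for storing data; HTML is meant for displaying it.
W3C: the World Wide Web Consortium, the body that publishes the XML specification.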

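Syntax

Basic syntax

XML documents conventionally use the .xml file extension.
The document declaration, when present, must be the very first thing in the file, with nothing before it (XML 1.0 allows omitting it, but including it is recommended).
An XML document has exactly one root tag.
Attribute values must be enclosed in double or single quotes.
Every tag must be closed: either a start tag with a matching end tag, or a self-closing tag.
XML is case-sensitive.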
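The rules above can be checked mechanically. The following is a minimal sketch, assuming the JDK's built-in JAXP DOM parser rather than any tool the article itself relies on; the class name `WellFormedCheck` and the helper `isWellFormed` are made up for illustration. A document that breaks one of the rules makes the parser report a fatal error, which the helper turns into `false`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class WellFormedCheck {

    /** Hypothetical helper: returns true if the text parses as well-formed XML. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler, which prints fatal errors to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false; // any parse failure counts as "not well-formed"
        }
    }

    public static void main(String[] args) {
        // One root, quoted attribute, every tag closed
        System.out.println(isWellFormed("<users><user id=\"1\"/></users>"));
        // Two root tags
        System.out.println(isWellFormed("<users></users><users></users>"));
        // Attribute value without quotes
        System.out.println(isWellFormed("<users><user id=1/></users>"));
        // Start and end tag differ in case
        System.out.println(isWellFormed("<Users></users>"));
        // Missing end tag for <user>
        System.out.println(isWellFormed("<users><user></users>"));
    }
}
```

Only the first sample satisfies all the rules; each of the others violates exactly one of them.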

Quick start

<?xml version="1.0" encoding="utf-8" ?>
<users>
    <user id="1">
        <userName>海康</userName>
        <age>20</age>
    </user>
    <user id="2">
        <userName>南宁</userName>
        <age>21</age>
    </user>
</users>

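Parts of the syntax

Document declaration
1. Format: <?xml attribute-list ?> [Note: `xml` must follow `<?` immediately, and there must be no space between `?` and the angle brackets.]
  1. Attributes:
    - version: the version number; required
    - encoding: the character encoding, which tells the parser which character set the document uses; when it is omitted, the parser assumes UTF-8 (or UTF-16 if a byte order mark is present), not ISO-8859-1
    - standalone: whether the document is standalone [for reference]
      - Values: yes means standalone [does not depend on other files]; no means not standalone [depends on other files]
2. Processing instructions [for reference]
  
  <?xml-stylesheet type="text/css" href="a.css" ?>
3. Tags: tag names are user-defined
  1. Rules:
    1. Names may contain letters, digits, and some other characters
    2. Names must not start with a digit or a punctuation character (an underscore is allowed)
    3. Names must not start with the letters xml (or XML, Xml, and so on)
    4. Names must not contain spaces
4. Attributes: the value of an ID-typed attribute (such as id) must be unique within the document
5. Text: use a CDATA section; data inside it is treated as literal text and is not parsed as markup
  
  Format: <![CDATA[ data ]]>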
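To make the effect of a CDATA section concrete, here is a small sketch, assuming the JDK's built-in DOM parser; `CdataDemo` and `rootText` are illustrative names, not part of any library. It parses the same text twice, once with entity escapes and once inside a CDATA section, and reads back the text of the root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataDemo {

    /** Hypothetical helper: parses the XML and returns the text content of its root element. */
    public static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Without CDATA, < and & have to be written as entities.
        String escaped = "<code>if (a &lt; b &amp;&amp; b &lt; c) {}</code>";
        // Inside CDATA, the same characters can be written literally.
        String cdata = "<code><![CDATA[if (a < b && b < c) {}]]></code>";

        System.out.println(rootText(escaped));
        System.out.println(rootText(cdata));
    }
}
```

Both calls yield the same string, `if (a < b && b < c) {}`; the CDATA form is simply easier to write and read when the text contains many special characters.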

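Constraints in `xml`

A constraint defines the rules an XML document must follow. As users of a framework (programmers), we should be able to:
1. Reference a constraint document from an XML file
2. Read a simple constraint document

Types:

DTD: a simple constraint technology
Schema: a more complex constraint technology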

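`DTD` constraints

Steps:

Reference the DTD from the XML document [there are two ways: internal and external]
- Internal DTD: the constraint rules are defined inside the XML document
- External DTD: the constraint rules are defined in a separate .dtd file [either local or on the network]
  - Local: <!DOCTYPE root-tag-name SYSTEM "location of the dtd file">
  - Network: <!DOCTYPE root-tag-name PUBLIC "name of the dtd" "URL of the dtd file"> [Note: the name of the dtd can be chosen freely]
Write the XML file

A drawback of DTD: it cannot constrain the text inside a tag; for example, it cannot require that an age be a number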
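The internal form, where the rules sit inside the XML document itself, can be tried out without any extra files. The following is a minimal sketch, assuming the JDK's built-in JAXP parser with validation turned on; the `students` rules and the names `DtdValidationDemo` and `validate` are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationDemo {

    // Internal DTD: the rules are written inside the DOCTYPE declaration.
    static final String DOCTYPE =
            "<!DOCTYPE students [\n"
          + "  <!ELEMENT students (student*)>\n"
          + "  <!ELEMENT student (name,age,sex)>\n"
          + "  <!ELEMENT name (#PCDATA)>\n"
          + "  <!ELEMENT age (#PCDATA)>\n"
          + "  <!ELEMENT sex (#PCDATA)>\n"
          + "  <!ATTLIST student number ID #REQUIRED>\n"
          + "]>\n";

    /** Hypothetical helper: validates a students element and returns the error messages. */
    public static List<String> validate(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enable DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity errors do not stop parsing, so collect them.
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (Exception e) {
            errors.add(e.getMessage()); // not even well-formed
        }
        return errors;
    }

    public static void main(String[] args) {
        String valid = "<students><student number=\"s_1\">"
                + "<name>Haikang</name><age>20</age><sex>male</sex></student></students>";
        // age appears before name, which breaks the (name,age,sex) order
        String wrongOrder = "<students><student number=\"s_1\">"
                + "<age>20</age><name>Haikang</name><sex>male</sex></student></students>";

        System.out.println("valid document errors: " + validate(valid));
        System.out.println("wrong order errors: " + validate(wrongOrder));
    }
}
```

Note that `<age>twenty</age>` would also pass this check, which is the drawback mentioned above: `#PCDATA` accepts any text.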

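Example:

Step 1: write the DTD document

<!ELEMENT students (student*) ><!-- the root tag is students; its child tag student may appear zero or more times -->
<!ELEMENT student (name,age,sex)><!-- the child elements of student are name, age, and sex, and they must appear in this order -->
<!ELEMENT name (#PCDATA)><!-- character data (a string) -->
<!ELEMENT age (#PCDATA)>
<!ELEMENT sex (#PCDATA)>
<!ATTLIST student number ID #REQUIRED><!-- student has a required attribute number of type ID, whose value must be unique -->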

Step 2: write the XML document

<?xml version="1.0" encoding="UTF-8" ?>
<!-- the following line references the DTD document -->
<!DOCTYPE students SYSTEM "student.dtd">
<students>
    <student number="s_1">
        <name>海康</name>
        <age>20</age>
        <sex>男</sex>
    </student>
</students>

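`Schema` constraints [the file extension is `.xsd`]

Steps:

Write the root element of the XML document

Declare the xsi prefix

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" [this value is fixed]

Reference the xsd files by namespace

 xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
     http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
     http://www.springframework.org/schema/mvc http://www.springframework.org/schema/mvc/spring-mvc.xsd">
[Note: several xsd constraints can be referenced at once; the value is a whitespace-separated list of pairs, each a namespace followed by the location of its xsd file]

Declare a prefix for each xsd constraint as an identifier, so that it is clear which constraint a tag belongs to

xmlns="http://www.springframework.org/schema/beans"  the default constraint; tags written without a prefix belong to it
xmlns:context="http://www.springframework.org/schema/context"  the prefix is context
xmlns:mvc="http://www.springframework.org/schema/mvc"  the prefix is mvc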
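To see a schema enforcing something a DTD cannot, here is a minimal sketch, assuming the JDK's `javax.xml.validation` API and a small made-up schema (not the Spring schemas referenced above) in which `age` must be an integer. The names `SchemaValidationDemo` and `isValid` are illustrative only.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

class SchemaValidationDemo {

    // Hypothetical schema: <user> contains a string userName followed by an integer age.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"user\"><xs:complexType><xs:sequence>"
          + "<xs:element name=\"userName\" type=\"xs:string\"/>"
          + "<xs:element name=\"age\" type=\"xs:integer\"/>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "</xs:schema>";

    /** Hypothetical helper: returns true if the document satisfies the schema above. */
    public static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // validate throws an exception on the first validity error
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<user><userName>Haikang</userName><age>20</age></user>"));
        // "twenty" is not an integer, so the schema rejects it
        System.out.println(isValid("<user><userName>Haikang</userName><age>twenty</age></user>"));
    }
}
```

The sketch passes the schema to the validator directly instead of using `xsi:schemaLocation`, which keeps it self-contained; in a framework configuration file the schema is normally located through the attributes described above.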

Example:

<?xml version="1.0" encoding="UTF-8"?>
<!--
  xmlns: beans is the default constraint, used when a tag has no prefix
  xmlns:xsi: declares the xsi prefix
  xmlns:context: the prefix is context
  xmlns:mvc: the prefix is mvc
  xsi:schemaLocation: references several constraints at once
-->
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:mvc="http://www.springframework.org/schema/mvc"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
           http://www.springframework.org/schema/mvc http://www.springframework.org/schema/mvc/spring-mvc.xsd">

    <!-- uses the context constraint -->
    <context:annotation-config />

    <!-- uses the mvc constraint -->
    <mvc:resources mapping="/resources/**" location="/resources/" />

    <!-- no prefix: uses the default constraint -->
    <bean class="org.springframework.web.servlet.view.InternalResourceViewResolver">
        <property name="suffix" value=".jsp" />
    </bean>
</beans>

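Parsing

Working with an XML document involves two operations:

1. Parsing (reading): read the data in the document into memory
2. Writing: save data from memory into an `xml` document for persistent storage

Ways to parse `xml`:

1. DOM: loads the whole markup document into memory at once and builds a DOM tree there
   - Advantage: convenient to use; supports all CRUD operations on the document
   - Disadvantage: uses a lot of memory
2. SAX: reads the document sequentially and is event-driven
   - Advantage: uses very little memory
   - Disadvantage: read-only; cannot add, delete, or modify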
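The difference between the two approaches is easiest to see side by side. The sketch below assumes the JDK's built-in JAXP parsers; the class `DomVsSaxDemo`, its two methods, and the sample data are illustrative. Both methods extract the same `userName` values: the DOM version builds the whole tree and then queries it, while the SAX version reacts to events as the parser streams through the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomVsSaxDemo {

    static final String XML =
            "<users>"
          + "<user id=\"1\"><userName>Haikang</userName><age>20</age></user>"
          + "<user id=\"2\"><userName>Nanning</userName><age>21</age></user>"
          + "</users>";

    /** DOM: the whole tree is in memory, so it can be queried (and modified) freely. */
    public static List<String> namesWithDom() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = doc.getElementsByTagName("userName");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** SAX: no tree is built; the handler only sees start, text, and end events. */
    public static List<String> namesWithSax() {
        List<String> names = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(XML)),
                    new DefaultHandler() {
                        private StringBuilder text; // non-null only while inside <userName>

                        @Override
                        public void startElement(String uri, String localName, String qName, Attributes attrs) {
                            if ("userName".equals(qName)) text = new StringBuilder();
                        }

                        @Override
                        public void characters(char[] ch, int start, int length) {
                            if (text != null) text.append(ch, start, length);
                        }

                        @Override
                        public void endElement(String uri, String localName, String qName) {
                            if ("userName".equals(qName)) {
                                names.add(text.toString());
                                text = null;
                            }
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesWithDom());
        System.out.println(namesWithSax());
    }
}
```

The SAX handler has to track its own state (the `text` field), which is the price paid for not holding the document in memory.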

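Parsing tools

Common XML parsers:

1. `JAXP`: the parser provided by Sun; supports both the `dom` and `sax` approaches
2. `DOM4J`: an excellent, widely used parser
3. `Jsoup`: `jsoup` is a `Java` `HTML` parser that can parse a `URL` or `HTML` text directly. It provides a very convenient `API` for extracting and manipulating data using `DOM`, `CSS`, and `jQuery`-like methods. [It can also parse `xml`.]
4. `PULL`: the parser built into the `Android` operating system; it works in the `sax` style.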

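The `jsoup` parsing tool

Quick-start steps for `jsoup`:

Add the required jar to the project
Get a Document object [using the static method Jsoup.parse]
Get a collection of elements from the Document [using getElementsByTag]
Get a specific element from the collection (using ArrayList methods)
Get the text of that element [using the text method]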

package jsoup;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;

/**
 * jsoup quick start
 *
 * @author: 海康
 * @version: 1.0
 */
public class JSoupDome1 {
    public static void main(String[] args) throws IOException {
        // 1. Add the required jar to the project
        // 2. Get a Document object
        // 2.1 Locate the xml file on the classpath
        String path = JSoupDome1.class.getClassLoader().getResource("introduction.xml").getPath();
        // 2.2 Use the static method Jsoup.parse to get a Document
        Document parse = Jsoup.parse(new File(path), "utf-8");
        // 3. Get a collection of elements from the Document
        // Elements extends ArrayList<Element>, so it can be used like an ArrayList
        Elements userName = parse.getElementsByTag("userName");
        // 4. Get a specific element from the collection, using an ArrayList method
        Element element = userName.get(0);
        // 5. Get the text inside the element
        String text = element.text();
        // Print the text
        System.out.println(text);
    }
}

Classes in `jsoup`

1. Jsoup: a utility class that parses an HTML or XML document and returns a Document

The most used method is parse, which has several overloads

`static Document`	`parse(File in, String charsetName)`	Parse the contents of a file as HTML. [commonly used]

`static Document`	`parse(String html)`	Parse HTML into a Document. [for reference]

`static Document`	`parse(URL url, int timeoutMillis)`	Fetch a URL, and parse it as HTML. [gets a `Document` from a URL on the network]

Example:

@Test
public void method() throws IOException {
    // parse has three commonly used overloads, shown below

    // Overload 1 [important]: parse a file
    String path = JSoupDome1.class.getClassLoader().getResource("introduction.xml").getPath();
    Document parse = Jsoup.parse(new File(path), "utf-8");
    System.out.println(parse);

    // Overload 2 [for reference]: parse a string
    Document document = Jsoup.parse("<?xml version=\"1.0\" encoding=\"utf-8\" ?>\n" +
            "<users>\n" +
            "    <user id=\"1\">\n" +
            "        <userName>海康</userName>\n" +
            "        <age>20</age>\n" +
            "    </user>\n" +
            "</users>");
    System.out.println(document);

    // Overload 3 [important]: fetch an html or xml document from a URL
    URL url = new URL("https://www.csdn.net/");
    Document parse1 = Jsoup.parse(url, 20000); // the second argument is the timeout in milliseconds
    System.out.println(parse1);
}

2. Document: the document object; it represents the DOM tree in memory

Used to get Element or Elements objects

Common methods:

`Element`	`getElementById(String id)`	Find an element by ID, including or under this element. [returns a single `Element` by its `id`]

`Elements`	`getElementsByAttribute(String key)`	Find elements that have a named attribute set. [returns an `Elements` object by attribute name (key)]

`Elements`	`getElementsByAttributeValue(String key, String value)`	Find elements that have an attribute with the specific value [returns an `Elements` object by attribute name and value]

`Elements`	`getElementsByTag(String tagName)`	Finds elements, including and recursively under this element, with the specified tag name. [returns an `Elements` object by tag name]

Example:

@Test
public void testElement() throws IOException {
    // Load the xml file
    String path = JSoupDome1.class.getClassLoader().getResource("introduction.xml").getPath();
    Document parse = Jsoup.parse(new File(path), "utf-8");

    // Get one Element by id [the attribute must be named id; if nothing matches, the result is null rather than an error]
    Element eeje = parse.getElementById("eeje");
    System.out.println(eeje);
    System.out.println("============================");

    // Get an Elements object by tag name
    Elements users = parse.getElementsByTag("users");
    System.out.println(users);
    System.out.println("============================");

    // Get an Elements object by attribute name (key)
    Elements id = parse.getElementsByAttribute("id");
    System.out.println(id);
    System.out.println("============================");

    // Get an Elements object by attribute name and value
    Elements elements = parse.getElementsByAttributeValue("id", "eeje");
    System.out.println(elements);
}

The introduction.xml file used in this example:

<?xml version="1.0" encoding="utf-8" ?>
<users>
    <user id="1">
        <userName>海康</userName>
        <age>20</age>
    </user>
    <user id="2">
        <userName id="eeje">南宁</userName>
        <age>21</age>
    </user>
</users>

3. Elements: a collection of Element objects; it can be used as an ArrayList<Element>

Queries on a Document return an Elements object

4. Element: the element object


Document extends Element, so the methods listed for Document are in fact inherited from Element

Methods for getting child elements:

`Element`	`getElementById(String id)`	Find an element by ID, including or under this element. [returns a single `Element` by its `id`]

`Elements`	`getElementsByAttribute(String key)`	Find elements that have a named attribute set. [returns an `Elements` object by attribute name (key)]

`Elements`	`getElementsByAttributeValue(String key, String value)`	Find elements that have an attribute with the specific value [returns an `Elements` object by attribute name and value]

`Elements`	`getElementsByTag(String tagName)`	Finds elements, including and recursively under this element, with the specified tag name. [returns an `Elements` object by tag name]

Getting an attribute value:

`String`	`attr(String attributeKey)`	Get an attribute’s value by its key. [returns the `value` of the attribute with the given name]

Getting text content:

`String`	`text()`	Gets the combined text of this element and all its children [returns the text inside the tag]

`String`	`html()`	Retrieves the element’s inner HTML. [unlike `text`, `html` returns the child tags together with their text, whereas `text` returns only the text]

Example:

@Test
public void elementMethod() throws IOException {
    // Load the xml file
    String path = JSoupDome1.class.getClassLoader().getResource("introduction.xml").getPath();
    Document parse = Jsoup.parse(new File(path), "utf-8");

    // Get an Elements object
    Elements user = parse.getElementsByTag("user");
    System.out.println(user.size());
    System.out.println("==================");

    // Get a single Element [that is, one child tag]
    Elements elementsByTag = parse.getElementsByTag("user");
    Element element = elementsByTag.get(0);

    // Attribute value: look up the value of an attribute by its name
    String id = element.attr("id");
    System.out.println(id); // 1_1

    // Text content
    String text = element.text();
    String html = element.html();
    System.out.println(text); // 海康 20
    System.out.println(html);
    /*
     * Prints:
     * <username>
     *  海康
     * </username>
     * <age>
     *  20
     * </age>
     */
}

The introduction.xml file used in this example:

<?xml version="1.0" encoding="utf-8" ?>
<users>
    <user id="1_1">
        <userName>海康</userName>
        <age>20</age>
    </user>
    <user id="2">
        <userName id="eeje">南宁</userName>
        <age>21</age>
    </user>
</users>

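Node: the node object

It is the base class of Element, and therefore of Document as well

Shortcut queries

There are two ways to query:

selector: CSS-style selectors
XPath: the XML Path Language, a language for addressing parts of an XML (a subset of the Standard Generalized Markup Language) document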

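The `selector` approach

The method to use is:

`Elements`	`select(String cssQuery)`	Find elements that match the `Selector` CSS query, with this element as the starting context. [returns the elements matching the selector string]

For the syntax, see the documentation of the Selector class

Selector overview

tagname: find elements by tag, e.g. a
ns|tag: find elements by tag in a namespace, e.g. fb|name finds <fb:name> elements
#id: find elements by ID, e.g. #logo
.class: find elements by class name, e.g. .masthead
[attribute]: find elements that have the attribute, e.g. [href]
[^attr]: find elements by attribute name prefix, e.g. [^data-] finds elements with HTML5 dataset attributes
[attr=value]: find elements by attribute value, e.g. [width=500]
[attr^=value], [attr$=value], [attr*=value]: find elements whose attribute value starts with, ends with, or contains the value, e.g. [href*=/path/]
[attr~=regex]: find elements whose attribute value matches the regular expression, e.g. img[src~=(?i)\.(png|jpe?g)]
*: matches all elements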

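Combining selectors

el#id: element with ID, e.g. div#logo
el.class: element with class, e.g. div.masthead
el[attr]: element with attribute, e.g. a[href]
Any combination, e.g. a[href].highlight
ancestor child: descendants of an element, e.g. .body p finds all p elements anywhere under an element with class "body"
parent > child: direct children of a parent, e.g. div.content > p finds p elements, and body > * finds all direct children of the body tag
siblingA + siblingB: sibling B immediately preceded by sibling A, e.g. div.head + div
siblingA ~ siblingX: sibling X preceded by sibling A, e.g. h1 ~ p
el, el, el: a group of selectors; finds the unique elements that match any of them, e.g. div.masthead, div.logo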

Example:

public static void main(String[] args) throws IOException {
    URL resource = JsoupSelector.class.getClassLoader().getResource("introduction.xml");
    String path = resource.getPath();
    Document parse = Jsoup.parse(new File(path), "utf-8");
    // Note: queries usually start from the Document, because it extends Element and offers more functionality

    // Select elements whose tag name is userName
    Elements userName = parse.select("userName");
    System.out.println(userName);
    System.out.println("======================");

    // Select the element whose id is 1_1
    Elements id = parse.select("#1_1");
    System.out.println(id);
    System.out.println("========================");

    // Select the age child of the user tag whose id attribute is 1_1
    // Step 1: select the user tag whose id attribute is 1_1
    Elements select = parse.select("user[id='1_1']");
    System.out.println(select);
    System.out.println("===============");

    // Step 2: select the age child tag of that user tag
    Elements age = parse.select("user[id='1_1'] > age");
    System.out.println(age);
}

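The `XPath` query method

XPath is the XML Path Language, a language for addressing parts of an XML document

Steps for using XPath:

Add the extra jar that provides XPath support for jsoup
In the setup used here, the Document object does not support XPath syntax, so create a JXDocument and pass the DOM tree to it [in essence, JXDocument operates on that DOM tree]
Get a tag object and work with it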

Example:

/**
 * @author: 海康
 * @version: 1.0
 */
// JXDocument, JXNode, and XpathSyntaxErrorException come from the extra XPath jar;
// the jsoup, java.io, and java.util.List imports are omitted here
public class XPath {
    public static void main(String[] args) throws IOException, XpathSyntaxErrorException {
        // Load the introduction.xml file
        String path = XPath.class.getClassLoader().getResource("introduction.xml").getPath();
        Document parse = Jsoup.parse(new File(path), "utf-8"); // get the DOM tree

        // Document does not support XPath syntax here, so pass the DOM tree to a JXDocument
        JXDocument jxDocument = new JXDocument(parse);

        // Select all user tags
        List<JXNode> user = jxDocument.selN("//user");
        // Iterate over all user tags
        for (JXNode jxNode : user) {
            System.out.println(jxNode);
        }
        System.out.println("====================");

        // Select all userName tags under user tags
        List<JXNode> userName = jxDocument.selN("//user/userName");
        // Iterate over all userName tags
        for (JXNode jxNode : userName) {
            System.out.println(jxNode);
        }

        // Select the userName tags under user that have an id attribute
        List<JXNode> jxNodes = jxDocument.selN("//user/userName[@id]");
        System.out.println(jxNodes);

        // Select the userName tag under user whose id attribute is eeje
        JXNode jxNode = jxDocument.selNOne("//user/userName[@id='eeje']");
        System.out.println(jxNode);
    }
}
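The same XPath expressions can also be tried without any extra jar. The following is a minimal sketch, assuming the JDK's own `javax.xml.xpath` API instead of the JXDocument class used above; `JdkXPathDemo`, `select`, and the sample data are illustrative names and values.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class JdkXPathDemo {

    static final String XML =
            "<users>"
          + "<user id=\"1_1\"><userName>Haikang</userName><age>20</age></user>"
          + "<user id=\"2\"><userName id=\"eeje\">Nanning</userName><age>21</age></user>"
          + "</users>";

    /** Hypothetical helper: returns the text of every matching node, joined by commas. */
    public static String select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                if (i > 0) out.append(',');
                out.append(nodes.item(i).getTextContent());
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("bad expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("//user/userName"));             // all userName tags under user
        System.out.println(select("//user/userName[@id]"));        // only those with an id attribute
        System.out.println(select("//user/userName[@id='eeje']")); // id attribute equal to eeje
        System.out.println(select("//user[@id='1_1']/age"));       // age of the user with id 1_1
    }
}
```

Unlike the jsoup examples, this sketch parses the text as strict XML, so tag names keep their original case and the document must be well-formed.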