[Paper Reading] MIMICS: A Large-Scale Data Collection for Search Clarification


Table of Contents

    • Motivation
    • Intro
    • Contribution
    • MIMICS-Click
    • MIMICS-ClickExplore
    • MIMICS-Manual
    • Data Analysis
      • Question Template Analysis
      • Analyzing Engagement Based on Clarification Impression
      • Analysis Based on Query Length
      • Analysis Based on the Number of Candidate Answers
      • Analyzing Click Entropy Distribution on Candidate Answers (confusing)

Original paper: MIMICS: A Large-Scale Data Collection for Search Clarification (arxiv.org)

Motivation

The research community still lacks large-scale data for studying different aspects of search clarification.

Intro


  • each clarification in MIMICS consists of a clarifying question and up to five candidate answers
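To make the record structure concrete, here is a minimal sketch of what one MIMICS entry carries. The field names and example values are my illustration, not the released schema:

```python
# Hypothetical sketch of a single MIMICS entry; field names and values
# are assumed for illustration and may not match the released TSV schema.
record = {
    "query": "headaches",                      # the user's search query
    "question": "What do you want to know about this medical condition?",
    "options": ["symptoms", "treatment", "causes", "diagnosis"],  # up to 5
    "impression_level": "high",                # how often the pane was shown
    "engagement_level": 7,                     # clickthrough-based signal
    "option_cctr": [0.38, 0.29, 0.21, 0.12],   # conditional click prob. per answer
}
```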


All the datasets presented in this paper contain only queries from the en-US market.

MIMICS-Click and MIMICS-ClickExplore are based on user interaction in Bing.

MIMICS-Manual is based on manual annotations of clarification panes by multiple trained annotators.

Contribution

Created MIMICS, which consists of three datasets:

  • MIMICS-Click: includes over 400k unique queries with the associated clarification panes.
  • MIMICS-ClickExplore: is an exploration dataset that contains multiple clarification panes per query. It includes over 60k unique queries.
  • MIMICS-Manual: is a smaller dataset with manual annotations for clarifying questions, candidate answer sets, and the landing result page after clicking on individual candidate answers.

MIMICS-Click

  • only kept the queries for which a clarification pane was rendered in the search engine result page (SERP).

  • the clarification panes were generated solely from the submitted queries; therefore, they do not include session or personalized information

  • This resulted in 414,362 unique queries, each associated with exactly one clarification pane. Of these, 71,188 clarifications received positive clickthrough rates.
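As a quick illustration of how such counts could be reproduced, a minimal pandas sketch; the file name and column names are assumptions based on the description above, not a verified schema:

```python
import pandas as pd

# Load MIMICS-Click and count how many queries received positive engagement.
df = pd.read_csv("MIMICS-Click.tsv", sep="\t")

total = df["query"].nunique()                                     # ~414,362
positive = df.loc[df["engagement_level"] > 0, "query"].nunique()  # ~71,188
print(f"{positive}/{total} queries have positively engaged clarifications")
```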


MIMICS-ClickExplore

Although MIMICS-Click is an invaluable resource for learning to generate clarifications and for related research problems, it does not allow researchers to study certain tasks, such as click bias in user interactions with clarification.

  • we used the top-m clarifications generated by our algorithms and presented them to different sets of users. The user interactions with multiple clarification panes for the same query in the same time period enable a comparison of these clarification panes

  • The resulting dataset contains 64,007 unique queries and 168,921 query-clarification pairs. Of these, 89,441 query-clarification pairs received positive engagement.

  • Note that the sampling strategies for MIMICS-Click and MIMICS-ClickExplore differ, which resulted in significantly more query-clarification pairs with low impressions in MIMICS-Click.
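A sketch of the comparison use case this design enables: group the pairs by query and rank each query's panes by observed engagement. Column names are again assumptions, not a verified schema:

```python
import pandas as pd

df = pd.read_csv("MIMICS-ClickExplore.tsv", sep="\t")

# Keep only queries that were shown more than one clarification pane.
multi = df.groupby("query").filter(lambda g: len(g) > 1)

# Within each query, order panes from most to least engaged; rows for the
# same query then yield preference pairs for learning-to-rank clarifications.
ranked = multi.sort_values(["query", "engagement_level"],
                           ascending=[True, False])
```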

MIMICS-Manual

Clicks do not necessarily reflect all quality aspects; in addition, they can be biased for many reasons.

  • we randomly sampled queries from the query logs to collect manual annotations for a set of realistic user queries
  • we further used the same algorithm to generate one or more clarification panes for each query
  • each query-clarification pair was assigned to at least three annotators

step1: the annotators were asked to skim and review a few pages of the search results returned by Bing

step2: each clarifying question is given a label of 2 (Good), 1 (Fair), or 0 (Bad); the candidate answers are not shown to the annotators at this stage

step3: the annotators were asked to judge the overall quality of the candidate answer set (again 2, 1, or 0)

step4: the annotators judged the quality of the landing SERP (the SERP shown after clicking an individual candidate answer) for each candidate answer

  • Our annotations resulted in over 2.4k unique queries and over 2.8k query-clarification pairs

Note: in case a generic template is used instead of a clarifying question (i.e., “select one to refine your search”), the annotators are not asked to provide a question quality label.
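Since each pair was judged by at least three annotators, the per-annotator 0/1/2 labels must be collapsed into one label per pair somehow; the write-up above does not specify the aggregation, so the median here is purely an illustrative choice:

```python
from statistics import median

def aggregate(labels: list[int]) -> int:
    """Collapse 0 (Bad) / 1 (Fair) / 2 (Good) labels from several annotators.
    The median is an assumed aggregation, not the paper's documented one."""
    assert len(labels) >= 3, "each pair was judged by at least three annotators"
    return int(median(labels))

print(aggregate([2, 1, 2]))  # -> 2 (Good)
```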

Data Analysis

Question Template Analysis


  • the last four templates (T4–T7), which are less frequent in the dataset and more specific, have led to higher engagement than T1, T2, and T3 in both MIMICS-Click and MIMICS-ClickExplore
  • the exploration dataset has higher average engagement than MIMICS-Click
    • the reason is that the number of query-clarification pairs with zero engagement is higher in MIMICS-Click than in MIMICS-ClickExplore

Analyzing Engagement Based on Clarification Impression


MIMICS-Click and MIMICS-ClickExplore contain a three-level impression label (low, medium, or high) per query-clarification pair

  • there is a negligible difference in average engagement across impression levels

Analysis Based on Query Length

We study user engagement and manual quality labels with respect to query length (a minimal grouping sketch follows the list below)


  • the average engagement increases as the queries get longer
    • longer queries are often natural language questions, while short queries are keyword queries
  • this is inconsistent with the manual annotations, which suggest that single-word queries have higher question quality, answer set quality, and landing page quality
    • this observation suggests that user engagement with clarification is not necessarily aligned with clarification quality
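A minimal sketch of this breakdown, bucketing queries by word count and averaging engagement per bucket (column names assumed as before):

```python
import pandas as pd

df = pd.read_csv("MIMICS-Click.tsv", sep="\t")
df["query_len"] = df["query"].str.split().str.len()   # words per query

# Average engagement per query length; per the figure, this tends to
# increase as queries get longer.
print(df.groupby("query_len")["engagement_level"].mean())
```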

Analysis Based on the Number of Candidate Answers


  • there is only a small difference between the average engagements in the MIMICS-Click and MIMICS-ClickExplore datasets
    • the clarifications with three candidate answers have led to slightly higher engagement than the rest
  • the clarifications with three candidate answers have the best question quality but the worst answer set quality
    • this highlights that question quality may play a key role in increasing user engagement

Analyzing Click Entropy Distribution on Candidate Answers (confusing)

MIMICS-Click and MIMICS-ClickExplore both contain the conditional click probability for each individual answer.

The entropy of this probability distribution demonstrates how clicks are distributed across candidate answers.
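Concretely, if $p_i$ is the conditional click probability of candidate answer $i$ out of $n$, the quantity plotted is the Shannon entropy (the exact log base used in the paper only rescales the axis):

$$H(p) = -\sum_{i=1}^{n} p_i \log_2 p_i$$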


  • the number of peaks in the entropy distribution is aligned with the number of candidate answers
    • the entropy values at which the histogram peaks suggest that, in many cases, clicks follow a uniform-like distribution over m out of the n candidate answers (a numeric check of this follows below)
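The peak locations drop out of a one-line computation: a uniform distribution over m answers has entropy exactly log2(m), so each possible m contributes its own peak. A quick numeric check:

```python
import math

# Entropy of a uniform click distribution over m candidate answers is
# log2(m), which is where the histogram peaks cluster.
for m in range(1, 6):
    p = [1.0 / m] * m
    h = -sum(pi * math.log2(pi) for pi in p)
    print(f"m={m}: H={h:.2f} bits")   # 0.00, 1.00, 1.58, 2.00, 2.32
```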
