A Collection of BERT-related Papers

Since Google open-sourced the BERT pre-trained model in October 2018, it has greatly advanced the AI field, and NLP in particular; papers building on BERT have appeared in rapid succession. The papers below are grouped into the following categories: **Downstream task**, **Generation**, **Modification** (multi-task, masking strategy, etc.), **Transformer variants**, **Probe**, **Inside BERT**, **Multi-lingual**, **Other than English models**, **Domain specific**, **Multi-modal**, **Model compression**, and **Misc.** (miscellaneous papers).
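
For readers new to this area, here is a minimal sketch of loading the open-sourced pre-trained BERT with the Hugging Face `transformers` library; the library, checkpoint name, and example sentence are assumptions of this illustration, not something taken from the papers listed below.

```python
# Minimal sketch (assumes `transformers` and `torch` are installed and uses the
# `bert-base-uncased` checkpoint); not taken from any paper listed here.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# Encode one sentence and obtain contextual token embeddings.
inputs = tokenizer("BERT was open-sourced by Google in 2018.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Shape: (batch_size, sequence_length, hidden_size), e.g. roughly (1, 12, 768).
print(outputs.last_hidden_state.shape)
```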

Table of Contents

  • Downstream task
  • Generation
  • Modification (multi-task, masking strategy, etc.)
  • Transformer variants
  • Probe
  • Inside BERT
  • Multi-lingual
  • Other than English models
  • Domain specific
  • Multi-modal
  • Model compression
  • Misc.

1、Downstream task

1.1、QA, MC, Dialogue

  • A BERT Baseline for the Natural Questions
  • MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension (ACL2019)
  • Unsupervised Domain Adaptation on Reading Comprehension
  • BERTQA -- Attention on Steroids
  • A Multi-Type Multi-Span Network for Reading Comprehension that Requires Discrete Reasoning (EMNLP2019)
  • SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering
  • Multi-hop Question Answering via Reasoning Chains
  • Select, Answer and Explain: Interpretable Multi-hop Reading Comprehension over Multiple Documents
  • Multi-step Entity-centric Information Retrieval for Multi-Hop Question Answering (EMNLP2019 WS)
  • End-to-End Open-Domain Question Answering with BERTserini (NAACL2019)
  • Latent Retrieval for Weakly Supervised Open Domain Question Answering (ACL2019)
  • Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering (EMNLP2019)
  • Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering (ICLR2020)
  • Learning to Ask Unanswerable Questions for Machine Reading Comprehension (ACL2019)
  • Unsupervised Question Answering by Cloze Translation (ACL2019)
  • Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation
  • A Recurrent BERT-based Model for Question Generation (EMNLP2019 WS)
  • Learning to Answer by Learning to Ask: Getting the Best of GPT-2 and BERT Worlds
  • Enhancing Pre-Trained Language Representations with Rich Knowledge for Machine Reading Comprehension (ACL2019)
  • Incorporating Relation Knowledge into Commonsense Reading Comprehension with Multi-task Learning (CIKM2019)
  • SG-Net: Syntax-Guided Machine Reading Comprehension
  • MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension
  • Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning (EMNLP2019)
  • ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning (ICLR2020)
  • Robust Reading Comprehension with Linguistic Constraints via Posterior Regularization
  • BAS: An Answer Selection Method Using BERT Language Model
  • Beat the AI: Investigating Adversarial Human Annotations for Reading Comprehension
  • A Simple but Effective Method to Incorporate Multi-turn Context with BERT for Conversational Machine Comprehension (ACL2019 WS)
  • FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension (ACL2019 WS)
  • BERT with History Answer Embedding for Conversational Question Answering (SIGIR2019)
  • GraphFlow: Exploiting Conversation Flow with Graph Neural Networks for Conversational Machine Comprehension (ICML2019 WS)
  • Beyond English-only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian (RANLP2019)
  • XQA: A Cross-lingual Open-domain Question Answering Dataset (ACL2019)
  • Cross-Lingual Machine Reading Comprehension (EMNLP2019)
  • Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model
  • Multilingual Question Answering from Formatted Text applied to Conversational Agents
  • BiPaR: A Bilingual Parallel Dataset for Multilingual and Cross-lingual Reading Comprehension on Novels (EMNLP2019)
  • MLQA: Evaluating Cross-lingual Extractive Question Answering
  • Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension (TACL)
  • SberQuAD - Russian Reading Comprehension Dataset: Description and Analysis
  • Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension (EMNLP2019)
  • BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer (Interspeech2019)
  • Dialog State Tracking: A Neural Reading Comprehension Approach
  • A Simple but Effective BERT Model for Dialog State Tracking on Resource-Limited Systems (ICASSP2020)
  • Fine-Tuning BERT for Schema-Guided Zero-Shot Dialogue State Tracking
  • Goal-Oriented Multi-Task BERT-Based Dialogue State Tracker
  • Domain Adaptive Training BERT for Response Selection
  • BERT Goes to Law School: Quantifying the Competitive Advantage of Access to Large Legal Corpora in Contract Understanding

1.2、Slot filling

  • BERT for Joint Intent Classification and Slot Filling
  • Multi-lingual Intent Detection and Slot Filling in a Joint BERT-based Model
  • A Comparison of Deep Learning Methods for Language Understanding (Interspeech2019)

1.3、Analysis

  • Fine-grained Information Status Classification Using Discourse Context-Aware Self-Attention
  • Neural Aspect and Opinion Term Extraction with Mined Rules as Weak Supervision (ACL2019)
  • BERT-based Lexical Substitution (ACL2019)
  • Assessing BERT’s Syntactic Abilities
  • Does BERT agree? Evaluating knowledge of structure dependence through agreement relations
  • Simple BERT Models for Relation Extraction and Semantic Role Labeling
  • LIMIT-BERT : Linguistic Informed Multi-Task BERT
  • A Simple BERT-Based Approach for Lexical Simplification
  • Multi-headed Architecture Based on BERT for Grammatical Errors Correction (ACL2019 WS)
  • Towards Minimal Supervision BERT-based Grammar Error Correction
  • BERT-Based Arabic Social Media Author Profiling
  • Sentence-Level BERT and Multi-Task Learning of Age and Gender in Social Media
  • Evaluating the Factual Consistency of Abstractive Text Summarization
  • NegBERT: A Transfer Learning Approach for Negation Detection and Scope Resolution
  • xSLUE: A Benchmark and Analysis Platform for Cross-Style Language Understanding and Evaluation
  • TabFact: A Large-scale Dataset for Table-based Fact Verification
  • Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents
  • LAMBERT: Layout-Aware language Modeling using BERT for information extraction
  • Keyphrase Extraction from Scholarly Articles as Sequence Labeling using Contextualized Embeddings (ECIR2020) [github]
  • Keyphrase Extraction with Span-based Feature Representations
  • What do you mean, BERT? Assessing BERT as a Distributional Semantics Model

1.4、Word segmentation, parsing, NER

  • BERT Meets Chinese Word Segmentation
  • Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning
  • Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT
  • Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing
  • NEZHA: Neural Contextualized Representation for Chinese Language Understanding
  • Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing -- A Tale of Two Parsers Revisited (EMNLP2019)
  • Parsing as Pretraining (AAAI2020)
  • Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing
  • Named Entity Recognition -- Is there a glass ceiling? (CoNLL2019)
  • A Unified MRC Framework for Named Entity Recognition
  • Training Compact Models for Low Resource Entity Tagging using Pre-trained Language Models
  • Robust Named Entity Recognition with Truecasing Pretraining (AAAI2020)
  • LTP: A New Active Learning Strategy for Bert-CRF Based Named Entity Recognition
  • MT-BioNER: Multi-task Learning for Biomedical Named Entity Recognition using Deep Bidirectional Transformers
  • Portuguese Named Entity Recognition using BERT-CRF
  • Towards Lingua Franca Named Entity Recognition with BERT

1.5、Pronoun/coreference resolution

  • Resolving Gendered Ambiguous Pronouns with BERT (ACL2019 WS)
  • Anonymized BERT: An Augmentation Approach to the Gendered Pronoun Resolution Challenge (ACL2019 WS)
  • Gendered Pronoun Resolution using BERT and an extractive question answering formulation (ACL2019 WS)
  • MSnet: A BERT-based Network for Gendered Pronoun Resolution (ACL2019 WS)
  • Fill the GAP: Exploiting BERT for Pronoun Resolution (ACL2019 WS)
  • On GAP Coreference Resolution Shared Task: Insights from the 3rd Place Solution (ACL2019 WS)
  • Look Again at the Syntax: Relational Graph Convolutional Network for Gendered Ambiguous Pronoun Resolution (ACL2019 WS)
  • BERT Masked Language Modeling for Co-reference Resolution (ACL2019 WS)
  • Coreference Resolution with Entity Equalization (ACL2019)
  • BERT for Coreference Resolution: Baselines and Analysis (EMNLP2019) [github]
  • WikiCREM: A Large Unsupervised Corpus for Coreference Resolution (EMNLP2019)
  • Ellipsis and Coreference Resolution as Question Answering
  • Coreference Resolution as Query-based Span Prediction

1.6、Word sense disambiguation

  • GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge (EMNLP2019)
  • Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations (EMNLP2019)
  • Using BERT for Word Sense Disambiguation
  • Language Modelling Makes Sense: Propagating Representations through WordNet for Full-Coverage Word Sense Disambiguation (ACL2019)
  • Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings (KONVENS2019)

1.7、Sentiment analysis

  • Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence (NAACL2019)
  • BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis (NAACL2019)
  • Exploiting BERT for End-to-End Aspect-based Sentiment Analysis (EMNLP2019 WS)
  • Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification
  • An Investigation of Transfer Learning-Based Sentiment Analysis in Japanese (ACL2019)
  • "Mask and Infill" : Applying Masked Language Model to Sentiment Transfer
  • Adversarial Training for Aspect-Based Sentiment Analysis with BERT
  • Utilizing BERT Intermediate Layers for Aspect Based Sentiment Analysis and Natural Language Inference

1.8、Relation extraction

  • Matching the Blanks: Distributional Similarity for Relation Learning (ACL2019)
  • BERT-Based Multi-Head Selection for Joint Entity-Relation Extraction (NLPCC2019)
  • Enriching Pre-trained Language Model with Entity Information for Relation Classification
  • Span-based Joint Entity and Relation Extraction with Transformer Pre-training
  • Fine-tune Bert for DocRED with Two-step Process
  • Entity, Relation, and Event Extraction with Contextualized Span Representations (EMNLP2019)
  • Fine-tuning BERT for Joint Entity and Relation Extraction in Chinese Medical Text

1.9、Knowledge base

  • KG-BERT: BERT for Knowledge Graph Completion
  • Language Models as Knowledge Bases? (EMNLP2019) [github]
  • BERT is Not a Knowledge Base (Yet): Factual Knowledge vs. Name-Based Reasoning in Unsupervised QA
  • Inducing Relational Knowledge from BERT (AAAI2020)
  • Latent Relation Language Models (AAAI2020)
  • Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model (ICLR2020)
  • Zero-shot Entity Linking with Dense Entity Retrieval
  • Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking (CoNLL2019)
  • Improving Entity Linking by Modeling Latent Entity Type Information (AAAI2020)
  • How Can We Know What Language Models Know?
  • REALM: Retrieval-Augmented Language Model Pre-Training

1.10、Text classification

  • How to Fine-Tune BERT for Text Classification?
  • X-BERT: eXtreme Multi-label Text Classification with BERT
  • DocBERT: BERT for Document Classification
  • Enriching BERT with Knowledge Graph Embeddings for Document Classification
  • Classification and Clustering of Arguments with Contextualized Word Embeddings (ACL2019)
  • BERT for Evidence Retrieval and Claim Verification
  • Conditional BERT Contextual Augmentation
  • Stacked DeBERT: All Attention in Incomplete Data for Text Classification

1.11、WSC, WNLI, NLI

  • Exploring Unsupervised Pretraining and Sentence Structure Modelling for Winograd Schema Challenge
  • A Surprisingly Robust Trick for the Winograd Schema Challenge
  • WinoGrande: An Adversarial Winograd Schema Challenge at Scale (AAAI2020)
  • Improving Natural Language Inference with a Pretrained Parser
  • Adversarial NLI: A New Benchmark for Natural Language Understanding
  • Adversarial Analysis of Natural Language Inference Systems (ICSC2020)
  • HypoNLI: Exploring the Artificial Patterns of Hypothesis-only Bias in Natural Language Inference (LREC2020)
  • Evaluating BERT for natural language inference: A case study on the CommitmentBank (EMNLP2019)

1.12、Commonsense

  • CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge (NAACL2019)
  • HellaSwag: Can a Machine Really Finish Your Sentence? (ACL2019) [website]
  • Story Ending Prediction by Transferable BERT (IJCAI2019)
  • Explain Yourself! Leveraging Language Models for Commonsense Reasoning (ACL2019)
  • Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models
  • Informing Unsupervised Pretraining with External Linguistic Knowledge
  • Commonsense Knowledge + BERT for Level 2 Reading Comprehension Ability Test
  • BIG MOOD: Relating Transformers to Explicit Commonsense Knowledge
  • Commonsense Knowledge Mining from Pretrained Models (EMNLP2019)
  • KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning (EMNLP2019)
  • Cracking the Contextual Commonsense Code: Understanding Commonsense Reasoning Aptitude of Deep Contextual Representations (EMNLP2019 WS)
  • Do Massively Pretrained Language Models Make Better Storytellers? (CoNLL2019)
  • PIQA: Reasoning about Physical Commonsense in Natural Language (AAAI2020)
  • Evaluating Commonsense in Pre-trained Language Models (AAAI2020)
  • Why Do Masked Neural Language Models Still Need Common Sense Knowledge?
  • Do Neural Language Representations Learn Physical Commonsense? (CogSci2019)

1.13、Extractive summarization

  • HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization (ACL2019)
  • Deleter: Leveraging BERT to Perform Unsupervised Successive Text Compression
  • Discourse-Aware Neural Extractive Model for Text Summarization

1.14、IR

  • Passage Re-ranking with BERT
  • Investigating the Successes and Failures of BERT for Passage Re-Ranking
  • Understanding the Behaviors of BERT in Ranking
  • Document Expansion by Query Prediction
  • CEDR: Contextualized Embeddings for Document Ranking (SIGIR2019)
  • Deeper Text Understanding for IR with Contextual Neural Language Modeling (SIGIR2019)
  • FAQ Retrieval using Query-Question Similarity and BERT-Based Query-Answer Relevance (SIGIR2019)
  • Multi-Stage Document Ranking with BERT

2、Generation

  • BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model (NAACL2019 WS)
  • Pretraining-Based Natural Language Generation for Text Summarization
  • Text Summarization with Pretrained Encoders (EMNLP2019) [github (original)] [github (huggingface)]
  • Multi-stage Pretraining for Abstractive Summarization
  • PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
  • MASS: Masked Sequence to Sequence Pre-training for Language Generation (ICML2019) [github], [github]
  • Unified Language Model Pre-training for Natural Language Understanding and Generation [github] (NeurIPS2019)
  • UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training [github]
  • ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
  • Towards Making the Most of BERT in Neural Machine Translation
  • Improving Neural Machine Translation with Pre-trained Representation
  • On the use of BERT for Neural Machine Translation (EMNLP2019 WS)
  • Incorporating BERT into Neural Machine Translation (ICLR2020)
  • Recycling a Pre-trained BERT Encoder for Neural Machine Translation
  • Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
  • Mask-Predict: Parallel Decoding of Conditional Masked Language Models (EMNLP2019)
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
  • ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation
  • Cross-Lingual Natural Language Generation via Pre-Training (AAAI2020) [github]
  • Multilingual Denoising Pre-training for Neural Machine Translation
  • PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable
  • Unsupervised Pre-training for Natural Language Generation: A Literature Review

3、Modification (multi-task, masking strategy, etc.)

  • Multi-Task Deep Neural Networks for Natural Language Understanding (ACL2019)
  • The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
  • BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning (ICML2019)
  • Unifying Question Answering and Text Classification via Span Extraction
  • ERNIE: Enhanced Language Representation with Informative Entities (ACL2019)
  • ERNIE: Enhanced Representation through Knowledge Integration
  • ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (AAAI2020)
  • Pre-Training with Whole Word Masking for Chinese BERT
  • SpanBERT: Improving Pre-training by Representing and Predicting Spans [github]
  • Blank Language Models
  • Efficient Training of BERT by Progressively Stacking (ICML2019) [github]
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach [github]
  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (ICLR2020)
  • ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (ICLR2020)
  • FreeLB: Enhanced Adversarial Training for Language Understanding (ICLR2020)
  • KERMIT: Generative Insertion-Based Modeling for Sequences
  • DisSent: Sentence Representation Learning from Explicit Discourse Relations (ACL2019)
  • StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (ICLR2020)
  • Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding
  • SenseBERT: Driving Some Sense into BERT
  • Semantics-aware BERT for Language Understanding (AAAI2020)
  • K-BERT: Enabling Language Representation with Knowledge Graph
  • Knowledge Enhanced Contextual Word Representations (EMNLP2019)
  • KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation
  • Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (EMNLP2019)
  • SBERT-WK: A Sentence Embedding Method By Dissecting BERT-based Word Models
  • Universal Text Representation from BERT: An Empirical Study
  • Symmetric Regularization based BERT for Pair-wise Semantic Reasoning
  • Transfer Fine-Tuning: A BERT Case Study (EMNLP2019)
  • Improving Pre-Trained Multilingual Models with Vocabulary Expansion (CoNLL2019)
  • SesameBERT: Attention for Anywhere
  • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [github]
  • SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

4、Transformer variants

  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (ACL2019) [github]
  • Compressive Transformers for Long-Range Sequence Modelling
  • The Evolved Transformer (ICML2019)
  • Reformer: The Efficient Transformer (ICLR2020) [github]
  • GRET: Global Representation Enhanced Transformer (AAAI2020)
  • Transformer on a Diet [github]

5、Probe

  • A Structural Probe for Finding Syntax in Word Representations (NAACL2019)
  • Linguistic Knowledge and Transferability of Contextual Representations (NAACL2019) [github]
  • Probing What Different NLP Tasks Teach Machines about Function Word Comprehension (*SEM2019)
  • BERT Rediscovers the Classical NLP Pipeline (ACL2019)
  • Probing Neural Network Comprehension of Natural Language Arguments (ACL2019)
  • Cracking the Contextual Commonsense Code: Understanding Commonsense Reasoning Aptitude of Deep Contextual Representations (EMNLP2019 WS)
  • What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
  • Quantity doesn't buy quality syntax with neural language models (EMNLP2019)
  • Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction (ICLR2020)
  • oLMpics -- On what Language Model Pre-training Captures
  • How Much Knowledge Can You Pack Into the Parameters of a Language Model?
  • What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge

6、Inside BERT

  • What does BERT learn about the structure of language? (ACL2019)
  • Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned (ACL2019) [github]
  • Open Sesame: Getting Inside BERT's Linguistic Knowledge (ACL2019 WS)
  • Analyzing the Structure of Attention in a Transformer Language Model (ACL2019 WS)
  • What Does BERT Look At? An Analysis of BERT's Attention (ACL2019 WS)
  • Do Attention Heads in BERT Track Syntactic Dependencies?
  • Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains (ACL2019 WS)
  • Inducing Syntactic Trees from BERT Representations (ACL2019 WS)
  • A Multiscale Visualization of Attention in the Transformer Model (ACL2019 Demo)
  • Visualizing and Measuring the Geometry of BERT
  • How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings (EMNLP2019)
  • Are Sixteen Heads Really Better than One? (NeurIPS2019)
  • On the Validity of Self-Attention as Explanation in Transformer Models
  • Visualizing and Understanding the Effectiveness of BERT (EMNLP2019)
  • Attention Interpretability Across NLP Tasks
  • Revealing the Dark Secrets of BERT (EMNLP2019)
  • Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs (EMNLP2019)
  • The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives (EMNLP2019)
  • A Primer in BERTology: What we know about how BERT works
  • Do NLP Models Know Numbers? Probing Numeracy in Embeddings (EMNLP2019)
  • How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations (CIKM2019)
  • Whatcha lookin' at? DeepLIFTing BERT's Attention in Question Answering
  • What does BERT Learn from Multiple-Choice Reading Comprehension Datasets?
  • exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformers Models [github]

7、Multi-lingual

  • Multilingual Constituency Parsing with Self-Attention and Pre-Training (ACL2019)
  • Cross-lingual Language Model Pretraining (NeurIPS2019) [github]
  • 75 Languages, 1 Model: Parsing Universal Dependencies Universally (EMNLP2019) [github]
  • Zero-shot Dependency Parsing with Pre-trained Multilingual Sentence Representations (EMNLP2019 WS)
  • Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT (EMNLP2019)
  • How multilingual is Multilingual BERT? (ACL2019)
  • How Language-Neutral is Multilingual BERT?
  • Is Multilingual BERT Fluent in Language Generation?
  • Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks (EMNLP2019)
  • BERT is Not an Interlingua and the Bias of Tokenization (EMNLP2019 WS)
  • Cross-Lingual Ability of Multilingual BERT: An Empirical Study (ICLR2020)
  • Multilingual Alignment of Contextual Word Representations (ICLR2020)
  • On the Cross-lingual Transferability of Monolingual Representations
  • Unsupervised Cross-lingual Representation Learning at Scale
  • Emerging Cross-lingual Structure in Pretrained Language Models
  • Can Monolingual Pretrained Models Help Cross-Lingual Classification?
  • Fully Unsupervised Crosslingual Semantic Textual Similarity Metric Based on BERT for Identifying Parallel Data (CoNLL2019)

8、Other than English models

  • CamemBERT: a Tasty French Language Model
  • FlauBERT: Unsupervised Language Model Pre-training for French
  • Multilingual is not enough: BERT for Finnish
  • BERTje: A Dutch BERT Model
  • RobBERT: a Dutch RoBERTa-based Language Model
  • Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language
  • AraBERT: Transformer-based Model for Arabic Language Understanding
  • PhoBERT: Pre-trained language models for Vietnamese
  • CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model

9、Domain specific

  • BioBERT: a pre-trained biomedical language representation model for biomedical text mining
  • Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets (ACL2019 WS)
  • BERT-based Ranking for Biomedical Entity Normalization
  • PubMedQA: A Dataset for Biomedical Research Question Answering (EMNLP2019)
  • Pre-trained Language Model for Biomedical Question Answering
  • How to Pre-Train Your Model? Comparison of Different Pre-Training Models for Biomedical Question Answering
  • ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission
  • Publicly Available Clinical BERT Embeddings (NAACL2019 WS)
  • Progress Notes Classification and Keyword Extraction using Attention-based Deep Learning Models with BERT
  • SciBERT: Pretrained Contextualized Embeddings for Scientific Text [github]
  • PatentBERT: Patent Classification with Fine-Tuning a pre-trained BERT Model

10、Multi-modal

  • VideoBERT: A Joint Model for Video and Language Representation Learning (ICCV2019)
  • ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (NeurIPS2019)
  • VisualBERT: A Simple and Performant Baseline for Vision and Language
  • Selfie: Self-supervised Pretraining for Image Embedding
  • ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
  • Contrastive Bidirectional Transformer for Temporal Representation Learning
  • M-BERT: Injecting Multimodal Information in the BERT Structure
  • LXMERT: Learning Cross-Modality Encoder Representations from Transformers (EMNLP2019)
  • Fusion of Detected Objects in Text for Visual Question Answering (EMNLP2019)
  • BERT representations for Video Question Answering (WACV2020)
  • Unified Vision-Language Pre-Training for Image Captioning and VQA [github]
  • Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
  • VL-BERT: Pre-training of Generic Visual-Linguistic Representations (ICLR2020)
  • Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
  • UNITER: Learning UNiversal Image-TExt Representations
  • Supervised Multimodal Bitransformers for Classifying Images and Text
  • Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks
  • BERT Can See Out of the Box: On the Cross-modal Transferability of Text Representations
  • BERT for Large-scale Video Segment Classification with Test-time Augmentation (ICCV2019WS)
  • SpeechBERT: Cross-Modal Pre-trained Language Model for End-to-end Spoken Question Answering
  • vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
  • Effectiveness of self-supervised pre-training for speech recognition
  • Understanding Semantics from Speech Through Pre-training
  • Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models

11、Model compression

  • Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
  • Patient Knowledge Distillation for BERT Model Compression (EMNLP2019)
  • Small and Practical BERT Models for Sequence Labeling (EMNLP2019)
  • Pruning a BERT-based Question Answering Model
  • TinyBERT: Distilling BERT for Natural Language Understanding [github]
  • DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (NeurIPS2019 WS) [github]
  • Knowledge Distillation from Internal Representations (AAAI2020)
  • PoWER-BERT: Accelerating BERT inference for Classification Tasks
  • WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
  • Extreme Language Model Compression with Optimal Subwords and Shared Projections
  • BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
  • Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
  • MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
  • Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
  • Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
  • MobileBERT: Task-Agnostic Compression of BERT by Progressive Knowledge Transfer
  • Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
  • Q8BERT: Quantized 8Bit BERT (NeurIPS2019 WS)

12、Misc.

  • jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models [github]
  • Cloze-driven Pretraining of Self-attention Networks
  • Learning and Evaluating General Linguistic Intelligence
  • To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (ACL2019 WS)
  • BERTScore: Evaluating Text Generation with BERT (ICLR2020)
  • Machine Translation Evaluation with BERT Regressor
  • SumQE: a BERT-based Summary Quality Estimation Model (EMNLP2019)
  • Learning to Speak and Act in a Fantasy Text Adventure Game (EMNLP2019)
  • Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (ICLR2020)
  • Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models (ICLR2020)
  • A Mutual Information Maximization Perspective of Language Representation Learning (ICLR2020)
  • Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment (AAAI2020)
  • Thieves on Sesame Street! Model Extraction of BERT-based APIs (ICLR2020)
  • Graph-Bert: Only Attention is Needed for Learning Graph Representations
  • CodeBERT: A Pre-Trained Model for Programming and Natural Languages
  • Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
  • Extending Machine Language Models toward Human-Level Language Understanding
  • Glyce: Glyph-vectors for Chinese Character Representations
  • Back to the Future -- Sequential Alignment of Text Representations
  • Improving Cuneiform Language Identification with BERT (NAACL2019 WS)
  • BERT has a Moral Compass: Improvements of ethical and moral values of machines
  • SMILES-BERT: Large Scale Unsupervised Pre-Training for Molecular Property Prediction (ACM-BCB2019)
  • On the comparability of Pre-trained Language Models
  • Transformers: State-of-the-art Natural Language Processing
  • Evolution of transfer learning in natural language processing
