From a8d481df6fa1a3571ce3e7dca9c09a0776066050 Mon Sep 17 00:00:00 2001
From: meghdadFar
Date: Fri, 5 Apr 2024 15:16:23 +0200
Subject: [PATCH 1/5] Update Text Analysis

---
 README.rst | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/README.rst b/README.rst
index 1660dca..9f23522 100644
--- a/README.rst
+++ b/README.rst
@@ -10,8 +10,12 @@
 
 Wordview
 ########
-Wordview is a Python package for Exploratory Data Analysis of text and provides many statistics about your data in the form of plots, tables, and descriptions allowing you to have both a high-level and detailed overview of your data.
-It has functions to analyze explicit text elements such as words, n-grams, POS tags, and multi-word expressions, as well as implicit elements such as clusters, anomalies, and biases. Full documentation is available at `Wordview’s documentation page `__.
+Wordview is a Python package for Exploratory Data Analysis of text and provides
+many statistics about your data in the form of plots, tables, and descriptions
+allowing you to have both a high-level and detailed overview of your data.
+It has functions to analyze explicit text elements such as words, n-grams, POS tags,
+and multi-word expressions, as well as implicit elements such as clusters, anomalies, and biases.
+Full documentation is available at `Wordview’s documentation page `__.
 
 .. image:: sphinx-docs/figs/cover.png
    :alt: Wordview Cover
@@ -25,16 +29,19 @@ Install the package via ``pip``:
 
 ``pip install wordview``
 
-To explore various features and functionalities, consult the documentation pages. The following sections
-present a high-level description of Wordview's features and functionalities. For details, tutorials and worked examples, corresponding
-documentation pages are linked in each section.
+The following sections present a high-level description of Wordview's features and functionalities.
+For details, usage, tutorials, and worked examples, see
+the `documentation page `__.
 
 Text Analysis
 *************
-Using this feature, you can gain a comprehensive overview of your text data in terms of various statistics, plots, and distributions.
-It enables a rapid understanding of the underlying patterns present in your dataset.
-By visually representing the data's nuances, this feature can aid in making informed decisions for downstream applications.
-It's a step forward in ensuring that you have a grasp on the intricacies of your data before delving deeper into more complex tasks.
+Using this feature, you can gain a comprehensive overview of your text data in terms of various statistics,
+plots, and distributions. It enables a rapid understanding of the underlying patterns present in your dataset.
+You can see, for instance, what languages were used in your corpus, the average document lengths
+(in terms of tokens), how many documents and words are in your corpus, various part-of-speech tags, and more.
+You can also look at different distributions, plots, and word clouds to gain valuable insights into your corpus.
+Wordview uses Plotly interactive plots, with many intriguing features such as zooming,
+panning, selection, hovering, and screenshots.
 
 .. image:: sphinx-docs/figs/textanalysiscover.png
    :alt: Text Analysis Cover

From 4341bba1d01548533501ff2a92bfb3c89b945ad0 Mon Sep 17 00:00:00 2001
From: meghdadFar
Date: Sun, 7 Apr 2024 10:12:34 +0200
Subject: [PATCH 2/5] Update labels

---
 README.rst | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/README.rst b/README.rst
index 9f23522..b7a741d 100644
--- a/README.rst
+++ b/README.rst
@@ -39,7 +39,7 @@ Using this feature, you can gain a comprehensive overview of your text data in t
 plots, and distributions. It enables a rapid understanding of the underlying patterns present in your dataset.
 You can see, for instance, what languages were used in your corpus, the average document lengths
 (in terms of tokens), how many documents and words are in your corpus, various part-of-speech tags, and more.
-You can also look at different distributions, plots, and word clouds to gain valuable insights into your corpus.
+You can also look at different distributions, plots, and word clouds to gain valuable insights into your text corpus.
 Wordview uses Plotly interactive plots, with many intriguing features such as zooming,
 panning, selection, hovering, and screenshots.
 
@@ -50,12 +50,10 @@ panning, selection, hovering, and screenshots.
 
 Analysis of Labels
 ******************
-In the realm of Natural Language Processing (NLP), the proper analysis and understanding of labels within datasets can provide valuable insights, ensuring that models are trained on balanced and representative data.
-Recognizing this, Wordview is engineered to compute an array of statistics tailored for labeled datasets.
-These statistics cater to both document and sequence levels, providing a holistic view of the dataset's structure.
-By diving deep into the intricacies of the labels, Wordview offers an enriched perspective, helping researchers and practitioners identify
-potential biases, discrepancies, or areas of interest,
-which are essential for creating robust and effective models.
+In NLP, the proper analysis and understanding of labels within datasets can provide valuable insights for some downstream tasks,
+ensuring that models are trained on a balanced and representative set of labels.
+Wordview calculates an array of statistics tailored for labeled datasets. It provides a comprehensive overview of the distribution of labels,
+the frequency of each label, and the distribution of labels across different categories.
 
 .. image:: sphinx-docs/figs/labels_peach.png
    :width: 100%

From c467c875789954d3c2167a1399933101f09fd6f7 Mon Sep 17 00:00:00 2001
From: meghdadFar
Date: Sun, 7 Apr 2024 10:14:55 +0200
Subject: [PATCH 3/5] Update MWEs section

---
 README.rst | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/README.rst b/README.rst
index b7a741d..251d42e 100644
--- a/README.rst
+++ b/README.rst
@@ -61,10 +61,9 @@ the frequency of each label, and the distribution of labels across different cat
 
 Extraction & Analysis of Multiword Expressions
 **********************************************
-Multiword Expressions (MWEs) are phrases that can be treated as a single
-semantic unit. E.g. *swimming pool* and *climate change*. MWEs have
-application in different areas including: parsing, language models,
-language generation, terminology extraction, and topic models. Wordview can extract different types of MWEs from text.
+Multiword Expressions (MWEs) are phrases that can be treated as a single semantic unit, e.g., *swimming pool* and *climate change*. They can offer great insights into natural language data and have many practical applications, including machine translation, topic modeling, named entity recognition, terminology extraction, profanity detection, and more.
+At a high level, we define MWEs as phrases whose components co-occur more than expected by chance and identify MWEs using precisely this property, which is modeled by statistical association measures such as PMI and NPMI.
+Wordview's MWE feature is one of the most powerful, comprehensive, and easy-to-use tools available for the extraction of MWEs.
 
 .. raw:: html
 

From f6820a69104b6be5a95dd99784f26febfc7aafdf Mon Sep 17 00:00:00 2001
From: meghdadFar
Date: Sun, 7 Apr 2024 10:15:54 +0200
Subject: [PATCH 4/5] Update bias section

---
 README.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.rst b/README.rst
index 251d42e..f3e5ee2 100644
--- a/README.rst
+++ b/README.rst
@@ -73,9 +73,9 @@ Wordview's MWE feature is one of the most powerful, comprehensive, and easy-to-u
 
 Bias Analysis
 **************
-In the rapidly evolving realm of Natural Language Processing (NLP), downstream models are as unbiased and fair as the data on which they are trained.
-Wordview Bias Analysis module is designed to assist in the rigorous task of ensuring that underlying training datasets are devoid of explicit negative biases related to categories such as gender, race, and religion.
-By identifying and rectifying these biases, Wordview attempts to pave the way for the creation of more inclusive, fair, and unbiased NLP applications, leading to better user experiences and more equitable technology.
+In the rapidly evolving realm of Natural Language Processing (NLP), downstream models can only be as fair and unbiased as the data on which they are trained. Wordview's bias analysis module is designed to help ensure that underlying training datasets are devoid of explicit negative biases related to categories such as gender, race, and religion.
+By identifying and rectifying these biases, Wordview attempts to help with the creation of more inclusive, fair, and unbiased NLP applications.
+Bias analysis is currently based on sentiment analysis and a predefined set of categories, but we are working hard to extend it and make it better in many ways.
 
 .. raw:: html
 

From 159c72965cf27d073fd9d9d1826a3e6dbf30fe86 Mon Sep 17 00:00:00 2001
From: meghdadFar
Date: Sun, 7 Apr 2024 10:16:45 +0200
Subject: [PATCH 5/5] Update Contributing section

---
 README.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.rst b/README.rst
index f3e5ee2..dccf6e2 100644
--- a/README.rst
+++ b/README.rst
@@ -116,6 +116,6 @@ Wordview offers a number of utility functions that you can use for common pre an
 
 Contributing
 ############
-Thank you for contributing to wordview! We and the users of this repo
-appreciate your efforts! You can visit the `contributing page `__ for detailed instructions about how you can contribute to Wordview.
+We are just getting started with Wordview and are looking to make Wordview a go-to solution for anyone who loves NLP and knows and appreciates the actual value of data and data analysis. But that requires help from the community. So, we are looking forward to seeing you join Wordview as a collaborator.
+You can visit the `contributing page `__ for detailed instructions about how you can contribute to Wordview.
 