Transform text to word embeddings

textEmbed()

Extract layers and aggregate them to word embeddings, for all character variables in a given dataframe.

textEmbedLayersOutput()

Extract layers of hidden states (word embeddings) for all character variables in a given dataframe.

textEmbedLayerAggregation()

Select and aggregate layers of hidden states to form word embeddings.
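As a rough illustration of what selecting and aggregating layers can mean, the sketch below averages or concatenates a chosen subset of per-layer vectors. It is written in plain Python rather than R, and the function name `aggregate_layers`, its `how` argument, and the two aggregation modes are assumptions made for this example; they are not the package's interface.

```python
def aggregate_layers(layers, selected, how="mean"):
    """Combine selected hidden-state layers into one embedding.

    layers   -- list of equal-length vectors, one per layer (hypothetical input shape)
    selected -- indices of the layers to keep
    how      -- "mean" averages element-wise, "concatenate" joins end to end
    """
    chosen = [layers[i] for i in selected]
    if not chosen:
        raise ValueError("at least one layer must be selected")
    if how == "mean":
        # zip(*chosen) walks the layers dimension by dimension
        return [sum(values) / len(chosen) for values in zip(*chosen)]
    if how == "concatenate":
        return [value for layer in chosen for value in layer]
    raise ValueError(f"unknown aggregation: {how}")


# Three toy layers with two dimensions each
toy_layers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mean_embedding = aggregate_layers(toy_layers, [1, 2], how="mean")
joined_embedding = aggregate_layers(toy_layers, [1, 2], how="concatenate")
```

Averaging keeps the embedding at the original dimensionality, while concatenation multiplies it by the number of selected layers.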

textEmbedStatic()

Applies word embeddings from a given decontextualized static space (such as from Latent Semantic Analysis) to all character variables.

Train word embeddings

textTrain()

Train word embeddings to a numeric (ridge regression) or categorical (random forest) variable.

textTrainLists()

Individually trains word embeddings from several text variables to several numeric or categorical variables. It is possible to have word embeddings from one text variable and several numeric/categorical variables; or vice versa, word embeddings from several text variables to one numeric/categorical variable. It is not possible to mix numeric and categorical variables.

textTrainRegression()

Train word embeddings to a numeric variable.

textTrainRandomForest()

Train word embeddings to a categorical variable using random forest.

textPredict()

Predict scores or classification from, e.g., textTrain.

textPredictTest()

Significance testing of correlations. If only y1 is provided, a t-test is computed between the absolute errors from yhat1-y1 and yhat2-y1.
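The comparison of absolute errors can be sketched as follows. This plain-Python example assumes a paired t-test, which is one plausible reading of the description and is not confirmed by it; the function name `paired_t_statistic` and the toy numbers are hypothetical. Only the t statistic is computed, since the standard library has no t distribution for a p-value.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(y, yhat1, yhat2):
    """t statistic for the difference in absolute error between two prediction sets.

    Assumes a paired design: both prediction sets refer to the same observations.
    """
    err1 = [abs(pred - true) for pred, true in zip(yhat1, y)]
    err2 = [abs(pred - true) for pred, true in zip(yhat2, y)]
    diffs = [a - b for a, b in zip(err1, err2)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two observations")
    # mean difference divided by its standard error
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


# Toy data: the second set of predictions is closer to y
y = [1.0, 2.0, 3.0, 4.0]
yhat1 = [1.5, 2.5, 2.0, 4.5]
yhat2 = [1.0, 2.25, 3.0, 4.0]
t_value = paired_t_statistic(y, yhat1, yhat2)
```

A positive statistic here means the first set of predictions has the larger absolute error on average.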

Semantic Similarities

textSimilarity()

Compute the cosine semantic similarity between two text variables.
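For reference, cosine similarity between two embedding vectors is their dot product divided by the product of their lengths. The sketch below is plain Python with a hypothetical helper name, `cosine_similarity`; it illustrates the measure itself and does not reproduce the package's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy embeddings: parallel vectors score 1, orthogonal vectors score 0
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Because the measure depends only on direction, scaling either embedding leaves the score unchanged.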

textSimilarityNorm()

Compute the semantic similarity between a text variable and a word norm (i.e., a text represented by one word embedding that represents a construct).

textSimilarityTest()

Test whether there is a significant difference in meaning between two sets of texts (i.e., between their word embeddings).

Plot words in the word embedding space

textProjection()

Compute Supervised Dimension Projection and related variables for plotting words.

textProjectionPlot()

Plot words according to Supervised Dimension Projection.

textCentrality()

Compute the cosine semantic similarity score between each single word's embedding and the aggregated word embedding of all words.
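The idea can be sketched in plain Python as follows. The example assumes the aggregated embedding is the element-wise mean of all word vectors, which the description does not specify; the names `centrality_scores` and `toy_words` are hypothetical.

```python
import math


def centrality_scores(word_vectors):
    """Cosine similarity of each word vector to the mean of all word vectors.

    word_vectors -- dict mapping a word to its embedding (equal-length lists)
    """
    vectors = list(word_vectors.values())
    # Assumption: aggregate by taking the element-wise mean
    centre = [sum(values) / len(vectors) for values in zip(*vectors)]
    centre_norm = math.sqrt(sum(c * c for c in centre))

    scores = {}
    for word, vec in word_vectors.items():
        dot = sum(v * c for v, c in zip(vec, centre))
        norm = math.sqrt(sum(v * v for v in vec))
        scores[word] = dot / (norm * centre_norm)
    return scores


# Toy 2-D embeddings: "c" points in the same direction as the mean vector
toy_words = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
scores = centrality_scores(toy_words)
```

Words whose embeddings point in the direction of the aggregate score closest to 1 and would sit at the centre of such a plot.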

textCentralityPlot()

Plot words according to cosine semantic similarity to the aggregated word embedding.

textPCA()

Compute 2 PCA dimensions of the word embeddings for individual words.

textPCAPlot()

Plot words in a 2-D plot according to 2 PCA components.

Example Data

Language_based_assessment_data_8

Text and numeric data for 10 participants.

wordembeddings4

Word embeddings for 4 text variables for 40 participants.

Language_based_assessment_data_3_100

Example text and numeric data.

DP_projections_HILS_SWLS_100

Data for plotting a Dot Product Projection Plot.

centrality_data_harmony

Example data for plotting a Semantic Centrality Plot.

PC_projections_satisfactionwords_40

Example data for plotting a Principal Component Projection Plot.