Documentation

General Word Shift

class shifterator.shifterator.Shift(type2freq_1, type2freq_2, type2score_1=None, type2score_2=None, reference_value=None, handle_missing_scores='error', stop_lens=None, stop_words=None, normalization='variation')[source]

Shift object for calculating weighted scores of two systems of types, and the shift between them

Parameters:
  • type2freq_1, type2freq_2 (dict) – Keys are types of a system and values are frequencies of those types
  • type2score_1, type2score_2 (dict or str, optional) – If dict, types are keys and values are scores associated with each type. If str, the name of a score lexicon included in Shifterator. If None and the other type2score is also None, defaults to uniform scores across types. If None and the other type2score is given, defaults to that other type2score dict
  • reference_value (str or float, optional) – The reference score to use to partition scores into two different regimes. If ‘average’, uses the average score according to type2freq_1 and type2score_1. If None and a lexicon is selected for type2score, uses the respective middle point in that lexicon’s scale. Otherwise if None, uses zero as the reference point
  • handle_missing_scores (str, optional) – If ‘error’, throws an error whenever a word has a score in one score dictionary but not the other. If ‘exclude’, excludes any word that is missing a score in one score dictionary from all word shift calculations, regardless of whether it has a score in the other dictionary. If ‘adopt’ and the score is missing in one dictionary, then uses the score from the other dictionary if it is available
  • stop_lens (iterable of 2-tuples, optional) – Denotes intervals of scores that should be excluded from word shift calculations. Types with scores in any of these intervals are excluded
  • stop_words (set, optional) – Denotes words that should be excluded from calculation of word shifts
  • normalization (str, optional) – If ‘variation’, normalizes shift scores so that their absolute values sum to 1. If ‘trajectory’, normalizes them so that the shift scores sum to 1 or -1. The trajectory normalization cannot be applied if the total shift score is 0, so scores are left unnormalized if the total is 0 and ‘trajectory’ is specified
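
The two normalization options can be illustrated with a small standalone sketch. This is not Shifterator's own code: the function name normalize_shift_scores is hypothetical, and leaving all-zero scores untouched under ‘variation’ is an assumption that the description above does not cover.

```python
def normalize_shift_scores(type2shift_score, normalization="variation"):
    """Rescale raw shift scores (hypothetical helper, not part of Shifterator)."""
    total = sum(type2shift_score.values())
    if normalization == "variation":
        # Divide by the sum of absolute values, so absolute scores sum to 1.
        norm = sum(abs(s) for s in type2shift_score.values())
    elif normalization == "trajectory":
        # Divide by |total|, so the signed scores sum to +1 or -1.
        norm = abs(total)
    else:
        raise ValueError(f"unknown normalization: {normalization!r}")
    if norm == 0:
        # Nothing to divide by: leave the scores unnormalized.
        # (For 'variation' this branch is an assumption of the sketch.)
        return dict(type2shift_score)
    return {t: s / norm for t, s in type2shift_score.items()}


raw = {"happy": 0.6, "sad": -0.2, "ok": 0.1}
by_variation = normalize_shift_scores(raw, "variation")
by_trajectory = normalize_shift_scores(raw, "trajectory")
```

With ‘variation’ the magnitudes of the scores read as shares of the total movement, while ‘trajectory’ keeps the sign of the overall shift visible in the sum.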
get_shift_component_sums()[source]

Calculates the cumulative contribution of each component of the different kinds of shift scores.

Returns:

Dictionary with six keys, one for each of the different component contributions (pos_s_pos_p, pos_s_neg_p, neg_s_pos_p, neg_s_neg_p, pos_s, neg_s). Values are the total contribution from that component across all types

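
One plausible reading of the six keys is sketched below. It assumes that the four pos_s/neg_s by pos_p/neg_p keys group the p_diff * s_ref_diff term by the signs of s_ref_diff and p_diff, and that pos_s and neg_s group the p_avg * s_diff term by the sign of s_diff. The helper shift_component_sums and the sample numbers are hypothetical.

```python
def shift_component_sums(details):
    """Sum shift components by sign.

    `details` maps each type to (p_diff, s_diff, p_avg, s_ref_diff).
    Hypothetical helper; the sign grouping is an assumption of this sketch.
    """
    keys = ("pos_s_pos_p", "pos_s_neg_p", "neg_s_pos_p", "neg_s_neg_p", "pos_s", "neg_s")
    sums = {k: 0.0 for k in keys}
    for p_diff, s_diff, p_avg, s_ref_diff in details.values():
        # Frequency-change term, split by deviation from the reference score
        # and by the direction of the frequency change.
        s_part = "pos_s" if s_ref_diff > 0 else "neg_s"
        p_part = "pos_p" if p_diff > 0 else "neg_p"
        sums[f"{s_part}_{p_part}"] += p_diff * s_ref_diff
        # Score-change term, split by the direction of the score change.
        sums["pos_s" if s_diff > 0 else "neg_s"] += p_avg * s_diff
    return sums


details = {
    "love": (0.1, 0.0, 0.25, 2.0),
    "hate": (-0.1, 0.5, 0.15, -3.0),
    "meh": (0.0, -1.0, 0.10, 0.5),
}
component_sums = shift_component_sums(details)
```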
get_shift_graph(ax=None, top_n=50, text_size_inset=True, cumulative_inset=True, show_plot=True, filename=None, **kwargs)[source]

Plot the shift graph between two systems of types

Parameters:
  • ax (matplotlib.axes.Axes, optional) – Axes to draw figure onto. Will create new axes if none are given.
  • top_n (int, optional) – Display the top_n types as sorted by their absolute contribution to the difference between systems
  • cumulative_inset (bool, optional) – Whether to show an inset showing the cumulative contributions to the shift by ranked types
  • text_size_inset (bool, optional) – Whether to show an inset showing the relative sizes of each system
  • show_plot (bool, optional) – Whether to show plot when it is done being rendered
  • filename (str, optional) – If not None, name of the file for saving the shift graph
Returns:

Matplotlib ax of shift graph. Displays shift graph if show_plot=True

Return type:

ax

get_shift_scores(details=False)[source]

Calculates the type shift scores between the two systems

Parameters:
  • details (bool) – If True, returns each of the major components of each type’s shift score, along with the overall shift scores. Otherwise, only returns the overall shift scores
Returns:
  • type2p_diff (dict) – If details is True, returns dict where keys are types and values are the difference in relative frequency, i.e. p_i,2 - p_i,1 for type i
  • type2s_diff (dict) – If details is True, returns dict where keys are types and values are the relative differences in score, i.e. s_i,2 - s_i,1 for type i
  • type2p_avg (dict) – If details is True, returns dict where keys are types and values are the average relative frequencies, i.e. 0.5*(p_i,1+p_i,2) for type i
  • type2s_ref_diff (dict) – If details is True, returns dict where keys are types and values are relative deviation from reference score, i.e. 0.5*(s_i,2+s_i,1)-s_ref for type i
  • type2shift_score (dict) – Keys are types and values are shift scores. The overall shift scores are normalized according to the normalization parameter of the Shift object
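
The four detail components can be computed by hand as in the sketch below. How they combine into a shift score is an assumption here: the sketch uses p_diff * s_ref_diff + p_avg * s_diff, which is the combination whose unnormalized sum over types equals the difference between the two weighted average scores. The function shift_score_details and the toy data are hypothetical, and every type is assumed to have a score in both dictionaries.

```python
def shift_score_details(type2freq_1, type2freq_2, type2score_1, type2score_2, s_ref=0.0):
    """Per-type shift components and unnormalized shift score (illustrative only)."""
    n1 = sum(type2freq_1.values())
    n2 = sum(type2freq_2.values())
    details = {}
    for t in set(type2freq_1) | set(type2freq_2):
        p1 = type2freq_1.get(t, 0) / n1  # relative frequency in system 1
        p2 = type2freq_2.get(t, 0) / n2  # relative frequency in system 2
        s1, s2 = type2score_1[t], type2score_2[t]
        p_diff = p2 - p1
        s_diff = s2 - s1
        p_avg = 0.5 * (p1 + p2)
        s_ref_diff = 0.5 * (s2 + s1) - s_ref
        # Assumed combination of the four components into one score.
        shift = p_diff * s_ref_diff + p_avg * s_diff
        details[t] = (p_diff, s_diff, p_avg, s_ref_diff, shift)
    return details


freq_1 = {"good": 3, "bad": 1}
freq_2 = {"good": 1, "bad": 3}
scores = {"good": 8.0, "bad": 2.0}
details = shift_score_details(freq_1, freq_2, scores, scores, s_ref=5.0)
total_shift = sum(d[-1] for d in details.values())
```

In this toy case the weighted averages are 6.5 and 3.5, and the per-type scores sum to the same difference of -3.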
get_weighted_score(type2freq, type2score)[source]

Calculate an average score according to a set of frequencies and scores

Parameters:
  • type2freq (dict) – Keys are types and values are frequencies
  • type2score (dict) – Keys are types and values are scores
Returns:

s_avg – Average weighted score of system

Return type:

float
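A frequency-weighted average of this kind can be sketched in a few lines. The standalone function weighted_score is hypothetical, and skipping types that have no score (rather than raising) is an assumption of the sketch.

```python
def weighted_score(type2freq, type2score):
    """Average score weighted by frequency (illustrative, not Shifterator's code)."""
    # Assumption: types without a score do not contribute to the average.
    scored = [t for t in type2freq if t in type2score]
    total_freq = sum(type2freq[t] for t in scored)
    if total_freq == 0:
        return None
    return sum(type2freq[t] * type2score[t] for t in scored) / total_freq


s_avg = weighted_score({"good": 3, "bad": 1, "the": 10}, {"good": 8.0, "bad": 2.0})
```
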

Word Shift Implementations

class shifterator.shifts.EntropyShift(type2freq_1, type2freq_2, base=2, alpha=1, reference_value=0, normalization='variation')[source]

Shift object for calculating the shift in entropy between two systems

Parameters:
  • type2freq_1, type2freq_2 (dict) – Keys are types of a system and values are frequencies of those types
  • base (float, optional) – Base of the logarithm for calculating entropy
  • alpha (float, optional) – The parameter for the generalized Tsallis entropy. Setting alpha=1 recovers the Shannon entropy. Higher alpha emphasizes more common types, lower alpha emphasizes less common types. For details: https://en.wikipedia.org/wiki/Tsallis_entropy
  • reference_value (str or float, optional) – The reference score to use to partition scores into two different regimes. If ‘average’, uses the average score according to type2freq_1 and type2score_1. Otherwise, uses zero as the reference point
  • normalization (str, optional) – If ‘variation’, normalizes shift scores so that their absolute values sum to 1. If ‘trajectory’, normalizes them so that the shift scores sum to 1 or -1. The trajectory normalization cannot be applied if the total shift score is 0, so scores are left unnormalized if the total is 0 and ‘trajectory’ is specified
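
The role of alpha can be seen from the textbook Tsallis entropy, sketched below in natural-log units. This is the standard definition from the linked page, not necessarily the exact per-type scoring Shifterator uses, and the function tsallis_entropy is hypothetical.

```python
import math


def tsallis_entropy(type2freq, alpha=1.0):
    """Tsallis entropy of a frequency dict, in natural-log units (illustrative)."""
    total = sum(type2freq.values())
    probs = [f / total for f in type2freq.values() if f > 0]
    if alpha == 1:
        # The alpha -> 1 limit is the Shannon entropy.
        return -sum(p * math.log(p) for p in probs)
    return (1 - sum(p ** alpha for p in probs)) / (alpha - 1)


uniform = {"a": 1, "b": 1, "c": 1, "d": 1}
shannon = tsallis_entropy(uniform, alpha=1)
near_shannon = tsallis_entropy(uniform, alpha=1.000001)
quadratic = tsallis_entropy(uniform, alpha=2)
```

Evaluating just above alpha=1 gives a value close to the Shannon result, which is the limit the parameter description refers to.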
get_shift_graph(top_n=50, show_plot=True, detailed=False, text_size_inset=True, cumulative_inset=True, filename=None, **kwargs)[source]

Plot the shift graph between two systems of types

Parameters:
  • ax (matplotlib.axes.Axes, optional) – Axes to draw figure onto. Will create new axes if none are given.
  • top_n (int, optional) – Display the top_n types as sorted by their absolute contribution to the difference between systems
  • cumulative_inset (bool, optional) – Whether to show an inset showing the cumulative contributions to the shift by ranked types
  • text_size_inset (bool, optional) – Whether to show an inset showing the relative sizes of each system
  • show_plot (bool, optional) – Whether to show plot when it is done being rendered
  • filename (str, optional) – If not None, name of the file for saving the shift graph
Returns:

Matplotlib ax of shift graph. Displays shift graph if show_plot=True

Return type:

ax

class shifterator.shifts.JSDivergenceShift(type2freq_1, type2freq_2, base=2, weight_1=0.5, weight_2=0.5, alpha=1, reference_value=0, normalization='variation')[source]

Shift object for calculating the Jensen-Shannon divergence (JSD) between two systems

Parameters:
  • type2freq_1, type2freq_2 (dict) – Keys are types of a system and values are frequencies of those types
  • weight_1, weight_2 (float) – Relative weights of type2freq_1 and type2freq_2 when constructing their mixed distribution. Should sum to 1
  • base (float, optional) – Base of the logarithm for calculating entropy
  • alpha (float, optional) – The parameter for the generalized Tsallis entropy. Setting alpha=1 recovers the Shannon entropy. Higher alpha emphasizes more common types, lower alpha emphasizes less common types. For details: https://en.wikipedia.org/wiki/Tsallis_entropy
  • reference_value (str or float, optional) – The reference score to use to partition scores into two different regimes. Defaults to zero as the reference point
  • normalization (str, optional) – If ‘variation’, normalizes shift scores so that their absolute values sum to 1. If ‘trajectory’, normalizes them so that the shift scores sum to 1 or -1. The trajectory normalization cannot be applied if the total shift score is 0, so scores are left unnormalized if the total is 0 and ‘trajectory’ is specified
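
The mixed distribution and the resulting divergence can be sketched for the Shannon case (alpha=1) as follows. The function names are hypothetical and the code is a generic textbook computation, not Shifterator's implementation.

```python
import math


def shannon_entropy(probs, base=2):
    """Shannon entropy of a dict of probabilities; zero entries are skipped."""
    return -sum(p * math.log(p, base) for p in probs.values() if p > 0)


def js_divergence(type2freq_1, type2freq_2, weight_1=0.5, weight_2=0.5, base=2):
    """Jensen-Shannon divergence via the weighted mixture (illustrative)."""
    if abs(weight_1 + weight_2 - 1) > 1e-9:
        raise ValueError("weights should sum to 1")
    n1 = sum(type2freq_1.values())
    n2 = sum(type2freq_2.values())
    types = set(type2freq_1) | set(type2freq_2)
    p1 = {t: type2freq_1.get(t, 0) / n1 for t in types}
    p2 = {t: type2freq_2.get(t, 0) / n2 for t in types}
    # Mixed distribution built from the two systems and their weights.
    mixed = {t: weight_1 * p1[t] + weight_2 * p2[t] for t in types}
    return (
        shannon_entropy(mixed, base)
        - weight_1 * shannon_entropy(p1, base)
        - weight_2 * shannon_entropy(p2, base)
    )


same = js_divergence({"a": 2, "b": 1}, {"a": 2, "b": 1})
disjoint = js_divergence({"a": 1}, {"b": 1})
```

Identical systems give a divergence of zero, and two systems with no shared types reach the maximum of one bit under equal weights and base 2.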
get_shift_graph(top_n=50, show_plot=True, detailed=False, text_size_inset=True, cumulative_inset=True, title=None, filename=None, **kwargs)[source]

Plot the shift graph between two systems of types

Parameters:
  • ax (matplotlib.axes.Axes, optional) – Axes to draw figure onto. Will create new axes if none are given.
  • top_n (int, optional) – Display the top_n types as sorted by their absolute contribution to the difference between systems
  • cumulative_inset (bool, optional) – Whether to show an inset showing the cumulative contributions to the shift by ranked types
  • text_size_inset (bool, optional) – Whether to show an inset showing the relative sizes of each system
  • show_plot (bool, optional) – Whether to show plot when it is done being rendered
  • filename (str, optional) – If not None, name of the file for saving the shift graph
Returns:

Matplotlib ax of shift graph. Displays shift graph if show_plot=True

Return type:

ax

class shifterator.shifts.KLDivergenceShift(type2freq_1, type2freq_2, base=2, reference_value=0, normalization='variation')[source]

Shift object for calculating the Kullback-Leibler divergence (KLD) between two systems

Parameters:
  • type2freq_1, type2freq_2 (dict) – Keys are types of a system and values are frequencies of those types. The KLD will be computed with respect to type2freq_1, i.e. D(T2 || T1). For the KLD to be well defined, all types must have nonzero frequencies in both type2freq_1 and type2freq_2
  • base (float, optional) – Base of the logarithm for calculating entropy
  • stop_lens (iterable of 2-tuples, optional) – Denotes intervals that should be excluded when calculating shift scores
  • normalization (str, optional) – If ‘variation’, normalizes shift scores so that their absolute values sum to 1. If ‘trajectory’, normalizes them so that the shift scores sum to 1 or -1. The trajectory normalization cannot be applied if the total shift score is 0, so scores are left unnormalized if the total is 0 and ‘trajectory’ is specified
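
The direction D(T2 || T1) and the nonzero-frequency requirement can be made concrete with a generic sketch. The function kl_contributions is hypothetical, and raising a ValueError on a missing or zero frequency is a choice made for the sketch rather than documented library behavior.

```python
import math


def kl_contributions(type2freq_1, type2freq_2, base=2):
    """Per-type terms of D(T2 || T1) (illustrative, not Shifterator's code)."""
    if set(type2freq_1) != set(type2freq_2):
        raise ValueError("every type must appear in both systems")
    if any(f <= 0 for f in list(type2freq_1.values()) + list(type2freq_2.values())):
        raise ValueError("all frequencies must be nonzero")
    n1 = sum(type2freq_1.values())
    n2 = sum(type2freq_2.values())
    contributions = {}
    for t in type2freq_1:
        p1 = type2freq_1[t] / n1
        p2 = type2freq_2[t] / n2
        # Each term is weighted by the second system's relative frequency.
        contributions[t] = p2 * math.log(p2 / p1, base)
    return contributions


terms = kl_contributions({"a": 1, "b": 1}, {"a": 3, "b": 1})
kld = sum(terms.values())
```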
get_shift_graph(top_n=50, show_plot=True, detailed=False, text_size_inset=True, cumulative_inset=True, title=None, filename=None, **kwargs)[source]

Plot the shift graph between two systems of types

Parameters:
  • ax (matplotlib.axes.Axes, optional) – Axes to draw figure onto. Will create new axes if none are given.
  • top_n (int, optional) – Display the top_n types as sorted by their absolute contribution to the difference between systems
  • cumulative_inset (bool, optional) – Whether to show an inset showing the cumulative contributions to the shift by ranked types
  • text_size_inset (bool, optional) – Whether to show an inset showing the relative sizes of each system
  • show_plot (bool, optional) – Whether to show plot when it is done being rendered
  • filename (str, optional) – If not None, name of the file for saving the shift graph
Returns:

Matplotlib ax of shift graph. Displays shift graph if show_plot=True

Return type:

ax

class shifterator.shifts.ProportionShift(type2freq_1, type2freq_2)[source]

Shift object for calculating differences in proportions of types across two systems

Parameters:
  • type2freq_1, type2freq_2 (dict) – Keys are types of a system and values are frequencies of those types
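
A difference in proportions can be sketched as the change in each type's relative frequency between the two systems. The function proportion_differences is hypothetical, and treating a type that is absent from one system as having frequency zero there is an assumption.

```python
def proportion_differences(type2freq_1, type2freq_2):
    """Relative frequency in system 2 minus system 1, per type (illustrative)."""
    n1 = sum(type2freq_1.values())
    n2 = sum(type2freq_2.values())
    types = set(type2freq_1) | set(type2freq_2)
    # Assumption: a type absent from one system counts as frequency 0 there.
    return {t: type2freq_2.get(t, 0) / n2 - type2freq_1.get(t, 0) / n1 for t in types}


diffs = proportion_differences({"a": 2, "b": 2}, {"a": 1, "c": 3})
```

Because both sets of relative frequencies sum to 1, the differences always sum to zero.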
get_shift_graph(top_n=50, show_plot=True, detailed=False, text_size_inset=True, cumulative_inset=True, title=None, filename=None, **kwargs)[source]

Plot the shift graph between two systems of types

Parameters:
  • ax (matplotlib.axes.Axes, optional) – Axes to draw figure onto. Will create new axes if none are given.
  • top_n (int, optional) – Display the top_n types as sorted by their absolute contribution to the difference between systems
  • cumulative_inset (bool, optional) – Whether to show an inset showing the cumulative contributions to the shift by ranked types
  • text_size_inset (bool, optional) – Whether to show an inset showing the relative sizes of each system
  • show_plot (bool, optional) – Whether to show plot when it is done being rendered
  • filename (str, optional) – If not None, name of the file for saving the shift graph
Returns:

Matplotlib ax of shift graph. Displays shift graph if show_plot=True

Return type:

ax

class shifterator.shifts.WeightedAvgShift(type2freq_1, type2freq_2, type2score_1=None, type2score_2=None, reference_value=None, handle_missing_scores='error', stop_lens=None, stop_words={}, normalization='variation')[source]

Shift object for calculating weighted scores of two systems of types, and the shift between them

Parameters:
  • type2freq_1, type2freq_2 (dict) – Keys are types of a system and values are frequencies of those types
  • type2score_1, type2score_2 (dict or str, optional) – If dict, types are keys and values are scores associated with each type. If str, the name of a score lexicon included in Shifterator. If None and the other type2score is also None, defaults to uniform scores across types. If None and the other type2score is given, defaults to that other type2score dict
  • reference_value (str or float, optional) – The reference score to use to partition scores into two different regimes. If ‘average’, uses the average score according to type2freq_1 and type2score_1. If None and a lexicon is selected for type2score, uses the respective middle point in that lexicon’s scale. Otherwise if None, uses zero as the reference point
  • handle_missing_scores (str, optional) – If ‘error’, throws an error whenever a word has a score in one score dictionary but not the other. If ‘exclude’, excludes any word that is missing a score in one score dictionary from all word shift calculations, regardless of whether it has a score in the other dictionary. If ‘adopt’ and the score is missing in one dictionary, then uses the score from the other dictionary if it is available
  • stop_lens (iterable of 2-tuples, optional) – Denotes intervals of scores that should be excluded from word shift calculations. Types with scores in any of these intervals are excluded
  • stop_words (set, optional) – Denotes words that should be excluded from word shift calculations
  • normalization (str, optional) – If ‘variation’, normalizes shift scores so that their absolute values sum to 1. If ‘trajectory’, normalizes them so that the shift scores sum to 1 or -1. The trajectory normalization cannot be applied if the total shift score is 0, so scores are left unnormalized if the total is 0 and ‘trajectory’ is specified
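
The three handle_missing_scores modes and the stop_lens filter can be sketched as plain dictionary operations. Both helpers below are hypothetical, and treating the stop_lens interval endpoints as inclusive is an assumption that the description above does not settle.

```python
def resolve_scores(type2score_1, type2score_2, handle_missing_scores="error"):
    """Reconcile two score dicts under the three documented modes (illustrative)."""
    if handle_missing_scores not in ("error", "exclude", "adopt"):
        raise ValueError(f"unknown mode: {handle_missing_scores!r}")
    out_1, out_2 = {}, {}
    for t in set(type2score_1) | set(type2score_2):
        in_1, in_2 = t in type2score_1, t in type2score_2
        if in_1 and in_2:
            out_1[t], out_2[t] = type2score_1[t], type2score_2[t]
        elif handle_missing_scores == "adopt":
            # Borrow the score from whichever dictionary has it.
            out_1[t] = out_2[t] = type2score_1[t] if in_1 else type2score_2[t]
        elif handle_missing_scores == "error":
            raise KeyError(f"{t!r} has a score in only one dictionary")
        # 'exclude': the type is dropped from both outputs.
    return out_1, out_2


def drop_stop_lens(type2score, stop_lens):
    """Remove types whose score lies in any (low, high) interval.

    Assumption: interval endpoints are inclusive.
    """
    return {
        t: s
        for t, s in type2score.items()
        if not any(low <= s <= high for low, high in stop_lens)
    }


scores_1 = {"good": 8.0, "bad": 2.0}
scores_2 = {"good": 7.5, "fine": 6.0}
adopted_1, adopted_2 = resolve_scores(scores_1, scores_2, "adopt")
excluded_1, excluded_2 = resolve_scores(scores_1, scores_2, "exclude")
kept = drop_stop_lens(adopted_1, stop_lens=[(4, 6)])
```

In this example ‘adopt’ leaves three scored types in each dictionary, ‘exclude’ leaves only the shared type, and the (4, 6) interval then removes the neutral word "fine".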