Documentation
General Word Shift
class shifterator.shifterator.Shift(type2freq_1, type2freq_2, type2score_1=None, type2score_2=None, reference_value=None, handle_missing_scores='error', stop_lens=None, stop_words=None, normalization='variation')
Shift object for calculating the weighted scores of two systems of types and the shift between them
Parameters: - type2freq_1, type2freq_2 (dict) – Keys are types of a system and values are frequencies of those types
- type2score_1, type2score_2 (dict or str, optional) – If dict, types are keys and values are the scores associated with each type. If str, the name of a score lexicon included in Shifterator. If both are None, defaults to uniform scores across types. If only one is None, it defaults to the other type2score dict
- reference_value (str or float, optional) – The reference score used to partition scores into two regimes. If ‘average’, uses the average score according to type2freq_1 and type2score_1. If None and a lexicon is selected for type2score, uses the midpoint of that lexicon’s scale. Otherwise, if None, uses zero as the reference point
- handle_missing_scores (str, optional) – If ‘error’, raises an error whenever a word has a score in one score dictionary but not the other. If ‘exclude’, excludes any word that is missing a score in either score dictionary from all word shift calculations, even if it has a score in the other dictionary. If ‘adopt’ and a word’s score is missing from one dictionary, uses its score from the other dictionary, if available
- stop_lens (iterable of 2-tuples, optional) – Intervals of scores to exclude. Types whose scores fall within any of these intervals are excluded from word shift calculations
- stop_words (set, optional) – Words to exclude from word shift calculations
- normalization (str, optional) – If ‘variation’, normalizes shift scores so that their absolute values sum to 1. If ‘trajectory’, normalizes them so that the shift scores sum to 1 or -1. Trajectory normalization is undefined when the total shift score is 0, so if ‘trajectory’ is specified and the total is 0, the scores are left unnormalized
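The two normalization options can be illustrated with a short, self-contained Python sketch. The function `normalize_shift_scores` is hypothetical and is not part of Shifterator's API; it only mirrors the behavior described for the normalization parameter, assuming the shift scores are held in a dict keyed by type.

```python
def normalize_shift_scores(type2shift_score, normalization="variation"):
    """Hypothetical sketch of the 'variation' and 'trajectory' normalizations.

    Illustrative only; this is not Shifterator's own implementation.
    """
    if normalization == "variation":
        # Divide by the sum of absolute values so that |scores| sum to 1
        total = sum(abs(s) for s in type2shift_score.values())
    elif normalization == "trajectory":
        # Divide by |signed total| so that the signed scores sum to 1 or -1
        total = abs(sum(type2shift_score.values()))
    else:
        raise ValueError(f"Unknown normalization: {normalization!r}")
    if total == 0:
        # Nothing to divide by (e.g. 'trajectory' with a zero total): leave as-is
        return dict(type2shift_score)
    return {t: s / total for t, s in type2shift_score.items()}
```

For example, scores of 3.0 and -1.0 become 0.75 and -0.25 under 'variation' (absolute values sum to 1) and 1.5 and -0.5 under 'trajectory' (signed scores sum to 1).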
get_shift_component_sums()
Calculates the cumulative contribution of each component of the shift scores, summed across all types
Returns: Dictionary with six keys, one for each component contribution (pos_s_pos_p, pos_s_neg_p, neg_s_pos_p, neg_s_neg_p, pos_s, neg_s). Values are the total contribution from that component across all types
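A sketch of how the six component sums could be accumulated, assuming each type's shift score decomposes as p_diff * s_ref_diff + p_avg * s_diff (the generalized word shift form, built from the quantities returned by get_shift_scores(details=True)). The helper `shift_component_sums` and its tuple-based input are hypothetical; Shifterator may organize the computation differently. Ties at zero land in the negative buckets here, which does not change any sum because the corresponding product is zero.

```python
def shift_component_sums(details):
    """Hypothetical sketch: sum each shift-score component across all types.

    `details` maps each type to a tuple (p_diff, s_diff, p_avg, s_ref_diff).
    Assumes each type's score is p_diff * s_ref_diff + p_avg * s_diff.
    """
    keys = ["pos_s_pos_p", "pos_s_neg_p", "neg_s_pos_p", "neg_s_neg_p", "pos_s", "neg_s"]
    sums = dict.fromkeys(keys, 0.0)
    for p_diff, s_diff, p_avg, s_ref_diff in details.values():
        # Frequency term: bucket by sign of (score - reference) and of the frequency change
        s_part = "pos_s" if s_ref_diff > 0 else "neg_s"
        p_part = "pos_p" if p_diff > 0 else "neg_p"
        sums[f"{s_part}_{p_part}"] += p_diff * s_ref_diff
        # Score-change term: bucket by sign of the score difference
        sums["pos_s" if s_diff > 0 else "neg_s"] += p_avg * s_diff
    return sums
```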
get_shift_graph(ax=None, top_n=50, text_size_inset=True, cumulative_inset=True, show_plot=True, filename=None, **kwargs)
Plots the shift graph between two systems of types
Parameters: - ax (matplotlib.axes.Axes, optional) – Axes to draw the figure onto. New axes are created if none are given.
- top_n (int, optional) – Displays the top_n types, sorted by their absolute contribution to the difference between the systems
- cumulative_inset (bool, optional) – Whether to include an inset showing the cumulative contributions of the ranked types to the shift
- text_size_inset (bool, optional) – Whether to include an inset showing the relative sizes of the two systems
- show_plot (bool, optional) – Whether to display the plot once it has been rendered
- filename (str, optional) – If not None, the name of the file to which the shift graph is saved
Returns: Matplotlib ax of the shift graph. Displays the shift graph if show_plot=True
Return type: ax
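The ordering used for top_n, and the running total that a cumulative inset displays, can be sketched without any plotting. `rank_top_types` is a hypothetical helper, not Shifterator's plotting code; it assumes the shift scores are a dict keyed by type and that the cumulative curve runs over all ranked types.

```python
from itertools import accumulate


def rank_top_types(type2shift_score, top_n=50):
    """Hypothetical sketch: rank types by absolute shift score.

    Returns the top_n (type, score) pairs and the cumulative sum of the
    scores of all types in ranked order.
    """
    ranked = sorted(type2shift_score.items(), key=lambda kv: abs(kv[1]), reverse=True)
    cumulative = list(accumulate(score for _, score in ranked))
    return ranked[:top_n], cumulative
```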
get_shift_scores(details=False)
Calculates the type shift scores between the two systems
Parameters: details (boolean) – If True, returns each of the major components of each type’s shift score along with the overall shift scores. Otherwise, returns only the overall shift scores
Returns: - type2p_diff (dict) – If details is True, dict where keys are types and values are the differences in relative frequency, i.e. p_{i,2} - p_{i,1} for type i
- type2s_diff (dict) – If details is True, dict where keys are types and values are the differences in score, i.e. s_{i,2} - s_{i,1} for type i
- type2p_avg (dict) – If details is True, dict where keys are types and values are the average relative frequencies, i.e. 0.5*(p_{i,1} + p_{i,2}) for type i
- type2s_ref_diff (dict) – If details is True, dict where keys are types and values are the deviations of the average score from the reference score, i.e. 0.5*(s_{i,2} + s_{i,1}) - s_ref for type i
- type2shift_score (dict) – Keys are types and values are shift scores. The overall shift scores are normalized according to the normalization parameter of the Shift object
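The quantities listed above can be computed from raw frequencies and scores as in the following sketch. It assumes every type has a score in both score dicts, treats a type absent from one frequency dict as having frequency 0 there, and combines the components using the assumed decomposition p_diff * s_ref_diff + p_avg * s_diff; normalization is omitted. The function name is hypothetical.

```python
def shift_score_details(type2freq_1, type2freq_2, type2score_1, type2score_2, s_ref=0.0):
    """Hypothetical sketch of per-type word shift components (unnormalized).

    Returns {type: (p_diff, s_diff, p_avg, s_ref_diff, shift_score)}.
    """
    total_1 = sum(type2freq_1.values())
    total_2 = sum(type2freq_2.values())
    details = {}
    for t in set(type2freq_1) | set(type2freq_2):
        # Relative frequencies; a type missing from a system has frequency 0 there
        p_1 = type2freq_1.get(t, 0) / total_1
        p_2 = type2freq_2.get(t, 0) / total_2
        s_1, s_2 = type2score_1[t], type2score_2[t]
        p_diff = p_2 - p_1
        s_diff = s_2 - s_1
        p_avg = 0.5 * (p_1 + p_2)
        s_ref_diff = 0.5 * (s_1 + s_2) - s_ref
        # Assumed generalized word shift decomposition
        shift_score = p_diff * s_ref_diff + p_avg * s_diff
        details[t] = (p_diff, s_diff, p_avg, s_ref_diff, shift_score)
    return details
```

Because the relative frequency differences sum to zero, the unnormalized shift scores in this sketch sum to the difference between the two weighted average scores, whatever value s_ref takes.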
get_weighted_score(type2freq, type2score)
Calculates an average score according to a set of frequencies and scores
Parameters: - type2freq (dict) – Keys are types and values are frequencies
- type2score (dict) – Keys are types and values are scores
Returns: s_avg – Weighted average score of the system
Return type: float
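A minimal sketch of a frequency-weighted average score. As an assumption of this sketch, types without a score are ignored in both the numerator and the denominator; the function name is hypothetical and Shifterator's handling of unscored types may differ.

```python
def weighted_score(type2freq, type2score):
    """Hypothetical sketch: frequency-weighted average score.

    Types that have no entry in type2score are skipped (an assumption).
    """
    scored = [t for t in type2freq if t in type2score]
    total_freq = sum(type2freq[t] for t in scored)
    if total_freq == 0:
        raise ValueError("no scored types with nonzero frequency")
    return sum(type2freq[t] * type2score[t] for t in scored) / total_freq
```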
Word Shift Implementations
class shifterator.shifts.EntropyShift(type2freq_1, type2freq_2, base=2, alpha=1, reference_value=0, normalization='variation')
Shift object for calculating the shift in entropy between two systems
Parameters: - type2freq_1, type2freq_2 (dict) – Keys are types of a system and values are frequencies of those types
- base (float, optional) – Base of the logarithm for calculating entropy
- alpha (float, optional) – The parameter for the generalized Tsallis entropy. Setting alpha=1 recovers the Shannon entropy. Higher alpha emphasizes more common types; lower alpha emphasizes less common types. For details: https://en.wikipedia.org/wiki/Tsallis_entropy
- reference_value (str or float, optional) – The reference score to use to partition scores into two different regimes. If ‘average’, uses the average score according to type2freq_1 and type2score_1. Otherwise, uses zero as the reference point
- normalization (str, optional) – If ‘variation’, normalizes shift scores so that their absolute values sum to 1. If ‘trajectory’, normalizes them so that the shift scores sum to 1 or -1. Trajectory normalization is undefined when the total shift score is 0, so if ‘trajectory’ is specified and the total is 0, the scores are left unnormalized
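A sketch of the entropies controlled by base and alpha: the Shannon entropy in the given base when alpha=1, and the Tsallis entropy (1 - Σ p_i^alpha) / (alpha - 1) otherwise. How Shifterator combines base with alpha ≠ 1 is not stated here, so in this hypothetical helper base only affects the Shannon case.

```python
import math


def entropy(type2freq, base=2, alpha=1):
    """Hypothetical sketch: Shannon (alpha == 1) or Tsallis (alpha != 1) entropy."""
    total = sum(type2freq.values())
    # Zero-frequency types contribute nothing, so drop them up front
    probs = [f / total for f in type2freq.values() if f > 0]
    if alpha == 1:
        # Shannon entropy: -sum p * log_base(p)
        return -sum(p * math.log(p, base) for p in probs)
    # Tsallis entropy; base is not used in this branch of the sketch
    return (1 - sum(p ** alpha for p in probs)) / (alpha - 1)
```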
get_shift_graph(top_n=50, show_plot=True, detailed=False, text_size_inset=True, cumulative_inset=True, filename=None, **kwargs)
Plots the shift graph between two systems of types
Parameters: - ax (matplotlib.axes.Axes, optional) – Axes to draw the figure onto. New axes are created if none are given.
- top_n (int, optional) – Displays the top_n types, sorted by their absolute contribution to the difference between the systems
- cumulative_inset (bool, optional) – Whether to include an inset showing the cumulative contributions of the ranked types to the shift
- text_size_inset (bool, optional) – Whether to include an inset showing the relative sizes of the two systems
- show_plot (bool, optional) – Whether to display the plot once it has been rendered
- filename (str, optional) – If not None, the name of the file to which the shift graph is saved
Returns: Matplotlib ax of the shift graph. Displays the shift graph if show_plot=True
Return type: ax
class shifterator.shifts.JSDivergenceShift(type2freq_1, type2freq_2, base=2, weight_1=0.5, weight_2=0.5, alpha=1, reference_value=0, normalization='variation')
Shift object for calculating the Jensen-Shannon divergence (JSD) between two systems
Parameters: - type2freq_1, type2freq_2 (dict) – Keys are types of a system and values are frequencies of those types
- weight_1, weight_2 (float) – Relative weights of type2freq_1 and type2freq_2 when constructing their mixture distribution. They should sum to 1
- base (float, optional) – Base of the logarithm for calculating entropy
- alpha (float, optional) – The parameter for the generalized Tsallis entropy. Setting alpha=1 recovers the Shannon entropy. Higher alpha emphasizes more common types; lower alpha emphasizes less common types. For details: https://en.wikipedia.org/wiki/Tsallis_entropy
- reference_value (str or float, optional) – The reference score to use to partition scores into two different regimes. Defaults to zero as the reference point
- normalization (str, optional) – If ‘variation’, normalizes shift scores so that their absolute values sum to 1. If ‘trajectory’, normalizes them so that the shift scores sum to 1 or -1. Trajectory normalization is undefined when the total shift score is 0, so if ‘trajectory’ is specified and the total is 0, the scores are left unnormalized
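For the Shannon case (alpha=1), the Jensen-Shannon divergence with weights weight_1 and weight_2 is the entropy of the mixture distribution minus the weighted entropies of the two systems. The following self-contained sketch computes that total; it is an illustration with hypothetical function names, not Shifterator's per-type decomposition.

```python
import math


def _shannon(probs, base):
    # Shannon entropy of a {type: probability} dict, skipping zero probabilities
    return -sum(p * math.log(p, base) for p in probs.values() if p > 0)


def js_divergence(type2freq_1, type2freq_2, weight_1=0.5, weight_2=0.5, base=2):
    """Hypothetical sketch: JSD = H(M) - weight_1*H(P1) - weight_2*H(P2)."""
    if not math.isclose(weight_1 + weight_2, 1.0):
        raise ValueError("weight_1 and weight_2 should sum to 1")
    total_1 = sum(type2freq_1.values())
    total_2 = sum(type2freq_2.values())
    types = set(type2freq_1) | set(type2freq_2)
    p1 = {t: type2freq_1.get(t, 0) / total_1 for t in types}
    p2 = {t: type2freq_2.get(t, 0) / total_2 for t in types}
    # Mixture distribution M built from the weighted systems
    mixture = {t: weight_1 * p1[t] + weight_2 * p2[t] for t in types}
    return _shannon(mixture, base) - weight_1 * _shannon(p1, base) - weight_2 * _shannon(p2, base)
```

With equal weights and base 2, two systems with no types in common give a JSD of 1, and identical systems give 0.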
get_shift_graph(top_n=50, show_plot=True, detailed=False, text_size_inset=True, cumulative_inset=True, title=None, filename=None, **kwargs)
Plots the shift graph between two systems of types
Parameters: - ax (matplotlib.axes.Axes, optional) – Axes to draw the figure onto. New axes are created if none are given.
- top_n (int, optional) – Displays the top_n types, sorted by their absolute contribution to the difference between the systems
- cumulative_inset (bool, optional) – Whether to include an inset showing the cumulative contributions of the ranked types to the shift
- text_size_inset (bool, optional) – Whether to include an inset showing the relative sizes of the two systems
- show_plot (bool, optional) – Whether to display the plot once it has been rendered
- filename (str, optional) – If not None, the name of the file to which the shift graph is saved
Returns: Matplotlib ax of the shift graph. Displays the shift graph if show_plot=True
Return type: ax
class shifterator.shifts.KLDivergenceShift(type2freq_1, type2freq_2, base=2, reference_value=0, normalization='variation')
Shift object for calculating the Kullback-Leibler divergence (KLD) between two systems
Parameters: - type2freq_1, type2freq_2 (dict) – Keys are types of a system and values are frequencies of those types. The KLD is computed with respect to type2freq_1, i.e. D(T2 || T1). For the KLD to be well defined, all types must have nonzero frequencies in both type2freq_1 and type2freq_2
- base (float, optional) – Base of the logarithm for calculating entropy
- stop_lens (iterable of 2-tuples, optional) – Denotes intervals that should be excluded when calculating shift scores
- normalization (str, optional) – If ‘variation’, normalizes shift scores so that their absolute values sum to 1. If ‘trajectory’, normalizes them so that the shift scores sum to 1 or -1. Trajectory normalization is undefined when the total shift score is 0, so if ‘trajectory’ is specified and the total is 0, the scores are left unnormalized
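A sketch of D(T2 || T1) computed from two frequency dicts, including the check that every type has nonzero frequency in both systems. The function name and the choice to raise ValueError are assumptions of this sketch, not Shifterator's documented behavior.

```python
import math


def kl_divergence(type2freq_1, type2freq_2, base=2):
    """Hypothetical sketch: D(T2 || T1) = sum_i p_{i,2} * log(p_{i,2} / p_{i,1})."""
    types = set(type2freq_1) | set(type2freq_2)
    # The KLD is only well defined if every type is nonzero in both systems
    missing = [t for t in types if type2freq_1.get(t, 0) <= 0 or type2freq_2.get(t, 0) <= 0]
    if missing:
        raise ValueError(f"types with zero frequency in one system: {sorted(missing)}")
    total_1 = sum(type2freq_1.values())
    total_2 = sum(type2freq_2.values())
    divergence = 0.0
    for t in types:
        p_1 = type2freq_1[t] / total_1
        p_2 = type2freq_2[t] / total_2
        divergence += p_2 * math.log(p_2 / p_1, base)
    return divergence
```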
get_shift_graph(top_n=50, show_plot=True, detailed=False, text_size_inset=True, cumulative_inset=True, title=None, filename=None, **kwargs)
Plots the shift graph between two systems of types
Parameters: - ax (matplotlib.axes.Axes, optional) – Axes to draw the figure onto. New axes are created if none are given.
- top_n (int, optional) – Displays the top_n types, sorted by their absolute contribution to the difference between the systems
- cumulative_inset (bool, optional) – Whether to include an inset showing the cumulative contributions of the ranked types to the shift
- text_size_inset (bool, optional) – Whether to include an inset showing the relative sizes of the two systems
- show_plot (bool, optional) – Whether to display the plot once it has been rendered
- filename (str, optional) – If not None, the name of the file to which the shift graph is saved
Returns: Matplotlib ax of the shift graph. Displays the shift graph if show_plot=True
Return type: ax
class shifterator.shifts.ProportionShift(type2freq_1, type2freq_2)
Shift object for calculating differences in proportions of types across two systems
Parameters: type2freq_1, type2freq_2 (dict) – Keys are types of a system and values are frequencies of those types
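Assuming each type's contribution is its difference in relative frequency, p_{i,2} - p_{i,1}, with types missing from one system counted as frequency 0, the unnormalized per-type values can be sketched as follows. This hypothetical helper does not reproduce Shifterator's normalization or plotting.

```python
def proportion_shift(type2freq_1, type2freq_2):
    """Hypothetical sketch: per-type change in relative frequency, p_{i,2} - p_{i,1}."""
    total_1 = sum(type2freq_1.values())
    total_2 = sum(type2freq_2.values())
    types = set(type2freq_1) | set(type2freq_2)
    # A type absent from a system has relative frequency 0 there
    return {t: type2freq_2.get(t, 0) / total_2 - type2freq_1.get(t, 0) / total_1 for t in types}
```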
get_shift_graph(top_n=50, show_plot=True, detailed=False, text_size_inset=True, cumulative_inset=True, title=None, filename=None, **kwargs)
Plots the shift graph between two systems of types
Parameters: - ax (matplotlib.axes.Axes, optional) – Axes to draw the figure onto. New axes are created if none are given.
- top_n (int, optional) – Displays the top_n types, sorted by their absolute contribution to the difference between the systems
- cumulative_inset (bool, optional) – Whether to include an inset showing the cumulative contributions of the ranked types to the shift
- text_size_inset (bool, optional) – Whether to include an inset showing the relative sizes of the two systems
- show_plot (bool, optional) – Whether to display the plot once it has been rendered
- filename (str, optional) – If not None, the name of the file to which the shift graph is saved
Returns: Matplotlib ax of the shift graph. Displays the shift graph if show_plot=True
Return type: ax
class shifterator.shifts.WeightedAvgShift(type2freq_1, type2freq_2, type2score_1=None, type2score_2=None, reference_value=None, handle_missing_scores='error', stop_lens=None, stop_words={}, normalization='variation')
Shift object for calculating the weighted scores of two systems of types and the shift between them
Parameters: - type2freq_1, type2freq_2 (dict) – Keys are types of a system and values are frequencies of those types
- type2score_1, type2score_2 (dict or str, optional) – If dict, types are keys and values are the scores associated with each type. If str, the name of a score lexicon included in Shifterator. If both are None, defaults to uniform scores across types. If only one is None, it defaults to the other type2score dict
- reference_value (str or float, optional) – The reference score used to partition scores into two regimes. If ‘average’, uses the average score according to type2freq_1 and type2score_1. If None and a lexicon is selected for type2score, uses the midpoint of that lexicon’s scale. Otherwise, if None, uses zero as the reference point
- handle_missing_scores (str, optional) – If ‘error’, raises an error whenever a word has a score in one score dictionary but not the other. If ‘exclude’, excludes any word that is missing a score in either score dictionary from all word shift calculations, even if it has a score in the other dictionary. If ‘adopt’ and a word’s score is missing from one dictionary, uses its score from the other dictionary, if available
- stop_lens (iterable of 2-tuples, optional) – Intervals of scores to exclude. Types whose scores fall within any of these intervals are excluded from word shift calculations
- stop_words (set, optional) – Words to exclude from word shift calculations
- normalization (str, optional) – If ‘variation’, normalizes shift scores so that their absolute values sum to 1. If ‘trajectory’, normalizes them so that the shift scores sum to 1 or -1. Trajectory normalization is undefined when the total shift score is 0, so if ‘trajectory’ is specified and the total is 0, the scores are left unnormalized
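The three handle_missing_scores options, together with stop_words and stop_lens filtering, can be sketched as a score-resolution step. Several details are assumptions of this hypothetical helper rather than documented behavior: types unscored in both dicts are skipped, stop-lens intervals are treated as closed, a type is dropped if either of its scores falls inside a lens, and a KeyError stands in for the ‘error’ option.

```python
def resolve_scores(types, type2score_1, type2score_2, handle_missing_scores="error",
                   stop_lens=None, stop_words=None):
    """Hypothetical sketch: return {type: (s_1, s_2)} for the types kept in a word shift."""
    stop_lens = stop_lens or []
    stop_words = stop_words or set()
    resolved = {}
    for t in types:
        if t in stop_words:
            continue
        s_1, s_2 = type2score_1.get(t), type2score_2.get(t)
        if s_1 is None and s_2 is None:
            continue  # assumption: types unscored in both dicts are skipped
        if s_1 is None or s_2 is None:
            if handle_missing_scores == "error":
                raise KeyError(f"{t!r} has a score in only one score dict")
            if handle_missing_scores == "exclude":
                continue
            if handle_missing_scores != "adopt":
                raise ValueError(f"unknown handle_missing_scores: {handle_missing_scores!r}")
            # 'adopt': copy the available score into the missing slot
            s_1 = s_2 if s_1 is None else s_1
            s_2 = s_1 if s_2 is None else s_2
        # Drop types with either score inside any (closed) stop-lens interval
        if any(low <= s <= high for low, high in stop_lens for s in (s_1, s_2)):
            continue
        resolved[t] = (s_1, s_2)
    return resolved
```

For example, with ‘adopt’ and stop_lens=[(4, 6)], a word scored 4.0 in only one dictionary is first given 4.0 in both and then dropped by the lens.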