<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Approaching Significance</title>
<link>https://approachingsignificance.com/blog.html</link>
<atom:link href="https://approachingsignificance.com/blog.xml" rel="self" type="application/rss+xml"/>
<description>Side projects, methodology rabbit holes, and tools I&#39;ve built.</description>
<generator>quarto-1.6.1</generator>
<lastBuildDate>Tue, 12 Aug 2025 04:00:00 GMT</lastBuildDate>
<item>
  <title>Simulating Participant Data with Large Language Models</title>
  <dc:creator>Ethan Milne</dc:creator>
  <link>https://approachingsignificance.com/blog/expected_parrot/</link>
  <description><![CDATA[ 





<section id="introduction" class="level1">
<h1>Introduction</h1>
<p>I have been exploring <a href="https://www.expectedparrot.com"><code>Expected Parrot</code></a>—a framework for simulating participant data with LLMs—to replicate the results of my experimental studies. I think there is a lot of potential for these sorts of tools to give researchers preliminary insights into the likely effects of their experimental studies when using power analyses or other methods for determining appropriate sample sizes.</p>
<p>As an initial test, I wanted to look at a main effect study from my recent paper on <a href="https://approachingsignificance.com/papers/Retributive%20Philanthropy/">“Retributive Philanthropy”</a>. Here, we used some scenario studies to assess the impact of volitional and non-volitional wrongdoing on willingness to make retributive donations. The underlying theory here is that wrongdoing that is willful and/or actively desired by the wrongdoer tells you much more about their character than wrongdoing which is accidental.</p>
<p>The specific manipulation we used was <a href="https://www.insidehighered.com/news/2020/09/08/professor-suspended-saying-chinese-word-sounds-english-slur">a real story</a> of a professor who said a Chinese word that sounds similar to the N-word to his students. This was manipulated to be either presented as-is, or with an alternative story where the professor <em>actually</em> said the N-word to students. Participants were then presented with the opportunity to make a retributive donation, with willingness to donate rated on a 7-point scale.</p>
</section>
<section id="simulation" class="level1">
<h1>Simulation</h1>
<p>There are a few relatively simple steps I took to begin simulating this data: library setup, scenario and question setup, model setup, and participant setup. Note that this is all done in Python, as there isn’t an equivalent R package to do this (yet).</p>
<section id="library-setup" class="level3">
<h3 class="anchored" data-anchor-id="library-setup">Library setup</h3>
<p>First, I needed to load in the <code>edsl</code> library as well as another supporting library:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> edsl <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> QuestionMultipleChoice, QuestionFreeText, QuestionLinearScale, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb1-2">    Survey, FileStore, Scenario, ScenarioList, AgentList, Agent, Model, ModelList</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> itertools <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> product</span></code></pre></div>
</div>
</section>
<section id="scenario-and-question-setup" class="level3">
<h3 class="anchored" data-anchor-id="scenario-and-question-setup">Scenario and question setup</h3>
<p>Next, I pre-set the scenario manipulation as well as the questions used in our study with the <code>edsl</code> package:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1">scenarios <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ScenarioList.from_source(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pdf"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"stimuli/intentional.pdf"</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb2-2">ScenarioList.from_source(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pdf"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"stimuli/unintentional.pdf"</span>)</span>
<span id="cb2-3"></span>
<span id="cb2-4">stimuli <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> QuestionFreeText(</span>
<span id="cb2-5">  question_name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"stimuli"</span>,</span>
<span id="cb2-6">  question_text <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"How do you feel about this story: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{{</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> scenario.text </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}}</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb2-7">)</span>
<span id="cb2-8"></span>
<span id="cb2-9">nmj_1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> QuestionLinearScale(</span>
<span id="cb2-10">  question_name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nmj_1"</span>,</span>
<span id="cb2-11">  question_text <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I believe Professor Gerber harmed his Black students"</span>,</span>
<span id="cb2-12">  question_options <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>,],</span>
<span id="cb2-13">  option_labels <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Strongly disagree"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>:<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Strongly agree"</span>},</span>
<span id="cb2-14">  include_comment <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span></span>
<span id="cb2-15">)</span>
<span id="cb2-16"></span>
<span id="cb2-17">nmj_2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> QuestionLinearScale(</span>
<span id="cb2-18">  question_name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nmj_2"</span>,</span>
<span id="cb2-19">  question_text <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I believe Professor Gerber intended to harm his Black students"</span>,</span>
<span id="cb2-20">  question_options <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>,],</span>
<span id="cb2-21">  option_labels <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Strongly disagree"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>:<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Strongly agree"</span>},</span>
<span id="cb2-22">  include_comment <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span></span>
<span id="cb2-23">)</span>
<span id="cb2-24"></span>
<span id="cb2-25">nmj_3 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> QuestionLinearScale(</span>
<span id="cb2-26">  question_name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nmj_3"</span>,</span>
<span id="cb2-27">  question_text <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I blame Professor Gerber for harming his Black students"</span>,</span>
<span id="cb2-28">  question_options <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>,],</span>
<span id="cb2-29">  option_labels <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Strongly disagree"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>:<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Strongly agree"</span>},</span>
<span id="cb2-30">  include_comment <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span></span>
<span id="cb2-31">)</span>
<span id="cb2-32"></span>
<span id="cb2-33">dv <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> QuestionLinearScale(</span>
<span id="cb2-34">  question_name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dv"</span>,</span>
<span id="cb2-35">  question_text <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"In response to this incident, the Kentucky Antiracist Students Alliance (KASA) is raising funds to support those harmed by the professor's speech. For each donation, KASA promises to send a letter calling for the professor's dismissal. Please rate your likelihood of donating to KASA below:"</span>,</span>
<span id="cb2-36">  question_options <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>,],</span>
<span id="cb2-37">  option_labels <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Extremely unlikely"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>:<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Extremely likely"</span>},</span>
<span id="cb2-38">  include_comment <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span></span>
<span id="cb2-39">)</span></code></pre></div>
</div>
</section>
<section id="model-setup" class="level3">
<h3 class="anchored" data-anchor-id="model-setup">Model setup</h3>
<p>I then set up this process to use a range of different LLMs. Ideally, these results will not depend on the idiosyncrasies of a single model, so I use several models to see how the results vary (or ideally, remain consistent) across them. Right now, I am using models from Google, Anthropic, and Mistral. These are the cheapest models available from each provider (based on this list curated by Expected Parrot: https://www.expectedparrot.com/models), which should be sufficient for this study. Presumably, more expensive models will produce better results, but I am not interested in that right now.</p>
<p>I am also setting models to have a temperature of 1.0. Expected Parrot appears to have a default temperature value of 0.5, but given that I will be replicating responses many times, I want to get some more randomness in model results.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1">models <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ModelList([</span>
<span id="cb3-2">  Model(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gemini-1.5-flash-8b"</span>, service_name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"google"</span>, temperature <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>),</span>
<span id="cb3-3">  Model(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"claude-3-haiku-20240307"</span>, service_name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"anthropic"</span>, temperature <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>),</span>
<span id="cb3-4">  Model(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mistral.mistral-7b-instruct-v0:2"</span>, service_name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bedrock"</span>, temperature <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>),</span>
<span id="cb3-5">])</span></code></pre></div>
</div>
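<p>To make the temperature choice concrete, here is a minimal sketch of how temperature rescales a model's probabilities over possible answers. The logits below are made-up numbers for a hypothetical 7-point rating, not values from any of the models above, and real providers may implement sampling differently. The point is only that a higher temperature flattens the distribution, which should produce more varied ratings across repeated runs.</p>

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in bits; higher means more spread-out responses."""
    return -sum(p * math.log2(p) for p in probs if p != 0)

# Hypothetical logits for ratings 1 through 7 (illustrative only).
logits = [2.0, 3.0, 1.5, 0.5, 0.0, -0.5, -1.0]

probs_half = softmax_with_temperature(logits, 0.5)
probs_one = softmax_with_temperature(logits, 1.0)

for label, probs in [("T=0.5", probs_half), ("T=1.0", probs_one)]:
    print(label, [round(p, 3) for p in probs], "entropy:", round(entropy(probs), 3))
```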
</section>
<section id="participant-setup" class="level3">
<h3 class="anchored" data-anchor-id="participant-setup">Participant setup</h3>
<p>I next wanted to get a relatively random assortment of participants for this analysis. The code below simulates a rough distribution of genders, ages, and other demographic characteristics to make each model’s results more robust.</p>
<p>Right now, what this code does is provide every combination of each option for age, gender, nationality, and political views. Realistically, I would want to include more nationalities, more gender diversity, have an age curve more broadly representative of the population and/or Prolific’s user base, and have better weighting of different combinations (e.g., men are more conservative than women, whereas this code assumes equal representation).</p>
<p>However, for the sake of simplicity, I will use this smaller set of agents:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1">ages <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">35</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">40</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">45</span>]</span>
<span id="cb4-2">genders <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Male"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Female"</span>]</span>
<span id="cb4-3">nationalities <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"American"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Canadian"</span>]</span>
<span id="cb4-4">political_views <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Conservative"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Liberal"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Moderate"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Progressive"</span>]</span>
<span id="cb4-5"></span>
<span id="cb4-6">agents <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> AgentList(</span>
<span id="cb4-7">    Agent(traits<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>{<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"age"</span>: age, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gender"</span>: gender, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nationality"</span>: nationality, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"politics"</span>: politics})</span>
<span id="cb4-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> age, gender, nationality, politics <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> product(ages, genders, nationalities, political_views)</span>
<span id="cb4-9">)</span></code></pre></div>
</div>
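<p>As a rough sketch of the weighting idea mentioned above, the snippet below first counts how many agents the full-factorial design produces, then draws a same-sized sample in which political views depend on gender. The weights are placeholders made up for illustration, not estimates from any survey, and the <code>sample_traits</code> helper is hypothetical. The resulting dictionaries would still need to be wrapped in <code>Agent(traits=...)</code> before being used with <code>edsl</code>.</p>

```python
import random
from itertools import product

ages = [20, 25, 30, 35, 40, 45]
genders = ["Male", "Female"]
nationalities = ["American", "Canadian"]
political_views = ["Conservative", "Liberal", "Moderate", "Progressive"]

# Full-factorial design: every combination appears exactly once.
full_factorial = list(product(ages, genders, nationalities, political_views))
print(len(full_factorial))  # 6 * 2 * 2 * 4 = 96

# Hypothetical conditional weights for politics given gender (illustrative only).
politics_weights = {
    "Male": [0.35, 0.25, 0.25, 0.15],
    "Female": [0.25, 0.30, 0.25, 0.20],
}

def sample_traits(rng):
    """Draw one participant profile with gender-dependent political views."""
    gender = rng.choice(genders)
    return {
        "age": rng.choice(ages),
        "gender": gender,
        "nationality": rng.choice(nationalities),
        "politics": rng.choices(political_views, weights=politics_weights[gender], k=1)[0],
    }

rng = random.Random(2025)  # fixed seed so the sample is reproducible
weighted_traits = [sample_traits(rng) for _ in range(len(full_factorial))]
print(weighted_traits[0])
```

<p>Sampling this way trades the balanced design for a more realistic mix of profiles. How realistic depends entirely on where the weights come from.</p>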
</section>
<section id="running-the-survey" class="level3">
<h3 class="anchored" data-anchor-id="running-the-survey">Running the survey</h3>
<p>Finally, I can run the survey and save all data:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">survey <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Survey(questions <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [stimuli, nmj_1, nmj_2, nmj_3, dv])</span>
<span id="cb5-2">survey <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> survey.set_full_memory_mode()</span>
<span id="cb5-3"></span>
<span id="cb5-4">results <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> survey.by(scenarios).by(agents).by(models).run()</span>
<span id="cb5-5"></span>
<span id="cb5-6">results.to_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"results/blended/full.csv"</span>)</span>
<span id="cb5-7"></span>
<span id="cb5-8">results.select(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"filename"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"model"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"age"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gender"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nationality"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"politics"</span>, </span>
<span id="cb5-9">               <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"generated_tokens.nmj_1_generated_tokens"</span>, </span>
<span id="cb5-10">               <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"generated_tokens.nmj_2_generated_tokens"</span>, </span>
<span id="cb5-11">               <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"generated_tokens.nmj_3_generated_tokens"</span>, </span>
<span id="cb5-12">               <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"generated_tokens.dv_generated_tokens"</span>).to_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"results/blended/reduced.csv"</span>)</span></code></pre></div>
</div>
<p>Here is what running the model looks like:</p>
<p><img src="https://approachingsignificance.com/blog/expected_parrot/images/Model run.png" class="img-fluid"></p>
<p>Now that the simulated data is generated, I can then analyze the results and compare them to what we found in our study of real participants.</p>
</section>
</section>
<section id="comparison" class="level1">
<h1>Comparison</h1>
<p>I generally prefer working with R, so I will load the data in and analyze it using R instead of Python. Here’s some quick code to load in the simulated data and report the results of a between-conditions t-test:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(here)</span>
<span id="cb6-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb6-3"></span>
<span id="cb6-4">data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blog"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"expected_parrot"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"results"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"google"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"reduced.csv"</span>))</span>
<span id="cb6-5"></span>
<span id="cb6-6">data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matches</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nmj|dv"</span>),</span>
<span id="cb6-8">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.fns =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_extract</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.character</span>(.x), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">d"</span>)))) </span>
<span id="cb6-9"></span>
<span id="cb6-10"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(data) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(data) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_replace_all</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"generated_tokens</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">."</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_replace_all</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"_generated_tokens"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>)</span>
<span id="cb6-13"></span>
<span id="cb6-14"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t.test</span>(data[data<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>scenario.filename <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"intentional.pdf"</span>,]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>dv,</span>
<span id="cb6-15">data[data<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>scenario.filename <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"unintentional.pdf"</span>,]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>dv)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Welch Two Sample t-test

data:  data[data$scenario.filename == "intentional.pdf", ]$dv and data[data$scenario.filename == "unintentional.pdf", ]$dv
t = 5.5398, df = 361.49, p-value = 5.84e-08
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.4255307 0.8939137
sample estimates:
mean of x mean of y 
 2.690972  2.031250 </code></pre>
</div>
</div>
<p>These results are very similar to those reported in the paper! The mean willingness to retributively donate in the intentional condition was almost exactly the same for the real (M = 2.67) and simulated data (M = 2.69), and the mean in the unintentional condition was close for the real (M = 1.83) and simulated data (M = 2.03) as well. Both statistical tests were significant (though the two studies differed in overall sample size):</p>
<ul>
<li><strong>Real</strong>: t(1109.2) = 8.40, p &lt; .001</li>
<li><strong>Simulated</strong>: t(361.49) = 5.54, p &lt; .001</li>
</ul>
<p><em>(Note that for the purpose of this analysis, I am only showing the output of Google’s Gemini 1.5 Flash model, but the results are consistent across the different models I tested.)</em></p>
<p>In short, it looks like for this sort of study, LLM-based simulations work very well! I can imagine a lot of use-cases for this. In particular, I think power analyses could be greatly helped by LLM-based simulation: right now, power analysis often requires the researcher to either guess what an effect might be or <a href="https://www.sciencedirect.com/science/article/pii/S0022103121000627#:~:text=By%20examining%20whether%20an%20observed,meaningful%2C%20by%20using%20equivalence%20tests.">identify what a minimally interesting effect size might be</a> in order to calculate the necessary sample for an experiment. If these models work well, I could see researchers using them to more easily simulate data prior to power analysis in order to get a better grounding for their likely effect sizes.</p>
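<p>As a rough sketch of that workflow, the chunk below feeds a simulated effect into base R’s <code>power.t.test()</code>. The two response vectors are made-up 7-point ratings, not the simulated data from this post, and the 80% power target is an assumption chosen for illustration:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical simulated ratings on a 7-point scale (illustration only)
intentional   &lt;- rep(1:7, times = c(60, 50, 36, 24, 16, 10, 4))
unintentional &lt;- rep(1:7, times = c(90, 56, 24, 16, 8, 4, 2))

# Raw mean difference and pooled SD (equal group sizes) from the simulated data
delta     &lt;- abs(mean(intentional) - mean(unintentional))
pooled_sd &lt;- sqrt((var(intentional) + var(unintentional)) / 2)

# Per-group sample size needed for 80% power in a two-sample t-test
power_result &lt;- power.t.test(delta = delta, sd = pooled_sd,
                             power = 0.80, sig.level = 0.05)
ceiling(power_result$n)</code></pre></div>
</div>
<p>The resulting per-group <code>n</code> is only as good as the simulation behind it, so I would treat it as a starting point rather than a final answer.</p>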
</section>
<section id="conclusion" class="level1">
<h1>Conclusion</h1>
<p>This was a small test of using LLMs to simulate experimental data. This is a growing area of research that a lot of behavioral scientists are paying attention to, and noting where LLM-simulated data converges with and diverges from “real” studies will be important as our discipline considers how best to use this information in the design, execution, and replication of experimental studies. Hopefully this was useful to anybody interested in trying this method out for themselves!</p>


</section>

 ]]></description>
  <category>code</category>
  <category>methodology</category>
  <guid>https://approachingsignificance.com/blog/expected_parrot/</guid>
  <pubDate>Tue, 12 Aug 2025 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Introducing Haikuify</title>
  <dc:creator>Ethan Milne</dc:creator>
  <link>https://approachingsignificance.com/blog/haikuify/</link>
  <description><![CDATA[ 





<section id="introduction" class="level1">
<h1>Introduction</h1>
<p>I made an R package. Not a good one, but a package nonetheless. It’s called <code>Haikuify</code>, and can be accessed at <a href="https://github.com/SEthanMilne/Haikuify">this GitHub repo</a>.</p>
<p>This package does exactly what the name suggests – it takes text and… haikuifies it. That is, the package takes strings and identifies when there are haikus present.</p>
<p>Haikus are well-suited for text analysis because they do not have any requirements for rhyming, which can be difficult to determine with basic text analysis tools. Instead, haikus have only one rule: <em>The poem must follow a 5-7-5 syllable structure</em>. Counting syllables is easy with modern packages like <code>quanteda</code> or <code>nsyllable</code>, so it is pretty easy to determine whether a sentence meets the criteria of a haiku.</p>
</section>
<section id="package-mechanics" class="level1">
<h1>Package mechanics</h1>
<p>The <code>Haikuify</code> package contains a single function, reproduced below:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">haikuify <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span> (x) {</span>
<span id="cb1-2">    x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-3">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">strsplit</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"(?&lt;=[[:punct:]])</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">s(?=[A-Z])"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">perl =</span> T) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-4">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-5">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-6">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sentences"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-7">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb1-8">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Sentences =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gsub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"[[:punct:]]"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, Sentences),</span>
<span id="cb1-9">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Sentences =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tolower</span>(Sentences),</span>
<span id="cb1-10">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Sentence_ID =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">row_number</span>()</span>
<span id="cb1-11">        ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-12">        tidytext<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest_tokens</span>(word, Sentences) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-13">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb1-14">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">syllables =</span> nsyllable<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nsyllable</span>(word) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-15">                <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>()</span>
<span id="cb1-16">        ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-17">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(Sentence_ID) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-18">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sentence_syllables =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(syllables)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-19">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(sentence_syllables <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">17</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-20">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">syllable_count =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cumsum</span>(syllables)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-21">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">any</span>(syllable_count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-22">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">any</span>(syllable_count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-23">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">any</span>(syllable_count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">17</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-24">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb1-25">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">word =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(syllable_count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, </span>
<span id="cb1-26">                          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(word, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sep =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" "</span>),</span>
<span id="cb1-27">                          word),</span>
<span id="cb1-28">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">word =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(syllable_count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>, </span>
<span id="cb1-29">                          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(word, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sep =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" "</span>),</span>
<span id="cb1-30">                          word)</span>
<span id="cb1-31">        ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-32">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarise</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text =</span> stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_c</span>(word, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">collapse =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" "</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-33">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text =</span> stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_to_title</span>(text)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-34">        dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(text)</span>
<span id="cb1-35">}</span></code></pre></div>
</div>
<p>Let’s break this down piece by piece. First, we need to split a given text string into its component sentences. The following chunk takes a string (x) and splits it at any whitespace that follows a punctuation mark and precedes a capital letter. These split strings are output as a list, which is then unlisted and turned into a dataframe. Then, the column is given a sensible name (“Sentences”), the punctuation is stripped from each sentence, and all words are turned into lowercase using the <code>tolower()</code> function. Each sentence also gets a <code>Sentence_ID</code> so that its words can be regrouped later.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">strsplit</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"(?&lt;=[[:punct:]])</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">s(?=[A-Z])"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">perl =</span> T) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-3">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-4">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-5">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sentences"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-6">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb2-7">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Sentences =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gsub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"[[:punct:]]"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, Sentences),</span>
<span id="cb2-8">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Sentences =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tolower</span>(Sentences),</span>
<span id="cb2-9">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Sentence_ID =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">row_number</span>()</span>
<span id="cb2-10">        )</span></code></pre></div>
</div>
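<p>To see what the split pattern does on its own, here is a small base R sketch. The example string is made up for illustration; the regular expression is the same one used in the function above:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Same split pattern as in haikuify(): whitespace preceded by punctuation
# and followed by a capital letter
pattern &lt;- "(?&lt;=[[:punct:]])\\s(?=[A-Z])"

# Made-up example text (illustration only)
example_text &lt;- "We need tools. Sound good? Let's get going."

# strsplit() returns a list, so unlist() flattens it to a character vector
sentences &lt;- unlist(strsplit(example_text, pattern, perl = TRUE))
sentences  # three elements, one per sentence</code></pre></div>
</div>
<p>One consequence of this pattern is that abbreviations followed by a capitalized word (for example “e.g. Microsoft”) are also treated as sentence boundaries.</p>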
<p>Next, we need to figure out the number of syllables contained in each word, in each sentence. The following chunk “unnests” the words in each sentence, splitting by the spaces in the sentence and giving each word its own row in a new dataframe. Then, the <code>nsyllable()</code> function is applied to each individual word in its row to get a syllable count per word. The chunk below calls it through <code>quanteda</code>, while the packaged function above calls it from the standalone <code>nsyllable</code> package.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">tidytext<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest_tokens</span>(word, Sentences) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb3-3">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">syllables =</span> quanteda<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nsyllable</span>(word, </span>
<span id="cb3-4">                                        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">syllable_dictionary =</span> quanteda<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>data_int_syllables) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-5">            <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>()</span>
<span id="cb3-6">    )</span></code></pre></div>
</div>
<p>Now that syllables have been determined, we can identify if a sentence meets basic criteria for a haiku. The first check is whether a sentence has 17 total syllables (the sum of 5 + 7 + 5):</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(Sentence_ID) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-2">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sentence_syllables =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(syllables)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-3">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(sentence_syllables <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">17</span>)</span></code></pre></div>
</div>
<p>But not every sentence that has 17 syllables is truly a haiku. Consider the following sentence:</p>
<blockquote class="blockquote">
<p>“Antidisestablishmentarianism is a cool word eh?”</p>
</blockquote>
<p>While this sentence <em>does</em> have 17 syllables, the word “antidisestablishmentarianism” cannot be split across multiple lines. The solution, then, is to cumulatively sum the syllables in a given sentence, and identify when a sentence reaches a syllable count of 5, 12 (5+7), and 17 (5+7+5) respectively:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">syllable_count =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cumsum</span>(syllables)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-2">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">any</span>(syllable_count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-3">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">any</span>(syllable_count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-4">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">any</span>(syllable_count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">17</span>))</span></code></pre></div>
</div>
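<p>The cumulative-sum check can be tried in isolation with base R. In this sketch the syllable counts are assigned by hand rather than computed, and <code>has_haiku_breaks()</code> is a hypothetical helper that is not part of <code>Haikuify</code>. It assumes the 17-syllable filter has already been applied:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper (not part of Haikuify): TRUE when the running syllable
# total lands exactly on 5, 12, and 17
has_haiku_breaks &lt;- function(syllables) {
    running &lt;- cumsum(syllables)
    all(c(5, 12, 17) %in% running)
}

# Hand-assigned counts: "antidisestablishmentarianism is a cool word eh"
long_word &lt;- c(12, 1, 1, 1, 1, 1)

# Hand-assigned counts:
# "we need to make sure that the tools are all in place to get started soon"
real_haiku &lt;- c(rep(1, 14), 2, 1)

has_haiku_breaks(long_word)   # FALSE: the running total jumps straight to 12
has_haiku_breaks(real_haiku)  # TRUE: totals of 5, 12, and 17 all occur</code></pre></div>
</div>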
<p>Now we need to identify the line breaks for a haiku. With our list of true haikus, we can easily append “/” to the end of the 5 and 12-syllable words to signify line breaks:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">word =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(syllable_count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, </span>
<span id="cb6-2">                     <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(word, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sep =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" "</span>), </span>
<span id="cb6-3">                     word),</span>
<span id="cb6-4">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">word =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(syllable_count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>, </span>
<span id="cb6-5">                          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(word, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sep =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" "</span>), </span>
<span id="cb6-6">                          word)</span>
<span id="cb6-7">        )</span></code></pre></div>
</div>
<p>Finally, we can pull everything together and output a full haiku, complete with line breaks and all:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarise</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text =</span> stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_c</span>(word, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">collapse =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" "</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-2">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text =</span> stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_to_title</span>(text)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-3">        dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(text)</span></code></pre></div>
</div>
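<p>For a self-contained view of these last two steps, the base R sketch below marks the line breaks and collapses the words back into one string. The words and syllable counts are typed in by hand for illustration, and it skips the title-casing step that <code>str_to_title()</code> performs in the real function:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hand-entered words and syllable counts (illustration only)
words     &lt;- c("we", "need", "to", "make", "sure", "that", "the", "tools",
               "are", "all", "in", "place", "to", "get", "started", "soon")
syllables &lt;- c(rep(1, 14), 2, 1)
running   &lt;- cumsum(syllables)

# Append " /" to the words that close the first and second lines
marked &lt;- ifelse(running %in% c(5, 12), paste(words, "/"), words)

# Collapse the words back into a single string
haiku &lt;- paste(marked, collapse = " ")
haiku</code></pre></div>
</div>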
</section>
<section id="implementation" class="level1">
<h1>Implementation</h1>
<p>Now to install the package and give it a live test:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">devtools<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">install_github</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"SEthanMilne/Haikuify"</span>)</span>
<span id="cb8-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(Haikuify)</span>
<span id="cb8-3"></span>
<span id="cb8-4"></span>
<span id="cb8-5">text <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I've been wondering for a while now how we might start this project. We need to make sure that the tools are all in place to get started soon. Sound good? Let's get going."</span></span>
<span id="cb8-6"></span>
<span id="cb8-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">haikuify</span>(text)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "We Need To Make Sure / That The Tools Are All In Place / To Get Started Soon"</code></pre>
</div>
</div>
<p>The use-case for <code>Haikuify</code> is clear: whimsical exploration of otherwise serious text data. As an example, I undertook a small project a couple of years back using this package to find accidental haikus in 10-K reports from major tech firms. Here’s a haiku sourced from Microsoft’s 2018 report (with a background chart of their stock movement).</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://approachingsignificance.com/blog/haikuify/microsoft_haiku.png" class="img-fluid figure-img"></p>
<figcaption>Microsoft Haiku</figcaption>
</figure>
</div>


</section>

 ]]></description>
  <category>code</category>
  <category>R packages</category>
  <guid>https://approachingsignificance.com/blog/haikuify/</guid>
  <pubDate>Fri, 04 Mar 2022 05:00:00 GMT</pubDate>
  <media:content url="https://approachingsignificance.com/blog/haikuify/microsoft_haiku.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>JCR P-Values</title>
  <dc:creator>Ethan Milne</dc:creator>
  <link>https://approachingsignificance.com/blog/jcr_pvals/</link>
  <description><![CDATA[ 





<p>Another fun project – during my comprehensive exams I was given a large set of JCR papers to read and review. Since I had all these PDFs lying around, I had a good opportunity to learn more about automated data extraction from PDF documents. I decided to look at the distribution of P values in all 2019-2020 JCR papers.</p>
<section id="required-packages" class="level1">
<h1>Required Packages</h1>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(pdftools)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggthemes)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggtext)</span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggpubr)</span></code></pre></div>
</div>
</section>
<section id="extract-p-values" class="level1">
<h1>Extract P Values</h1>
<p>The code chunk below extracts the P values from papers in the following way:</p>
<ul>
<li>Get the names of all papers in a folder containing them</li>
<li>Loop through each paper name and use the <code>pdftools</code> package to extract their raw text</li>
<li>Do some basic cleaning (e.g.&nbsp;getting rid of “\n”, which denotes line breaks, and of double spaces)</li>
<li>Extract all strings of text that consist of a lowercase <code>p</code>, one of <code>=</code>, <code>&lt;</code>, or <code>&gt;</code> (with an optional space on either side), and a decimal point followed by one to four digits, such as <code>p &lt; .05</code></li>
<li>Output a data frame with 2 columns: <code>P_Value</code> and <code>Paper</code></li>
</ul>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">files <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list.files</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PDFs"</span>)</span>
<span id="cb2-2"></span>
<span id="cb2-3">Results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ncol =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nrow =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>))</span>
<span id="cb2-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(Results) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"P_Value"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Paper"</span>)</span>
<span id="cb2-5"></span>
<span id="cb2-6"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> (i <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq_along</span>(files)) {</span>
<span id="cb2-7">  name <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> files[i]</span>
<span id="cb2-8">  </span>
<span id="cb2-9">  text <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb2-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pdf_text</span>(</span>
<span id="cb2-11">        here<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blog"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"jcr_pvals"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PDFs"</span>, name)</span>
<span id="cb2-12">    )</span>
<span id="cb2-13">  text <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gsub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, text)</span>
<span id="cb2-14">  text <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gsub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"  "</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, text)</span>
<span id="cb2-15">  </span>
<span id="cb2-16">  values <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb2-17">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_extract_all</span>(text, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'p</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">s?[=&lt;&gt;]</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">s?</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">.</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">d{1,4}'</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-18">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.data.frame</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-19">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Paper =</span> name)</span>
<span id="cb2-20">  </span>
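<span id="cb2-21">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(values) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) {</span>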
<span id="cb2-22">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">next</span></span>
<span id="cb2-23">  } <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>{</span>
<span id="cb2-24">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(values) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"P_Value"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Paper"</span>)</span>
<span id="cb2-25">    </span>
<span id="cb2-26">    Results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbind</span>(values, Results)</span>
<span id="cb2-27">  }</span>
<span id="cb2-28">}</span></code></pre></div>
</div>
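<p>To make the pattern concrete, here is a minimal sketch that runs the same regular expression on a single made-up sentence. The <code>sample_text</code> string is invented for illustration and does not come from any paper. Note that values written with a leading zero, such as <code>p = 0.03</code>, are not matched, because the pattern expects the decimal point right after the operator.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(stringr)

# Hypothetical sentence for illustration; not taken from any JCR paper
sample_text &lt;- "The effect was significant (p &lt; .05), the control was not (p = .21), and p=.001 elsewhere; p = 0.03 is skipped."

# Same pattern as in the loop above: "p", optional whitespace,
# an operator, optional whitespace, then "." and one to four digits
pattern &lt;- "p\\s?[=&lt;&gt;]\\s?\\.\\d{1,4}"

matches &lt;- unlist(str_extract_all(sample_text, pattern))
print(matches)
# "p &lt; .05" "p = .21" "p=.001"</code></pre></div>
</div>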
</section>
<section id="cleaning" class="level1">
<h1>Cleaning</h1>
<p>Now that I have every p value, I need to extract the actual number. This is moderately challenging for two reasons. First, p values are often reported without the leading <code>0</code> (e.g.&nbsp;p = .07). Second, a p value reported as <strong>greater than</strong> 0.05 is different from one reported as <strong>equal to</strong> or <strong>less than</strong> 0.05, so the operator needs to be recorded somewhere for any future work I may do.</p>
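<p>In summary, the code below does the following:</p>
<ul>
<li>Copy each reported p value string into a “raw value” column</li>
<li>Replace the leading “p [&lt;=&gt;]” with a “0”, so that “p &lt; .05” becomes “0.05”</li>
<li>Store the operator (<code>=</code>, <code>&lt;</code>, or <code>&gt;</code>) in its own <code>Operator</code> column</li>
<li>Convert the raw value column to numeric</li>
</ul>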
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">Cleaned_Results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> Results <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb3-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Raw_Value =</span> P_Value,</span>
<span id="cb3-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Raw_Value =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gsub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"p &lt; "</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0"</span>, Raw_Value),</span>
<span id="cb3-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Raw_Value =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gsub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"p = "</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0"</span>, Raw_Value),</span>
<span id="cb3-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Raw_Value =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gsub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"p &gt; "</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0"</span>, Raw_Value),</span>
<span id="cb3-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Raw_Value =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gsub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"p &gt;"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0"</span>, Raw_Value),</span>
<span id="cb3-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Raw_Value =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gsub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"p ="</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0"</span>, Raw_Value),</span>
<span id="cb3-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Raw_Value =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gsub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"p &lt;"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0"</span>, Raw_Value),</span>
<span id="cb3-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Raw_Value =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gsub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"p&lt; "</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0"</span>, Raw_Value),</span>
<span id="cb3-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Raw_Value =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gsub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"p= "</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0"</span>, Raw_Value),</span>
<span id="cb3-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Raw_Value =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gsub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"p&gt; "</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0"</span>, Raw_Value),</span>
<span id="cb3-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Raw_Value =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gsub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"p&lt;"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0"</span>, Raw_Value),</span>
<span id="cb3-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Raw_Value =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gsub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"p="</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0"</span>, Raw_Value),</span>
<span id="cb3-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Raw_Value =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gsub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"p&gt;"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0"</span>, Raw_Value)</span>
<span id="cb3-16">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Operator =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_extract</span>(P_Value, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"[=&lt;&gt;]"</span>))</span>
<span id="cb3-18"></span>
<span id="cb3-19">Cleaned_Results<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Raw_Value <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(Cleaned_Results<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Raw_Value)</span></code></pre></div>
</div>
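<p>The twelve <code>gsub()</code> calls above can also be collapsed into a single pattern. The sketch below is one possible alternative rather than the code I actually ran. The <code>toy</code> data frame and its values are made up to mimic the shape of the extracted strings.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)
library(stringr)

# Made-up strings in the same shape as the extracted P_Value column
toy &lt;- data.frame(
  P_Value = c("p &lt; .05", "p=.001", "p &gt;.2", "p= .0312"),
  Paper = "example.pdf"
)

toy_clean &lt;- toy |&gt;
  mutate(
    # Keep the comparison operator so inequalities stay distinguishable
    Operator = str_extract(P_Value, "[=&lt;&gt;]"),
    # Swap the "p", the operator, and any single spaces for a leading zero
    Raw_Value = as.numeric(str_replace(P_Value, "^p\\s?[=&lt;&gt;]\\s?", "0"))
  )

toy_clean</code></pre></div>
</div>
<p>Because the pattern mirrors the one used for extraction, any string the extraction step can produce should be handled by this single replacement.</p>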
</section>
<section id="plotting" class="level1">
<h1>Plotting</h1>
<p>Finally, I wanted to plot the raw p values I’ve found. There’s little analytic code here - mostly just ggplot aesthetic wrangling. I used <code>theme_pubclean()</code> from the <code>{ggpubr}</code> package as a clean base theme, and <code>element_markdown()</code> from <code>{ggtext}</code> so the title can be coloured with inline HTML.</p>
<p>I’ve added X axis breaks at the standard p value thresholds (0.1, 0.05, 0.01, and 0.001), plus 0.0001 and 1 to frame the axis. We should expect heavy clustering at these thresholds, as most researchers seem to report inequalities (p &lt; x) rather than exact values.</p>
<p>Finally, reported p values span several orders of magnitude, so the x axis is put on a log scale for easier interpretation.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(Cleaned_Results, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Raw_Value)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bins =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_x_log10</span>(</span>
<span id="cb4-4">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">breaks =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0001</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.001</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>),</span>
<span id="cb4-5">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".0001"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".001"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".01"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".05"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".1"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"1"</span>)</span>
<span id="cb4-6">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_pubclean</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb4-9">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title.x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(),</span>
<span id="cb4-10">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.title.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(),</span>
<span id="cb4-11">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">plot.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_markdown</span>(),</span>
<span id="cb4-12">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">plot.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_rect</span>(</span>
<span id="cb4-13">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>,</span>
<span id="cb4-14">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>,</span>
<span id="cb4-15">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linewidth =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb4-16">        )</span>
<span id="cb4-17">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-18">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb4-19">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;span style = 'color: #ed713a;'&gt;Distribution of P Values&lt;/span&gt;"</span>,</span>
<span id="cb4-20">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Appearing in JCR, 2019-2020 editions"</span>,</span>
<span id="cb4-21">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"P Values"</span>,</span>
<span id="cb4-22">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Count"</span></span>
<span id="cb4-23">    )</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://approachingsignificance.com/blog/jcr_pvals/index_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
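<p>One way to check the clustering expectation numerically is to tabulate how often each operator appears and which thresholds the inequalities use. This is a rough sketch on an invented stand-in for <code>Cleaned_Results</code>. The <code>toy_results</code> values are made up, so the counts themselves mean nothing.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Invented stand-in with the same columns as Cleaned_Results
toy_results &lt;- data.frame(
  Raw_Value = c(0.05, 0.05, 0.01, 0.001, 0.032, 0.21),
  Operator  = c("&lt;", "&lt;", "&lt;", "&lt;", "=", "=")
)

# Share of reported values that are inequalities rather than exact values
operator_share &lt;- toy_results |&gt;
  count(Operator) |&gt;
  mutate(share = n / sum(n))

# Which thresholds the "p &lt; x" reports use, most common first
threshold_counts &lt;- toy_results |&gt;
  filter(Operator == "&lt;") |&gt;
  count(Raw_Value, sort = TRUE)

operator_share
threshold_counts</code></pre></div>
</div>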


</section>

 ]]></description>
  <category>code</category>
  <category>methodology</category>
  <guid>https://approachingsignificance.com/blog/jcr_pvals/</guid>
  <pubDate>Thu, 03 Mar 2022 05:00:00 GMT</pubDate>
</item>
</channel>
</rss>
