The details of this workflow are covered in the bottom section of this post. For now, let's take a quick look at its results:

library(dplyr)

# What are A and S correlated with?
a_s_days <- … %>%
  filter(letters %in% c("S", "A")) %>%
  pull(date)
# Result: they could be confused with L, N, |, T, C, E, R

Out of 365 days, the workflow produced only 142 accurate datasets with exactly 7 letters, about 39% of the attempted days! The biggest source of inaccuracy is missing letters: it never identified a single O, Q, or X. These letters have a major impact on game scoring, so any model of the game built without the days that include them would be significantly biased.

The days with too many letters are equally troubling, since a day with more than 7 letters must contain at least one misread letter.

In addition to the wrong letters, my determination of the required letter relied on the OCR identifying the letters in a consistent order. If a letter can't be found, the order shifts, and the required-letter data is no longer accurate. In other words, wherever total_letters_found != 7, it is safe to assume the required letter is off as well.

Safe analysis despite data prep failures

Before completely calling it, we can use the data we successfully parsed off the HTML pages (not the image data) to see if there is a simple relationship between the genius score and the maximum number of words:

library(ggplot2)

# Minimum genius score vs. maximum number of words, one point per day
ggplot(data, aes(max_words, min_genius)) +
  geom_point() +
  geom_smooth() +
  theme_minimal() +
  labs(
    title = "The possible score is related to the number of words you can make",
    subtitle = "This relationship would be great for pangram models, but we don't know the # of words ahead of playing"
  )

As expected, given how the scoring works, the score (proxied by the minimum genius score) is closely correlated with the maximum number of words, but there is some variance.
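The letter-count check described above (a day is usable only when exactly 7 letters were found, and otherwise the required letter is suspect too) can be sketched in base R. This is a minimal illustration, not the post's actual workflow: the `ocr` data frame, its `date` and `letter` columns, and its rows are hypothetical, made-up stand-ins for the real OCR output.

```r
# Hypothetical OCR output: one row per letter detected on a given day.
# The data frame, its columns, and these rows are made up for illustration.
ocr <- data.frame(
  date = c(rep("2021-01-01", 7), rep("2021-01-02", 6), rep("2021-01-03", 8)),
  letter = c("A", "B", "C", "D", "E", "F", "G",       # 7 letters: usable
             "H", "I", "L", "N", "T", "R",            # 6 letters: one missed
             "S", "A", "L", "N", "T", "C", "E", "R")  # 8 letters: one misread
)

# Count how many letters were found on each day
counts <- as.data.frame(table(date = ocr$date),
                        responseName = "total_letters_found",
                        stringsAsFactors = FALSE)

# Classify each day; anything other than exactly 7 letters is a failed parse
counts$status <- ifelse(counts$total_letters_found == 7, "ok",
                 ifelse(counts$total_letters_found < 7,
                        "missing letters", "too many letters"))

# The required letter depends on letter order, so only trust it on "ok" days
counts$required_letter_reliable <- counts$total_letters_found == 7

print(counts)
print(mean(counts$required_letter_reliable))  # share of usable days
```

On the full year, the same share works out to 142 / 365, roughly 0.39.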
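To put a number on "closely correlated, but with some variance", one option is Pearson's correlation plus a simple linear fit, whose residual standard error measures the spread of days around the line. This is a sketch using made-up values in a stand-in data frame (`scores`) that borrows the post's `max_words` and `min_genius` column names. Note that `geom_smooth()` draws a loess curve by default for small datasets, so the straight-line fit here is a simplification rather than what the plot shows.

```r
# Made-up values for illustration only; not the post's data.
scores <- data.frame(
  max_words  = c(20, 35, 41, 50, 62),
  min_genius = c(70, 120, 130, 170, 205)
)

# Pearson correlation between the number of possible words and the genius threshold
r <- cor(scores$max_words, scores$min_genius)

# Straight-line fit: slope = extra genius points per additional possible word
fit <- lm(min_genius ~ max_words, data = scores)

# Residual standard error: typical distance of a day's score from the fitted line
sigma_hat <- summary(fit)$sigma

print(round(r, 3))
print(coef(fit))
print(sigma_hat)
```

A correlation close to 1 together with a non-trivial residual standard error would match the pattern described above: a strong relationship, with some day-to-day variance left over.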