Teaching AI to have shower thoughts, trained with Reddit's r/Showerthoughts

Tl;dr: I tried to train a Deep Learning character model to have shower thoughts, using Reddit data. Learned pithiness, curse words and clickbait-ing.

Deep learning has drastically changed the way machines interact with human languages. Natural Language Processing (NLP) - the branch of ML focused on human language models - has gone from sci-fi to example code. Though I've had some previous experience with linear NLP models and word level deep learning models, I wanted to learn more about building character level deep learning models.

Generally, character level models look at a window of preceding characters, and try to infer the next character. Similar to repeatedly pressing auto-correct's top choice, this process can be repeated to generate a string of AI generated characters. Utilizing training data from r/Showerthoughts, and starter code from Keras, I built and trained a deep learning model that learned to generate new (and sometimes profound) shower thoughts.
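To make that loop concrete, here is a minimal sketch of the idea. Everything below is illustrative rather than the post's actual code: model stands for any trained next-character predictor over a fixed maxlen window, and char_to_idx / idx_to_char are hypothetical lookup tables built from the training text.

    import numpy as np

    def generate(model, seed, char_to_idx, idx_to_char, maxlen, n_chars=100):
        """Repeatedly predict the next character and append it to the text."""
        text = seed
        for _ in range(n_chars):
            # Fixed-size window of preceding characters, left-padded with spaces
            window = text[-maxlen:].rjust(maxlen)
            x = np.array([[char_to_idx[c] for c in window]])
            probs = model.predict(x, verbose=0)[0]  # distribution over the character vocabulary
            text += idx_to_char[int(np.argmax(probs))]  # press auto-correct's top choice
        return text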
r/Showerthoughts is an online message board, to "share those miniature epiphanies you have" while in the shower. A few examples:

- Every machine can be utilised as a smoke machine if it is used wrong enough.
- It kinda makes sense that the target audience for fidget spinners lost interest in them so quickly.
- Google should make it so that looking up "Is Santa real?" with safe search on only gives yes answers.
- Machine Learning is to Computers what Evolution is to Organisms.

I scraped all posts for a 100 day period in 2017 utilizing Reddit's PRAW Python API wrapper. Though I was mainly interested in the title field, a long list of other fields were available.
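A minimal sketch of that scraping step, assuming placeholder credentials; note that current PRAW versions no longer support querying an arbitrary date range directly, so this version pulls a batch of top posts rather than the exact 100 day window:

    import praw

    # Placeholder credentials - register a script app at reddit.com/prefs/apps
    reddit = praw.Reddit(client_id="YOUR_CLIENT_ID",
                         client_secret="YOUR_CLIENT_SECRET",
                         user_agent="showerthoughts-scraper")

    # Each submission exposes many fields (score, author, created_utc, ...);
    # only the title field is kept here.
    titles = [submission.title
              for submission in reddit.subreddit("Showerthoughts").top(limit=1000)]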
Once I had the data set, I performed a set of standard data transformations, including:

- Converting the string to a list of characters.
- Replacing all illegal characters with a space.
- Converting text into an X array containing fixed length arrays of characters, and a y array containing the next character.

For example, "If my boss made me do as much homework as my kids' teachers make them, I'd tell him to go f." would become the X, y pair ("If my boss made me do as much homework as my kids' teachers make them, I'd tell him to go ", "f").
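A sketch of what that windowing step might look like, closely following the Keras example code; the window size and stride are illustrative, and text stands for all of the cleaned titles joined together:

    import numpy as np

    maxlen, step = 40, 3  # illustrative window size and stride

    chars = sorted(set(text))
    char_to_idx = {c: i for i, c in enumerate(chars)}
    idx_to_char = {i: c for c, i in char_to_idx.items()}

    windows, next_chars = [], []
    for i in range(0, len(text) - maxlen, step):
        windows.append(text[i:i + maxlen])   # fixed length window of characters -> X
        next_chars.append(text[i + maxlen])  # the character that follows -> y

    # Integer-encode X for the embedding layer; one-hot encode y for categorical_crossentropy
    X = np.array([[char_to_idx[c] for c in w] for w in windows])
    y = np.zeros((len(windows), len(chars)), dtype=np.float32)
    for i, c in enumerate(next_chars):
        y[i, char_to_idx[c]] = 1.0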
Data in hand, I built a model. Similar to the keras example code, I went with a Recurrent Neural Network (RNN), with LSTM layers. Why this particular architecture choice works well is beyond the scope of this post. In addition to the LSTM architecture, I chose to add a character embedding layer. Heuristically, there didn't seem to be much of a difference between One Hot Encoded inputs and using an embedding layer, but the embedding layers didn't greatly increase training time, and could allow for interesting further work. In particular, it would be interesting to look at embedding clustering and distances for characters, similar to what is often done with word embeddings.

Ultimately, the model looked something like this (layer sizes here are placeholders):

    from tensorflow import keras

    sequence_input = keras.Input(shape=(maxlen,), name='char_input')
    x = keras.layers.Embedding(len(chars), 16, name='char_embedding')(sequence_input)
    x = keras.layers.LSTM(128)(x)  # the LSTM layer described above; 128 units is a placeholder
    x = keras.layers.Dense(len(chars), activation='softmax', name='char_prediction_softmax')(x)
    char_model = keras.Model(sequence_input, x)
    optimizer = keras.optimizers.RMSprop()  # placeholder optimizer choice
    char_model.compile(optimizer=optimizer, loss='categorical_crossentropy')
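Training then comes down to a single fit call on the X, y arrays; the batch size and epoch count below are illustrative guesses rather than the settings actually used:

    char_model.fit(X, y, batch_size=128, epochs=20)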
Training the model went surprisingly smoothly. With a few hundred thousand scraped posts and a few hours on an AWS p2 GPU instance, the model got from nonsense to semi-logical posts.

[Image: model output from a test run with a (very) small data set.]

Unfortunately, this character level model struggled to create coherent thoughts. Given the seed "one of the biggest scams is believing", the algorithm completed the phrase with "to suffer". Given the seed "dogs are really just people that should", the algorithm completed the phrase with "live to kill". Given the seed "smart phones are today's version of the", the algorithm completed the phrase with "friend to the millions".
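Completions like these correspond to seeding the hypothetical generate helper sketched earlier with a phrase and letting it run; a retrained model will, of course, produce different completions:

    generate(char_model, "one of the biggest scams is believing ",
             char_to_idx, idx_to_char, maxlen, n_chars=40)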