This story is born of boredom. It begins one cold winter night in Resnik House. I was there to hang out with my friend and get some help understanding something in a class. It was a normal weeknight until, around 9pm, somebody came over and told us we had to play HQ.
"What is HQ?" I asked. "It's like a trivia game on your phone. Just download it, hurry, use my code," they replied. So I downloaded the app and was able to enter before the game started. We were led into somebody's room, where 7-8 people were playing the game. The first question hit: "Which of these organs is found inside the skull?"
I figured out the game quickly. You select the answer to each question within 10 seconds. If you get it right you continue; otherwise you're out and cannot answer the next question. The questions get progressively trickier, and the prize money for the game is split between the remaining players. I was amazed at the knowledge of the group. When we were asked questions like "Which actress has portrayed both Queen Elizabeth I and II?", somebody in the room knew the answer. Unfortunately, we had all gotten out on a previous question. But after that I was hooked.
I considered how I might get better at the game. As a student of computer science, I immediately gravitated towards writing a program to assist in playing the game. So first I looked at what other people had done. Most of what I found was undocumented GitHub repositories that OCR'd the game's screen (streamed from a phone or running in an Android emulator), used the OCR result to Google the question with each answer choice, and then presented a likelihood for each answer choice based on a metric like the number of search results. This worked pretty well: it would typically get the first 6-7 questions of a game right. Using this as one factor in my decision making helped a great deal in getting farther in the game. In fact, it netted me my first win one day in one of the Wean clusters.
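To make the search-count idea concrete, here is a minimal sketch of how such a score could be computed. It is not taken from any of those repositories: `score_answers`, `count_results`, and the fake hit counts are all hypothetical names and numbers for illustration, and the answer choices are made up. A real version would plug an actual search backend into `count_results`.

```python
from typing import Callable, Dict, List


def score_answers(
    question: str,
    answers: List[str],
    count_results: Callable[[str], int],
) -> Dict[str, float]:
    """Return each answer's share of the total search hits."""
    # One query per answer choice: the question text followed by the answer.
    counts = {a: count_results(f"{question} {a}") for a in answers}
    total = sum(counts.values())
    if total == 0:
        # No signal at all: fall back to a uniform guess.
        return {a: 1.0 / len(answers) for a in answers}
    return {a: c / total for a, c in counts.items()}


# Stand-in for a real search backend (hypothetical numbers, illustration only).
FAKE_HITS = {"brain": 900, "heart": 60, "liver": 40}


def fake_count(query: str) -> int:
    """Pretend hit count, keyed on the answer word at the end of the query."""
    return next((n for word, n in FAKE_HITS.items() if query.endswith(word)), 0)


scores = score_answers(
    "Which of these organs is found inside the skull?",
    ["brain", "heart", "liver"],
    fake_count,
)
best = max(scores, key=scores.get)
```

A score like this is easy to fool with trick questions ("which of these is NOT..."), which is one reason to treat it as a hint rather than an answer.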
I couldn't believe it. I had actually won the game. The winnings weren't much, since a lot of other people won too, but it was enough to take my SO to an upscale restaurant near my house that we meme about.
So then the question became: how do I make it better? The first order of business was figuring out how to interface with the server directly, potentially to get the video stream and OCR it from there, removing latency from the loop. Instead, what I found when I took a peek inside the app is that the text of the question and answers is relayed without any obfuscation over a websocket connection. Score, no OCR required.
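With plain-text messages, pulling out the question boils down to parsing each frame. The sketch below assumes the frames are JSON with `type`, `question`, and `answers` fields. That layout is my assumption for illustration, not the app's actual message format, and `parse_question` and the sample frame are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Question:
    text: str
    answers: List[str]


def parse_question(raw: str) -> Optional[Question]:
    """Return a Question if the frame carries one, else None.

    Assumes a hypothetical frame layout:
    {"type": "question", "question": "...", "answers": [{"text": "..."}, ...]}
    """
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not JSON, so not a question frame
    if not isinstance(msg, dict) or msg.get("type") != "question":
        return None  # some other kind of message (chat, heartbeat, ...)
    text = msg.get("question")
    answers = [
        a["text"]
        for a in msg.get("answers", [])
        if isinstance(a, dict) and isinstance(a.get("text"), str)
    ]
    if not isinstance(text, str) or not answers:
        return None  # malformed question frame
    return Question(text=text, answers=answers)


# Hypothetical sample frame; the answer choices are made up.
sample = json.dumps({
    "type": "question",
    "question": "Which of these organs is found inside the skull?",
    "answers": [{"text": "brain"}, {"text": "heart"}, {"text": "liver"}],
})
parsed = parse_question(sample)
```

Returning `None` for anything unexpected, instead of raising, keeps one odd frame from killing the assistant in the middle of a game.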
Once I had made the bindings to connect to the app's server in Python, I was off to the races on making this better. There was a distinct difficulty in building this, though, because you can only test, if you are lucky, twice a day. A game would begin, I would start panicking because some parsing error had popped up, and I would try to patch it and run it again before the questions got hard, all while trying to play the game on my own phone. So I quickly learned to save the raw messages and test using server replays.
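The record-and-replay trick can be sketched in a few lines. This is a minimal illustration under my own assumptions, not the original code: `record`, `replay`, and the one-message-per-line log format are hypothetical choices. Each raw message is JSON-encoded as a string before writing, so messages that contain newlines still occupy exactly one line.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def record(messages: Iterable[str], path: Path) -> None:
    """Append each raw message to the log, one JSON-encoded string per line."""
    with path.open("a", encoding="utf-8") as f:
        for message in messages:
            f.write(json.dumps(message) + "\n")


def replay(path: Path, handler: Callable[[str], None]) -> int:
    """Feed every logged message to the handler; return how many were replayed."""
    replayed = 0
    with path.open(encoding="utf-8") as f:
        for line in f:
            handler(json.loads(line))
            replayed += 1
    return replayed


# Demo: log two raw messages, then replay them into a list.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "game.jsonl"
    record(['{"type": "question"}', "not json\nwith a newline"], log)
    seen: List[str] = []
    count = replay(log, seen.append)
```

Because `replay` hands the handler the exact strings that came off the wire, the same parsing code can be exercised offline as many times as needed instead of twice a day.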
While I was developing the assistant software, something else was also developing. So many of my friends had gotten into HQ. Every day, a few minutes before 3PM and 9PM, somebody would message a group chat and tell us they had a place. We preferred study spaces on campus where we could talk while playing the game. Study spaces are a tight commodity at CMU, though, so we'd have to use our collective to find somewhere with space on short notice twice a day. We'd then all rush over before the game began and play together. Twice a day, a large portion of the people I knew would get together to share in our excitement playing HQ. Not everybody would make every game, but about 25 people were in the group chat and at minimum 3-4 people would show up. Even if some of us couldn't show up in person, we would have one person FaceTime others into the main discussion. A social group formed around playing HQ together, and this later influenced the software: the minds of the group, collaborating with the software, ultimately became a juggernaut at winning HQ.
Eventually I built a model based on various features of the questions, the answers, and the Google results for queries built from the question and answers. Creating and training various models with this method was fun, but there were always some trick questions at the end where the model was useless. I needed something better. I also tried some research question-answering systems like DrQA, but I found them to be both too slow and incapable of answering the pop-culture questions prevalent on HQ.
The thing that helped the most wasn't a better model or some complex algorithm. It was better UX. An opaque confidence score for each answer is just a terrible UX for communicating knowledge about answers to humans. What was important in a game was presenting the internet's knowledge in a format parsable in under 10 seconds. This allowed you to use the internet's knowledge to make an informed guess at the answer. This discovery is where my program started to get good, like really good.
Once I had this hypothesis, I rewrote the assistant. There were some other issues that I was facing with the old approach. I had previously used Python to easily use and train models, but for the core of the assistant, Python wasn't great. Using asyncio to query Google up to 12 times in just a few seconds in order to make a judgement wasn't that fast, and sometimes it would take up to 5-6 seconds, leaving no time to think before answering. I moved to a new design with a server written in Java that ran on Google Cloud Compute for low search API latency and used 1 thread per search, decreasing the time until results to under 2-3 seconds end to end.
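The one-thread-per-search idea looks roughly like the sketch below. The real server was written in Java; this is a Python illustration of the same pattern, kept in Python for consistency with the other examples. `search_all` and `slow_fake_search` are hypothetical names, and the 0.2 second sleep stands in for network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def search_all(queries: List[str], search: Callable[[str], str]) -> Dict[str, str]:
    """Run every query on its own worker thread and collect the results."""
    # One worker per query, so no search waits behind another.
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        # pool.map returns results in the same order as the input queries.
        results = list(pool.map(search, queries))
    return dict(zip(queries, results))


def slow_fake_search(query: str) -> str:
    """Stand-in for a real search call (hypothetical; sleeps instead of fetching)."""
    time.sleep(0.2)
    return f"results for {query}"


queries = [f"query {i}" for i in range(12)]
start = time.perf_counter()
results = search_all(queries, slow_fake_search)
elapsed = time.perf_counter() - start
```

Run one after another, twelve 0.2 second searches would take about 2.4 seconds. With a thread each, the total is close to the latency of the slowest single search, which matters a lot when the whole budget is 10 seconds.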
The assistant in action