This chatbot helps answer common computer science questions by replying with related Stack Overflow threads. It also provides general conversational ability through Chatterbot, a pretrained conversational dialog engine. Users interact with the chatbot through the Telegram messaging app.

Developing this chatbot consisted of two main parts:

  • Developing the programming question-answering abilities
  • Adding general conversational abilities

    1. Determining User Intent

    I began by determining the intent of users interacting with the chatbot, in order to distinguish general dialogue from programming-related questions. When a programming-related question is asked, I also tried to determine which programming language it is about. To distinguish general dialogue from programming-related questions, I trained a logistic regression model on TF-IDF features using two datasets:

  • A dataset of tagged Stack Overflow posts (positive samples)
  • A dataset of dialog phrases from movie subtitles (negative samples)

    A second classifier was trained to determine which programming language a user is referring to in their question. A OneVsRest classifier wrapped around a logistic regression classifier assigns each question to one of ten programming languages: C#, C/C++, Java, JavaScript, PHP, Python, R, Ruby, Swift, or VB.
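
The two classifiers described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the project's actual training code: the toy sentences, the three-language label set, and the names `intent_clf`, `language_clf`, and `classify` are all hypothetical stand-ins for the real datasets and models.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the tagged Stack Overflow posts (hypothetical examples).
questions = [
    ("how do I reverse a list in python", "Python"),
    ("python dictionary comprehension syntax error", "Python"),
    ("java null pointer exception in arraylist", "Java"),
    ("how to declare an interface in java", "Java"),
    ("ruby on rails migration fails", "Ruby"),
    ("iterate over a hash in ruby", "Ruby"),
]
# Toy stand-ins for the movie-subtitle dialogue phrases (hypothetical examples).
dialogue = [
    "hey how are you doing today",
    "what is your favorite movie",
    "i will see you tomorrow then",
    "that was a really nice dinner",
]

question_texts = [text for text, _ in questions]
question_langs = [lang for _, lang in questions]

# Intent model: TF-IDF features fed into logistic regression.
# Label 1 = programming question (positive), 0 = general dialogue (negative).
intent_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
intent_clf.fit(question_texts + dialogue,
               [1] * len(question_texts) + [0] * len(dialogue))

# Language model: one-vs-rest wrapper around logistic regression.
language_clf = make_pipeline(
    TfidfVectorizer(), OneVsRestClassifier(LogisticRegression())
)
language_clf.fit(question_texts, question_langs)


def classify(message):
    """Return ("programming", language) or ("dialogue", None)."""
    if intent_clf.predict([message])[0] == 1:
        return "programming", language_clf.predict([message])[0]
    return "dialogue", None
```

With realistic amounts of training data, `classify` would be the first step applied to every incoming message; on toy data like this the predictions are not meaningful.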

    2. Ranking Stack Overflow Threads for Responses

    To find the best Stack Overflow threads for answering a user's question, I computed the cosine similarity between the user's question and the question asked in each Stack Overflow thread. To generate embeddings for these phrases, StarSpace word embeddings were trained on the Stack Overflow dataset. When a user asks a programming-related question, the language the user is asking about is determined first, and, to improve efficiency, only the threads that relate to that language are ranked.
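
The ranking step can be illustrated with a small, self-contained sketch. Everything concrete here is an assumption for illustration: the 3-dimensional `WORD_VECTORS` stand in for trained StarSpace embeddings, a phrase is embedded by averaging its word vectors (one common choice, not necessarily the one used in this project), and the `THREADS` index and its IDs are made up.

```python
import math

# Hypothetical 3-dimensional word vectors standing in for StarSpace embeddings.
WORD_VECTORS = {
    "sort": [0.9, 0.1, 0.0],
    "list": [0.8, 0.2, 0.1],
    "read": [0.1, 0.9, 0.0],
    "file": [0.0, 0.8, 0.2],
    "python": [0.3, 0.3, 0.9],
}

# Hypothetical thread index: language -> list of (thread_id, question title).
THREADS = {
    "Python": [(101, "sort list python"), (102, "read file python")],
    "Java": [(201, "sort list")],
}


def embed(text):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_threads(question, language):
    """Return thread IDs for one language, most similar question first."""
    query = embed(question)
    if query is None:
        return []
    scored = []
    # Only threads for the detected language are scored.
    for thread_id, title in THREADS.get(language, []):
        vector = embed(title)
        if vector is not None:
            scored.append((cosine(query, vector), thread_id))
    return [thread_id for _, thread_id in sorted(scored, reverse=True)]
```

Restricting the loop to `THREADS[language]` is what keeps the number of similarity computations small: the question is compared only against threads for one language rather than the whole dataset.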

    3. Conversational Dialogue Generation

    The chatbot can also carry on dialogue that is not related to computer science. Chatterbot is a pretrained conversational engine that can be further trained to generate responses based on collections of known conversations, and I trained and applied it for this chatbot. When a user interacts with the chatbot, each phrase is first classified as either a CS question or a conversational phrase. CS questions are answered with ranked Stack Overflow threads, and conversational phrases are passed to Chatterbot to generate a response.
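
The overall message flow can be sketched as a small dispatcher. This is a hypothetical outline, not the project's code: `make_responder` and its four callable parameters are illustrative names, the reply wording and the fallback message are assumptions, and in a real deployment `generate_reply` would wrap the trained Chatterbot engine.

```python
def make_responder(is_programming_question, detect_language,
                   rank_threads, generate_reply):
    """Build a function that routes each message to the right component.

    All four arguments are callables supplied by the caller (hypothetical
    interfaces): an intent check, a language detector, a thread ranker
    returning thread IDs (best first), and a dialogue generator.
    """
    def respond(message):
        if is_programming_question(message):
            language = detect_language(message)
            threads = rank_threads(message, language)
            if threads:
                # Reply format is an assumption made for this sketch.
                return (f"This {language} thread might help: "
                        f"https://stackoverflow.com/questions/{threads[0]}")
            return "I could not find a related thread."
        # Anything that is not a programming question goes to the dialogue engine.
        return generate_reply(message)

    return respond


# Usage with simple stand-in components:
respond = make_responder(
    is_programming_question=lambda text: "python" in text.lower(),
    detect_language=lambda text: "Python",
    rank_threads=lambda text, language: [101, 102],
    generate_reply=lambda text: "Nice to meet you!",
)
```

Passing the components in as callables keeps the routing logic independent of any particular classifier or dialogue library, which makes it easy to test with stubs like the ones above.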