My First Model¶
Overview¶
CCOBRA provides an interface class called `CCobraModel` which defines the set of functions used to handle the framework-model communication:
| Function name | Required | Description |
|---|---|---|
| `setup_environment` | no | Provides meta-information about the evaluation setting. |
| `start_participant` | no | Provides information about the participant to be predicted for. |
| `end_participant` | no | Callback hook for when evaluation for a participant ends. Allows the model to log information. |
| `pre_train` | no | Provides data for training on unrelated examples. |
| `pre_train_person` | no | Provides data for training on responses by the participant to be predicted for. |
| `pre_person_background` | no | Provides data for training on external data from the participant to be predicted for. |
| `predict` | yes | Queries the model for a prediction for a specific task. |
| `adapt` | no | Provides the true participant response to allow for online learning. |
Model Object Lifetime¶
The evaluation paradigm of CCOBRA fundamentally relies on the comparison of predictions with the true responses given by individual participants. As a result, models are queried individually for each task a human participant provided a response for. In some circumstances (e.g., when leave-one-out crossvalidation is performed), the responses given by all remaining participants are used as training data for the model.
To ensure that no responses provided as part of the training dataset leak into the respective participant evaluations, CCOBRA clearly defines the lifespan of a model to only last for a single participant. Technically, this means that after all predictions for a participant have been obtained, a new instance of the model is created to provide a clean starting point for the evaluation of the next participant.
For model developers, this means that no clean-up or reset steps are required. Models can be developed in the mindset that they will only be applied to predict responses for a single participant.
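The per-participant lifetime can be illustrated with a simplified, stand-alone sketch of an evaluation loop. Note that the class and function names below are hypothetical and do not reflect CCOBRA's actual internals; the sketch only demonstrates the "fresh instance per participant" principle:

```python
# Hypothetical sketch of CCOBRA's per-participant model lifetime
# (NOT the framework's actual implementation).

class CountingModel:
    """Toy model that counts how many predictions it has made."""
    def __init__(self):
        self.num_predictions = 0

    def predict(self, task):
        self.num_predictions += 1
        return 'response'

def evaluate(model_class, participants):
    results = []
    for participant_tasks in participants:
        # A fresh model instance is created for every participant,
        # so no state leaks between participant evaluations.
        model = model_class()
        for task in participant_tasks:
            model.predict(task)
        results.append(model.num_predictions)
    return results

# Each participant starts from a clean model state:
print(evaluate(CountingModel, [['t1', 't2'], ['t1']]))  # [2, 1]
```

Because every participant gets a fresh instance, the counter never carries over between participants, mirroring the clean starting point described above.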
Implementing my first model¶
For this tutorial, we will redevelop the MFA model for syllogistic reasoning. This model maintains a store of known human responses to syllogistic problems and generates predictions by computing the most-frequent answer (MFA) to the problem in question.
The full model can be found here.
Constructor¶
```python
import random

import ccobra


class MFAModel(ccobra.CCobraModel):
    def __init__(self, name='MFAModel', k=1):
        super(MFAModel, self).__init__(name, ["syllogistic"], ["single-choice"])

        # Initialize member variables
        self.mfa_population = dict()
        self.mfa_personal = dict()
```
Our model's constructor starts by calling the constructor of the parent class `CCobraModel`, passing information about the supported domains (`['syllogistic']`) and the supported response types (`['single-choice']`). In this case, our model is only applicable to the domain of syllogistic reasoning and expects that only a single response conclusion was given by experimental participants.
Subsequently, the model initializes its member variables, i.e., dictionaries to store the observed responses for both the general population (extracted from the pre-training data) and the individual (i.e., the responses given by the participant to be predicted for).
Pre-Training¶
```python
    def pre_train(self, dataset):
        # Iterate over subjects in the dataset
        for subj_data in dataset:
            # Iterate over the tasks for an individual subject
            for task_data in subj_data:
                # Create the syllogism object and extract the task and response encodings
                syllogism = ccobra.syllogistic.Syllogism(task_data['item'])
                encoded_task = syllogism.encoded_task
                encoded_response = syllogism.encode_response(task_data['response'])

                # Prepare the response counter for this task if not present already
                if encoded_task not in self.mfa_population:
                    self.mfa_population[encoded_task] = dict()

                # Increment the response count for the present task
                self.mfa_population[encoded_task][encoded_response] = \
                    self.mfa_population[encoded_task].get(encoded_response, 0) + 1
```
For the pre-training step, we iterate over all the experimental participants and, for each of them, over the individual tasks contained in the dataset. In the innermost loop, the `task_data` variable contains an `item` and the corresponding `response`. Here, the item refers to an instance of CCOBRA's `Item` class, which contains all the information available with respect to the presented problem (e.g., premises, response choices, etc.).
To be able to match structurally equivalent syllogisms even though they may contain different categorical terms (e.g., All A are B; Some B are C and All farmers are models; Some models are runners), we need to convert the task information from the item container into a standardized form. In syllogistic reasoning, there exists a common way of abbreviating tasks by encoding the quantifiers using capital letters (e.g., AI1 for the example from before). This conversion step is handled internally by the `Syllogism` helper class, which is instantiated at the beginning of the inner loop. Consequently, we can use this object to extract encodings for the task and response.
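To make the abbreviation scheme concrete, here is a minimal stand-alone sketch of the encoding convention. This is purely illustrative and is not ccobra's actual `Syllogism` implementation; premises are assumed here to be `(quantifier, subject, object)` triples:

```python
# Illustrative encoder for the syllogistic abbreviation scheme
# (NOT ccobra's actual Syllogism implementation).

QUANTIFIERS = {'All': 'A', 'Some': 'I', 'No': 'E', 'Some not': 'O'}

def encode_task(premise1, premise2):
    """Encode two premises, e.g. ('All', 'A', 'B') and ('Some', 'B', 'C'), as 'AI1'."""
    (q1, s1, o1), (q2, s2, o2) = premise1, premise2

    # Determine the figure from the arrangement of the middle term.
    if o1 == s2:
        figure = 1   # A-B, B-C
    elif s1 == o2:
        figure = 2   # B-A, C-B
    elif o1 == o2:
        figure = 3   # A-B, C-B
    else:
        figure = 4   # B-A, B-C

    return QUANTIFIERS[q1] + QUANTIFIERS[q2] + str(figure)

# The example from the text: "All A are B; Some B are C" -> AI1
print(encode_task(('All', 'A', 'B'), ('Some', 'B', 'C')))  # AI1
```

The letters A, I, E, O follow the classical quantifier naming, and the trailing digit encodes the figure, i.e., the arrangement of the middle term across the two premises.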
Finally, we populate the MFA dictionary for the population data (`self.mfa_population`) using the encoded tasks and responses. The MFA dictionary thus maps from syllogistic task encodings to response count dictionaries.
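The resulting data structure is a nested counting dictionary. Its shape can be reproduced with plain Python; the encoded tasks and responses below are made-up example values, not data from a real experiment:

```python
# Stand-alone illustration of the nested MFA count dictionary
# built by pre_train (example data only).

mfa_population = dict()

observations = [('AA1', 'Aac'), ('AA1', 'Aac'), ('AA1', 'Aca'), ('AI2', 'Iac')]
for encoded_task, encoded_response in observations:
    # Prepare the response counter for this task if not present already
    if encoded_task not in mfa_population:
        mfa_population[encoded_task] = dict()

    # Increment the response count for the present task
    mfa_population[encoded_task][encoded_response] = \
        mfa_population[encoded_task].get(encoded_response, 0) + 1

print(mfa_population)
# {'AA1': {'Aac': 2, 'Aca': 1}, 'AI2': {'Iac': 1}}
```

Looking up a task encoding such as `'AA1'` then yields the per-response counts from which the most frequent answer can be read off.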
Person-Training¶
```python
    def pre_train_person(self, dataset):
        # Iterate over the given tasks for the individual subject to be predicted for
        for task_data in dataset:
            # Create the syllogism object and extract the task and response encodings
            syllogism = ccobra.syllogistic.Syllogism(task_data['item'])
            encoded_task = syllogism.encoded_task
            encoded_response = syllogism.encode_response(task_data['response'])

            # Prepare the response counter for this task if not present already
            if encoded_task not in self.mfa_personal:
                self.mfa_personal[encoded_task] = dict()

            # Increment the response count for the present task
            self.mfa_personal[encoded_task][encoded_response] = \
                self.mfa_personal[encoded_task].get(encoded_response, 0) + 1
```
Person training is performed in precisely the same way as pre-training, except that we now use the `self.mfa_personal` dictionary to store the response counts.
Predictions¶
```python
    def get_mfa_prediction(self, item, mfa_dictionary):
        # Extract the encoded task
        syllogism = ccobra.syllogistic.Syllogism(item)
        encoded_task = syllogism.encoded_task
        encoded_choices = [syllogism.encode_response(x) for x in item.choices]

        if encoded_task in mfa_dictionary:
            # Extract the potential MFA responses which are allowed in terms
            # of the possible response choices
            potential_responses = []
            for response, count in mfa_dictionary[encoded_task].items():
                if response in encoded_choices:
                    potential_responses.append((response, count))

            # If potential responses are available, determine the one with
            # maximum frequency
            if potential_responses:
                max_count = -1
                max_responses = []
                for response, count in potential_responses:
                    if count > max_count:
                        max_count = count
                        max_responses = []

                    if count >= max_count:
                        max_responses.append(response)

                # In case of ties, draw the MFA response at random from the options
                # with maximum frequency
                encoded_prediction = max_responses[random.randint(0, len(max_responses) - 1)]
                return encoded_prediction

        # If no MFA response is available, return None
        return None
```
To prepare for our actual prediction generation function, we need to provide a function that extracts the most-frequent answer from our MFA dictionaries. For this, we first check whether MFA information for the task to be predicted is available. If this is the case, we retrieve only those response candidates that are valid with respect to the list of possible choices from the task item. Subsequently, we extract the response with maximum frequency and return it. In case of ties, we randomly sample from the MFA responses. If the MFA dictionary does not contain appropriate responses, we return `None` instead.
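The maximum-frequency selection with random tie-breaking can also be sketched in isolation. The following is a simplified stand-alone version of that selection step (the function name and the example counts are made up for illustration):

```python
import random

def most_frequent(counts, choices, rng=random):
    """Return the most frequent response among the allowed choices,
    breaking ties at random; None if no allowed response was counted."""
    # Keep only responses that are valid choices for the task.
    allowed = {resp: cnt for resp, cnt in counts.items() if resp in choices}
    if not allowed:
        return None

    # Collect all responses sharing the maximum count, then draw one at random.
    max_count = max(allowed.values())
    candidates = [resp for resp, cnt in allowed.items() if cnt == max_count]
    return rng.choice(candidates)

counts = {'Aac': 3, 'Aca': 3, 'NVC': 1}
# 'Aac' and 'Aca' tie with count 3, so one of them is drawn at random:
print(most_frequent(counts, ['Aac', 'Aca', 'NVC']))
```

Collecting all maximal candidates first and sampling with `random.choice` is equivalent to the two-pass reset-and-append loop in the model code above.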
```python
    def predict(self, item, **kwargs):
        # Create the syllogism object
        syllogism = ccobra.syllogistic.Syllogism(item)

        # Return the personal MFA if available
        personal_prediction = self.get_mfa_prediction(item, self.mfa_personal)
        if personal_prediction is not None:
            return syllogism.decode_response(personal_prediction)

        # Return the population MFA if available
        population_prediction = self.get_mfa_prediction(item, self.mfa_population)
        if population_prediction is not None:
            return syllogism.decode_response(population_prediction)

        # Return a random response if no MFA data is available
        return item.choices[random.randint(0, len(item.choices) - 1)]
```
To generate the actual prediction for a syllogistic task (represented by the item object passed to the predict function), we first check whether a personal MFA prediction is available. If not, we proceed by checking the population MFA data, and ultimately, if no MFA can be generated for the problem at hand, we return a random response from the list of possible choices.
Adaptation¶
```python
    def adapt(self, item, target, **kwargs):
        # Extract the encoded task and response
        syllogism = ccobra.syllogistic.Syllogism(item)
        encoded_task = syllogism.encoded_task
        encoded_response = syllogism.encode_response(target)

        # Prepare the response counter for this task if not present already
        if encoded_task not in self.mfa_personal:
            self.mfa_personal[encoded_task] = dict()

        # Increment the response count for the present task
        self.mfa_personal[encoded_task][encoded_response] = \
            self.mfa_personal[encoded_task].get(encoded_response, 0) + 1
```
To adapt the model, we increment the corresponding count in the personal MFA dictionary. This essentially reflects a single iteration of the `pre_train_person` function.
Note

Since the `pre_train_person` function often performs just a sequence of adaptation steps, the `CCobraModel` class implements this as default functionality. Hence, for our implementation of the MFA model, we could have omitted the `pre_train_person` function without altering the model's behavior.