.. _myfirstmodel:

My First Model
==============

Overview
--------

CCOBRA provides an interface class called :class:`~ccobra.CCobraModel` which defines the set of functions used to handle the framework-model communication:

========================= ======== ==============================================================================================
Function name             Required Description
========================= ======== ==============================================================================================
``setup_environment``     no       Provides meta-information about the evaluation setting.
``start_participant``     no       Provides information about the participant to be predicted for.
``end_participant``       no       Callback hook for when evaluation for a participant ends. Allows the model to log information.
``pre_train``             no       Provides data for training on unrelated examples.
``pre_train_person``      no       Provides data for training on responses by the participant to be predicted for.
``pre_person_background`` no       Provides data for training on external data from the participant to be predicted for.
``predict``               yes      Queries the model for a prediction for a specific task.
``adapt``                 no       Provides the true participant response to allow for online learning.
========================= ======== ==============================================================================================

Model Object Lifetime
:::::::::::::::::::::

The evaluation paradigm of CCOBRA fundamentally relies on the comparison of predictions with the true responses given by individual participants. As a result, models are queried individually for each task a human participant provided a response for. In some circumstances (e.g., when leave-one-out cross-validation is performed), the responses given by all remaining participants are used as training data for the model.

To ensure that no responses provided as part of the training dataset leak into the respective participant evaluations, CCOBRA clearly defines the lifespan of a model to last only for a single participant. Technically, this means that after all predictions for a participant have been obtained, a new instance of the model is created to provide a clean starting point for the evaluation of the next participant.

For model developers, this means that no clean-up or reset steps are required. Models can be developed in the mindset that they will only be applied to predict responses for a single participant.

Implementing my first model
---------------------------

For this tutorial, we will redevelop the MFA model for syllogistic reasoning. This model maintains a store of known human responses to syllogistic problems and generates predictions by computing the *most-frequent answer* (MFA) to the problem in question. The full model can be found in the CCOBRA repository.

Constructor
:::::::::::

.. code-block:: python
    :linenos:

    import random

    import ccobra

    class MFAModel(ccobra.CCobraModel):
        def __init__(self, name='MFAModel'):
            super(MFAModel, self).__init__(name, ["syllogistic"], ["single-choice"])

            # Initialize member variables
            self.mfa_population = dict()
            self.mfa_personal = dict()

Our model's constructor starts by calling the constructor of the parent class :class:`~ccobra.CCobraModel`, passing information about the supported domains ``['syllogistic']`` and the supported response types ``['single-choice']``. In this case, our model is only applicable to the domain of syllogistic reasoning and expects that only a single response conclusion was given by experimental participants. Subsequently, the model initializes its member variables: dictionaries to store the observed responses both for the general population (extracted from the pre-training data) and for the individual participant to be predicted for.
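To see what the parent constructor registers, the model can be instantiated directly. The attribute names in the following sketch (``name``, ``supported_domains``, ``supported_response_types``) are assumptions based on the current CCOBRA implementation and may differ between versions:

.. code-block:: python

    # Assuming the MFAModel class defined above is in scope
    model = MFAModel()

    print(model.name)                      # 'MFAModel'
    print(model.supported_domains)         # ['syllogistic']
    print(model.supported_response_types)  # ['single-choice']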
Pre-Training
::::::::::::

.. code-block:: python
    :linenos:

    def pre_train(self, dataset):
        # Iterate over subjects in the dataset
        for subj_data in dataset:
            # Iterate over the tasks of an individual subject
            for task_data in subj_data:
                # Create the syllogism object and extract the task and response encodings
                syllogism = ccobra.syllogistic.Syllogism(task_data['item'])
                encoded_task = syllogism.encoded_task
                encoded_response = syllogism.encode_response(task_data['response'])

                # Prepare the response counter for this task if not present already
                if encoded_task not in self.mfa_population:
                    self.mfa_population[encoded_task] = dict()

                # Increment the response count for the present task
                self.mfa_population[encoded_task][encoded_response] = \
                    self.mfa_population[encoded_task].get(encoded_response, 0) + 1

For the pre-training step, we iterate over all experimental participants and, for each of them, over the individual tasks contained in the dataset. In the innermost loop, the ``task_data`` variable contains an ``item`` and the corresponding ``response``. Here, the item refers to an instance of CCOBRA's :class:`~ccobra.Item` class, which contains all the information available with respect to the presented problem (e.g., premises, response choices, etc.).

To be able to match structurally equivalent syllogisms even though they may contain different categorical terms (e.g., *All A are B; Some B are C* and *All farmers are models; Some models are runners*), we need to convert the task information from the item container into a standardized form. In syllogistic reasoning, there exists a common form of abbreviating tasks by encoding the quantifiers using capital letters (e.g., *AI1* for the example from before). This conversion step is handled internally by the :class:`~ccobra.syllogistic.Syllogism` helper class, which is instantiated in line 7. Consequently, we can use this object to extract encodings for the task and response.

Finally, we populate the MFA dictionary for the population data (``self.mfa_population``) using the encoded tasks and responses. The MFA dictionary thus maps from syllogistic task encodings to response count dictionaries.

Person-Training
:::::::::::::::

.. code-block:: python
    :linenos:

    def pre_train_person(self, dataset):
        # Iterate over the given tasks for the individual subject to be predicted for
        for task_data in dataset:
            # Create the syllogism object and extract the task and response encodings
            syllogism = ccobra.syllogistic.Syllogism(task_data['item'])
            encoded_task = syllogism.encoded_task
            encoded_response = syllogism.encode_response(task_data['response'])

            # Prepare the response counter for this task if not present already
            if encoded_task not in self.mfa_personal:
                self.mfa_personal[encoded_task] = dict()

            # Increment the response count for the present task
            self.mfa_personal[encoded_task][encoded_response] = \
                self.mfa_personal[encoded_task].get(encoded_response, 0) + 1

Person training is performed in precisely the same way as pre-training, except that we now use the ``self.mfa_personal`` dictionary to store the response counts.
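The structure of the training data and the resulting dictionary contents can be illustrated with a small hand-constructed dataset, assuming the ``MFAModel`` class defined above is available. The ``ccobra.Item`` constructor arguments and the string encodings in this sketch are assumptions based on the current CCOBRA API and may require adjustment:

.. code-block:: python

    import ccobra

    # A hypothetical item for the task "All A are B; Some B are C" (AI1).
    # Arguments: identifier, domain, task, response type, choices, sequence number.
    item = ccobra.Item(
        0, 'syllogistic', 'All;A;B/Some;B;C', 'single-choice',
        'Some;A;C|Some;C;A|NVC', 0)

    # Two participants who both responded "Some A are C" (encoded as Iac)
    dataset = [
        [{'item': item, 'response': [['Some', 'A', 'C']]}],
        [{'item': item, 'response': [['Some', 'A', 'C']]}],
    ]

    model = MFAModel()
    model.pre_train(dataset)
    print(model.mfa_population)  # {'AI1': {'Iac': 2}}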
Predictions
:::::::::::

.. code-block:: python
    :linenos:

    def get_mfa_prediction(self, item, mfa_dictionary):
        # Extract the encoded task
        syllogism = ccobra.syllogistic.Syllogism(item)
        encoded_task = syllogism.encoded_task
        encoded_choices = [syllogism.encode_response(x) for x in item.choices]

        if encoded_task in mfa_dictionary:
            # Extract the potential MFA responses which are allowed in terms
            # of the possible response choices
            potential_responses = []
            for response, count in mfa_dictionary[encoded_task].items():
                if response in encoded_choices:
                    potential_responses.append((response, count))

            # If potential responses are available, determine the one with
            # maximum frequency
            if potential_responses:
                max_count = -1
                max_responses = []
                for response, count in potential_responses:
                    if count > max_count:
                        max_count = count
                        max_responses = []
                    if count >= max_count:
                        max_responses.append(response)

                # In case of ties, draw the MFA response at random from the options
                # with maximum frequency
                encoded_prediction = max_responses[random.randint(0, len(max_responses) - 1)]
                return encoded_prediction

        # If no MFA response is available, return None
        return None

To prepare for our actual prediction generation function, we need to provide a helper function that extracts the most-frequent answer from our MFA dictionaries. For this, we first check whether MFA information for the task to be predicted is available (line 7). If this is the case, we retrieve only those response candidates that are valid with respect to the list of possible choices from the task item. Subsequently, we extract the response with maximum frequency and return it. In case of ties, we randomly sample from the responses with maximum frequency. If the MFA dictionary does not contain appropriate responses, we return ``None`` instead.

.. code-block:: python
    :linenos:

    def predict(self, item, **kwargs):
        # Create the syllogism object
        syllogism = ccobra.syllogistic.Syllogism(item)

        # Return the personal MFA if available
        personal_prediction = self.get_mfa_prediction(item, self.mfa_personal)
        if personal_prediction is not None:
            return syllogism.decode_response(personal_prediction)

        # Return the population MFA if available
        population_prediction = self.get_mfa_prediction(item, self.mfa_population)
        if population_prediction is not None:
            return syllogism.decode_response(population_prediction)

        # Return a random response if no MFA data is available
        return item.choices[random.randint(0, len(item.choices) - 1)]

To generate the actual prediction for a syllogistic task (represented by the item object passed to the ``predict`` function), we first check whether a personal MFA prediction is available. If not, we proceed by checking the population MFA data. Ultimately, if no MFA can be generated for the problem to solve, we return a random response from the list of possible choices.

Adaptation
::::::::::

.. code-block:: python
    :linenos:

    def adapt(self, item, target, **kwargs):
        # Extract the encoded task and response
        syllogism = ccobra.syllogistic.Syllogism(item)
        encoded_task = syllogism.encoded_task
        encoded_response = syllogism.encode_response(target)

        # Prepare the response counter for this task if not present already
        if encoded_task not in self.mfa_personal:
            self.mfa_personal[encoded_task] = dict()

        # Increment the response count for the present task
        self.mfa_personal[encoded_task][encoded_response] = \
            self.mfa_personal[encoded_task].get(encoded_response, 0) + 1

To adapt the model, we increment the corresponding count in the personal MFA dictionary. This essentially reflects a single iteration of the ``pre_train_person`` function.
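The effect of adaptation can be demonstrated with a short sketch. It reuses the hypothetical ``item`` constructed in the pre-training example above; the response structures are again assumptions based on the current CCOBRA API:

.. code-block:: python

    model = MFAModel()

    # Without any training data, the model falls back to a random choice
    print(model.predict(item))

    # After observing the participant respond NVC (no valid conclusion) ...
    model.adapt(item, [['NVC']])

    # ... the personal MFA dictates the prediction for this task
    print(model.predict(item))  # [['NVC']]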
.. note::
    Since the ``pre_train_person`` function often performs just a sequence of adaptation steps, the :class:`~ccobra.CCobraModel` class implements this as a default functionality. Hence, for our implementation of the MFA model, we could have omitted the ``pre_train_person`` function without altering the model's behavior.
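For reference, the default functionality mentioned in the note is roughly equivalent to the following simplified sketch (consult the CCOBRA sources for the exact implementation):

.. code-block:: python

    def pre_train_person(self, dataset):
        # Delegate each observed response of the participant to adapt()
        for task_data in dataset:
            self.adapt(task_data['item'], task_data['response'])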