My First Model¶
Overview¶
CCOBRA provides an interface class called `CCobraModel` which defines the set of functions used to handle the framework-model communication:
| Function name | Required | Description |
|---|---|---|
| `setup_environment` | no | Provides meta-information about the evaluation setting. |
| `start_participant` | no | Provides information about the participant to be predicted for. |
| `end_participant` | no | Callback hook for when evaluation for a participant ends. Allows the model to log information. |
| `pre_train` | no | Provides data for training on unrelated examples. |
| `pre_train_person` | no | Provides data for training on responses by the participant to be predicted for. |
| `pre_person_background` | no | Provides data for training on external data from the participant to be predicted for. |
| `predict` | yes | Queries the model for a prediction for a specific task. |
| `adapt` | no | Provides the true participant response to allow for online learning. |
Model Object Lifetime¶
The evaluation paradigm of CCOBRA fundamentally relies on the comparison of predictions with the true responses given by individual participants. As a result, models are queried individually for each task a human participant provided a response for. In some circumstances (e.g., when leave-one-out crossvalidation is performed), the responses given by all remaining participants are used as training data for the model.
To ensure that no responses provided as part of the training dataset leak into the respective participant evaluations, CCOBRA clearly defines the lifespan of a model to only last for a single participant. Technically, this means that after all predictions for a participant have been obtained, a new instance of the model is created to provide a clean starting point for the evaluation of the next participant.
For model developers, this means that no clean-up or reset steps are required. Models can be developed in the mindset that they will only be applied to predict responses for a single participant.
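The per-participant lifetime can be illustrated with a simplified, stand-alone sketch of an evaluation loop. Note that the class and function names below are hypothetical and do not reflect CCOBRA's actual internals; the sketch only demonstrates the "fresh instance per participant" principle:

```python
# Hypothetical sketch of CCOBRA's per-participant model lifetime
# (NOT the framework's actual implementation).

class CountingModel:
    """Toy model that counts how many predictions it has made."""
    def __init__(self):
        self.num_predictions = 0

    def predict(self, task):
        self.num_predictions += 1
        return 'response'

def evaluate(model_class, participants):
    results = []
    for participant_tasks in participants:
        # A fresh model instance is created for every participant,
        # so no state leaks between participant evaluations.
        model = model_class()
        for task in participant_tasks:
            model.predict(task)
        results.append(model.num_predictions)
    return results

# Each participant starts from a clean model state:
print(evaluate(CountingModel, [['t1', 't2'], ['t1']]))  # [2, 1]
```

Because every participant gets a fresh instance, the counter never carries over between participants, mirroring the clean starting point described above.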
Implementing my first model¶
For this tutorial, we will redevelop the MFA model for syllogistic reasoning. This model maintains a store of known human responses to syllogistic problems and generates predictions by computing the most-frequent answer (MFA) to the problem in question.
The full model can be found here.
Constructor¶
```python
import random

import ccobra


class MFAModel(ccobra.CCobraModel):
    def __init__(self, name='MFAModel', k=1):
        super(MFAModel, self).__init__(name, ["syllogistic"], ["single-choice"])

        # Initialize member variables
        self.mfa_population = dict()
        self.mfa_personal = dict()
```
Our model's constructor starts by calling the constructor of the parent class `CCobraModel`, passing information about the supported domains (`['syllogistic']`) and the supported response types (`['single-choice']`). In this case, our model is only applicable to the domain of syllogistic reasoning and expects that only a single response conclusion was given by experimental participants.
Subsequently, the model initializes its member variables, i.e., dictionaries to store the observed responses for both the general population (extracted from the pre-training data) and the individual (i.e., the responses given by the participant to be predicted for).
Pre-Training¶
```python
    def pre_train(self, dataset):
        # Iterate over subjects in the dataset
        for subj_data in dataset:
            # Iterate over the tasks for an individual subject
            for task_data in subj_data:
                # Create the syllogism object and extract the task and response encodings
                syllogism = ccobra.syllogistic.Syllogism(task_data['item'])
                encoded_task = syllogism.encoded_task
                encoded_response = syllogism.encode_response(task_data['response'])

                # Prepare the response counter for this task if not present already
                if encoded_task not in self.mfa_population:
                    self.mfa_population[encoded_task] = dict()

                # Increment the response count for the present task
                self.mfa_population[encoded_task][encoded_response] = \
                    self.mfa_population[encoded_task].get(encoded_response, 0) + 1
```
For the pre-training step, we iterate over all the experimental participants and, for each of them, over the individual tasks contained in the dataset. In the innermost loop, the `task_data` variable contains an `item` and the corresponding `response`. Here, the item refers to an instance of CCOBRA's `Item` class, which contains all the information available with respect to the presented problem (e.g., premises, response choices, etc.).
To be able to match structurally equivalent syllogisms even though they may contain different categorical terms (e.g., All A are B; Some B are C and All farmers are models; Some models are runners), we need to convert the task information from the item container into a standardized form. In syllogistic reasoning, there exists a common way of abbreviating tasks by encoding the quantifiers using capital letters (e.g., AI1 for the example from before). This conversion step is handled internally by the `Syllogism` helper class, which is instantiated at the beginning of the inner loop. Consequently, we can use this object to extract encodings for the task and response.
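To make the abbreviation scheme concrete, here is a minimal stand-alone sketch of the encoding convention. This is purely illustrative and is not ccobra's actual `Syllogism` implementation; premises are assumed here to be `(quantifier, subject, object)` triples:

```python
# Illustrative encoder for the syllogistic abbreviation scheme
# (NOT ccobra's actual Syllogism implementation).

QUANTIFIERS = {'All': 'A', 'Some': 'I', 'No': 'E', 'Some not': 'O'}

def encode_task(premise1, premise2):
    """Encode two premises, e.g. ('All', 'A', 'B') and ('Some', 'B', 'C'), as 'AI1'."""
    (q1, s1, o1), (q2, s2, o2) = premise1, premise2

    # Determine the figure from the arrangement of the middle term.
    if o1 == s2:
        figure = 1   # A-B, B-C
    elif s1 == o2:
        figure = 2   # B-A, C-B
    elif o1 == o2:
        figure = 3   # A-B, C-B
    else:
        figure = 4   # B-A, B-C

    return QUANTIFIERS[q1] + QUANTIFIERS[q2] + str(figure)

# The example from the text: "All A are B; Some B are C" -> AI1
print(encode_task(('All', 'A', 'B'), ('Some', 'B', 'C')))  # AI1
```

The letters A, I, E, O follow the classical quantifier naming, and the trailing digit encodes the figure, i.e., the arrangement of the middle term across the two premises.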
Finally, we populate the MFA dictionary for the population data (`self.mfa_population`) using the encoded tasks and responses. The MFA dictionary thus maps from syllogistic task encodings to response count dictionaries.
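The resulting data structure is a nested counting dictionary. Its shape can be reproduced with plain Python; the encoded tasks and responses below are made-up example values, not data from a real experiment:

```python
# Stand-alone illustration of the nested MFA count dictionary
# built by pre_train (example data only).

mfa_population = dict()

observations = [('AA1', 'Aac'), ('AA1', 'Aac'), ('AA1', 'Aca'), ('AI2', 'Iac')]
for encoded_task, encoded_response in observations:
    # Prepare the response counter for this task if not present already
    if encoded_task not in mfa_population:
        mfa_population[encoded_task] = dict()

    # Increment the response count for the present task
    mfa_population[encoded_task][encoded_response] = \
        mfa_population[encoded_task].get(encoded_response, 0) + 1

print(mfa_population)
# {'AA1': {'Aac': 2, 'Aca': 1}, 'AI2': {'Iac': 1}}
```

Looking up a task encoding such as `'AA1'` then yields the per-response counts from which the most frequent answer can be read off.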
Person-Training¶
```python
    def pre_train_person(self, dataset):
        # Iterate over the given tasks for the individual subject to be predicted for
        for task_data in dataset:
            # Create the syllogism object and extract the task and response encodings
            syllogism = ccobra.syllogistic.Syllogism(task_data['item'])
            encoded_task = syllogism.encoded_task
            encoded_response = syllogism.encode_response(task_data['response'])

            # Prepare the response counter for this task if not present already
            if encoded_task not in self.mfa_personal:
                self.mfa_personal[encoded_task] = dict()

            # Increment the response count for the present task
            self.mfa_personal[encoded_task][encoded_response] = \
                self.mfa_personal[encoded_task].get(encoded_response, 0) + 1
```
Person training is performed in precisely the same way as pre-training, except that we now use the `self.mfa_personal` dictionary to store the response counts.
Predictions¶
```python
    def get_mfa_prediction(self, item, mfa_dictionary):
        # Extract the encoded task
        syllogism = ccobra.syllogistic.Syllogism(item)
        encoded_task = syllogism.encoded_task
        encoded_choices = [syllogism.encode_response(x) for x in item.choices]

        if encoded_task in mfa_dictionary:
            # Extract the potential MFA responses which are allowed in terms
            # of the possible response choices
            potential_responses = []
            for response, count in mfa_dictionary[encoded_task].items():
                if response in encoded_choices:
                    potential_responses.append((response, count))

            # If potential responses are available, determine the one with
            # maximum frequency
            if potential_responses:
                max_count = -1
                max_responses = []
                for response, count in potential_responses:
                    if count > max_count:
                        max_count = count
                        max_responses = []

                    if count >= max_count:
                        max_responses.append(response)

                # In case of ties, draw the MFA response at random from the options
                # with maximum frequency
                encoded_prediction = max_responses[random.randint(0, len(max_responses) - 1)]
                return encoded_prediction

        # If no MFA response is available, return None
        return None
```
To prepare for our actual prediction generation function, we need to provide a function that extracts the most-frequent answer from our MFA dictionaries. For this, we first check whether MFA information for the task to be predicted is available. If this is the case, we retrieve only those response candidates that are valid with respect to the list of possible choices from the task item. Subsequently, we extract the response with maximum frequency and return it. In case of ties, we randomly sample from the MFA responses. If the MFA dictionary does not contain appropriate responses, we return `None` instead.
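The maximum-frequency selection with random tie-breaking can also be sketched in isolation. The following is a simplified stand-alone version of that selection step (the function name and the example counts are made up for illustration):

```python
import random

def most_frequent(counts, choices, rng=random):
    """Return the most frequent response among the allowed choices,
    breaking ties at random; None if no allowed response was counted."""
    # Keep only responses that are valid choices for the task.
    allowed = {resp: cnt for resp, cnt in counts.items() if resp in choices}
    if not allowed:
        return None

    # Collect all responses sharing the maximum count, then draw one at random.
    max_count = max(allowed.values())
    candidates = [resp for resp, cnt in allowed.items() if cnt == max_count]
    return rng.choice(candidates)

counts = {'Aac': 3, 'Aca': 3, 'NVC': 1}
# 'Aac' and 'Aca' tie with count 3, so one of them is drawn at random:
print(most_frequent(counts, ['Aac', 'Aca', 'NVC']))
```

Collecting all maximal candidates first and sampling with `random.choice` is equivalent to the two-pass reset-and-append loop in the model code above.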
```python
    def predict(self, item, **kwargs):
        # Create the syllogism object
        syllogism = ccobra.syllogistic.Syllogism(item)

        # Return the personal MFA if available
        personal_prediction = self.get_mfa_prediction(item, self.mfa_personal)
        if personal_prediction is not None:
            return syllogism.decode_response(personal_prediction)

        # Return the population MFA if available
        population_prediction = self.get_mfa_prediction(item, self.mfa_population)
        if population_prediction is not None:
            return syllogism.decode_response(population_prediction)

        # Return a random response if no MFA data is available
        return item.choices[random.randint(0, len(item.choices) - 1)]
```
To generate the actual prediction for a syllogistic task (represented by the item object passed to the predict function), we first check whether a personal MFA prediction is available. If not, we proceed by checking the population MFA data, and ultimately, if no MFA can be generated for the problem at hand, we return a random response from the list of possible choices.
Adaptation¶
```python
    def adapt(self, item, target, **kwargs):
        # Extract the encoded task and response
        syllogism = ccobra.syllogistic.Syllogism(item)
        encoded_task = syllogism.encoded_task
        encoded_response = syllogism.encode_response(target)

        # Prepare the response counter for this task if not present already
        if encoded_task not in self.mfa_personal:
            self.mfa_personal[encoded_task] = dict()

        # Increment the response count for the present task
        self.mfa_personal[encoded_task][encoded_response] = \
            self.mfa_personal[encoded_task].get(encoded_response, 0) + 1
```
To adapt the model, we increment the corresponding count in the personal MFA dictionary. This essentially reflects a single iteration of the `pre_train_person` function.
Note

Since the `pre_train_person` function often performs just a sequence of adaptation steps, the `CCobraModel` class implements this as default functionality. Hence, for our implementation of the MFA model, we could have omitted the `pre_train_person` function without altering the model's behavior.