V. Baseline data
1. Accuracy, currency and completeness

Errors in the baseline data may directly affect the functional quality of an algorithm, both for the computation and for the operational application of a score. In the most extreme case, the score assigned to a consumer is simply wrong because one person has been confused with another. We shall deal in more detail in the following sections with quality problems relating to the input data used in the calculation of a score. In the first instance we shall examine the sources of error in the application of a score, which has not been the subject of much research.

Possible sources of error in the application of credit scores which can be gleaned from the specialised literature and from consumer complaints include missing data, outdated information and mistaken identity (see, for example, Schröder et al., 2014). In the realm of telematics-based motor insurance, it is reported that data have either been wrongly recorded or that errors have been introduced through linkage with flawed data. Consumers have been scored too low, for example, because of outdated maps (see Part C below). In these cases, speeding was wrongly recorded although the driver was adhering to the speed limit in an area where, some time ago, there had been road works accompanied by speed restrictions.

With regard to the quality of entity recognition, i.e. the assignment of the right data to a person, which repeatedly features prominently in anecdotal evidence reported in the press (cf. for example Seibt, 2018), there is little research material. Verbeke et al. (2012) highlight the importance of data quality in general from algorithm developers' point of view and attach higher priority to better data quality than to more refined scoring algorithms.34 Britz (2008) reports on the significance of pure and simple input errors, though without citing individual evidence.

For reasons of data protection, as little personal information as possible is collected. This creates challenges when it comes to matching a person unequivocally with a set of data. The challenges are even greater in the case of people who have come to Germany from countries such as Bulgaria, Russia, India or Thailand, whose names on their personal documents must be transcribed into Latin script. And it seems plausible, for instance, that people moving home to a different locality might cause numerous problems of entity recognition in cases where a score provider has to start by establishing a new data set, which, because of a possible lack of data linkage, contains no information about a person's credit history. If that person's creditworthiness then has to be estimated by means of geo-scoring, it may be that he will be assigned an unduly poor score (see also section B.V.2 below).

Credit scorers are particularly familiar with these challenges, which are addressed on a daily basis, as in the case of the General Credit Protection Agency (Schufa), by a large team of specialists. The number of queries and complaints received by Schufa, moreover, is extremely small – and the subject of queries and complaints is certainly not confined to mistakes in entity recognition. Accordingly, entity recognition problems cannot be all too great in number, but every time they occur they feed the bad reputation that scoring has among many people.35
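A minimal Python sketch illustrates the matching problem: naive exact comparison treats transliteration variants of the same name as two different people, whereas a normalising, similarity-based comparison narrows the gap. The names and the use of simple string similarity are purely illustrative assumptions; real record-linkage procedures are far more elaborate.

    import unicodedata
    from difflib import SequenceMatcher

    def normalise(name: str) -> str:
        # Strip diacritics and case so that transliteration variants such as
        # "Müller" vs "Mueller" move closer together.
        decomposed = unicodedata.normalize("NFKD", name)
        stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        return stripped.lower().strip()

    def similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

    # Exact comparison sees two different people ...
    print("Sergej Müller" == "Sergei Mueller")                       # False
    # ... while the normalised similarity is high (about 0.89 here).
    print(round(similarity("Sergej Müller", "Sergei Mueller"), 2))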
34 In the case of machine-learning technology, errors in the training data are particularly damaging, because by creating an inaccurate model they distort all predictions, even if the individual items of input data fed into the model are correct.
35 A further impression is obtained if one considers, for example, the number of cases handled by the Schufa customer service centre for private individuals. The
Ombudsman’s activity report contains the following account: “About 1,000 new questions, comments and even complaints from consumers are received daily about
information stored in the Schufa database. In the […] department, a great deal of the focus is on the accuracy of data. Four teams with a total of 75 desk officers,
who include law graduates, legal secretaries, notarial secretaries and bank clerks, check whether the customers’ information and comments are correct and whether
entries may have to be amended or deleted” (Schufa Holding AG, 2018, pp. 36–7). If the 1,000 consumer problems per day are compared with the daily total of
400,000 or so requests made by Schufa clients (ibid.), a problem ratio of 0.25% emerges. This is very low, but also no doubt represents the minimum extent of the
actual data problems, since in many cases these problems will probably not be noticed by scored persons.
Practitioners have made us aware of potential entity recognition problems along the whole chain of participants when Schufa scores are used. Even if the provider delivers a score that is assigned to the right person, a misassignment on the part of the requester may still occur. It is possible, for instance, that the delivered score is not correctly assigned in the decision-making process itself, since in banking and insurance, but also in the realms of mail order and e-commerce, numerous data sources are consulted when business decisions are taken. As soon as more than one data source is used, correct entity recognition can no longer be taken for granted and is open to error in the absence of an unequivocal identification number. There is plainly a need for research here so that the actual scale of the problem can be better assessed in the first instance.

2. Use of proxy variables

Paradoxically, most credit scorers, Schufa being an exception, do not possess detailed information about consumers' individual credit situations. Instead, these scorers often resort to socio-demographic and/or microgeographic data, for example to draw conclusions about the payment discipline of individual consumers from the characteristics of their residential environment (Kamp and Weichert, 2005). The payment discipline of individual consumers is deduced, for instance, from the average number of negative attributes, such as instituted debt-collection procedures, judicial debt-recovery proceedings and enforcement proceedings, in the same block of flats, or from the average number of negative attributes per household in the same street; this is known as geo-scoring. Although the sole use of address data is prohibited under section 31(1)(3) of the Federal Data Protection Act (see subsection E.I.1 below for more details), in proceedings against the Hamburg-based credit reference agency Bürgel in 2017, the court found that the agency had derived a customer's score from his address alone.36 There have also been reports of cases in which a consumer's forename was used to deduce his likely age (Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein and GP Forschungsgruppe, 2014).

A similar case arises in telematics-based motor insurance, where night-time and urban driving often impact adversely on a person's score.37 Even if it were possible to prove for the entire insured population that a causal relationship existed between the time of day and/or the location of car journeys and accident probability – and even though such a relationship seems intuitive (longer reaction times and possible drink-driving during the night, and greater volumes of traffic in towns and cities plus higher stress levels than on country roads) – it cannot be concluded that the causal relationship increases the probability that an individual will have an accident. Particularly experienced city drivers and night-shift workers are possibly being wrongly marked down and may even be subject to direct discrimination in some circumstances (see also chapter B.II above).

Variables that do not directly measure a consumer attribute but approximate it on the basis of other available data are labelled proxy variables. In empirical social research it is generally recognised that recourse to proxy variables is possible if information on a characteristic is not accessible or if accessing it would be unduly costly or time-consuming. Recourse to proxy variables must, however, be well justified, the proxy variable must be highly correlated with the missing consumer attribute, and the limits on the information value of the overall model that result from the use of proxy variables must be made transparent (on the use of proxy variables in the context of credit scoring, see Berg, Burg, Gombović and Puri, 2018).
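The mechanics of geo-scoring described above can be made concrete in a minimal Python sketch. The street names and attribute counts are invented for illustration; the point is only that an individual inherits an average computed over neighbours, regardless of his own record – the individual-level misjudgement examined in the next paragraphs.

    from statistics import mean

    # Negative attributes (debt-collection, judicial debt-recovery and enforcement
    # proceedings) recorded per household in the same street; hypothetical counts.
    street_records = {
        "Musterstrasse": [3, 0, 2, 4, 1],
        "Beispielweg":   [0, 0, 1, 0, 0],
    }

    def geo_score(street: str) -> float:
        # Proxy step: the street average stands in for the unknown individual attribute.
        return mean(street_records[street])

    # A consumer with a spotless record who happens to live in "Musterstrasse"
    # inherits the street average of 2.0 rather than his true value of 0.
    print(geo_score("Musterstrasse"), geo_score("Beispielweg"))  # 2.0 0.2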
36 Hamburg Local Court judgment of 16 March 2017 – case reference 233 OWi 12/17. See the Heise Online report at https://www.heise.de/newsticker/meldung/Datenschutzverstoss-15-000-Euro-Bussgeld-wegen-Geoscoring-3664654.html, accessed on 15 May 2018.
37 https://www.sparkassen-direkt.de/telematik/faq/, accessed on 24 May 2018.
If, on the other hand, proxy variables are used to predict individual consumer behaviour – with considerable economic implications for consumers in some cases – the need to justify the use of such variables for scoring purposes is far greater. Given the individuality of the scored person, the risk of the complete misjudgement known as the ecological fallacy (Kamp and Weichert, 2005) is greatest in these circumstances. This fallacy entails wrongly deducing individual data from aggregated ('ecological') data. From information about living conditions in the neighbourhood of a scored person, conclusions are drawn about that person's financial situation in general and about his likelihood of defaulting on a loan in particular (Kamp and Weichert, 2005). From a correlation that appears to exist within the population as a whole, a causal relationship is inferred for an individual.

This is acceptable and expedient from the point of view of a business that seeks to avoid payment defaults and is not perturbed by the loss of turnover, but not in the eyes of an individual who is wrongly assessed. In the case of geo-scoring, the ecological inference fallacy makes consumers jointly responsible for their neighbours' misconduct and curtails their sovereignty. It follows that a person cannot improve his score by gradually altering his own behaviour but can only do so by taking an invasive measure such as moving to a 'better' neighbourhood, that is to say an area where average solvency ratings are higher.

3. Weighting of input variables

The score that is assigned to a consumer depends essentially on the nature of the individual data and their relative weighting. In an extreme case, this may mean that consumers who tick most of the boxes for which points are awarded nevertheless receive a lower overall score because disproportionately more weight is attached to other data items. Three different weighting systems can be distinguished:

Weighting on the basis of regression parameters: In an algorithmic decision-making process, the weighting of a consumer attribute is determined by the influence of that predictor variable on the target variable; the weighting is often defined in the form of parameters in regression equations but can also be defined in other ways (see the background note above as well as Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein and GP Forschungsgruppe, 2014). However, since providers of credit scores regard the weighting of consumer attributes as part of their trade secret, a view that the Federal Court of Justice has endorsed, there are scarcely any research findings on how the weighting is determined and whether it is objective. There are no specific statutory provisions governing the weighting of predictor variables. (A code sketch contrasting regression-based and heuristic weighting follows at the end of this list of weighting systems.)
Heuristic methods used in corporate practice: In the realm of telematics-based motor insurance – for example in the S-Drive tariff offered by the insurer Sparkassen DirektVersicherung – a rule of thumb (heuristic approach) is used whereby the consumer is scored on the basis of the attributes driving style (acceleration and braking), speeding, night driving and urban driving, which are weighted very neatly at 40%, 30%, 20% and 10% respectively.38 It is impossible to say whether and to what extent these weightings reflect the relative influence of each of the listed consumer attributes on the target variable.
38 Accessed on 23 May 2018 at https://www.sparkassen-direkt.de/telematik/faq/.
Model-independent weighting: A familiar feature of bonus programmes is that the number of available bonus points is not necessarily determined by the actual beneficial effect of a health-promoting activity (consumer attribute) on a consumer's health (target variable), i.e. is not necessarily model-dependent, but may relate to the time and/or expense that the consumer devotes to the activity (model-independent weighting). The link, i.e. the statistical correlation, between a predictor variable and the target variable is therefore largely severed in the case of these weighting factors. Since there are scarcely any research findings on model-independent weighting, an in-depth discussion of this issue is required, which we shall undertake in Part C.

Adverse effects on consumers may result from the weighting of attributes in the following cases: (1) if the applied weighting factors are not clear to the competent supervisory authority, or possibly even to consumers, and the impact of consumers' own behaviour on their score is unforeseeable; (2) if weighting factors vary considerably between scorers, in other words if a consumer attribute to which one score provider attaches a great deal of weight is irrelevant to another provider, a situation that is most likely to occur if weighting factors are not based on objectifiable criteria such as the value of the regression parameters; (3) if weighting factors change over the course of time.
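The contrast between the first two weighting systems can be made concrete in a short sketch. The attribute names, effect sizes and the 0–100 sub-score scale are our own illustrative assumptions; only the 40/30/20/10 weights are taken from the tariff information cited above.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # 1. Weighting from regression parameters (synthetic data, illustrative only).
    rng = np.random.default_rng(42)
    n = 5_000
    neg_entries = rng.poisson(1.0, n)   # hypothetical attribute: negative entries
    income = rng.normal(0.0, 1.0, n)    # hypothetical attribute: standardised income
    logits = 0.8 * neg_entries - 1.2 * income - 2.0
    default = rng.random(n) < 1 / (1 + np.exp(-logits))
    model = LogisticRegression().fit(np.column_stack([neg_entries, income]), default)
    # The fitted parameters act as data-driven weights of the consumer attributes:
    print(dict(zip(["neg_entries", "income"], model.coef_[0].round(2))))

    # 2. Fixed heuristic weights, as published for the S-Drive tariff.
    WEIGHTS = {"driving_style": 0.4, "speeding": 0.3, "night": 0.2, "urban": 0.1}

    def telematics_score(subscores):
        # Each sub-score is assumed here to lie on a 0-100 scale.
        return sum(WEIGHTS[key] * subscores[key] for key in WEIGHTS)

    # An otherwise careful driver who often drives at night and in town:
    print(telematics_score({"driving_style": 95, "speeding": 90, "night": 40, "urban": 30}))
    # -> 76.0: the fixed weights, not any estimated effect on accident risk,
    #    determine the result.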
VI. Competing fairness criteria
Another scoring problem arises in connection with the specific composition of the data sets that are used to determine a score. In the realm of machine learning, this kind of data set used for statistical analyses is known as a training data set (details on the principles of machine learning are presented in chapters IV.1 and IV.2 and in Gesellschaft für Informatik, 2018, section 4.1). The important thing in this context is that specific elements of the baseline data impact directly on the structure and predictive power of the model. For example, in a data set relating to creditworthiness, three quarters of all persons deemed creditworthy may be female and only one quarter male. In reality – in this case after adjustment for 'other' genders – the balance between the sexes is about 50/50, and so we have here an imbalance in the baseline data from which any statistical software ('learning algorithm') is very likely to infer that attributes other than gender play only a minor part and that the probability of creditworthiness depends primarily on the sex of the applicant (Gesellschaft für Informatik, 2018). This phenomenon is sometimes referred to in the literature as bias amplification (Zhao, Wang, Yatskar, Ordonez and Chang, 2017; cf. Gesellschaft für Informatik, 2018). (A minimal numerical sketch of this sampling effect follows at the end of this section.)

Fairness in the calculation of scores is a fundamental problem that arises irrespective of the method used in practice – even rules of thumb and other heuristic methods can be unfair. The literature on fair machine learning sheds the best light on these fundamental problems, discussing quantitative methods designed to guarantee the most comprehensive possible equal treatment of individual groups and persons (Dwork, Hardt, Pitassi, Reingold and Zemel, 2012; Gesellschaft für Informatik, 2018; Kleinberg, Mullainathan and Raghavan, 2016). The key conceptual terms in this discussion are overall accuracy equality, statistical parity, conditional procedure accuracy equality, conditional use accuracy equality and treatment equality.

Berk et al. (2017) summarise the combination of all five aspects of algorithmic fairness in the concept of 'total fairness' (cf. Gesellschaft für Informatik, 2018). It is most likely impossible, however, to achieve total fairness by adjusting algorithms, because the various fairness criteria are in competition with each other in the sense that they can never all be fulfilled simultaneously. This conclusion is reached by Chouldechova (2017) and Kleinberg et al. (2016), who analyse three measures of fairness and show that no method currently in existence meets all three quantitative fairness criteria simultaneously. It is highly probable, because of the prevalence of different risks among different groups, that there will never be a method that can achieve all fairness criteria at the same time.
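What a learning algorithm 'sees' in the imbalanced data set described at the beginning of this section can be reproduced in a few lines of Python. All numbers are invented: by construction, gender carries no information in the population, yet in the skewed sample the creditworthiness rates per gender diverge sharply, so any learning algorithm fitted to that sample will treat gender as predictive.

    import random
    random.seed(0)

    # Population constructed so that creditworthiness is independent of gender:
    # 50/50 gender split, 80% creditworthy in both groups.
    population = [(random.choice("fm"), random.random() < 0.8) for _ in range(100_000)]

    # Biased collection: creditworthy men enter the data set only with probability 1/3,
    # so roughly three quarters of the creditworthy records end up being female.
    sample = [(g, c) for (g, c) in population
              if not c or g == "f" or random.random() < 1 / 3]

    def creditworthy_rate(data, gender):
        group = [c for (g, c) in data if g == gender]
        return sum(group) / len(group)

    print(creditworthy_rate(population, "f"), creditworthy_rate(population, "m"))  # ~0.80 ~0.80
    print(creditworthy_rate(sample, "f"), creditworthy_rate(sample, "m"))          # ~0.80 ~0.57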
Competing measures of fairness:
a numerical example39
Let us assume that there is a wide variation in the actual risk of payment default between two groups, say low earners and high earners, whose respective default probability rates are 5% and 0.5%. Assuming that a score has the same predictive power of 90% accuracy for both groups, the number of right and wrong predictions per group will be as follows if each group comprises 10,000 persons:

In the high-risk group there will be 500 payment defaults (5% of 10,000), of which 450 will have been predicted by the score and 50 will have been missed. Of the 9,500 cases in which there is no default (95% of 10,000), 8,550 will have been correctly predicted thanks to the predictive power of 90%, but 950 will have been wrongly marked as likely defaulters. The percentage of correctly predicted payment defaults for the high-risk group will therefore be 450 ÷ (450 + 950) = 32%.

For the low-risk group there will be 50 payment defaults (0.5% of 10,000), of which 45 will have been predicted by the score and 5 will have been missed. Of the 9,950 cases in which there is no default (99.5% of 10,000), 8,955 will have been correctly predicted thanks to the predictive power of 90%, but 995 will have been wrongly marked as likely defaulters. The percentage of correctly predicted payment defaults for the low-risk group will therefore be 45 ÷ (45 + 995) ≈ 4%.

The difference in the percentage of correct predictions stems from the fact that, where the risk is minimal, very many non-risky cases are wrongly classified (false positives). The percentage of correct default predictions would only be roughly equal for both groups if the algorithm for the low-risk group were ten times more accurate than that for the high-risk group, in other words if the predictive powers of the algorithms were about 99% and 90% respectively. This is a very unlikely scenario and does not generally occur.

The consequence for our example is that, if both groups are treated equally in terms of accuracy, i.e. specificity (conditional procedure accuracy equality), the low-risk group runs a far higher risk of false-positive assessment (conditional use accuracy equality).

The achievement of quantitative fairness also has a side-effect: if all consumer attributes which have a statistically significant effect on the target variable are selected – which modern machine-learning processes do automatically – these may include discriminatory and hence legally protected grounds or closely associated attributes. If discriminatory grounds are eliminated from the statistical model on that account, the statistical model as a whole will become more imprecise. The more attributes that are removed because of their association with membership of a particular group, the more precision will be lost, and the quality of the statistical model will suffer accordingly (Gesellschaft für Informatik, 2018). In short, this creates an irresoluble conflict of aims between the avoidance of recourse to protected grounds, even though they may be significantly influential, and the quality of the score. If statistically significant variables are not used, more people will be inaccurately scored. It follows in turn that we are confronted with conflicting fairness criteria which, in general, cannot be simultaneously achieved. An optimum solution must be sought on the basis of fairness priorities.

Which measures of fairness are to be prioritised and which are to be subordinated cannot be decided by mathematical formulae and machine learning. What is needed is social accord on the legitimate purposes and uses of attributes.
39 Cf. also the definitions and specimen calculations in Gesellschaft für Informatik, 2018, sections 4.3.1 and 4.3.2.
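The arithmetic of the example can be reproduced, and varied, with a few lines of Python. The function is a minimal sketch under the example's assumptions (the same 90% accuracy for defaults and non-defaults within each group); 'precision' denotes the share of flagged persons who actually default.

    def confusion(n: int, default_rate: float, accuracy: float):
        defaults = n * default_rate
        non_defaults = n - defaults
        tp = accuracy * defaults         # defaults correctly predicted
        fn = defaults - tp               # defaults missed
        tn = accuracy * non_defaults     # non-defaults correctly predicted
        fp = non_defaults - tn           # wrongly marked as likely defaulters
        return tp, fn, tn, fp, tp / (tp + fp)

    for label, rate in [("high-risk", 0.05), ("low-risk", 0.005)]:
        tp, fn, tn, fp, precision = confusion(10_000, rate, 0.90)
        print(f"{label}: TP={tp:.0f} FN={fn:.0f} TN={tn:.0f} FP={fp:.0f} "
              f"precision={precision:.0%}")
    # high-risk: TP=450 FN=50 TN=8550 FP=950 precision=32%
    # low-risk:  TP=45 FN=5 TN=8955 FP=995 precision=4%

Varying the two accuracy values confirms the statement above: only an accuracy of roughly 99% for the low-risk group brings the two precision figures close together.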
VII. Consumers and society:
expectations, knowledge,
competence and implications
This chapter provides an overview of the state of research into consumer expectations and acceptance of scoring and into consumers' scoring-related knowledge and digital literacy in Germany. It should be said from the outset that scarcely any independent academic studies in Germany have examined consumers' knowledge and digital literacy regarding established scoring systems, such as credit scoring, and potentially novel systems (in areas such as healthcare or in the calculation of composite scores from various areas), as well as the associated implications (for exceptions, see Fischer and Petersen, 2018, although that work relates to algorithmic decisions in general, Müller-Peters and Wagner, 2017, and PricewaterhouseCoopers, 2018). For this reason, the SVRV commissioned a representative survey (see Part D), one of the aims of which was to form a picture of knowledge and acceptance of scoring in various areas of life among the resident population of Germany.

1. Consumers' expectations and acceptance of scoring

If consumer policy relating to scoring is to be shaped in such a way as to focus on the justified and socially legitimate expectations of consumers, the first task will be to identify these expectations. Credit scoring has a long tradition, but scoring practices in other areas, such as telematics-based motor insurance and health insurance, are relatively new phenomena which are only gradually coming to play a part in various aspects of consumers' lives; accordingly, independent and informative academic studies that shed light on consumers' attitudes and expectations relating to scoring are still a rarity in Germany. It therefore seems advisable to establish empirically in which areas, to what extent and in what form the scoring of consumers is regarded in Germany as legitimate and in what cases it is held to be unwarranted.

For example, which attributes do consumers regard as legitimate and justified predictor variables for the assessment of their creditworthiness or of their motor or health insurance premiums, and which do they not? Another question that arises is whether and to what extent consumers tend to approve or disapprove of the linking of their scores and predictor attributes from various areas of activity. Another issue entirely is whether society as a whole shares the consumers' appraisal of what is warranted and legitimate. In a democratic society, finding this out is ultimately incumbent on the parliamentary legislature, which must translate these moral and ethical perceptions into statutory rules or else decide to refrain from regulatory intervention.

With regard to traditional credit scoring, acceptance of the communicated scores clearly seems to be relatively low. In a representative study, for example, more than half to three quarters of the respondents considered their score to be unfair, although the level of acceptance of scores depends on the company from which consumers obtained their personal credit records (Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein and GP Forschungsgruppe, 2014). In general terms, acceptance is greater where scores are higher, i.e. more favourable, but not even high scores are necessarily perceived as fair. We can only speculate on the reasons for this situation. In the same survey, for instance, almost half of the respondents stated that they found the explanations given by credit reference agencies to be inadequate and often incomprehensible. More than 80% of the respondents, moreover, wished for more transparency and information from credit reference agencies and supported an information access obligation for those agencies (Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein and GP Forschungsgruppe, 2014).

Social scoring, a novel method used to determine creditworthiness in which data are obtained from social networks, was assessed as risky by more than half of the respondents (56% of a sample numbering 1,023) in a representative study dated 2018 (PricewaterhouseCoopers, 2018). The majority of respondents (71%) stated that they saw the danger of flawed conclusions being drawn in credit reports as a result of the use of data from social networks. More than half of the respondents in the 18–25 age group, however, said that they would favour social scoring if certain transparency criteria were met, such as disclosure of the calculated score and information on the data used for scoring purposes (PricewaterhouseCoopers, 2018).
Germany's National Academy of Science and Engineering (acatech) had a representative survey conducted at the end of 2017 (sample size 2,002) on the subject of technology, with the main focus on digitisation (acatech and Körber-Stiftung, 2018). In general, digital technology was viewed rather sceptically by the majority of respondents. In the case of autonomous driving, it emerges that the main concern expressed by the majority of respondents relates to data security. In addition, the collection of personal data by the vehicle received a disapproval rating from almost two thirds of the respondents. Since telematics-based insurance tariffs involve the recording of comparable data, their acceptance is also in doubt.

A widespread uneasiness regarding the disclosure of personal data seems to indicate an underlying scepticism about scoring-based business models founded on big-data analyses. For example, two thirds of the respondents in Germany fear that companies are collecting excessive personal data through the Internet. The greatest concerns, each echoed by almost 80% of the respondents, relate to the buying and selling of personal information, to a general lack of protection of personal data and to the danger that personal information could come under surveillance (Centre for International Governance Innovation, 2017). It is not easily fathomable, however, why so many consumers nevertheless consent quite readily to the storage and processing of their data. The offices of the Data Protection Commissioners are by no means inundated with thousands of complaints from consumers putting up resistance against being half-coerced into consent. There is a considerable divergence between political ideals and practical action.

Consumer acceptance and expectations of telematics-based motor insurance and lifestyle-based health insurance, including potential future developments in these fields, were the subject of a survey conducted on behalf of Cologne University of Applied Sciences (TH Köln).40 Almost half of the respondents (46% of a sample of 834) could imagine having data on their driving behaviour recorded and passed on to their motor insurer as a means of having their premiums considerably reduced. In addition, basing premiums on modifiable attributes such as careful driving tends to be accepted by the majority of respondents, whereas factors that drivers cannot influence through their driving behaviour, such as whether their driving is done during the day or at night, tend to be rejected as pricing variables by the majority. In the case of health insurance, between half and two thirds of respondents consider it fair that modifiable behavioural criteria, such as whether a person attends screening examinations or smokes or drinks to excess, should be considered when premiums are set. The vast majority, however, believe that attributes which cannot be changed, such as a family history of particular medical conditions, should not be factored into the calculation of insurance premiums. In principle, more than a third of respondents would sign up for a lifestyle-based health-insurance tariff if it would save them money (Müller-Peters and Wagner, 2017).

Consumers recognisably tend to be prepared in principle to make personal behavioural data available to insurers, particularly if doing so can obtain them a price cut (Müller-Peters and Wagner, 2017). Which attributes consumers regard as acceptable or unacceptable for consideration in the calculation of insurance premiums appears to depend on the type of insurance (motor or health insurance) and the modifiability of the attributes (e.g. family medical history versus alcohol consumption).

It has so far remained a moot point, however, whether individual factors such as being personally affected (e.g. one's state of health), socio-economic status, demographic variables and specific attitudes to things like data privacy and technology, as well as one's locus of control, alter consumers' attitude to and acceptance of scoring. If a high-definition image of consumer expectations and acceptance of scoring is to be obtained so that tailor-made measures of consumer policy can be adopted, it is important that specific consumer categories be identified.
40 The survey was conducted by an institute closely associated with the insurance industry; the findings, however, are essentially comparable with our own survey (see Part D).
Insurers who make initial forays into scoring often advertise only beneficial implications, that is to say a bonus system whereby policyholders, by behaving in particular ways, can collect points with the prospect that a certain number of points will qualify them for material rewards or a reduction in their insurance premiums.

The opposite scenario, that particular behaviour is liable to have adverse consequences – in other words a system of penalty points whereby a policyholder's conduct may, for example, increase his or her insurance premiums – has not yet been incorporated into the building blocks of a telematics-based insurance tariff. Frequency of accidents has never yet been one of the variables used in calculating a person's score. The bonus programmes offered by the statutory health insurance scheme likewise entail only bonuses, and non-participation in measures does not result in penalties such as dearer insurance premiums; this indeed is in line with the relevant legal provision, section 65a(3) of Book V of the German Social Code. It therefore seems logical that consumers' attitudes to and acceptance of behavioural premiums will also vary in accordance with the prospective consequences. If a telematics-based motor insurance tariff offers only benefits, such as lower insurance premiums for adherence to statutory speed limits, its acceptance level will presumably differ from that of a tariff which also involves financial penalties, i.e. where, as well as not awarding bonus points to drivers who exceed the statutory speed limit, the insurer also penalises them by charging them more. For this reason, another objective of the representative public survey commissioned by the SVRV was to establish the extent to which the acceptance or rejection of a behavioural pricing system for motor and health insurance premiums would be affected if the system involved both bonus and penalty points (see Part D below).

2. Knowledge and competence

2.1. Consumers and algorithms: knowledge and attitudes

According to a recent survey of a representative sample of 1,221 persons by the Bertelsmann Foundation (Fischer and Petersen, 2018), a lack of knowledge about algorithms and ambivalence or reservations about their use are prevalent among most of the resident population of Germany. Since scoring models are based on algorithms as a rule, excerpts from the findings of that survey are presented in the following paragraphs.

Although three out of four respondents in the study say that they have heard the term 'algorithm', almost half of the sample cannot describe spontaneously what it means. Of those who have heard of algorithms at least once, however, more than half know nothing of how algorithms basically work. Only one tenth of these respondents claim to know how algorithms, as they understand the term, actually function. Whether the respondents associate anything with the term 'algorithm' and know how algorithms work varies widely with age, education level and gender: respondents with an Abitur, the German university entrance qualification, are far more frequently able to express at least a vague understanding of algorithms than those with lower school qualifications, male respondents more frequently than female respondents, and persons under the age of 45 more frequently than the over-60s. The extent to which respondents are aware of the use of algorithms also varies from one area of activity to another: whereas more than half of the respondents were aware – or claimed to be aware – that algorithms are used in individualised advertising on the Internet, and half, or rather just under half, were aware that algorithms are used in facial recognition in the context of video surveillance and in the assessment of creditworthiness, only about a third are aware that algorithms are used in some regions to analyse staffing requirements and for police operations, in which they identify areas where the risk of burglary is particularly high (predictive policing). And slightly fewer than one fifth of respondents were aware that algorithms can also be used by the judicial authorities to assess the probability of reoffending.
In the view of more than one third of the respondents in the Bertelsmann study, the risks inherent in algorithm-based decisions outweigh the opportunities they offer, whereas fewer than a fifth see them primarily as an opportunity (Fischer and Petersen, 2018). Almost half of the respondents are undecided as to whether risks or opportunities are preponderant. This suggests that a large percentage of the resident population of Germany has not yet formed a clear opinion on this matter. This should not come as a surprise, since there is no clear evidence yet of the actual risk-benefit ratio for many algorithm-based decisions, and even among experts the question remains a source of controversy.

A firm opinion among respondents is recognisable when it comes to the question whether decisions should, as a matter of principle, be made by algorithms or by humans. A very large majority of respondents (79% of the sample of 1,221) feel uncomfortable with algorithmic decisions and prefer human decisions. Broken down into areas of activity, a more differentiated picture emerges: in spite of an overwhelming general rejection of exclusively algorithm-based decisions, a majority would consent to the decision on the efficient use and administration of storage spaces being left to algorithms. Almost half, moreover, would be in favour of exclusively algorithmic decisions on individualised online advertising and spellchecking in the field of word processing. Most of the respondents, on the other hand, believe that the assessment of creditworthiness, medical diagnoses and the identification of the probability of reoffending should be undertaken exclusively by humans, or at most by humans taking decisions with the aid of algorithms. The final decision, they believe, should be taken by a person. In short, particularly in sensitive areas such as creditworthiness, criminal justice and health, the majority oppose the use, or at least the exclusive use, of algorithm-based decisions.

In line with a predominantly unfavourable attitude to algorithms, almost two thirds of respondents across the whole education and age spectrum support tighter control of algorithms. Measures designed to control the use of algorithms, such as compulsory indication of algorithmic decisions, disclosure of algorithms to independent experts and the introduction of an ethics commission, meet with the approval of the overwhelming majority of respondents (Fischer and Petersen, 2018).

In general terms, the findings of the representative survey described above seem to indicate that there are currently wide gaps in Germany in people's knowledge of what an algorithm is and how it works. The majority of the population have scarcely looked into the subject of algorithms, which ties in with the fact that only a minority have a definite opinion on algorithms in general. At the same time, in many areas of activity decisions assisted by or exclusively based on algorithms meet with a great deal of scepticism and rejection. Accordingly, if algorithms in their various fields of application are to be better understood and their pros and cons more objectively assessed, it seems logical to pursue the aims of reducing the knowledge deficit and developing digital literacy among the resident population of Germany.

In addition, a balanced social debate should be initiated on the demonstrable implications of algorithms with a view to addressing fears, rejection and challenges. Equally, however, there is a need for education about the empirically substantiated potential and opportunities offered by new technologies and algorithms, and hence by scoring too.