report-consumer-friendly-scoring

This document is part of the request „Gutachten des Sachverständigenrats für Verbraucherfragen

Areas for action: the state of research




V. Baseline data

1. Accuracy, currency and completeness

Errors in the baseline data may directly affect the functional quality of an algorithm, both for the computation and for the operational application of a score. In the most extreme case, the score assigned to a consumer is simply wrong because one person has been confused with another. We shall deal in more detail in the following sections with quality problems relating to input data used in the calculation of a score. In the first instance we shall examine the sources of error in the application of a score, which has not been the subject of much research.

Possible sources of error in the application of credit scores which can be gleaned from specialised literature and from consumer complaints include missing data, outdated information and mistaken identity (see, for example, Schröder et al., 2014). In the realm of telematics-based motor insurance, it is reported that data have either been wrongly recorded or that errors have been introduced through linkage with flawed data. Consumers have been scored too low, for example, because of outdated maps (see Part C below). In these cases, speeding was wrongly recorded when the driver was adhering to the speed limit in an area where, some time ago, there had been road works accompanied by speed restrictions.

With regard to the quality of entity recognition, i.e. the assignment of the right data to a person, which repeatedly features prominently in anecdotal evidence reported in the press (cf. for example Seibt, 2018), there is little research material. Verbeke et al. (2012) highlight the importance of data quality in general from algorithm developers’ point of view and attach higher priority to better data quality than to more refined scoring algorithms.[34] Britz (2008) reports on the significance of pure and simple input errors, though without citing individual evidence.

For reasons of data protection, as little personal information as possible is collected. This creates challenges when it comes to matching a person unequivocally with a set of data. The challenges are even greater in the case of people who have come to Germany from countries such as Bulgaria, Russia, India or Thailand, whose names on their personal documents must be transcribed into Latin script. And it seems logical, for instance, that people moving home to a different locality might cause numerous problems of entity recognition in cases where a score provider has to start by establishing a new data set, which, because of a possible lack of data linkage, contains no information about a person’s credit history. If that person’s creditworthiness then has to be estimated by means of geo-scoring, it may be that he will be assigned an unduly poor score (see also section B.V.2 below).

Credit scorers are particularly familiar with the challenges, which are addressed on a daily basis, as in the case of the General Credit Protection Agency (Schufa), by a large team of specialists. The number of queries and complaints received by Schufa, moreover, is extremely small, and the subject of queries and complaints is certainly not confined to mistakes in entity recognition. Accordingly, entity recognition problems cannot be all too great in number, but every time they occur they feed the bad reputation that scoring has among many people.[35]




[34] In the case of machine-learning technology, errors in the training data are also particularly irritating, because by creating an inaccurate model they distort all predictions, even if the various items of input data fed into the model are correct.

[35] A further impression is obtained if one considers, for example, the number of cases handled by the Schufa customer service centre for private individuals. The Ombudsman’s activity report contains the following account: “About 1,000 new questions, comments and even complaints from consumers are received daily about information stored in the Schufa database. In the […] department, a great deal of the focus is on the accuracy of data. Four teams with a total of 75 desk officers, who include law graduates, legal secretaries, notarial secretaries and bank clerks, check whether the customers’ information and comments are correct and whether entries may have to be amended or deleted” (Schufa Holding AG, 2018, pp. 36–7). If the 1,000 consumer problems per day are compared with the daily total of 400,000 or so requests made by Schufa clients (ibid.), a problem ratio of 0.25% emerges. This is very low, but also no doubt represents the minimum extent of the actual data problems, since in many cases these problems will probably not be noticed by scored persons.




Practitioners have made us aware of potential entity recognition problems along the whole chain of participants when Schufa scores are used. Even if the provider delivers a score that is assigned to the right person, a misassignment on the part of the requester may still occur. It is possible, for instance, that the delivered score is not correctly assigned in the decision-making process itself, since in banking and insurance, but also in the realms of mail order and e-commerce, numerous data sources are consulted when business decisions are taken. As soon as more than one data source is used, correct entity recognition can no longer be taken for granted and is open to error in the absence of an unequivocal identification number. There is plainly a need for research here so that the actual scale of the problem can be better assessed in the first instance.

2. Use of proxy variables

Paradoxically, most credit scorers, Schufa being an exception, do not possess detailed information about consumers’ individual credit situations. Instead, these scorers often resort to socio-demographic and/or microgeographic data, for example to draw conclusions about the payment discipline of individual consumers from the characteristics of their residential environment (Kamp and Weichert, 2005). The payment discipline of individual consumers is deduced, for instance, from the average number of negative attributes, such as instituted debt-collection procedures, judicial debt-recovery proceedings and enforcement proceedings, in the same block of flats or the average number of negative attributes per household in the same street; this is known as geo-scoring. Although the sole use of address data is prohibited under section 31(1)(3) of the Federal Data Protection Act (see subsection E.I.1 below for more details), in proceedings against the Hamburg-based credit reference agency Bürgel in 2017, the court found that the agency had derived a customer’s score from his address alone.[36] There have also been reports of cases in which a consumer’s forename was used to deduce his likely age (Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein and GP Forschungsgruppe, 2014).

A similar case arises in telematics-based motor insurance, where night-time and urban driving often impact adversely on a person’s score.[37] Even if it were possible to prove for the entire insured population that a causal relationship existed between the time of day and/or the location of car journeys and accident probability and even though such a relationship seems intuitive (longer reaction times and possible drink driving during the night, and greater volumes of traffic in towns and cities plus higher stress levels than on country roads), it cannot be concluded that the causal relationship increases the probability that an individual will have an accident. Particularly experienced city drivers and night-shift workers are possibly being wrongly marked down and may even be subject to direct discrimination in some circumstances (see also chapter B.II above).

Such variables that do not directly measure a consumer attribute but approximate it on the basis of other available data are labelled proxy variables. In empirical social research it is generally recognised that recourse to proxy variables is possible if information on a characteristic is not accessible or if accessing it would be unduly costly or time-consuming. Recourse to proxy variables must, however, be well justified, the proxy variable must be highly correlated with the missing consumer attribute and the limits on the information value of the overall model that result from the use of proxy variables must be made transparent. (On the use of proxy variables in the context of credit scoring, see Berg, Burg, Gombović and Puri, 2018.)



[36] Hamburg Local Court judgment of 16 March 2017, case reference 233 OWi 12/17. See the Heise Online report at https://www.heise.de/newsticker/meldung/Datenschutzverstoss-15-000-Euro-Bussgeld-wegen-Geoscoring-3664654.html, accessed on 15 May 2018.

[37] https://www.sparkassen-direkt.de/telematik/faq/, accessed on 24 May 2018.




If, on the other hand, proxy variables are used to predict individual consumer behaviour, with considerable economic implications for consumers in some cases, the need to justify the use of such variables for scoring purposes is far greater. Given the individuality of the scored person, the risk of the complete misjudgement known as the ecological fallacy (Kamp and Weichert, 2005) is greatest in these circumstances. This fallacy entails wrongly deducing individual data from aggregated (‘ecological’) data. From information about living conditions in the neighbourhood of a scored person, conclusions are drawn about that person’s financial situation in general and about his likelihood of defaulting on a loan in particular (Kamp and Weichert, 2005). From a correlation that appears to exist within the population as a whole, a causal relationship is inferred for an individual.

This is acceptable and expedient from the point of view of a business that seeks to avoid payment defaults and is not perturbed by the loss of turnover but not in the eyes of an individual who is wrongly assessed. In the case of geo-scoring, the ecological inference fallacy makes consumers jointly responsible for their neighbours’ misconduct and curtails their sovereignty. It follows that a person cannot improve his score by gradually altering his own behaviour but can only do so by taking an invasive measure such as moving to a ‘better’ neighbourhood, that is to say an area where average solvency ratings are higher.

3. Weighting of input variables

The score that is assigned to a consumer depends essentially on the nature of the individual data and their relative weighting. In an extreme case, this may mean that consumers who tick most of the boxes for which points are awarded nevertheless receive a lower overall score because disproportionately more weight is attached to other data items. Three different weighting systems can be distinguished:

Weighting on the basis of regression parameters: In an algorithmic decision-making process, the weighting of a consumer attribute is determined by the influence of that predictor variable on the target variable; the weighting is often defined in the form of parameters in regression equations but can also be defined in other ways (see the background note above as well as Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein and GP Forschungsgruppe, 2014). However, since providers of credit scores regard the weighting of consumer attributes as part of their trade secret, a view that the Federal Court of Justice has endorsed, there are scarcely any research findings on how the weighting is determined and whether it is objective. There are no specific statutory provisions governing the weighting of predictor variables.

Heuristic methods used in corporate practice: In the realm of telematics-based motor insurance (for example in the S-Drive tariff offered by the insurer Sparkassen DirektVersicherung) a rule of thumb (heuristic approach) is used whereby the consumer is scored on the basis of the attributes driving style (acceleration and braking), speeding, night driving and urban driving, which are weighted very neatly at 40%, 30%, 20% and 10% respectively.[38] It is impossible to say whether and to what extent these weightings reflect the relative influence of each of the listed consumer attributes on the target variable.




[38] Retrieved on 23 May 2018 from https://www.sparkassen-direkt.de/telematik/faq/.




Model-independent weighting: A familiar feature of bonus programmes is that the number of available bonus points is not necessarily determined by the actual beneficial effect of a health-promoting activity (consumer attribute) on a consumer’s health (target variable), i.e. is not necessarily model-dependent, but may relate to the time and/or expense that the consumer devotes to the activity (model-independent weighting). The link, i.e. the statistical correlation, between a predictor variable and the target variable is therefore largely severed in the case of these weighting factors. Since there are scarcely any research findings on model-independent weighting, in-depth discussion of this issue is required, which we shall undertake in Part C.

Adverse effects on consumers may result from the weighting of attributes in the following cases: (1) if the applied weighting factors are not clear to the competent supervisory authority, or possibly even to consumers, and the impact of consumers’ own behaviour on their score is unforeseeable; (2) if weighting factors vary considerably between scorers, in other words if a consumer attribute to which one score provider attaches a great deal of weight is irrelevant to another provider, a situation that is most likely to occur if weighting factors are not based on objectifiable criteria such as the value of the regression parameters; (3) if weighting factors change over the course of time.
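
The fixed 40/30/20/10 weighting reported for the S-Drive tariff earlier in this section can be illustrated with a minimal sketch. It assumes, purely for illustration, that each attribute is rated on a scale from 0 to 100 and that the overall score is the weighted average of these ratings; the report states only the weights, not how the sub-scores are formed or combined. The second weighting scheme and the example driver are hypothetical and serve only to show case (2), in which the same behaviour leads to different scores with different providers.

```python
# Weights in percent as reported for the S-Drive tariff (40/30/20/10).
S_DRIVE_WEIGHTS = {
    "driving_style": 40,
    "speeding": 30,
    "night_driving": 20,
    "urban_driving": 10,
}

# Hypothetical second provider, invented here for comparison only.
HYPOTHETICAL_WEIGHTS = {
    "driving_style": 25,
    "speeding": 25,
    "night_driving": 25,
    "urban_driving": 25,
}


def weighted_score(sub_scores, weights):
    """Return the weighted average of sub-scores (assumed 0-100 scale)."""
    if set(sub_scores) != set(weights):
        raise ValueError("sub-scores and weights must cover the same attributes")
    if sum(weights.values()) != 100:
        raise ValueError("weights must add up to 100 percent")
    return sum(weights[name] * sub_scores[name] for name in weights) / 100


# Hypothetical night-shift worker: careful driver, but many night and city trips.
night_shift_worker = {
    "driving_style": 90,
    "speeding": 80,
    "night_driving": 20,
    "urban_driving": 10,
}

print(weighted_score(night_shift_worker, S_DRIVE_WEIGHTS))       # 65.0
print(weighted_score(night_shift_worker, HYPOTHETICAL_WEIGHTS))  # 50.0
```

The same driving record yields 65 points under one scheme and 50 under the other, although nothing about the driver's behaviour has changed. Neither figure says anything about actual accident risk, since the weights in this sketch are not derived from a statistical model.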




VI. Competing fairness criteria

Another scoring problem arises in connection with the specific composition of data sets that are used to determine a score. In the realm of machine learning, this kind of data set that is used for statistical analyses is known as a training data set (details on the principles of machine learning are presented in chapters IV.1 and IV.2 and in Gesellschaft für Informatik, 2018, section 4.1). The important thing in this context is that specific elements of the baseline data impact directly on the structure and predictive power of the model. For example, in a data set relating to creditworthiness, three quarters of all persons deemed creditworthy may be female and only one quarter male. In reality (in this case after adjustment for ‘other’ genders) the balance between the sexes is about 50/50, and so we have here an imbalance in the baseline data from which any statistical software (‘learning algorithm’) is very likely to infer that attributes other than gender play only a minor part and that the probability of creditworthiness depends primarily on the sex of the applicant (Gesellschaft für Informatik, 2018). This phenomenon is sometimes referred to in literature as bias amplification (Zhao, Wang, Yatskar, Ordonez and Chang, 2017; cf. Gesellschaft für Informatik, 2018).

Fairness in the calculation of scores is a fundamental problem that arises irrespective of the method used in practice; even rules of thumb and other heuristic methods can be unfair. The literature on fair machine learning sheds the best light on these fundamental problems, discussing quantitative methods designed to guarantee the most comprehensive possible equal treatment of individual groups and persons (Dwork, Hardt, Pitassi, Reingold and Zemel, 2012; Gesellschaft für Informatik, 2018; Kleinberg, Mullainathan and Raghavan, 2016). The key conceptual terms in this discussion are overall accuracy equality, statistical parity, conditional procedure accuracy equality, conditional use accuracy equality and treatment equality.

Berk et al. (2017) summarise the combination of all five aspects of algorithmic fairness in the concept of ‘total fairness’ (cf. Gesellschaft für Informatik, 2018). It is most likely impossible, however, to achieve total fairness by adjusting algorithms, because the various fairness criteria are in competition with each other in the sense that all of the fairness criteria could never be fulfilled simultaneously. This conclusion is reached by Chouldechova (2017) and Kleinberg et al. (2016), who analyse three measures of fairness and show that no method currently in existence meets all three quantitative fairness criteria simultaneously. It is highly probable, because of the prevalence of different risks among different groups, that there will never be a method that can achieve all fairness criteria at the same time.




The achievement of quantitative fairness also has a side-effect: if all consumer attributes which have a statistically significant effect on the target variable are selected, which modern machine-learning processes do automatically, these may include discriminatory and hence legally protected grounds or closely associated attributes. If discriminatory grounds are eliminated from the statistical model on that account, the statistical model as a whole will become more imprecise. The more attributes that are removed because of their association with membership of a particular group, the more precision will be lost, and the quality of the statistical model will suffer accordingly (Gesellschaft für Informatik, 2018). In short, this creates an irresoluble conflict of aims between avoidance of recourse to protected grounds, even though they may be significantly influential, and the quality of the score. If statistically significant variables are not used, more people will be inaccurately scored. It follows in turn that we are confronted with conflicting fairness criteria which, in general, cannot be simultaneously achieved. An optimum solution must be sought on the basis of fairness priorities.

Which measures of fairness are to be prioritised and which are to be subordinated cannot be decided by mathematical formulae and machine learning. What is needed is social accord on the legitimate purposes and uses of attributes.

Competing measures of fairness: a numerical example[39]

Let us assume that there is a wide variation in the actual risk of payment default between two groups, say low earners and high earners, whose respective default probability rates are 5% and 0.5%. Assuming that a score has the same predictive power of 90% accuracy for both groups, the number of right and wrong predictions per group will be as follows if each group comprises 10,000 persons:

In the high-risk group there will be 500 payment defaults (5% of 10,000), of which 450 will have been predicted by the score and 50 will have been missed. Of the 9,500 cases in which there is no default (95% of 10,000), 8,550 will have been correctly predicted thanks to the predictive power of 90%, but 950 will have been wrongly marked as likely defaulters. The percentage of default predictions that prove correct for the high-risk group will therefore be 450 ÷ (450 + 950) = 32%.

For the low-risk group there will be 50 payment defaults (0.5% of 10,000), of which 45 will have been predicted by the score and 5 will have been missed. Of the 9,950 cases in which there is no default (99.5% of 10,000), 8,955 will have been correctly predicted thanks to the predictive power of 90%, but 995 will have been wrongly marked as likely defaulters. The percentage of default predictions that prove correct for the low-risk group will therefore be 45 ÷ (45 + 995) = 4%.

The difference in the percentage of correct predictions stems from the fact that, where the risk is minimal, very many non-risky cases are wrongly classified (false positives). The percentage of correct default predictions would only be roughly equal for both groups if the algorithm for the low-risk group were ten times more accurate than that for the high-risk group, in other words if the predictive power of the algorithms were about 99% and 90% respectively (an error rate of about 1% instead of 10%). This is a very unlikely scenario and does not generally occur.

The consequence for our example is that, if both groups are treated equally in terms of accuracy, i.e. specificity (conditional procedure accuracy equality), the low-risk group runs a far higher risk of false positive assessment (conditional use accuracy equality).
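
The arithmetic of the numerical example can be reproduced with a short sketch. It assumes, as the example does, that the score is right in 90% of cases for defaulters and non-defaulters alike; the function and variable names are illustrative and are not taken from the report.

```python
def confusion_counts(group_size, default_rate, accuracy):
    """Split a group into correct and incorrect default predictions.

    Assumes the same hit rate (accuracy) for defaulters and non-defaulters.
    """
    defaults = group_size * default_rate
    non_defaults = group_size - defaults
    true_positives = defaults * accuracy             # defaults that were predicted
    false_negatives = defaults - true_positives      # defaults that were missed
    true_negatives = non_defaults * accuracy         # consumers correctly cleared
    false_positives = non_defaults - true_negatives  # consumers wrongly flagged
    return true_positives, false_negatives, true_negatives, false_positives


def share_of_correct_default_predictions(group_size, default_rate, accuracy):
    """Share of all predicted defaults that are actual defaults."""
    tp, _, _, fp = confusion_counts(group_size, default_rate, accuracy)
    return tp / (tp + fp)


high_risk = share_of_correct_default_predictions(10_000, 0.05, 0.90)
low_risk = share_of_correct_default_predictions(10_000, 0.005, 0.90)
# Low-risk group again, but with the 99% accuracy mentioned in the example.
low_risk_99 = share_of_correct_default_predictions(10_000, 0.005, 0.99)

print(f"high-risk group at 90%: {high_risk:.0%}")   # about 32%
print(f"low-risk group at 90%:  {low_risk:.0%}")    # about 4%
print(f"low-risk group at 99%:  {low_risk_99:.0%}") # about 33%
```

With equal accuracy, a prediction of default is correct roughly eight times as often in the high-risk group as in the low-risk group. Only with the much higher accuracy of 99% for the low-risk group do the two shares come close to each other.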



[39] Cf. also the definitions and specimen calculations in Gesellschaft für Informatik, 2018, sections 4.3.1 and 4.3.2.




VII. Consumers and society: expectations, knowledge, competence and implications

This chapter provides an overview of the state of research into consumer expectations and acceptance of scoring and into consumers’ scoring-related knowledge and digital literacy in Germany. It should be said from the outset that scarcely any independent academic studies in Germany have examined consumers’ knowledge and digital literacy regarding established scoring systems, such as credit scoring, and potentially novel systems (in areas such as healthcare or in the calculation of composite scores from various areas) as well as the associated implications (for exceptions, see Fischer and Petersen, 2018, although that work relates to algorithmic decisions in general, Müller-Peters and Wagner, 2017, and PricewaterhouseCoopers, 2018). For this reason, the SVRV commissioned a representative survey (see Part D), one of the aims of which was to form a picture of knowledge and acceptance of scoring in various areas of life among the resident population of Germany.

1. Consumers’ expectations and acceptance of scoring

If consumer policy relating to scoring is to be shaped in such a way as to focus on the justified and socially legitimate expectations of consumers, the first task will be to identify these expectations. Credit scoring has a long tradition, but scoring practices in other areas, such as telematics-based motor insurance and health insurance, are relatively new phenomena which are only gradually coming to play a part in various aspects of consumers’ lives; accordingly, independent and informative academic studies that shed light on consumers’ attitudes and expectations relating to scoring are still a rarity in Germany. It therefore seems advisable to establish empirically in which areas, to what extent and in what form the scoring of consumers is regarded in Germany as legitimate and in what cases it is held to be unwarranted.

For example, which attributes do consumers regard as legitimate and justified predictor variables for the assessment of their creditworthiness or of their motor or health insurance premiums and which do they not? Another question that arises is whether and to what extent consumers tend to approve or disapprove of the linking of their scores and predictor attributes from various areas of activity. Another issue entirely is whether society as a whole shares the consumers’ appraisal of what is warranted and legitimate. In a democratic society, finding this out is ultimately incumbent on the parliamentary legislature, which must translate these moral and ethical perceptions into statutory rules or else decide to refrain from regulatory intervention.

With regard to traditional credit scoring, acceptance of the communicated scores appears to be relatively low. In a representative study, for example, between more than half and three quarters of the respondents considered their score to be unfair, although the level of acceptance of scores depends on the company from which consumers obtained their personal credit records (Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein and GP Forschungsgruppe, 2014). In general terms, acceptance is greater where scores are higher, i.e. more favourable, but not even high scores are necessarily perceived as fair. We can only speculate on the reasons for this situation. In the same survey, for instance, almost half of the respondents stated that they found the explanations given by credit reference agencies to be inadequate and often incomprehensible. More than 80% of the respondents, moreover, wished for more transparency and information from credit reference agencies and supported an information access obligation for those agencies (Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein and GP Forschungsgruppe, 2014).

Social scoring, a novel method used to determine creditworthiness in which data are obtained from social networks, was assessed by more than half of the respondents (56% of a sample numbering 1,023) as risky in a representative study dated 2018 (PricewaterhouseCoopers, 2018). The majority of respondents (71%) stated that they saw the danger of flawed conclusions being drawn in credit reports as a result of the use of data from social networks. More than half of the respondents in the 18–25 age group, however, said that they would favour social scoring if certain transparency criteria were met, such as disclosure of the calculated score and information on the data used for scoring purposes (PricewaterhouseCoopers, 2018).




Germany’s National Academy of Science and Engineering (acatech) had a representative survey conducted at the end of 2017 (sample size 2,002) on the subject of technology with the main focus on digitisation (acatech and Körber-Stiftung, 2018). In general, digital technology was viewed rather sceptically by the majority of respondents. In the case of autonomous driving, it emerges that the main concern expressed by the majority of respondents relates to data security. In addition, the collection of personal data by the vehicle received a disapproval rating from almost two thirds of the respondents. Since telematics-based insurance tariffs involve the recording of comparable data, their acceptance is also in doubt.

A widespread uneasiness regarding the disclosure of personal data seems to indicate an underlying scepticism about scoring-based business models founded on big-data analyses. For example, two thirds of the respondents in Germany fear that companies are collecting excessive personal data through the Internet. The greatest concerns, each echoed by almost 80% of the respondents, relate to the buying and selling of personal information, to a general lack of protection of personal data and to the danger that personal information could come under surveillance (Centre for International Governance Innovation, 2017). It is not easily fathomable, however, why so many consumers nevertheless consent quite readily to the storage and processing of their data. The offices of Data Protection Commissioners are not by any means inundated with thousands of complaints from consumers putting up resistance against being half-coerced into consent. There is a considerable divergence between political ideals and practical action.

Consumer acceptance and expectations of telematics-based motor insurance and lifestyle-based health insurance, including potential future developments in these fields, were the subject of a survey conducted on behalf of Cologne University of Applied Sciences (TH Köln).40 Almost half of the respondents (46% of a sample of 834) could imagine having data on their driving behaviour recorded and passed on to their motor insurer as a means of having their premiums considerably reduced. In addition, basing premiums on modifiable attributes such as careful driving tends to be accepted by the majority of respondents, whereas factors that drivers cannot influence through their driving behaviour, such as whether their driving is done during the day or at night, tend to be rejected as pricing variables by the majority. In the case of health insurance, between half and two thirds of respondents consider it fair that modifiable behavioural criteria such as whether a person attends screening examinations or smokes or drinks to excess should be considered when premiums are set. The vast majority, however, believe that attributes which cannot be changed, such as a family history of particular medical conditions, should not be factored into the calculation of insurance premiums. In principle, more than a third of respondents would sign up for a lifestyle-based health-insurance tariff if it would save them money (Müller-Peters and Wagner, 2017).

Consumers recognisably tend to be prepared in principle to make personal behavioural data available to insurers, particularly if it can obtain them a price cut (Müller-Peters and Wagner, 2017). Which attributes consumers regard as acceptable and unacceptable for consideration in the calculation of insurance premiums appears to depend on the type of insurance (motor or health insurance) and the modifiability of the attributes (e. g. family medical history versus alcohol consumption).

It has so far remained a moot point, however, whether individual factors such as being personally affected (e. g. one’s state of health), socio-economic status, demographic variables and specific attitudes to things like data privacy and technology as well as one’s locus of control alter consumers’ attitude to and acceptance of scoring. If a high-definition image of consumer expectations and acceptance of scoring is to be obtained so that tailor-made measures of consumer policy can be adopted, it is important that specific consumer categories be identified.



40 The survey was conducted by an institute closely associated with the insurance industry; the findings, however, are essentially comparable with our own survey (see Part D).




Insurers who make initial forays into scoring often advertise only beneficial implications, that is to say a bonus system whereby policyholders, by behaving in particular ways, can collect points with the prospect that a certain number of points will qualify them for material rewards or a reduction in their insurance premiums.

The opposite scenario, that particular behaviour is liable to have adverse consequences, in other words a system of penalty points whereby a policyholder’s conduct may, for example, increase his or her insurance premiums, has not yet been incorporated into the building blocks of a telematics-based insurance tariff. Frequency of accidents has never yet been one of the variables that are used in calculating a person’s score. The bonus programmes offered by the statutory health insurance scheme also entail only bonuses, and non-participation in measures does not result in penalties such as dearer insurance premiums, and this indeed is in line with the relevant legal provision, section 65a(3) of Book V of the German Social Code. It therefore seems logical that consumers’ attitudes to and acceptance of behavioural premiums will also vary in accordance with the prospective consequences. If a telematics-based motor insurance tariff offers only benefits, such as lower insurance premiums for adherence to statutory speed limits, its acceptance level will presumably differ from that of a tariff which also involves financial penalties, i. e. as well as not awarding bonus points to drivers who exceed the statutory speed limit, the insurer also penalises them by charging them more. For this reason, another objective of the representative public survey commissioned by the SVRV was to establish the extent to which the acceptance or rejection of a behavioural pricing system for motor and health insurance premiums would be affected if the system involved both bonus and penalty points (see Part D below).

2. Knowledge and competence

2.1. Consumers and algorithms: knowledge and attitudes

According to a recent survey on a representative sample of 1,221 persons by the Bertelsmann Foundation (Fischer and Petersen, 2018), a lack of knowledge about algorithms and ambivalence or reservations about their use are prevalent among most of the resident population of Germany. Since scoring models are based on algorithms as a rule, excerpts from the findings of that survey are presented in the following paragraphs.

Although three out of four respondents in the study say that they have heard the term ‘algorithm’, almost half of the sample cannot describe spontaneously what it means. Of those who have heard at least once of algorithms, however, more than half know nothing of how algorithms basically work. Only one tenth of these respondents claim to know how algorithms, as they understand the term, actually function. Whether the respondents associate anything with the term ‘algorithm’ and know how algorithms work varies widely with age, education level and gender: respondents with an Abitur, the German university entrance qualification, are far more frequently able to express at least a vague understanding of algorithms than those with lower school qualifications, male respondents more frequently than female respondents and persons under the age of 45 more frequently than the over-60s. The extent to which respondents are aware of the use of algorithms also varies from one area of activity to another: whereas more than half of the respondents were aware – or claimed to be aware – that algorithms were used in individualised advertising on the Internet, and half, or rather just less than half, were aware that algorithms are used in facial recognition in the context of video surveillance and in the assessment of creditworthiness, only about a third are aware that algorithms are used in some regions to analyse staffing requirements and for police operations, in which they identify areas where the risk of burglaries is particularly high (predictive policing). And slightly fewer than one fifth of respondents were aware that algorithms can also be used by the judicial authorities to assess the probability of reoffending.




In the view of more than one third of the respondents in the Bertelsmann study, the risks inherent in algorithm-based decisions outweigh the opportunities they offer, whereas fewer than a fifth see them primarily as an opportunity (Fischer and Petersen, 2018). Almost half of the respondents are undecided as to whether risks or opportunities are preponderant. This suggests that a large percentage of the resident population of Germany has not yet formed a clear opinion on this matter. This should not come as a surprise, since there is no clear evidence yet of the actual risk-benefit ratio for many algorithm-based decisions, and even among experts the question remains a source of controversy.

A firm opinion among respondents is recognisable when it comes to the question whether decisions should, as a matter of principle, be made by algorithms or by humans. A very large majority of respondents (79% of the sample of 1,221) feel uncomfortable with algorithmic decisions and prefer human decisions. Broken down into areas of activity, a more differentiated picture emerges: in spite of an overwhelming general rejection of exclusively algorithm-based decisions, a majority would consent to the decision on efficient use and administration of storage spaces being left to algorithms. Almost half, moreover, would be in favour of exclusively algorithmic decisions on individualised online advertising and spellchecking in the field of word processing. Most of the respondents, on the other hand, believe that the assessment of creditworthiness, medical diagnoses and identification of the probability of re-offending should be undertaken exclusively by humans or at most by humans taking decisions with the aid of algorithms. The final decision, they believe, should be taken by a person. In short, particularly in sensitive areas such as creditworthiness, criminal justice and health, the majority oppose the use, or at least the exclusive use, of algorithm-based decisions.

In line with a predominantly unfavourable attitude to algorithms, almost two thirds of respondents across the whole education and age spectrum support tighter control of algorithms. Measures designed to control the use of algorithms such as compulsory indication of algorithmic decisions, disclosure of algorithms to independent experts and the introduction of an ethics commission meet with the approval of the overwhelming majority of respondents (Fischer and Petersen, 2018).

In general terms, the findings of the representative survey described above seem to indicate that there are currently wide gaps in Germany in people’s knowledge of what an algorithm is and how it works. The majority of the population have scarcely looked into the subject of algorithms, which ties in with the fact that only a minority have a definite opinion on algorithms in general. At the same time, in many areas of activity decisions assisted by or exclusively based on algorithms meet with a great deal of scepticism and rejection. Accordingly, if algorithms in their various fields of application are to be better understood and their pros and cons more objectively assessed, it seems logical to pursue the aims of reducing the knowledge deficit and developing digital literacy among the resident population of Germany.

In addition, a balanced social debate should be initiated on the demonstrable implications of algorithms with a view to addressing fears, rejection and challenges. Equally, however, there is a need for education about the empirically substantiated potential and opportunities offered by new technologies and algorithms and hence by scoring too.