In [26]:
# import dependencies
import pandas as pd

# pandas_profiling summarizes each column (types, missing values, descriptive statistics) and builds correlation matrices
# https://github.com/pandas-profiling/pandas-profiling
import pandas_profiling
In [28]:
# import csv saved from the numeric_survey function
surveyDF = pd.read_csv('../data/survey04172018.csv')
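A note on the two ZIP columns: ZIP codes are identifiers rather than quantities, and parsing them as integers drops any leading zero. The report's home_zip minimum of 1801 is consistent with a five-digit code such as 01801 having lost its first digit. The sketch below shows one way to load them as strings. The inline CSV is an illustrative stand-in for the survey file, and its second row is hypothetical.

```python
import io
import pandas as pd

# Illustrative stand-in for the survey CSV; the second row is hypothetical
csv_text = "work_zip,home_zip,age\n92614,92614,38.0\n01730,01801,30.0\n"

zip_cols = ["work_zip", "home_zip"]

# Read the ZIP columns as text so leading zeros survive
df = pd.read_csv(io.StringIO(csv_text), dtype={col: str for col in zip_cols})

# If the zeros were already lost when the CSV was written, pad back to five characters
for col in zip_cols:
    df[col] = df[col].str.zfill(5)

print(df["home_zip"].tolist())
```

With the ZIP columns stored as text, a profiler would no longer compute a mean or standard deviation for them, which is the desired outcome since those statistics have no meaning for postal codes.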
In [29]:
surveyDF.head()
Out[29]:
work_zip home_zip gender_values age race_values income_values days_last_values time_between_values products_values spend_values max_spend_values how_find_values review_values price convenient atmosphere amenities
0 92614 92614 1 38.0 1.0 1.0 1.0 3.0 0.0 5.0 5.0 3.0 1.0 4 4 4 1
1 92660 92677 1 34.0 4.0 5.0 6.0 7.0 0.0 5.0 7.0 2.0 2.0 5 3 3 1
2 92612 92602 0 35.0 4.0 4.0 0.0 7.0 1.0 5.0 0.0 3.0 2.0 5 5 4 4
3 92620 92780 1 35.0 1.0 4.0 1.0 3.0 1.0 5.0 6.0 3.0 3.0 3 5 4 4
4 97205 97205 1 38.0 4.0 4.0 1.0 6.0 0.0 5.0 5.0 4.0 3.0 4 2 3 2
In [30]:
# run the pandas profiling report
pandas_profiling.ProfileReport(surveyDF)
Out[30]:

Overview

Dataset info

Number of variables 17
Number of observations 104
Total Missing (%) 0.0%
Total size in memory 13.9 KiB
Average record size in memory 136.8 B
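The memory figures in this overview can be reproduced with simple arithmetic. The sketch below assumes every column is stored as an 8-byte int64 or float64 and that the default index contributes 80 bytes; both values are assumptions chosen because they match the reported numbers, not something the report states.

```python
n_rows, n_cols = 104, 17
bytes_per_value = 8   # assumption: each column is int64 or float64
index_bytes = 80      # assumption: overhead counted for the default index

# Memory of a single column, as listed under each variable
column_bytes = n_rows * bytes_per_value + index_bytes

# Memory of the whole frame, with the index counted once
total_bytes = n_rows * n_cols * bytes_per_value + index_bytes

print(column_bytes)                      # 912
print(round(total_bytes / 1024, 1))      # 13.9 (KiB)
print(round(total_bytes / n_rows, 1))    # 136.8 (bytes per record)
```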

Variables types

Numeric 15
Categorical 0
Boolean 2
Date 0
Text (Unique) 0
Rejected 0
Unsupported 0

Warnings

Variables

age
Numeric

Distinct count 27
Unique (%) 26.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 31.163
Minimum 0
Maximum 54
Zeros (%) 3.8%

Quantile statistics

Minimum 0
5-th percentile 21
Q1 27
Median 32
Q3 36
95-th percentile 44.4
Maximum 54
Range 54
Interquartile range 9

Descriptive statistics

Standard deviation 9.019
Coef of variation 0.28941
Kurtosis 3.9753
Mean 31.163
MAD 6.2408
Skewness -1.2281
Sum 3241
Variance 81.342
Memory size 912.0 B
Value Count Frequency (%)  
35.0 10 9.6%
 
32.0 10 9.6%
 
38.0 7 6.7%
 
33.0 6 5.8%
 
30.0 6 5.8%
 
39.0 6 5.8%
 
28.0 6 5.8%
 
31.0 5 4.8%
 
0.0 4 3.8%
 
24.0 4 3.8%
 
Other values (17) 40 38.5%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 4 3.8%
 
21.0 3 2.9%
 
22.0 4 3.8%
 
23.0 4 3.8%
 
24.0 4 3.8%
 

Maximum 5 values

Value Count Frequency (%)  
45.0 1 1.0%
 
46.0 2 1.9%
 
48.0 1 1.0%
 
50.0 1 1.0%
 
54.0 1 1.0%
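
The four rows with age 0 are worth a second look. A respondent age of 0 is implausible, so these are probably placeholders for unanswered questions; that reading is an assumption, since the encoding used by the numeric_survey function is not shown here. The sketch below uses only figures from the age section to show how much the zeros pull the mean down.

```python
# Figures copied from the age section of the report
total_sum = 3241   # Sum
n_rows = 104       # Number of observations
n_zeros = 4        # rows where age == 0

mean_all = total_sum / n_rows
# Zeros add nothing to the sum, so dropping them only changes the denominator
mean_nonzero = total_sum / (n_rows - n_zeros)

print(round(mean_all, 3))      # 31.163, matches the reported mean
print(round(mean_nonzero, 2))  # 32.41
```

If the zeros do stand for missing answers, replacing them before profiling, for example with surveyDF['age'].replace(0, float('nan')), would make the report count them as missing instead of folding them into the statistics.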
 

amenities
Numeric

Distinct count 5
Unique (%) 4.8%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 2.3269
Minimum 1
Maximum 5
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 1
Median 2
Q3 3
95-th percentile 5
Maximum 5
Range 4
Interquartile range 2

Descriptive statistics

Standard deviation 1.303
Coef of variation 0.55998
Kurtosis -0.8036
Mean 2.3269
MAD 1.1206
Skewness 0.60333
Sum 242
Variance 1.6979
Memory size 912.0 B
Value Count Frequency (%)  
1 38 36.5%
 
2 24 23.1%
 
3 20 19.2%
 
4 14 13.5%
 
5 8 7.7%
 

Minimum 5 values

Value Count Frequency (%)  
1 38 36.5%
 
2 24 23.1%
 
3 20 19.2%
 
4 14 13.5%
 
5 8 7.7%
 

Maximum 5 values

Value Count Frequency (%)  
1 38 36.5%
 
2 24 23.1%
 
3 20 19.2%
 
4 14 13.5%
 
5 8 7.7%
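
The descriptive statistics for amenities can be checked directly against the frequency table. The sketch below recomputes the sum, mean, variance, and standard deviation from the five counts. Dividing by n - 1 reproduces the reported variance, which suggests the report uses the sample variance.

```python
import math

# Value -> count, copied from the amenities frequency table
counts = {1: 38, 2: 24, 3: 20, 4: 14, 5: 8}

n = sum(counts.values())
total = sum(value * count for value, count in counts.items())
mean = total / n

# Sample variance (n - 1 in the denominator)
sum_sq = sum(value * value * count for value, count in counts.items())
variance = (sum_sq - n * mean ** 2) / (n - 1)
std = math.sqrt(variance)

print(n, total)             # 104 242
print(round(mean, 4))       # 2.3269
print(round(variance, 4))   # 1.6979
print(round(std, 3))        # 1.303
```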
 

atmosphere
Numeric

Distinct count 5
Unique (%) 4.8%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 3.0769
Minimum 1
Maximum 5
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
Median 3
Q3 4
95-th percentile 5
Maximum 5
Range 4
Interquartile range 2

Descriptive statistics

Standard deviation 1.1962
Coef of variation 0.38877
Kurtosis -0.81443
Mean 3.0769
MAD 0.95858
Skewness -0.18541
Sum 320
Variance 1.4309
Memory size 912.0 B
Value Count Frequency (%)  
3 31 29.8%
 
4 29 27.9%
 
2 19 18.3%
 
1 13 12.5%
 
5 12 11.5%
 

Minimum 5 values

Value Count Frequency (%)  
1 13 12.5%
 
2 19 18.3%
 
3 31 29.8%
 
4 29 27.9%
 
5 12 11.5%
 

Maximum 5 values

Value Count Frequency (%)  
1 13 12.5%
 
2 19 18.3%
 
3 31 29.8%
 
4 29 27.9%
 
5 12 11.5%
 

convenient
Numeric

Distinct count 5
Unique (%) 4.8%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 4.0673
Minimum 1
Maximum 5
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1.15
Q1 4
Median 4
Q3 5
95-th percentile 5
Maximum 5
Range 4
Interquartile range 1

Descriptive statistics

Standard deviation 1.1681
Coef of variation 0.28718
Kurtosis 0.83041
Mean 4.0673
MAD 0.87888
Skewness -1.2883
Sum 423
Variance 1.3644
Memory size 912.0 B
Value Count Frequency (%)  
5 49 47.1%
 
4 32 30.8%
 
3 10 9.6%
 
2 7 6.7%
 
1 6 5.8%
 

Minimum 5 values

Value Count Frequency (%)  
1 6 5.8%
 
2 7 6.7%
 
3 10 9.6%
 
4 32 30.8%
 
5 49 47.1%
 

Maximum 5 values

Value Count Frequency (%)  
1 6 5.8%
 
2 7 6.7%
 
3 10 9.6%
 
4 32 30.8%
 
5 49 47.1%
 

days_last_values
Numeric

Distinct count 8
Unique (%) 7.7%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 2.7308
Minimum 0
Maximum 7
Zeros (%) 2.9%

Quantile statistics

Minimum 0
5-th percentile 1
Q1 1
Median 2
Q3 4
95-th percentile 7
Maximum 7
Range 7
Interquartile range 3

Descriptive statistics

Standard deviation 1.8074
Coef of variation 0.66186
Kurtosis 0.096134
Mean 2.7308
MAD 1.4512
Skewness 0.87487
Sum 284
Variance 3.2666
Memory size 912.0 B
Value Count Frequency (%)  
1.0 30 28.8%
 
3.0 21 20.2%
 
2.0 21 20.2%
 
4.0 13 12.5%
 
7.0 7 6.7%
 
5.0 6 5.8%
 
0.0 3 2.9%
 
6.0 3 2.9%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 3 2.9%
 
1.0 30 28.8%
 
2.0 21 20.2%
 
3.0 21 20.2%
 
4.0 13 12.5%
 

Maximum 5 values

Value Count Frequency (%)  
3.0 21 20.2%
 
4.0 13 12.5%
 
5.0 6 5.8%
 
6.0 3 2.9%
 
7.0 7 6.7%
 

gender_values
Boolean

Distinct count 2
Unique (%) 1.9%
Missing (%) 0.0%
Missing (n) 0
Mean 0.91346
Value Count Frequency (%)  
1 95 91.3%
 
0 9 8.7%
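
For a 0/1 column the reported mean is simply the share of rows coded 1. The short check below uses the counts from the two Boolean sections of this report (gender_values and products_values). What the codes 0 and 1 stand for is not documented in this notebook.

```python
n_rows = 104

gender_ones = 95     # rows with gender_values == 1
products_ones = 17   # rows with products_values == 1.0

# Mean of a 0/1 column equals the proportion of ones
print(round(gender_ones / n_rows, 5))    # 0.91346
print(round(products_ones / n_rows, 5))  # 0.16346
```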
 

home_zip
Numeric

Distinct count 80
Unique (%) 76.9%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 88181
Minimum 1801
Maximum 98065
Zeros (%) 0.0%

Quantile statistics

Minimum 1801
5-th percentile 37475
Q1 91769
Median 92627
Q3 92805
95-th percentile 94576
Maximum 98065
Range 96264
Interquartile range 1036

Descriptive statistics

Standard deviation 18362
Coef of variation 0.20823
Kurtosis 13.919
Mean 88181
MAD 8442.4
Skewness -3.9127
Sum 9170835
Variance 337170000
Memory size 912.0 B
Value Count Frequency (%)  
92618 4 3.8%
 
92627 4 3.8%
 
92602 3 2.9%
 
92782 3 2.9%
 
92780 2 1.9%
 
92887 2 1.9%
 
92123 2 1.9%
 
91344 2 1.9%
 
91748 2 1.9%
 
92647 2 1.9%
 
Other values (70) 78 75.0%
 

Minimum 5 values

Value Count Frequency (%)  
1801 1 1.0%
 
8125 1 1.0%
 
10028 1 1.0%
 
20816 1 1.0%
 
21108 1 1.0%
 

Maximum 5 values

Value Count Frequency (%)  
94609 1 1.0%
 
94610 1 1.0%
 
95127 1 1.0%
 
97205 1 1.0%
 
98065 1 1.0%
 

how_find_values
Numeric

Distinct count 5
Unique (%) 4.8%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 1.6923
Minimum 0
Maximum 4
Zeros (%) 16.3%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 1
Median 2
Q3 2
95-th percentile 3
Maximum 4
Range 4
Interquartile range 1

Descriptive statistics

Standard deviation 1.071
Coef of variation 0.63289
Kurtosis -0.78118
Mean 1.6923
MAD 0.89941
Skewness -0.031466
Sum 176
Variance 1.1471
Memory size 912.0 B
Value Count Frequency (%)  
2.0 36 34.6%
 
1.0 26 25.0%
 
3.0 22 21.2%
 
0.0 17 16.3%
 
4.0 3 2.9%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 17 16.3%
 
1.0 26 25.0%
 
2.0 36 34.6%
 
3.0 22 21.2%
 
4.0 3 2.9%
 

Maximum 5 values

Value Count Frequency (%)  
0.0 17 16.3%
 
1.0 26 25.0%
 
2.0 36 34.6%
 
3.0 22 21.2%
 
4.0 3 2.9%
 

income_values
Numeric

Distinct count 9
Unique (%) 8.7%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 4.2692
Minimum 0
Maximum 8
Zeros (%) 7.7%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 3
Median 5
Q3 6
95-th percentile 8
Maximum 8
Range 8
Interquartile range 3

Descriptive statistics

Standard deviation 2.2822
Coef of variation 0.53457
Kurtosis -0.70631
Mean 4.2692
MAD 1.841
Skewness -0.34922
Sum 444
Variance 5.2084
Memory size 912.0 B
Value Count Frequency (%)  
4.0 20 19.2%
 
5.0 20 19.2%
 
6.0 17 16.3%
 
1.0 12 11.5%
 
0.0 8 7.7%
 
7.0 8 7.7%
 
8.0 8 7.7%
 
3.0 8 7.7%
 
2.0 3 2.9%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 8 7.7%
 
1.0 12 11.5%
 
2.0 3 2.9%
 
3.0 8 7.7%
 
4.0 20 19.2%
 

Maximum 5 values

Value Count Frequency (%)  
4.0 20 19.2%
 
5.0 20 19.2%
 
6.0 17 16.3%
 
7.0 8 7.7%
 
8.0 8 7.7%
 

max_spend_values
Numeric

Distinct count 8
Unique (%) 7.7%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 3.4038
Minimum 0
Maximum 7
Zeros (%) 3.8%

Quantile statistics

Minimum 0
5-th percentile 1
Q1 2
Median 3
Q3 4
95-th percentile 7
Maximum 7
Range 7
Interquartile range 2

Descriptive statistics

Standard deviation 1.6867
Coef of variation 0.49553
Kurtosis -0.094944
Mean 3.4038
MAD 1.3317
Skewness 0.46645
Sum 354
Variance 2.845
Memory size 912.0 B
Value Count Frequency (%)  
3.0 34 32.7%
 
2.0 23 22.1%
 
4.0 15 14.4%
 
5.0 9 8.7%
 
6.0 8 7.7%
 
7.0 7 6.7%
 
1.0 4 3.8%
 
0.0 4 3.8%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 4 3.8%
 
1.0 4 3.8%
 
2.0 23 22.1%
 
3.0 34 32.7%
 
4.0 15 14.4%
 

Maximum 5 values

Value Count Frequency (%)  
3.0 34 32.7%
 
4.0 15 14.4%
 
5.0 9 8.7%
 
6.0 8 7.7%
 
7.0 7 6.7%
 

price
Numeric

Distinct count 5
Unique (%) 4.8%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 3.8269
Minimum 1
Maximum 5
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 2
Q1 3
Median 4
Q3 5
95-th percentile 5
Maximum 5
Range 4
Interquartile range 2

Descriptive statistics

Standard deviation 1.0282
Coef of variation 0.26867
Kurtosis -0.24737
Mean 3.8269
MAD 0.82581
Skewness -0.62773
Sum 398
Variance 1.0571
Memory size 912.0 B
Value Count Frequency (%)  
4 38 36.5%
 
5 31 29.8%
 
3 23 22.1%
 
2 10 9.6%
 
1 2 1.9%
 

Minimum 5 values

Value Count Frequency (%)  
1 2 1.9%
 
2 10 9.6%
 
3 23 22.1%
 
4 38 36.5%
 
5 31 29.8%
 

Maximum 5 values

Value Count Frequency (%)  
1 2 1.9%
 
2 10 9.6%
 
3 23 22.1%
 
4 38 36.5%
 
5 31 29.8%
 

products_values
Boolean

Distinct count 2
Unique (%) 1.9%
Missing (%) 0.0%
Missing (n) 0
Mean 0.16346
Value Count Frequency (%)  
0.0 87 83.7%
 
1.0 17 16.3%
 

race_values
Numeric

Distinct count 6
Unique (%) 5.8%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 2.3269
Minimum 0
Maximum 5
Zeros (%) 2.9%

Quantile statistics

Minimum 0
5-th percentile 1
Q1 1
Median 1
Q3 4
95-th percentile 4
Maximum 5
Range 5
Interquartile range 3

Descriptive statistics

Standard deviation 1.5229
Coef of variation 0.65448
Kurtosis -1.7472
Mean 2.3269
MAD 1.4675
Skewness 0.21882
Sum 242
Variance 2.3193
Memory size 912.0 B
Value Count Frequency (%)  
1.0 52 50.0%
 
4.0 38 36.5%
 
3.0 7 6.7%
 
5.0 3 2.9%
 
0.0 3 2.9%
 
2.0 1 1.0%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 3 2.9%
 
1.0 52 50.0%
 
2.0 1 1.0%
 
3.0 7 6.7%
 
4.0 38 36.5%
 

Maximum 5 values

Value Count Frequency (%)  
1.0 52 50.0%
 
2.0 1 1.0%
 
3.0 7 6.7%
 
4.0 38 36.5%
 
5.0 3 2.9%
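
race_values holds category codes, so the mean, skewness, and quantiles above have no natural interpretation; the counts and the most common code are the useful summaries. The sketch below rebuilds the column from the frequency table (row order is not preserved, which does not matter for counts) and converts it to a categorical dtype. Treating the codes as unordered categories is an assumption about how the survey was encoded.

```python
import pandas as pd

# Code -> count, copied from the race_values frequency table
counts = {0.0: 3, 1.0: 52, 2.0: 1, 3.0: 7, 4.0: 38, 5.0: 3}

# Rebuild a column with the same distribution as the survey data
race = pd.Series(
    [code for code, n in counts.items() for _ in range(n)],
    name="race_values",
)

# Store the codes as categories instead of floats
race_cat = race.astype(int).astype("category")

print(race_cat.value_counts().sort_index())
print(race_cat.mode().iloc[0])
```

Applied to the real frame before profiling, the same astype("category") conversion would likely move the column from the Numeric group to the Categorical group in the overview.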
 

review_values
Numeric

Distinct count 4
Unique (%) 3.8%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 2.0577
Minimum 0
Maximum 3
Zeros (%) 1.0%

Quantile statistics

Minimum 0
5-th percentile 1
Q1 2
Median 2
Q3 2
95-th percentile 3
Maximum 3
Range 3
Interquartile range 0

Descriptive statistics

Standard deviation 0.60462
Coef of variation 0.29384
Kurtosis 0.81903
Mean 2.0577
MAD 0.38055
Skewness -0.29276
Sum 214
Variance 0.36557
Memory size 912.0 B
Value Count Frequency (%)  
2.0 69 66.3%
 
3.0 21 20.2%
 
1.0 13 12.5%
 
0.0 1 1.0%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 1 1.0%
 
1.0 13 12.5%
 
2.0 69 66.3%
 
3.0 21 20.2%
 

Maximum 5 values

Value Count Frequency (%)  
0.0 1 1.0%
 
1.0 13 12.5%
 
2.0 69 66.3%
 
3.0 21 20.2%
 

spend_values
Numeric

Distinct count 6
Unique (%) 5.8%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 2.6346
Minimum 0
Maximum 5
Zeros (%) 9.6%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 2
Median 3
Q3 3.25
95-th percentile 5
Maximum 5
Range 5
Interquartile range 1.25

Descriptive statistics

Standard deviation 1.4485
Coef of variation 0.5498
Kurtosis -0.59703
Mean 2.6346
MAD 1.1749
Skewness -0.019643
Sum 274
Variance 2.0982
Memory size 912.0 B
Value Count Frequency (%)  
2.0 29 27.9%
 
3.0 29 27.9%
 
5.0 15 14.4%
 
4.0 11 10.6%
 
1.0 10 9.6%
 
0.0 10 9.6%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 10 9.6%
 
1.0 10 9.6%
 
2.0 29 27.9%
 
3.0 29 27.9%
 
4.0 11 10.6%
 

Maximum 5 values

Value Count Frequency (%)  
1.0 10 9.6%
 
2.0 29 27.9%
 
3.0 29 27.9%
 
4.0 11 10.6%
 
5.0 15 14.4%
 

time_between_values
Numeric

Distinct count 7
Unique (%) 6.7%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 3.5192
Minimum 0
Maximum 7
Zeros (%) 14.4%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 2
Median 3
Q3 6
95-th percentile 7
Maximum 7
Range 7
Interquartile range 4

Descriptive statistics

Standard deviation 2.2076
Coef of variation 0.62729
Kurtosis -0.99057
Mean 3.5192
MAD 1.8277
Skewness -0.0041038
Sum 366
Variance 4.8734
Memory size 912.0 B
Value Count Frequency (%)  
4.0 20 19.2%
 
3.0 20 19.2%
 
6.0 19 18.3%
 
2.0 16 15.4%
 
0.0 15 14.4%
 
7.0 11 10.6%
 
1.0 3 2.9%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 15 14.4%
 
1.0 3 2.9%
 
2.0 16 15.4%
 
3.0 20 19.2%
 
4.0 20 19.2%
 

Maximum 5 values

Value Count Frequency (%)  
2.0 16 15.4%
 
3.0 20 19.2%
 
4.0 20 19.2%
 
6.0 19 18.3%
 
7.0 11 10.6%
 

work_zip
Numeric

Distinct count 75
Unique (%) 72.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 88987
Minimum 1730
Maximum 98065
Zeros (%) 0.0%

Quantile statistics

Minimum 1730
5-th percentile 90010
Q1 91910
Median 92648
Q3 92821
95-th percentile 94695
Maximum 98065
Range 96335
Interquartile range 911

Descriptive statistics

Standard deviation 16580
Coef of variation 0.18632
Kurtosis 17.789
Mean 88987
MAD 6980.8
Skewness -4.35
Sum 9254619
Variance 274910000
Memory size 912.0 B
Value Count Frequency (%)  
92660 10 9.6%
 
92618 7 6.7%
 
92612 4 3.8%
 
92606 3 2.9%
 
92656 3 2.9%
 
92821 3 2.9%
 
94080 2 1.9%
 
93550 2 1.9%
 
92614 2 1.9%
 
91910 2 1.9%
 
Other values (65) 66 63.5%
 

Minimum 5 values

Value Count Frequency (%)  
1730 1 1.0%
 
10019 1 1.0%
 
20815 1 1.0%
 
21108 1 1.0%
 
28262 1 1.0%
 

Maximum 5 values

Value Count Frequency (%)  
94720 1 1.0%
 
95054 1 1.0%
 
97205 1 1.0%
 
98052 1 1.0%
 
98065 1 1.0%
 

Correlations
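
The correlation plots did not survive in this text export. A comparable matrix can be computed directly with pandas. The sketch below uses only the five rows printed by surveyDF.head(), restricted to the four rating columns, so the resulting numbers illustrate the calls and are not findings about the full survey. The rank-based variant is included because the ratings are ordinal.

```python
import pandas as pd

# The five rows shown by surveyDF.head(), rating columns only
sample = pd.DataFrame({
    "price": [4, 5, 5, 3, 4],
    "convenient": [4, 3, 5, 5, 2],
    "atmosphere": [4, 3, 4, 4, 3],
    "amenities": [1, 1, 4, 4, 2],
})

# Pearson correlation on the raw scores
pearson = sample.corr(method="pearson")

# Spearman correlation: Pearson correlation of the ranks
spearman = sample.rank().corr(method="pearson")

print(pearson.round(2))
print(spearman.round(2))
```

On the full data the same calls would be made on surveyDF, ideally after dropping the ZIP columns, whose numeric values carry no ordering.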

Sample

work_zip home_zip gender_values age race_values income_values days_last_values time_between_values products_values spend_values max_spend_values how_find_values review_values price convenient atmosphere amenities
0 92614 92614 1 38.0 1.0 1.0 1.0 3.0 0.0 5.0 5.0 3.0 1.0 4 4 4 1
1 92660 92677 1 34.0 4.0 5.0 6.0 7.0 0.0 5.0 7.0 2.0 2.0 5 3 3 1
2 92612 92602 0 35.0 4.0 4.0 0.0 7.0 1.0 5.0 0.0 3.0 2.0 5 5 4 4
3 92620 92780 1 35.0 1.0 4.0 1.0 3.0 1.0 5.0 6.0 3.0 3.0 3 5 4 4
4 97205 97205 1 38.0 4.0 4.0 1.0 6.0 0.0 5.0 5.0 4.0 3.0 4 2 3 2