Applying Natural Language Processing to Comcast Consumer reviews

May 23, 2017 | Author: Olabanji Shonibare | Category: Natural Language Processing, Sentiment Analysis, Topic modeling

Case study: Comcast Consumer reviews Olabanji Shonibare and Melissa Davidson

March 1, 2017

Outline

• Introduction

• Word co-occurrences

• Topic modeling

• Sentiment analysis

Introduction

Comcast Consumer Complaints

• comcast_consumeraffairs_complaints.csv: Raw complaint data about Comcast television and internet published at consumeraffairs.com between 04/08 and 09/16.

• comcast_fcc_complaints_2015.csv: Raw complaints made to the FCC about Comcast between 04/15 and 06/15.

ConsumerAffairs (CA)

[Figure: Number of complaints (CA). Bar chart of complaint count (axis from 0 to 800) by location.]

[Figure: Complaints by percentage (CA). State map titled "CA Percentage for Areas with Comcast"; legend "Percent" in seven bins ranging from 0.00272 to 0.27972, plus NA.]

FCC

[Figure: Complaints by percentage (FCC). State map titled "FCC Percentage for Areas with Comcast"; legend "Percent" in seven bins ranging from 0.000663 to 0.303950, plus NA.]

[Figure: Complaints by zip code (FCC). Map titled "FCC Comments by Zip Code"; axes: longitude and latitude; legend: rank(freq.x).]

CA and FCC

[Figure: Complaints for FCC and CA. Bar chart of complaint count (axis from 0 to 1000) by location.]

[Figure: Complaints by percentage (FCC and CA). State map titled "Both Percentage for Areas with Comcast"; legend "Percent" in seven bins ranging from 0.00340 to 0.30395, plus NA.]

Word co-occurrences

[Figure: Co-occurrence network in Comcast dataset texts (CA). Legend n: 1000 to 3000. Terms shown: fix, bad, tech, people, week, credit, issue, finally, home, supervisor, account, receive, service, charge, technician, bill, pay, told, call, time, wait, minute, month, day, cable, Internet, company, box, speak, phone, fee, customer, rep, hour, tv, cancel.]

[Figure: Co-occurrence network in Comcast dataset titles (FCC). Legend n: 25 to 100. Terms shown: overage, charges, horrible, terrible, bad, fraudulent, failure, pricing, practices, unfair, service, xfinity, business, connection, issues, customer, poor, speed, price, cable, issue, complaint, rental, bill, speeds, switch, data, services, slow, phone, throttling, internet, tv, modem, billing, caps, cap, usage, 300gb, bait.]

[Figure: Co-occurrence network in Comcast dataset text (FCC). Legend n: 300 to 900. Terms shown: received, account, services, cable, tv, paying, charge, speed, day, months, service, internet, pay, home, modem, bill, month, time, issue, call, told, times, called, phone, customer.]
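The networks in this section link words that occur together, with n giving the number of co-occurrences. The slides do not say how the counts were produced, so the following is only a minimal Python sketch under the assumption that two words co-occur when they appear in the same complaint. The function name, the toy complaints, and the stopword list are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents, stopwords=frozenset()):
    """Count, for each unordered word pair, the documents containing both words."""
    pair_counts = Counter()
    for text in documents:
        # Lowercase, split on whitespace, strip simple punctuation.
        words = {w.strip(".,!?\"'").lower() for w in text.split()}
        words = {w for w in words if w and w not in stopwords}
        # Sorting makes each pair appear in one canonical order.
        for pair in combinations(sorted(words), 2):
            pair_counts[pair] += 1
    return pair_counts

# Hypothetical toy complaints, not taken from the datasets.
docs = [
    "The technician never came to fix the cable box",
    "Customer service told me to call again",
    "I call customer service every month about my bill",
]
stop = frozenset({"the", "to", "me", "i", "my", "about",
                  "never", "came", "every", "again"})
counts = cooccurrence_counts(docs, stop)

# The most frequent pairs would become the heaviest edges of a network plot.
for pair, n in counts.most_common(3):
    print(pair, n)
```

Counting each pair once per document keeps long complaints from dominating; counting within a sliding window of neighboring words is a common alternative.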

Topic modeling

Topic modeling is a method for unsupervised classification of documents, similar to clustering on numeric data.

Topic model: Latent Dirichlet Allocation (LDA)

[Figure: Top 10 terms in each LDA topic (CA). Four panels (topics 1 to 4); x-axis: β. Terms appearing across the panels: service, call, customer, phone, bill, box, month, time, Internet, home, day, pay, cable, technician, charge, receive, hour, channel.]

[Figure: Top 10 terms in each LDA topic (FCC). Four panels (topics 1 to 4); x-axis: β. Terms appearing across the panels: datum, speed, call, service, Internet, cap, time, month, modem, phone, bill, charge, cable, usage, pay, day, price, limit, account, package, customer, rate, stream, issue, tv, told.]

[Figure: Probability distribution for each topic (CA). Four histograms (topics 1 to 4); x-axis: gamma; y-axis: number of documents.]

[Figure: Probability distribution for each topic (FCC). Four histograms (topics 1 to 4); x-axis: gamma; y-axis: number of documents.]
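In the figures of this section, β is the probability of a term within a topic and gamma is the proportion of a document assigned to a topic. The slides do not show how the model was fitted. As a rough illustration only, here is a small collapsed Gibbs sampler for LDA in plain Python; the function name, the hyperparameters alpha and eta, and the toy documents are assumptions, and a real analysis would normally rely on an established library.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, eta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (vocab, beta, gamma)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]       # token counts per (document, topic)
    topic_word = [[0] * V for _ in range(n_topics)]  # token counts per (topic, word)
    topic_total = [0] * n_topics                     # token counts per topic
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic conditioned on all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + eta) / (topic_total[t] + V * eta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # beta[t][v]: smoothed probability of word v in topic t.
    beta = [[(topic_word[t][v] + eta) / (topic_total[t] + V * eta)
             for v in range(V)] for t in range(n_topics)]
    # gamma[d][t]: smoothed share of document d assigned to topic t.
    gamma = [[(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha)
              for t in range(n_topics)] for d, doc in enumerate(docs)]
    return vocab, beta, gamma

# Hypothetical tokenized complaints, not taken from the datasets.
toy_docs = [
    ["bill", "charge", "pay", "bill", "month"],
    ["internet", "speed", "modem", "slow", "internet"],
    ["charge", "bill", "pay", "fee"],
    ["speed", "internet", "data", "cap"],
]
vocab, beta, gamma = lda_gibbs(toy_docs, n_topics=2)
```

Sorting each row of beta gives the top terms of a topic, and a histogram of each column of gamma gives the kind of per-topic distribution plotted above.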

Sentiment analysis

Sentiment analysis is a computational approach to identifying people's opinions toward an entity.

Classification techniques

• Machine learning

• Lexicon-based


NRC
 ## # A tibble: 13,901 × 2
 ## word sentiment
 ## 
 ## 1 abacus trust
 ## 2 abandon fear
 ## 3 abandon negative
 ## 4 abandon sadness
 ## 5 abandoned anger
 ## 6 abandoned fear
 ## 7 abandoned negative
 ## 8 abandoned sadness
 ## 9 abandonment anger
 ## 10 abandonment fear
 ## # ... with 13,891 more rows

BING
 ## # A tibble: 6,788 × 2
 ## word sentiment
 ## 
 ## 1 2-faced negative
 ## 2 2-faces negative
 ## 3 a+ positive
 ## 4 abnormal negative
 ## 5 abolish negative
 ## 6 abominable negative
 ## 7 abominably negative
 ## 8 abominate negative
 ## 9 abomination negative
 ## 10 abort negative
 ## # ... with 6,778 more rows

AFINN
 ## # A tibble: 2,476 × 2
 ## word score
 ## 
 ## 1 abandon -2
 ## 2 abandoned -2
 ## 3 abandons -2
 ## 4 abducted -2
 ## 5 abduction -2
 ## 6 abductions -2
 ## 7 abhor -3
 ## 8 abhorred -3
 ## 9 abhorrent -3
 ## 10 abhors -3
 ## # ... with 2,466 more rows

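AFINN lexicon

• "This is utterly excellent!" (excellent: 3). Sentiment score = 3

• "I continue to receive unwanted calls from Comcast despite my instructions." (unwanted: -2). Sentiment score = -2

• "I'm not happy and I don't like it" (happy: 3, like: 2). Sentiment score = 5. The score is positive although the sentence is negative, because word-level scoring ignores the negations "not" and "don't".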
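The three example scores above can be reproduced with a few lines of Python. This is a sketch only: the dictionary holds just the four scored words from the examples, not the full AFINN lexicon, and the function name is hypothetical.

```python
import re

# Only the scored words from the examples above; the real lexicon is far larger.
AFINN_SUBSET = {"excellent": 3, "unwanted": -2, "happy": 3, "like": 2}

def afinn_score(text, lexicon=AFINN_SUBSET):
    """Sum the lexicon scores of the words in text; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(w, 0) for w in words)

examples = [
    "This is utterly excellent!",
    "I continue to receive unwanted calls from Comcast despite my instructions.",
    "I'm not happy and I don't like it",
]
for sentence in examples:
    print(afinn_score(sentence), sentence)
```

The last sentence scores 5 because the function, like any purely word-level lexicon lookup, never sees that "happy" and "like" are negated.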

[Figure: Average AFINN score for reviews within each state (CA). Bar chart by state; x-axis: average sentiment score (from −3 to 0); legend: sentiment (negative).]

[Figure: Average AFINN score for reviews within each state (CA). State map; legend "Sentiment" in seven bins ranging from −3.000 to −0.282, plus NA.]

[Figure: Average AFINN score for reviews within each state (FCC). Bar chart by state; x-axis: average sentiment score (from −1 to 1); legend: sentiment (negative, positive).]
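The per-state averages could be computed along the following lines. The slides do not specify whether scores were averaged per review or per word, so this Python sketch assumes one AFINN score per review; the function name and the sample values are hypothetical.

```python
from collections import defaultdict

def average_score_by_state(reviews):
    """Average review scores per state from (state, score) pairs."""
    totals = defaultdict(lambda: [0, 0])  # state -> [sum of scores, review count]
    for state, score in reviews:
        totals[state][0] += score
        totals[state][1] += 1
    return {state: total / n for state, (total, n) in totals.items()}

# Hypothetical (state, review score) pairs, not taken from the datasets.
sample = [("Ohio", -2), ("Ohio", -1), ("Iowa", 1)]
averages = average_score_by_state(sample)
```

States with very few reviews produce unstable averages, which is worth keeping in mind when reading the extremes of such a chart.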

[Figure: Words with the highest contribution to sentiment scores (CA). Bar chart by word; x-axis: contribution to sentiment; legend: sentiment (negative, positive). Words shown: care, resolve, support, fine, nice, hope, happy, free, hate, leave, lost, error, ridiculous, worst, mistake, terrible, complain, poor, horrible, refuse, wrong, charged, cancel, bad, pay.]

[Figure: Words with the highest contribution to sentiment scores (FCC). Bar chart by word; x-axis: contribution to sentiment; legend: sentiment (negative, positive). Words shown: resolve, support, increase, care, resolved, fine, promise, hope, save, agree, unfair, mistake, error, terrible, lack, drop, poor, ridiculous, wrong, complain, bad, refuse, cancel, charged, pay.]
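A word's contribution in charts like these is commonly taken to be its lexicon score multiplied by how often it occurs. The slides do not state the formula for these two charts, so that definition is an assumption here, as are the function name, the small lexicon excerpt, and the toy texts in this Python sketch.

```python
import re
from collections import Counter

def word_contributions(texts, lexicon):
    """Return (word, score * count) pairs, most negative contribution first."""
    counts = Counter(
        w
        for text in texts
        for w in re.findall(r"[a-z']+", text.lower())
        if w in lexicon
    )
    contributions = {w: lexicon[w] * n for w, n in counts.items()}
    return sorted(contributions.items(), key=lambda item: item[1])

# Assumed lexicon excerpt and hypothetical texts for illustration.
lexicon = {"bad": -3, "cancel": -1, "resolve": 2}
texts = ["Bad service, bad billing", "I want to cancel", "Please resolve this"]
ranked = word_contributions(texts, lexicon)
```

A mildly negative word that appears very often can outweigh a strongly negative but rare one, which may explain why a word such as "pay" can dominate a chart.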

[Figure: Negations that contributed the most to sentiment (CA). Six panels, one per negation: can't, don't, no, not, without, won't; x-axis: words preceded by a negation; y-axis: sentiment score * number of occurrences.]

[Figure: Negations that contributed the most to sentiment (FCC). Six panels, one per negation: can't, don't, no, not, without, won't; x-axis: words preceded by a negation; y-axis: sentiment score * number of occurrences.]
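The negation charts weight each word that follows a negation by its sentiment score times its number of occurrences. A minimal Python sketch of that calculation follows, assuming adjacent word pairs (bigrams) are used; the negation list matches the six panels, while the function name, the lexicon excerpt, and the toy texts are hypothetical.

```python
import re
from collections import Counter

NEGATIONS = {"can't", "don't", "no", "not", "without", "won't"}

def negated_contributions(texts, lexicon, negations=NEGATIONS):
    """Map (negation, word) to score * count for scored words right after a negation."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        # Walk over adjacent word pairs.
        for first, second in zip(words, words[1:]):
            if first in negations and second in lexicon:
                counts[(first, second)] += 1
    return {pair: lexicon[pair[1]] * n for pair, n in counts.items()}

# Assumed lexicon excerpt and hypothetical texts for illustration.
lexicon = {"help": 2, "pay": -1, "problem": -2}
texts = [
    "They won't help",
    "I can't pay and they won't help me",
    "No problem",
]
misread = negated_contributions(texts, lexicon)
```

Large positive values flag words such as "help" whose positive score is counted in the wrong direction when they follow a negation.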

Thank you

Questions or comments?

Next step

• Machine learning

• Other packages: sentimentr, algorithmia, …
