Lead segmentation for animal science

Experience the great ROI and ease of use with airSlate SignNow's tailored solution for SMBs and Mid-Market. Transparent pricing and superior 24/7 support included.

airSlate SignNow regularly wins awards for ease of use and setup

See airSlate SignNow eSignatures in action

Create secure and intuitive e-signature workflows on any device, track the status of documents right in your account, build online fillable forms – all within a single solution.

  • Collect signatures 24x faster
  • Reduce costs by $30 per document
  • Save up to 40h per employee per month

Our user reviews speak for themselves

Kodi-Marie Evans
Director of NetSuite Operations at Xerox
airSlate SignNow provides us with the flexibility needed to get the right signatures on the right documents, in the right formats, based on our integration with NetSuite.
Samantha Jo
Enterprise Client Partner at Yelp
airSlate SignNow has made life easier for me. It has been huge to have the ability to sign contracts on-the-go! It is now less stressful to get things done efficiently and promptly.
Megan Bond
Digital marketing management at Electrolux
This software has added to our business value. I have got rid of the repetitive tasks. I am capable of creating the mobile native web forms. Now I can easily make payment contracts through a fair channel and their management is very easy.
Walmart
ExxonMobil
Apple
Comcast
Facebook
FedEx
be ready to get more

Why choose airSlate SignNow

  • Free 7-day trial. Choose the plan you need and try it risk-free.
  • Honest pricing for full-featured plans. airSlate SignNow offers subscription plans with no overages or hidden fees at renewal.
  • Enterprise-grade security. airSlate SignNow helps you comply with global security standards.

Lead Segmentation for Animal Science

In the realm of animal science, lead segmentation plays a crucial role in targeting the right audience for your research or products. With airSlate SignNow, you can streamline your document signing process to focus on what truly matters - your work.

By utilizing airSlate SignNow for lead segmentation in animal science, you can enhance efficiency and accuracy in your document workflow. Take advantage of this powerful tool to simplify your processes and focus on the core aspects of your research.

Sign up for a free trial of airSlate SignNow today and experience the convenience of seamless document management for your animal science projects.

airSlate SignNow features that users love

Speed up your paper-based processes with an easy-to-use eSignature solution.

Edit PDFs online
Generate templates of your most used documents for signing and completion.
Create a signing link
Share a document via a link without the need to add recipient emails.
Assign roles to signers
Organize complex signing workflows by adding multiple signers and assigning roles.
Create a document template
Create teams to collaborate on documents and templates in real time.
Add Signature fields
Get accurate signatures exactly where you need them using signature fields.
Archive documents in bulk
Save time by archiving multiple documents at once.

Get legally-binding signatures now!

FAQs online signature

Here is a list of the most common customer questions. If you can’t find an answer to your question, please don’t hesitate to reach out to us.

Need help? Contact support

Trusted e-signature solution — what our customers are saying

Explore how the airSlate SignNow e-signature platform helps businesses succeed. Hear from real users and what they like most about electronic signing.

This service is really great! It has helped...
5
anonymous

This service is really great! It has helped us enormously by ensuring we are fully covered in our agreements. We are on a 100% for collecting on our jobs, from a previous 60-70%. I recommend this to everyone.

Read full review
I've been using airSlate SignNow for years (since it...
5
Susan S

I've been using airSlate SignNow for years (since it was CudaSign). I started using airSlate SignNow for real estate as it was easier for my clients to use. I now use it in my business for employment and onboarding docs.

Read full review
Everything has been great, really easy to incorporate...
5
Liam R

Everything has been great, really easy to incorporate into my business. And the clients who have used your software so far have said it is very easy to complete the necessary signatures.

Read full review

Related searches to make a sign

Lead segmentation for animal science pdf
Lead segmentation for animal science example
types of segmentation in animals
segmented body examples
types of segmentation in biology
segmented body animals
segmentation biology digestion
what is segmentation in biology

How to create outlook signature

And finally, I'd like to make you aware of our latest survey platform, called Discover. Discover is a web-based, browser-based survey creation platform that doesn't just have the regular question types such as single-select, multi-select, and constant sum, where you can set up skip logic and quota control; you can also very easily set up MaxDiff or conjoint analysis in Discover. And the best thing is you can give it a try for free: just head over to sawtoothsoftware.com/discover. You can see it on the other slide here, and you can take a look at this fantastic tool. If you have it open, you'll see a button that says "Try free demo." It's actually not a demo; it's the free version that has all the capabilities I mentioned, MaxDiff, conjoint, etc. The only limitation of the free version is that you can field the survey with 50 respondents, but you can still go ahead and field it. All the capabilities are there, so it's perfect for playing around with and seeing how powerful this survey tool is. It's fantastic, and it's becoming extremely popular very quickly. With all these announcements out of the way, I hope we have everyone who is going to be listening, so I'm going to turn the time over to Keith. The screen is yours.

Thank you. Okay, so we're going to talk today about the science and process of segmentation. I want to spend just a couple of slides giving a little background on what segmentation is, but I'm going to spend most of our time elsewhere. If you've had experience with segmentation, you've probably learned that segmentations can fail; it's pretty common to do a segmentation study and for it not to be very successful. A lot of things make segmentation difficult, and I think many of them aren't things people even realize, so they don't know there's a problem to fix and can't go about fixing it. Then we'll talk about some tools and tips you can use to hopefully have a higher success rate with your segmentation studies.

Just by way of background, here's the obligatory slide that says what segmentation is: we want to create groups such that respondents within a group, or segment, are similar to one another in some ways, and respondents in different groups are dissimilar. The typical steps in a segmentation study: we write a questionnaire that includes the questions we need, both the basis variables we'll use to create the segments and the profiling variables we'll use to characterize or personalize them. Then we create segments using those basis variables and some segmentation algorithm, and we profile the segments using all the other variables in the study. Then we get strategic about our segmentation: we prioritize the segments and strategize about how to appeal to them, or which ones to try to appeal to, and perhaps at the end we'll add a typing tool for shelf life. But today I really want to focus on just one of these steps, how we create segments, because in my experience maybe half of segmentation failures come from that bullet point rather than the others, although that still leaves plenty of failure to go around and reason to think about the other steps.

I mentioned basis variables and profiling variables. Typically when we talk about segmentation, we talk about demographic segmentations, psychographic segmentations, behavioral or needs-based segmentations. In my experience, any one of these can make for a good set of basis variables.
And any of them can be good profiling variables. I hear people pooh-pooh demographic segmentations from time to time, and later in the presentation I'll show you an example of a really successful demographic segmentation we did once upon a time. I also wanted to mention that there's nothing to prevent someone from taking five of the demographics, three of the psychographics, and two behaviors and saying those are my basis variables. I would say probably 70% of the segmentations I do have mixes of these rather than sticking to just one.

Okay, so what is it that makes segmentation difficult? I'm going to talk about seven different topics here. There are a couple of really vexing combinatorial problems we face when we do segmentation. There's a cluster of interrelated problems about sample size and having too many variables, which leads to something called the curse of dimensionality. There are some little-known problems that can adversely affect segmentation, masking variables and correlated variables, which we'll cover as well; guess what, if you don't look for them you don't find them, and they harm your segmentation. Having unequal segment sizes can be a problem, though there's a way to address that which we'll talk about. Response biases can cause real difficulties in attitudinal segmentations. And then there are some client occult beliefs that I think we have to try to move our clients away from, because they lead us in bad directions.

So let's talk about those combinatorial problems. One is in the nature of segmentation itself, and one is in the interactions that analysts have with clients. The first is the fiendish combinatorial problem. Let's assume we have a really small segmentation study: we've got 150 respondents, and somehow we have reliable information that there are exactly four segments. Maybe we had a dream last night, or a revelation, or Nostradamus told us, but somehow we infallibly know there are just four segments. You'd think this might make our job easier. It turns out there are about 2 times 10 to the 90 ways of dividing those 150 respondents into four segments. That's a lot of solutions to look through; in fact, it's more than there are atoms in the universe, about 500 million times more. So clearly we're never going to be able to look at more than a tiny fraction of the possible solution space. And it's obviously even more daunting in a typical segmentation study where we might have a thousand respondents, because then there are around 10 to the 602 ways of dividing the respondents into segments. So it's a substantial problem.

That was the fiendish combinatorial problem; the devilish one is this. Let's assume our study has 50 variables. Some of these are going to be basis variables that we use to define our segments, and some are going to be profiling variables. It turns out there are 2 to the 50th, roughly 1.13 quadrillion, ways to separate your variables into those two groups. And this is where I've seen scope creep come into segmentation studies a lot, especially when we're using judgment rather than theory or analytics to guide our variable selection. Maybe the client says, well, let's try segmenting on variables X1 through X20, and if they don't like the solution, well, let's try X1 through X10 and X30 through X37, and if that doesn't work, we've got lots and lots of other combinations to try. I've gone through this cycle of trying to pick basis variables.
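Both counts are easy to verify with exact integer arithmetic. Here is a small stdlib-only sketch (the function name is mine), assuming the "ways to divide respondents into segments" figure refers to partitions into exactly four non-empty groups, i.e. Stirling numbers of the second kind:

```python
from math import comb, factorial

def stirling2(n: int, k: int) -> int:
    """Ways to partition n labeled respondents into exactly k
    non-empty, unlabeled segments (Stirling number of the 2nd kind)."""
    total = sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
    return total // factorial(k)

# 150 respondents, exactly 4 segments: ~8.5e88 unlabeled partitions;
# with labeled segments (multiply by 4!) it is ~2.0e90, the order of
# magnitude quoted in the talk.
print(f"{float(stirling2(150, 4)):.2e}")
print(f"{float(factorial(4) * stirling2(150, 4)):.2e}")

# 50 variables, each assigned to basis or profiling: 2^50 splits
print(f"{2 ** 50:,}")
```

Either way the count is astronomically larger than anything an exhaustive search could cover, which is the speaker's point.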
I think my record with a client was about 22 times cycling through this, trying to find the right set of basis variables. So this can be a really annoying problem.

Next is the cluster of problems that have to do with sample size and the number of variables. Let's imagine a segmentation study that has 20 basis variables, and we want to allow up to five segments. One reviewer suggests that we need a sample size at least 10 times the number of basis variables times the number of segments, so we'd need a thousand people for this. A more recent review did a more thorough analysis and concluded that you really need a hundred times as many respondents as you have variables: take your number of variables, multiply it by 100, and that's the sample size you should be shooting for, 2,000 in this case. In a lot of our consumer studies it's not too hard to get these kinds of sample sizes; in some of our B2B studies it can be really tough to get even a tenth of that. Since the required sample size depends on the number of variables, there's clearly going to be a lot of pressure to keep the number of variables small if we can.

Another reason to keep the number of variables small, and my friend Joe Retzer pointed this out to me, is this: if you have just one variable, it's really easy to divide respondents into high and low scorers. If we have a sample of people, we can pretty easily divide them in terms of their height or education or income; we just draw a median line and we've got a high and a low group, or a high, medium, and low group. When you add a second variable it becomes harder, and when you add 5 or 10 or 20 variables, the more variables you add, the less likely it is that your segments will be able to produce high and low scores on all the variables. It's just not feasible. So if you have lots of variables, you'll end up with segments that don't look very different at all. The reason for that is something called the curse of dimensionality: the volume of the segmentation space increases exponentially with the number of variables. If you have two variables, or two dimensions, the volume of the space gets squared, but if you have 20 dimensions it gets raised to the 20th power. There's a quote I like about this problem: having too many dimensions causes every observation in your data set to appear equidistant from all the others, and if the distances are all approximately equal, then all the observations appear about equally alike and equally different, and there are no meaningful clusters to be formed. I've seen hints that this affects cluster analysis more than it does latent class analysis, but it definitely affects both. I used to wonder why this was happening: I'd throw a lot of variables into my segmentation and get segments that looked approximately alike, with lots of very small differences. I think that's the curse of dimensionality coming into play. So there are a couple of different reasons to try to keep your number of variables small.

In addition to the number of variables, the type of variables you're segmenting on matters, because if you include variables that the segments don't differ on, all they do is make whatever cluster structure is in your data harder to find; they mask it. That's why Brusco called these masking variables, which seems like a really good name for them.
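The "everything looks equidistant" effect is easy to reproduce without any special software. A hedged stdlib-only sketch (the function name and dimension choices are mine): as the number of uniform random dimensions grows, the spread of pairwise distances collapses relative to their size.

```python
import math
import random

def distance_spread(n_points=200, dims=(2, 20, 200), seed=0):
    """Relative spread (max - min) / min of pairwise distances among
    uniform random points; it shrinks as dimensionality grows."""
    rng = random.Random(seed)
    out = {}
    for d in dims:
        pts = [[rng.random() for _ in range(d)] for _ in range(n_points)]
        dists = [math.dist(p, q)
                 for i, p in enumerate(pts) for q in pts[i + 1:]]
        out[d] = (max(dists) - min(dists)) / min(dists)
    return out

for d, spread in distance_spread().items():
    print(f"{d:3d} dimensions: relative spread = {spread:.2f}")
```

In 2 dimensions some pairs are hundreds of times closer than others; by 200 dimensions the nearest and farthest pairs differ by well under a factor of two, which is exactly why distance-based clustering loses its footing.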
My friend Joseph White and I did a paper at the 2021 Sawtooth Software conference, which by the way you can download from our website, where we show just how much harm masking variables can do in terms of recovering a known segmentation structure. They're really damaging, and it's really good to get them out of your data before you choose your basis variables. But again, if you don't look for them, you don't know they're there, and they're just doing unseen damage to your segmentation.

Another one, and this was a little counterintuitive to me, has to do with correlated variables. A lot of us came up doing factor analysis, and we learned that multi-item scores are more reliable than single-item scores, so we tend to think that having correlated variables might actually make things better. In fact it makes things worse, because having correlated basis variables also makes the cluster structure harder to find. When Joseph White and I found that in 2021, we really scratched our heads; we were surprised by it, but it turns out some other authors had already found it a few years earlier. So this is another thing to look out for: not only do we not want masking variables, we also don't want correlated basis variables. We need to get those out as well.

Another problem is, I think, pretty well known; the first reference to it I could find was from 2006, but Joseph and I found it as well. Some segmentation algorithms struggle much more than others to identify segments when the segment sizes are unequal. We had a data set where we knew, because we programmed it in, what the segment membership was. What we saw was that if the segment sizes were all equal, a lot of methods did reasonably well at reproducing the cluster structure, but when the segment sizes were very unequal, really only a couple of the segmentation algorithms we looked at did a creditable job at all. The two that worked especially well were Sawtooth Software's CCEA program for cluster ensembles and latent class clustering; those two did by far and away the best. There was one other method, a robust k-means method in R, that worked not quite as well as those two but better than the rest. So maybe those three methods are ways of handling uneven segment sizes, because we don't really want to prevent unequal segment sizes from happening; if the segments are unequal in size, that's worth knowing. But we don't want to use an algorithm that throws us off the right answers, so we want to be careful there.

Another thing that can cause problems with segmentation is response bias, especially if we're doing a segmentation based on rating scales, because different people use rating scales differently. We all know there are high raters and low raters; we know that people from different cultures use rating scales differently; back in the days of mail surveys, we knew that right- and left-handed people used rating scales a little differently. So it's not at all uncommon for an attitudinal segmentation to produce a three-segment solution: a segment of high raters, a segment of low raters, and a group of folks in between. Obviously that's not a very interesting solution. We're segmenting on an artifact of how people use rating scales, and it's masquerading as segment structure, which is clearly problematic. If we had a good way of subtracting out that response bias it would be great, but it's harder to do than you would think. It's not just a matter of subtracting out the mean or something like that; sometimes you're throwing so much information away when you do that that you've thrown out all the signal and retained the noise, so that's sometimes not the best thing to do either.
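That high-rater/low-rater failure mode is easy to simulate. In this hedged stdlib sketch (the helper names and data are mine), respondents have no attitude segments at all, only a scale-use difference, yet a two-group k-means "finds" segments that differ almost entirely in their grand mean:

```python
import random

def mean(v):
    return sum(v) / len(v)

def scale_use_artifact(n=60, items=10, seed=2):
    """Random attitudes plus a rater-specific scale shift; k-means on
    raw ratings recovers the shift (an artifact), not real segments."""
    rng = random.Random(seed)
    # even-indexed respondents are high raters (+1), odd are low (-1)
    data = [[3.0 + (1.0 if i % 2 == 0 else -1.0) + rng.gauss(0, 0.5)
             for _ in range(items)] for i in range(n)]
    # deterministic seeding: lowest- and highest-scoring respondents
    order = sorted(data, key=mean)
    cents = [order[0][:], order[-1][:]]
    for _ in range(10):                       # Lloyd's iterations
        groups = [[], []]
        for row in data:
            d2 = [sum((a - b) ** 2 for a, b in zip(row, c)) for c in cents]
            groups[d2.index(min(d2))].append(row)
        cents = [[mean(col) for col in zip(*g)] for g in groups]
    return sorted(mean([mean(r) for r in g]) for g in groups)

low, high = scale_use_artifact()
print(f"grand means of the two 'segments': {low:.2f} vs {high:.2f}")
```

The two recovered "segments" are simply the low raters and the high raters, separated by roughly the size of the scale-use shift.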
What I've found is that if we want attitudinal measures, then rather than rating scales we might want to use something more like a MaxDiff, which gets rid of that response bias for us. That's something my colleague Dean Tindall talked about, I think last month, at our webinar.

Okay, then I wanted to talk about a couple of occult beliefs that clients hold. The first one: clients just don't seem to realize that there's any limit to the number of variables you can include in cluster analysis, and that's because mathematically there isn't one. Even if we had just 50 respondents, we could have a thousand basis variables if we wanted, and cluster analysis would still run and give us segments. They might be garbage, but it would give us segments. So mathematically there's no real limit on the number of variables, but given the things we've talked about, the curse of dimensionality in particular, we really want clients to get off the thought that we can just throw everything into the garbage can and things will turn out rosy, because that's probably not the case.

Another occult belief I have to deal with a lot: I think when a lot of people learned cluster analysis, they almost learned it as "you've got to do factor analysis, then take your factor scores and cluster on them." This is what's called dual clustering, or factor clustering. A lot of my clients seem to think they need to do that, and I think it's usually a really bad idea. The reason is that factor analysis just doesn't produce what we need for clustering. Let's talk about what factor analysis is: it's a psychometric tool for identifying the underlying dimensions in a set of variables, and it's specifically used to test theories about what those underlying dimensions or constructs are, and theories about how best to measure them. That's what factor analysis does; that's what it's good at. It's not good at identifying the variables that might differ across segments, and it's certainly not a solution for a garbage can of variables. I had a client who tossed in 174 variables and said, well, just factor-analyze it first and we'll use the factor scores. That's typically not going to work very well for you, and therefore it's really not valuable for segmentation.

Let me show you an example. I'm ordinarily loath to use rating scales for segmentation, but here's a study where we actually had some psychometricians design the set of segmentation questions, so they were much better designed than your average set of questions would be. We got this factor analysis, and the client said, let's just do the factor analysis and use the top item from each factor as our basis variables. Those would be the ones in red here: variables 10, 1, 11, 4, and 3.
However, that didn't produce a very good cluster solution when we cluster-analyzed it. What we really found was that the two variables that did the best job of showing differences among the segments were two of the items that were cross-loaders in the factor analysis, Q1_H and Q1_7. I've seen this happen again and again: we pick the top item from each factor, and we thereby miss the items that are best for cluster analysis, simply because the goals and objectives of factor analysis aren't the same as those of cluster analysis. So the whole idea of tandem clustering, where you factor first and then cluster, can lead you down some bad alleys, and it's good to tell your clients that it's not really a great idea.

Okay, so we've talked about the things that make segmentation difficult, that in my experience make it fail a lot of the time. One of the reasons I put this presentation together is that I've done hundreds and hundreds of segmentations, and I've seen them fail for a variety of reasons, so I wanted to put down on slides where a lot of that failure comes from, so that maybe we can have more success in the future.

Now let's talk about what we can do about those variable- and analysis-related problems, ignoring for today all the other problems that could occur. How might we do a better job of segmentation? Here are the things I'll cover in the following slides. First, I always feel more comfortable if I can preview my data and figure out, before I even begin, how much structure I'm looking at: do I have well-structured data that's ready for clustering, or fairly unstructured data where I'm going to be essentially making stuff up when I segment? Second, we should plan our analysis to deliver on objectives. A lot of times I'll hear from a client, "we always use cluster analysis," or "we always use latent class analysis for our segmentations," and as I'll show you in a minute, that's a mistake, because different kinds of objectives require different kinds of methods. Third, I think we want to precede our clustering with something called variable selection, because we want to reduce the number of variables: to break the curse of dimensionality, to reduce the stress on our sample size, and to eliminate any masking or redundant variables from our analysis, because those keep us from seeing the true structure in the data, so we really want to work hard to get rid of them. We'd also like to use measures that avoid response bias, such as avoiding rating scales where possible; to educate our clients, to immunize them against those false beliefs that lead us down bad alleys; and to use segmentation methods that are most likely to find structure. When Joseph White and I tested this, the two that did best were cluster ensembles, in Sawtooth Software's CCEA package, and latent class clustering.

In terms of previewing structure, there's a really good segmentation book that I'll tell you about at the end, by Dolnicar et al. from 2018. They say we should be looking to see whether we have a natural cluster structure, where no matter what we do we'll see the same structure because it keeps coming out again and again; or whether we have to look a little harder, do some robustness and reliability testing, and find a reproducible cluster structure; or whether our data is really hardcore unstructured and we're going to have to impose structure on it.
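How much structure masking variables can hide is easy to see for yourself. A hedged stdlib-only sketch (synthetic data and function name are mine, not the Sawtooth study): two segments differ on two informative variables, and as irrelevant variables are added, the between-segment distances shrink toward the within-segment distances until the structure effectively vanishes.

```python
import math
import random

def masking_effect(n_per_seg=100, masking_dims=(0, 8, 64), seed=1):
    """Ratio of mean between-segment to mean within-segment distance
    for two segments separated on 2 informative variables; a ratio
    near 1 means masking variables have drowned out the structure."""
    rng = random.Random(seed)
    out = {}
    for m in masking_dims:
        def draw(shift):
            return ([rng.gauss(shift, 1.0) for _ in range(2)] +   # informative
                    [rng.gauss(0.0, 1.0) for _ in range(m)])      # masking noise
        seg_a = [draw(0.0) for _ in range(n_per_seg)]
        seg_b = [draw(3.0) for _ in range(n_per_seg)]
        within = [math.dist(p, q) for seg in (seg_a, seg_b)
                  for i, p in enumerate(seg) for q in seg[i + 1:]]
        between = [math.dist(p, q) for p in seg_a for q in seg_b]
        out[m] = (sum(between) / len(between)) / (sum(within) / len(within))
    return out

for m, ratio in masking_effect().items():
    print(f"{m:3d} masking variables: between/within distance ratio = {ratio:.2f}")
```

With no masking variables the segments are far apart relative to their internal spread; with dozens of them the ratio drops toward 1 and any distance-based algorithm has little left to work with, which is why variable selection comes before clustering.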
Some other authors did a review of 32 segmentation studies, and they judged that 6 percent of them had a natural cluster structure that was easy to find, 72 percent had a reproducible cluster structure, and 22 percent had unstructured data. I'd have to say, when I think about my experience looking at data structure for segmentations, that's pretty much what I see in my practice as well; if anything, I find fewer studies with a natural cluster structure and maybe a few more with unstructured data, but maybe I'm just unlucky.

There are a couple of statistics we can look at to find out how much structure is in our data. The one I really like is called the Hopkins statistic. If the Hopkins statistic is less than 0.3, you've got evenly spaced data throughout your multivariate segmentation space; if it's in the neighborhood of 0.5, you've got random data; and if it's greater than or equal to 0.7, you've got nice clumpy data that segmentation can work with, and you've got happy segmentation algorithms, because they've got what they need to do their job. Alternatively, you can look at the silhouette number, which also tells you whether you have anywhere from no structure at all to a very strong structure. This is just a preliminary thing I do before I start clustering: I look at it and give my client a feel for how hard we're going to have to work to find our clusters or segments.

The next bit of advice is to make sure we match our objectives to our data conditions. I've got a little segmentation flow chart I put together; it's the map in my mind of the decisions I go through. The first question I ask myself is whether a choice model is going to be the basis for my segmentation or not, and that takes us down two paths. If a choice model is going to be the basis for my segmentation, is it going to be the only basis, or am I going to want to put my choice model results plus some attitudinal or behavioral variables in there? If it's the only basis variable, we can run latent class multinomial logit; that's something you can run in Sawtooth Software's Lighthouse Studio program, in our standalone CBC latent class program, in R, in Latent GOLD, and in other packages as well. If it's not the only basis variable, then as far as I'm aware the only game in town is a software package called Latent GOLD, because it allows you to build segmentations based on choice model variables simultaneously with other variables; it's a particular kind of latent class multinomial logit model they have, and that's kind of a neat thing.

Down the other path, if we're not using a choice model as the basis for our segmentation, my next question is whether there's going to be a dependent variable, some one particular variable that we want to see differ across our segments. A lot of times we're doing this for new product development, and it's something like a purchase intent variable: let's come up with segments that our new product will differentially appeal to. If I have a dependent variable, I might go with a supervised, or tree-based, segmentation; if I don't have a dependent variable, I'm going to be going with an unsupervised segmentation.
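The Hopkins statistic is straightforward to compute from scratch. Below is a hedged stdlib sketch of one common formulation, in which values near 1 (not 0) indicate clumpy data, consistent with the ≥ 0.7 rule of thumb above; the function names and test data are my own.

```python
import math
import random

def hopkins(data, m=25, seed=3):
    """Hopkins statistic: ~0.5 for random data, toward 1 for clustered
    data, below 0.5 for evenly spaced data (one common formulation)."""
    rng = random.Random(seed)
    dims = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dims)]
    hi = [max(p[d] for p in data) for d in range(dims)]

    def nearest(q, pts):
        return min(math.dist(q, p) for p in pts if p is not q)

    w = sum(nearest(p, data) for p in rng.sample(data, m))      # real points
    u = sum(nearest([rng.uniform(lo[d], hi[d]) for d in range(dims)], data)
            for _ in range(m))                                  # uniform probes
    return u / (u + w)

rng = random.Random(0)
uniform_data = [[rng.random(), rng.random()] for _ in range(200)]
clumpy_data = []
for _ in range(200):
    cx, cy = rng.choice([(0.2, 0.2), (0.8, 0.8)])
    clumpy_data.append([rng.gauss(cx, 0.05), rng.gauss(cy, 0.05)])

print(f"uniform data:   H = {hopkins(uniform_data):.2f}")
print(f"clustered data: H = {hopkins(clumpy_data):.2f}")
```

On uniform data the statistic hovers around 0.5; on two tight clumps it climbs well above 0.7, which is the "happy segmentation algorithms" zone described in the talk.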
variable I'm going to be going with an unsupervised segmentation and if if I don't really want to have a supervising variable but I still want my segments to differ a little bit on some one variable of interest I could do something called a predictive segmentation if I'm going the unsupervised or the predicted route um I should at least ask myself do I have a number of of uh of of respondents that's greater than a hundred times my my number of variables but whether I answer that yes or no because we don't really know at this point whether we have masking or redundant variables we should really be doing some variable selection and we'll talk a little bit about how you do variable selection in a couple minutes once you've gone through that variable selection phase you've gotten your you've gotten down to a very much more limited set of variables and you're ready to start clustering and so the next question is are all my variables metric are they rating scales or binaries or uh you know uh counts or proportions or whatever or are some of them categorical if they're all metric I can go ahead and I can do I can either do cluster ensembles analysis in Sawtooth software or I can do latent class analysis you know maybe I want to do model based clustering in R is what it's called there um I have a choice if my variables aren't all metric if some of them are hardcore multi-category categorical variables I really shouldn't be doing cluster analysis at all uh you using uh cluster ensembles or kbans or hierarchical clustering I should really be doing latent class clustering and so that that's kind of the decision to be there but you can what you can see there's five ending buckets there of methods for doing cluster analysis and so anyone who says well I always do late in class well I don't always do late in class because sometimes I'm doing a supervised segmentation or I don't always do you know latent class because what I really need to do is a latent class multinomial logic 
because I'm using a choice model. So I really think we need to think harder about this, rather than trying to fit every segmentation study to latent class analysis. They don't always have to be amenable to latent class analysis; I think that's a mistake, and a very limited way of thinking about cluster analysis and segmentation.

Let's talk a little bit about supervision. I mentioned that word on the previous slide, and I mentioned that we've got supervised and unsupervised segmentations, as well as one that's kind of in the middle that I call predictive segmentation.

An unsupervised segmentation is what we usually have when we're thinking about doing cluster analysis or latent class analysis: we don't have a dependent variable that supervises or guides the analysis. Again, we might be doing cluster analysis using hierarchical clustering, k-means, partitioning around medoids, or cluster ensembles if we're wise, or we could be doing model-based clustering, or latent class.

A supervised segmentation, on the other hand, is where we want to generate segments that are maximally differentiated with respect to some dependent variable, and that dependent variable is said to guide or supervise the analysis. For this we're going to use a tree-based method, and there are several to pick from. I'm old, so I started doing this back in the 80s, I guess, and the only algorithm I had at the time was something called AID, the Automatic Interaction Detector. It was an old DOS program, but it made trees for me, and in fact it made one of the most successful segmentations I've ever done in my life.

This one here (I've disguised the variables a little bit) is a tree-based segmentation I did for a satellite TV company back in 1990. They had the idea that they could compete with cable companies by offering programming shot directly to people's homes, but they didn't know who their target market was. So we did a demographic segmentation. We collected all sorts of demographic stuff (well, to be fair, we also collected psychographic stuff, although it didn't end up being part of the segmentation), we used people's anticipated spend on satellite TV programming as our dependent variable, and then we just looked for which demographic or psychographic predictors best split the sample into groups. The sample split first on gender, then on age and income, and finally on education, so we could come up with some target segments. Over in the leftmost box we see our best target segment, the young men who were going to spend an average of $914 a year on their TV programming, versus the lowest segment, the medium- and low-income women who were only going to spend $201 on average, and there were several groups in between.

Not only did we define these groups demographically, so they were going to be very easy to target, but we also had a lot of psychographic information about them, so we could figure out what they wanted to spend this money on. Unfortunately, that young male segment really wanted to watch all of their sports, the ones that were televised and all the ones that weren't; they wanted to watch sports, drink beer, and not work, so they didn't really have money to pay for all the programming they were going to buy, but they wanted it anyway. And I say this was a successful segmentation because the company launched, they later became DirecTV, and for many years, if not still currently, they were the leading satellite TV provider. So I felt pretty good about this segmentation.
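A supervised, tree-based segmentation of this kind can be sketched in a few lines. Everything below is illustrative: the data are synthetic, the variable names and spend figures are made up for the example (not taken from the actual study), and scikit-learn's decision tree stands in for AID:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
n = 1000

# Hypothetical demographic predictors (dummy and ordinal codes)
X = np.column_stack([
    rng.integers(0, 2, n),   # gender: 1 = male
    rng.integers(0, 3, n),   # age band: 0 young .. 2 old
    rng.integers(0, 3, n),   # income band: 0 low .. 2 high
])

# Hypothetical dependent variable: anticipated annual spend, built so
# young men spend more (so the tree has structure to find)
spend = 300 + 250 * X[:, 0] + 150 * (X[:, 1] == 0) + rng.normal(0, 50, n)

# A shallow tree: each leaf is a candidate target segment
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=50)
tree.fit(X, spend)

# Leaf membership is the segment assignment; leaf means are segment spend
segments = tree.apply(X)
print(export_text(tree, feature_names=["male", "age_band", "income_band"]))
```

The printed tree shows the splits and the mean spend in each leaf, which is exactly the kind of output AID produced for the satellite TV study.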
In between the supervised and the unsupervised is something I do a lot, called predictive segmentation. We want our segments to differ on some variable X, maybe intent to purchase a new product, but we don't want to do a supervised segmentation. Oftentimes that's because we don't have the sample size you need to do a good one; I like to ask for at least a thousand respondents when I'm doing supervised segmentations, and sometimes I can't get that.

The way this works is we look at all the correlations between variable X and all of our potential basis variables, or alternatively I like to use random forests to predict X and find my potential basis variables based on the importance they have in the random forest. In either case, I take only the basis variables that are highly related to X. So I might have looked at 300 variables, but if I found 20 that are highly correlated with X, I'm going to use just those 20 in my segmentation, and guess what: my resulting segments are very likely to differ with respect to X. I've just found this to be a really useful way to do segmentation in support of new product development and targeting.

I think variable selection is the thing that analysts most commonly don't realize they have to do, and we've gone over the reasons for it: we don't want to stress our sample size, we don't want redundant variables, and we especially don't want masking variables in our data set. As the masking name suggests, redundant and masking variables tend to hide the data's structure from you, so you're likely to miss the real segmentation structure in your data because of all these noise variables. And the thing is, they don't just contribute random noise; they steer you in the wrong direction.
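The correlation-screening idea behind predictive segmentation can be sketched as follows. This is an illustration on synthetic data; the 0.3 cutoff and the use of scikit-learn's k-means are my own choices for the example, not a prescription from the talk:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n = 500

# Hypothetical target variable X, e.g. purchase intent
x = rng.normal(0, 1, n)

# 30 candidate basis variables: the first 5 are related to X, the rest are noise
related = np.column_stack([x + rng.normal(0, 0.7, n) for _ in range(5)])
noise = rng.normal(0, 1, (n, 25))
candidates = np.hstack([related, noise])

# Keep only candidates whose correlation with X clears the cutoff
cors = np.array([abs(np.corrcoef(x, candidates[:, j])[0, 1])
                 for j in range(candidates.shape[1])])
keep = np.where(cors > 0.3)[0]

# Cluster on the screened variables only; the segments then tend to differ on X
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(candidates[:, keep])
seg_means = [x[km.labels_ == k].mean() for k in range(3)]
```

Because only X-related variables enter the clustering, the resulting segments differ on X even though X itself never supervised the algorithm.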
It's important to do this. And maybe all that coaching worked and the client provided us a nice short list of basis variables that doesn't include redundant and masking variables, but of course we don't know that, so I still think we need to do variable selection.

How do we do it? At the Sawtooth Software conference in 2021, Ewa Nowakowska and Joe Retzer talked about something called biclustering. I'd encourage you to get the conference proceedings and look at their paper, because I thought it was a really good one. I had never heard of biclustering; it's got some drawbacks, but it's also kind of a neat method, so take a look at that.

There are also automated variable selection methods we can use. Joseph White and I tested one called clustvarsel that did a really powerful job: we were able to show that getting those masking and redundant variables out of the analysis gave us a much better chance of finding the true cluster structure in the data, which was kind of cool. That was in our 2022 paper, Joseph White's and mine, in the Sawtooth Software conference proceedings, so check that one out; it's a really nice paper that Joseph put together.

Then, last year, because I oftentimes face kind of garbage-can sets of variables from my clients that are a mix of categorical and metric variables, and because clustvarsel had worked so well for us, I looked for a variable selection program that would handle mixed variables, and I found one in VarSelLCM that worked really, really well. Both of those methods are available in R, by the way.

And lastly, my colleague Joe Retzer recommended that I look into unsupervised random forests. I had never heard of an unsupervised random forest, and it didn't make intuitive sense to me, but when I looked into it, it looked pretty cool, and we've actually discovered that it works pretty well; next to VarSelLCM, it did the best job for mixed metric and categorical variables.

So what is an unsupervised random forest? We're going to combine two data sets. We take our actual data file and append a new variable that we code 1. Then we take an equivalent data file, but with each variable's values independently drawn from the same distribution that variable has in the actual data; drawing the variables independently breaks all the relations between the variables, all the structure in the data. On that second data file we slap the new variable on and code it 2. Now we just run a random forest to predict that new variable using all the other variables, and this quantifies the importance of each variable in terms of predicting the structure in the data.

You don't have to do all of these steps by hand; the program for doing unsupervised random forests in R does them for you. As its outputs it gives you a proximity matrix that you could take and use directly in clustering if you wanted to, but it also gives you a list of variable importances that you can use to do variable selection manually and then go on to do whatever clustering you want in whatever program you want. I've found that just using those variable importances can really improve your segmentations if you've got masking and redundant variables in there. Kind of cool.
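The unsupervised random forest recipe above is easy to sketch by hand. This is an illustration on synthetic data, not the R package's exact implementation; I use column permutation as the independent-draw step, which preserves each variable's marginal distribution while destroying the between-variable structure:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
n = 400

# Synthetic data: two structured (correlated) variables plus one noise variable
v1 = rng.normal(0, 1, n)
real = np.column_stack([v1, v1 + rng.normal(0, 0.3, n), rng.normal(0, 1, n)])

# Synthetic copy: each column permuted independently, which keeps each
# variable's distribution but breaks all relations between variables
fake = np.column_stack([rng.permutation(real[:, j]) for j in range(real.shape[1])])

# Class 1 = actual data, class 2 = structure-free copy
X = np.vstack([real, fake])
y = np.concatenate([np.ones(n), np.full(n, 2)])

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Variables that carry structure get high importance; pure noise gets low
importances = rf.feature_importances_
```

The third column relates to nothing, so permuting it changes nothing the forest can detect, and its importance comes out near zero; the two correlated columns are what distinguish real from permuted data, so they score high and would survive variable selection.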
My next bit of advice has to do with using robust methods. We know that different segmentation methods produce different solutions. You can pick the same method and run it on the same data, but with different random starting points, and get different solutions; sometimes, in some software packages, just changing the order of your respondents gives you different solutions. Because of all this instability, we really shouldn't use a quick one-shot routine like QUICK CLUSTER in SPSS or FASTCLUS in SAS. We should use methods that seek out convergent validity, and the methods that do that are Sawtooth Software's Convergent Cluster & Ensemble Analysis (CCEA) and model-based clustering, or latent class; those seem to work really well. There's also a kind of brute-force robust version of k-means in R that works pretty well, but not quite as well as CCEA or model-based clustering. So that's my advice there.

And because we talked a little bit about latent class analysis, I have this last little slide about it. Whereas cluster analysis looks for people who have similar scores on the basis variables and says that people who are similar with respect to their scores belong together, latent class analysis is actually a probabilistic statistical model that has statistical tests and native fit statistics you can use, whereas cluster analysis is just a mathematical model with some statistics bolted on. So latent class is really nice. It assumes that there are hidden classes, or segments, mixed up in your data in such a way that they produce the patterns of data you see, and what we need to do is unmix that data. Latent class analysis uses an EM algorithm to unmix the data by identifying a new variable, class membership, that explains the observed data pattern, and the quality of that assignment is measured by something called the Bayesian Information Criterion (BIC), which can help you identify the number of segments you have. So latent class analysis is kind of cool.
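The EM-plus-BIC workflow just described can be sketched with a Gaussian mixture model, the metric-data cousin of latent class. This is an illustration on synthetic data, with scikit-learn's GaussianMixture standing in for dedicated latent class software (which also handles categorical indicators):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)

# Synthetic data with three well-separated latent segments
centers = np.array([[0, 0], [4, 4], [0, 5]])
X = np.vstack([c + rng.normal(0, 0.6, (100, 2)) for c in centers])

# Fit mixtures with 1..6 classes; EM runs inside .fit()
bics = {}
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bics[k] = gm.bic(X)  # lower BIC is better

best_k = min(bics, key=bics.get)
```

BIC rewards fit but penalizes extra parameters, so it bottoms out at the true number of segments here, which is exactly how you'd use it to choose the number of classes in a latent class run.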
Before we turn it over to questions, and for me to close things out, I should mention Sawtooth Analytics as someone who can do segmentation for you. We do about a hundred consulting projects a year for clients, or a little more than that, I guess, and a lot of them are segmentations, so we're available to help if you need help with your next segmentation study. I'm keith@sawtoothsoftware.com, so feel free to drop me a line.

I also wanted to go through some references; there are a lot of them in the paper. A really good one is the book Market Segmentation Analysis: Understanding It, Doing It, and Making It Useful, and another nice thing about this book is that you can get it for free at that website; it's an open-access book you can download, and it's well worth reading if you do a lot of segmentation. With that, I think it's time for questions.

Thanks, Keith, great job. If the participants still have a little bit of time, I'm going to go through a couple of questions. We got a lot of questions, and we have Dean and Christina answering some of them, but I think some would be good to read out. Let me start with the first one, from Mike Bone, who was talking about correlated variables and the damage they can do by making it difficult to uncover the clusters. Since there will always be some level of correlation between two variables, what do you typically prefer as a standard threshold for deeming the correlation too high to include a basis variable: stricter, like 0.8 to 1, or looser, like 0.4 to 0.6?

You know, I don't know the answer to that, because I usually use those automated variable selection procedures that get rid of the correlated variables for me. What I know is that when Joseph and I did our tests with artificial data and known segment structure, what we considered correlated variables were in the range of 0.6 to 0.7, and even those were detrimental to cluster structure, and even those
were identified, pulled out, and removed by the automatic variable selection technique. It's an interesting question; this topic is so new to me that I haven't really had time. It would be interesting sometime to put together a data set to find where the problem starts: is it correlations of 0.6, like Joseph and I found? What about 0.5, or 0.4? At what point does it cause problems, and at what point do the variable selection techniques catch it? I think we don't know either of those answers yet, so there's a possible topic for someone to present at the next Sawtooth Software conference, if you want to look into that.

Okay, thank you, Keith. Great question, by the way; I hadn't even thought about that one. Let me move on to the next one, by Jordan Lieberman, who's asking if you could explain a little better the difference between masking and correlated variables.

Okay, so a masking variable is one that the segments don't differ on at all. Maybe I've got segments that are different with respect to five variables, but when I look at eye color, for instance, my segments have the same distribution of eye color. What's going to happen is that if I include eye color as one of my basis variables, the algorithm is going to try to make groups that are different with respect to eye color. It's going to be one of the bits of gravity operating in my segmentation method; the algorithm is going to try to pull segments apart in terms of eye color. But because it's a masking variable, it's not related to the real structure in your data, and so that effort the cluster algorithm goes through, trying to make segments differ on eye color even though it's irrelevant to the cluster structure, is going to give you a different and less correct cluster structure.

Correlated variables, on the other hand: maybe I've got a
variable like, I don't know, my propensity to buy a certain brand, and maybe that variable is an important basis variable for me. Other variables that are related to it, like desiring the properties that brand has, or being in a certain age group that likes that brand, might be correlated variables that you'd think are going to reinforce the structure in the data, but it turns out they don't. So that's the difference between correlated and masking.

Thanks, Keith. Here's a question that I think is a really good one; I think a lot of people probably have it in their heads. This is by an anonymous attendee: I'm confused about the name latent class, because when I use CBC, I talk about latent class logit models, but it seems that latent class analysis is different. What's the difference between latent class analysis and latent class logit models?

That's a really good question; in fact, I kind of thought this would come up, and I had a slide about it earlier. Latent class describes an entire family of models. There are unsupervised latent class models, and then there are different latent class models depending on whether you have metric data; the original latent class had categorical data, and some purists still say you should only use the words latent class when you're dealing with categorical data and call everything else mixture models, but I'm not quite such a stickler for that kind of terminology. So you've got these models that are just based on the distributions of variables, but we've also got latent class models like the latent class version of regression, which simultaneously finds segment membership and differences in regression coefficients. The basic, unsupervised latent class identifies segment membership
that produces differences in terms of your variables; latent class regression, differences in terms of regression coefficients; and latent class multinomial logit is going to produce segments that differ in terms of their logit coefficients, or their utilities, and you can see how that might be valuable. I've also seen latent class versions of analysis of variance and even latent class versions of perceptual mapping; back in the 90s in particular there was kind of a proliferation of latent class methods for doing all sorts of things. But the ones that are useful for segmentation are the basic latent class, or mixture model, that you can use with unsupervised data, and the latent class regression or latent class multinomial logit that we can use with conjoint or choice data.

Thanks. Two people asked this next question, so I'm going to pick this one: do you have a stance on the recommended number of basis variables?

Given the sample sizes I work with, and my clients often ask me this too, I tell them, boy, I'd really like to shoot for something between 10 and 20.
You know, if I'm only going to have two variables, why bother to do a segmentation study? You can run a crosstab for two variables. But above 20, I think we start running into that curse of dimensionality in a pretty bad way, so I really like to see us in the 10-to-20 variable range.

Okay. Alex Burns asked: this presentation focused on segmentation from surveys. Do you have a perspective on the best approaches to segmentation for non-survey data, for example customer purchase data with associated demographics, or would the approach be pretty much the same?

For the past ten years or so here at Sawtooth Software I've been working with survey data, but before that I did a lot of segmentations on client databases: a big database of people that might have attitudinal stuff in it and at least has behavioral and demographic stuff. The points I've made in this presentation apply equally to that data as to survey data; the algorithms don't care where the data comes from, so I think those success factors are the same.

Okay, let's address maybe one or two more, and then we're going to have to close; if you guys still have questions, please do reach out, send an email, and we're happy to answer them. Let me take this one from Ethan, because I think Ethan is talking about a typing tool. When choosing basis variables, we're also thinking that they can be used as allocation questions after the segmentation is done, in other words a typing tool. If I understand correctly, for latent class multinomial logit models that use choice data, that should be a barrier to adoption, since it's a lot of effort to redo the choice exercise every time you want to categorize people. What are your thoughts on that?

That's a good point, and it
is a lot of effort, although you typically don't need to ask the entire choice experiment. For example, if you've done your segmentation based on MaxDiff: at Sawtooth Software we have a MaxDiff typing tool. You put your segmentation data in, it uses something called a naive Bayes analysis, and it comes up with a small set, typically just four or five, of new MaxDiff questions that you can ask in future surveys without reproducing the entire MaxDiff experiment every time. You get it down to that short list of four or five questions, and you can then use it to classify people into your choice-based segments. It would be a little harder if you did this as choice-based conjoint, because our naive Bayes tool doesn't do that, but you could program a naive Bayes tool in R to do it. There have been a couple of presentations at our conferences, at least one, where people have talked about using naive Bayes and have run it in R, so you might check that out.

Okay, I appreciate it, thank you. We're going to stop here with the questions, and thank you very much for the answers. If you still have burning questions that haven't been addressed, please do reach out. At the same time, I'd like to thank Christina Miller and Dean Tindall, who were quietly answering questions here in the background; I appreciate that, and we had a lot of great questions.

A couple of points before I close. Again, Keith mentioned that he leads our analytics consulting group. Segmentation is very difficult, and I hope the participants here learned enough to feel a little more comfortable and confident doing segmentation. If you have a tough segmentation project come to you,
please do reach out to Keith and Keith's group, and they'll be more than happy to help you. Keith's email address is very simple, keith@sawtoothsoftware.com, and he and his team regularly partner up with other consultants to help do some heavy lifting, so do not hesitate to do that. Normally I get a lot of emails with this question, so let me address it right now: the recording of this webinar and copies of the presentation will be available 24 hours after the event, and we will email the link to participants.
