Print Heterogenous Conditional with airSlate SignNow

Get rid of paper and automate digital document management for greater efficiency and countless opportunities. Sign anything from home, quickly and professionally. Enjoy a better way of doing business with airSlate SignNow.

Award-winning eSignature solution

Send my document for signature

Get your document eSigned by multiple recipients.

Sign my own document

Add your eSignature to a document in a few clicks.

Do more online with a globally-trusted eSignature platform

Standout signing experience

You can make eSigning workflows user-friendly, fast, and efficient for your clients and employees. Get your documents signed in a matter of minutes.

Trusted reporting and analytics

Real-time accessibility along with instant notifications means you’ll never lose a thing. Check statistics and document progress via easy-to-understand reporting and dashboards.

Mobile eSigning in person and remotely

airSlate SignNow enables you to eSign on any device from any location, whether you are working remotely from home or in person at your workplace. Each eSigning experience is flexible and easy to customize.

Industry regulations and compliance

Your electronic signatures are legally valid. airSlate SignNow ensures top-level compliance with US and EU eSignature laws and meets industry-specific regulations.

Print heterogenous conditional, faster than ever before

airSlate SignNow delivers a print heterogenous conditional feature that helps enhance document workflows, get contracts signed immediately, and work seamlessly with PDFs.

Helpful eSignature add-ons

Take advantage of easy-to-install airSlate SignNow add-ons for Google Docs, the Chrome browser, Gmail, and more. Try airSlate SignNow’s legally-binding eSignature capabilities in a single click.

See airSlate SignNow eSignatures in action

Create secure and intuitive eSignature workflows on any device, track the status of documents right in your account, build online fillable forms – all within a single solution.

Try airSlate SignNow with a sample document

Complete a sample document online. Experience airSlate SignNow's intuitive interface and easy-to-use tools in action. Open a sample document to add a signature, date, text, upload attachments, and test other useful functionality.

  • Checkboxes and radio buttons
  • Request an attachment
  • Set up data validation

airSlate SignNow solutions for better efficiency

Keep contracts protected
Enhance your document security and keep contracts safe from unauthorized access with two-factor authentication options. Ask your recipients to prove their identity before opening a contract to print heterogenous conditional.
Stay mobile while eSigning
Install the airSlate SignNow app on your iOS or Android device and close deals from anywhere, 24/7. Work with forms and contracts even offline and print heterogenous conditional later when your internet connection is restored.
Integrate eSignatures into your business apps
Incorporate airSlate SignNow into your business applications to quickly print heterogenous conditional without switching between windows and tabs. Benefit from airSlate SignNow integrations to save time and effort while eSigning forms in just a few clicks.
Generate fillable forms with smart fields
Update any document with fillable fields, make them required or optional, or add conditions for them to appear. Make sure signers complete your form correctly by assigning roles to fields.
Close deals and get paid promptly
Collect documents from clients and partners in minutes instead of weeks. Ask your signers to print heterogenous conditional and add a payment request field to your document to automatically collect payments during contract signing.
  • Collect signatures 24x faster
  • Reduce costs by $30 per document
  • Save up to 40 hours per employee per month

Our user reviews speak for themselves

Kodi-Marie Evans
Director of NetSuite Operations at Xerox
airSlate SignNow provides us with the flexibility needed to get the right signatures on the right documents, in the right formats, based on our integration with NetSuite.
Samantha Jo
Enterprise Client Partner at Yelp
airSlate SignNow has made life easier for me. It has been huge to have the ability to sign contracts on-the-go! It is now less stressful to get things done efficiently and promptly.
Megan Bond
Digital marketing management at Electrolux
This software has added to our business value. I have got rid of the repetitive tasks. I am capable of creating the mobile native web forms. Now I can easily make payment contracts through a fair channel and their management is very easy.

Why choose airSlate SignNow

  • Free 7-day trial. Choose the plan you need and try it risk-free.
  • Honest pricing for full-featured plans. airSlate SignNow offers subscription plans with no overages or hidden fees at renewal.
  • Enterprise-grade security. airSlate SignNow helps you comply with global security standards.

Your step-by-step guide — print heterogenous conditional

Access helpful tips and quick steps covering a variety of airSlate SignNow’s most popular features.

Using airSlate SignNow’s eSignature, any business can speed up signature workflows and eSign in real time, delivering a better experience to customers and employees. Print heterogenous conditional in a few simple steps. Our mobile-first apps make working on the go possible, even while offline! Sign documents from anywhere in the world and close deals faster.

Follow the step-by-step guide to print heterogenous conditional:

  1. Log in to your airSlate SignNow account.
  2. Locate your document in your folders or upload a new one.
  3. Open the document and make edits using the Tools menu.
  4. Drag & drop fillable fields, add text and sign it.
  5. Add multiple signers using their emails and set the signing order.
  6. Specify which recipients will get an executed copy.
  7. Use Advanced Options to limit access to the record and set an expiration date.
  8. Click Save and Close when completed.

In addition, there are more advanced features available to print heterogenous conditional. Add users to your shared workspace, view teams, and track collaboration. Millions of users across the US and Europe agree that a solution that brings everything together in a single holistic environment is what enterprises need to keep workflows running smoothly. The airSlate SignNow REST API allows you to integrate eSignatures into your application, website, CRM, or cloud. Try out airSlate SignNow and get faster, smoother, and overall more efficient eSignature workflows!

How it works

Open & edit your documents online
Create legally-binding eSignatures
Store and share documents securely

airSlate SignNow features that users love

Speed up your paper-based processes with an easy-to-use eSignature solution.

Edit PDFs online
Generate templates of your most used documents for signing and completion.
Create a signing link
Share a document via a link without the need to add recipient emails.
Assign roles to signers
Organize complex signing workflows by adding multiple signers and assigning roles.
Create a document template
Create teams to collaborate on documents and templates in real time.
Add Signature fields
Get accurate signatures exactly where you need them using signature fields.
Archive documents in bulk
Save time by archiving multiple documents at once.
Be ready to get more

Get legally-binding signatures now!

What active users are saying — print heterogenous conditional

Get access to airSlate SignNow’s reviews, our customers’ advice, and their stories. Hear from real users and what they say about features for generating and signing docs.

This service is really great! It has helped...
5
anonymous

This service is really great! It has helped us enormously by ensuring we are fully covered in our agreements. We are on a 100% for collecting on our jobs, from a previous 60-70%. I recommend this to everyone.

Read full review
I've been using airSlate SignNow for years (since it...
5
Susan S

I've been using airSlate SignNow for years (since it was CudaSign). I started using airSlate SignNow for real estate as it was easier for my clients to use. I now use it in my business for employment and onboarding docs.

Read full review
Everything has been great, really easy to incorporate...
5
Liam R

Everything has been great, really easy to incorporate into my business. And the clients who have used your software so far have said it is very easy to complete the necessary signatures.

Read full review

Related searches to print heterogenous conditional with airSlate SignNow

heterogeneous treatment effects
causalml
"heterogeneous treatment effects" python
econml
average treatment effect python
causal forest python
causal tree python
double machine learning python

Print heterogenous conditional

um great to see everyone and great to see everyone scattered all over the world so thanks to the people who are um attending especially at odd hours um so in in some ways it's i find that it's been getting harder and harder lately to give these talks on sort of machine learning and causal inference um relative to when i first started working on the problem and part of that is you know when i first started working on it there there wasn't that much work um that related you know theories of estimation and theories of identification with the recent advances by machine learning um more recently there's actually an explosion of work and so it's actually hard for me or anyone to even keep up nonetheless today i'm going to sort of go back to i'm going to sort of start a little closer to the beginning and talk about the first work that i've done in this area but i just want to highlight that the area is moving very quickly and there are a lot of of kind of frontier questions that i won't even be able to get close to and also i want to highlight something that i've been emphasizing in a lot of my kind of keynotes recently for machine learning audiences that there's more than one type of causal inference in economics and more than one kind of approach to causal inference in economics so today i'll be talking to the one that i found the easiest to connect to straight supervised machine learning but i just want to highlight for you there is a wider world of causal modeling and economics for example some of my other research is working on trying to estimate price sensitivity for consumers when they're making choices when shopping in the supermarket and use those models to optimize a firm's pricing decision i have also done work on auctions where i've used auction data to try to estimate from consumer behavior i mean from bitter behavior their preferences and then use those estimates to try to optimize market design so the using of the using models sort of for design and optimization is is a is another big branch of economics and also operations research and some level we're we're sort of moving both closer to that in our my own research now but also um i think from a research perspective there's a actually like a wide range of generative models and and so i think one thing that all that shares in common though is that we think about how the data is generated we think about whether it's even possible to learn what's needed to do a counterfactual um and and then we but then where we things start to get more rich and complicated as well what how do we estimate these models how do we translate a problem where in principle we could figure out what we need to know from the data to a strategy an empirical strategy for actually um carrying that out so that's kind of a just high level advertisement that there's a there's a wide world out there but i'm i'm going to start today sort of closer to i think what you've been um talking about and where a lot of the work has been to date and off to the races does that look good now yeah it looks so good perfect all right i really did restart like half an hour ago so sometimes these things happen okay so i'm going to talk about heterogeneous treatment effects and so heterogeneous treatment effects come up when you're trying to evaluate the impact of a treatment the applications can be things like evaluating a b tests and actually a number of tech firms uh which is always very exciting for me have told me how they've uh implemented some of my methods in their a b testing 
platform and it's especially exciting because that was actually one of my original motivations i used to be chief economist at microsoft and i worked on the bing platform right when they were first trying to figure out how to even um set up baby testing for their advertising platform and so one of the things that i noticed was that they would make a lot of decisions about whether to ship an algorithm based on the overall aggregate performance of the algorithm and actually the people who guarded the a b testing platform would say oh we we don't want people to be able to data mine our a b tests and the reason they didn't want to let them do that is that if you think about an engineer they get a bonus if something ships and if if they if it looks like their experiment didn't pass the a b testing it would be very tempting to go out and search through all the different subgroups and the data until you found one where it worked well and then say hey guys look it works well over here so please release my new algorithm you know just on these people over here and they were worried about this sort of x post mining of the data which would find something spurious that wouldn't replicate if you ran the experiment again um so generally people wanted to i think people recognize that it could be valuable to understand heterogeneous treatment effects but they felt like it was important to guard against that kind of snooping and that that same kind of concern also comes up in a drug trial so suppose that you know you really wanted your drug to be approved um it the average treatment effect of the drug in the trial looks like zero um and you look around in the data and you keep finding looking at all the different characteristics of people and finally say oh you know for women between age 50 and 55 who have these three pre-existing conditions it looks like my drug works great because you peeked at the data to try to find that group your results are unlikely to replicate so generally that's why in experimental design both in companies and in you know medicine you have to register your your experiment in advance and if you plan to test any hypotheses you want to state those in advance um when we haven't done that in fields like psychology and parts of economics and so on you have these replication crises where people post results from experiments that can't be replicated um so the so the the goal is to try to figure out how to estimate heterogeneous treatment effects without coming up with spurious results and finding results that can replicate and if it's something like the fda or even a you know a rigorous a b testing platform for a firm you want to make sure that your results are reliable and you want to make statements about uncertainty as well so just to motivate you know the confidence intervals you might think oh gosh tech firms have lots of data you know so why would you care about expressing uncertainty but in fact most tech firms are currently optimizing and so if you if you walk in as an intern and say hey i got this great new idea to make google better you know the chances are that you can't or the effect will be very low um and especially if you're trying to learn you know how does some new treatment work for some people rather than others actually the effect sizes are often small relative um to the noise so even for google even for bing you still have trouble um finding signal um when for for uh for new treatment effects and especially when there's noise and so it's important that you have a way 
to express whether the result you found is just spurious and sampling variation or whether it's it's a real effect um and so the challenge the challenges here are twofold then i want to have a use modern machine learning if possible to find heterogeneous treatment effects but i want to do it in a way that's replicable and reliable and allows us to express uncertainty other motivations for doing heterogeneous treatment effects so just broadly how would you use those results one is to systematically identify subpopulations um to say oh these people should get the drug and these people should not get the drug or i want to release my algorithm for this group of people you might also want to understand the mechanisms so if if somebody an engineer releases a new algorithm i want to understand well how does that work and and knowing for whom it works gives you a lot of intuition about how it works and especially if you're an environment for doing ongoing innovation that can be helpful similarly if you're a behavioral scientist and you know i'm working on an experiment now around fake news we're going to give interventions to see try to get people to stop sharing fake news if i want to know why it works it's helpful to know for whom does it work so if i know that it works well for highly educated older people that already helps me narrow down the set of mechanisms by which the treatment works and that can also help me then design the next experiment on the other hand if it doesn't work for a certain group then maybe i need to go back to the drawing board and come up with a different kind of treatment so it's it's important both for for i would call it understanding mechanisms is really scientific understanding and that's where we we really care about the heterogeneity itself and then the second goal is well i just want to make a decision and i want to decide who gets treated so this idea of making a decision is what we call like optimal policies so um and and so the optimal policies are different the objective of optimizing an optimal policy is different from heterogeneous treatment effects um in some core ways so just as an example you know i might have a a drug that works better for the control than the control for say everybody over 60. 
it might work even better for older people than it does for people in the six so it might work even better for the 70 plus people than the 60 to 70 people but if it works well for both my policy is they should both get the treatment so i don't really care to distinguish how well it works i just want to know that it works on the other hand if i'm trying to come up with mechanisms or figure out you know where should i how should i do new drug discovery or where should i invest further in the future in innovation then it really matters that it who it works amazingly well for and who it works just a little bit well for so these are really different scientific objectives different statistical objectives as a result even within the optimal policy space there are different kinds of questions you can ask so i might in some settings need to have a fairly simple treatment assignment rule that's customized to sub-populations even in the tech firm case there's a there's a cost in terms of milliseconds of response time as well as coding and maintaining code if i if a person comes in and i have to actually call an algorithm and have a function call that evaluates something and then returns the result for that individual it's actually going to be much faster if i just have a lookup table that says okay is this person in this segment i can just write an if a case statement that says if they're if if a then do this if b then do this if c then do this and if it's a simple case statement it'll evaluate more quickly be easier to maintain also easier to evaluate and monitor over time across those segments if i'm trying to design define a treatment assignment rule for doctors or polices they might need to memorize things like here's the situation where you do x or an ambulance technician might need to memorize a set of rules um we might also need to post on a website you know here are the eligibility criteria and i might need to display a tree that says well if you have this you know you know income or this former test score you would be eligible so i might need to come up with a simple rule in other cases i might be able to do something that's customized to an individual so if i'm doing sort of computer-aided decision making where uh i'm i'm gonna be able to you know put all the data in and get a recommendation back out um i could have a very complex decision rule one that's that's difficult to describe so if i'm in a doctor's office and i've got an algorithm that looks at all of your information and then comes back and says here's the recommended drug the doctor doesn't need to understand why it works any more than they need to understand why laboratory tests work they just need to know you know here's the recommendation and can i trust the overall system is is the overall system effective so in different settings you might have different you know values and constraints that would lead you to how to estimate a simpler or more rich assignment rule so big picture from the slide is something i think it can be really confusing when you try to read the literature um with papers one by one and there's this whole bucket of like heterogeneous treatment effects you don't really know why you're doing what you're doing so i just want you to take from this that actually you should start with your goal and then choose the appropriate method and question for your goal once you understand that you also should should ask yourself is the method i'm using optimized for my goal and the break great thing about machine learning 
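The contrast drawn above between a simple segment-level assignment rule (a lookup table or case statement) and a fully personalized rule that calls a model at decision time can be made concrete in a few lines of code. The sketch below is purely illustrative: the segment names, thresholds, and the cate_model/predict interface are hypothetical and not from the talk.

```python
# Illustrative contrast between a simple, auditable segment rule and a
# personalized rule that calls a fitted CATE model at decision time.
# All thresholds and names are hypothetical.

def assign_by_segment(age: float, prior_spend: float) -> bool:
    """Simple case-statement policy: easy to post, memorize, and monitor."""
    if age >= 65:
        return True          # segment A: always treat
    elif prior_spend > 100:
        return True          # segment B: treat high spenders
    else:
        return False         # everyone else: control

def assign_by_model(features, cate_model, cost: float = 0.0) -> bool:
    """Personalized policy: treat whenever predicted benefit exceeds cost.

    `cate_model` stands in for any fitted estimator with a predict() method
    returning an estimated conditional average treatment effect.
    """
    tau_hat = cate_model.predict([features])[0]
    return bool(tau_hat > cost)
```

The segment rule trades away some personalization for speed, transparency, and ease of monitoring across segments, which is exactly the trade-off described above.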
is that you can specify your objective functions and optimize for them um and so you know we you want to make sure that you've you've thought through exactly how those line up to get the best results so you know you've i'm sure you've all coming from machine learning you've studied one of the simplest machine learning models regression trees where you partition the data so this is an example x1 x2 in a traditional regression tree what you would do is try to separate the data according to the um to make the outcomes similar within a box so you know we if you want to find rectangles where everybody has a success or rectangles where everyone has a failure and this is an example of supervised machine learning where the objective is just to optimize goodness of fit one of the things about when you optimize this you would do something like cross validation to determine the depth of the tree you have an objective for that cross validation and you know and so when you do that cross validation you're trying to optimize goodness of fit you can hold out either a test set or a fold that's used for cross validation use estimate a model on part of the data predict on the other part of the data and see how well you did that's a core tenant of of supervised machine learning one of the first challenges we faced in trying to to bring supervised machine learning techniques into causal imprints is that for causal inference problems you don't actually observe um the ground truth in the test set so we're not all walking around stamped on our head what's my treatment effect of the drug and if you put me in a test set you wouldn't know for me individually what's the treatment effect uh for me and so if you predicted that i had a high treatment effect in the test set you wouldn't be able to evaluate whether i did or didn't so we had to figure out ways to overcome that and the tldr on that is that in actually for different settings we've come up with different objectives that can be estimated in a test set but in general you're going to have to do some averaging in the test set in order to assess the the quality of your predictions um it's it's difficult to do it sort of one at the time so the first paper that we worked on here and this is it's kind of funny because when i started writing this paper i didn't actually know nearly as much about either about machine learning now and there was a lot of the theory that hadn't been worked out but the funny thing is this is still one of the most popular algorithms that we've come up with it's very very simple um and and it's one of the things that tech firms implement uh but it was also again inspired by what seemed to make the most sense for this category of applications so the idea of causal trees is to to do regression trees but where your objective is to optimize for heterogeneous treatment effects and just to highlight why heterogeneous treatment effects is a different objective than supervised learning there are many many situations where the factors that would predict outcomes are different than the factors that predict treatment affect heterogeneity so for example you know your parents income and education is going to be highly predictive of your educational achievement and in a big university you know richer people with more educated parents might be more likely to do well um you know in a course but the factors that make say a flipped classroom better than an in-person classroom might be different than parents income and education it might have to do with the 
way that you process information and so if i want to think about the treatment effect of having an in-person class versus a flipped class it might not be the same factors that produce that that relate to your the level of performance and so if i train to supervise learning model just trying to predict your outcomes it would load up on the factors that predict outcomes well it might be a completely different set of covariates about your learning style that um affect you know whether you do better or worse with being able to view videos at your own pace versus have to go with the pace of that annoying instructor and deal with their uh technical problems um so we we uh so it's a different objective we wanna and in a case where we have you know limited power it matters how you split your trees if you had tons of data and you know you would you wouldn't really the objective wouldn't matter much you can split on both characteristics and eventually you split on everything but if you have more limited power these trade-offs make a difference so we would argue that our goal for heterogeneous treatment effects would be to create subgroups that have that are say one group that has high treatment effects and other groups that have low treatment effects and so my goal might be the mean squared error of treatment effects i might take the expectation over a test sample of the squared loss which would be the difference between an individual's treatment effect and their predicted treatment effect now the problem with that is as i said this tau i is unobserved so we don't unlike if this was if this was a y if this was an outcome this would just be something we could read off the data but instead tau i is unobserved it's this isn't an infeasible criteria um and so in in all of these different papers a lot of what we boil down to is well what's what's a good operational criteria we can use in place of this infeasible criteria for the case of of trees the problem is actually relatively simple because in the end the this tau hat is going to be constant within each leaf because it's constant within each leaf then this squared loss will basically just have you know three terms one is the tau i squared but that doesn't that's just a constant it doesn't it's just you know it for any different estimator tau hat the tau i squared would be the same in each case so i don't actually need to estimate tau i squared and then the second term would be a tau i times tau hat like this is the minus 2 times tau hat times tau i but since tau hat is actually going to be constant within a leaf then in in in the end to estimate this objective i'll just need to know something about the average value of tau i within a leaf because it's going to be multiplied by something that's constant within a leaf so that's actually a fairly simple problem and so we can do a pretty good job estimating this objective for any given tree now in cases where that's not the case where our tau hat isn't constant over a group there are other ways to go about this and in various papers we we propose different objectives and then um my co-author uh stefan wagger has a separate paper with something which he calls the our learner objective um so so they're they're not having that be constant is not a deal breaker it's just it makes it particularly simple in this case which is one reason in a longer class i like to start with this because you can actually just directly derive how to estimate this objective um so a second thing that that we wanted to think about 
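A compact way to write out the algebra sketched above (a reconstruction in standard notation, not a quote from the talk) is the following. The individual effect tau_i is never observed, but because the estimator is constant within each leaf, only leaf-level averages of tau_i are needed, and those are estimable.

```latex
% Infeasible criterion: \tau_i is never observed for any unit
\mathrm{MSE}(\hat\tau)
  = \mathbb{E}\big[(\tau_i - \hat\tau(X_i))^2\big]
  = \underbrace{\mathbb{E}[\tau_i^2]}_{\text{same for every }\hat\tau}
    \;-\; 2\,\mathbb{E}\big[\tau_i\,\hat\tau(X_i)\big]
    \;+\; \mathbb{E}\big[\hat\tau(X_i)^2\big].

% Because \hat\tau is constant (= \hat\tau_\ell) within each leaf \ell,
% the cross term only involves leaf-level averages of \tau_i:
\mathbb{E}\big[\tau_i\,\hat\tau(X_i)\big]
  = \sum_{\ell} \Pr(X_i \in \ell)\,\hat\tau_\ell\,
    \mathbb{E}\big[\tau_i \mid X_i \in \ell\big],

% and \mathbb{E}[\tau_i \mid X_i \in \ell] is just the leaf's average
% treatment effect, estimable by a treated-vs-control difference in means
% in a randomized experiment. Plugging that estimate in for \hat\tau_\ell,
% minimizing the feasible part of the MSE amounts to maximizing
\sum_{\ell} \Pr(X_i \in \ell)\,\hat\tau_\ell^{\,2}
  = \mathbb{E}\big[\hat\tau(X_i)^2\big].
```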
in this paper though is that we worry about you know data stooping and data mining like that we're going to look at our data and find spurious results and so our solution to that and this is something where my first idea i thought there must be 10 better ways to do this but it turned out the first idea actually may be relatively the best idea it's actually hard to improve on this is some kind of data splitting and so basically what we want to do is split the sample and find the heterogeneity on half the sample and then estimate the heterogeneous treatment effects on the other part of the sample and what that's going to do is it's going to avoid this case where we might overfit to an outlier so in principle like if brady was had like a very high outcome then if he has a high outcome it's also going to look like he has a high treatment effect and so i might want to find all the characteristics about him like you know people you know men of an age with glasses and i find enough characteristics about him and then i find some other people like him treated in control who have average outcomes and then that group is going to look like it ha if he's treated the group is going to look like it has high treatment effect and if he's control it's going to look like it has um you know a negative treatment effect because if i just have one guy or a couple guys with high outcomes um i can i'll build a group around them and i'll find the things they have in common and the way look at this i could i the this machine learning algorithm can find the four or five people with the highest outcomes if there's four or five treated people with high outcomes i'll build a group around them i can find something they have in common and then it's going to look like there are high treatment effects for that group but if i went to another sample i wouldn't replicate that and so that's what i'm concerned about and so the data splitting um helps with that the cost is that it means that you're going to build a shallower tree and have less personalized predictions but the benefit is that you'll have valid confidence intervals for your results with coverage rates that do not deteriorate as the data generating process gets more complex or is more covariates are added and that's basically because you'll do cross-validation you'll build a tree that is not too deep and then as long as you've defined groups if you've defined those groups in one part of the data i can test whatever hypothesis i want about those groups in the other part of the data and as long as there's you know 30 observations in the treated and control group in each leaf you know the law of large numbers kicks in and i can just do standard asymptotics um and so one reason i think that this is appealing to both a b testing people and to social scientists and others and medi medical people biostatisticians is that they don't have to make any theoretical arguments it's just completely clear that you can test hypotheses in a valid way from this so there's no arguing with your referees or your reviewers you know it's just it's just obvious that it works um so um and again it also it doesn't require any assumptions on the data generating process other than like if that it's a randomized experiment or we'll come back to unconfoundedness and it doesn't require any asymptotic theory other than having enough observations in each leaf so let me just give an example of this that shows the importance of data splitting this is an experiment i ran on the bing search engine where 
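As a rough illustration of the sample-splitting ("honest") idea for a randomized experiment, the sketch below builds a tree on one half of the data and then re-estimates the treatment effect in each leaf on the other half. It is not the causal-tree splitting criterion from the paper; for brevity it grows an ordinary regression tree on an inverse-propensity "transformed outcome" as a stand-in. Package choices (numpy, scikit-learn) and all tuning values are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def honest_leaf_effects(X, y, t, p=0.5, max_leaf_nodes=8, seed=0):
    """Honest-estimation sketch for a randomized experiment with P(T=1) = p.

    Half A is used only to choose the partition (the tree structure);
    half B is used only to estimate a treatment effect inside each leaf.
    """
    XA, XB, yA, yB, tA, tB = train_test_split(
        X, y, t, test_size=0.5, random_state=seed)

    # Transformed outcome: under randomization its mean equals the
    # treatment effect, so a regression tree on it targets heterogeneity.
    y_star = yA * tA / p - yA * (1 - tA) / (1 - p)
    tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes,
                                 min_samples_leaf=50,
                                 random_state=seed).fit(XA, y_star)

    # Honest step: difference in means and its standard error per leaf,
    # computed only on half B, which played no role in choosing the leaves.
    leaves = tree.apply(XB)
    results = {}
    for leaf in np.unique(leaves):
        m = leaves == leaf
        y1, y0 = yB[m & (tB == 1)], yB[m & (tB == 0)]
        if len(y1) < 2 or len(y0) < 2:
            continue  # too few treated or control units to estimate
        tau = y1.mean() - y0.mean()
        se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
        results[int(leaf)] = (tau, se)
    return tree, results
```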
in in the context of an any trust investigation against google we re-ranked search results to try to see the effect of manipulation so in this experiment we took the first result in the third result and flipped them so the control group had the results ranked in the normal order the treatment group the first and third result were flipped and what we're looking at is the treatment effect on that first result in terms of clicks so how many fewer clicks does the result get when i take it to the third position and then the characteristics are the characteristics of the search query so is the query about celebrities does it look like it's an image is it a navigational query does it have something to do with wikipedia references and so on and so the average treatment effect was about um minus 0.125 which would say that you know that's um on a scale of zero to one it's saying that you're you're losing about a 12 percent of your of your 12 percentage points on the click-through rate which is about half your clicks so the on average from that manipulation but we find that in some boxes the average treatment effect is almost close to zero like image celebrity queries manipulating the results didn't matter and that's basically because like if you're trying to look for a picture of britney spears then you know you click on the picture of britney spears and it doesn't really matter you know what the other results are about while things that were classified to be high on wikipedia reference had larger than average treatment effects and we see the proportion of of the of the observations in each box and the standard errors and you can see how a tree like this would help you kind of gain insight about where manipulation might be important or not and this is the rest of the tree now what i want to show here and actually do i hear it's easier to show it in a picture is that what i've done here is i've said for there's a bunch of leaves in the trees so each of these pairs of bars it corresponds to a leaf in the tree and i've ordered them in order of the magnitude of the estimated treatment effect in the half of the sample i used to create the subgroups so if i use the same dda to create the subgroups and then do the estimation i would get the blue results and these are expressed in terms of deviation from the average treatment effect so this first blue bar would say that you know my i've got a leaf where where the deviation of the average treatment effect is minus 0.15 remember that the average treatment effect was about 0.125 so that's a very large deviation from the average um on the other hand the orange bars are closer to zero and that's showing that when i take those leaves that were created in one part of the sample and i go to the other part of the sample and i estimate average treatment effects for each of those leaves i get more moderate results i get results that are closer to the average and that's what you expect if you're over fitting to the original training data now one reason i use this example is this experiment had about 800 000 people queries in it and about 20 leaves so it would seem like that these overfitting considerations wouldn't be there but there's also a lot of covariates and the machine learning algorithm has lots and lots of ways to create different um partitions and so it's basically saying even with very large data sets you still have to worry about this problem so it's one reason that some form of sample splitting and i'll come to crossfitting later is something to really 
think hard about if you're ever trying to do inference if you're trying to if you wanted to say hey you know this particular leaf is different than the average and i think we should do something about it and it's not spurious then sample splitting is going to be a much more credible way to um to to illustrate that okay so let me pause there any questions on that and i think i would just say that you know this is not the kind of thing that you think about in machine learning a lot because a lot of times you're not testing hypotheses so this this idea that you would throw half the data away sounds kind of crazy like we've been doing you know regression trees for 30 years we haven't been throwing away half the data before but here now i'm asking about i want to i want to get unbiased estimates of something and test hypotheses and that's why that's where this would matter yeah did you have a question brady or did someone have a question uh well i don't see any in the chat yet but i do have a question it was just about you mentioned that there's no arguing with your reviewers in one bit it kind of fixes having to argue with your viewers i think it was around large numbers or central limit theorem or something what was that part sure um so basically you know in the results i'm going to show you next um about causal forest then you know we have a theorem that says as n goes to infinity you know under these conditions on the data generating function you can get a personalized estimate and put a confidence interval around them but if you're trying to do non-parametric estimation so i'm trying to get an unbiased estimate for you brady you know with glasses and you know a man of a certain height and everything else about you um the fact is that i'm probably not gonna have a lot of data about people just like you you're unique you're a special flower and i can't find you know a lot a large group of people just like you um but if i if i think about um you know a uh you know so even though asymptotically i can get a good a good confidence interval for you the coverage might be bad you might need a lot of data to get there what the in this in this causal tree kind of setting we're going to take half the data to construct the subgroups but then on the other half of the data if i have 10 subgroups i just have 10 numbers i'm estimating i'm not actually trying anymore to estimate the full tau of x surface and so to estimate those 10 numbers i don't need that much data and in fact if i say i want if one of my subgroups is min over 65 if i if if god gave you that group and said here's a data set with you know 10 000 people tell me the treatment and i have a randomized experiment tell me the treatment effect for the men over 65 all i need to have is like 30 men in the treatment group and 30 men in the control group in that in that segment and i can just take the simple averages and so the statistics around you know when can i test the hypothesis using a difference of means are very simple like i just need the law of large numbers to kick in for two sample means and the normal distributions so it's completely transparent and the only complicated part was that we used a data-driven subgroup but because we chose that subgroup in a different data set i there's no there's nothing different um in in how i apply that in the second separate data set sounds good and there are two there are two uh quick questions i think they're fast questions one is can you uh use trees to do individual treatment effect estimation and 
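The "no arguing with reviewers" point above boils down to a textbook two-sample comparison once the subgroup has been fixed using the other half of the data. A minimal sketch (the array names are placeholders, and the 1.96 critical value assumes the large-sample normal approximation the speaker mentions):

```python
import numpy as np

def subgroup_ate(y, t, in_group):
    """Treatment effect, standard error, and approximate 95% CI for one
    pre-specified subgroup of a randomized experiment (difference in means)."""
    y1 = y[in_group & (t == 1)]
    y0 = y[in_group & (t == 0)]
    ate = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    z = 1.96  # large-sample normal critical value for a 95% interval
    return ate, se, (ate - z * se, ate + z * se)
```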
then yeah i'm going to talk about that next and yes you are trading off um because when you when when you do this sample splitting you're basically you know you're getting a smaller sample and you're you're you're not going to be but you're you're not going to be able you're going to you lose mean squared error because you're going to be using less data you won't be able to have an expressive a model um so losing half the data hurts both bias and variance from that perspective but it allows you to get consistent estimates with a good with a good confidence interval so honest estimation we split the data we take half the data to select the model like if we were running a lasso regression it would be like using half the data to um you know to to cross validate and then figure out what variables go in the regression and then taking the other half of the data and just estimating a simple regression with the selected coefficients in this case we're using half the data to build a tree and cross validate the tree and then once you know the leaves you go to the other half of the data and you estimate the effects in that in those leaves and so in in a randomized experiment within each leaf you're just going to estimate a treatment effect simple averages treated versus control i'm not going to talk now about how you would do it in an unconfounded experiment but we'll come back to that that's and honestly though if you have unconfoundedness the trees are a little bit less elegant because it can be it can be um harder to do it well um okay how do you do with highly unbalanced data um well in our s that's one of the reasons it was helpful to write dedicated software because we do want the splitting to take into account that you need enough treated and control observation in each leaf and so our splitting has two ways to do it first of all we just force there to be enough treated and controlling each leaf but second we also have a criteria part of the criteria that we use for optimization incorporate the fact that if you start having very few treated in a leaf you're going to have a very high variance estimate and that's going to be bad for the performance for our criteria function so our criteria function will will penalize a leaf that doesn't give you a good um so you have to look at the paper for more details about that okay so let me move on though to doing an individual um a more personalized treatment effect and again i actually have to be careful with the language here it's still a conditional average treatment effect conditional on your x's um but now i want to do it conditional on a single x rather than on a subgroup so um when we go from trees to random forests well for a predictive model the forest idea is just you build a bunch of trees and you average up the predictions you average them up by either bagging or subsampling the training bagging is sampling with replacement and sub sampling is is just taking a subsample of the data it turns out that from a predictive perspective it doesn't really matter that much whether you subsample or choose a bootstrap sample um but from a from a theoretical perspective sub sampling is going to be much better why is it much better well it's going to be much better because as long as i'm sub-sampling i might as well sub-sample twice so i might as well sub-sample one sample to build a tree and subsample a second sample that doesn't overlap to estimate the treatment effects or the make the predictions within the leaf and so we actually show that even for a 
predictive model that sub sampling and doing honest sub-sampling so taking one section to just to determine the structure of the tree and a second sub-sample to do predictions can perform better in a variety of contexts not always because there's some context where you're really sort of short on data and you really want to use all of your data to build a very expressive tree like if there's a very strong signal then you're not going to make any mistakes and you just want to be expressive as possible but in an environment where there's more noise and you're more likely to make mistakes and be prone to sort of overfitting considerations then this honesty can work a lot better it also can work better on the boundaries it has a variety of properties so causal forest or what we propose in this jazza paper we use subsampling to create alternative trees um plus a lower about bound on the probability each feature is sampled um we the causal tree has splitting based on treatment effects and we estimate treatment effects within the leaves and it's going to be honest because we're going to have two subsamples one for tree construction and one for estimating the treatment effect at each leaf and then we're going to output predictions for tau of x so um since the accuracy of the treatment effects again can't be directly verified during a test set um the statistical inference is going to be crucial for that end we're trying to achieve asymptotic normality so we would like our estimates to converge the truth and we'd also like to be able to do confidence intervals and so those are the things that we work on in the paper i should say that actually it turned out to be easier to show asymptotic normality than it is to show that you're centered on the truth um and what was one of the things that was important about our paper is that we could do both we could show that you know tau of x converges to tau and or asymptotically normal just so i'm not going to give you all the theory of that today it's i think it's hard to do that in an overview lecture so you'll need to dig into that i should mention that i have much longer tutorials um on my that are available i have longer videos longer tutorials longer slide sets so if you want to get into those details um one of the easiest ways to get to it if you just google susan athey machine learning aea that's for american economic association you can find a link to a drive that has lots of that material and i'm working on a website i know it should just be on my website but i go to i work in a business school where they won't let me build my own website and and take a year to put things up so apologies for that um so the uh so that's what we're trying to do and then let me now show you some examples of applying this method so you kind of get a sense of what you could see out of it so before i showed you a causal tree that had sub subsets and treatment effects now i'm going to be plotting treatment effects that can vary continuously with the covariates so in this case there was a randomized experiment where of asking people questions about their political beliefs or their beliefs about social issues so treatment a was you said hey how do you feel about assistance for the poor treatment b was how do you feel about welfare now if you're not from the u.s you might not know that those are the same thing welfare is assistance for the poor but there's been a lot of branding and language around welfare while welfare is bad and it's for lazy people while assistance of the poor 
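The speaker's own software is the R package grf; a commonly used Python counterpart is the CausalForestDML estimator in the open-source econml package, which implements a generalized-random-forest-style CATE estimator based on subsampling and honest estimation. The snippet below is a hedged sketch of typical usage, not the talk's code: argument names and defaults may differ across econml versions, and Y, T, X, W, X_test are placeholder arrays.

```python
# Sketch of fitting a causal forest in Python with econml's CausalForestDML.
# Y: outcomes, T: binary treatment, X: covariates for effect heterogeneity,
# W: additional controls (all numpy arrays; names are placeholders).
from econml.dml import CausalForestDML
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

cf = CausalForestDML(
    model_y=RandomForestRegressor(min_samples_leaf=20),   # nuisance: E[Y | X, W]
    model_t=RandomForestClassifier(min_samples_leaf=20),  # nuisance: E[T | X, W]
    discrete_treatment=True,
    n_estimators=2000,
    random_state=0,
)
cf.fit(Y, T, X=X, W=W)

tau_hat = cf.effect(X_test)                      # personalized CATE estimates
lo, hi = cf.effect_interval(X_test, alpha=0.05)  # pointwise 95% intervals
```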
doesn't have that same connotation and so the experiment was going to determine how much more did people like assistance for the poor framing than the welfare framing and this is um basically the outcome is about um whether you're in favor or not and we see that there's actually very large treatment effects so you know 25 percentage point increase in whether you like uh assistance to the poor relative to welfare but the effects are even that's for the the most liberal and least um rich people in the sample but the effects are substantially larger closer to 40 percentage points for the people who are most conservative and have the highest income and in the paper we're able to test the hypothesis that those are in fact different but so the idea here is that you know if you do if you estimate these causal forests you can do heat maps like this and try to just let the data tell you where the treatment affect heterogeneity lies and the fact that we use honest estimation means that we can trust it um so another example i'm working on now i haven't haven't published this yet but but soon soon um we're working on a retirement reform in denmark and in this case there were um they were bi it used to be that you could retire early and get retirement benefits and then over a period of years they made you have to be older and older and older to get these early retirement benefits and so you can see that the um the early this is the on the y-axis is how much what employment rates are and for the um oldest people they were able to they had access to the program and so as soon as they hit the age 60 the employment rate in that cohort went way down but later they didn't have access to those benefits and so a lot of people kept working until they were 62. now we we started looking for treatment effect heterogeneity in this group and this we have administrative data from denmark's we basically have all people in denmark and everything about them and about their spouses and about their children and about their health care and about their benefits for like their whole lives because that's what you do in scandinavia um which makes it easy to do research um if you get access to this data which of course is closely guarded and carefully protected um so i have a collaborators in denmark working with the data and so one of the things we see here is that that for people with lo with their uneducated on the left um there's a lot of them were retiring early before and are thus very affected by the reform while papers with a master's degree were not retiring early to start with and were more likely to be employed and the treatment effects of the reform are thus much lower they were working anyways and they kept working here's another example from income and again lower income people were often retiring while higher income people were not and that might be a little counterproductive because you give up a lot if you're high income and you retire but of course the jobs of low-income people are also less attractive so we also look at like characteristics of the job so then we apply the machine learning to this to try to understand generally what's the treatment effect heterogeneity look like and we compare using random force we call generalized random forest is our general causal forest algorithm and we see the blue is the histogram of estimated treatment effects with that versus an ordinarily squares regression when we get similar results from a lasso in the peach and so this is showing one of the advantages of the causal 
forest is that it does a better job fitting the functional form and it doesn't sort of accidentally extrapolate people that have negative treatment effects when we know that in general the treatment effects would be positive and here we're saying a positive treatment effect means that you're likely to work and you can see that all of our predictions are positive and it's intuitive that you know being taking away a retirement benefit shouldn't make you more likely to retire we also can see that our model we can look at the calibration of the model and in order to um assess whether it's performing well and so in addition to using statistical theory we can also try to go directly to a test set to assess its performance but one of the things that that i want to i highlighted before was that wait we i thought you said we don't observe things directly in a test set we don't observe things directly in a test set for any individual but so then a tactic you can use to assess the calibration of a causal model is to use your model to put people into groups so here we've put people into five groups according to the quintile of their treatment effect then we go to a test set and we assess using standard causal inference tools and here observational tools like we're using doubly robust treatment effect estimation tools under unconfoundedness we estimate what is the treatment effect in a test set for the for the people that we assigned to be in based on their covariates the high treatment effect group and we see that our model is in fact able to separate out people according to what we estimate in in the in the test set i want to highlight we can't test our own confoundedness assumption we made it in the training data we made the unconfoundedness in our test data so what we're really assessing is whether our model is under the under compounding consumption assumption are we distinguishing people with high treatment effects from low treatment effects is our model just spuriously saying hey you look high you look low but it's not accurate versus have we actually achieved our goal in the test set so way the first way is to put confidence intervals around them but if you're worried about sample size and asymptotics the second way is just to put people into groups and then calculate their average treatment effect in a test set it doesn't again doesn't test your your assumptions about unconfoundedness it's testing the performance of your model at personalization and it's sort of assessing whether your your asymptotics actually have kicked in and makes sense but this type of exercise is crucial when you're estimating heterogeneous treatment effects and i do this kind of exercise with all of my papers now then another thing we can do is we can try to look at the differences in the characteristics of of people who have high and low treatment effects so here just to see the kinds of of things we do with the output we can say okay here are people that our model says or high treatment effect there's um a high share of employee and the people that have the highest treatment effects and there's very there's um you know low share of managers so again the same managers keep working but employees quit if they have the opportunity and then we also see again that that people who are high wealth high income are less affected partly because they just keep working no matter what while it's the low income people who apparently don't like their jobs and try to quit as soon as they can as long as they have benefits to support 
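One way to read the calibration exercise described above as code: form groups by quantile of the predicted effect, then estimate each group's average effect in a held-out test set using doubly robust (AIPW) scores built from cross-fitted nuisance estimates. This is a hedged sketch under unconfoundedness; the nuisance inputs (mu0_hat, mu1_hat, e_hat) are assumed to come from whatever cross-fitted ML models you prefer, and the function name is illustrative.

```python
import numpy as np

def group_ate_by_cate_quantile(tau_pred, y, t, mu0_hat, mu1_hat, e_hat, n_groups=5):
    """Average doubly robust (AIPW) treatment effect within each quantile
    group of the predicted CATE, with large-sample standard errors.

    tau_pred            : model's predicted CATE on the test set
    mu0_hat, mu1_hat    : cross-fitted outcome-model predictions on the test set
    e_hat               : cross-fitted propensity scores on the test set
    """
    # AIPW (doubly robust) score: its mean over any fixed group estimates
    # that group's average treatment effect under unconfoundedness.
    dr = (mu1_hat - mu0_hat
          + t * (y - mu1_hat) / e_hat
          - (1 - t) * (y - mu0_hat) / (1 - e_hat))

    # Assign each test unit to a quantile group of its predicted effect.
    edges = np.quantile(tau_pred, np.linspace(0, 1, n_groups + 1))
    groups = np.clip(np.searchsorted(edges, tau_pred, side="right") - 1,
                     0, n_groups - 1)

    out = []
    for g in range(n_groups):
        s = dr[groups == g]
        out.append((g, s.mean(), s.std(ddof=1) / np.sqrt(len(s))))
    return out  # list of (group index, estimated group ATE, standard error)
```

If the test-set group estimates line up with (and separate as much as) the model's own predictions, that is the calibration evidence the talk describes; if they shrink toward the overall average, the model is overstating heterogeneity.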
them okay so we're coming close to the end um let me say i actually i'm not too surprised but i i didn't have much time to talk about estimating treatment of assignment policies i'll share the slides for this and i have some notes about that and then also if you go back to my google drive you can see longer sets of lecture notes around that and i also want to remind you that i have tutorials available as well our scripts i have a github full of data sets so you can take my tutorials and run them on data sets experimental data sets and observational data sets i have available in my public github that will allow you to test out some of these methods so let me pause there and take questions the first question is just about the link to your drive which yeah we'll post that yeah sorry i should have uh put that up uh beforehand i'll find it while other people are asking questions if they have any looks like someone's already put a jeremy's already put a link to a some drive folder great okay wonderful and then i can also um add in uh this is a this is a link to this is actually pretty outdated now already i guess because things move fast so this was january of 2018 but there's also um two days of videos there as well um did i um so what did the heat map allow us to conclude on the experience of welfare and assistance to the poor so the heat map was basically saying that rich people and conservative people are happy to give assistance to the poor but they don't like welfare while poor people and liberal people also feel that way but less strongly and so that again the treatment was which way were you asked the question and the heat map was showing the magnitude of the treatment effects and it was saying that the treatment effects were larger for conservative and rich people and this is something we don't cover in the course at all but so the to keep that in mind in your answer the question i have is on that last one was it statistically significant the difference between the two was that an important thing that you were able to show yeah so actually i would say that you know my my thinking and advice has evolved so the the heat map on the general social survey was something that we developed in our first causal forest paper in jazza and there we focused the main point of that paper was the asymptotic theory and the confidence intervals and so there we just we we estimated the model we took the model seriously we took our confidence intervals seriously and tested the hypothesis that those are different now if i was going to go back we have actually in that paper we actually emphasize that coverage of our confidence intervals can be poor in practice and that's because especially for people on the boundaries or if you just don't have enough data um and and what basically happens is again there's we don't have enough people just like you so the estimates are biased towards the mean so our variances are good but we're just not able to get an estimate for you that's really centered on the truth for you and that's really a fundamental problem with non-parametric estimation it's not special to us and it mean any kernels would have the same problem and people have actually tried to come up with ways to fix it for 20 years and haven't come up with good ways so now what we would tend to do is is before we would take seriously those hypothesis tests we would first do more calibration exercises to i could take that same heat map i could actually discretize it and i could hold out a test set and estimate the 
Okay, somebody was asking about bandits. I'll send you the slides; I actually had a few slides about bandits, and I've been working more and more on bandits in my lab. One of the simpler points to make is that bandit performance itself can be improved using insights from causal theory, because bandits are trying to solve a causal inference problem and they're actually creating confounding in the data. If it's a contextual bandit, I might be giving treatment A more to old people and treatment B more to young people; when I go to analyze that data, either inside the bandit or after the experiment, if I'm not careful I can bias my own estimates, and those biases can persist. You can actually see failures of bandit performance, and that doesn't contradict existing theory, because existing theory mostly assumes you have the right functional form; if you don't have the right functional form and you have unbalanced data, you can go wildly wrong. We've been working on that problem, and we've also been working on estimating policies after using bandits, and we've developed some new theory around that. Again, you can check out some of those papers on my website; I have a little less teaching material on that, but I have some.
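To make the confounding point concrete, here is a small base-R sketch (my own toy setup, not from the talk): a contextual bandit that assigns arm B more often to older users makes the naive per-arm comparison badly biased, while weighting by the logged assignment probabilities recovers the true effect.

```r
# A contextual bandit assigns arm B more often to "old" users, and old users
# also have higher baseline rewards, so naive per-arm averages are confounded.
# Inverse-propensity weighting with the logged probabilities corrects this.
set.seed(1)
n   <- 10000
old <- rbinom(n, 1, 0.5)                       # context
p.B <- ifelse(old == 1, 0.8, 0.2)              # bandit's logged propensity for arm B
arm <- rbinom(n, 1, p.B)                       # 1 = arm B pulled, 0 = arm A pulled

# True effect of arm B is +0.1 for everyone; being old adds +1 regardless of arm.
reward <- 1 * old + 0.1 * arm + rnorm(n, sd = 0.5)

prob.pulled <- ifelse(arm == 1, p.B, 1 - p.B)  # probability of the arm actually pulled

naive.diff <- mean(reward[arm == 1]) - mean(reward[arm == 0])   # badly biased upward
ipw.diff   <- mean(arm * reward / prob.pulled) -
              mean((1 - arm) * reward / prob.pulled)            # close to the true +0.1
c(naive = naive.diff, ipw = ipw.diff)
```

This is the same inverse-propensity idea that doubly robust estimators build on; in a real deployment you would log the actual assignment probability at each round rather than reconstructing it.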
Then: what would be the effect of getting the causal graph wrong, of conditioning on a collider and using it as a covariate? In general there's a big division between topics. There's how do I identify the model: what are my identifying assumptions, and are those assumptions right? And then there's how do I estimate my model. The computational innovations from supervised machine learning are generally about doing a good job of estimation. Where the novel insights have been is that supervised learning has an objective, and if I give it the wrong objective, it's going to have bad properties; and if I don't do things like sample splitting, what we call cross-fitting, or in a forest the leave-one-out estimation of nuisance parameters, then I get bad statistical properties. But the identification part, whether you have made the wrong assumptions: no, this doesn't help with any of that. Those are really separate things. In economics we often have separate sections of the paper: here's the identification section, which is about what you could learn with infinite data; here's the estimation section, which is about what we actually do; and then we often have a supplementary analysis section, where we try to assess the validity of our assumptions by bringing in auxiliary data or finding supplementary facts or implications of theory that would support them. So conditioning on a collider is a very important issue; it's just separate from the estimation techniques.

I think we're at time, and I'm not sure if you have another thing to go to. There are three other questions above those two, but we can end here if you need to go. I can take one or two more, so let's see.

Okay, what if you're not interested in variance estimation? One of the things that's kind of fun, it's not always true, and it would be even more fun if it were always true and I had a theorem about it, but it's still kind of fun, is that GRF, our honest random forest, will often, but not always, beat a traditional random forest algorithm even in pure prediction tasks. So it's worth considering these honest techniques even if you're not interested in the asymptotic theory; that's one thing about theory, it can sometimes guide you to better algorithms. Some of the weaknesses of traditional random forests are that they overfit to outliers, they do very badly on boundaries, and so on, and honest estimation gives us natural built-in protections against that. But there are cases where the performance will be worse as well; you just don't know. So if you use our grf package, and our performance is as good as or better than almost anyone's on the web, you can try it: we have an option to turn honesty on or off, and you can see what works better.
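Concretely, the toggle the speaker mentions is the honesty argument in grf's forest constructors. Here is a quick held-out comparison on simulated data; the data-generating process is an illustrative assumption, and as noted above, either setting can win on real data.

```r
# Quick check of "honesty on vs. off" using grf's honesty argument and a
# held-out test set. The simulated data here is only illustrative.
library(grf)
set.seed(1)

n <- 4000; p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- pmax(X[, 1], 0) + X[, 2] + rnorm(n)

train <- sample(n, n / 2)
test  <- setdiff(seq_len(n), train)

rf.honest   <- regression_forest(X[train, ], Y[train], honesty = TRUE)
rf.adaptive <- regression_forest(X[train, ], Y[train], honesty = FALSE)

mse <- function(forest)
  mean((predict(forest, X[test, ])$predictions - Y[test])^2)
c(honest = mse(rf.honest), adaptive = mse(rf.adaptive))
```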
In general, though, if you're interested in accurate predictions, you might also open up to neural nets, for example. But again, whether neural nets work better than forests, which work better than lassos, in a particular setting really depends on the context. In a lot of economic data we have structure: monotonicity properties, smoothness; we don't have, say, an ear over here that we're trying to find over there. So the flexibility of neural nets can sometimes hurt you. As neural nets get better and better at imposing, enforcing, and utilizing structure, that trade-off may change. Generally, a lot of our estimators, which I didn't get into today, can be plug-in estimators where you estimate nuisance parameters like propensity scores with machine learning models, and there it's just a matter of the data set to see what works better.

I'll also put out an advertisement for another paper we've just published on using GANs to create simulated data. That's partly to address the problem that we never know the ground truth in causal data. We train a GAN on a data set that we have, then we create simulated data sets from the GAN where you do know the ground truth: you simulate both potential outcomes, and then you can check which of the different causal inference approaches works better using that simulated data. That's a way to assess them, and in our paper, in some cases neural nets work well and in some cases random forests work well; we try a couple of different data sets. That's just to say there's no general rule about which one is going to work better.

Let's see, is there something else I missed? Is our package available in Python? Anybody want to build it with me? Come sign up. All of the code for grf is in C++ and we just have an R wrapper around it, and we don't have a lot of funding for this enterprise, so building a Python wrapper would be awesome; we just haven't had the resources to do it.

All right, well, several people are thanking you in the chat. Thank you very much for giving this talk and answering everyone's questions. Great, thanks so much, it was really great, and thanks for all the questions. I'm sorry it went fast, but again, please feel free to check out my slightly slower-paced other resources if you want to learn more. Thanks.
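In the spirit of the simulated-data evaluation described above, this minimal R sketch simulates both potential outcomes so the true effect is known, then scores a causal forest against a simple OLS interaction model. A hand-written data-generating process stands in for the GAN here, and everything in the sketch is an illustrative assumption rather than the paper's procedure.

```r
# Simulate data where the true heterogeneous treatment effect is known,
# then compare two estimators against that ground truth.
library(grf)
set.seed(1)

n  <- 4000
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
tau <- pmax(x1, 0)                               # true CATE, known by construction
W   <- rbinom(n, 1, 0.5)                         # randomized treatment
Y   <- x2 + W * tau + rnorm(n)                   # observed outcome
X   <- cbind(x1, x2, x3)

# Estimator 1: causal forest (out-of-bag CATE predictions).
cf     <- causal_forest(X, Y, W)
tau.cf <- predict(cf)$predictions

# Estimator 2: OLS with treatment-covariate interactions.
df      <- data.frame(Y, W, x1, x2, x3)
ols     <- lm(Y ~ W * (x1 + x2 + x3), data = df)
tau.ols <- predict(ols, transform(df, W = 1)) - predict(ols, transform(df, W = 0))

# Score each estimator against the known truth.
c(rmse.forest = sqrt(mean((tau.cf  - tau)^2)),
  rmse.ols    = sqrt(mean((tau.ols - tau)^2)))
```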


Frequently asked questions

Learn everything you need to know to use airSlate SignNow eSignatures like a pro.

See more airSlate SignNow How-Tos

How do I sign PDF files online?

Most web services that allow you to create eSignatures have daily or monthly limits, significantly decreasing your efficiency. airSlate SignNow gives you the ability to sign as many files online as you want, without limitations. Just import your PDFs, place your eSignature(s), and download or send them. airSlate SignNow’s user-friendly interface makes eSigning quick and easy; there's no need to complete long tutorials before understanding how it works.

How do you add an eSignature to a PDF?

Working with files in Portable Document Format (PDF) makes eSigning more straightforward. When you use an advanced service like airSlate SignNow, the signing process becomes even faster and more comfortable. Sign up, upload a file, create your eSignature, and send the document for signing or download it right away. The interface is simple, and anyone can use it regardless of their computer skills.

How do I add an electronic signature to my PDF using a Signature Field in airSlate SignNow?

All you have to do is add fields and collect signatures from recipients. To get started, log in, open a document, and add a signature field by clicking on Signature Field. After that, send it to your recipient and they’ll be able to generate and attach their own eSignature. They can choose between typing, drawing, or uploading a photo. All three methods are easy to use and legally binding. airSlate SignNow is one of the best solutions on the market. Get started now!