Collaborate on Account Invoice for Engineering with Ease Using airSlate SignNow
Move your business forward with the airSlate SignNow eSignature solution
Add your legally binding signature
Integrate via API
Send conditional documents
Share documents via an invite link
Save time with reusable templates
Improve team collaboration
See airSlate SignNow eSignatures in action
airSlate SignNow solutions for better efficiency
Our user reviews speak for themselves
Why choose airSlate SignNow
-
Free 7-day trial. Choose the plan you need and try it risk-free.
-
Honest pricing for full-featured plans. airSlate SignNow offers subscription plans with no overages or hidden fees at renewal.
-
Enterprise-grade security. airSlate SignNow helps you comply with global security standards.
Discover how to streamline your process on the account invoice for Engineering with airSlate SignNow.
Seeking a way to streamline your invoicing process? Look no further: follow these quick guidelines to effortlessly collaborate on the account invoice for Engineering or request signatures on it with our user-friendly platform:
- Create an account by starting a free trial and log in with your email credentials.
- Upload a file (up to 10MB) that you need to eSign from your PC or the cloud.
- Proceed by opening your uploaded invoice in the editor.
- Perform all the necessary actions with the file using the tools from the toolbar.
- Click on Save and Close to keep all the modifications made.
- Send or share your file for signing with all the necessary recipients.
The account invoice for Engineering process has just become easier! With airSlate SignNow’s user-friendly platform, you can easily upload and send invoices for electronic signatures. No more printing a hard copy, signing by hand, and scanning. Start our platform’s free trial, and it will simplify the whole process for you.
How it works
airSlate SignNow features that users love
Get legally-binding signatures now!
FAQs
-
How do I modify my account invoice for Engineering online?
To modify an invoice online, simply upload or pick your account invoice for Engineering on airSlate SignNow’s service. Once uploaded, you can use the editing tools in the tool menu to make any necessary modifications to the document.
-
What is the best service to use for account invoice for Engineering processes?
Among the various platforms for account invoice for Engineering processes, airSlate SignNow stands out for its easy-to-use layout and comprehensive features. It streamlines the whole process of uploading, editing, signing, and sharing forms.
-
What is an eSignature in the account invoice for Engineering?
An eSignature in your account invoice for Engineering refers to a secure and legally binding way of signing documents online. This enables a paperless and efficient signing process and provides extra security measures.
-
How do I sign my account invoice for Engineering online?
Signing your account invoice for Engineering online is quick and easy with airSlate SignNow. To start, upload the invoice to your account by clicking +Create -> Upload in the toolbar. Use the editing tools to make any necessary changes to the form. Then click the My Signature button in the toolbar and choose Add New Signature to draw, upload, or type your signature.
-
Can I make a particular account invoice for Engineering template with airSlate SignNow?
Making your account invoice for Engineering template with airSlate SignNow is a quick and easy process. Just log in to your airSlate SignNow profile and open the Templates tab. Then, choose the Create Template option and upload your invoice file, or pick an available one. Once edited and saved, you can easily access and reuse this template for future needs by choosing it from the appropriate folder in your Dashboard.
-
Is it safe to share my account invoice for Engineering through airSlate SignNow?
Yes, sharing documents through airSlate SignNow is a secure and reliable way to work together with colleagues, for example when editing the account invoice for Engineering. With features like password protection, audit trail tracking, and data encryption, you can be sure that your files will stay confidential and safe while being shared electronically.
-
Can I share my files with peers for cooperation in airSlate SignNow?
Absolutely! airSlate SignNow offers various collaboration options to help you work with peers on your documents. You can share forms, set permissions for editing and viewing, create Teams, and track modifications made by team members. This allows you to work together on projects, saving time and streamlining the document approval process.
-
Is there a free account invoice for Engineering option?
There are multiple free solutions for account invoice for Engineering on the web with various document signing, sharing, and downloading restrictions. airSlate SignNow doesn’t have a completely free subscription plan, but it offers a 7-day free trial allowing you to test all its advanced capabilities. After that, you can choose a paid plan that fully meets your document management needs.
-
What are the pros of using airSlate SignNow for electronic invoice management?
Using airSlate SignNow for electronic invoice management speeds up form processing and minimizes the risk of human error. Additionally, you can track the status of your sent invoices in real-time and get notifications when they have been seen or paid.
-
How can I send my account invoice for Engineering for electronic signature?
Sending a file for electronic signature on airSlate SignNow is quick and simple. Just upload your account invoice for Engineering, add the necessary fields for signatures or initials, then tailor the text of your signing invitation and enter the email addresses of the recipients in order: Recipient 1, Recipient 2, etc. They will get an email with a link to securely sign the document.
What active users are saying — account invoice for engineering
Related searches to Collaborate on account invoice for Engineering with ease using airSlate SignNow
Account invoice for Engineering
Hey everyone, let's wait a few minutes until 10 p.m. before we begin. Am I loud and clear? Can you hear me?

Good evening, and welcome back to Analytics with Anand. I'm coming to you live after a long time; a lot has happened in the past few months. I got married, that's the first piece of news; second, I joined a new company, Tag Analytics, as a senior data engineer, which you have seen from my LinkedIn posts; and the job guarantee program has kept running new batches. Because of all that, I wasn't able to come here live.

Today's class is going to be very important. I have never taught this topic before in any batch over the last two to three years, and it is crucial for any interview, whether you are applying for a senior data engineering role or even a data analytics role. Today's class is all about clustering, one of the most important features in Snowflake. We will dig into the complete Snowflake clustering concept: what clustering is, why it is required, when to use it, what the key metrics are, and how it helps an organization when you are dealing with large datasets. I will show you how to do clustering in Snowflake, and, with the help of an ETL tool (Matillion), I will show you a complex workflow, a real spider web, where clustering plays a crucial role while designing fact and dimension tables, including type 1 and type 2 dimensions.

Anytime you are stuck, post in the LinkedIn or YouTube chat, or comment later and I will go back and check; my LinkedIn, phone number, and email are all available. Very important: the entire code and the entire document I created for you are linked in the video description, and that is more than enough for the next three to five years of your career; you won't be designing anything beyond the complexity of the workflow I'm going to show you. It took me three months to design it, and I'm showing it to you in the next hour and a half, so stay focused and ask any doubt you have. If you add this to your resume, it will give it a real boost, trust me.

The prerequisites for this class: you should be well versed in SQL, so that you understand the joins, the SELECT statements, and the join conditions I use, and you should understand how Snowflake's internal architecture works, which is covered in lectures one and two. If you haven't subscribed to the channel, subscribe to Analytics with Anand on YouTube, go through the website, and make sure you watch every video of the Snowflake playlist, Mastering SQL on Snowflake.

Here is the agenda for today: do not practice along with me, just understand; the code, the datasets, and every file have already been given to you. This is lecture number 42 of Mastering SQL on Snowflake; folder 42 is clustering, and it contains all the code files, all the datasets for the complete business use case, and the document, with a direct link in the description of this live session. So before asking where to get the data: everything is shared in the description.

Let me share my screen and open the document (there is a slight delay of five to ten seconds on my other monitor, but you should be able to see the Word file). The reference links I have taken are from the Snowflake documentation; Snowflake is very good when it comes to documentation, but when it comes to implementation, things take a turn.

So first of all, let's understand clustering with a very realistic example. We have all been grocery shopping: when the salary comes, we go to DMart or a shopping mall and do the monthly shopping (these days many of us just order online instead). You buy milk products, groceries, cereals, maybe some clothes, and you go to a separate section for each: the dairy section, the dal and rice section, the biscuit section, the snacks corner, the kitchen and toiletries section. In Snowflake terminology, those sections are nothing but micro-partitions; that is the layman's way to start understanding clustering.
In an interview this may even be the question: explain clustering to a five-year-old, or to a ten-year-old. So: you have been to a DMart-style shop with separate sections; each section is nothing but a micro-partition in Snowflake terminology (micro-partitions are already covered in my Snowflake playlist, which you can go through).

You buy your things, the bill comes to 5,000 or 8,000 or 10,000 rupees, whatever your grocery budget is, and the whole family carries the bags home. The usual habit is to dump everything on the dining table or a chair and then put it away. When the number of items is small, it is easy to find things: if my mother or sister asks the next day where I kept the water bottle we bought, I can answer straight away, because it is a very small set of items. In the same way, Snowflake is smart enough to find your data in its inserted order when you are working with a small dataset.

Now imagine you have bought a huge amount of groceries: a few things go into the kitchen cupboard, a few into the bedroom cupboard, a few stay on the dining table. The next day nobody remembers where everything is, and if a friend enters your house looking for a water bottle, they have no idea where to look. That is exactly the situation with large datasets and heavy transformations, and that is where clustering adds value.

The first thumb rule: apply clustering only when the table holds roughly more than one million records, not below that. Below that threshold Snowflake is smart enough on its own; its micro-partitioning keeps ingestion and retrieval efficient without any help from you. So the first question to ask yourself, and the first question an interviewer will ask, is: why clustering, and when do you apply it? The answer: when your datasets are very large, more than about a million records.

Clustering in Snowflake is a technique used to improve query performance by organizing the data within a table based on specific columns. When you run a complex join across multiple fact and dimension tables, the joining key is critical, and that key is a natural candidate for clustering. Here is another analogy: if a friend is coming to my house for my birthday and I am not home, I can give a clear instruction, "go to my bedroom, open my cupboard, my wallet is in the first drawer," and he finds it immediately. If I only say "go to my house and find my wallet," he will search the dining hall, the bedroom, the living room, every cupboard; many more "partitions" get scanned and it takes far longer. Snowflake works on a pay-as-you-go model, so the longer a query runs over a large dataset, the more it costs. With clustering you limit the number of micro-partitions scanned, the query gets optimized, the cost goes down, and the client is happy.

So how do we know which column to cluster on? Will the client tell me the specific column? No; that is the job of the data engineer. Unlike traditional databases, Snowflake tables are naturally maintained as micro-partitions, which means data is automatically distributed and stored in a highly optimized manner; that is fine for smaller tables. But once a table crosses roughly a million records, and especially for repetitive queries that filter or join on specific columns, clustering organizes the data further and improves both performance and cost efficiency (we will look at the cost analysis in today's session as well). It is particularly valuable in scenarios such as date-based filtering (incremental loads, full loads, sales reports for different weeks and years) and region-based filtering.
Regional segmentation is the classic example: I want the South region sales, the North region sales, the East region sales. Clustering also pays off with type 1 and type 2 dimensions. A type 1 dimension keeps no history: you truncate and do a full load, so a changed customer address simply overwrites the old one. A type 2 dimension is loaded incrementally and maintains history: as a customer's address changes over time, the latest row is flagged as current and the earlier rows are flagged as not current. Clustering also helps when you are enhancing fact-to-dimension joins on the relevant keys.

So how does clustering work in Snowflake? The first important concept is the clustering key. Just as a primary key is a column or combination of columns that uniquely identifies each record (think of your Aadhaar number or PAN number), the clustering key is the column or set of columns, one or two of them, chosen to organize the data physically. Snowflake stores data in micro-partitions, and clustering ensures that rows with similar values of the clustering key are stored close together. If you cluster on date or region, then a query like SELECT * FROM table WHERE region = ... comes back in moments because only a few partitions are scanned; again, this matters once the table holds more than about a million records. "Go to my bedroom, open my cupboard, open the drawer, and get the wallet": that is what clustering means when you live in a big villa with many rooms and cupboards. Without clustering, Snowflake might need to scan a large number of micro-partitions, and more scanning means more workload and more cost; with clustering, Snowflake can quickly locate the relevant micro-partitions based on the clustering key, reducing scan time and improving performance.

There is also automatic maintenance. If your dataset keeps growing you don't have to worry: Snowflake continuously manages clustering, and for smaller tables it handles everything automatically, so below the one-million-record threshold there is nothing for you to do. For large tables, explicit clustering maintenance can be scheduled or triggered manually with a recluster operation.

Why is clustering important? It helps with cost optimization, it improves query efficiency, and it also helps with sorting when you are using window functions and aggregations.
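To make the idea concrete, here is a minimal sketch (hypothetical table and column names, not from the session's files) of defining a clustering key and of the kind of filtered query that benefits from it:

CREATE OR REPLACE TABLE sales_fact (
    sale_date   DATE,
    region      VARCHAR,
    product_id  NUMBER,
    sale_amount NUMBER(12,2),
    customer_id NUMBER
)
CLUSTER BY (sale_date, region);   -- rows with similar dates/regions end up in the same micro-partitions

-- A query that filters on the clustering key columns lets Snowflake prune
-- most micro-partitions instead of scanning the whole table.
SELECT region, SUM(sale_amount) AS total_sales
FROM sales_fact
WHERE sale_date BETWEEN '2024-12-01' AND '2024-12-07'
  AND region = 'SOUTH'
GROUP BY region;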
Now let's look at it from the business use case point of view. Suppose a retail business analyzes daily sales data; every day sales arrive, just as at Flipkart or Amazon, and queries frequently filter on the sale date column, because we always want to know how much we sold last year, last quarter, last week, this week. Without clustering, all partitions are scanned even for a small date range. When you cluster the table on the sale date column, the clustering key is sale_date, and queries for specific days, weeks, or months scan only the relevant micro-partitions, because rows with the same dates are placed close together.

Another example: the marketing team wants to segment customers by region, and frequent filtering by region leads to high partition scans. Clustering the table by region reduces scan time for region-specific queries. The rule is always the same: whatever columns you use in filter statements and join conditions are the columns you should specify as the clustering key.

For dimensions: a type 1 dimension overwrites old data with new data and keeps no history, for example a customer profile where I only want the latest address. Queries frequently join fact tables with a type 1 dimension on customer ID, so the clustering key in that case is customer_id: cluster the fact table on customer_id. This reduces the join cost by ensuring the data is co-located for faster lookups.

What happens internally? The Snowflake documentation shows a table with type, name, country, and date columns spread across several micro-partitions, a few rows per partition. Now suppose the table already exists, the data has grown to two billion records, and the client asks you to optimize it. The syntax is very simple, just like adding a primary key: ALTER TABLE <table_name> CLUSTER BY (<columns>). If I cluster by date and type, then after reclustering Snowflake writes new micro-partitions in which all the rows with the same type value sit together and all the rows for the same date sit together. With date and type defined as the clustering keys, a query filtering on type = 2 and date = 11/2 now scans only micro-partition 5; before reclustering the same query had to scan three micro-partitions. Imagine how much workload you have just removed.
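In SQL terms, the documentation-style example walked through above looks like this (t1, date, and type are the illustrative names used in that example):

-- Add a clustering key to an existing table; Snowflake then reclusters it in the background.
ALTER TABLE t1 CLUSTER BY (date, type);

-- After reclustering, a lookup like this touches only the micro-partition(s)
-- holding the matching date/type values instead of several partitions.
SELECT name, country
FROM t1
WHERE type = 2
  AND date = '2024-11-02';   -- the "11/2" value in the documentation's illustration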
After reclustering, the same query only scans micro-partition 5. The documentation also notes that micro-partition 5 has reached a constant state: it cannot be improved further by reclustering and is therefore excluded when computing depth and overlap, which we will look at later. All of these links are already given to you, and it will become clearer once we get to the examples.

Next, a type 2 dimension maintains historical data by creating a new row for each change. Think of gym membership levels: we all take a membership even if we stop going. There is a membership status, an effective date when the membership started, and an end date when it expires, whether monthly, quarterly, half-yearly, or yearly. That is a type 2 dimension. If we cluster the fact table on transaction date and the dimension table on effective date, queries that join on a date range to check status validity perform efficiently. The main goal of clustering is query optimization; that should be your clear-cut answer when an interviewer asks why we do it.

Key concepts in clustering. First, the clustering key: a column or set of columns used to group related data physically within micro-partitions, which improves query performance by reducing the number of partitions scanned. Second, and most important, clustering depth: it measures how well the data within a table is organized with respect to the clustering key, indicating on average how many micro-partitions must be scanned for a given value of the key. Should the depth be high or low? Low. A lower clustering depth implies better clustering and fewer partitions scanned; ideally it approaches one, because in the best case only a single partition has to be opened (a quick way to check this is sketched at the end of this passage). Third, reclustering: over time your data grows, your schema changes, new columns and new Power BI metrics are added, and the effectiveness of the clustering key degrades because of updates and inserts. Reclustering reorganizes the data within the micro-partitions, and Snowflake handles this automatically through its automatic clustering feature.

When should you use clustering? When queries filter, aggregate, or join heavily on specific columns; when tables are very large, with millions or billions of rows; and when query performance degrades because of excessive micro-partition scanning. For smaller datasets, say a hundred thousand records, clustering may not provide a significant benefit because Snowflake's native micro-partitioning and pruning are already efficient. So if you are dealing with 999K records, anything under a million, do not go for clustering.

Now look at this complex workflow on the screen; "dim" stands for dimension table, and count how many dimension tables feed the final fact table. The transaction table has about one billion records, so that is the first table I cluster. Another source has two billion records: cluster it. The historic transaction table: cluster it. dim_provider has only 459 records, so no clustering there. The sub-account table has 46 million records: yes, cluster it. You go table by table like this, and every large table gets a clustering key, otherwise running the pipeline becomes a real hazard. The workflow covers 24 years of data, roughly 272 terabytes, and it took me three months to design. Even for a single month of data, creating these views takes about 1 hour 38 minutes without clustering; with clustering it takes about 48 minutes. That is the power clustering holds.
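While tuning keys like this, a quick way to check how deep the clustering currently is uses SYSTEM$CLUSTERING_DEPTH (this particular function is not shown in the session, which uses SYSTEM$CLUSTERING_INFORMATION later; the table and column names here are hypothetical):

-- Average clustering depth for the table's defined clustering key (lower is better, ideally close to 1).
SELECT SYSTEM$CLUSTERING_DEPTH('sales');

-- Depth for a candidate key you are evaluating before running ALTER TABLE ... CLUSTER BY.
SELECT SYSTEM$CLUSTERING_DEPTH('sales', '(date, product_id)');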
How do you apply it in the workflow? With a SQL/DDL component that runs the ALTER TABLE commands. And on what basis did I decide the keys? Look at the join conditions. The first join condition filters on from_timestamp and on the transaction business key, so those are exactly the columns I cluster on: whatever columns you use in a join condition, you must apply clustering on them, otherwise the job keeps running for eight or ten hours and you waste the client's money.

A word on cost and effort. The workflow joins many dimension and fact tables into one final fact transaction table, the single table the client wanted so that they have all the columns in one place and do not have to query multiple tables. To test 24 years of data you test with one day, a week, a month, a quarter, six months of data; you hit errors and you resolve them. I had to apply a date filter, for example 1 November to 1 December, because running the entire range at once would take months, and I was only given an extra-large warehouse to work with. By my calculation the client had to bear roughly 10 lakh rupees just for testing this over those three months, and of course the overall cost of designing such a pipeline is far higher; it always depends on the use case. Each dimension table, dim_provider, dim_product and so on, first had to be designed from its source tables and then integrated into the overall pipeline; this is what I teach in my Matillion classes, and it is how you actually work as a data engineer.

Now the clustering DDL itself. ALTER TABLE is the command. The transaction table is clustered on three columns: from_timestamp, the provider business key, and the sub-account business key. Whatever business key or primary key you join on goes in, and wherever there is a date column it must be included, which is why from_timestamp appears everywhere. The moment you apply these ALTER TABLE statements to all the large tables and rerun the monthly load, the runtime drops from about 1 hour 40 minutes to roughly 40 minutes on a large warehouse. Nobody will come and tell you to apply clustering; it is your expertise that tells the client "we need clustering here" whenever a table has more than a million records and sits in complex join conditions. So which columns carry the clustering key in this join? The transaction business key, the provider-set business key, and from_timestamp. You are telling Snowflake: focus on these three columns and store rows with the same values together, so that the filtering and the join are fast.
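As a sketch, the ALTER TABLE statements described for the workflow look roughly like this; the table and column names (transaction, from_timestamp, trans_bsk_key, subacct_bsk_key) are approximations of what is read off screen in the session, not exact identifiers:

-- Large fact/transaction tables: cluster on the date column plus the business keys used in the joins.
ALTER TABLE transaction          CLUSTER BY (from_timestamp, trans_bsk_key, subacct_bsk_key);
ALTER TABLE historic_transaction CLUSTER BY (from_timestamp, trans_bsk_key);
-- Small reference tables (for example a 459-row dim_provider) are left unclustered.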
More generally: whatever you use in window-function PARTITION BY clauses, in filter clauses, and in join conditions should be your clustering keys; if you are ever unsure, reach out to your data architect. The choice also depends on the business logic, the primary and foreign keys, the integrity constraints, how the data is ingested, and whether the dimension is type 1 or type 2; all of that has to be kept in mind while designing the fact and dimension tables.

So, how to choose a clustering key: prefer columns with a wide range of values (order ID, customer ID), columns that are frequently filtered, and columns used in WHERE, GROUP BY, and JOIN clauses and in window-function PARTITION BY.

Now a very realistic scenario: you have a fact table, a customer dimension, and a membership dimension. The customer dimension is type 1, where we do not maintain history, and the membership dimension is type 2, where we do. In this example, clustering the fact table on sale_date optimizes date-range queries, so for the fact table the clustering column is sale_date. The type 1 dimension is joined to the fact table on its primary key, customer_id, so cluster it on customer_id. For the type 2 dimension, where you maintain the history of when a membership starts and ends, take customer_id together with effective_date, the date the customer purchased the membership, as the clustering columns.
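The corresponding statements for this scenario would look like the following sketch (fact_sales, dim_customer, and dim_membership are hypothetical names standing in for the fact table and the two dimensions):

ALTER TABLE fact_sales     CLUSTER BY (sale_date);                    -- fact table: optimizes date-range queries
ALTER TABLE dim_customer   CLUSTER BY (customer_id);                  -- type 1 dimension: clustered on the join key
ALTER TABLE dim_membership CLUSTER BY (customer_id, effective_date);  -- type 2 dimension: join key plus validity date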
With this in place, querying sales for December 2024 will efficiently scan only the relevant partitions, which we will verify in the lab. So let's go straight to the lab; I will give you the complete analysis, and beyond this there is nothing more to clustering.

I create a warehouse called demo_warehouse (medium size), a database called demo_database, a schema called demo_schema, and a CSV file format for loading the datasets. The tables are: a sales table with date, region, product_id, sale_amount, and customer_id; an inventory table; a products table; a shipments table; a customers table; a transaction table; an alert log table; and a clustering_metrics_log table that will hold the table name, clustering key, average depth, total partition count, overlaps, and the log timestamp, because I am going to automate the whole monitoring process. Note that no clustering is applied anywhere yet. And to answer Suraj's question from the chat: in the type 2 example here I only keep the effective date, when the membership started, because that is what the history is keyed on in this demo, so an end date is not required.
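A condensed sketch of those demo objects follows; the object names match the session (demo_warehouse, demo_database, demo_schema, sales, clustering_metrics_log), while the exact column types are assumptions:

CREATE WAREHOUSE IF NOT EXISTS demo_warehouse WAREHOUSE_SIZE = 'MEDIUM';
CREATE DATABASE  IF NOT EXISTS demo_database;
CREATE SCHEMA    IF NOT EXISTS demo_database.demo_schema;
USE SCHEMA demo_database.demo_schema;

CREATE FILE FORMAT IF NOT EXISTS csv_format TYPE = 'CSV' SKIP_HEADER = 1;

CREATE TABLE IF NOT EXISTS sales (
    date        DATE,
    region      VARCHAR,
    product_id  NUMBER,
    sale_amount NUMBER(12,2),
    customer_id NUMBER
);

-- Log table used later to track clustering health over time.
CREATE TABLE IF NOT EXISTS clustering_metrics_log (
    table_name            VARCHAR,
    clustering_key        VARCHAR,
    average_depth         FLOAT,
    total_partition_count NUMBER,
    average_overlaps      FLOAT,
    log_timestamp         TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP()
);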
Now the scenario: you are filtering the sales and inventory tables by date, region, and product_id, and inner-joining with the products table on product_id. Forget clustering for a moment and just look at the join condition: product_id gives me one candidate clustering column, and date and region are the other filters. The sales data I load for the demo has only 100K records, purely for convenience; remember, clustering is worth applying only above roughly a million records.

So suppose sales has 23 million records, inventory has 200K, and products has 100K. Is clustering required? Yes, but only on the sales table, because it is the only one above the million-record threshold; do not cluster inventory and products as well. And what is the clustering key on sales? Exactly the columns used in the filter and join conditions: date, product_id, and region; it is as simple as that. If inventory later grows to, say, 1.2 million records, its clustering key would be date and product_id (wherever there is a date column, include it). So the data has grown to 23 million records, the client reaches out saying the query is not optimized and the reports are slow, and all it takes is one command, ALTER TABLE sales CLUSTER BY ..., as in the sketch below. Run it, grab the query ID, and open the query profile to see how many micro-partitions are scanned; for a very short query Snowflake may not show much detail.
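The clustering statements for this scenario, as a sketch (cluster only the tables above the roughly one-million-row threshold, on the columns used in the filters and joins):

ALTER TABLE sales CLUSTER BY (date, product_id, region);

-- Only once inventory itself crosses the ~1 million row mark:
ALTER TABLE inventory CLUSTER BY (date, product_id);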
Let's run the next query and see what analysis it gives. Historical data: take the second demo table, transaction, which has about a million records. Which columns do you cluster on? It is frequently filtered by transaction date and customer ID, so: ALTER TABLE transaction CLUSTER BY (transaction_date, customer_id). Now look at the analysis: how many micro-partitions were scanned? Only one, with about 0.7 MB of bytes scanned, because the query goes straight to the one micro-partition that matters. That is the beauty of it.

Next, a geospatial-style analysis: shipments are analyzed by destination region and ship date. Just as on Flipkart or Amazon you want to know your first shipment, your latest shipment, how many shipments there were, and which region each one went to. If the table has more than a million records, the clustering key is ship_date and destination_region, not every column; you have to be very particular about what you pick, and you have to keep checking, based on the clustering depth, whether you chose the right key, because sometimes you pick a wrong one and need to optimize it (both statements are sketched just after this walkthrough). Here, the number of partitions scanned is again one.

Now a large-scale dimension: the customers table is frequently joined with the fact table, and filters are applied on region and age group; in the real workflow the corresponding table had two billion records with many filters on it. Every dimension there, dim_provider, dim_product, dim_account, was designed the same way: you filter the source, transform columns, generate a primary key (this is the type 2 dimension design), rename, and build the final view. All of those views run in parallel and feed the main pipeline; when they finish, the internal delivery step is triggered, and if anything fails you get a message. This is how you integrate everything, and it is the typical work of an ETL data engineer; you can even see the build date on it, 15 July.
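The two statements just discussed, as a sketch (table names follow the demo script; adjust to your own schema):

ALTER TABLE transaction CLUSTER BY (transaction_date, customer_id);    -- frequent date + customer filters
ALTER TABLE shipments   CLUSTER BY (ship_date, destination_region);    -- date-range + region analysis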
Now, coming back to clustering: whenever your dataset changes, you simply recluster on the relevant key. So for the customers table, which columns do you cluster on? Region and age_group; there is no date column here, so it is just those two.

The next important thing is how I check whether my clustering is actually working. There is an inbuilt system function, SYSTEM$CLUSTERING_INFORMATION, to which you pass the complete table name and the clustering columns, and Snowflake tells you whether the clustering you applied is effective (a sample call is sketched after this passage). In my case it warned that the clustering key columns (date, product_id) have high cardinality, which might result in expensive reclustering; in other words, Snowflake is telling me I may have used too many columns and might have to reduce them. It also reports the total partition count and the average overlap. There should be as little overlap between micro-partitions as possible: the more partitions overlap, the more partitions a query has to scan and the higher the cost. The official documentation has a nice illustration of this, with overlap counts like 112, 113, and 114 being driven down.

A question from the chat: how would I know in advance which columns will be used in filter conditions in the future? You won't, Manisha; it depends entirely on the business use case. Business logic keeps changing, business consultants add new metrics, and as the business grows you watch which filters and joins keep appearing; only then do you recluster. And to repeat the rule for defining the clustering key: look at the filter criteria you use and at the columns in your join conditions; that is the most important context. In this join, for example, the condition uses the transaction component basket key and from_timestamp, and the table has 46 million records, so I simply write ALTER TABLE with exactly those two columns as the clustering key. Whatever you use as a join condition, and whatever you filter on heavily when working on terabytes of data, that is your clustering key.
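A sample call, following the session (the table and key reflect the earlier sales example):

-- Pass the table name and, optionally, the key columns as a string; the result is a JSON document
-- with fields such as cluster_by_keys, total_partition_count, average_overlaps, average_depth, and notes
-- (the notes field is where the high-cardinality warning mentioned above shows up).
SELECT SYSTEM$CLUSTERING_INFORMATION('sales', '(date, product_id, region)');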
Whatever you filter on multiple times in your Power BI reports, especially when you are working on terabytes of data, should be specified as a clustering key at data-ingestion time. Simple logic: whatever you use in your filter clause and whatever you use in your join condition is mandatory, and if you have a datetime column, that is a must. Then yes, Manisha, if new columns start being used we have to do re-clustering: again an ALTER TABLE ... CLUSTER BY.

To analyze query performance using the Query Profile in Snowflake, follow these steps: run a query, go to the query ID, and open the Query Profile to see exactly what it is doing. Focus on the query ID and see what relevant information it gives you: how many partitions were scanned, how much time it took, how many overlaps there are. You want the minimum number of overlaps and the minimum clustering depth. How do you calculate the clustering depth? A very hot favorite interview question, which I just asked a senior-level candidate I was interviewing, so remember the formula I showed you: in this approach it is the number of micro-partitions scanned divided by the number of relevant rows (the COUNT(*)). The number of micro-partitions scanned you get from the Query Profile whenever you run a query; here the partition scan is one, because I have applied clustering.

And this is the function I was telling you about, SYSTEM$CLUSTERING_INFORMATION: you give the complete name and you specify the clustering key, and it tells you whether the clustering you have applied is correct or not. From it I can pull all the relevant information, and I have given you the code as well. Now, it returns a JSON document. How do you parse JSON? I covered that in lecture number 39. So I create a CTE, pass this JSON into it, and from the JSON I also capture the current timestamp at which I took this clustering snapshot. It may happen, Manisha and others, that you apply clustering today, re-cluster again after six months, and someone re-clustered three years back, so this information has to be stored somewhere. For that I will create a stored procedure and a task (those who don't know about stored procedures, go through my lecture and read it). I will create a table called clustering_metrics_log, which holds the table name, clustering key, average depth, total partition count, number of overlaps and the log timestamp; all of that I calculate here. Just for demonstration I scheduled it every five minutes before the class; let me see if I have some information to share with you, although the task might be in the suspended state. So I have created this procedure, log_clustering_metrics.
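As a rough sketch of that logging idea, and only a sketch: the log table name and columns follow the walkthrough, while the table being measured (trans_component) and its clustering key are assumptions, and the JSON field names are the ones documented for SYSTEM$CLUSTERING_INFORMATION.

```sql
-- Hypothetical log table, as described above.
CREATE TABLE IF NOT EXISTS clustering_metrics_log (
    log_timestamp          TIMESTAMP_LTZ,
    table_name             STRING,
    cluster_by_keys        STRING,
    average_depth          FLOAT,
    average_overlaps       FLOAT,
    total_partition_count  NUMBER
);

-- Parse the JSON returned by SYSTEM$CLUSTERING_INFORMATION in a CTE
-- and append one snapshot row to the log table.
INSERT INTO clustering_metrics_log
WITH ci AS (
    SELECT PARSE_JSON(
             SYSTEM$CLUSTERING_INFORMATION(
               'mydb.myschema.trans_component',     -- assumed table
               '(basket_key, from_timestamp)'       -- assumed clustering key
             )
           ) AS j
)
SELECT CURRENT_TIMESTAMP()                AS log_timestamp,
       'mydb.myschema.trans_component'    AS table_name,
       j:cluster_by_keys::STRING          AS cluster_by_keys,
       j:average_depth::FLOAT             AS average_depth,
       j:average_overlaps::FLOAT          AS average_overlaps,
       j:total_partition_count::NUMBER    AS total_partition_count
FROM ci;
```

Wrapping this INSERT in a stored procedure (the log_clustering_metrics mentioned above) is what lets a scheduled task run it on a calendar.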
Whenever there is an ALTER TABLE that adds or changes clustering, this will get captured automatically at the scheduled run. Say I want to run this code once a month: I write a cron expression that schedules it at the end of every month, because at month end I want to see how many partitions have been inserted into that particular table and whether anything in it has changed. So what am I running? I calculate the current timestamp, table name, clustering keys, the clustering method, the depth, how many overlaps and the total partition count. I am not explaining the code itself because I taught it in lecture number 39. Let me run it and show you.

Yes, this is the information you need to understand. log_timestamp gives you the current timestamp, then the table name, the clustering key and the clustering method, LINEAR. If the average depth is one, that is better; the overlap should be zero. Hot favorite interview question: what should the overlap be? Zero. What should the average depth be? One. And the partition count: the smaller these numbers, the better your query optimization. If it is one, zero and one (remember those numbers) the table is highly optimized and you get your data back in a fraction of a second. If the numbers are worse, then Manisha, we have to do the analysis: see how many filter conditions we are hitting, check whether that particular column is actually helping, and re-cluster. Say the average depth comes out at 20: you change the number or order of clustering columns. Say you are dealing with 50 columns and you have clustered on three; you add a fourth column and the average depth drops to 14, you add a fifth and it drops to 7. You have to keep evaluating, and of course it again depends on the business use case: as the business progresses the schema changes and new columns come in. It is an evolving pattern, but it does not evolve overnight; it takes years, a minimum of one year, it is not as if the business changes every week. New columns such as age or gender might be added, but if we are not using them frequently in filters they do not matter for clustering. Focus on your dates, your primary key, and whatever you use in join conditions and filter clauses. Ultimately it depends on the business problem you are targeting, which Power BI report you need and which metrics you are trying to find; many external factors depend on that, and for that you can do this.

See this query performance metrics report; everything I have calculated here. Let's see, I was just doing it: query ID, username, database name, schema name, which query is costing me more, the start time, the end time, how much time elapsed, how many rows were produced, who inserted rows, how many rows got updated, how many partitions were scanned and how many bytes were scanned.
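For the monthly scheduling mentioned at the start of this passage, here is a minimal sketch. It assumes the stored procedure is named log_clustering_metrics and that a warehouse called my_wh exists; the task name and the exact cron timing (midnight UTC on the 1st rather than the literal last day of the month) are placeholders to adapt.

```sql
-- Run the clustering-metrics logging procedure once a month.
-- Cron fields: minute hour day-of-month month day-of-week, then time zone.
CREATE OR REPLACE TASK monthly_clustering_metrics_task
  WAREHOUSE = my_wh
  SCHEDULE  = 'USING CRON 0 0 1 * * UTC'
AS
  CALL log_clustering_metrics();

-- Tasks are created in the suspended state, which is why the demo task
-- above appeared suspended; resume it to start the schedule.
ALTER TASK monthly_clustering_metrics_task RESUME;
```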
Every piece of information I am pulling here. For the query performance metrics before clustering, I create a view over everything that has happened in the last seven days; you can change the date range, and you can point it at whichever schema and table you want. Feed in the table and it gives you its full history, geography, biology, everything about that table; analyze it and then decide your clustering key on that basis. If the number of partitions is in the millions, clustering is definitely worthwhile; these tables here are very small, so clustering is not required on all of them, because I am working on very small tables. If a data set has more than a million rows, then you should do it.

You also have a view called QUERY_HISTORY. Queries running longer than 60 seconds: first I want to know which of my team members are running queries for more than 60 seconds; which query ID, which user, who is that person, which warehouse, what the query text is, and whether their queries are optimized. All of that I can pull from QUERY_HISTORY for the last 30 days. If I run this right now in front of you, none of my queries runs for more than 60 seconds; every query finishes in under 60 seconds. If I change the threshold, these ones are about 225 seconds, close to 3 minutes, and this one close to 2 minutes, so every query that runs longer than 60 seconds shows up here, and that is what you have to analyze.

Then from AUTOMATIC_CLUSTERING_HISTORY you get the entire re-clustering picture: the start and end time, how many total seconds it ran, and the total amount you have spent after re-clustering. Have a look: select star from the usage view and it gives you every piece of information, which is why I have given you every bit of code end to end; if you wrote this on your own it would take you weeks and you would hit plenty of syntax errors, and none of this code has any errors. You see the start time, the end time, the total seconds broken into hours, minutes and seconds, how many credits you used and even the amount: the entire biography and geography of your table, whether re-clustering was required and how much data has been re-clustered. It shows zero here because nothing has been re-clustered, since I am not running it. How much have you spent, and why 3.3 dollars? Because Snowflake charges about 3.3 per credit on the Enterprise edition, so whatever credits you have used, multiply by 3.3 and you get the total amount you are spending in USD.

What are the actions Manisha asked about to improve clustering, and how do you adjust the clustering key? If your business scenario changes, you need to evaluate again and alter your table to cluster by the new columns.
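The two monitoring queries described above can be sketched against the ACCOUNT_USAGE views; the 60-second threshold, the 30-day window and the 3.3-per-credit figure are the ones quoted in the walkthrough, not universal constants.

```sql
-- Queries that ran longer than 60 seconds in the last 30 days.
SELECT query_id,
       user_name,
       warehouse_name,
       total_elapsed_time / 1000 AS elapsed_seconds,  -- column is in milliseconds
       query_text
FROM   snowflake.account_usage.query_history
WHERE  start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
  AND  total_elapsed_time > 60 * 1000
ORDER  BY total_elapsed_time DESC;

-- Automatic clustering activity and credits, with a rough USD estimate
-- using the 3.3-per-credit rate mentioned above.
SELECT table_name,
       start_time,
       end_time,
       credits_used,
       num_rows_reclustered,
       num_bytes_reclustered,
       credits_used * 3.3 AS approx_usd
FROM   snowflake.account_usage.automatic_clustering_history
ORDER  BY start_time DESC;
```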
Get more for account invoice for engineering
- General Contractor Proposal Software for Sales
- General Contractor Proposal Software for Support
- General Contractor Proposal Software for Accounting
- General Contractor Proposal Software for Research and Development
- General Contractor Proposal Software for Management
- General Contractor Proposal Software for Administration
- General Contractor Proposal Software for Customer Service
- General Contractor Proposal Software for Customer Support
Find out other account invoice for engineering
- Easily PDF sign and send online with airSlate SignNow
- Streamline your document management with Mac digital ...
- Add online signature to PDF free and streamline your ...
- Experience certified electronic signing of PDF with ...
- Easily apply a signature to PDF documents with airSlate ...
- Discover seamless access with Visme login
- Convert PDF to Word free without registration
- Discover the best contract signing platform for your ...
- Collect online autograph with airSlate SignNow for ...
- Discover cloud e-signing for documents that transforms ...
- Experience seamless digital agreements with smallpdf ...
- Attach a signature to a PDF for free with airSlate ...
- Unlock seamless signing with Google Workspace ...
- Easily validate a document electronically for seamless ...
- Simplify your documents with the Word signature section
- Easily sign your Excel file with airSlate SignNow
- Easily attach signature to PDF free online with ...
- Streamline your Google Workspace document signing with ...
- Applying signature to PDF made effortless with airSlate ...
- Easily add a virtual signature in Microsoft Word