Solving the Staging Data Problem
How to take useless testing data and turn it into one of the biggest assets for developer productivity.
The act of staging, whether in plays, real estate, or software development, has been an industry practice for many decades. Staging methods have evolved over the years and continue to play a critical role in their associated fields. When it comes to software development, staging is the environment where software is tested end-to-end before it reaches customers. It is a near-exact replica of a production environment, requiring a copy of the same configurations of hardware, servers, and databases (basically the whole nine yards) but on a smaller scale.
However, many organizations have been facing problems with the quality of data available in test environments in recent years. At eBay, we faced a similar challenge. Our staging data was unusable. It lacked quality and, as a result, the associated quantity. Prolonged misuse resulted in the data being useless and stale. Yes, there were terabytes of data, but they were not relevant for most test cases; screenshots taken in early 2020 showed an example.
All organizations have a software release pipeline. The goal of a pipeline is to ensure that when the software eventually reaches production, it is thoroughly tested and ready to serve customers. A vital component of that is integration testing. We need to ensure that the system works end-to-end in all possible scenarios with zero compromises. Developers should not have any impediments in writing and running these integration test suites. To do this, we need high-quality test data. However, the constant struggle to keep it up to date has raised the question, "Do we really need to test in staging?" We explored this question in detail and tried to answer it in a two-part series titled "The Staging Dichotomy." Coming back to the data problem, let us look into the various options available if we decide to skip staging.
Options
One option would be to create a separate zone in production that is not exposed to the public and is open only to internal eBay traffic. Developers can deploy the upcoming software in this zone and run their entire suite of integration test cases before deploying to production. We indeed have a zone like this at eBay, and it is called pre-production. The issue here, though, is that the data source behind pre-production is the same as production. This means all your test data creation and mutation happen alongside production data. When we tried this in the past, it ended up being an analytics nightmare, where the continuous runs skewed production metrics. Creating a "test" versus "customer" metrics dimension helped a little. However, the data corruption ran deep into production databases and became a real issue. Even with data teardown being part of the test suites, the massive scale of integration tests run continuously across the entire marketplace can flip the production data store into an egregious state.
Another option is to build sufficient confidence in our unit and functional testing and directly test in production with actual users.
Context Matters
Both of the above approaches have a key limitation, and this is where context matters. eBay is an e-commerce platform. Transactions are essential to everything we do. Furthermore, when there are transactions, there are payments involved. We are talking about actual items, transacting between genuine sellers and buyers with real money. The margin of error has to be minuscule. It is just not possible to execute all your test cases in production. Even if we start with a tiny amount of traffic, we need to ensure that all the dependent services work harmoniously to keep the transactions accurate. These services are also rapidly changing. The assumption that they will just work when put together in production is not worth the risk, especially when payment is involved, even in the smallest quantities.
These conditions apply to the majority of eBay use cases. Others may not see this as a limitation. They can skip staging, directly deploy to production as a canary, and ramp up traffic. The whole flow can work seamlessly. Even within eBay, a few domains follow this model of bypassing staging and directly canary testing in production. However, they are restricted to read-only use cases. The rest still need staging to build confidence.
We reached a consensus that a staging environment is indeed needed. But how do we fix the data problem?
Data
A common and well-established idea proposed to address data issues is to create quality data in large quantities before executing the test cases and tear it down once done. Most organizations have well-defined APIs to create data; why not leverage them? In reality, though, this is easier said than done.
Again, context matters here. It is almost impossible at eBay to create the millions of permutations and combinations of listings required to execute thousands of test cases across the marketplace. You can create monotonous data in large quantities. However, creating a listing with multiple SKUs, with each SKU having a valid image, reserve price, 30-day return policy, and an international shipping offer with an immediate payment option, can quickly get out of hand. There is no straightforward API to create listings like this, and we need them to automate many of our Priority one (P1) use cases.
We have tried this many times in the past, and it did not work. There has to be a better way. We had to look at it from a different perspective. An idea emerged, which now, in retrospect, seems quite obvious.
"Take a subset of production data and move it to staging in a privacy-preserving manner."
eBay has 1.5 billion listings in production. But a tiny subset of the listings (0.1%, which is still 1.5 million of them), along with their dependency graph, should be sufficient to execute all the test cases confidently. We have to make sure that the subset is well-distributed to cover the breadth of eBay inventory. The production criteria naturally yield high-quality data. But the most important thing to us was privacy.
At eBay, we take privacy very seriously. It has been a core pillar since the very beginning. Fortunately for us, a listing and most of its associated attributes are public data. The seller's and buyer's Personally Identifiable Information (PII), along with a few item aspects like reserve price, max bid price, etc., have to be anonymized and privacy preserved. To build this pipeline, we partnered with Tonic.ai, a privacy company that does exactly this.
At a high level, our pipeline looks like this.
The boxes labeled "Tonic" were developed by the Tonic team. Though many boxes appear in the pipeline, only a few components are vital to the workflow.
Subsetting
At eBay, everything starts with a listing. The goal of subsetting is twofold: identify the listing IDs that are required to execute all our test cases, and plot a course to fetch all the required and auxiliary data associated with those listings. To begin with, we took one domain (the item page) and extracted all the regression test cases necessary to confidently certify a release to production. It included even the rare and complex data scenarios. From those test cases, we formulated a set of SQL queries that ran against our Hadoop clusters. The queries included listings from all sites and across all categories, based on hundreds of item and user flags. The final output is a list of unique listing IDs that specifically target the domain test cases.
The above-extracted item IDs are fed as input to the subsetter. The job of the subsetter is to plot a course, starting with the main item table. To do that, we topologically sort the tables to map their dependencies. Next, using the curated IDs as queries, the algorithm goes upstream of the target table to fetch the optional auxiliary data (e.g., the bids on an item; optional because an item can have no bids), followed by going downstream to fetch all required data (these are mandatory; e.g., a bid must have an item). Once all downstream requirements are satisfied, the subsetting is complete. We call this referential integrity.
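The walk described above can be sketched in a few lines of Python. This is a hypothetical in-memory model, not eBay's actual schema or the subsetter's real implementation: table names, the `id` key on every row, and the foreign-key list are all illustrative. Rows that reference a selected row (such as bids on an item) are optional extras, while the parent rows each selected row references are mandatory, which is what preserves referential integrity.

```python
from collections import deque

def subset(tables, fks, seed_table, seed_ids):
    """Select the seed rows plus their dependency graph.

    tables: name -> list of row dicts, each with an "id" key.
    fks:    list of (child_table, child_col, parent_table) edges.
    """
    picked = {name: set() for name in tables}
    picked[seed_table] |= set(seed_ids)
    queue = deque((seed_table, i) for i in seed_ids)
    while queue:
        table, row_id = queue.popleft()
        row = next(r for r in tables[table] if r["id"] == row_id)
        for child, col, parent in fks:
            # Optional upstream data: rows in child tables that reference
            # the selected row (e.g., bids on a selected item).
            if parent == table:
                for r in tables[child]:
                    if r[col] == row_id and r["id"] not in picked[child]:
                        picked[child].add(r["id"])
                        queue.append((child, r["id"]))
            # Required downstream data: the row this one references
            # (e.g., every bid must have its item, every item its seller).
            if child == table and row[col] not in picked[parent]:
                picked[parent].add(row[col])
                queue.append((parent, row[col]))
    return picked

# Toy schema: items reference their seller; bids reference their item.
tables = {
    "items": [{"id": 1, "seller_id": 10}, {"id": 2, "seller_id": 11}],
    "users": [{"id": 10}, {"id": 11}],
    "bids":  [{"id": 100, "item_id": 1}, {"id": 101, "item_id": 2}],
}
fks = [("items", "seller_id", "users"), ("bids", "item_id", "items")]
picked = subset(tables, fks, "items", [1])
# Seeding with item 1 pulls in its optional bid and its required seller,
# but nothing belonging to item 2.
```

The same shape applies at scale: the seed set is the curated listing IDs, and the traversal order comes from the topological sort of the tables.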
Anonymization
Once a set of production tables is identified from which data will be copied, the workflow alerts our data security and privacy teams, and the pipeline is halted. It is a deliberate step to ensure that none of the data leaves the production zone without the review and approval of our security and compliance systems. It only happens when a new table is recognized or an existing table is modified, so our daily runs (explained below), configured only with previously approved tables, are mostly uninterrupted. There is a set of PII-related columns within a table that are flagged by default to be anonymized.
The first step for our data security team is to go over them and flag more columns as appropriate. They have their own set of criteria based on international compliance rules and policies, which may not seem obvious just by looking at the data. This process flagged approximately 27% of our columns.
The second step is to take a sample of the anonymized data and verify that the standards are met. The data security team's process is a mix of both automation and manual verification. Since the process is triggered only for new tables, it was not a hindrance. Establishing this tight feedback loop and the stop-when-in-doubt setup helped us ensure that privacy is always preserved.
The technical novelty of the anonymization is important to highlight. We cannot apply some random encryption to anonymize the data. We need Format-Preserving Encryption (FPE), where data in one domain maps to the same domain after encryption, and it should not be reversible (e.g., encrypting a 16-digit credit card number yields another 16-digit number). In eBay's context, this becomes very critical; otherwise, most test cases would fail. Using a Feistel network and cycle walking, we can create a bijection between any domain and itself, e.g., the domain of 16-digit credit card numbers.
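A minimal sketch of the Feistel-plus-cycle-walking idea in Python. Everything here is illustrative rather than eBay's or Tonic's implementation: the round function is derived from SHA-256, the key and round count are arbitrary, and 54 bits is simply the smallest power-of-two width that covers all 16-digit numbers (2^54 > 10^16). The Feistel network is a bijection on [0, 2^54); cycle walking re-applies it until the output lands back inside the 16-digit domain, which keeps the overall mapping a bijection on that domain.

```python
import hashlib

def round_fn(half: int, key: bytes, rnd: int, half_bits: int) -> int:
    """Pseudo-random Feistel round function built from SHA-256 (illustrative)."""
    digest = hashlib.sha256(key + bytes([rnd]) + half.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") % (1 << half_bits)

def feistel(value: int, key: bytes, bits: int = 54, rounds: int = 8) -> int:
    """A balanced Feistel network: a bijection on [0, 2**bits)."""
    half_bits = bits // 2
    mask = (1 << half_bits) - 1
    left, right = value >> half_bits, value & mask
    for r in range(rounds):
        left, right = right, left ^ round_fn(right, key, r, half_bits)
    return (left << half_bits) | right

def anonymize_card(card: int, key: bytes) -> int:
    """Cycle-walk the permutation until it re-enters the 16-digit domain."""
    out = feistel(card, key)
    while out >= 10 ** 16:  # outside the domain: apply the permutation again
        out = feistel(out, key)
    return out
```

Because the underlying permutation is a bijection, cycle walking always terminates and never maps two distinct card numbers to the same output. Production-grade FPE schemes such as NIST's FF1 follow the same construction with a proper block cipher as the round function.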
Merging and Post-Processing
The anonymized data moves from the production zone to the staging zone, adhering to all our firewall protocols. Now comes the merger, whose main responsibility is to insert the subsetted, anonymized production data into the corresponding staging tables. In the actual implementation, there is much more nuance to it. For example, remapping previously migrated sellers to their new items is a complex and costly endeavor. A good side effect of the merger is that it helps identify schema differences between staging and production tables, which did exist due to prolonged staging misuse.
The pipeline does not stop at the merger. There is one more important step, which we call the "Postdump Processor." Once the data is inserted into the staging tables, this component fires a series of events. The goal is to orchestrate a sequence of jobs to propagate and normalize the newly migrated data throughout the staging ecosystem. The Postdump Processor includes tasks like notifying the search engine to index the new listings; mapping items to existing products; uploading listing images to staging servers and updating endpoints; using the staging salt to hash user credentials; and a few more. We piggybacked on most of the existing async events triggered when an item is listed on eBay. A few new ones were created just for the pipeline use cases. This post-processing step is what makes the data relevant.
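At its core, the Postdump Processor is an ordered event dispatch that runs after the merge. A toy sketch follows; the handler names and audit strings are invented for illustration, since the real system piggybacks on eBay's existing async listing events rather than a simple in-process loop.

```python
class PostdumpProcessor:
    """Fire a fixed sequence of normalization jobs for each migrated item."""

    def __init__(self):
        self._handlers = []

    def register(self, handler):
        """Decorator that adds a job to the sequence."""
        self._handlers.append(handler)
        return handler

    def process(self, item):
        # Run every job in registration order; return a simple audit trail.
        return [handler(item) for handler in self._handlers]

pipeline = PostdumpProcessor()

@pipeline.register
def index_in_search(item):
    return f"indexed {item['id']}"

@pipeline.register
def map_to_product(item):
    return f"mapped {item['id']} to a catalog product"

@pipeline.register
def rewrite_image_urls(item):
    return f"pointed {item['id']} images at staging hosts"
```

Calling `pipeline.process({"id": 42})` runs each job in order, which mirrors how a freshly merged listing is pushed through indexing, product mapping, and image rewrites before it becomes usable test data.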
Discovery and Feedback Loop
Now that high-quality data was made available in staging, a way to exclusively query it for all automation needs became paramount. We have existing APIs to fetch items, users, orders, transactions, etc. However, all of them were built with a customer and business intent in mind, not with how developers or quality engineers would use them in their automation scripts. Just like the difficulties of using existing APIs for data creation, there is no straightforward way, for example, to fetch a bunch of items that have more than 10 SKUs and 40 images. It becomes an arduous process. To solve this, we created a Discovery API and UI tool (codenamed Serendipity), which makes it seamless to integrate with all automation scripts. The API only queries the migrated data, which is watermarked with a special flag during migration. The filters in the API are targeted toward how engineers write test cases, without worrying about entity relationships or microservice decoupling.
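The value of the Discovery API is that filters map to how a test author thinks ("more than 10 SKUs, at least 40 images") and apply only to watermarked rows. A hypothetical sketch of that query shape; the field names and the `migrated` watermark flag are invented for illustration and are not Serendipity's actual schema.

```python
def discover(listings, *, min_skus=0, min_images=0, limit=10):
    """Return migrated listings matching test-author-style filters."""
    hits = [
        l for l in listings
        if l.get("migrated")               # watermark set during migration
        and len(l["skus"]) >= min_skus
        and l["image_count"] >= min_images
    ]
    return hits[:limit]

listings = [
    {"id": 1, "migrated": True,  "skus": ["s"] * 12, "image_count": 41},
    {"id": 2, "migrated": True,  "skus": ["s"],      "image_count": 5},
    {"id": 3, "migrated": False, "skus": ["s"] * 12, "image_count": 41},
]
hits = discover(listings, min_skus=11, min_images=40)
# Only listing 1 qualifies: listing 2 is too small, and listing 3 is
# production-native data without the migration watermark.
```

A test script can then ask for exactly the data shape its case needs, instead of stitching together several business-facing APIs.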
The final aspect of the pipeline is to create a healthy feedback loop between its two ends: production ID curation and staging discovery. The way we achieved it is by adding observability to the Discovery API. When a fetch returns zero or few results, it immediately signals the dataset curation system to migrate those items in the following pipeline run by executing the corresponding production SQLs. Similarly, when new production requirements come in, developers can request those filters in discovery, which translates to SQLs on the curation side. A self-serving pipeline, indeed.
Expansion
What started as a proof of concept with one domain, 11 tables, and a few thousand items has expanded to the whole marketplace. Today, we have over a million high-quality listings in staging, along with their associated upstream/downstream dependencies. They serve the automation needs of a majority of our applications. Every day, 25,000 items/orders are migrated from production to staging, and the data is spread across 200+ tables, 7,000 categories, and 20 different DB hosts. Starting this year, we expanded the pipeline to NoSQL databases. This includes MongoDB, Cassandra, Couchbase, and eBay's open-sourced NoSQL offering Nudata. The pipeline architecture is the same for NoSQL, with the curated listing IDs used as keys for subsetting.
The pipelines themselves are parallelized at a macro level (multiple pipelines running on dedicated machines, creating redundancy on failures) and at a micro level (each component is multi-threaded when possible to execute faster). The pipeline runs every four hours and, on average, takes 65 minutes end-to-end. We have dedicated pipelines for migrating new tables, so they do not affect daily runs. Purging happens on item expiry, similar to production. There are also daily purge jobs to clean up the auxiliary data.
Not all test cases will be covered by data migration. There are use cases for which data creation is required. They involve transient data, primarily associated with users. Migrating data for these scenarios would be an unnecessary overhead. For this, we have created a new tool called the data creation platform, which again integrates seamlessly with automation scripts.
Virtuous Cycle
The presence of high-quality test data enabled us to create a virtuous flywheel. Now that application teams were empowered with good data, we set a high staging availability goal for them to maintain. This created a cycle that is continuously improving.
As outlined in the diagram above, the three building blocks of the staging flywheel keep each other in check and strive for improvement. The high staging availability goal becomes a motivation to keep the staging infrastructure reliable and stable. Subsequently, for the infra to be fully useful, it requires quality data, which in turn becomes a motivation for the data pipeline to stay on top of it. And with high-quality data, functional availability is improved even further. The cycle continues.
Wrapping Up
Today, 90% of all automated integration testing happens on staging. The pass rate is at 95%, compared to only 70% in 2020. Flaky tests are a big frustration point in software development. Even a minor improvement can have a multiplier effect, and we saw that with the jump in the pass rate. Also, in the past, teams pushed code to production with less conviction and executed a considerable number of sanity tests directly in production to validate functionality. Now that reluctance is gone. Only around 5% of sanity testing happens in production, with the rest becoming integration tests executed on staging. Speaking of release velocity, we reduced our native app (iOS and Android) release cycles from three weeks to one week, and staging was a fundamental enabler in achieving that.
It has been more than a year's journey to get to where we are today. More than that, it has opened endless possibilities, and that is what keeps us excited. The staging data team's commitment is to continue evolving the virtuous cycle and, in that process, discover more opportunities to improve developer productivity.
We now believe that we have found a stable solution to the test data problem, and the impact has been profound.
About the Author
Senthil Padmanabhan is a Vice President and Technical Fellow at eBay, where he heads user experience and developer productivity engineering across eBay's marketplace. Since joining eBay, Senthil has led several critical initiatives that were transformational to eBay's technology platform. He is the creator and implementer of many libraries, UI systems, and frameworks used throughout the eBay code base. His leadership has paved the way for rapid software development and innovation across the organization, and he was recognized as one of the 2022 honorees for the Silicon Valley Business Journal's 40 under 40.
Source: https://insidebigdata.com/2022/04/15/solving-the-staging-data-problem/