GitHub - containerd/accelerated-container-image: A production-ready remote container image format (overlaybd) and snapshotter based on block-device.
Accelerated Container Image
Accelerated Container Image is an open-source implementation of the paper "DADI: Block-Level Image Service for Agile and Elastic Application Deployment" (USENIX ATC '20).
DADI (Data Accelerator for Disaggregated Infrastructure) is a container-acceleration solution, including remote image support and other features, that has been widely used in Alibaba and Alibaba Cloud. It has already been integrated into Alibaba Cloud Registry (ACR) and Alibaba serverless services (FC ("FaaSNet", USENIX ATC '21) / SAE / ECI, etc.), which sit in the Forrester leader quadrant.
At the heart of the acceleration is overlaybd, a new remote image format based on block devices. The overlaybd backstore provides a merged view of a sequence of block-based layers in userspace and exposes it as a virtual block device through TCMU.
It accelerates containers by fetching image data on demand, with no need to download and unpack the whole image before starting a container. With the overlaybd image format, a container can cold-start almost instantly.
The key features are:
High Performance
It's a block-device-based storage format for OCI images, which has much lower complexity than filesystem-based implementations. For example, cross-layer hardlinks and metadata-only commands such as chown are very complex for filesystem-based images without copy-up, but are natively supported by overlaybd. Overlaybd outperforms filesystem-based solutions in performance; evaluation data is presented in the DADI paper.
High Reliability
Overlaybd exposes virtual block devices through TCMU, which is widely used and supported by most operating systems. The overlaybd backstore can recover from failures or crashes, which is difficult for FUSE-based image formats.
Native Support for Writable
Overlaybd can be used as a writable/container layer. It can replace the overlayfs upper layer as the container layer at runtime, or be used to build overlaybd images.
Multiple File Systems Supported
Overlaybd outputs virtual block devices, which can be formatted with multiple file systems, so users can conveniently choose the file system that best fits their needs.
Accelerated Container Image is a non-core sub-project of containerd.
Components
overlaybd - Native
Overlaybd provides a merged view of a sequence of block-based layers as a virtual block device in user space.
overlaybd-snapshotter
It is a containerd snapshotter plugin for overlaybd images. The snapshotter is compatible with OCI images as well as with the overlayfs snapshotter.
embedded image-convertor
We provide a modified CLI tool (ctr) to facilitate image pulls and custom conversion from the traditional OCI tarball format to the overlaybd format.
The convertor supports layer deduplication, which avoids re-converting layers that are shared across image conversions.
standalone userspace image-convertor
The standalone userspace image-convertor has similar functionality to the embedded image-convertor but runs entirely in userspace. It requires no root privilege and no dependency on TCMU, configfs, the snapshotter, or even containerd, which makes it much more convenient to run in a container.
What's more, the standalone userspace image-convertor is faster than the embedded image-convertor when used with our customized libext2fs. See USERSPACE_CONVERTOR for more details.
buildkit for overlaybd (Experimental)
It is a customized buildkit for overlaybd images. It fetches base-image data on demand instead of pulling the whole image, and uses an overlaybd writable layer to build new layers.
overlaybd - turboOCIv1
It is an overlaybd-based remote image format that turns an original OCI image into a remote image without conversion. It is similar to SOCI, but provides a block-device interface, which has advantages over FUSE-based formats in performance and stability.
Getting Started
QUICKSTART helps you quickly run an overlaybd image and covers basic usage.
See how to set up the overlaybd backstore at README.
See how to build the snapshotter and ctr plugin components at BUILDING.
After building or installing, see our EXAMPLES for how to run an accelerated container; see EXAMPLES_CRI if you run containers via k8s/CRI.
See the PERFORMANCE test about the acceleration.
See how to convert an OCI image into overlaybd with a specified file system at MULTI_FS_SUPPORT.
See how to use layer deduplication for image conversion at IMAGE_CONVERTOR.
See how to use overlaybd writable layer at WRITABLE.
See how to use Prometheus to monitor metrics like latency/error count of snapshotter GRPC APIs at PROMETHEUS.
See how to use TurboOCIv1 at TurboOCIv1.
Contributions are welcome! See CONTRIBUTING.
Release Version Support
A converted image carries the annotation containerd.io/snapshot/overlaybd/version in its manifest to specify the format version. The overlaybd release required for each format version is:
0.1.0: supported by all overlaybd release versions.
0.1.0-turbo.ociv1: overlaybd >= v0.6.10
Overview
With the OCI image spec, an image layer blob is saved as a tarball on the registry, describing the changeset relative to its previous layer. However, a tarball is not designed to be seekable, and random access is not supported, so all blobs must be downloaded in full before a container can be brought up.
An overlaybd blob is a collection of modified data blocks under the filesystem, corresponding to the files added, modified, or deleted by the layer. The overlaybd backstore provides the merged view of the layers as a virtual block device. A filesystem is mounted on top of the device, so an overlaybd blob can be accessed randomly and natively supports on-demand reading.
The raw data of block differences, together with an index into the raw data, constitute the overlaybd blob. When attaching and mounting an overlaybd device, only the index of each layer is loaded from the remote and stored in memory. For a data read, overlaybd performs a range lookup in the index to find where in the blob to read, and then performs a remote fetch. The blob is stored in ZFile format.
ZFile is a new compression file format that supports seekable decompression, which reduces storage and transmission costs. ZFile also stores checksum information to protect against data corruption during on-demand reading. To stay compatible with existing registries and container engines, a ZFile is wrapped in a tar file that contains only that single ZFile.
Overlaybd connects to applications through a filesystem mounted on a virtual block device. Overlaybd is agnostic to the choice of filesystem, so users can select one that best fits their needs. I/O requests go from the application to a regular filesystem such as ext4, from there to the loopback device (through TCM_loopback), and then to the userspace overlaybd backstore (through TCMU). Backend reads always operate on layer files. Some layer files may already have been downloaded, so those reads hit the local filesystem; other reads are directed to the registry, or hit the registry cache. Write and trim operations are handled by the overlaybd backstore, which writes the data and index files of the writable layer to the local filesystem. For more details, see the paper.
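The per-layer index lookup described above can be sketched as follows. This is a simplified, hypothetical illustration (not overlaybd's actual code): each layer's index maps sorted, non-overlapping virtual block ranges to offsets in the layer blob, and a read binary-searches for the covering extent; a miss means the range is a hole in this layer and the next lower layer must be consulted.

```python
import bisect

# Hypothetical per-layer index: (virtual_offset, length, blob_offset),
# sorted by virtual_offset, with non-overlapping ranges.
index = [
    (0,    4096, 0),       # virtual bytes 0-4095 live at blob offset 0
    (8192, 4096, 4096),    # virtual bytes 8192-12287 live at blob offset 4096
]
starts = [entry[0] for entry in index]

def lookup(virtual_offset):
    """Return the blob offset holding virtual_offset, or None if this layer has a hole there."""
    i = bisect.bisect_right(starts, virtual_offset) - 1
    if i < 0:
        return None
    start, length, blob_off = index[i]
    if virtual_offset < start + length:
        return blob_off + (virtual_offset - start)
    return None  # hole: fall through to the next lower layer in the sequence

print(lookup(100))   # 100
print(lookup(5000))  # None -> consult the next layer
```

A merged view over several layers would simply try each layer's `lookup` from top to bottom and serve the read from the first layer that covers the range.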
Communication
For async communication and long-running discussions, please use issues and pull requests on the GitHub repo. That is the best place to discuss design and implementation.
For sync communication, catch us in the #overlaybd channel on the Cloud Native Computing Foundation's (CNCF) Slack, cloud-native.slack.com. Everyone is welcome to join and chat. Get an invite to the CNCF Slack.
Licenses
Accelerated Container Image is released under the Apache License, Version 2.0.
GitHub - dportik/dadi_pipeline: An accessible and flexible tool for fitting demographic models with dadi using custom or published models (available here), conducting goodness of fit tests, and plotting.
Overview of Page Contents
Purpose
Version
Dadi Pipeline Overview
Optimization Routine Overview
Examples of Usage
Test Data
Outputs
Designating Folded vs. Unfolded Spectra
Default Optimization Routine Settings
Why Perform Multiple Rounds of Optimizations?
My Analysis Crashed! What Now?
Reporting Bugs/Errors
Caveats
Citation Information
Publications
License
Contact
Purpose
Perform demographic model optimizations and comparisons with this accessible and flexible tool called dadi_pipeline.
This tool is designed to work with the Python package dadi and assumes you already have it installed. You'll need to be familiar with how dadi works and with the basic syntax for writing dadi scripts in Python. A good resource for all dadi-related questions is the dadi user group. Before attempting to use these scripts, read over the dadi user manual and try running the program with the example files.
The most current version of this pipeline is designed to run in Python 3 and requires the Python 3 version of dadi (tested for v2.1+). You should use the most current versions of dadi and dadi_pipeline, and run them using Python 3.
Version
The current version of dadi_pipeline is v3.1.6, and it has been tested with dadi v2.1+. Please ensure you have the latest versions of both dadi_pipeline and dadi installed.
What's new in version 3.1+?
Version 3.1.6:
Added the dadi-test-projections.py module to examine the number of segregating sites resulting from all combinations of projection sizes: Find-Best-Projections.
Version 3.1.5:
Added 15 new 3D models that were developed for Firneno et al. 2020. Included is a new class of models involving an admixed (or "hybrid") origin for the third population. All new models are fully integrated in the Three_Population_Pipeline.
Version 3.1.4:
Added option to select which optimizer to use. Includes the following functions from the Inference.py module in dadi: optimize_log (BFGS method), optimize_log_lbfgsb (L-BFGS-B method), optimize_log_fmin (Nelder-Mead method), and optimize_log_powell (Powell's method). For usage details please see section: Default Optimization Routine Settings.
Version 3.1.2:
Issue causing crash when referencing optimized params list is fixed. Compatible with dadi v2.0.5.
Version 3.1.1:
The 2D island model set has been revised and updated, fixing issues with parameter assignment.
Version 3.1.0:
All scripts have been upgraded to Python 3 (tested with 3.7), allowing compatibility with dadi v2.0.
Prints parameter labels and perturbed starting parameters to screen for each replicate. This allows quick visual comparisons of the initial and final optimized parameters among replicates.
Dadi Pipeline Overview
The main repository of dadi_pipeline contains a general-use script (dadi_Run_Optimizations.py) that can be used to run dadi to fit any model to an allele frequency spectrum/joint site frequency spectrum containing one to three populations. The script performs the general optimization routine proposed by Portik et al. (2017) and produces the associated output files. To use this workflow, you'll need a SNPs input text file to create an allele frequency or joint site frequency spectrum object. Alternatively, you can import a frequency spectrum of your own creation, editing the script appropriately (see the dadi manual). You will have to edit information about your allele frequency spectrum; a #************** marks each line in dadi_Run_Optimizations.py that must be edited. Any custom model can be used, and below are several examples of how to use various arguments to control the model optimizations.
The dadi_Run_Optimizations.py script and Optimize_Functions.py script must be in the same working directory to run properly.
If you'd like to use the optimization routine of this script to analyze larger sets of published 2D or 3D models, please look in the nested repositories (Two_Population_Pipeline, Three_Population_Pipeline). These are essentially modified versions of the dadi_Run_Optimizations.py script that are designed to perform the optimization routine across the available 2D or 3D models.
There are a considerable number of 2D models that can be selected from, and many 3D models too. A visual depiction of these models can be found in the Models_2D.pdf and the Models_3D.pdf files.
If you'd like to assess the goodness of fit for your demographic model, please look in the Goodness_of_Fit repository.
If you'd like to create a figure comparing the empirical SFS and model SFS for a demographic model (with residuals), please look in the Plotting repository.
For information on how to cite dadi_pipeline, please see the Citation section at the bottom of this page.
Optimization Routine Overview
The dadi_Run_Optimizations.py script and the associated 2D and 3D population pipelines are components of dadi_pipeline that were each designed to implement the optimization routine proposed by Portik et al. (2017). This optimization routine fits the model with particular settings for a given number of replicates, then uses the parameters from the best-scoring replicate to seed a subsequent round of model fitting with updated settings. This process occurs across multiple rounds, which improves the log-likelihood scores and generally results in convergence in the final round.
In the dadi_Run_Optimizations.py script, the optimization routine contains a user-defined number of rounds, each with a user-defined or default number of replicates. The starting parameters are initially random, but after each round completes, the parameters of the best-scoring replicate from that round are used to generate perturbed starting parameters for the replicates of the subsequent round. The arguments controlling the steps of the optimization algorithm (maxiter) and the perturbation of starting parameters (fold) can be supplied by the user for more control across rounds. The user can also supply their own set of initial parameters, or set custom bounds on the parameters (upper_bound and lower_bound) to meet specific model needs. As of v3.1.4, the user can also choose which optimizer to use. This flexibility should make these scripts generally useful for fitting any model to any data set.
Examples of Usage
Let's assume you've supplied the correct information about your SNPs input file, population IDs, and projection sizes, and are using the model in the script (sym_mig).
I will show several ways to use the main function for model fitting to highlight different options.
We will always use the following function from the Optimize_Functions.py script, which requires some explanation:
Optimize_Routine(fs, pts, outfile, model_name, func, rounds, param_number, fs_folded, reps=None, maxiters=None, folds=None, in_params=None, in_upper=None, in_lower=None, param_labels=None, optimizer="log_fmin")
Mandatory Arguments:
fs: spectrum object name
pts: grid size for extrapolation, list of three values
outfile: prefix for output naming
model_name: a name to help label the output files - ex. "no_mig"
func: access the model function from within dadi_Run_Optimizations.py or from a separate python model script, ex. after importing Models_2D, calling Models_2D.no_mig
rounds: number of optimization rounds to perform
param_number: number of parameters in the model selected (can count in params line for the model)
fs_folded: A Boolean value indicating whether the empirical fs is folded (True) or not (False)
Optional Arguments:
reps: a list of integers controlling the number of replicates in each of the optimization rounds
maxiters: a list of integers controlling the maxiter argument in each of the optimization rounds
folds: a list of integers controlling the fold argument when perturbing input parameter values
in_params: a list of parameter values
in_upper: a list of upper bound values
in_lower: a list of lower bound values
param_labels: list of labels for parameters that will be written to the output file to keep track of their order
optimizer: a string, to select the optimizer. Choices include: "log" (BFGS method), "log_lbfgsb" (L-BFGS-B method), "log_fmin" (Nelder-Mead method; the default), and "log_powell" (Powell's method).
The mandatory arguments must always be included when using the Optimize_Routine function, and the arguments must be provided in the exact order listed above (also known as positional arguments). The optional arguments can be included in any order after the required arguments, and are referred to by their name, followed by an equal sign, followed by a value (example: reps = 4). The usage is explained in the following examples.
Example 1
Let's use the function to run an optimization routine for our data and this model.
We always need to specify the eight required arguments (in order) in this function, but there are other options we can use if we want more control over the optimization scheme. We'll start with the basic version here; the argument explanations are above. This would perform three rounds of optimizations, using a default number of replicates for each round. For details on these defaults, please see the Default Optimization Routine Settings section. At the end of each round, the parameters of the best-scoring replicate are used to generate the starting parameters for the replicates of the next round, and so on. This helps focus the parameter search space as the rounds continue.
#create a prefix to label the output files
prefix = "V1"
#make sure to define your extrapolation grid size
pts = [50,60,70]
Optimize_Functions.Optimize_Routine(fs, pts, prefix, "sym_mig", sym_mig, 3, 4, fs_folded=True)
Example 2
It is a good idea to include the labels of the parameters so they get written to the output file; otherwise, you'll have to go back to the model each time you want to see their order. Notice that the optional arguments require using the = sign to assign a variable or value to the argument.
prefix = "V2"
pts = [50,60,70]
#here are the labels, given as a string
p_labels = "nu1, nu2, m, T"
Optimize_Functions.Optimize_Routine(fs, pts, prefix, "sym_mig", sym_mig, 3, 4, fs_folded=True, param_labels = p_labels)
Example 3
Here is the same example but also including your own custom parameter bounds. Notice
the optional arguments can be placed in any order following the mandatory arguments.
prefix = "V3"
pts = [50,60,70]
p_labels = "nu1, nu2, m, T"
#Here are the custom bounds
upper = [20,20,10,15]
lower = [0.01,0.01,0.01,0.1]
Optimize_Functions.Optimize_Routine(fs, pts, prefix, "sym_mig", sym_mig, 3, 4, fs_folded=True, param_labels = p_labels, in_upper = upper, in_lower = lower)
Example 4
You can also be very explicit about the optimization routine, controlling what happens across each round. Let's keep the three rounds but change the number of replicates, the maxiter argument, and the fold argument each time. We'll need to create a list of values for each of these, each containing three values (to match the three rounds).
prefix = "V4"
pts = [50,60,70]
p_labels = "nu1, nu2, m, T"
upper = [20,20,10,15]
lower = [0.01,0.01,0.01,0.1]
#Here are the optional arguments
reps = [10,20,50]
maxiters = [5,10,20]
folds = [3,2,1]
Optimize_Functions.Optimize_Routine(fs, pts, prefix, "sym_mig", sym_mig, 3, 4, fs_folded=True, param_labels = p_labels, in_upper=upper, in_lower=lower, reps = reps, maxiters = maxiters, folds = folds)
Using these arguments, round one will have 10 replicates, use 3-fold perturbed starting parameters, and a maxiter of 5 for the optimization algorithm steps. Round two will have 20 replicates, use 2-fold perturbed starting parameters, and a maxiter of 10, and so on for round three.
Example 5
It's also a good idea to run the optimization routine multiple times. Let's write a short loop to run the above optimization routine five times, naming the prefix based on the iteration and including it within the loop. Note that Python's range function goes up to, but does not include, the final number; that's why a range of 1-6 performs this 5 times.
pts = [50,60,70]
p_labels = "nu1, nu2, m, T"
upper = [20,20,10,15]
lower = [0.01,0.01,0.01,0.1]
reps = [10,20,50]
maxiters = [5,10,20]
folds = [3,2,1]
for i in range(1,6):
prefix = "V5_Number_{}".format(i)
Optimize_Functions.Optimize_Routine(fs, pts, prefix, "sym_mig", sym_mig, 3, 4, fs_folded=True, param_labels = p_labels, in_upper=upper, in_lower=lower, reps = reps, maxiters = maxiters, folds = folds)
Test Data
In the folder labeled Example_Data you will find a SNPs input file that will run with the dadi_Run_Optimizations.py script.
You will only need to edit the path to the file in the script, and then you will be able to run all five examples above. The
outputs for these examples are also contained within the Example_Data folder, in a separate folder labeled Example_Outputs.
Please test the script using these data to ensure everything is working properly before examining your own empirical data.
Outputs
For each model run, there will be a log file showing the optimization steps per replicate and a summary file that has all the important information.
Here is an example of the output from a summary file, which will be in tab-delimited format:
Model Replicate log-likelihood AIC chi-squared theta optimized_params(nu1, nu2, m, T)
sym_mig Round_1_Replicate_1 -1684.99 3377.98 14628.4 383.04 0.2356,0.5311,0.8302,0.182
sym_mig Round_1_Replicate_2 -2255.47 4518.94 68948.93 478.71 0.3972,0.2322,2.6093,0.611
sym_mig Round_1_Replicate_3 -2837.96 5683.92 231032.51 718.25 0.1078,0.3932,4.2544,2.9936
sym_mig Round_1_Replicate_4 -4262.29 8532.58 8907386.55 288.05 0.3689,0.8892,3.0951,2.8496
sym_mig Round_1_Replicate_5 -4474.86 8957.72 13029301.84 188.94 2.9248,1.9986,0.2484,0.3688
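The AIC column can be reproduced directly from the log-likelihood and the number of model parameters (AIC = 2k - 2 ln L). A minimal sketch (the `aic` helper is illustrative, not part of dadi_pipeline):

```python
# AIC = 2k - 2*ln(L), where k is the number of model parameters
# and ln(L) is the replicate's log-likelihood.
def aic(log_likelihood, param_number):
    """Compute AIC from a replicate's log-likelihood and parameter count."""
    return 2 * param_number - 2 * log_likelihood

# sym_mig has 4 parameters (nu1, nu2, m, T); values from Round_1_Replicate_1 above
print(round(aic(-1684.99, 4), 2))  # 3377.98, matching the AIC column
```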
Designating Folded vs. Unfolded Spectra
Changing whether the frequency spectrum is folded or unfolded requires two changes in the script. The first is where the spectrum object is created, indicated by the polarized argument:
#Convert this dictionary into folded AFS object
#[polarized = False] creates folded spectrum object
fs = dadi.Spectrum.from_data_dict(dd, pop_ids=pop_ids, projections = proj, polarized = False)
The above code will create a folded spectrum. When calling the optimization function, this must also be indicated in the fs_folded argument:
#this is from the first example:
Optimize_Functions.Optimize_Routine(fs, pts, prefix, "sym_mig", sym_mig, 3, 4, fs_folded=True)
To create an unfolded spectrum, the polarized and fs_folded arguments in the above lines need to be changed accordingly:
#[polarized = True] creates an unfolded spectrum object
fs = dadi.Spectrum.from_data_dict(dd, pop_ids=pop_ids, projections = proj, polarized = True)
#and the optimization routine function must also be changed:
Optimize_Functions.Optimize_Routine(fs, pts, prefix, "sym_mig", sym_mig, 3, 4, fs_folded=False)
It will be clear if either argument has been misspecified because the calculation of certain statistics will cause a crash with the following error:
ValueError: Cannot operate with a folded Spectrum and an unfolded one.
If you see this, check to make sure both relevant arguments actually agree on the spectrum being folded or unfolded.
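The required agreement between the two flags can be expressed as a tiny sanity check. This helper is hypothetical (not part of dadi_pipeline): since polarized=False produces a folded spectrum, fs_folded should always be the opposite of polarized.

```python
# Hypothetical sanity check: polarized=False when building the spectrum means
# the data are folded, so fs_folded passed to Optimize_Routine should be
# the opposite of polarized.
def check_folding_flags(polarized, fs_folded):
    """Return True when the spectrum-construction and optimization flags agree."""
    return fs_folded == (not polarized)

print(check_folding_flags(polarized=False, fs_folded=True))  # True: consistent folded setup
print(check_folding_flags(polarized=True, fs_folded=True))   # False: mismatch, would crash
```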
Default Optimization Routine Settings
The optimization routine arguments offer a lot of flexibility, but the default settings can also be used. If only
the number of rounds is changed, here are the defaults for the optional arguments (reps, maxiters, folds)
based on the number of rounds selected:
Three rounds (as in Examples 1-3):

| Argument | Round 1 | Round 2 | Round 3 |
|----------|---------|---------|---------|
| reps     | 10      | 10      | 20      |
| maxiter  | 5       | 5       | 5       |
| fold     | 3       | 2       | 1       |
Two rounds:

| Argument | Round 1 | Round 2 |
|----------|---------|---------|
| reps     | 10      | 20      |
| maxiter  | 5       | 5       |
| fold     | 2       | 1       |
X Rounds (>3):

| Argument | Round 1 | Round 2 | Round 3 | Round X-1 | Round X |
|----------|---------|---------|---------|-----------|---------|
| reps     | 10      | 10      | 10      | 10        | 20      |
| maxiter  | 5       | 5       | 5       | 5         | 5       |
| fold     | 3       | 3       | 3       | 2         | 1       |
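The defaults in the tables above follow a simple pattern. Here is a hypothetical reconstruction of that pattern (this helper is not dadi_pipeline's actual code) for any number of rounds of two or more:

```python
# Hypothetical reconstruction of the default per-round settings shown above,
# for rounds >= 2; not part of dadi_pipeline itself.
def default_settings(rounds):
    """Return (reps, maxiters, folds) defaults for the given number of rounds."""
    reps = [10] * (rounds - 1) + [20]      # final round gets more replicates
    maxiters = [5] * rounds                # maxiter is always 5
    folds = [3] * (rounds - 2) + [2, 1]    # perturbation tightens toward the end
    return reps, maxiters, folds

print(default_settings(3))  # ([10, 10, 20], [5, 5, 5], [3, 2, 1])
```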
The default optimizer used is the Nelder-Mead method (Inference.py function optimize_log_fmin). This can be changed by supplying the optional optimizer argument
with any of the following choices:
log - Optimize log(params) to fit model to data using the BFGS method.
log_lbfgsb - Optimize log(params) to fit model to data using the L-BFGS-B method.
log_fmin - Optimize log(params) to fit model to data using Nelder-Mead. This is the default option.
log_powell - Optimize log(params) to fit model to data using Powell's method.
In Example 1 above, we could use the L-BFGS-B method instead by using the following command:
Optimize_Functions.Optimize_Routine(fs, pts, prefix, "sym_mig", sym_mig, 3, 4, fs_folded=True, optimizer="log_lbfgsb")
Or we could use Powell's method with the following command:
Optimize_Functions.Optimize_Routine(fs, pts, prefix, "sym_mig", sym_mig, 3, 4, fs_folded=True, optimizer="log_powell")
Why Perform Multiple Rounds of Optimizations?
When fitting demographic models, it is important to perform multiple runs and ensure that the final optimizations converge on a similar log-likelihood score. In the 2D, 3D, and custom workflows of dadi_pipeline, the starting parameters used for all replicates in the first round are random. After each round completes, the parameters of the best-scoring replicate from that round are used to generate perturbed starting parameters for the replicates of the subsequent round. This strategy of focusing the parameter search space improves the log-likelihood scores and generally results in convergence in the final round.
Below is a summary of the log-likelihood scores obtained using the default four-round optimization settings present in the 2D pipeline. This analysis was conducted for a particular model (nomig, the simplest 2D model) using the example data provided. You can clearly see the improvement in log-likelihood scores and decrease in variation among replicates as the optimization rounds progress.
For a compelling summary of scores across optimization rounds, please refer to Figure S4 provided in the supplementary material in the recent study by Rendon-Anaya et al. (2021) available here.
Please understand that a single execution of the pipeline can get stuck on a local optimum for any given model! This is why I strongly recommend running the pipeline multiple times for a given model. If several independent runs for this model each converge on similar log-likelihood scores in their final optimization rounds, you can be reasonably confident that the analyses are not getting trapped in local optima and that the maximum log-likelihood has been found.
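The round-based strategy described above can be sketched in miniature. The following is a toy Python stand-in, not dadi's actual optimizer: the hill-climb, the perturbation scheme, and the quadratic "log-likelihood" are all illustrative. It shows the essential shape of the routine, with random first-round starts and perturbed seeding of later rounds:

```python
import random

random.seed(2)  # reproducible toy run

def optimize(objective, start, maxiter):
    # Toy stand-in for one optimization replicate: a short random
    # hill-climb (real dadi uses Nelder-Mead, BFGS, etc.).
    best, best_score = list(start), objective(start)
    for _ in range(maxiter):
        cand = [p + random.uniform(-0.1, 0.1) for p in best]
        if objective(cand) > best_score:
            best, best_score = cand, objective(cand)
    return best, best_score

def perturb(params, fold):
    # Jitter each parameter by up to `fold`-fold, loosely mimicking
    # dadi's perturb_params.
    return [p * 2 ** random.uniform(-fold, fold) for p in params]

def multi_round(objective, n_params, reps, maxiters, folds):
    best, score = None, float("-inf")
    for n_reps, maxiter, fold in zip(reps, maxiters, folds):
        # Round 1 uses random starts; later rounds perturb the best
        # parameters found so far.
        starts = [[random.uniform(0.1, 5) for _ in range(n_params)]
                  if best is None else perturb(best, fold)
                  for _ in range(n_reps)]
        for s in starts:
            params, ll = optimize(objective, s, maxiter)
            if ll > score:
                best, score = params, ll
    return best, score

# A toy "log-likelihood" peaked at params = (1, 1).
loglik = lambda p: -sum((x - 1) ** 2 for x in p)
best, score = multi_round(loglik, 2, reps=[10, 20, 30, 40],
                          maxiters=[3, 5, 10, 15], folds=[3, 2, 2, 1])
```

Running this with the default four-round settings shows the same behavior described above: the best score only improves from round to round as the search space narrows around the previous round's best replicate.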
My Analysis Crashed! What Now?
For various reasons, sometimes an analysis can crash. In some cases, it is not desirable to re-start a model optimization routine from scratch. You can essentially pick up where you left off through a couple of simple actions. First, you will need to find the highest scoring replicate that occurred during the round that crashed. These parameter values will be used as input parameters. Second, the number of rounds and corresponding reps, maxiters, and folds arguments will need to be adjusted to start in the later rounds.
For example, let's say the program crashed fitting a model during round 2 (of 4). You can quickly sort the output file to find the best scoring replicate:
Model Replicate log-likelihood AIC chi-squared theta optimized_params(nu1, nuA, nu2, nu3, m1, m2, T1, T2, T3)
refugia_adj_1 Round_2_Replicate_1 -2227.89 4473.78 14271.56 180.26 0.7541,2.1299,12.2678,0.8252,0.6424,0.6868,0.3189,1.3291,0.9736
refugia_adj_1 Round_2_Replicate_7 -2283.95 4585.9 3131.07 182.59 0.6738,3.0342,5.7909,0.8692,0.3357,0.346,0.2572,2.2233,1.1109
refugia_adj_1 Round_2_Replicate_9 -2297.34 4612.68 4517.14 185.11 0.798,0.9479,6.9163,2.5229,0.3895,0.235,0.2362,1.2066,0.5539
In the example above, we want the parameters from Round_2_Replicate_1:
0.7541,2.1299,12.2678,0.8252,0.6424,0.6868,0.3189,1.3291,0.9736
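Locating the best-scoring replicate can also be scripted. Here is a minimal sketch, assuming a whitespace-delimited results file shaped like the excerpt above; the demo file written below is made up for illustration, and dadi_pipeline's own per-model output file names will differ:

```python
import tempfile

def best_replicate(path):
    # Return the row with the highest (least negative) log-likelihood,
    # which is the third column of the results table.
    with open(path) as fh:
        next(fh)  # skip the header line
        rows = [line.split() for line in fh if line.strip()]
    return max(rows, key=lambda r: float(r[2]))

# A small demo file mirroring the excerpt above (truncated params).
demo = """Model Replicate log-likelihood AIC chi-squared theta optimized_params
refugia_adj_1 Round_2_Replicate_1 -2227.89 4473.78 14271.56 180.26 0.7541,2.1299
refugia_adj_1 Round_2_Replicate_7 -2283.95 4585.9 3131.07 182.59 0.6738,3.0342
"""
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(demo)

row = best_replicate(f.name)   # -> the Round_2_Replicate_1 row
params_string = row[6]         # comma-separated values to reuse as input
```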
In the script running this model, we will need to change the following arguments:
#**************
#Set the number of rounds here
rounds = 4
#define the lists for optional arguments
#you can change these to alter the settings of the optimization routine
reps = [10,20,30,40]
maxiters = [3,5,10,15]
folds = [3,2,2,1]
These need to be changed to reflect the fact that we want to 'start' in round 3 and continue to round 4. We also want to use the best parameters from round 2 to seed round 3, so we will need to add a params variable. The arguments can be adjusted like so:
#**************
#Set the number of rounds here
rounds = 2
#define the lists for optional arguments
#you can change these to alter the settings of the optimization routine
reps = [30,40]
maxiters = [10,15]
folds = [2,1]
params = [0.7541,2.1299,12.2678,0.8252,0.6424,0.6868,0.3189,1.3291,0.9736]
Finally, in the actual call for the model we will need to add the optional flag in_params=params to let the routine know we are supplying the starting parameters to seed the replicates.
For example, add the in_params=params argument to this:
Optimize_Functions.Optimize_Routine(fs, pts, prefix, "refugia_adj_1", Models_3D.refugia_adj_1, rounds, 9, fs_folded=fs_folded, reps=reps, maxiters=maxiters, folds=folds, param_labels = "nu1, nuA, nu2, nu3, m1, m2, T1, T2, T3")
so that it looks like this:
Optimize_Functions.Optimize_Routine(fs, pts, prefix, "refugia_adj_1", Models_3D.refugia_adj_1, rounds, 9, fs_folded=fs_folded, reps=reps, maxiters=maxiters, folds=folds, in_params=params, param_labels = "nu1, nuA, nu2, nu3, m1, m2, T1, T2, T3")
That will allow you to more or less pick up where you left off. Please note that if running multiple models in a given script, changing the rounds, reps, maxiters, and folds arguments will affect all of them. So, it is best to isolate a single model to jump-start a crashed analysis.
Reporting Bugs/Errors
If you encounter any issues while using dadi_pipeline, it could be the result of a dadi-specific problem or an error in dadi_pipeline. I strongly recommend looking through the dadi user group to see if a similar issue has been raised previously. This community resource is extremely helpful for troubleshooting. Ryan Gutenkunst has always been extremely helpful and generous with his time when it comes to resolving issues with dadi. However, it is important not to bother him with questions related specifically to this pipeline, as it is an independent project. If you have questions about dadi_pipeline, or have found an issue you believe is specific to dadi_pipeline, please post on the issues page here and not on the dadi user group!
Caveats
The likelihood and AIC returned represent the true likelihood only if the SNPs are unlinked across loci. For ddRADseq data where a single SNP is selected per locus, this is considered true, but if SNPs are linked across loci then the likelihood is actually a composite likelihood and using something like AIC is no longer appropriate for model comparisons. See the discussion group for more information on this subject.
Citation Information
How to cite dadi_pipeline:
This demographic modeling pipeline was built around a novel multi-round optimization routine, includes many original models, and generates custom output files. Because of these features, dadi_pipeline is best considered a distinct package in its own right. It was published as part of Portik et al. (2017). If you have used dadi_pipeline to run your analyses, please indicate so in your publication. Here is an example of how to cite this workflow:
To explore alternative demographic models, we used the diffusion approximation method of dadi (Gutenkunst et al. 2009) to analyze joint site frequency spectra. We fit 15 demographic models using dadi_pipeline v3.1.5 (Portik et al. 2017).
The main motivation behind the creation of this workflow was to increase transparency and reproducibility in demographic modeling. In your publication you should report the key parameters of the optimization routine. The goal is to allow other researchers to plug your data into dadi_pipeline and run the same analyses. For example:
For all models, we performed consecutive rounds of optimizations. For each round, we ran multiple replicates and used parameter estimates from the best scoring replicate (highest log-likelihood) to seed searches in the following round. We used the default settings in dadi_pipeline for each round (replicates = 10, 20, 30, 40; maxiter = 3, 5, 10, 15; fold = 3, 2, 2, 1), and optimized parameters using the Nelder-Mead method (optimize_log_fmin). Across all analyses, we used the optimized parameter sets of each replicate to simulate the 3D-JSFS, and the multinomial approach was used to estimate the log-likelihood of the 3D-JSFS given the model.
The above example explains all the parameters used to run the analyses. If you change any of the default options, you should report them here in your methods section. This can include changes to the number of rounds, replicates, maxiters, folds, or other optional features (such as supplying parameter values or changing the default parameter bounds).
There are additional features of dadi_pipeline that were developed for other publications. For example, several 2D and 3D models were published as part of Barratt et al. (2018) and Firneno et al. (2020). The goodness of fit tests were published as part of Barratt et al. (2018). Depending on what you include in your own analyses, you may also choose to cite these papers.
Here is a list of the publications mentioned above, for easy reference:
Gutenkunst, R.N., Hernandez, R.D., Williamson, S.H., and C.D. Bustamante. 2009. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genetics 5: e1000695.
Portik, D.M., Leache, A.D., Rivera, D., Blackburn, D.C., Rodel, M.-O., Barej, M.F., Hirschfeld, M., Burger, M., and M.K. Fujita. 2017. Evaluating mechanisms of diversification in a Guineo-Congolian forest frog using demographic model selection. Molecular Ecology 26: 5245-5263. https://doi.org/10.1111/mec.14266
Barratt, C.D., Bwong, B.A., Jehle, R., Liedtke, H.C., Nagel, P., Onstein, R.E., Portik, D.M., Streicher, J.W., and S.P. Loader. 2018. Vanishing refuge: testing the forest refuge hypothesis in coastal East Africa using genome-wide sequence data for five co-distributed amphibians. Molecular Ecology 27: 4289-4308. https://doi.org/10.1111/mec.14862
Firneno Jr., T.J., Emery, A.H., Gerstner, B.E., Portik, D.M., Townsend, J.H., and M.K. Fujita. 2020. Mito-nuclear discordance reveals cryptic genetic diversity, introgression, and an intricate demographic history in a problematic species complex of Mesoamerican toads. Molecular Ecology, 29: 3543–3559. https://doi.org/10.1111/mec.15496
Publications
Several publications have used the dadi_pipeline demographic modeling workflow, and many of them can be found here.
Published Articles
License
GNU Lesser General Public License v3.0
Contact:
Daniel Portik, PhD
daniel.portik@gmail.com
About
An accessible and flexible tool for fitting demographic models with dadi using custom or published models (available here), conducting goodness of fit tests, and plotting.
A Hands-On Guide to DADI, Alibaba's Open-Source Container Image Acceleration Technology
2021-04-14
Summary: In a livestream for the Alibaba Cloud developer community's special "Open Source Tuesday" series, a senior Alibaba technical expert introduced DADI, the container image accelerator project open-sourced in March, and showed how to get started with it quickly. This article is an edited transcript of the livestream; for the replay, see the link at the top of the article.
开发者小助手
Starting Ten Thousand Containers in Seconds: A Look Inside Alibaba Cloud's Container Image Acceleration Technology
2020-01-08
Summary: The Alibaba Cloud container and storage teams collaborated to use the DADI accelerator for on-demand image reading and P2P distribution, starting 10,000 containers in 3.01 seconds and eliminating both the minutes-long wait of container cold starts and the network congestion of large-scale parallel distribution from the image registry.
As the end of the year approaches, New Year shopping festivals, flash sales, and countdown livestreams arrive one after another. What these businesses have in common is that traffic surges instantly: large numbers of servers must be provisioned at once, and containers must be scaled out within an extremely short time to absorb online traffic and avoid system crashes. Besides fast cluster-node scaling, this also places higher demands on application deployment speed.
Fast deployment and startup are often considered a core advantage of containers: instantiating a container from a local image takes very little time (a "hot start"), while a "cold start" with no local image requires downloading the image from the registry and decompressing it before the container can be launched; it is strongly affected by network and disk performance and takes minutes. Large-scale batch cold starts may even leave the Registry unable to respond because of network congestion.
To address this cold-start pain point, Alibaba Cloud introduced a brand-new storage engine, the DADI accelerator, which shortens container cold starts to a few seconds. The solution distills the data-access acceleration experience of large-scale deployments inside Alibaba Group, where it provided second-level container launch for large container cluster scale-outs during the Double 11 shopping festival.
The test scenario ran on a Kubernetes cluster of 1,000 nodes with 4 cores and 8 GB of memory each. Alibaba Cloud Container Service for Kubernetes (ACK) can scale out 1,000 worker nodes and join them to a Kubernetes cluster in a very short time, a capability with standout performance for big promotions, flash sales, and short-lived traffic peaks.
For this test scenario, Kubernetes' strong extensibility and custom controllers were also used to speed up creating and deleting applications in the large cluster, keeping the test quick and convenient.
In this test, the DADI accelerator, developed jointly by the Alibaba Cloud container and storage teams, started 10,000 containers in only 3.01 seconds, and nearly 60,000 containers within 10 seconds.
Figure 1 - Starting 10,000 containers took 3.01 seconds, with a p999 time of 2.97 seconds.
Cold and hot starts of 10,000 containers were also compared, i.e., the effect of a local image cache on startup time: hot starts took 2.91 seconds, with a p999 time of 2.56 seconds.
Figure 2 - Cold vs. hot startup times for 10,000 containers
Because a DADI cold start fetches data on demand from the P2P network, disk pressure is reduced and I/O congestion avoided, so there are fewer long-tail containers.
In addition, a time-boxed stress test was run: within the 10-second limit, 1,000 hosts started 59,997 containers, and the 60,000th container finished starting at 10.06 seconds.
Figure 3 - 10-second time-boxed stress test
Note: all the results shown above were obtained on Alibaba Cloud's container service ACK. To conveniently record each container's startup time, a client/server model was used: after each container on a worker starts, it reports its status to the test's httpServer, and the request time recorded by the httpServer is taken as the container's startup time.
The key cold-start optimization: the DADI accelerator
Generally, a complete container application image is hundreds of megabytes or even gigabytes in size. In the community's Registry implementation, an image is stored in layers, each layer a tgz package. When a container starts, the container engine pulls all layers from the Registry, decompresses them locally, and builds the complete container rootfs through a layered file system. Yet the data actually needed during container startup may be only a tiny fraction of the image: the image used in this test is 894 MB in total, while container startup needs only 15 MB of data, about 1.6%. The key to accelerating container startup is fetching just this 1.6% of startup data directly, without downloading the complete image.
Why can the DADI accelerator provide second-level launch capability for large-scale container cluster scale-outs? The core idea is reading the data a container needs at runtime "on demand", skipping the traditional "download image -> decompress image -> start container" sequence and shortening container startup from minutes to seconds. This involves three optimizations:
Image format optimization: to avoid the high latency of download-plus-decompress, the DADI team designed a new image format with a built-in index that can be accessed directly, without downloading and decompressing the complete image;
On-demand P2P data reads: to reduce the overall load that batch scale-outs place on a single Registry, DADI distributes data over a tree-form P2P network, in which a few P2P root nodes fetch from the Registry and other nodes (hosts) transfer data among themselves, so data reaches all nodes quickly during batch scale-outs;
Efficient decompression: DADI provides a new compressed file format that can separately decompress only the data the user actually accesses, with negligible decompression overhead.
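DADI's tree-form P2P distribution, in which a few root nodes pull from the Registry and every other host fetches from its parent, can be sketched as an array-indexed tree. The hostnames and fan-out below are purely illustrative, not DADI's actual topology code:

```python
def parent_index(i, fanout=3):
    # In a complete tree stored as an array, node i's parent is
    # (i - 1) // fanout; node 0 is the root.
    return None if i == 0 else (i - 1) // fanout

def source_for(hosts, i, fanout=3):
    # The root pulls image data from the Registry; every other host
    # pulls from its parent host in the tree.
    p = parent_index(i, fanout)
    return "registry" if p is None else hosts[p]

hosts = [f"host{n}" for n in range(13)]
print(source_for(hosts, 0))   # registry
print(source_for(hosts, 4))   # host1 (children of host1 are hosts 4-6)
```

With fan-out f, only the root touches the Registry, and each host serves at most f downstream peers, which is why batch scale-outs avoid a single-point bottleneck.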
Figure 5 - Comparison of container startup flows
When starting a container with DADI, only a few kilobytes of image metadata are downloaded from the image Registry, and a virtual Overlay Block Device is created and mounted on the container's working directory; the Docker engine then regards the image as fully loaded. The image data needed during container startup is downloaded on demand from the local cache or from upstream nodes in the P2P network, which substantially relieves the access pressure on the Registry.
Figure 6 - DADI P2P data distribution
As Kubernetes gains ever wider adoption, Alibaba Cloud ACK supports enterprise customers across industries. This deep integration of ACK and DADI achieves second-level startup of ten thousand containers, comfortably handling large-scale application scale-outs and releases; the technology will also become a startup accelerator for serverless containers in the future.
木环
DADI: Block-Level Image Service for Agile and Elastic Application Deployment | USENIX
DADI: Block-Level Image Service for Agile and Elastic Application Deployment
Authors: Huiba Li, Yifan Yuan, Rui Du, Kai Ma, Lanzheng Liu, and Windsor Hsu, Alibaba Group
Abstract: Businesses increasingly need agile and elastic computing infrastructure to respond quickly to real world situations. By offering efficient process-based virtualization and a layered image system, containers are designed to enable agile and elastic application deployment. However, creating or updating large container clusters is still slow due to the image downloading and unpacking process.
In this paper, we present DADI Image Service, a block-level image service for increased agility and elasticity in deploying applications. DADI replaces the waterfall model of starting containers (downloading image, unpacking image, starting container) with fine-grained on-demand transfer of remote images, realizing instant start of containers. DADI optionally relies on a peer-to-peer architecture in large clusters to balance network traffic among all the participating hosts. DADI efficiently supports various kinds of runtimes including cgroups, QEMU, etc., further realizing "build once, run anywhere". DADI has been deployed at scale in the production environment of Alibaba, serving one of the world's largest ecommerce platforms. Performance results show that DADI can cold start 10,000 containers on 1,000 hosts within 4 seconds.
Open Access Media: USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone.
BibTeX:
@inproceedings {254400,
author = {Huiba Li and Yifan Yuan and Rui Du and Kai Ma and Lanzheng Liu and Windsor Hsu},
title = {{DADI}: {Block-Level} Image Service for Agile and Elastic Application Deployment},
booktitle = {2020 USENIX Annual Technical Conference (USENIX ATC 20)},
year = {2020},
isbn = {978-1-939133-14-4},
pages = {727--740},
url = {https://www.usenix.org/conference/atc20/presentation/li-huiba},
publisher = {USENIX Association},
month = jul
}
DADI: Alibaba Cloud's Open-Source Accelerated Container Image Technology - Alibaba Cloud Community
Alibaba Container Service
July 28, 2021
This article discusses the evolution of Alibaba Cloud's cloud-native accelerated container image technology.
Recently, Alibaba Cloud made its cloud-native accelerated container image technology open-source. Compared with the traditional layered tar file format, the OverlayBD image format implements network-based on-demand reading so containers can start quickly.
This solution was originally part of Alibaba Cloud's Data Accelerator for Disaggregated Infrastructure (DADI). DADI is designed to provide various possible data access acceleration technologies for computing/storage-separation architectures. Accelerated image delivery was a breakthrough for the DADI architecture in the container and cloud-native fields. Since its launch in 2019, DADI has been deployed on a large number of machines online and has started more than one billion containers in total. It has supported multiple business lines of Alibaba Group and Alibaba Cloud and significantly improved application release and scaling efficiency. In 2020, the DADI team published the paper "DADI: Block-Level Image Service for Agile and Elastic Application Deployment" at USENIX ATC '20 [1] and then launched an open-source project. The team aims to contribute this technology to the community and, by establishing standards and building an ecosystem, attract more developers to the field of container and cloud-native performance optimization.
Background
With the rise of Kubernetes and cloud-native computing, containers have been widely adopted in enterprises. Fast deployment and startup are core advantages of containers: instantiating a container from a local image is very quick, so the "hot startup" time is short. However, during a cold start, when no image is available locally, containers can only be created after downloading the image from the registry. After a business image has been maintained and updated for a long time, both the number of layers and the overall size can become very large, reaching hundreds of megabytes or several gigabytes. Therefore, in production environments, a container cold start usually takes several minutes, and as the scale grows, images cannot be downloaded quickly because the Registry is throttled by network congestion within the cluster.
For example, during one of the previous Double 11 Global Shopping Festivals, an Alibaba application triggered an emergency scale-out due to insufficient capacity. Because of the high concurrency, the scale-out took a long time, and the user experience was affected during this period. In 2019, after DADI was deployed, containers in the new image format spent five times less total time on "image pull + container startup" than common containers, and the p99 long-tail time was 17 times faster.
Handling image data stored remotely is the key to solving slow container cold starts. The industry has tried to address this problem in several ways: using block storage or NAS to store container images so they can be read on demand, or using network-based distribution technologies such as peer-to-peer (P2P) to download images from multiple sources or preload them onto hosts in advance, avoiding single-point network bottlenecks. In recent years, new image formats have also come under increasing discussion. According to the research of Harter et al. [2], pulling images accounts for 76% of container startup time, while only 6.4% of that time is spent reading data. Therefore, images that support on-demand reading have become the trend. Google's Stargz [3], short for Seekable tar.gz, can selectively search for and extract specific files from an archive without scanning or decompressing the entire image. Stargz is designed to improve the performance of image pulling; its lazy-pull technology does not pull the entire image file, achieving on-demand reading. Stargz also introduced a containerd snapshotter plug-in that optimizes I/O at the storage layer to improve runtime efficiency.
During the lifecycle of a container, the image needs to be mounted after it is ready. The core technology for mounting a layered image is the overlay filesystem (OverlayFS): it merges multiple lower-layer files in a stacked form and exposes a unified, read-only file system. By analogy, the block storage and NAS approaches mentioned above can generally stack layers in the form of snapshots, and CRFS, which is associated with Stargz, can also be regarded as another implementation of OverlayFS.
New Image Format
DADI does not use OverlayFS directly. Rather, it borrows the ideas of OverlayFS and the early union filesystems but proposes a brand-new, block-device-based layered stacking technology called OverlayBD, which provides a series of block-based merged data views for container images. The implementation of OverlayBD is very simple, so many things that were previously impractical become feasible. In contrast, implementing a fully POSIX-compatible file system interface is full of challenges and prone to bugs, as the development history of the mainstream file systems shows.
OverlayBD has the following advantages over OverlayFS:
It avoids the performance degradation caused by multi-layer images. For example, in OverlayFS mode, updating a large file triggers a cross-layer copy-up: the system must first copy the file to the writable layer. Creating hard links can also be slow.
It is easy to capture block-level I/O, record and replay I/O operations to prefetch data, further accelerating startup.
The guest file system and host OS can be chosen flexibly; for example, Windows NTFS is supported.
It allows users to use a valid codec for online decompression.
It can be placed in distributed storage, such as EBS, on the cloud. Therefore, system disks and data disks of images can use the same storage solution.
It naturally supports a writable layer (RW), so read-only mounting may even become a thing of the past.
OverlayBD Principles
To understand the principles of OverlayBD, you first need to understand the layering mechanism of container images. A container image consists of multiple incremental layer files that are superimposed at use time, so only the layer files need to be distributed when distributing an image. Each layer is essentially a compressed package of its differences from the previous layer, covering file additions, modifications, and deletions. The container engine superimposes the differences through its storage driver and mounts the result in read-only mode to a specified directory, called lower_dir; the mount directory of the writable layer, mounted in read/write mode, is generally called upper_dir.
Note that OverlayBD itself has no concept of files. It simply abstracts the image as a virtual block device and mounts a regular file system on top of it. When a user application reads data, the read request is first handled by the regular file system, which converts it into one or more reads of the virtual block device. These reads are forwarded to a user-space receiver program, the runtime carrier of OverlayBD, and finally converted into random reads of one or more layers.
Like a traditional image, an OverlayBD image internally retains the layered structure, but the content of each layer is a series of data blocks corresponding to the differences produced by file system changes. Upward, OverlayBD provides a merged view with a very simple superposition rule: for any data block, the last change always wins, and blocks never changed in any layer are treated as all-zero blocks. Downward, it provides the ability to export a series of data blocks as a layer file that is dense, non-sparse, and indexable. Consequently, a read of a contiguous logical block address (LBA) range on the block device may cover small pieces of data that originally belonged to multiple layers; these small pieces are called segments. From a segment's attributes, the layer number can be found, mapping the read onto that layer's file. Since traditional container images can store their layer files in a Registry or object storage, OverlayBD images can do the same.
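The superposition rule above (the last change wins, and untouched blocks read back as zeros) can be sketched with each layer modeled as a sparse map of its changed blocks; the block size and data here are toy values, not OverlayBD's actual on-disk format:

```python
ZERO = b"\x00" * 4  # toy 4-byte blocks; real block sizes are far larger

def merged_read(layers, block_index):
    # Merged view: scan layers from top to bottom and return the first
    # (most recent) version of the block; unwritten blocks read as zero.
    for layer in reversed(layers):
        if block_index in layer:
            return layer[block_index]
    return ZERO

# Each layer records only the blocks it changed, as {block_index: data}.
base  = {0: b"AAAA", 1: b"BBBB"}
delta = {1: b"bbbb"}                 # the upper layer rewrites block 1
view = [merged_read([base, delta], i) for i in range(3)]
# view == [b"AAAA", b"bbbb", b"\x00\x00\x00\x00"]
```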
For better compatibility, OverlayBD wraps a tar file header and footer around each layer file at the outermost level, so the file is disguised as a tar file. Since the tar archive contains only one file, on-demand reading is not affected. Docker, containerd, and BuildKit all untar and tar images by default when downloading or uploading, and this cannot be bypassed without intruding into their code, so the tar disguise helps compatibility and process uniformity. For example, when converting, building, or fully downloading images, no code changes are needed; providing a plug-in is enough.
Overall Architecture
The figure above shows the overall architecture of DADI. All components are described below:
Containerd Snapshotter
Containerd has provided initial support for launching remote images since version 1.4, and Kubernetes has explicitly deprecated Docker as a container runtime. Therefore, the DADI open-source version supports the containerd ecosystem first, with Docker support to follow.
The core function of a snapshotter is to provide abstract service interfaces for mounting and unmounting the container root filesystem (rootfs). It was designed to replace the graphdriver module in earlier versions of Docker, simplifying storage drivers while remaining compatible with block-device snapshots and OverlayFS.
The OverlayBD snapshotter provided by DADI enables the container engine to support OverlayBD images by mounting virtual block devices to the corresponding directories. It is also compatible with traditional Open Container Initiative (OCI) tar images, so users can continue to run normal containers on OverlayFS.
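Containerd connects to such an external snapshotter through its proxy-plugin mechanism. A typical configuration looks like the fragment below; the socket path follows the accelerated-container-image documentation but may differ depending on the version and how the snapshotter service is installed.

```toml
# /etc/containerd/config.toml
[proxy_plugins.overlaybd]
    type = "snapshot"
    address = "/run/overlaybd-snapshotter/overlaybd.sock"
```

After restarting containerd, clients select the snapshotter explicitly, for example with `ctr run --snapshotter=overlaybd ...`.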
iSCSI Target
iSCSI is a widely supported remote block device protocol with high stability and the ability to recover from failures. The OverlayBD module serves as the storage backend of the iSCSI protocol: if the process crashes unexpectedly, it can be recovered simply by restarting it. File-system-based image acceleration solutions, such as Stargz, cannot recover this way.
The iSCSI target is the runtime carrier of OverlayBD. This project implements two target modules. The first is based on tgt [4]; thanks to its backing-store mechanism, the OverlayBD code can be compiled into a dynamic link library that is loaded at runtime. The second is based on the Linux kernel's LIO SCSI target (also known as TCMU) [5]; the entire target runs in the kernel, which makes it easier to expose a virtual block device.
ZFile
ZFile is a data compression format that supports online decompression. It splits the source file into fixed-size blocks, compresses each block individually, and maintains a jump table that records the physical offset of every compressed block within the ZFile. To read data, the reader looks up the jump table to find the block covering the requested offset and decompresses only that block.
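A minimal sketch of this layout follows. It is not the real ZFile format: the block size is tiny for illustration (real formats use tens of kilobytes), gzip stands in for LZ4/Zstandard, and the jump table is kept in memory rather than embedded in the file.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// blockSize is tiny for illustration only.
const blockSize = 4

// zfile models the layout described above: independently compressed
// fixed-size blocks plus a jump table of physical offsets, so compressed
// block i occupies data[jump[i]:jump[i+1]].
type zfile struct {
	data []byte
	jump []int64
}

// build splits src into fixed-size blocks and compresses each one.
func build(src []byte) *zfile {
	z := &zfile{jump: []int64{0}}
	for off := 0; off < len(src); off += blockSize {
		end := off + blockSize
		if end > len(src) {
			end = len(src)
		}
		var buf bytes.Buffer
		w := gzip.NewWriter(&buf)
		w.Write(src[off:end])
		w.Close()
		z.data = append(z.data, buf.Bytes()...)
		z.jump = append(z.jump, int64(len(z.data)))
	}
	return z
}

// readAt serves a random read by decompressing only the blocks that cover
// [off, off+n): one jump-table lookup plus a small decompression per block.
func (z *zfile) readAt(off, n int64) ([]byte, error) {
	first, last := off/blockSize, (off+n-1)/blockSize
	var out []byte
	for b := first; b <= last; b++ {
		r, err := gzip.NewReader(bytes.NewReader(z.data[z.jump[b]:z.jump[b+1]]))
		if err != nil {
			return nil, err
		}
		plain, err := io.ReadAll(r)
		if err != nil {
			return nil, err
		}
		out = append(out, plain...)
	}
	skip := off - first*blockSize
	return out[skip : skip+n], nil
}

func main() {
	z := build([]byte("hello, zfile!"))
	got, err := z.readAt(7, 5)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", got) // zfile
}
```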
ZFile supports various efficient compression algorithms, including LZ4 and Zstandard. Decompression is very fast with low overhead, effectively saving storage space and data transmission volume. Experimental data shows that decompressing a remote ZFile on demand outperforms loading uncompressed data, because the transmission time saved exceeds the extra decompression overhead.
OverlayBD supports exporting layer files in ZFile format.
Cache
As mentioned above, layer files are stored in the registry, and the read I/O that containers issue on the block device is mapped to requests to the registry, relying on the registry's support for HTTP 206 Partial Content. However, this is not permanent, because of the cache mechanism: some time after the container starts, the cache automatically downloads the layer files and persists them to the local file system. Once the cache hits, read I/O is served locally instead of being mapped to the registry.
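Before the cache is populated, each block read becomes an HTTP Range request like the sketch below. The in-process server stands in for a registry blob endpoint (real registry URLs also require auth tokens, omitted here); `http.ServeContent` honors Range headers and answers with 206.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

// fetchRange reads [off, off+n) of a remote blob with an HTTP Range request;
// a registry that supports HTTP 206 Partial Content returns just that slice.
func fetchRange(url string, off, n int64) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", off, off+n-1))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent {
		return nil, fmt.Errorf("server did not honor the range: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// startBlobServer stands in for a registry: it serves one blob and honors
// Range requests via http.ServeContent.
func startBlobServer(blob []byte) (url string, stop func()) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "blob", time.Time{}, bytes.NewReader(blob))
	})}
	go srv.Serve(ln)
	return "http://" + ln.Addr().String(), func() { srv.Close() }
}

func main() {
	url, stop := startBlobServer([]byte("0123456789abcdef"))
	defer stop()
	data, err := fetchRange(url, 4, 6)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", data) // the 6 bytes starting at offset 4
}
```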
Industry Leadership
On March 25, 2021, the authoritative consulting firm Forrester released its Function-As-A-Service (FaaS) Platforms evaluation report for the first quarter of 2021. Alibaba Cloud stood out on the strength of its products, receiving the highest score in eight evaluation dimensions and becoming a global FaaS leader alongside Amazon AWS. This is also the first time a Chinese technology enterprise has entered the FaaS leadership quadrant.
Containers are the foundation of the FaaS platform, and the container startup speed determines the performance and response delay of the whole platform. Powered by DADI, Alibaba Cloud Function Compute (FC) reduces container startup time by 50% to 80% [6], bringing a new Serverless use experience.
Summary and Outlook
Alibaba's open-source DADI and its OverlayBD image format help meet the demand for fast container startup in the new era. In the future, the project team will work with the community to speed up integration with mainstream tool chains. Moreover, the team will actively participate in the formulation of new image format standards, with the goal of making OverlayBD one of the OCI standards for remote image formats.
You are welcome to participate in the open-source project!
Follow-Up
Artifacts Manifest
The format description capability of the OCI image Manifest V1 is limited and cannot meet the requirements of remote images. The discussion of Manifest V2 has made no substantial progress so far, and replacing V1 outright is not realistic either. However, raw data can be described through the additional descriptors of the OCI Artifacts Manifest, which guarantees compatibility and makes the approach easier for users to accept. Artifacts is also a project that OCI and CNCF are actively promoting. In the future, DADI will embrace Artifacts to realize a proof of concept (PoC).
Support for Multiple File Systems
DADI allows users to select an appropriate file system to build images based on their business needs. However, the corresponding interfaces are not open yet, and the ext4 file system is used by default. In the future, the team will improve the interfaces related to this function, so users can decide the file system they want to use according to their needs.
BuildKit Tool Chain
Currently, users can build images by attaching the snapshotter to BuildKit; this workflow will be improved in the future to form a complete tool chain.
Data Prefetching
I/O can be recorded after a container starts, so the record can be replayed the next time the same image is started, prefetching data and avoiding ad hoc requests to the registry. This shortens the cold start time of containers by more than half. In theory, any stateless or idempotent container can be recorded and replayed.
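The record-and-replay idea can be sketched as follows. This is an illustration of the concept, not DADI's actual trace format: the first start records which block ranges the workload read, and later starts replay them to warm the cache before the reads happen.

```go
package main

import "fmt"

// ioTrace records the read pattern of a container's first start so later
// starts can replay it and prefetch before the workload asks.
type ioTrace struct {
	reads []struct{ off, n int64 }
}

// record appends one observed block-device read to the trace.
func (t *ioTrace) record(off, n int64) {
	t.reads = append(t.reads, struct{ off, n int64 }{off, n})
}

// replay re-issues every recorded read through fetch (e.g. a range request
// to the registry). It runs sequentially here; a real prefetcher would
// pipeline the requests.
func (t *ioTrace) replay(fetch func(off, n int64)) int {
	for _, r := range t.reads {
		fetch(r.off, r.n)
	}
	return len(t.reads)
}

func main() {
	var trace ioTrace
	// First start: the block device records what the workload actually read.
	trace.record(0, 4096)      // e.g. the superblock
	trace.record(1<<20, 65536) // e.g. the application binary
	// Later starts: replay the trace to warm the cache.
	var prefetched int64
	n := trace.replay(func(off, n int64) { prefetched += n })
	fmt.Println(n, "reads replayed,", prefetched, "bytes prefetched")
}
```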
References
[1] https://www.usenix.org/conference/atc20/presentation/li-huiba
[2] https://www.usenix.org/conference/fast16/technical-sessions/presentation/harter
[3] https://github.com/containerd/stargz-snapshotter
[4] http://stgt.sourceforge.net/
[5] http://linux-iscsi.org
[6] https://www.alibabacloud.com/blog/function-compute-image-acceleration-a-leap-from-minutes-to-seconds_597795