
On Apache Bigtop, and Selling Books for a Living

2025-04-06 Update From: SLTechnology News&Howtos


I haven't written a blog post in almost a year, and I'm finally back. Recently, because of business needs at work, I wanted to build custom-patched RPMs based on the CDH distribution, so I picked up Bigtop again, the tool for compiling Hadoop and packaging it as RPM and DEB. Since there is basically no relevant material or documentation available in China, I feel it is worth sharing my notes on reading and modifying the Bigtop source code.

I remember that a long time ago, before version 1.0.0, Bigtop was packaged with make. Frankly, the pre-0.9.0 versions should not have appeared in Apache's official repositories at all and belonged in the incubator, but presumably because it was developed by Cloudera, and Doug Cutting was the former chairman of the foundation, it was relatively easy to promote something not yet production-ready from the incubator to a top-level project. Cloudera's open-source cdh-package on GitHub appears to be based on Bigtop 0.6.0, but since each of their git branches only updates the RPM spec files, it does not seem to work out of the box. And Apache Bigtop does not carry the CDH-specific dependencies such as avro, sentry, and llama, so the only option is to read and modify the source code.

Option 1: modify cdh-package. The advantage is staying close to Cloudera, so the amount of code to change is probably smaller; the disadvantage is that it is make-based, so later maintainability and extensibility are poor. I have no desire to touch things like Makefiles.

Option 2: modify Apache Bigtop. The advantage is that it builds with Gradle, so maintainability and extensibility are good; the disadvantage is that a lot of code needs to be modified.

After thinking it over, I decided to stay close to the community, keep away from the capitalists, and follow the broad proletarian masses, so I chose Apache Bigtop. Besides, cdh-package requires not only Java 1.7 but also Java 1.5, so... let it be.

Of course, there are many pitfalls to step in, the biggest of which is the GFW. I would like to thank the government for protecting the mind of a man in his forties from pornography, gambling, and drugs, shielding him from the outside world with the great Great Firewall. The great Great Firewall offers escort options for every stage of life, from one's teens through thirty, forty, fifty, sixty, seventy, eighty and beyond, protecting people from cradle to grave against the erosion of advanced foreign technology.

So, if you want to compile Hadoop and its surrounding ecosystem, take my advice: buy a cloud VM (CVM) abroad and you will get twice the result with half the effort. At the same time, git or svn is required so that changes to Bigtop are version-controlled and mistakes can be rolled back.

What follows is based on Bigtop 1.1.0 and a US cloud VM.

The skills you will want for packaging and compilation:

Gradle, Maven, Ant, Forrest, Groovy, shell, and RPM spec. In particular, your shell and spec skills should be as solid as possible; if they are not, read the documentation at rpm.org. Maven and Ant, by contrast, basically run themselves and require almost no skill. As for the required versions of Maven, Ant, and Java themselves, I will not repeat them here.
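As a quick way to see whether the toolchain above is in place, here is a small sanity check (a sketch: it assumes the standard CLI names; in a Bigtop checkout, `gradle` may instead be the bundled `./gradlew` wrapper):

```shell
# Report which of the build tools are on PATH (missing ones are listed, not fatal)
for tool in java mvn ant gradle rpmbuild; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: installed"
  else
    echo "$tool: NOT FOUND"
  fi
done
```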

As I understand the Bigtop source code, it divides into an execution layer, a compilation layer, and a script layer. The execution layer is Gradle and its related definition files. The compilation layer includes Maven, Ant, Forrest, the Scala build embedded in Maven, and so on. The script layer consists of the RPM spec files, the deb definition files, and the build scripts they reference, such as do-component-build.

What to compile, its version, the download address, and the file names are all defined in bigtop.bom; package.gradle is then invoked to download the sources automatically and set up the build and package directories. package.gradle then calls rpmbuild to read the spec file, the spec file pulls in the build scripts through internal definitions such as Source0, and finally rpmbuild produces all the required RPM packages.
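For illustration, a component entry in bigtop.bom looks roughly like this (a sketch modeled on the Bigtop 1.1.0 Groovy DSL; the exact field names, versions, and URL variables should be checked against your own checkout):

```groovy
'hadoop' {
  name    = 'hadoop'
  version { base = '2.7.1'; pkg = base; release = 1 }    // what to build, and how the package is versioned
  tarball { destination = "${name}-${version.base}.tar.gz"
            source      = destination }                   // file name of the source archive
  url     { download_path = "/$name/common/$name-${version.base}"
            site    = "${apache.APACHE_MIRROR}${download_path}"   // where package.gradle downloads from
            archive = "${apache.APACHE_ARCHIVE}${download_path}" }
}
```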

After downloading and unpacking bigtop-1.1.0 for the first time, you need to initialize the packages Bigtop depends on, downloading protobuf, snappy, and so on. Once that is done you can compile Apache Hadoop and its ecosystem, and the result is usable, but it did not meet my needs. Why? Because Cloudera, wanting to show off how powerful and compatible they are, added a 0.20-mapreduce package as icing on the cake. Since the previous cluster was installed from CDH's Hadoop, the installed RPMs depend on the 0.20 packages, so if I build CDH's Hadoop with vanilla Apache Bigtop there is no 0.20 package. After building a repository, yum update will complain about missing 0.20 dependencies and require --skip-broken to proceed. For a ×× like me, that cannot be allowed to happen. In addition, according to colleagues' feedback, if CDH's Hadoop uses Apache's ZooKeeper for HA, it runs into a problem of not finding the znode, and HA fails.
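The initialization and build steps above map to Gradle tasks roughly as follows (task names are from my reading of Bigtop 1.1.0 and may differ in other versions; this requires a Bigtop checkout):

```shell
# Install build dependencies (protobuf, snappy, etc.) via the toolchain recipes
./gradlew toolchain

# Compile hadoop and package it as RPMs; results land in the build output directory
./gradlew hadoop-rpm
```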

Therefore, the only solution is to find CDH's spec files and package things exactly the way CDH does. They are actually not hard to find; I will leave that as an exercise for the reader. However, CDH's spec files and packaging scripts cannot be used on Apache Bigtop as-is. Many things need modifying, such as prelink, and you need to set up busybox, plus other packaging dependencies such as boost, llvm, thrift, and so on. Also, CDH places its build dependencies under /opt/toolchain, which Apache Bigtop does not have; you can solve that by creating your own symlinks.
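The /opt/toolchain workaround can be as simple as symlinking your own installs into the layout CDH's specs expect. The paths and link names below are hypothetical examples, not the exact ones CDH uses; check the spec files for the real names:

```shell
# Recreate the directory layout CDH's spec files assume, pointing at local installs
sudo mkdir -p /opt/toolchain
sudo ln -sfn /usr/lib/jvm/java-1.7.0 /opt/toolchain/jdk1.7.0
sudo ln -sfn /usr/share/apache-maven /opt/toolchain/apache-maven
sudo ln -sfn /usr/share/ant /opt/toolchain/apache-ant
```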

I said I would lie down and take a nap, and now I suddenly don't know what else to write. If you are familiar with the skills mentioned above, this is really not difficult. If you are not, it is quite hard to understand and use, and you will run into all kinds of errors; errors during rpmbuild in particular are hard to trace back to their cause.

As for setting up a yum repository, that hardly needs describing.
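For completeness, the usual recipe for turning the built RPMs into a yum repository (assuming createrepo is installed; the paths below are placeholders):

```shell
# Index the directory of built RPMs so yum can use it as a repository
createrepo /path/to/built/rpms

# Point a client at it with a .repo file
cat > /etc/yum.repos.d/bigtop-custom.repo <<'EOF'
[bigtop-custom]
name=Custom Bigtop/CDH packages
baseurl=file:///path/to/built/rpms
enabled=1
gpgcheck=0
EOF
```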

The key skills for the whole project are shell scripting and the spec language; Gradle comes second.

Finally, to show off a little, here are two screenshots.

My next milestone is to build Hortonworks's Storm package and run it on CDH Hadoop. Before I reach that goal, though, it seems the company will have me writing Hive and Pig scripts, which I am really not interested in.

Finally, an advertisement: the Chinese edition of the book by Nathan Marz (the author of Storm), "Building Big Data Systems: Lambda Architecture in Practice", is now on the market. Translators: Ma Yanhui, Wei Dongqi, and me. You are welcome to buy it enthusiastically, and then offer your criticism and corrections.

Purchase links: JingDong, Dangdang, Amazon.
