As I sit here writing this, I can’t shake the feeling that we’re constantly being watched, monitored, and manipulated. Everywhere we go, everything we do, everything we say – it’s all being tracked and recorded. And who’s doing the tracking? Big Brother, of course. The powers that be who are more interested in control and surveillance than our privacy or freedom.
But it’s not just some shadowy government agency we need to worry about. It’s the big corporations too. They’re hoovering up our data like it’s going out of style, and they’re using it to target us with ads, manipulate us into buying things we don’t need, and even influence our political opinions. It’s all part of the game, and we’re the ones getting played.
In this post, we’re going to delve into the world of big data collection and explore the ethical concerns and dangers that come with it. We’ll take a closer look at how our data is being collected, who’s doing the collecting, and what they’re doing with it. So buckle up, because this is going to be a wild ride.

eps1.9_zer0-day.avi: Big Data Collection: How We’re Being Watched and Manipulated
Big Data. It’s the buzzword that everyone is throwing around these days. But do you really know what it means? Big Data refers to the large amounts of structured and unstructured data that are generated every day. This includes everything from the messages you send on social media to the GPS data from your phone.
To make sense of this data, companies use a variety of tools and processes to collect, organize, and analyze it. One of the most important is ETL (Extract, Transform, Load): data is extracted from various sources, transformed into a usable format, and then loaded into a data warehouse or data lake.
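To make the three stages concrete, here’s a minimal sketch of an ETL pipeline in Python. Everything in it is an assumption for illustration: the sample CSV, the column names, and the use of an in-memory SQLite database as a stand-in for a real warehouse. It’s a toy, not how any particular company does it.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the column names are assumptions for illustration.
RAW_CSV = """user_id,event,city
1,click, Berlin
2,search,paris
1,click, Berlin
"""

def extract(text):
    """Extract: read rows from a source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalise fields and drop exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for row in rows:
        record = (
            int(row["user_id"]),
            row["event"].strip(),
            row["city"].strip().title(),
        )
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into a table (SQLite stands in for a warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (user_id INTEGER, event TEXT, city TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", records)
    conn.commit()

def run_pipeline(text=RAW_CSV):
    """Run extract, transform, and load in order and return the connection."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn
```

Calling run_pipeline() and then selecting from the events table should show two rows: the duplicate click is gone and the city names are tidied up.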
Data lakes are a central repository for all types of data. They’re designed to store both structured and unstructured data in its native format, allowing for faster and more flexible analysis. The idea behind data lakes is that by storing everything in one place, it’s easier to access and analyze the data.
But what happens to all of this data once it’s been collected and analyzed? Who owns it? Who can access it? These are the questions that we need to be asking ourselves. The truth is, with Big Data, everything is connected. Every time you interact with a device or service, you’re leaving a digital footprint. And that footprint can be used to build a picture of who you are, what you like, and what you do.
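If that sounds abstract, consider how little code it takes to turn a stream of interactions into a profile. The sketch below is purely illustrative: the (user_id, category) event format and the build_profiles name are my own assumptions, and real profiling systems are far more elaborate.

```python
from collections import Counter, defaultdict

def build_profiles(events, top_n=2):
    """Turn (user_id, category) interaction events into top interests per user."""
    counts = defaultdict(Counter)
    for user_id, category in events:
        counts[user_id][category] += 1
    # Keep only the most frequent categories for each user.
    return {
        user_id: [category for category, _ in counter.most_common(top_n)]
        for user_id, counter in counts.items()
    }
```

Feed it a few days of clicks and searches and it hands back a ranked list of what each person seems to care about, which is exactly the kind of picture the footprint makes possible.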
In the world of Big Data, privacy is a luxury that few can afford. With every click, every search, and every like, we’re giving away a little bit more of ourselves. And who knows who’s watching?

eps3.7_dont-delete-me.ko: What is Big Data and how is it collected?
Okay, let’s get into the nitty-gritty of Big Data. It’s not just a buzzword thrown around by suits in boardrooms – it’s a very real and very powerful tool that has the potential to control and manipulate us all. But first, we need to understand what it is.
Big Data refers to data sets so large and complex that traditional data processing applications struggle to manage and process them. It encompasses a variety of data types, such as text, images, videos, and sensor data. The data can come from a variety of sources, including social media, IoT devices, and financial transactions.
So, how is Big Data collected? There are three key steps in the process: Extraction, Transformation, and Loading (ETL).
Extraction: How Big Data is Collected and Stored
We live in a world where data is king. Every time you use your phone, browse the web, or make a purchase, you generate data. This data is collected and stored in massive repositories, often called data lakes, waiting to be analyzed and used by corporations and governments alike. But how exactly does this data get collected and stored in the first place?
The first step in the ETL process is extraction, where data is gathered from various sources and brought into a central location for processing. This is where the data is harvested, often in large quantities, from sources such as web pages, social media platforms, and databases.
Think about how much data is generated every day, every hour, every minute. It’s mind-boggling. And yet, this data is valuable, not just for corporations looking to profit off of it, but for governments looking to keep tabs on their citizens.
To extract this data, various tools and technologies are used, including web scraping, APIs, and database queries. At scale, the extracted data is often stored and processed with Hadoop, an open-source software framework that allows for distributed storage and processing of large data sets across clusters of computers.
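Here’s a rough sketch of the three extraction techniques just mentioned, using only Python’s standard library. To keep it runnable without a network connection, the HTML page and the JSON response body are passed in as strings. The "results" field, the purchases table, and the function names are all assumptions made up for this example.

```python
import json
import sqlite3
from html.parser import HTMLParser

class LinkScraper(HTMLParser):
    """Web scraping: collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def scrape_links(html):
    """Return all link targets found in an HTML string."""
    scraper = LinkScraper()
    scraper.feed(html)
    return scraper.links

def parse_api_response(body):
    """API: decode a JSON response body (a real client would fetch it over HTTP)."""
    return json.loads(body)["results"]

def query_database(conn, min_amount):
    """Database query: pull the rows that match a filter."""
    cur = conn.execute(
        "SELECT user_id, amount FROM purchases WHERE amount >= ? ORDER BY user_id",
        (min_amount,),
    )
    return cur.fetchall()
```

In a real pipeline each of these would sit behind scheduling, rate limiting, and error handling; the point here is only how simple the core of each technique is.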
Another key component of data extraction is data integration. This involves merging data from multiple sources into a single, cohesive dataset. Tools such as Talend, a data integration platform, allow for efficient and reliable data integration across various systems.
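Stripped of the tooling, data integration boils down to matching records from different systems on a shared key. The following is a minimal sketch of that idea, not a picture of how Talend works internally. The email key, the field names, and the "first non-empty value wins" rule are assumptions I chose for illustration.

```python
def integrate(crm_rows, web_rows, key="email"):
    """Merge records from two sources into one record per key."""
    merged = {}
    for source in (crm_rows, web_rows):
        for row in source:
            # Normalise the key so "Ann@Example.com" and "ann@example.com " match.
            k = row[key].strip().lower()
            record = merged.setdefault(k, {key: k})
            for field, value in row.items():
                if field != key and value not in (None, ""):
                    # The first non-empty value seen for a field wins.
                    record.setdefault(field, value)
    return list(merged.values())
```

Notice what this does from a privacy standpoint: a name you gave one system and a browsing trail you left in another end up in the same record.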
In order to store this extracted data, companies often use data lakes, which are large, scalable repositories for storing vast amounts of structured and unstructured data. These lakes are built on top of cloud-based object storage solutions such as Amazon S3, allowing for easy access and retrieval of the data.
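One common way to keep a lake navigable is to encode the source and date into each object’s key. The sketch below assumes a Hive-style "source=.../date=..." layout, which is a widespread convention but only one of several, and it uses a plain Python dict as a stand-in for a bucket rather than calling any real S3 API.

```python
from datetime import date

def object_key(source, day, filename):
    """Build a partitioned key, e.g. raw/source=web/date=2024-01-15/events.json."""
    return f"raw/source={source}/date={day.isoformat()}/{filename}"

def put_raw(bucket, source, day, filename, payload):
    """Store the payload unchanged, in its native format, under a partitioned key."""
    key = object_key(source, day, filename)
    bucket[key] = payload  # 'bucket' is a dict standing in for object storage
    return key

def list_partition(bucket, source, day):
    """List every object stored for one source on one day."""
    prefix = f"raw/source={source}/date={day.isoformat()}/"
    return sorted(k for k in bucket if k.startswith(prefix))
```

Because nothing is transformed on the way in, a JSON event log and a JPEG can sit side by side in the same partition, which is the "native format" idea in practice.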
But what happens to this data once it’s extracted and stored? That’s where the next step of the ETL process, transformation, comes in.
Transformation: Converting Raw Data into Valuable Information
Now that we’ve extracted the data we need, it’s time to transform it into something more valuable. The transformation stage involves cleaning, enriching, and structuring the data to make it usable for analysis.
First, let’s talk about cleaning the data. Raw data is often messy and filled with errors or inconsistencies, which can lead to inaccurate results. The transformation process involves identifying and fixing these issues, such as removing duplicate records, correcting misspellings, or filling in missing data.
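As a small illustration of those three fixes, here’s a sketch that corrects known misspellings, fills in missing values, and drops duplicates. The CITY_FIXES table, the field names, and the "unknown" default are hypothetical; real cleaning rules depend entirely on the data in front of you.

```python
# Hypothetical correction table for known misspellings.
CITY_FIXES = {"new yrok": "New York", "londn": "London"}

def clean(rows, default_country="unknown"):
    """Correct misspellings, fill missing values, and remove duplicate records."""
    seen = set()
    cleaned = []
    for row in rows:
        city = (row.get("city") or "").strip()
        city = CITY_FIXES.get(city.lower(), city)       # correct known misspellings
        country = row.get("country") or default_country  # fill in missing data
        key = (row["id"], city, country)
        if key in seen:                                  # drop duplicates
            continue
        seen.add(key)
        cleaned.append({"id": row["id"], "city": city, "country": country})
    return cleaned
```

Note that the misspelling is corrected before the duplicate check, so "New Yrok" and "New York" for the same id collapse into a single record.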
Next, we have data enrichment. This involves enhancing the data with additional information to make it more valuable. For example, you might add geographic data to customer records to better understand their location-based preferences, or append demographic data to better understand your target audience.
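Enrichment is usually just a lookup against some other dataset. Here’s a minimal sketch that adds a region to each customer record based on a postcode prefix. The lookup table and the field names are invented for the example; a real system would pull from a proper geographic or demographic database.

```python
# Hypothetical lookup from postcode prefix to region.
REGION_BY_POSTCODE_PREFIX = {"10": "Berlin", "80": "Munich"}

def enrich(customers):
    """Return copies of the customer records with a 'region' field added."""
    enriched = []
    for customer in customers:
        prefix = customer["postcode"][:2]
        region = REGION_BY_POSTCODE_PREFIX.get(prefix, "unknown")
        enriched.append({**customer, "region": region})  # original record is not modified
    return enriched
```

A single extra column like this is what lets someone go from "a customer" to "a customer in a particular neighborhood".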
Finally, structuring the data involves organizing it into a format that can be easily analyzed. This might involve transforming unstructured data (such as social media posts) into structured data (such as a spreadsheet), or reformatting data to fit a particular schema or data model.
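To show what structuring looks like, this sketch turns a free-text social media post into a row with fixed fields. The choice of fields (hashtags, mentions, word count) and the simple regular expressions are assumptions for illustration; production systems typically use much richer text processing.

```python
import re

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")

def structure_post(post_id, text):
    """Turn a free-text post into a record with a fixed set of fields."""
    return {
        "post_id": post_id,
        "hashtags": [tag.lower() for tag in HASHTAG.findall(text)],
        "mentions": [name.lower() for name in MENTION.findall(text)],
        "word_count": len(text.split()),
    }
```

Once every post has the same shape, it can be dropped into a table and queried like any other structured data.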
Overall, the transformation stage is crucial for making sense of the raw data we’ve extracted. By cleaning, enriching, and structuring the data, we can turn it into valuable information that can inform business decisions and drive growth.
Loading: Getting Your Data into the Warehouse
You’ve extracted your data and transformed it into a format that’s ready to be loaded into your warehouse. But how do you actually get it there?
First, you probably don’t need Hadoop for this phase. It’s a framework for distributed storage and batch processing rather than a loading tool, and running a cluster just to move data is rarely cost-effective. Instead, you want to use a tool like Talend for loading your data into the warehouse.
Talend is a data integration platform, long known for its open-source Open Studio edition, that can help you manage your data pipeline from start to finish. It allows you to create custom data workflows and automate the loading process, making it faster and less error-prone than loading data by hand.
Once your data is ready to be loaded, you need to decide where to store it. Cloud-based object storage like Amazon S3 is a popular choice because it’s cheap, scalable, and offers high durability.
From there, you can use a cloud data warehouse like Snowflake to load or query the data you’ve staged in S3 and run complex SQL queries against it. Snowflake offers a flexible, usage-based pricing model, so you pay for the storage and compute you actually use.
But why use S3 and Snowflake instead of a traditional on-premises data warehouse? For one, you don’t need to worry about maintaining expensive hardware or dealing with capacity constraints. With S3 and Snowflake, you can scale your storage and compute resources as needed, giving you the flexibility to grow your data infrastructure without breaking the bank.
In summary, loading your data into a warehouse involves using a tool like Talend to automate the loading process, choosing a cloud-based storage solution like S3 for cost-effectiveness and scalability, and then using a cloud data warehouse like Snowflake to access your data and perform complex queries.
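The load-then-query pattern can be sketched without any cloud account. In the example below, an in-memory SQLite database stands in for the warehouse; it is not Snowflake, and no S3 or Talend calls are shown. The purchases table and its columns are assumptions. The one design choice worth noting is that the load is idempotent: re-running the same batch doesn’t create duplicate rows.

```python
import sqlite3

def load_batch(conn, rows):
    """Load (order_id, user_id, amount) rows; re-running a batch adds no duplicates."""
    with conn:  # commits on success, rolls back if an error is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS purchases ("
            "order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO purchases (order_id, user_id, amount) "
            "VALUES (?, ?, ?)",
            rows,
        )

def spend_per_user(conn):
    """Query the loaded data: total spend per user."""
    cur = conn.execute(
        "SELECT user_id, SUM(amount) FROM purchases "
        "GROUP BY user_id ORDER BY user_id"
    )
    return cur.fetchall()
```

Keying the table on order_id is what makes retries safe, which matters when an automated pipeline fails halfway through and gets run again.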
And that’s how Big Data is collected and processed. But why should we be concerned about it? In the next section, we’ll dive into the ethical concerns and dangers of Big Data collection and analysis.

eps4.0_the-data-collection-harvest: The Threats Lurking in the Shadows of Big Data
Your Privacy is at Risk
Have you ever felt like you’re being watched? Well, the reality is that you probably are. Big data collection has made it easier than ever for companies and governments to track your every move. Everything from your location to your shopping habits can be analyzed and used to build a profile of you. And once your data is out there, it’s difficult to get it back.
Discrimination and Bias
The algorithms used to analyze big data are only as good as the data they’re trained on. But what happens when that data is biased? Unfortunately, bias is all too common in big data. For example, a facial recognition algorithm might be less accurate at identifying people of color because it was trained on a dataset that was predominantly white. This can lead to serious consequences, such as false arrests or denial of services.
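One basic way to surface this kind of bias is to measure a model’s accuracy separately for each group instead of as a single overall number. Here’s a minimal sketch; the (group, predicted, actual) record format is an assumption, and any data you feed it in a demo is synthetic, not a real benchmark.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute accuracy per group from (group, predicted, actual) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / total[group] for group in total}
```

A model can look fine on average while performing far worse for one group, and a per-group breakdown like this is the first step toward noticing.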
Surveillance and Control
Governments and corporations can use big data to gain unprecedented levels of control over our lives. They can monitor our online activity, track our movements, and even predict our behavior. This gives them an enormous amount of power to influence and manipulate us. The 2013 revelations about the NSA’s surveillance programs should be a wake-up call to anyone concerned about privacy and civil liberties.
Security Risks
The more data we collect, the more vulnerable we become to cyberattacks. Hackers can use big data to launch targeted attacks, such as phishing scams or ransomware attacks. And if a company’s data is breached, the consequences can be catastrophic. Personal information, financial data, and even national security secrets can all be compromised.
Lack of Transparency
Finally, one of the biggest ethical concerns surrounding big data is the lack of transparency. Companies and governments are often secretive about what data they’re collecting and how it’s being used. This makes it difficult for individuals to understand the risks they’re exposed to, or to challenge the collection and use of their data.

eps3.8_stage3.torrent: Protecting Your Digital Identity in a Big Data World
Listen, we’ve covered a lot of ground today, and one thing’s for sure – big data collection is a serious issue that we can’t afford to ignore. We’re living in a world where corporations and governments are collecting and storing massive amounts of personal information, and we need to be cautious about what we’re sharing and who we’re sharing it with. It’s not just about privacy and discrimination – it’s about our security and the potential for abuse.
That’s why I recommend using privacy tools like the Tor Browser, and alternative search engines like DuckDuckGo instead of Google. We need to take control of our data and demand greater accountability from those who collect it. We can’t just sit back and hope for the best. We need to be proactive and take action to protect ourselves.