ThinkingBob brought ‘The Big Data Debate’ to Club Workspace, making their first ever appearance at our Clerkenwell venue. The event drew a flood of techies into the Club, all of them eager to learn more about big data.
Before we get into the nitty gritty, Big Data deserves an explanation. The term refers to data sets that have grown so large that ‘normal’ software finds them awkward or impossible to process. Big Data is, by definition, data that is too big for normal procedures.
James from LexisNexis
James, the Head of Strategic Analysis at LexisNexis, explained that he works with a system that deals with big data exceptionally well.
LexisNexis curate and license authoritative content, and distribute it to those who need it. For example, they collate a whole heap of data, package up everything on a certain subject (‘startups in London’, perhaps!) and send it out to those who need that information.
LexisNexis cast a wide net, and pick up data from over 4,000 sources, some of which are quite niche. Over 25 million documents are filed by LexisNexis every day, and they sort them into four different taxonomies.
Therefore, if LexisNexis didn’t have a ‘big system’ to sort their ‘big data’, they couldn’t provide their service.
Adam from MongoDB
Adam from MongoDB explained the technical wizardry that goes on when you’re creating a system that deals with big data.
He first explained that vertical scaling (moving to an ever bigger machine) gets very expensive, very quickly. Horizontal scaling (spreading the load across more machines) is the way to go; it’s the Mongo way! Adam also dispelled the myth that all developers need is a schema and an index: that alone won’t scale up!
What Adam recommends is to have one active node, and two further nodes that replicate its data. Those three nodes make up a shard. Adding more shards is what horizontal scaling means in practice.
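To make the idea concrete, here is a minimal toy sketch in Python of the layout Adam described: each shard has one active node and two replicas, and a cluster spreads records across several shards. This is not MongoDB code or its API. The names `Shard`, `Cluster` and `shard_for`, and the choice to place records by hashing their key, are assumptions made purely for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Toy shard: one active node plus two replicas holding copies."""
    name: str
    active: dict = field(default_factory=dict)
    replicas: list = field(default_factory=lambda: [{}, {}])

    def write(self, key, value):
        # Writes go to the active node, then are copied to each replica.
        self.active[key] = value
        for replica in self.replicas:
            replica[key] = value


class Cluster:
    """Toy cluster that spreads keys across shards by hash (an assumption)."""

    def __init__(self, shard_count):
        self.shards = [Shard(f"shard-{i}") for i in range(shard_count)]

    def shard_for(self, key):
        # A stable hash, so the same key always lands on the same shard.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def write(self, key, value):
        self.shard_for(key).write(key, value)

    def read(self, key):
        return self.shard_for(key).active.get(key)


# Three shards of three nodes each: nine nodes sharing the load.
cluster = Cluster(shard_count=3)
for i in range(300):
    cluster.write(f"user-{i}", {"id": i})

sizes = [len(shard.active) for shard in cluster.shards]
print(sizes)  # how many records each shard ended up holding
```

Each shard only holds a slice of the 300 records, which is the point of scaling horizontally: no single machine has to store or serve everything.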
Adam warned developers that shards can ‘compete’ if they’re not properly built. He’s seen situations in the past where the second shard has deleted all of Shard One’s data! To avoid this, it’s best to build all of the shards that you’ll need in one installment. Rolling out subsequent shards when three are already built, for example, can cause problems.
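The talk did not spell out why late additions are awkward, but one way to get a feel for it is to count how much existing data would have to change home when a fourth shard joins three. The sketch below assumes a naive ‘hash the key, take the remainder’ placement rule, which is a simplification chosen for illustration and not how MongoDB itself places data; `shard_index` is a hypothetical helper.

```python
import hashlib


def shard_index(key, shard_count):
    # Naive placement rule (an assumption): hash the key, take the remainder.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


keys = [f"user-{i}" for i in range(10_000)]

# Which keys would sit on a different shard once a fourth shard is added?
moved = [k for k in keys if shard_index(k, 3) != shard_index(k, 4)]
moved_fraction = len(moved) / len(keys)

print(f"{moved_fraction:.0%} of keys would need to move")
```

Under this naive rule, around three quarters of the records would need relocating while the system is live. Real databases, MongoDB included, use their own rebalancing machinery to soften this, but the sketch suggests why planning the shard count up front is the calmer option.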
Thank you to the team from ThinkingBob who put on a great event, and filled the Club with new faces! Thank you, of course, to Adam and James for sharing their expertise.