Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 12630)

Abstract

This paper explains the basic concepts of social big data and its integrated analysis. First, we outline the real-world data, open data, and social data that compose social big data, with examples of each. After describing the interactions among real-world data, open data, and social data, we introduce the basic concepts of integrated analysis based on the “Ishikawa concept.” We then explain the flow of integrated analysis in line with these basic concepts and introduce a data model approach for integrated analysis. Building on this, integrated hypotheses and integrated analysis are explained concretely through several use cases in another paper in this issue, “Social Big Data: Case Studies.”

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number 20K12081, Tokyo Metropolitan University Grant-in-Aid for Research on Priority Areas, and Nomura School of Advanced Management Research Grant.

Author information

Corresponding author

Correspondence to Hiroshi Ishikawa.

Appendix Social Big Data Model

Our Social Big Data (SBD hereafter) model uses the mathematical concept of a family, that is, a collection of sets, as the basis for its data structures. A family can serve as an apparatus for bridging the gap between data management operations and data analysis operations.

Basically, our database is a Family. A Family is either an Indexed family or a Non-Indexed family. A Non-Indexed family is a collection of sets.

An Indexed family is defined as follows:

  • {Set} is a Non-Indexed family with Set as its element.

  • {Set_i} is an Indexed family with Set_i as its i-th element. Here, in i: Index, Index is called the indexing set and i is an element of Index.

  • Set is {<time space object>}.

  • Set_i is {<time space object>}_i. Here, object is an identifier of arbitrary identifiable user-provided data, e.g., a record, an object, or multimedia data appearing in social big data. Time and space are universal keys across multiple sources of social big data.

  • {Indexed family_i} is also an Indexed family with Indexed family_i as its i-th element. In other words, an Indexed family can constitute a hierarchy of sets.
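The definitions above can be pictured with a small Python sketch. It is only an illustration under our own assumptions, not the authors' implementation: the names `Element`, `SbdSet`, `NonIndexedFamily`, `IndexedFamily`, and `depth` are hypothetical, and representing time as a number and space as a latitude/longitude pair is an assumed encoding that the source does not prescribe.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, List, Tuple


@dataclass(frozen=True)
class Element:
    """One <time space object> triple (field encodings are assumptions)."""
    time: float                 # assumed: a numeric timestamp
    space: Tuple[float, float]  # assumed: (latitude, longitude)
    obj: str                    # identifier of user-provided data


SbdSet = FrozenSet[Element]             # Set: a set of triples
NonIndexedFamily = List[SbdSet]         # {Set}: a plain collection of sets
IndexedFamily = Dict[Hashable, object]  # {Set_i}: index -> set or nested family

tweets: SbdSet = frozenset({
    Element(1.0, (35.68, 139.76), "tweet:1"),
    Element(2.0, (35.66, 139.70), "tweet:2"),
})
photos: SbdSet = frozenset({Element(1.5, (35.68, 139.76), "photo:9")})

non_indexed: NonIndexedFamily = [tweets, photos]
indexed: IndexedFamily = {"tweets": tweets, "photos": photos}
# An Indexed family whose members are Indexed families forms a hierarchy.
hierarchy: IndexedFamily = {"tokyo": indexed}


def depth(family) -> int:
    """Count the index levels that sit above the underlying sets."""
    if isinstance(family, frozenset):
        return 0
    return 1 + max((depth(member) for member in family.values()), default=0)
```

Here a dictionary key plays the role of an element i of the indexing set, so `depth(indexed)` is 1 and `depth(hierarchy)` is 2.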

Please note that the following concepts are used interchangeably in this study.

  • Singleton family \( \Leftrightarrow \) set

  • Singleton set \( \Leftrightarrow \) element

If, in addition to the family-dedicated and set-dedicated operations, operations are provided that construct a family from a collection of sets and that deconstruct a family into a collection of sets, SBD applications can be described in an integrated fashion by our proposed model.

SBD consists of Family data management operations and Family data mining operations. Further, Family data management operations are divided into Intra Family operations and Inter Family operations.
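As a rough illustration of constructing and deconstructing families, the following Python sketch models an Indexed family as a dictionary and a Non-Indexed family as a list of sets. The function names `make_indexed_family`, `deconstruct`, and `as_set` are hypothetical, and the dictionary representation is our assumption rather than part of the model.

```python
def make_indexed_family(index, non_indexed_family):
    """Pair the i-th index value with the i-th set (order-compatibility)."""
    index = list(index)
    sets = list(non_indexed_family)
    if len(index) != len(sets) or len(set(index)) != len(index):
        raise ValueError("index must list one distinct key per set, in order")
    return dict(zip(index, sets))


def deconstruct(indexed_family):
    """Drop the index and return the member sets as a Non-Indexed family."""
    return list(indexed_family.values())


def as_set(family):
    """Treat a singleton family as the one set it contains."""
    if len(family) != 1:
        raise ValueError("not a singleton family")
    return next(iter(family.values()))


sets = [frozenset({"t1", "t2"}), frozenset({"p9"})]
family = make_indexed_family(["tweets", "photos"], sets)
round_trip = deconstruct(family)
single = as_set(make_indexed_family(["all"], [frozenset({"t1"})]))
```

Constructing a family and then deconstructing it returns the original collection of sets, which is what lets set-dedicated and family-dedicated operations be chained in one description.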

  1) Intra Family Data Management Operations

    a) Intra Indexed Intersect (i: Index Db p(i)) returns a singleton family (i.e., a set) that is the intersection of the sets satisfying the predicate p(i). The database Db is a Family; this is not repeated hereafter.

    b) Intra Indexed Union (i: Index Db p(i)) returns a singleton family that is the union of the sets satisfying p(i).

    c) Intra Indexed Difference (i: Index Db p(i)) returns a singleton family, namely the first set satisfying p(i) minus all the other sets satisfying p(i).

    d) Indexed Select (i: Index Db p1(i) p2(i)) returns an Indexed family with respect to i (the index is preserved) whose element sets satisfy the predicate p1(i) and whose set elements satisfy the predicate p2(i). As a special case, when p1(i) is always true, the operation ranges over the whole Indexed family. For a singleton family, Indexed Select reduces to the relational Select.

    e) Indexed Project (i: Index Db p(i) a(i)) returns an Indexed family whose element sets satisfy p(i) and whose set elements are projected according to the attribute specification a(i). This operation likewise extends the relational Project.

    f) Intra Indexed cross product (i: Index Db p(i)) returns a singleton family obtained by taking the product of the sets that satisfy p(i). This extends the Cartesian product, one of the relational operators.

    g) Intra Indexed Join (i: Index Db p1(i) p2(i)) returns a singleton family obtained by joining the sets that satisfy p1(i) on the join predicate p2(i). This extends join, one of the relational operators.

    h) Select-Index (i: Index Db p(i)) returns the indices i: Index of the sets Set_i that satisfy p(i). As a special case, when p(i) is always true, it returns all indices.

    i) Make-indexed family (Index Non-Indexed Family) returns an Indexed family. This operator requires order-compatibility, that is, i must correspond to the i-th set of the Non-Indexed family.

    j) Partition (i: Index Db p(i)) returns an Indexed family. Partition builds an Indexed family from a given set (i.e., a singleton family, with or without an index) by grouping its elements with respect to p (i: Index). This extends “groupby” as a relational operator.

    k) ApplyFunction (i: Index Db f(i)) applies f(i) to the i-th set of Db, where f(i) takes a set as a whole and returns another set, possibly a singleton set (as an aggregate function does). The operation returns an Indexed family. f(i) can be defined by users.

  2) Inter Family Data Management Operations (the operand families must be index-compatible)

    a) Indexed Intersect (i: Index Db1 Db2 p(i)); the operands must be union-compatible.

    b) Indexed Union (i: Index Db1 Db2 p(i)); the operands must be union-compatible.

    c) Indexed Difference (i: Index Db1 Db2 p(i)); the operands must be union-compatible.

    d) Indexed Join (i: Index Db1 Db2 p1(i) p2(i))

    e) Indexed cross product (i: Index Db1 Db2 p(i))

  3) Family Data Mining Operations

    a) Cluster (Family method similarity {par}) returns a Family by default, where Index is produced automatically. This is an unsupervised learner.

    b) Make-classifier (i: Index set:Family learnMethod {par}) returns a classifier (Classify) together with its accuracy. This is a supervised learner.

    c) Classify (Index/class set) returns an Indexed family with class as its index.

    d) Make-frequent itemset (Db supportMin) returns an Indexed family of the frequent itemsets that satisfy supportMin.

    e) Make-association-rule (Db confidenceMin) creates the association rules that satisfy confidenceMin from the frequent itemsets Db. This operation is also outside the scope of our algebra.
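To make the frequent-itemset operation more concrete, here is a brute-force Python sketch. It is not the authors' algorithm: the function name `make_frequent_itemsets` is hypothetical, interpreting supportMin as a minimum fraction of transactions is our assumption, and indexing the resulting family by itemset size is just one plausible choice of index.

```python
from itertools import combinations


def make_frequent_itemsets(db, support_min):
    """Return {k: itemsets of size k whose support is at least support_min}.

    db is a collection of transactions (sets of items); support is assumed
    to be the fraction of transactions that contain the itemset.
    """
    transactions = [frozenset(t) for t in db]
    if not transactions:
        return {}
    n = len(transactions)
    items = sorted(set().union(*transactions))
    family = {}
    for k in range(1, len(items) + 1):
        frequent = {
            frozenset(candidate)
            for candidate in combinations(items, k)
            if sum(1 for t in transactions if t.issuperset(candidate)) / n
            >= support_min
        }
        if not frequent:
            # No frequent itemset of size k means none of a larger size either.
            break
        family[k] = frequent
    return family


baskets = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
itemsets = make_frequent_itemsets(baskets, 0.5)
```

With these four baskets and a minimum support of 0.5, every single item is frequent, the pairs {a, b} and {a, c} are frequent, and no triple is. Enumerating every candidate is exponential in the number of items, so this only suits toy inputs.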

Please note that the predicates and functions used in the above operations can be defined by the users in addition to the system-defined ones such as Count.
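The following Python sketch shows how user-defined predicates and functions might plug into a few of the data management operations listed above. It is a minimal sketch under our own assumptions: an Indexed family is modelled as a dictionary of frozensets, elements are plain (time, space, object) tuples, all function names are hypothetical, and the element-level predicate of Indexed Select is modelled as a function of the element.

```python
from functools import reduce


def intra_indexed_union(db, p):
    """Union of the sets whose index satisfies p, as a single set."""
    return frozenset().union(*(s for i, s in db.items() if p(i)))


def intra_indexed_intersect(db, p):
    """Intersection of the sets whose index satisfies p, as a single set."""
    chosen = [s for i, s in db.items() if p(i)]
    return reduce(frozenset.intersection, chosen) if chosen else frozenset()


def indexed_select(db, p1, p2):
    """Keep sets whose index satisfies p1 and, in them, elements satisfying p2."""
    return {i: frozenset(e for e in s if p2(e)) for i, s in db.items() if p1(i)}


def partition(elements, key):
    """Group the elements of one set into an Indexed family keyed by key(e)."""
    groups = {}
    for e in elements:
        groups.setdefault(key(e), set()).add(e)
    return {i: frozenset(g) for i, g in groups.items()}


def apply_function(db, f):
    """Apply f to each member set as a whole, preserving the index."""
    return {i: f(s) for i, s in db.items()}


def count(s):
    """Aggregate function: the size of s, returned as a singleton set."""
    return frozenset({len(s)})


db = {
    "mon": frozenset({(9, "shibuya", "t1"), (10, "ueno", "t2")}),
    "tue": frozenset({(9, "shibuya", "t3")}),
}
everything = intra_indexed_union(db, lambda i: True)
common = intra_indexed_intersect(db, lambda i: True)
in_shibuya = indexed_select(db, lambda i: True, lambda e: e[1] == "shibuya")
by_place = partition(everything, key=lambda e: e[1])
counts = apply_function(by_place, count)
```

Every predicate and function here is an ordinary callable, so a user-defined one is passed in exactly the same way as `count`.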

Copyright information

© 2021 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

Ishikawa, H., Yamamoto, Y. (2021). Social Big Data: Concepts and Theory. In: Hameurlain, A., Tjoa, A.M., Chbeir, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVII. Lecture Notes in Computer Science, vol 12630. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-62919-2_3

  • DOI: https://doi.org/10.1007/978-3-662-62919-2_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-62918-5

  • Online ISBN: 978-3-662-62919-2

  • eBook Packages: Computer Science, Computer Science (R0)
