Social Big Data: Concepts and Theory

Ishikawa, Hiroshi; Yamamoto, Yukio

doi:10.1007/978-3-662-62919-2_3


Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 12630)


Abstract

This paper explains the basic concepts of social big data and its integrated analysis. First, we outline the real-world data, open data, and social data that compose social big data, with examples of each. After describing the interactions among real-world data, open data, and social data, we introduce the basic concepts of integrated analysis based on the “Ishikawa concept.” We then explain the flow of integrated analysis in line with this basic concept and introduce a data model approach for integrated analysis. Building on that, integrated hypotheses and integrated analysis are explained concretely through several use cases in another paper in this issue, “Social Big Data: Case Studies.”

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number 20K12081, Tokyo Metropolitan University Grant-in-Aid for Research on Priority Areas, and Nomura School of Advanced Management Research Grant.

Author information

Authors and Affiliations

Tokyo Metropolitan University, Tokyo, Japan
Hiroshi Ishikawa
Japan Aerospace Exploration Agency (JAXA), Kanagawa, Japan
Yukio Yamamoto

Corresponding author

Correspondence to Hiroshi Ishikawa.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
IFS, Technical University of Vienna, Vienna, Austria
A Min Tjoa
University of Pau and the Adour Region, Anglet, France
Richard Chbeir

Appendix Social Big Data Model

Our Social Big Data (SBD hereafter) model uses the mathematical concept of a family, that is, a collection of sets, as the basis for its data structures. A family serves as an apparatus for bridging the gap between data management operations and data analysis operations.

Basically, our database is a Family. A Family is either an Indexed family or a Non-Indexed family. A Non-Indexed family is a collection of sets.

Non-Indexed and Indexed families are defined as follows:

{Set} is a Non-Indexed family with Set as its element.
{Set_i} is an Indexed family with Set_i as its i-th element. Here, in i: Index, Index is called the indexing set and i is an element of Index.
Set is {<time space object>}.
Set_i is {<time space object>}_i. Here, object is an identifier of arbitrary identifiable user-provided data, e.g., a record, an object, or multimedia data appearing in social big data. Time and space are universal keys across multiple sources of social big data.
{Indexed family_i} is also an Indexed family with Indexed family_i as its i-th element. In other words, an Indexed family can constitute a hierarchy of sets.
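
The definitions above can be made concrete with a minimal Python sketch. This is an illustration only, not the authors' implementation: the names `Element`, `SbdSet`, `NonIndexedFamily`, `IndexedFamily`, and `depth`, the string encodings of time and space, and the sample data are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, List, Union


@dataclass(frozen=True)
class Element:
    """One <time space object> triple (field encodings are assumed)."""
    time: str   # e.g. an ISO 8601 date
    space: str  # e.g. a place name or grid-square code
    obj: str    # identifier of the user-provided data


# A Set is a set of <time space object> triples.
SbdSet = FrozenSet[Element]
# A Non-Indexed family is a plain collection of sets.
NonIndexedFamily = List[SbdSet]
# An Indexed family maps each index i to Set_i, or to a nested Indexed family.
IndexedFamily = Dict[Hashable, Union[SbdSet, "IndexedFamily"]]

# Hypothetical sample data.
tweets = frozenset({
    Element("2019-04-01", "Tokyo", "tweet:1"),
    Element("2019-04-02", "Kyoto", "tweet:2"),
})
photos = frozenset({Element("2019-04-01", "Tokyo", "photo:9")})

non_indexed: NonIndexedFamily = [tweets, photos]
indexed: IndexedFamily = {"twitter": tweets, "flickr": photos}
# {Indexed family_i}: an Indexed family whose members are Indexed families.
hierarchy: IndexedFamily = {"social": indexed, "open": {"weather": frozenset()}}


def depth(family) -> int:
    """Number of index levels that sit above the sets."""
    if isinstance(family, frozenset):
        return 0
    return 1 + max((depth(member) for member in family.values()), default=0)
```

In this sketch, `depth(indexed)` is 1 and `depth(hierarchy)` is 2, which mirrors the statement that an Indexed family can constitute a hierarchy of sets.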

Please note that the following concepts are used interchangeably in this study.

Singleton family \( \Leftrightarrow \) set
Singleton set \( \Leftrightarrow \) element

If operations that construct a family out of a collection of sets and operations that deconstruct a family into a collection of sets are provided in addition to both family-dedicated and set-dedicated operations, SBD applications can be described in an integrated fashion by our proposed model.
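
As a rough illustration of this idea, the following Python sketch deconstructs a family into its sets, applies an ordinary set-dedicated operation, and constructs a family again from the result. The function names `construct` and `deconstruct`, the tuple encoding of elements, and the sample data are assumptions made for this sketch; they are not taken from the paper.

```python
from typing import Dict, FrozenSet, Hashable, List, Sequence, Tuple

Element = Tuple[str, str, str]  # (time, space, object), an assumed encoding
SbdSet = FrozenSet[Element]
Family = Dict[Hashable, SbdSet]


def construct(index: Sequence[Hashable], sets: Sequence[SbdSet]) -> Family:
    """Build an indexed family; the i-th index label goes with the i-th set."""
    if len(index) != len(sets):
        raise ValueError("index and sets must have the same length")
    if len(set(index)) != len(index):
        raise ValueError("index labels must be unique")
    return dict(zip(index, sets))


def deconstruct(family: Family) -> List[SbdSet]:
    """Return the member sets of a family as a plain collection."""
    return list(family.values())


# Hypothetical sample data.
family = construct(
    ["twitter", "flickr"],
    [
        frozenset({("2019-04-01", "Tokyo", "tweet:1")}),
        frozenset({("2019-04-01", "Tokyo", "photo:9")}),
    ],
)

# Deconstruct, apply a set-dedicated operation (union), then reconstruct
# the result as a singleton family.
sets = deconstruct(family)
merged = frozenset().union(*sets)
singleton_family = construct(["all"], [merged])
```

Because a singleton family is treated as interchangeable with a set, `singleton_family` can be passed on either to family-dedicated or to set-dedicated operations.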

SBD consists of Family data management operations and Family data mining operations. Further, Family data management operations are divided into Intra Family operations and Inter Family operations.

1) Intra Family Data Management Operations
  a) Intra Indexed Intersect (i: Index Db p(i)) returns a singleton family (i.e., a set) obtained by intersecting the sets that satisfy the predicate p(i). The database Db is a Family; this will not be mentioned again hereafter.
  b) Intra Indexed Union (i: Index Db p(i)) returns a singleton family obtained by taking the union of the sets that satisfy p(i).
  c) Intra Indexed Difference (i: Index Db p(i)) returns a singleton family, that is, the first set satisfying p(i) minus all the remaining sets satisfying p(i).
  d) Indexed Select (i: Index Db p1(i) p2(i)) returns an Indexed family with respect to i (preserved), where the element sets satisfy the predicate p1(i) and the elements of the sets satisfy the predicate p2(i). As a special case, when p1(i) is true, this operation is applied to the whole Indexed family. In the special case of a singleton family, Indexed Select reduces to Select (a relational operation).
  e) Indexed Project (i: Index Db p(i) a(i)) returns an Indexed family where the element sets satisfy p(i) and the elements of the sets are projected according to the attribute specification a(i). This also extends relational Project.
  f) Intra Indexed cross product (i: Index Db p(i)) returns a singleton family obtained by taking the product of the sets that satisfy p(i). This is an extension of the Cartesian product, one of the relational operators.
  g) Intra Indexed Join (i: Index Db p1(i) p2(i)) returns a singleton family obtained by joining the sets that satisfy p1(i) based on the join predicate p2(i). This is an extension of join, one of the relational operators.
  h) Select-Index (i: Index Db p(i)) returns the indices i: Index of the sets Set_i that satisfy p(i). As a special case, when p(i) is true, it returns all indices.
  i) Make-indexed family (Index Non-Indexed Family) returns an Indexed family. This operator requires order-compatibility, that is, i must correspond to the i-th set of the Non-Indexed family.
  j) Partition (i: Index Db p(i)) returns an Indexed family. Partition makes an Indexed family out of a given set (i.e., a singleton family, with or without an index) by grouping elements with respect to p(i: Index). This is an extension of “groupby” as a relational operator.
  k) ApplyFunction (i: Index Db f(i)) applies f(i) to the i-th set of Db, where f(i) takes a set as a whole and gives another set, which may be a singleton set (i.e., f(i) may be an aggregate function). This returns an Indexed family. f(i) can be defined by users.
2) Inter Family Data Management Operations (Index-Compatible)
  a) Indexed Intersect (i: Index Db1 Db2 p(i)), union-compatible
  b) Indexed Union (i: Index Db1 Db2 p(i)), union-compatible
  c) Indexed Difference (i: Index Db1 Db2 p(i)), union-compatible
  d) Indexed Join (i: Index Db1 Db2 p1(i) p2(i))
  e) Indexed cross product (i: Index Db1 Db2 p(i))
3) Family Data Mining Operations
  a) Cluster (Family method similarity {par}) returns a Family by default, where Index is produced automatically. This is an unsupervised learner.
  b) Make-classifier (i: Index set:Family learnMethod {par}) returns a classifier (Classify) together with its accuracy. This is a supervised learner.
  c) Classify (Index/class set) returns an Indexed family with class as its index.
  d) Make-frequent itemset (Db supportMin) returns an Indexed family of frequent itemsets that satisfy supportMin.
  e) Make-association-rule (Db confidenceMin) creates association rules that satisfy confidenceMin, based on the frequent itemsets Db. This, too, is outside the scope of our algebra.
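
Several of the Intra Family data management operations in 1) can be sketched in a few lines of Python. The sketch below is one possible reading, not the authors' implementation: the function names, the representation of a family as a dictionary of frozensets, the signature of p2 and f as callables taking the index as their first argument, and the sample data are all assumptions. Partition is simplified to take a grouping key instead of a predicate.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Tuple

Element = Tuple[str, str, str]  # (time, space, object), an assumed encoding
SbdSet = FrozenSet[Element]
Family = Dict[Hashable, SbdSet]
IndexPred = Callable[[Hashable], bool]


def select_index(db: Family, p: IndexPred) -> List[Hashable]:
    """Select-Index: indices of the sets whose index satisfies p."""
    return [i for i in db if p(i)]


def indexed_select(db: Family, p1: IndexPred,
                   p2: Callable[[Hashable, Element], bool]) -> Family:
    """Indexed Select: keep sets satisfying p1, then elements satisfying p2."""
    return {i: frozenset(e for e in s if p2(i, e))
            for i, s in db.items() if p1(i)}


def intra_indexed_union(db: Family, p: IndexPred) -> SbdSet:
    """Intra Indexed Union: union of all sets whose index satisfies p."""
    return frozenset().union(*(db[i] for i in select_index(db, p)))


def intra_indexed_intersect(db: Family, p: IndexPred) -> SbdSet:
    """Intra Indexed Intersect: intersection of the selected sets."""
    chosen = [db[i] for i in select_index(db, p)]
    return frozenset.intersection(*chosen) if chosen else frozenset()


def partition(s: SbdSet, key: Callable[[Element], Hashable]) -> Family:
    """Partition: group the elements of one set into an indexed family."""
    groups: Dict[Hashable, set] = {}
    for e in s:
        groups.setdefault(key(e), set()).add(e)
    return {i: frozenset(g) for i, g in groups.items()}


def apply_function(db: Family, f: Callable[[Hashable, SbdSet], frozenset]):
    """ApplyFunction: apply f to each set as a whole, preserving the index."""
    return {i: f(i, s) for i, s in db.items()}


# Hypothetical sample data.
db: Family = {
    "twitter": frozenset({("2019-04-01", "Tokyo", "tweet:1"),
                          ("2019-04-02", "Kyoto", "tweet:2")}),
    "flickr": frozenset({("2019-04-01", "Tokyo", "photo:9")}),
}

tokyo = indexed_select(db, lambda i: True, lambda i, e: e[1] == "Tokyo")
merged = intra_indexed_union(db, lambda i: True)
by_place = partition(merged, key=lambda e: e[1])
# Count as an aggregate function: each set becomes a singleton set.
counts = apply_function(by_place, lambda i, s: frozenset({len(s)}))
```

Here `counts` maps "Tokyo" to the singleton set {2} and "Kyoto" to {1}, which corresponds to a relational group-by followed by Count, expressed as Partition followed by ApplyFunction.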

Please note that the predicates and functions used in the above operations can be defined by users, in addition to system-defined ones such as Count.
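
The two itemset-related mining operations in 3) can be illustrated with a brute-force Python sketch. It is a sketch under stated assumptions rather than the authors' method: the index of the returned family is assumed to be the itemset size, support is taken as the fraction of sets containing an itemset, the rule generator is additionally given the original database so that confidence can be computed, and all names and sample data are hypothetical. A real system would use an algorithm such as Apriori or FP-growth instead of enumerating all combinations.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple

Itemset = FrozenSet[str]
Db = Dict[Hashable, FrozenSet[str]]  # a family whose sets hold item names


def count(db: Db, itemset: Itemset) -> int:
    """Number of sets in the family that contain every item of itemset."""
    return sum(1 for t in db.values() if itemset <= t)


def make_frequent_itemsets(db: Db, support_min: float) -> Dict[int, Set[Itemset]]:
    """Return frequent itemsets as a family indexed by itemset size."""
    items = sorted(set().union(*db.values()))
    result: Dict[int, Set[Itemset]] = {}
    for k in range(1, len(items) + 1):
        level = {frozenset(c) for c in combinations(items, k)
                 if count(db, frozenset(c)) / len(db) >= support_min}
        if not level:
            break  # no frequent k-itemset implies no larger frequent itemset
        result[k] = level
    return result


def make_association_rules(
    db: Db, frequent: Dict[int, Set[Itemset]], confidence_min: float
) -> List[Tuple[Itemset, Itemset, float]]:
    """Return (antecedent, consequent, confidence) triples."""
    rules = []
    for k, level in frequent.items():
        for itemset in level:
            for r in range(1, k):
                for lhs_items in combinations(sorted(itemset), r):
                    lhs = frozenset(lhs_items)
                    confidence = count(db, itemset) / count(db, lhs)
                    if confidence >= confidence_min:
                        rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical sample data: four sets of items.
baskets: Db = {
    1: frozenset({"a", "b", "c"}),
    2: frozenset({"a", "b"}),
    3: frozenset({"a", "c"}),
    4: frozenset({"b"}),
}

frequent = make_frequent_itemsets(baskets, support_min=0.5)
rules = make_association_rules(baskets, frequent, confidence_min=0.9)
```

With these sample sets, the frequent itemsets are the three single items plus {a, b} and {a, c}, and the only rule that reaches a confidence of 0.9 is c → a.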

About this chapter

Cite this chapter

Ishikawa, H., Yamamoto, Y. (2021). Social Big Data: Concepts and Theory. In: Hameurlain, A., Tjoa, A.M., Chbeir, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVII. Lecture Notes in Computer Science(), vol 12630. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-62919-2_3

DOI: https://doi.org/10.1007/978-3-662-62919-2_3
Published: 17 January 2021
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-62918-5
Online ISBN: 978-3-662-62919-2
eBook Packages: Computer Science, Computer Science (R0)
